Improving small object detection via context-aware and feature-enhanced plug-and-play modules

He, Xiao; Zheng, Xiaolong; Hao, Xiyu; Jin, Heng; Zhou, Xiangming; Shao, Lihuan

doi:10.1007/s11554-024-01426-8

Research
Published: 01 March 2024

Volume 21, article number 44, (2024)

Journal of Real-Time Image Processing

Xiao He¹, Xiaolong Zheng¹, Xiyu Hao¹, Heng Jin², Xiangming Zhou² & Lihuan Shao¹

Abstract

Detecting small objects is a challenging task in computer vision because the objects occupy only a limited number of pixels and have blurred contours, which leaves few discriminative features available for modeling them. In this paper, we propose three lightweight plug-and-play modules that can be seamlessly integrated into object detection algorithms, particularly those in the YOLO series, to improve the accuracy of detecting small objects. The Spatially Enhanced Convolutional Block Attention Module (SE-CBAM) is integrated into the feature extraction layer of the network to strengthen its feature extraction capability. A Contextual Information Pooling Enhancement Module (CIE-Pool) is added at the multi-scale feature fusion stage to extract and enhance object background information, which raises the recognition rate of small objects. Finally, a new layer is added to the detection head; it incorporates the shallow feature map obtained from the feature extraction network after Adaptive Feature Processing (AFP), and thereby supplies richer information about small objects. The algorithm was evaluated on the VisDrone2021 and AI-TOD datasets. The experimental results demonstrate that the proposed method greatly improves the detection accuracy of small objects while maintaining real-time capability, and that it retains high accuracy and speed under complex backgrounds and for heavily blurred small objects.
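The motivation for attaching a detection layer to a shallow feature map can be illustrated with simple stride arithmetic. The sketch below is not taken from the paper: the strides 8, 16 and 32 are the usual YOLOv5-style head strides, the extra stride-4 head is an assumption about what "shallow feature map" implies, and the 12 x 12 pixel object is a hypothetical example.

```python
# Illustrative arithmetic only. Strides 8/16/32 are the common YOLOv5-style
# detection-head strides; the stride-4 head is an assumed stand-in for the
# added shallow layer. The abstract does not state the actual configuration.

def footprint(object_px: int, stride: int) -> float:
    """Side length, in feature-map cells, of an object that is object_px pixels wide."""
    return object_px / stride


def cells_covered(object_px: int, stride: int) -> float:
    """Approximate area, in feature-map cells, covered by a square object."""
    return footprint(object_px, stride) ** 2


if __name__ == "__main__":
    object_px = 12  # hypothetical small object, 12 x 12 pixels
    for stride in (4, 8, 16, 32):
        side = footprint(object_px, stride)
        area = cells_covered(object_px, stride)
        print(f"stride {stride:>2}: {side:.2f} cells per side, {area:.3f} cells in area")
```

Under these assumptions the object spans several cells at stride 4 but less than a single cell at strides 16 and 32, which is one way to see why a shallower feature map can carry more information about small objects.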

Data availability

The datasets employed in this study were obtained from publicly available repositories, accessible via their corresponding references. The code used in this study is available at https://github.com/hzshonny/DetectAerialObjects

References

Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., Qu, R.: A survey of deep learning-based object detection. IEEE Access 7, 128837–128868 (2019). https://doi.org/10.1109/ACCESS.2019.2939201
Zou, Z., Chen, K., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: A survey (2023) arXiv:1905.05055 [cs.CV]
Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., Xie, X., Han, J.: Towards large-scale small object detection: survey and benchmarks. IEEE Trans. Pattern Anal. Mach. Intell. (2023). https://doi.org/10.1109/tpami.2023.3290594
Chen, G., Wang, H., Chen, K., Li, Z., Song, Z., Liu, Y., Chen, W., Knoll, A.: A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans. Syst., Man., Cybern.: Syst. 52(2), 936–953 (2022). https://doi.org/10.1109/TSMC.2020.3005231
Cha, Y., Choi, W., Suh, G., Mahmoudkhani, S., Büyüköztürk, O.: Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Aided Civ. Infrastruct. Eng. (2018). https://doi.org/10.1111/mice.12334
Arnold, E., Al-Jarrah, O.Y., Dianati, M., Fallah, S., Oxtoby, D., Mouzakitis, A.: A survey on 3d object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 20(10), 3782–3795 (2019). https://doi.org/10.1109/TITS.2019.2892405
Wang, T., Chen, Y., Qiao, M., Snoussi, H.: A fast and robust convolutional neural network-based defect detection model in product quality control. Int. J. Adv. Manuf. Technol. 94, 3465–3471 (2018)
Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., Ling, H.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. (2021). https://doi.org/10.1109/TPAMI.2021.3119563
Wang, J., Yang, W., Guo, H., Zhang, R., Xia, G.-S.: Tiny object detection in aerial images. In: 25th International Conference on Pattern Recognition (ICPR), pp. 3791–3798 (2021). https://doi.org/10.1109/ICPR48806.2021.9413340
Uijlings, J.R.R., Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014). https://doi.org/10.1109/CVPR.2014.81
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). https://doi.org/10.1109/TPAMI.2015.2389824
Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. CoRR abs/1804.02767 (2018)
Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: Optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106
Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., Shen, C.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: IEEE International Conference on Computer Vision (ICCV), pp. 8439–8448 (2019). https://doi.org/10.1109/ICCV.2019.00853
Jocher, G.: YOLOv5 by Ultralytics. https://doi.org/10.5281/zenodo.3908559. https://github.com/ultralytics/yolov5
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Ghodrati, A., Diba, A., Pedersoli, M., Tuytelaars, T., Van Gool, L.: Deepproposal: Hunting objects by cascading deep convolutional layers. In: IEEE International Conference on Computer Vision (ICCV), pp. 2578–2586 (2015). https://doi.org/10.1109/ICCV.2015.296
Cai, Z., Vasconcelos, N.: Cascade r-cnn: Delving into high quality object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6154–6162 (2018). https://doi.org/10.1109/CVPR.2018.00644
Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Single-shot refinement neural network for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4203–4212 (2018). https://doi.org/10.1109/CVPR.2018.00442
Etten, A.V.: You Only Look Twice: Rapid multi-scale object detection in satellite imagery. CoRR abs/1805.09512 (2018)
Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-Outside Net: Detecting objects in context with skip pooling and recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2874–2883 (2016). https://doi.org/10.1109/CVPR.2016.314
Yuan, Y., Xiong, Z., Wang, Q.: VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans. Image Process. 28(7), 3423–3434 (2019). https://doi.org/10.1109/TIP.2019.2896952
Müller, J., Dietmayer, K.: Detecting traffic lights by single shot detection. In: 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 266–273 (2018). https://doi.org/10.1109/ITSC.2018.8569683
Yan, B., Li, J., Yang, Z., Zhang, X., Hao, X.: AIE-YOLO: auxiliary information enhanced YOLO for small object detection. Sensors 22(21), 8221 (2022)
Wang, M., Yang, W., Wang, L., Chen, D., Wei, F., KeZiErBieKe, H., Liao, Y.: FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection. J. Vis. Commun. Image Represent. 90, 103752 (2023)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: Efficient channel attention for deep convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539 (2020). https://doi.org/10.1109/CVPR42600.2020.01155
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Yang, R., Li, W., Shang, X., Zhu, D., Man, X.: KPE-YOLOv5: an improved small target detection algorithm based on YOLOv5. Electronics 12(4), 817 (2023)
Zhou, W., Cai, C., Zheng, L., Li, C., Zeng, D.: ASSD-YOLO: a small object detection method based on improved YOLOv7 for airport surface surveillance. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-17628-4
Lim, J.-S., Astrid, M., Yoon, H.-J., Lee, S.-I.: Small object detection using context and attention. In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 181–186 (2021). https://doi.org/10.1109/ICAIIC51459.2021.9415217
Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: Neural Information Processing Systems (NeurIPS), vol. 29 (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/c8067ad1937f728f51288b3eb986afaa-Paper.pdf
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: Keypoint triplets for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 6568–6577 (2019). https://doi.org/10.1109/ICCV.2019.00667
Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 3500–3509 (2021). https://doi.org/10.1109/ICCV48922.2021.00350
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324
Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464–7475 (2023)

Author information

Authors and Affiliations

College of Electronics and Information, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
Xiao He, Xiaolong Zheng, Xiyu Hao & Lihuan Shao
Zhejiang Dahua Technology Co., Ltd., Hangzhou, 310053, Zhejiang, China
Heng Jin & Xiangming Zhou

Contributions

Xiao He: validation, data curation, formal analysis, investigation, writing—original draft, writing—review, editing. Xiaolong Zheng: conceptualization, methodology, investigation, formal analysis, visualization, writing—review, editing. Xiyu Hao: data curation, investigation, formal analysis, writing—review, editing. Heng Jin: investigation, formal analysis, writing—review, editing. Xiangming Zhou: investigation, formal analysis, writing—review, editing. Lihuan Shao: conceptualization, investigation, formal analysis, supervision, writing—review, editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xiaolong Zheng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, X., Zheng, X., Hao, X. et al. Improving small object detection via context-aware and feature-enhanced plug-and-play modules. J Real-Time Image Proc 21, 44 (2024). https://doi.org/10.1007/s11554-024-01426-8

Received: 17 October 2023
Accepted: 23 January 2024
Published: 01 March 2024
DOI: https://doi.org/10.1007/s11554-024-01426-8
