Abstract
Detecting small objects is a challenging task in computer vision due to the objects only occupying a limited number of pixels and having blurred contours. These factors result in minimal discriminative features being available to effectively model the objects. In this paper, we propose three lightweight plug-and-play modules that can be seamlessly integrated into object detection algorithms, particularly those in the YOLO series, to improve the accuracy of detecting small objects. The Spatially Enhanced Convolutional Block Attention Module (SE-CBAM) is integrated into the feature extraction layer of the network to enhance the feature extraction capability of neural networks. Additionally, a Contextual Information Pooling Enhancement Module (CIE-Pool) is included at the multi-scale feature fusion stage to extract and improve object background information, which enhances the recognition rate of small objects. To improve the detection of small objects, a new layer is added to the detection head, which incorporates the shallow feature map obtained from the feature extraction network after Adaptive Feature Processing (AFP), thereby obtaining more and richer information about small objects. The efficacy of the algorithm has been evaluated on the VisDrone2021 and AI-TOD datasets. The experimental results demonstrate that the method proposed in this paper greatly improves the detection accuracy of small objects while maintaining real-time capabilities. Furthermore, it maintains high accuracy and speed even when dealing with complex background conditions and detecting small objects with high blur.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
The datasets employed in this study were obtained from publicly available repositories, accessible via their corresponding references. The code used in this study is available at https://github.com/hzshonny/DetectAerialObjects
References
Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., Qu, R.: A survey of deep learning-based object detection. IEEE Access 7, 128837–128868 (2019). https://doi.org/10.1109/ACCESS.2019.2939201
Zou, Z., Chen, K., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: A survey (2023) arXiv:1905.05055 [cs.CV]
Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., Xie, X., Han, J.: Towards large-scale small object detection: survey and benchmarks. IEEE Trans. Pattern Anal. Mach. Intell. (2023). https://doi.org/10.1109/tpami.2023.3290594
Chen, G., Wang, H., Chen, K., Li, Z., Song, Z., Liu, Y., Chen, W., Knoll, A.: A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans. Syst., Man., Cybern.: Syst. 52(2), 936–953 (2022). https://doi.org/10.1109/TSMC.2020.3005231
Cha, Y., Choi, W., Suh, G., Mahmoudkhani, S., Büyüköztürk, O.: Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Aided Civ. Infrastruct. Eng. (2018). https://doi.org/10.1111/mice.12334
Arnold, E., Al-Jarrah, O.Y., Dianati, M., Fallah, S., Oxtoby, D., Mouzakitis, A.: A survey on 3d object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 20(10), 3782–3795 (2019). https://doi.org/10.1109/TITS.2019.2892405
Wang, T., Chen, Y., Qiao, M., Snoussi, H.: A fast and robust convolutional neural network-based defect detection model in product quality control. Int. J. Adv. Manuf. Technol. 94, 3465–3471 (2018)
Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., Ling, H.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. (2021). https://doi.org/10.1109/TPAMI.2021.3119563
Wang, J., Yang, W., Guo, H., Zhang, R., Xia, G.-S.: Tiny object detection in aerial images. In: 25th International Conference on Pattern Recognition (ICPR), pp. 3791–3798 (2021). https://doi.org/10.1109/ICPR48806.2021.9413340
Uijlings, J.R.R., Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014). https://doi.org/10.1109/CVPR.2014.81
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). https://doi.org/10.1109/TPAMI.2015.2389824
Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. CoRR abs/1804.02767 (2018) 1804.02767
Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: Optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020) 2004.10934
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106
Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., Shen, C.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: IEEE International Conference on Computer Vision (ICCV), pp. 8439–8448 (2019). https://doi.org/10.1109/ICCV.2019.00853
Jocher, G.: YOLOv5 by Ultralytics. https://doi.org/10.5281/zenodo.3908559 . https://github.com/ultralytics/yolov5
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Ghodrati, A., Diba, A., Pedersoli, M., Tuytelaars, T., Van Gool, L.: Deepproposal: Hunting objects by cascading deep convolutional layers. In: IEEE International Conference on Computer Vision (ICCV), pp. 2578–2586 (2015). https://doi.org/10.1109/ICCV.2015.296
Cai, Z., Vasconcelos, N.: Cascade r-cnn: Delving into high quality object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6154–6162 (2018). https://doi.org/10.1109/CVPR.2018.00644
Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Single-shot refinement neural network for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4203–4212 (2018). https://doi.org/10.1109/CVPR.2018.00442
Etten, A.V.: You Only Look Twice: Rapid multi-scale object detection in satellite imagery. CoRR abs/1805.09512 (2018) 1805.09512
Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-Outside Net: Detecting objects in context with skip pooling and recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2874–2883 (2016). https://doi.org/10.1109/CVPR.2016.314
Yuan, Y., Xiong, Z., Wang, Q.: VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans. Image Process. 28(7), 3423–3434 (2019). https://doi.org/10.1109/TIP.2019.2896952
Müller, J., Dietmayer, K.: Detecting traffic lights by single shot detection. In: 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 266–273 (2018). https://doi.org/10.1109/ITSC.2018.8569683
Yan, B., Li, J., Yang, Z., Zhang, X., Hao, X.: AIE-YOLO: auxiliary information enhanced YOLO for small object detection. Sensors 22(21), 8221 (2022)
Wang, M., Yang, W., Wang, L., Chen, D., Wei, F., KeZiErBieKe, H., Liao, Y.: FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection. J. Vis. Commun. Image Represent. 90, 103752 (2023)
Hu, J., Shen, L., Sun, G.: squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: Efficient channel attention for deep convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539 (2020). https://doi.org/10.1109/CVPR42600.2020.01155
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Yang, R., Li, W., Shang, X., Zhu, D., Man, X.: KPE-YOLOv5: an improved small target detection algorithm based on YOLOv5. Electronics 12(4), 817 (2023)
Zhou, W., Cai, C., Zheng, L., Li, C., Zeng, D.: ASSD-YOLO: a small object detection method based on improved YOLOv7 for airport surface surveillance. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-17628-4
Lim, J.-S., Astrid, M., Yoon, H.-J., Lee, S.-I.: Small object detection using context and attention. In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 181–186 (2021). https://doi.org/10.1109/ICAIIC51459.2021.9415217
Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: Neural Information Processing Systems (NeurIPS), vol. 29 (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/c8067ad1937f728f51288b3eb986afaa-Paper.pdf
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: Keypoint triplets for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 6568–6577 (2019). https://doi.org/10.1109/ICCV.2019.00667
Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 3500–3509 (2021). https://doi.org/10.1109/ICCV48922.2021.00350
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324
Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464–7475 (2023)
Author information
Authors and Affiliations
Contributions
Xiao He: validation, data curation, formal analysis, investigation, writing—original draft, writing—review, editing. Xiaolong Zheng: conceptualization, methodology, investigation, formal analysis, visualization, writing—review, editing. Xiyu Hao: data curation, investigation, formal analysis, writing—review, editing. Heng Jin: investigation, formal analysis, writing—review, editing. Xiangming Zhou: investigation, formal analysis, writing—review, editing. Lihuan Shao: conceptualization, investigation, formal analysis, supervision, writing—review, editing. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, X., Zheng, X., Hao, X. et al. Improving small object detection via context-aware and feature-enhanced plug-and-play modules. J Real-Time Image Proc 21, 44 (2024). https://doi.org/10.1007/s11554-024-01426-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11554-024-01426-8