TSD-Truncated Structurally Aware Distance for Small Pest Object Detection
Abstract
1. Introduction
- Few available features. Because the objects are small, their feature information is weakened layer by layer as the number of CNN layers increases, making it difficult to extract discriminative features; in a deep multi-layer network this can even cause some objects to be missed entirely.
- High localization accuracy requirements. Since small objects occupy only a small area of the image, their bounding boxes are harder to localize than those of normal-sized objects. Moreover, in anchor-based detectors, far fewer anchors match small objects during training than match normal-scale objects, which further hampers their detection (the sketch after this list illustrates why IoU-based matching is so sensitive for small boxes).
- Small objects tend to cluster. First, owing to the habits of pests, they readily gather together under the trapping device. Second, after repeated convolutions, adjacent small objects in a cluster can collapse onto a single point in later feature maps, so the detection model can no longer separate them. Third, when the clustered small objects lie very close together, bounding-box regression becomes difficult and the model struggles to converge.
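The localization sensitivity described above can be made concrete with a few lines of Python. The sketch below (illustrative only, not from the paper) shifts a pest-sized box and a normal-sized box by the same number of pixels and prints the resulting IoU; the tiny box loses IoU several times faster.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def shifted(box, dx):
    """Translate a box horizontally by dx pixels."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])

small = (0, 0, 12, 12)   # pest-sized box
large = (0, 0, 96, 96)   # normal-sized box

for dx in (1, 2, 4):
    print(f"shift={dx}px  small IoU={iou(small, shifted(small, dx)):.2f}  "
          f"large IoU={iou(large, shifted(large, dx)):.2f}")
# shift=4px: the 12 x 12 box falls to IoU 0.50 while the 96 x 96 box keeps
# IoU 0.92, so a fixed IoU threshold assigns small objects far fewer anchors.
```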
- A new metric, truncated structurally aware distance (TSD), is proposed to measure the similarity of bounding boxes and to replace IoU in label assignment. TSD simply uses a standardized Chebyshev distance to describe bounding-box similarity (as shown in Figure 1), which alleviates the sensitivity of IoU to the localization bias of small objects (a hedged sketch of the idea follows this list).
- Instead of an IoU-based loss, we design a new loss function, dubbed TSD loss, based on TSD; it uses truncation to describe the structural regression loss for small pests.
- The proposed TSD significantly improves the small-object performance of popular anchor-based detectors: with Faster R-CNN on the Pest24 dataset, performance rises from 46.0% to 50.7%.
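Since the exact TSD formula is not reproduced in this excerpt, the following Python sketch only illustrates the stated idea: a Chebyshev (L-infinity) distance between box centers, standardized by box scale and truncated before being mapped to a similarity. The scale normalization, the truncation threshold `tau`, and the exponential mapping are all assumptions made here for illustration, not the paper's exact definition.

```python
import math

def tsd_similarity(box_a, box_b, tau=2.0):
    """Illustrative sketch of a truncated, size-standardized Chebyshev
    similarity between boxes (x1, y1, x2, y2). The standardization and
    the truncation threshold `tau` are assumptions, not the paper's
    exact constants."""
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    # Chebyshev (L-infinity) distance between the two box centers.
    cheb = max(abs(cxa - cxb), abs(cya - cyb))
    # Standardize by a characteristic scale of the two boxes.
    scale = math.sqrt(((box_a[2] - box_a[0]) * (box_a[3] - box_a[1]) +
                       (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])) / 2)
    d = min(cheb / max(scale, 1e-6), tau)   # truncate far-apart pairs
    # Map the truncated distance to a similarity in (0, 1] so it can act
    # as a drop-in replacement for IoU during label assignment.
    return math.exp(-d)
```

Unlike IoU, this score degrades smoothly under small pixel offsets of tiny boxes and never collapses to zero for non-overlapping anchor/ground-truth pairs, which is the behaviour the contribution bullet above attributes to TSD.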
2. Methodology
2.1. Truncated Structurally Aware Distance
2.2. Truncated Structurally Aware Distance Loss
2.3. Detector
- Bottom-up. According to the size of the feature maps, the ResNet backbone is divided into four stages: Stage2, Stage3, Stage4, and Stage5. The last layer of each stage outputs Conv2, Conv3, Conv4, and Conv5, and these outputs are denoted C2, C3, C4, and C5. This is a plain feature-extraction pathway.
- Top-down. Up-sampling starts from the highest level. We use nearest-neighbor up-sampling directly rather than a deconvolution operation.
- Horizontal connection. Each up-sampled result is fused with the corresponding feature map from the bottom-up pathway; the fused features are then processed by a convolution to suppress the aliasing effect of up-sampling (see the sketch after this list).
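These three pathways follow the standard FPN construction [19]. Below is a generic PyTorch sketch of such a neck, not the authors' released code; the 1 × 1 lateral projections, nearest-neighbor up-sampling, and post-fusion 3 × 3 convolutions mirror the description above, while channel widths are illustrative ResNet-50 defaults.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """Generic FPN neck over backbone outputs [C2, C3, C4, C5]."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions project each Ci to a common width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions after fusion reduce the up-sampling aliasing.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):            # feats = [C2, C3, C4, C5]
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: start from the highest (coarsest) level and add the
        # nearest-neighbor up-sampled map to the next lateral below it.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]   # [P2..P5]
```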
3. Experiment
3.1. Dataset
3.2. Implementation Details
3.3. Ablation Study
3.4. Comparison of Different Metrics
3.5. Comparison of Different Detectors
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
- Zhou, Y.; Wen, S.; Wang, D.; Mu, J.; Richard, I. Object Detection in Autonomous Driving Scenarios Based on an Improved Faster-RCNN. Appl. Sci. 2021, 11, 11630.
- He, Y.; Liu, Z. A feature fusion method to improve the driving obstacle detection under foggy weather. IEEE Trans. Transp. Electrif. 2021, 7, 2505–2515.
- Li, F.; Xi, Q. DefectNet: Toward fast and effective defect detection. IEEE Trans. Instrum. Meas. 2021, 70, 1–9.
- Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
- Yang, S.; Jiao, D.; Wang, T.; He, Y. Tire Speckle Interference Bubble Defect Detection Based on Improved Faster RCNN-FPN. Sensors 2022, 22, 3907.
- Wang, J.; Zhang, G.; Zhang, K.; Zhao, Y.; Wang, Q.; Li, X. Detection of small aerial object using random projection feature with region clustering. IEEE Trans. Cybern. 2020, 52, 3957–3970.
- Yu, D.; Ji, S. A new spatial-oriented object detection framework for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16.
- Liu, R.; Yu, Z.; Mo, D.; Cai, Y. An improved faster-RCNN algorithm for object detection in remote sensing images. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7188–7192.
- Wang, R.; Jiao, L.; Xie, C.; Chen, P.; Du, J.; Li, R. S-RPN: Sampling-balanced region proposal network for small crop pest detection. Comput. Electron. Agric. 2021, 187, 106290.
- Liu, J.; Wang, X. Tomato diseases and pests detection based on improved Yolo V3 convolutional neural network. Front. Plant Sci. 2020, 11, 898.
- Du, J.; Liu, L.; Li, R.; Jiao, L.; Xie, C.; Wang, R. Towards densely clustered tiny pest detection in the wild environment. Neurocomputing 2022, 490, 400–412.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Qiao, S.; Chen, L.C.; Yuille, A. DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10213–10224.
- Wei, D.; Chen, J.; Luo, T.; Long, T.; Wang, H. Classification of crop pests based on multi-scale feature fusion. Comput. Electron. Agric. 2022, 194, 106736.
- Teng, Y.; Zhang, J.; Dong, S.; Zheng, S.; Liu, L. MSR-RCNN: A Multi-Class Crop Pest Detection Network Based on a Multi-Scale Super-Resolution Feature Enhancement Module. Front. Plant Sci. 2022, 13, 810546.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230.
- Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 2021, 187, 106279.
- Nazki, H.; Yoon, S.; Fuentes, A.; Park, D.S. Unsupervised image translation using adversarial networks for improved plant disease recognition. Comput. Electron. Agric. 2020, 168, 105117.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6569–6578.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Su, Q.; Tang, J.; Zhai, M.; He, D. An intelligent method for dairy goat tracking based on Siamese network. Comput. Electron. Agric. 2022, 193, 106636.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. arXiv 2021, arXiv:2110.13389.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 23–28 June 2014; pp. 580–587.
- Wang, Q.J.; Zhang, S.Y.; Dong, S.F.; Zhang, G.C.; Yang, J.; Li, R.; Wang, H.Q. Pest24: A large-scale very small object data set of agricultural pests for multi-target detection. Comput. Electron. Agric. 2020, 175, 105585.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Xu, C.; Wang, J.; Yang, W.; Yu, L. Dot Distance for Tiny Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1192–1201.
Method | α | γ | mAP | AP | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---
CE Loss | – | – | 48.2 | 29.5 | 33.3 | 17.2 | 35.7 | 70.0
Focal Loss | 0.25 | 1 | 37.9 | 23.3 | 26.4 | 13.1 | 30.2 | 60.0
Focal Loss | 0.25 | 2 | 37.8 | 23.2 | 26.2 | 13.0 | 29.8 | 70.0
Focal Loss | 0.25 | 3 | 36.9 | 22.7 | 25.7 | 13.0 | 29.4 | 70.0
Focal Loss | 0.5 | 2 | 51.0 | 31.0 | 36.0 | 17.3 | 37.6 | 70.0
Focal Loss | 0.5 | 3 | 46.7 | 28.4 | 32.3 | 16.1 | 35.0 | 80.0
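For reference, α and γ in the table above are the two hyperparameters of the focal loss [38]. The minimal PyTorch sketch below shows the standard binary formulation they control; it is a generic re-implementation of Lin et al.'s loss, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.5, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    alpha balances positives vs. negatives; gamma down-weights easy
    examples. alpha=0.5, gamma=2 is the best setting in the table above."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```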
Detector | Assigning | NMS | Loss | mAP | AP | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---
Faster R-CNN | | | | 50.3 | 30.3 | 33.4 | 18.2 | 36.3 | 70.0
 | | | | 50.9 | 30.8 | 34.6 | 17.8 | 36.9 | 70.0
 | | | | 50.7 | 30.7 | 34.7 | 19.3 | 37.2 | 60.0
 | | | | 51.0 | 31.0 | 36.0 | 17.3 | 37.6 | 70.0
Metrics | mAP | AP | AP75 | APS | APM | APL
---|---|---|---|---|---|---
IoU | 46.0 | 26.9 | 28.9 | 16.3 | 32.2 | 50.0
GIoU | 43.8 | 23.4 | 22.9 | 14.0 | 27.5 | 40.0
DIoU | 45.4 | 24.5 | 23.7 | 15.1 | 29.2 | 40.0
CIoU | 46.1 | 24.9 | 24.0 | 16.2 | 29.3 | 30.0
Ours | 51.0 | 31.0 | 36.0 | 17.3 | 37.6 | 70.0
Method | Backbone | mAP | AP | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---
SSD [17] | ResNet-50 | 30.7 | 17.1 | 17.5 | 7.0 | 22.8 | 50.0
RetinaNet [38] | ResNet-50-FPN | 44.3 | 27.0 | 30.1 | 15.1 | 33.1 | 50.0
YOLO [14] | DarkNet53 | 22.4 | 11.5 | 9.8 | 6.8 | 15.2 | 40.0
DotD [41] | ResNet-50-FPN | 45.5 | 26.8 | 28.8 | 16.1 | 32.1 | 60.0
NWD [32] | ResNet-50-FPN | 36.7 | 17.5 | 13.6 | 31.8 | 37.2 | 10.0
Cascade R-CNN [40] | ResNet-50-FPN | 46.0 | 25.4 | 25.1 | 15.0 | 31.0 | 50.0
Cascade R-CNN * | ResNet-50-FPN | 48.9 | 27.7 | 28.3 | 15.2 | 34.2 | 60.0
Faster R-CNN [39] | ResNet-50-FPN | 46.0 | 26.9 | 28.9 | 16.3 | 32.2 | 50.0
Faster R-CNN [39] | MobileNetV2 | 41.8 | 22.5 | 22.3 | 11.5 | 29.2 | 50.0
Faster R-CNN [39] | VGG16 | 47.1 | 27.3 | 29.2 | 16.8 | 33.3 | 40.0
Faster R-CNN * | ResNet-50-FPN | 51.0 | 31.0 | 36.0 | 17.3 | 37.6 | 70.0

* with the proposed TSD.