A Forest Wildlife Detection Algorithm Based on Improved YOLOv5s
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Acquisition and Pre-Processing
2.1.1. Forest Wildlife Data Set
2.1.2. Data Set Annotation and Augmentation
2.2. Experimental Conditions
2.3. Forest Wildlife Detection Network
2.3.1. YOLOv5
2.3.2. Swin Transformer
2.3.3. SENet Channel Attention Mechanism
2.3.4. Integration of Swin Transformer and SENet-YOLOv5
2.3.5. Loss Function Improvement
3. Results
3.1. Evaluation Criteria
3.2. Results of Forest Wildlife Detection Experiments
3.2.1. Ablation Study
3.2.2. Test Results
4. Discussion
- Animal detection based on the Swin Transformer model has shown good results [41,42]. In contrast, the improved method we propose in this paper builds on the original YOLOv5 network model and takes several steps to improve its training. First, we applied our proposed data augmentation method, together with several standard augmentation techniques, to enrich the data set. Second, we introduced channel-based attention by replacing the original Concat operation with a weighted channel concatenation (denoted ConcatE), which increases the number of channel layers carrying key feature information and strengthens attention to important channels. In addition, we found the optimal backbone structure for this data set through comparative experiments: we used the Swin Transformer module to replace the CSP_1 layer in the YOLOv5 backbone and the CSP_2 layer in the Neck network, while retaining the other CNN-based CBL and CSP layers, thereby combining the advantages of convolution, attention, and self-attention mechanisms. To address the non-overlap problem, we employed a new bounding-box loss (DIOU_Loss) to speed up model convergence, and we introduced an adaptive class suppression loss (L_BCE) to suppress false detections of confusable classes and preserve accuracy on tail classes, thereby maintaining detection accuracy between animal species with high similarity (a minimal sketch of ConcatE and DIOU_Loss is given after this discussion). By analyzing the confusion matrix, we found that L_BCE further reduced the impact of data imbalance on the detection results and improved detection accuracy. The experimental results demonstrate the effectiveness of our improved model, which achieves an accuracy of 90.2%, a recall of 83.3%, and a mAP of 89.4%.
- Based on the experimental results, we observed that the detection results of the models, both before and after the proposed improvements, differed only slightly between the two data sets of different sizes, and all of the improved methods achieved significant gains on both. In particular, the experimental results on data set 1 indicated that our improved model raised the mAP metric by 16.8, 20.0, 16.9, and 10.5 percentage points compared to the YOLOv5s, YOLOv3, RetinaNet, and Faster-RCNN methods, respectively. These results indicate that our improvements are very effective in enhancing detection performance. In addition, our improved algorithm is well suited for edge deployment and embedded development with the help of suitable control algorithms [43] and hardware devices [44], as the inference speed of the model ensures the feasibility of real-time detection.
- Our model effectively solves some of the omission and false detection problems that occur during detection in complex environments. The detection results for the ten forest wildlife species differed little between the models before and after the improvements, and the effect of the data collection area on the detection results was likewise small. Although the best test result, a mAP of 94.7%, was obtained for Ursus thibetanus from the Hupingshan National Nature Reserve rather than for Odocoileus hemionus, the species with the most abundant training data, the detection accuracy of the other Hupingshan species was lower than that of the North American species with richer training data. These results are reasonable: Ursus thibetanus is highly distinctive, with large feature differences, large body size, and relatively sufficient training data. The most important factor affecting the detection results therefore remains the training data, specifically its quantity and diversity. Secondly, regression-based single-stage detection algorithms are better at detecting large targets than small ones, so we optimized the model's ability to detect small targets. In addition, the probability of false detection is greater for species with high feature similarity; we proposed an improvement for this as well, which effectively maintains a high level of detection accuracy when detecting such similar species.
- Although we undertook considerable work to improve the algorithmic model, some shortcomings remain. Specifically, we observed a trade-off between the complexity of the network structure and the model's detection performance, and we made certain compromises to balance detection performance against FPS. The multi-scale feature fusion and global feature extraction we employed increased the computational load and slowed inference; although we lost some of the original inference speed, this improved the model's detection of difficult targets to a certain extent. Current GPU acceleration is also not yet well optimized for Transformer models, which limits inference speed; however, as GPU hardware support and model structures improve, the speed of Transformer-based models is expected to increase further. In future research, we intend to improve the proposed algorithm with more efficient strategies [45] to reduce the impact on FPS.
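For illustration, the snippet below gives a minimal PyTorch sketch of two of the components discussed above: a squeeze-and-excitation gate [36] applied after channel concatenation, in the spirit of ConcatE, and the DIoU box-regression loss [38]. It is a simplified sketch rather than our full implementation; the class and function names and their arguments are illustrative.

```python
# Minimal sketch (not the released code) of an SE-weighted concatenation
# in the spirit of ConcatE [36] and of the DIoU loss [38].
import torch
import torch.nn as nn

class ConcatE(nn.Module):
    """Concatenate feature maps along the channel axis, then reweight
    channels with a squeeze-and-excitation gate before passing them on."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global context per channel
        self.fc = nn.Sequential(                   # excitation: per-channel gates in (0, 1)
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, xs):
        x = torch.cat(xs, dim=1)                   # plain Concat of incoming branches
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # weighted channel splicing

def diou_loss(pred, target, eps: float = 1e-7):
    """DIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU + rho^2 / c^2, where rho
    is the centre distance and c the diagonal of the smallest enclosing box."""
    lt = torch.max(pred[:, :2], target[:, :2])     # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])     # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_t - inter + eps)
    centre_p = (pred[:, :2] + pred[:, 2:]) / 2     # squared centre distance ...
    centre_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((centre_p - centre_t) ** 2).sum(dim=1)
    enc_lt = torch.min(pred[:, :2], target[:, :2]) # ... over squared enclosing diagonal
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps
    return (1.0 - iou + rho2 / c2).mean()
```

In the full network, a gate of this kind would replace each Concat node in the Neck, and diou_loss would stand in for the default IoU-based box-regression term during training; the distance penalty keeps gradients informative even when predicted and ground-truth boxes do not overlap.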
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Schneider, T.C.; Kowalczyk, R.; Köhler, M. Resting site selection by large herbivores–The case of European bison (Bison bonasus) in Białowieza Primeval Forest. Mamm. Biol. 2013, 78, 438–445. [Google Scholar] [CrossRef]
- Noad, M.J.; Cato, D.H.; Stokes, M.D. Acoustic Tracking of Humpback Whales: Measuring Interactions with the Acoustic Environment. In Proceedings of the Acoustics, Gold Coast, Australia, 3–5 November 2004; pp. 353–358. [Google Scholar]
- Andreychev, A.V. Daily and seasonal feeding activity of the greater mole-rat (Spalax microphthalmus, Rodentia, Spalacidae). Biol. Bull. 2019, 46, 1172–1181. [Google Scholar] [CrossRef]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar]
- Chen, G.; Han, T.X.; He, Z.; Kays, R.; Forrester, T. Deep convolutional neural network-based species recognition for wild animal monitoring. In 2014 IEEE International Conference on Image Processing (ICIP); IEEE: New York, NY, USA, 2014. [Google Scholar]
- Villa, A.G.; Salazar, A.; Vargas, F. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks. Ecol. Inform. 2017, 41, 24–32. [Google Scholar] [CrossRef]
- Norouzzadeh, M.S.; Nguyen, A.; Kosmala, M.; Swanson, A.; Palmer, M.S.; Packer, C.; Clune, J. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. USA 2018, 115, E5716–E5725. [Google Scholar] [CrossRef] [PubMed]
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; Lecun, Y. Overfeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv 2013, arXiv:1312.6229. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Wei, F.; Sun, X.; Li, H.; Wang, J.; Lin, S. Point-set anchors for object detection, instance segmentation and pose estimation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part X 16; Springer International Publishing: New York, NY, USA, 2020. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part I 14; Springer International Publishing: New York, NY, USA, 2016. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Li, H.; Jiang, F.; Guo, F.; Meng, W. A real-time detection method of safety hazards in transmission lines based on YOLOv5s. In International Conference on Artificial Intelligence and Intelligent Information Processing (AIIIP 2022); SPIE: Bellingham, WA, USA, 2022. [Google Scholar]
- Chen, R.; Little, R.; Mihaylova, L.; Delahay, R.; Cox, R. Wildlife Surveillance using Deep Learning Methods. Ecol. Evol. 2019, 9, 9453–9466. [Google Scholar] [CrossRef]
- Zhao, T.; Yi, X.; Zeng, Z.; Feng, T. MobileNet-Yolo based wildlife detection model: A case study in Yunnan Tongbiguan Nature Reserve, China. J. Intell. Fuzzy Syst. 2021, 41, 2171–2181. [Google Scholar] [CrossRef]
- Pan, S.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [PubMed]
- Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring plain vision transformer backbones for object detection. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021. [Google Scholar]
- Jannat, F.; Willis, A.R. Improving classification of remotely sensed images with the Swin Transformer. In SoutheastCon 2022; IEEE: New York, NY, USA, 2022. [Google Scholar]
- Liu, Z.; Tan, Y.; He, Q.; Xiao, Y. SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4486–4497. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin Transformers for semantic segmentation of brain tumors in MRI images. In International MICCAI Brainlesion Workshop; Springer: Cham, Switzerland, 2021. [Google Scholar]
- Naseer, M.M.; Ranasinghe, K.; Khan, S.H.; Hayat, M.; Shahbaz Khan, F.; Yang, M. Intriguing properties of vision transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 23296–23308. [Google Scholar]
- Beery, S.; Morris, D.; Perona, P. The iWildCam 2019 Challenge Dataset. arXiv 2019, arXiv:1907.07617. [Google Scholar]
- Devries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Li, T.; Wang, J.; Zhang, T. L-DETR: A Light-Weight Detector for End-to-End Object Detection with Transformers. IEEE Access 2022, 10, 105685–105692. [Google Scholar] [CrossRef]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Wang, T.; Zhu, Y.; Zhao, C.; Zeng, W.; Wang, J.; Tang, M. Adaptive Class Suppression Loss for Long-Tail Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part V 13; Springer International Publishing: New York, NY, USA, 2014. [Google Scholar]
- Agilandeeswari, L.; Meena, S. Swin transformer based contrastive self-supervised learning for animal detection and classification. Multimed. Tools Appl. 2023, 82, 10445–10470. [Google Scholar] [CrossRef]
- Gu, T.; Min, R. A Swin Transformer based Framework for Shape Recognition. In Proceedings of the 2022 14th International Conference on Machine Learning and Computing (ICMLC), Guangzhou, China, 18–21 February 2022; pp. 388–393. [Google Scholar]
- Deng, L.; Liu, T.; Jiang, P.; Xie, F.; Zhou, J.; Yang, W.; Qi, A. Design of an Adaptive Algorithm for Feeding Volume–Traveling Speed Coupling Systems of Rice Harvesters in Southern China. Appl. Sci. 2023, 13, 4876. [Google Scholar]
- Deng, L.; Liu, T.; Jiang, P.; Qi, A.; He, Y.; Li, Y.; Yang, M.; Deng, X. Design and Testing of Bionic-Feature-Based 3D-Printed Flexible End-Effectors for Picking Horn Peppers. Agronomy 2023, 13, 2231. [Google Scholar] [CrossRef]
- Liu, T.; Ma, Y.; Yang, W.; Ji, W.; Wang, R.; Jiang, P. Spatial-temporal interaction learning based two-stream network for action recognition. Inform. Sci. 2022, 606, 864–876. [Google Scholar] [CrossRef]
Area | Species | Number of Images | Rotation Adjustments | Gaussian Blur/Noise | Image Fusions | Total (per Area)
---|---|---|---|---|---|---
China | Ursus thibetanus | 525 | 525 | 453 | 525 | 5853
China | Lophura nycthemera | 400 | 400 | 362 | 400 |
China | Prionailurus bengalensis | 327 | 327 | 293 | 327 |
China | Macaca mulatta | 162 | 162 | 131 | 162 |
China | Elaphodus cephalophus | 93 | 93 | 93 | 93 |
North America | Lynx rufus | 647 | 647 | 621 | 647 | 15,447
North America | Odocoileus hemionus | 1065 | 1065 | 956 | 1065 |
North America | Procyon lotor | 655 | 655 | 611 | 655 |
North America | Tamiasciurus hudsonicus | 804 | 804 | 761 | 804 |
North America | Vulpes vulpes | 758 | 758 | 711 | 758 |
Group | Model | Average Accuracy (%) | Average Recall (%) | mAP@0.5 (%) | Detection Speed (FPS)
---|---|---|---|---|---
1 | YOLOv5s | 82.2 | 63.9 | 72.6 | 53
2 | YOLOv5s + Data Augmentation | 85.4 | 69.5 | 76.5 | 53
3 | YOLOv5s + Data Augmentation + ConcatE | 87.4 | 72.8 | 78.4 | 53
4 | YOLOv5s + Data Augmentation + Swin T | 89.4 | 74.6 | 85.5 | 41
5 | YOLOv5s + Data Augmentation + ConcatE + Swin T | 90.5 | 79.5 | 87.7 | 40
6 | YOLOv5s + Data Augmentation + ConcatE + Swin T + DIOU_Loss + L_BCE | 90.2 | 83.3 | 89.4 | 40
Model | mAP@0.5 (%) | Detection Speed (FPS) | Model Size (MB) |
---|---|---|---
YOLOv5s | 72.6 | 53 | 14.6 |
YOLOv3 | 69.4 | 41 | 240.8 |
RetinaNet | 72.5 | 49 | 49.3 |
Faster-RCNN | 78.9 | 34 | 112.6 |
Improved algorithm | 89.4 | 40 | 15.2 |