
A Review of YOLO Object Detection Algorithms Based

Uploaded by

kiritoasashi
Copyright
© © All Rights Reserved

Frontiers in Computing and Intelligent Systems

ISSN: 2832-6024 | Vol. 4, No. 2, 2023

A Review of YOLO Object Detection Algorithms based on Deep Learning
Xiaohan Cong *, Shixin Li, Fankai Chen, Chen Liu, Yue Meng
College of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China
* Corresponding author: Xiaohan Cong (Email: 1136094608@qq.com)

Abstract: Object detection is a research hotspot in computer vision. The YOLO series performs well in object detection and
has been widely used in robot vision, unmanned driving and other fields in recent years. This paper first introduces the
YOLO series of algorithms, including the principle, innovations, advantages and disadvantages of each algorithm, then
introduces the application fields of the YOLO series, and finally analyzes its future development trends to provide a
reference for research on the topic.
Keywords: Object Detection; YOLO; Deep Learning.

1. Introduction

Object detection is a kind of image segmentation based on the geometric and statistical features of
objects, also known as object extraction. With the development of computing and of object detection
theory, deep-learning-based object detection has been applied in intelligent security, security
monitoring, intelligent medical treatment, traffic detection, robot vision, unmanned driving and
other fields. [1]-[3]

Building on the convolutional neural network (CNN), AlexNet [4] won the ImageNet image recognition
competition in 2012 by a significant margin, and deep learning has received wide attention ever
since. OverFeat appeared in 2013 and is regarded as the pioneer of single-stage object detection
because it integrates localization and detection. The proposal of R-CNN in 2014 was a major
breakthrough in object detection. After that, object detection algorithms based on deep learning
began to emerge in natural image recognition, and the two families, single-stage and two-stage
detectors, have competed with each other and each excelled in its own field. R-CNN significantly
improved performance on VOC2007: its mean average precision (mAP) was nearly 25% higher than that
of traditional methods. However, the regression process of this method is extremely time- and
memory-intensive. In the same year, to simplify the complex training process of R-CNN and improve
detection speed, He proposed the spatial pyramid pooling network (SPPNet) [5], which effectively
reduces the redundancy caused by repeated computation. Its detection speed is more than 20 times
that of R-CNN, and its mAP can reach 66%. However, the network fine-tunes only the fully connected
layers and leaves the other feature layers untouched during training. In 2015, Girshick proposed
the Fast R-CNN detector [6], which integrates the R-CNN and SPPNet structures and trains the object
classifier and the bounding-box regressor at the same time. Fast R-CNN significantly accelerated
training and testing and further improved detection accuracy, with the mAP reaching 66.9%. Soon
after, Ren proposed Faster R-CNN [7], which replaces the slow selective search with an efficient
region proposal network (RPN), overcomes the speed bottleneck of Fast R-CNN, improves detection
accuracy and efficiency, and achieves end-to-end object detection. Its mAP reached 69.9% on the
VOC2007 dataset, and one of its fast versions reached an mAP of 59.9% and was the first near
real-time deep learning object detector. At this point, the basic architecture of the two-stage
detector was settled.

This paper first introduces the YOLO series of algorithms, including the principle, innovations,
advantages and disadvantages of each algorithm, then introduces the application fields of the YOLO
series, and finally analyzes its future development trends. [7]

2. YOLO Series Algorithm Development Process

2.1. Introduction to YOLO

In 2016, a one-stage object detection network was proposed. Tested under the same configuration, it
processed 45 frames per second (fps), comfortably running detection in real time. Because of its
speed and the particular way it is used, the authors named it YOLO (You Only Look Once). [8]

This method completely abandons the two-stage "region proposal + regression" detection mode. It
scales the input image to a uniform size and divides it into multiple grid cells, predicts the
object category according to the grid cell in which the object center lies, and outputs the
detection results at the last convolutional layer. The core idea of YOLO is to turn object
detection into a regression problem: the whole image is the input of the network, and a single
neural network yields the locations of the bounding boxes and their categories. [9]

First, the input image is resized to 448×448 pixels and divided into an S×S grid. If the center of
an object falls into a grid cell, that grid cell is responsible for detecting the object. Each grid
cell scores its bounding boxes to predict the likelihood that the cell contains an object; a
predicted value of 0 means that no object is present in the cell. Each grid cell can predict
multiple bounding boxes, but only the highest-scoring, most likely object is selected for
prediction. Since each grid cell can only
make local predictions, it can avoid detecting the same object in several cells with high
confidence at the same time, but it does not always capture the objects that are needed. [10] With
the further development of neural networks, the YOLO series shows good object detection performance
and many branches have been derived, as shown in Figure 1.

Figure 1. Part of the YOLO series products

2.2. YOLOv2

Compared with YOLOv1, YOLOv2 adopts the Darknet-19 network, which contains 19 convolutional layers
and 5 max-pooling layers. It mainly uses two kinds of convolutional layers, 3×3 and 1×1. The 1×1
convolutions compress the number of channels of the feature map, which reduces the computation and
the number of parameters of the model. The introduction of anchors and K-means clustering improves
the recall rate. To prevent overfitting, a batch normalization (BN) layer is used after each
convolutional layer, which also speeds up model convergence. Finally, global average pooling is
used for prediction. A feature fusion module (passthrough) is introduced to fuse fine-grained
features. With YOLOv2 the mAP of the model is not significantly improved, but the amount of
computation is reduced. However, YOLOv2 tends to miss small objects: because the resolution of the
feature map is reduced by subsampling in the design, YOLOv2 cannot detect small objects well. [11]
The CNN architecture used in YOLOv2 is relatively simple and does not use an RPN, which leaves an
accuracy gap compared with some advanced object detection algorithms.

2.3. YOLOv3

Compared with YOLOv2, YOLOv3 adopts Darknet-53 as the backbone network. Compared with Darknet-19,
it is deeper, has more parameters and is trained more fully, so its detection performance is
better. YOLOv3 adopts multi-scale prediction, which improves the detection accuracy for small
objects by detecting objects at different scales. Feature pyramids with different convolution
kernel sizes are used to detect objects of different sizes, which overcomes the shortcomings of
YOLOv2 in small-object detection. YOLOv3 uses three anchor boxes of different sizes, which adapt
better to objects of different sizes and aspect ratios, and introduces a residual network
structure, which learns features better and speeds up training. [12] However, YOLOv3's inference is
slower because detection has to run through many convolutional layers, and its predicted object
positions can be inaccurate.

2.4. YOLOv4

Compared with YOLOv3, YOLOv4 combines many techniques from previous research with appropriate
innovations of its own, and achieves a better balance between speed and accuracy than the previous
YOLO versions. In essence, however, YOLOv4 is not much different from YOLOv3. YOLOv4 uses
CSPDarknet53 as the backbone network for feature extraction, which reduces the consumption of
computing resources. An SPP structure and residual connections are introduced to accelerate and
optimize the convolution process, improving prediction performance and speed. Post-processing
optimizations, such as dynamic adjustment of the IoU threshold and logistic activation instead of
softmax, are added to improve the accuracy and robustness of the detection results. The OpenCV DNN
module is used to optimize forward computation, so the detection speed is greatly improved while
detection accuracy is maintained. However, its consumption of computing resources is high: the
YOLOv4 model is relatively large and more complex than some other networks, so despite its
performance advantages it requires a lot of computation, more training time and more network
parameters. Its small-object detection is also poor: because YOLOv4 uses larger anchor boxes and a
higher prediction resolution, it detects small objects worse than other algorithms.

2.5. YOLOv5

Compared with YOLOv4, the YOLOv5 model is smaller, so it has higher computing efficiency and lower
video memory usage. Some evaluation metrics, such as mAP@.5 and mAP@.75, are added in YOLOv5 and
help evaluate model performance more comprehensively. YOLOv5 enhances the learning ability of the
CNN, which keeps it lightweight while maintaining detection accuracy. YOLOv5 can run inference
directly on a single image, a batch of images, video and even webcam input. YOLOv5s reaches an
object recognition speed of up to 140 FPS. From YOLOv5n to YOLOv5x, the detection accuracy of the
five YOLOv5 models gradually increases, while the detection speed gradually decreases. [13]
However, the biggest drawback of YOLOv5 is its weak detection capability in some complex and
non-standard scenes: if the scene is unusual or the object is relatively far away, its accuracy
decreases. In addition, compared with some other object detection algorithms, YOLOv5 has a shallow
network depth, which affects its performance.

2.6. YOLOv7

At present, the model that best balances accuracy and inference performance is YOLOv7
(corresponding to the open-source git version 0.1). YOLOv7 is currently the most advanced algorithm
of the YOLO series, surpassing the previous versions in both accuracy and speed. YOLOv7 introduces
the ResNet50 deep residual network, replacing the CSPDarkNet53 of YOLOv5. To improve anchor
generation, YOLOv7 introduces an anchor-free scheme, which requires no preset anchor boxes and
adapts to objects of various scales and aspect ratios. The multi-scale
training strategy improves the model's ability to detect small objects and lets it adapt better to
changes in image resolution. The Mosaic data augmentation method is introduced, which randomly
combines images to increase the diversity and quantity of training samples. The cuDNN library is
used for deep learning computation, which speeds up training and inference. The YOLOv7 algorithm is
built on deep learning technology, so it can improve its detection performance through continued
learning and supports transfer learning of deep models. [14] However, because it has more layers
and parameters, YOLOv7 needs more computing resources and more efficient GPUs for training and
inference. Its detection of small objects and crowded scenes is still poor, although YOLOv7 has
improved and optimized this.

3. YOLO Algorithm Application Field based on Deep Learning

The YOLO series is used in many application fields. For YOLOv7, which currently has the best
performance, the following application examples are introduced:
1. Real-time object detection: YOLOv7 can be used for real-time object detection, for example in
automatic driving, video surveillance and security systems. It has to process high-resolution
images and high-speed video and detect objects correctly, thus helping monitors and decision makers
to react quickly. [15]-[16]
2. Object recognition: YOLOv7 can also perform object recognition. For example, a large warehouse
holds many different types of products and equipment. Each object can be accurately identified and
classified using YOLOv7, allowing better management and control of inventory and production
processes.
3. Face recognition: YOLOv7's face detection function can be used for applications such as security
access control, intelligent gates and event check-in. Using YOLOv7 for face detection and
recognition ensures that only authorized personnel can enter an area, and the activities of
participants can be tracked.
4. Natural language processing: YOLOv7 can also be used in conjunction with natural language
processing (NLP) technology to enable automatic text extraction and classification, for example
monitoring specific keywords on social media to see what people are saying about an event. This can
help businesses and institutions understand the views of opinion leaders and consumers and make
better marketing strategies and decisions.
5. Robot control: YOLOv7 can be used as a vision engine in robot control. Robots can use YOLOv7 to
detect and track objects such as humans, other robots or obstacles. This helps the robot avoid
collisions, locate the target correctly, and react quickly in an emergency.

4. Look to the Future

The current deep-learning-based YOLO algorithms greatly improve on traditional detection algorithms
in performance. Although detection speed and accuracy have greatly improved, some problems remain
unsolved, such as incomplete datasets, scarce data for small-object detection, low detection
performance, and the failure to achieve real-time online detection without reducing detection
accuracy. At present, in the development of society, the country and other aspects, object
detection is a major topic, and real-time online detection is the top priority, which deserves
continued research and development. This paper puts forward several prospects for the future
development of YOLO:
1. Data augmentation: Adding more data can improve the accuracy of the model. Data augmentation can
be achieved by rotating, scaling, cropping, flipping or color-transforming the training images.
Consider using more data augmentation techniques to enlarge the dataset.
2. Better pre-trained models: Consider using stronger pre-trained models to initialize the YOLOv7
network to improve the accuracy and robustness of the model. For example, larger and deeper models
such as ResNet or EfficientNet can be used.
3. Adaptive learning rates: To optimize the model better, an adaptive learning rate can be used to
adjust the learning rate of each layer and better balance speed and precision during training.
4. Improvement of the activation function: Other activation functions such as Swish or Mish could
replace the LeakyReLU used in YOLOv7 to improve the performance of the model.
5. Improvement of the detection loss function: The cross-entropy loss function used by YOLOv7 is
one of the common object detection loss functions, but it may not be optimal in some cases. Other
detection loss functions, such as Focal Loss or IoU Loss, can be tried.
6. Multi-task learning: In addition to object detection, other tasks such as semantic segmentation
and instance segmentation can be included in training to improve the overall performance of the
model.
7. Model compression: Model compression techniques, such as pruning and quantization, can be
considered to reduce the model size and inference time of YOLOv7 and adapt it to deployment on
low-power devices.

These are just some ideas, and how to combine and implement them still needs specific experiments
and debugging. How to achieve these improvements while maintaining the performance of the model
also needs to be considered.

5. Conclusion

Based on the development history of YOLO, this paper introduces the principles, innovations,
advantages and disadvantages, application fields and future development trends of the YOLO series
of algorithms, and shows what these algorithms can achieve at present, which can greatly improve
the efficiency of production and daily life in their application fields. With its high flexibility
and timeliness, the YOLO series has great application prospects in safety monitoring. This paper
looks ahead for the YOLO series and provides ideas for its future development.

References

[1] LIU Li, OUYANG Wanli, WANG Xiaogang, et al. Deep learning for generic object detection: A survey[J]. International Journal of Computer Vision, 2020, 128(2): 261-318. doi: 10.1007/s11263-019-01247-4.
[2] ZOU Zhengxia, SHI Zhenwei, GUO Yuhong, et al. Object detection in 20 years: A survey[J]. arXiv preprint arXiv:1905.05055, 2019.
[3] DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005:886-893. doi: 10.1109/CVPR.2005.177.
[4] Liu Pu, Zhang Xing-Hui, Zhang Zhi-Li, et al. Overview of Object Detection from RCNN to YOLO [C]// China High-tech Industrialization Research Association Intelligent Information Processing Industrialization Branch. Proceedings of the 16th National Academic Conference on Signal and Intelligent Information Processing and Application. [publisher unknown], 2022:8. DOI: 10.26914/c.cnkihy.2022.053359.
[5] Liu Yang. Research on mask-wearing detection algorithms in dense crowd scenes [D]. Hebei University of Engineering, 2022. DOI: 10.27104/d.cnki.ghbjy.2022.000365.
[6] Shi Duan-Yang, Lin Qiang, Hu Bing, Zhang Xin-Yu. Radar Science and Technology, 2022, 20(06):589-605.
[7] Zhou Shujuan, Zhang Yu, Zhang Heng, Wang Qi, Liu Guangjie, Zhu Jinlong. Object recognition of TEM nanoparticle structures based on deep learning [J]. Journal of Changchun Normal University, 2022, 41(12):35-40.
[8] Zhang Shan, Lu Yujiao, Luo Dawei. Overview of Object Detection Algorithms based on Deep Learning [C]// Proceedings of the 12th Annual Conference on New Network Technologies and Applications, Network Application Branch, China Computer Users Association, 2018.
[9] Zhou Xiaoyan, Wang Ke, Li Lingyan. An overview of object detection algorithms based on Deep Learning [J]. Electronic Measurement Technology, 2017, 40(11):5.
[10] Zhang Qi, Zhang Rongmei, Chen Bin. A review of image recognition technology based on deep learning [J]. 2019.
[11] Zheng Wei-cheng, Li Xue-Wei, Liu Hong-zhe. Overview of Object Detection Algorithms based on Deep Learning [C]// The 22nd Annual Conference on New Network Technologies and Applications, 2018, Network Application Branch of China Computer Users Association.
[12] Liu Yan-Qing. Improved Object Detection Algorithm Based on YOLO Series [D]. Jilin University.
[13] Zheng Weicheng, Li Xuewei, Liu Hongzhe. An overview of object detection algorithms based on Deep Learning [J]. China Broadband, 2022(3):3.
[14] Li Bingzhen, Jiang Wenzhi, Gu Guide, et al. Review of object detection algorithms based on Convolutional Neural Networks [J]. Computer and Digital Engineering, 2022(005):050.
[15] Guo Zefang. Overview of Deep Learning Algorithms for Image Object Detection [J]. Mechanical Engineering & Automation, 2019(1):4.
[16] Wang Yan. Research on visual detection and tracking technology of surgical instruments based on deep learning.
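
As a hedged illustration of the grid assignment described in Section 2.1, the following Python sketch maps an object's center to the grid cell that is responsible for it. The function name `responsible_cell`, the default grid size S = 7 and the pixel-coordinate convention are assumptions made for this example; the paper itself only specifies a 448×448 input and an S×S grid.

```python
def responsible_cell(cx, cy, image_size=448, s=7):
    """Return (row, col) of the grid cell that contains the object center.

    cx, cy are the center coordinates in pixels of the resized image.
    s=7 is an assumed grid size; the paper only states an S x S grid.
    """
    if not (0 <= cx <= image_size and 0 <= cy <= image_size):
        raise ValueError("center must lie inside the image")
    cell = image_size / s  # side length of one grid cell in pixels
    # Clamp so a center exactly on the right or bottom border maps to the last cell.
    col = min(int(cx // cell), s - 1)
    row = min(int(cy // cell), s - 1)
    return row, col


print(responsible_cell(224, 224))  # (3, 3)
print(responsible_cell(10, 440))   # (6, 0)
```

Only the returned cell would be trained to predict that object, which is the sense in which a grid cell "is responsible" for a detection.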

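
The metrics mAP@.5 and mAP@.75 mentioned in Section 2.5, and the IoU loss mentioned in Section 4, both rest on the intersection over union (IoU) of two boxes. The sketch below shows one common way to compute it, assuming boxes given as corner coordinates (x1, y1, x2, y2); the helper names `iou` and `is_true_positive` are hypothetical and not taken from any YOLO code base.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box if its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


pred, gt = (0, 0, 10, 10), (0, 0, 10, 5)
print(iou(pred, gt))                      # 0.5
print(is_true_positive(pred, gt, 0.5))    # True
print(is_true_positive(pred, gt, 0.75))   # False
```

The same prediction counts as correct at the 0.5 threshold but not at 0.75, which is why reporting mAP at several thresholds gives a fuller picture of localization quality.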

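
Item 4 of Section 4 suggests replacing LeakyReLU with Swish or Mish. As a minimal sketch, the three functions can be written for scalar inputs with the standard definitions Swish(x) = x·sigmoid(x) and Mish(x) = x·tanh(softplus(x)). The negative slope of 0.1 for LeakyReLU is an assumption for this example, and real networks would use a tensor library rather than scalar code.

```python
import math


def leaky_relu(x, negative_slope=0.1):
    """LeakyReLU; the slope 0.1 is an assumed value for illustration."""
    return x if x >= 0 else negative_slope * x


def _sigmoid(x):
    # Numerically stable logistic function for large positive or negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x):
    # Stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x):
    """Swish: x * sigmoid(x)."""
    return x * _sigmoid(x)


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(_softplus(x))


for value in (-2.0, 0.0, 2.0):
    print(value, leaky_relu(value), round(swish(value), 4), round(mish(value), 4))
```

Unlike LeakyReLU, Swish and Mish are smooth around zero; whether that helps a given detector is an empirical question that the paper leaves to experiments.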