Review
Deep Learning Techniques for Vehicle Detection and
Classification from Images/Videos: A Survey
Michael Abebe Berwo 1, Asad Khan 2,*, Yong Fang 1, Hamza Fahim 3,*, Shumaila Javaid 3,
Jabar Mahmood 1, Zain Ul Abideen 4 and Syam M.S. 5
Abstract: Detecting and classifying vehicles as objects from images and videos is challenging in
appearance-based representation, yet plays a significant role in the substantial real-time applications
of Intelligent Transportation Systems (ITSs). The rapid development of Deep Learning (DL) has
resulted in the computer-vision community demanding efficient, robust, and outstanding services to
be built in various fields. This paper covers a wide range of vehicle detection and classification ap-
proaches and the application of these in estimating traffic density, real-time targets, toll management
and other areas using DL architectures. Moreover, the paper also presents a detailed analysis of DL
techniques, benchmark datasets, and preliminaries. A survey of some vital detection and classifica-
tion applications, namely, vehicle detection and classification and performance, is conducted, with a
detailed investigation of the challenges faced. The paper also addresses the promising technological
advancements of the last few years.
means to locate samples of real-world objects in images. In this context, vehicle detection is
closely related to vehicle classification, since it involves defining the presence and location
of the vehicle in an image. However, the image is useless unless it is properly analyzed to
extract useful knowledge. Hand-crafted features (namely, Histogram of Oriented Gradient
(HOG) [5], Haar [6], and LBP [7]) have traditionally been used to detect vehicles,
but they fail to provide a general solution, and the classifiers require modifications to
fit various parameters. Shallow neural networks have also been utilized for vehicle detection,
though their performance has not reached the desired quality. Handling this massive amount
of data necessitates the development of innovative methods capable of performing quickly,
precisely, and consistently. DL techniques, such as DCNNs, RCNNs, and DNNs, improve the
accuracy, precision, and robustness of schemes for detecting and classifying vehicles from
images or video frames.
Rapid improvements and innovative ideas are employed to raise the detection and
classification accuracy of DL schemes and to reduce computational costs during their training
and testing phases. Among these innovative approaches are the modification of DCNNs,
transfer learning (TL), hyper-parameter optimization, the implementation of image-preprocessing
techniques (enhancement, scaling, median filtering, and fuzzy filtering), and Ensemble
Learning (EL) in the proposed DL architectures.
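As an illustration of such preprocessing, the following minimal OpenCV sketch applies contrast enhancement, median filtering, and scaling before a frame is passed to a detector; the function name, CLAHE settings, and target resolution are illustrative assumptions rather than values taken from any specific work cited here.

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr, target_size=(416, 416)):
    """Illustrative preprocessing: contrast enhancement, median filtering, scaling."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])         # enhance the luminance channel
    enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    denoised = cv2.medianBlur(enhanced, 3)               # suppress impulse noise
    resized = cv2.resize(denoised, target_size)          # scale to the detector's input size
    return resized.astype(np.float32) / 255.0            # normalize to [0, 1]

frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)   # stand-in traffic frame
print(preprocess_frame(frame).shape)                           # (416, 416, 3)
```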
For better understanding, the abbreviations are given in the Abbreviations section.
The main contributions of this survey article are as follows:
• We survey the methodologies, benchmark datasets, loss and activation functions, and opti-
mization algorithms used in vehicle identification and classification in deep learning.
• We survey the strategies for vehicle detection and classification studies in Deep Con-
volutional Neural Networks.
• We address the taxonomy of deep learning approaches and other functions in object
detection and classification tasks (as shown in Figure 1).
• We present promising technological future directions and tasks in improving deep
learning schemes for researchers.
This paper is organized into the following sections. Section 2 presents a detailed
analysis of DL techniques. Section 3 discusses the publicly available benchmark datasets
and performance evaluation metrics. Section 4 explains the application of activation and
loss functions in DL. Section 5 explains the optimization algorithms in DL. Section 6 explains
applications of DL in vehicle detection and classification and compares recently employed
techniques. Section 7 briefly discusses some promising future directions and tasks that
have been adopted to improve and optimize DL schemes and to solve the difficulties and
challenges that occur during training and testing of the models. Section 8 is the conclusion
of the survey.
Figure 1. Taxonomy of the Deep Learning Approaches in Vehicle Detection and Classification Tasks.
2.1. Techniques
In this subsection we discuss deep learning techniques.
Haar features are calculated by summing pixel intensities over adjacent rectangular
regions of an image patch and taking the differences between these sums. As this is highly
efficient and captures the symmetric structure of vehicles [11], it is well suited to real-time
detection. The Haar feature vector combined with AdaBoost [12,13] has been widely used in
CV to detect objects in a variety of applications, including vehicle recognition [11].
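As a hedged illustration of this classical pipeline, the sketch below runs an OpenCV boosted Haar cascade over a frame; the cascade file name is hypothetical (OpenCV does not ship a vehicle cascade), so a cascade trained on vehicle samples is assumed to be available.

```python
import cv2

# Hypothetical cascade file: assumed to have been trained on vehicle samples
cascade = cv2.CascadeClassifier("vehicle_haar_cascade.xml")

frame = cv2.imread("traffic_scene.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the boosted Haar cascade over an image pyramid
vehicles = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4, minSize=(48, 48))
for (x, y, w, h) in vehicles:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```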
HOG features are extracted in the following phases:
• Computing the image gradients (edge strength and orientation) over the image;
• Discretizing the gradient orientations into histogram bins over small spatial cells;
• Normalizing the cell histograms over larger blocks to reduce the effect of illumination and edge sharpness.
The HOG feature vector integrated with the Support Vector Machine (SVM) classifier
has been widely employed to recognize object orientation, i.e., on-road vehicle detec-
tion [14,15]. The HOG–SVM [16] performed admirably in multi-vehicle detection tasks.
In addition, a blend of HOG [5] and Haar [6] was employed for vehicle recognition, detec-
tion, and tracking [17].
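A minimal sketch of a HOG–SVM classifier is given below, assuming fixed-size grayscale patches; the HOG parameters follow a common configuration (9 bins, 8 × 8 cells, 2 × 2 blocks) rather than the exact settings of the cited works, and the random patches stand in for real vehicle/background crops.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(gray_patch):
    # 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks: a common HOG configuration
    return hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Stand-in data: 64x64 grayscale patches (in practice: cropped vehicle/background windows)
rng = np.random.default_rng(0)
patches = rng.random((40, 64, 64))
labels = rng.integers(0, 2, size=40)                 # 1 = vehicle, 0 = background

features = np.array([hog_descriptor(p) for p in patches])
clf = LinearSVC(C=0.01).fit(features, labels)

# A sliding-window detector then scores each window with clf.decision_function(...)
```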
Local Binary Pattern (LBP) [7] features have performed better in different applications,
including texture classification, face recognition, segmentation, image retrieval, and surface
crack detection. The cascade classifier (Haar–LBP–HOG features) [18] detects vehicles with
bounding boxes. In addition to the previously mentioned features and classifiers for vehicle
detection and classification problems, statistical architectures, based on horizontal and
vertical edge features, were proposed for vehicle detection [19], side-view car detection [20],
online vehicle detection [21], and vehicle detection in severe weather using HOG–LBP
fusion [22].
classification layers (hence, M represents the number of object classes and 1 represents the
background) to perform the final object classification. The convolution parameters are
optimized with SGD. A region proposal with an IoU of less than 0.5 with the ground truth is
considered incorrect (background); otherwise, it is correct. In R-CNN, without sharing computation,
the region proposal and classification problems are carried out independently. However,
R-CNN has problems concerning computational cost and training time for classification.
To solve the problem of too much time required in the training process, convolutional
feature maps with high resolution can be generated at a low cost using the Fast R-CNN
architecture proposed by Girshick [26].
Fast R-CNN: The Fast R-CNN [26] network takes as input an entire image and a set of
object proposals. It follows the following specific steps:
• Generate a convolution feature by using various convolution and max-pooling layers
on the entire image;
• Extract a fixed-length feature vector from the feature map for each object proposal using
the Region of Interest (RoI) pooling layer (a minimal RoI pooling sketch is given after this list);
• Feed each feature vector into a sequence of FC layers to generate softmax probability
predictions over M object classes plus 1 background (M + 1). A sibling layer generates
four real-valued numbers (the Bbox coordinates) for each object class. Fast R-CNN utilizes a
streamlined training process with a fine-tuning step that jointly optimizes a softmax classifier
and Bbox regressors.
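The RoI pooling step can be sketched with torchvision's roi_pool operator, as below; the feature-map size, proposal coordinates, and spatial scale are illustrative assumptions, not values from the original Fast R-CNN configuration.

```python
import torch
from torchvision.ops import roi_pool

# Feature map from the shared backbone: batch of 1, 256 channels, 50x50 spatial grid
features = torch.randn(1, 256, 50, 50)

# Two proposals in (batch_index, x1, y1, x2, y2) format, given in input-image coordinates
proposals = torch.tensor([[0.,  40.,  40., 200., 160.],
                          [0., 300.,  80., 420., 240.]])

# Pool every proposal to a fixed 7x7 grid; spatial_scale maps image coords to feature coords
pooled = roi_pool(features, proposals, output_size=(7, 7), spatial_scale=50 / 800.)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]) -> flattened and fed to the FC layers
```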
Training a softmax classifier, SVMs, and regressors in separate stages accelerates
the training time over the standard R-CNN architecture. The entire process architecture
includes loss, the SGD optimizer, the mini-batch sampling strategy, and BP through the
RoI pooling layers. However, Fast R-CNN uses a selective search approach over the
convolution feature map to explore its pooling map, increasing its run time. Using a new
region proposal network (RPN), Shaoqing et al. [27] proposed a faster RCNN architecture
to improve the Fast RCNN network in terms of run time and detection performance in
order to better estimate the object region at various aspect ratios and scales.
Faster R-CNN: In terms of operation time and detection performance, the faster
RCNN [27] is a more advanced variant of the RCNN. Instead of the traditional selective
search, an RPN is used to predict object regions at various scales
and aspect ratios. Anchors are placed at each convolutional feature location to create a
variety of region proposals. The anchor box in Faster RCNN has three different aspect
ratios and three different scales.
It comprises four systems to achieve object detection tasks: candidate region produc-
ing, feature extraction, classification, and location fine-tuning. In the RPN architecture,
the feature map is computed using a sliding window of 3 × 3, which is then output to the
Bbox classification and Bbox regression layers. Each point on the feature map is traversed
by the sliding window, which places z anchor boxes where they are needed. The feature
map’s z anchor boxes are used to extract its elements.
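A minimal sketch of this anchor placement is shown below; the stride, scales, and aspect ratios follow the commonly quoted Faster R-CNN defaults, but the exact values in any given implementation may differ.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Place 9 anchors (3 scales x 3 aspect ratios) at every feature-map location."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride   # centre in image coordinates
            for s in scales:
                for r in ratios:                              # r = height / width
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.asarray(anchors)        # shape: (feat_h * feat_w * 9, 4)

print(generate_anchors(2, 2).shape)   # (36, 4)
```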
R-FCN: The two-step object detection architecture can be categorized into two distinct
groups. One group represents classification networks like GoogleNet [28], ResNet [29],
AlexNet [24], VGGNet [30]. Their computation is shared by all ROIs and an image test
is conducted using one forward computation. In the second group, no computation is
shared across ROIs, since each object region is classified separately. Dai et al. [31] proposed the
R-FCN architecture as an improved version of the faster RCNN and partially reconciled
the conflict between position sensitivity and translation invariance by increasing the sharing of
convolutional parameters. For the RFCN algorithm, the primary component is the creation of
“position-sensitive score maps”: whether each ROI sub-region belongs to the object is
determined by comparing it against the corresponding one of the s × s parts of the score maps.
A shared convolutional layer is placed at the end of the RFCN network.
An additional 4 × s² dimensional convolution is applied to the score maps
to produce class-independent Bboxes. A softmax is used to calculate the results, after
averaging the s² scores, to produce (M + 1)-dimensional vectors.
A comparison study was carried out on the most widely utilized two-step object
detectors on both the COCO dataset [32] and the PASCAL VOC 07 [33] dataset. In [34],
experimentation showed that RCNN achieved 66% mAP on the PASCAL VOC 07
dataset [33], while Fast RCNN achieved 66% on the same dataset. In addition, the Fast
RCNN network was nine times faster than the standard RCNN network. Wang et al. [35]
conducted a comparative study on three networks, namely, fast RCNN, faster RCNN,
and the RFCN, on two publicly available datasets, i.e., the COCO [32] dataset and the
PASCAL VOC 07 [33] dataset. On the COCO test dataset, faster RCNN improved detection
accuracy by 3.2% compared to the slow RCNN. Furthermore, the performance of both
RFCN and the faster RCNN on both datasets was compared. The experimental results
revealed that RFCN outperformed the faster RCNN with superior detection accuracy and
less operational run time. Table 1 displays the fundamental advantages and disadvantages
of the most widely utilized two-step object detectors.
Table 1. Summary of the Two-step Algorithms in Object Detection and Classification Applications.
Numerous single-step object detector algorithms have been utilized over the last couple of
years for various applications, such as real-time vehicle detection and vehicle recognition,
among others. Some of the most widely employed algorithms are the following: SSD [36],
RetinaNet [37], YOLO [38], YOLOv2 [39], YOLOv3 [40], YOLOv4 [41], and YOLOv5 [42].
RetinaNet Algorithm: Lin et al. [37] proposed the RetinaNet algorithm, which employs
the focal loss as its classification loss. It addresses the class imbalance between positive
and negative samples, which otherwise degrades prediction accuracy. The authors introduced
the focal loss to down-weight the loss contribution of the many easy negative samples in the
background. The algorithm utilizes the ResNet [43] model as a backbone and FPN [44]
as the feature extraction architecture. It consists of two processes: generating a set of region
proposals via FPN and classifying each candidate.
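A minimal PyTorch sketch of the binary focal loss used for dense anchor classification is given below; the default α = 0.25 and γ = 2 follow the values commonly reported for RetinaNet.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights the many easy (well-classified) background anchors."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)               # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(8, 1000), torch.randint(0, 2, (8, 1000)).float())
print(loss)
```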
SSD Algorithm: Liu et al. [36] proposed the SSD algorithm based on a feedforward
convolutional architecture that generates a fixed-size collection of bounding boxes and scores for
the object class instances present, followed by an NMS stage to produce the final detections.
The SSD algorithm utilizes a VGG16 [43] architecture as a backbone for feature extraction
and six more convolutional layers for detection. It generates sequences of feature maps of
various scales, followed by a 3 × 3 filter on each feature map to generate default Bboxes,
and combines the predictions from these multi-scale maps to obtain the best prediction Bbox and class label.
YOLO Algorithm: The YOLO algorithm [38] is a CNN-based one-step object detector
that was designed after two-step detectors such as the faster RCNN.
The YOLO algorithm is most applicable for real-time detection. It uses far fewer region
proposals per image than the faster RCNN. It utilizes a grid of size (t × t) to split
the image into grid cells. Each grid cell estimates a number of
Bboxes together with C class probabilities for the C object classes. For each
box, the objectness probability (P) and the IOU between the ground truth and the box are considered.
The YOLO algorithm has 2 FC layers and 24 convolution layers. However, the algorithm
suffers from weak object localization, which affects the classification accuracy.
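Since YOLO's confidence target multiplies the objectness probability by the IoU with the ground truth, a small IoU helper is sketched below for corner-format boxes; it is a generic utility, not code from the YOLO implementation itself.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# YOLO's confidence target for a predicted box is P(object) * IoU(predicted, ground truth)
print(iou((10, 10, 110, 110), (50, 50, 150, 150)))   # ~0.22
```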
YOLOv2 Algorithm: The YOLOv2 algorithm [39] is an improved version of the YOLO
algorithm in detection precision and offers higher speed than the standard YOLO algorithm.
It contains 6 consecutive tasks to efficiently perform the detection process, namely the BN,
high-resolution classifier, convolution with anchor box, various aspect ratios and scales of
the anchor box, fine-grained feature techniques, and multi-scale training.
The training process of the YOLOv2 algorithm [39] is carried out through the SGD
optimizer, which operates on mini-batches. The mini-batch mean and variance of the
activations are calculated for each layer.
Then, every activation in the mini-batch is normalized to zero mean and unit standard
deviation, and a learnable scale and shift is applied to the normalized elements. This
process is carried out through the technique of batch normalization (BN) [45].
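A minimal sketch of the training-time BN computation described above is given below (per-feature mini-batch statistics, followed by a learnable scale and shift); it omits the running statistics that frameworks track for inference.

```python
import torch

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization for activations x of shape (N, C)."""
    mean = x.mean(dim=0)                         # per-feature mini-batch mean
    var = x.var(dim=0, unbiased=False)           # per-feature mini-batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)   # zero mean, unit standard deviation
    return gamma * x_hat + beta                  # learnable scale and shift

x = torch.randn(32, 64)
out = batch_norm_train(x, gamma=torch.ones(64), beta=torch.zeros(64))
print(out.mean(dim=0).abs().max(), out.std(dim=0).mean())   # close to 0 and 1
```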
Recent studies show that the CNN-based object detection algorithms (single-step
and two-step object detectors) are gaining momentum in vehicle detection/recognition
and classification. The algorithms are employed to detect and classify object classes from
images and videos. Kausar et al. [56] utilized both single-step and two-step object detector
approaches for two-wheeled and four-wheeled vehicle detection from publicly available
datasets. Vasavi et al. [57] also applied integrated YOLO and RCNN algorithms for vehicle
detection and classification from high-resolution images. In YOLOv3, a faster RCNN
algorithm for detecting vehicles at night, using tail light images, was implemented , by [58].
It is essential to understand some of the object detection algorithms’ strengths and
limitations (see Tables 1 and 2). The detection and classification performance of the model is
affected by various factors. Many studies have aimed to fix or decrease errors in predicting
the exact object class and to ensure the algorithms work better.
Table 2. Summary of the Single-step Algorithms in Object Detection and Classification Applications.
We summarized the performance of the one-step and two-step object detectors on the
COCO dataset and PASCAL VOC. The performance of deep learning-based object detection
is affected by a series of elements, such as the following: feature extraction classifiers, type
of backbone, image size and scale, training strategy, loss function, activation function,
number of region proposals, etc. These elements make it challenging to compare several
algorithms without a shared benchmark background. Table 3 shows the performance of
the various algorithms employed in object detection tasks. The algorithms were compared
using various performance evaluation metrics, such as frames per second (FPS) and average
precision (AP) at inference time. AP0.5 represents the average precision of the object classes when
the estimated Bbox has IoU > 0.5 with the ground truth, and AP0.5−0.95 averages the AP over IoU
thresholds from 0.5 to 0.95 in steps of 0.05. The
performances of the selected models were assessed on the same-sized input, where possible,
to allow a fair comparison of inference time against detection accuracy.
Table 3. The summary of Performances of the Various Algorithms Employed in Object Detection.
tection challenges with three object classes: road, sky, and vertical. Further, Zhang et al. [63]
labeled 252 RGB images captured from Velodyne scans and the tracking challenges for
ten object classes: sky, car, building, vegetation, fence, cyclist, sidewalk, road, pedestrian,
and sign pole. Ros et al. [64] also labeled 216 images from two odometry challenges with
eleven object classes: sky, car, road, fence, bicyclist, sign, building, sidewalk, pedestrian,
pole, and tree.
Stanford Car Dataset: The Stanford Car Dataset [65] is one of the publicly available
car datasets for extensive research purposes. It contains 8144 training sample images and
8041 testing images covering 196 car classes. It was launched in 2013, and its
popularity has grown in object class detection and scene understanding. Its authors extensively
researched 3D object representations that outperform their 2D counterparts for fine-grained
categorization, and illustrated their effectiveness for estimating 3D geometry from images.
MotorBike7500 Dataset: The MotorBike7500 Dataset [66] is one of the benchmark
motorcycle image datasets. It contains 7500 annotated images captured under real-time
road traffic scenes with 60% occlusion rate. The images were resized to 640 × 364 pixels
with 41,040 region-of-interest-annotated objects. The ground truth describes the frames
covered by the objects, the class name, the height and width of the Bbox surrounding the
object, and an Id; detection schemes reach a performance of about 92% on this
benchmark dataset.
MotorBike10000 Dataset: The MotorBike10000 Dataset [66] is the extension of Motor-
Bike7500 benchmark motorcycle image dataset. It contains a range of 10,000 annotated
images captured under windy conditions with 60% occlusion rate. The images were resized
to 640 × 364 pixels with 56,975 RoI-annotated objects. The ground truth describes
the frames covered by the objects, the class name, the height and width of the Bbox surrounding
the object, and an Id; detection schemes reach a performance of about 92% on
this benchmark dataset.
Tsinghua–Tencent Traffic Sign Dataset: The Tsinghua–Tencent Traffic Sign (TTTS)
Dataset [67] consists of 30,000 samples of traffic signs and 100,000 images. The pictures are
captured under diverse climatic conditions and lighting.
Tsinghua–Daimler Cyclist Benchmark: The Tsinghua–Daimler Cyclist Benchmark
(TDCB) [68] provides a benchmark dataset for cyclist detection with six object classes:
Mopedrider, pedestrian, Tricyclist, Cyclist, Wheelchair user, and Motorcyclist. It provides
16,202 training, 13,163 testing, and 3045 validation Bbox annotations, respectively.
Experimental results show an average precision of 89% for the easy case, which gradually
reduces when the difficulty increases.
Cityscapes Dataset: The Cityscapes dataset [69] includes collections of street-scene
images from 50 different cities captured across diverse seasons, with 20,000 weakly (coarsely)
annotated and 5000 finely annotated pictures, respectively.
GRAM Road-Traffic Monitoring (GRAM–RTM) Dataset: The GRAM–RTM Dataset [70]
consists of video clips recorded under diverse conditions and on several platforms using
surveillance cameras. It is widely utilized to evaluate the architecture of tracking several
vehicles labeled in different classes, such as large trucks, cars, trucks, and vans. Each video
clip contains around 240 annotated objects.
MIO–TCD Dataset: The MIO–TCD Dataset [71] is a dataset widely utilized for mo-
torized traffic analysis. It consists of 11 object categories, such as motorcycles, bicycles,
pedestrians, cars, buses, and trucks, with 786,702 labeled images captured under various
times, seasons, and periods using traffic surveillance cameras.
UA–DETRAC Benchmark Dataset: The UA–DETRAC Benchmark Dataset [72]
contains 100 video clips recorded at 24 diverse locations with diverse traffic patterns
and conditions, such as traffic crossings, highways, and T-junctions, using a Canon EOS
550D camera.
LSVH Dataset: The LSVH Benchmark Dataset [73] consists of 16 video clips of vehicles
with large-scale variations captured using surveillance cameras under diverse weather,
scene, time, and resolution conditions.
COCO Dataset: The Microsoft COCO Benchmark Dataset [32] consists of 91 object
classes of 328,000 images with 2,500,000 labeled instances. It also provides significantly
more samples per class than PASCAL VOC [33].
PASCAL VOC Dataset: The PASCAL VOC Benchmark Dataset [33] is a publicly avail-
able dataset that contains annotated images collected from the Flickr photo-sharing website.
It is a widely utilized dataset in object detection and classification to evaluate architectures.
ImageNet Dataset: The ImageNet Benchmark Dataset [74] covers the roughly 80,000 synsets of
WordNet with an average of 500–1000 clean, full-resolution images per synset; its initial release spans
12 subtrees with 5247 synsets and 3.2 million images.
Caltech101 Dataset: The Caltech101 Benchmark Dataset [75] consists of images of
101 object classes. It is widely utilized in object recognition tasks.
Caltech256 Dataset: The Caltech256 Benchmark Dataset [76] is a successor to the
Caltech101 benchmark dataset which expands the number of object classes to 256 to support
multi-class object recognition with few training samples.
DAWN Dataset: The purpose of the DAWN Dataset [77] is to explore how well vehicle
detection and classification approaches generalize across a wide range of natural traffic
images captured under adverse environmental conditions. It varies
substantially in terms of vehicle category, size, orientation, pose, illumination, position,
and occlusion. Furthermore, this dataset deliberately emphasizes traffic
scenes during bad winter weather, heavy snowfall, sleet and rain, hazardous weather, and sand and
dust storms.
$f(x_i) = w^{\top} x_i + b_i$ (1)
The data input, weights, and biases are represented by $x_i$, $w$, and $b_i$, respectively. Additional
computation is then necessary to translate these linear outputs into non-linear
outputs via the AF, notably to learn patterns in the data from the mapping in Equation (2).
These net architectures produce the following results:
Each layer’s output is fed into a subsequent layer until the final output is achieved,
but, by default, these outputs are linear. For each net, the anticipated output determines the type
of AF deployed. Since a purely linear output cannot represent non-linear relationships, transfer
functions (TF) are applied to the outputs of linear net architectures to provide the additional
computation that converts them into non-linear outputs. Mathematically, this is defined in
Equation (3).
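A small NumPy sketch of this idea, assuming a single layer and the ReLU activation as the transfer function, is given below; the shapes and ReLU choice are illustrative, not prescribed by Equations (1)–(3).

```python
import numpy as np

def relu(z):
    # A common activation / transfer function choice
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.random(4)                 # input vector x_i
W = rng.random((3, 4))            # weights of a layer with 3 units
b = rng.random(3)                 # biases b_i

linear_out = W @ x + b            # purely linear mapping, as in Equation (1)
nonlinear_out = relu(linear_out)  # transfer function applied on top of the linear output
print(linear_out, nonlinear_out)
```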
The loss function consists of classification loss (Cls) and location loss (Lls). The deep
two-step object detector algorithms employ a hybrid of both L1 loss and Cross-Entropy [86]
for regression and Bbox classification. In contrast, the deep single-step object detector
algorithms suffer from severe positive–negative instance imbalance, due to dense sampling
of possible object locations. Lin et al. [37] proposed Focal Loss to solve the imbalance
problem. However, optimizing object detectors with traditional detection approaches
to loss functions may result in sub-optimal solutions due to limited connections with
performance evaluation metrics. Therefore, Jiang et al. [87] proposed predicting IOU during training,
and a series of IOU-based losses (IOU loss, bounded IOU loss, generalized IOU loss, and
distance IOU loss) have been used to directly optimize the IOU between estimated and
ground-truth boxes. This line of work epitomizes the essence of developing practical loss functions
that are better aligned with the performance evaluation metrics for object detection tasks.
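As a hedged sketch of one member of this IoU-loss family, a generalized IoU (GIoU) loss for corner-format boxes is shown below; it follows the standard formulation rather than any specific detector's implementation.

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """Generalized IoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # Smallest axis-aligned box enclosing both prediction and target
    cx1 = torch.min(pred[:, 0], target[:, 0]); cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2]); cy2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (cx2 - cx1) * (cy2 - cy1) + eps
    giou = iou - (enclose - union) / enclose
    return (1.0 - giou).mean()

loss = giou_loss(torch.tensor([[10., 10., 110., 110.]]), torch.tensor([[50., 50., 150., 150.]]))
print(loss)
```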
The loss functions used in regression-based problems each have merits and limitations. Table 10
shows some of the pros and cons of commonly used loss functions in regression-
based problems.
Hence, $\varphi_t$ is the learning rate (LR), and $\nabla F(\delta_t)$ is the gradient of the cost function at the $t$-th iteration.
Stochastic Gradient Descent (SGD): this updates the parameters ($\delta_t$) frequently, so
the objective function is subject to wild swings, due to the SGD [91] algorithm's rapid,
noisy gradient computations. Nevertheless, a small learning rate can
stabilize SGD, at the cost of a lengthy training period. In addition, the architecture's speed
is hampered by the frequent transfer of data between GPU memory and local memory.
The mathematical process of the SGD algorithm is depicted in Equation (5).
Hence, $F_i(\delta) \triangleq l(y_i, f_\delta(x_i))$; at the $t$-th iteration, an index $i$ is picked at random and the parameters are updated.
Nesterov Momentum (NM): In this method, the gradient is calculated based on fu-
ture positions of the parameters rather than the current positions of the parameters [92].
An increase in momentum does not indicate where the parameters end up. A mathematical
representation of the NM algorithm can be found in Equation (6).
$m_t = \beta\, m_{t-1} + (1-\beta)\, \nabla F_i(\delta_t), \qquad \delta_{t+1} = \delta_t - \alpha_t\, m_t$ (6)
Adagrad: The Adagrad approach accumulates the squared gradients and scales the learning rate per parameter, as in Equation (7):

$G_t = G_{t-1} + \left(\nabla F(\delta_t)\right)^2, \qquad \delta_{t+1} = \delta_t - \frac{\varphi}{\sqrt{G_t + \epsilon}}\, \nabla F(\delta_t)$ (7)
where $G_t$ is the accumulated sum of squared past gradients and $\epsilon$ is a small value for numerical stability.
However, the Adagrad approach has the disadvantages of treating all the past gradients
equally and of requiring a manually selected global LR. Applying an exponentially weighted decay to
the gradient history addresses this, and the Adadelta algorithm is suggested to solve these limitations.
Adadelta: The Adadelta optimization approach was derived from the Adagrad ap-
proach so as to improve the following limitations of the Adagrad [94]:
• The continual decay of φs throughout the training phase;
• The requirement for a manually selected global learning rate.
Thus, it combines the merits of the Adagrad and Momentum approaches. Mainly, it scales
the LR based on the past gradient. Nevertheless, it only utilizes the latest time window
instead of the whole history, as is the case for Adagrad. It also employs a component that
where $\eta$ is the weight decay, $\epsilon$ is a small value for numerical stability, and $\varphi$ is the learning rate.
Adaptive Momentum Estimation: The Adaptive Momentum Estimation [96] is an
alternative method that calculates adaptive LRs for each parameter. Furthermore, it stores
the exponential weighted-decaying mean of the historical squared gradients. It combines
the RMSProp and momentum approaches with a bias correction mechanism. Adam’s up-
date rule consists of the following steps, and, mathematically, it is defined in Equation (10).
Hence, $\beta_1$ is typically 0.9, $\beta_2$ is typically 0.999, and $\epsilon$ is a small value for numerical stability; $m_t$ is
the mean of the gradients and $v_t$ is the uncentered variance of the gradients.
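A minimal NumPy sketch of one Adam update step, using the β values quoted above and the standard bias correction, is given below; it is a didactic illustration rather than the exact formulation in Equation (10).

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: decayed first/second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * grad           # mean of the gradients (momentum term)
    v = beta2 * v + (1 - beta2) * grad ** 2      # uncentered variance of the gradients
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected estimates (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = np.ones(3), np.zeros(3), np.zeros(3)
p, m, v = adam_step(p, grad=np.array([0.1, -0.2, 0.3]), m=m, v=v, t=1)
print(p)
```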
Adapg: The Adapg is also a new optimization algorithm, which combines both the
Adadelta and Adam optimizers [88]. Mathematically, it is defined in Equation (11).
The optimization algorithms have been widely utilized to reduce errors and accelerate
architecture processing time with less computational cost by updating the parameters on the
dataset samples. A comparison study [90] of optimization approaches for DL architectures
using four publicly available datasets was conducted to investigate the efficiency of the
approaches. The datasets were Labeled Faces in the Wild (LFW), MNIST, Kaggle Flowers,
and CIFAR10, and their various attributes were evaluated against the SGD, NM, Adagrad, Adadelta,
RMSProp, and Adam OAs. Zaheer et al. [99] conducted a study of OAs on training DL
architectures involving the learning of the parameters to meet the loss function to reduce the
loss during the training phase. They employed six methods using different datasets: MNIST,
CIFAR10, FASHIONMNIST, and CIFAR100 on SGD, NM, Adagrad, Adadelta, RMSProp,
and Adam approaches. They achieved optimal training accuracies of 1.0 on FASHIONMNIST
with RMSProp and Adam at 400 epochs, on MNIST with RMSProp and Adam at
200 epochs, on CIFAR100 with RMSProp and Adam at 100 epochs, and on CIFAR10 with
RMSProp and Adam at 200 epochs.
optimizer performed outstandingly at the testing stage and RMSProp with Adam at the
training step.
To summarize, RMSProp is Adagrad’s extension designed to alleviate the significantly
reduced LR. It is identical to Adadelta, except that Adadelta utilizes the RMS of parameter
updates in the numerator of the update rule. Finally, Adam adds bias correction and
momentum to RMSProp. RMSProp, Adam, and Adadelta are similar approaches that
perform similarly well. According to Zaheer et al. [99], its bias correction helps the
Adam optimizer outperform RMSProp during testing, while RMSProp together with Adam performs best
during training. From various studies and papers, Adam might be the best overall choice of
optimization algorithm [100].
They enhanced the architecture in the following ways: NAS optimization
and feature enrichment. There are several steps in this process. First, they implemented a
Retinex-based adaptive image correction algorithm to improve image quality and minimize
shadow and illumination effects. Then, they utilized a NAS backbone model for feature
extraction in order to produce the best cross-layer connection for extracting multiple layers
of features. Finally, they used object feature enrichment to integrate the multiple layers of
features and contextual data.
Beyond designing robust or context-assisted object detectors, several studies have
been conducted on various approaches. Nguyen et al. [81] proposed an improved system
based on faster RCNN for fast vehicle detection. They replaced the NMS algorithm with the
Soft-NMS algorithm to solve the problem of duplicate proposals, and a contextual-aware
RoI pooling layer was adopted to adjust the proposals to a specified size without losing
crucial contextual information. At the end of the MobileNet algorithm, the framework of
depth-wise separable convolution is used to generate a classifier for each identified vehicle.
Wang et al. [22] proposed an R-FCN algorithm equipped with deformable convolution
and RoI pooling for vehicle detection, achieving better detection time and higher precision.
Wang et al. [35] conducted comparative studies on the most widely employed algorithms,
Faster RCNN, RetinaNet, YOLOv3, RFCN, and SSD. They showed that RFCN is very
powerful for generalizing real scenes and has outstanding detection on rainy days and at
nighttime. Moreover, the SSD network also has good generalization ability and can detect
most target vehicles in an environment with poor lighting conditions.
Arora et al. [110] recommended a fast RCNN architecture to detect vehicles under
various environmental conditions. The proposed model obtained an average of recall,
accuracy, and precision of 98.44%, 94.20%, and 90%, respectively. Charouh et al. [111]
suggested a resource-efficient CNN-based model for detecting moving vehicles on large-
scale datasets. Rajput et al. [112] proposed a toll management system, using Yolov3
architecture, for vehicle identification and classification. Amrouche and his colleagues
proposed a Yolov4 architecture for a real-time vehicle detection and tracking system [113].
Wang et al. [114] introduced an integrated part-aware refinement network, which combines
multi-scale training and component confidence generation strategies in vehicle detection.
This system improves detection accuracy and time taken in detecting various vehicles on
publicly available datasets.
Faris et al. [115] proposed a Yolo-v5 architecture vehicle detector using the techniques
of transfer learning on publicly available datasets, namely, PKU, COCO, and DAWN.
The experimental results showed that the proposed model achieved state-of-the-art performance in the
detection of various vehicles. Huang et al. [116] introduced an embedded system combining Yolov4,
K-means and TensorRT to detect the real-time target from UAV images. They achieved a
confidence and miss detection rate of 89.6% and 3.8%, respectively. Furthermore, to balance
the architecture's detection accuracy and computational complexity, Qiu et al. [117] intro-
duced a linear transform approach, increasing the detection accuracy and the detection
frame rate using simple operations over the input image. However, the road and the various
shapes and sizes of vehicles affect the system's detection accuracy and frame rate in
the detection and recognition scheme. Yolov7-RAR was proposed in [118] to minimize the missed
detection of non-linear features and to speed up the architecture.
To further improve detection accuracy, some researchers implemented an ensemble
learning technique on pre-trained models. Mittal et al. [119] proposed an EnsembleNet
model for vehicle detection and estimation of traffic density with a detection accuracy of
98%. Figure 7 is a sample block diagram of the vehicle detection process, using multi-type
vehicle images, and based on fine-tuned DNN models.
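As a hedged sketch of the fine-tuning idea behind such pipelines, the snippet below adapts an ImageNet-pretrained ResNet-50 to a small set of vehicle classes; the class list, frozen backbone, and hyper-parameters are illustrative assumptions, not the configuration of any cited model.

```python
import torch
import torch.nn as nn
from torchvision import models

num_vehicle_classes = 6                     # e.g. car, bus, truck, van, motorbike, bicycle
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():                # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_vehicle_classes)   # new classifier head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)        # stand-in batch of vehicle crops
labels = torch.randint(0, num_vehicle_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```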
dramatically shifted from the model-based approach to the vision-based approach to improve
classification accuracy and to resolve the challenges faced during real-time classification.
Several DL studies have been conducted to address classification problems since the
excellent performance exhibited by Krizhevsky et al. [24] in the ImageNet ILSVRC [126]
using DConvNets. Szegedy et al. [28] introduced a novel DNN using Inception networks that maximize
the depth of architectures without increasing the number of parameters. Simonyan and
Zisserman [30] demonstrated that 3 × 3 receptive fields in the first conv layers were more
effective than 11 × 11 receptive fields with stride four or 7 × 7 with a stride of 2, which
improved the performance on ILSVRC.
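The parameter saving behind this design choice can be checked with a small back-of-the-envelope computation, assuming C input and C output channels and ignoring biases:

```python
# Parameter counts for conv layers with C input and C output channels (biases ignored)
C = 64
params_7x7 = 7 * 7 * C * C              # one 7x7 layer
params_3x3_stack = 3 * (3 * 3 * C * C)  # three stacked 3x3 layers, same receptive field
print(params_7x7, params_3x3_stack)     # 200704 vs 110592: fewer parameters, more non-linearities
```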
Manugmai and Nuthong [127] proposed a DL-based vehicle classification approach to
classify vehicle type and color. They showed that the ConvNet architecture outperformed
the conventional machine learning approaches in classification. Wang et al. [128] proposed
AVC using center-strengthened ConvNet to extract more features from a central image
by ROI pooling, based on the VGG model joined with the ROI pooling layer to obtain
elaborate feature maps. Awang and Azmi [129] presented a ConvNet architecture with a
skipping strategy model to classify vehicles with identical sizes of different object classes,
and Jahan et al. [130] proposed real-time vehicle classification using ConvNet. They used
two ways to find features and classify different types of vehicles.
Lee and Chung [131] proposed a DL-based vehicle classification using an ensemble
of K local experts and global networks. They used multi-crop testing, network training
of k local experts, and global networks with an ensemble of AlexNet [126], ResNet [29],
and GoogleNet [28] to classify various vehicles. They achieved outstanding performance
on the MIO–TCD classification challenge. In order to improve the mean precision of the
models, Liu et al. [132] proposed a two-step approach of DA and an ensemble of ConvNet
algorithms, combined with hyperparameter optimization, to solve the imbalanced-dataset
problem. They showed that the ensemble technique with DA improved the
precision. Liu et al. [80] presented a semi-supervised network motivated by a combination
of various DNNs with DA techniques based on GAN. It includes several steps to improve
classification accuracy on the MIO–TCD dataset.
Furthermore, Jagannathan et al. [133] proposed a GMM and ensemble DL approach to
detect and classify various moving vehicles on both the BIT-vehicle dataset and the MIO-
TCD dataset. They utilized adaptive histogram equalization and GMM to improve image
quality, and a steerable pyramid transform and Weber local descriptor (WLD) were used
to extract feature vectors. Then, the extracted feature vectors were fed into the ensemble
DL approach for the vehicle classification task. They showed that the proposed model
outperformed the benchmark models on both datasets.
Table A2 summarizes the comparison of DL-based vehicle classification architectures
from the literature review. The type of loss function used, the different datasets used,
different hyperparameters, the framework of the model, and the type of hardware used all
lead to different results.
7. Future Directions
Despite the rapid growth and promising object detection and classification processing
in DL applications, there are still several open issues for future work.
Various methods for detecting and classifying small vehicles in publicly available
datasets have been developed. To enhance the classification and localization accuracy of
small vehicle objects under several occlusions, inter-class variation, intra-class variation,
illumination, light, environment, etc., it is necessary to modify the model architecture in the
following aspects:
Multi-task joint optimization and Multi-model information combination: Due to the
relationship between several tasks in vehicle object classification and detection, Multi-task
joint optimization has been studied by several researchers, such as the following: in person
re-identification [134], human action grouping and recognition [135], dangerous object
detection [134], fast object detection [136], multi-task vehicle recognition and tracking [137],
multi-task vehicle pose estimation [138]. Moreover, several approaches have been integrated
to improve the performance of the architectures.
Scale and size alteration: Objects typically appear in a variety of scales and sizes, which
is more noticeable in small objects. For scale- or size-variant objects, multi-scale object
classifiers and detectors are required to maximize the robustness to scale and size changes.
Powerful backbone algorithms, such as ResNet, Inception, MobileNet, and AlexNet, can
be utilized for scale-/size-invariant detection and classification tasks. FPN generates
multi-scale feature maps, and GAN-based methods narrow the representation gap between small
and large objects, with lower computational complexity for the multi-scale detectors and
classifiers. These networks offer insights into producing a meaningful feature pyramid
for scale-adaptive detectors. It is necessary to integrate cascade architecture and scale
distribution estimation to identify objects adaptively.
Spatial Correlations and Contextual Modeling: Spatial distribution plays an essential
role in object detection and image classification. Therefore, region proposal generation and
grid regression are employed to obtain probable object locations. However, the correlations
between several proposals and object classes are disregarded. In addition, the global
structure information is not captured by the position-sensitive score maps in RFCN. To solve
these problems, use of various techniques, such as sequential reasoning tasks and subset
selection, in a collaborative way is advocated.
Cascade Architecture: In the cascade network, a cascade of detectors is built in several
phases. However, the existing cascade architectures are trained greedily, where previous
phases in cascades are fixed when training a new phase. So, the optimization of different
ConvNets cannot be accomplished, which makes the need for end-to-end optimization for
the ConvNet cascade architecture even more important.
Weakly supervised and Unsupervised Learning: Practically, it is inefficient and labor-
intensive to label a large volume of bounding boxes manually. To address this issue,
different architectures can be combined to perform exceptionally well by utilizing image-
level supervision to assign object classes to match object regions and object boundaries.
This technique leads to improved detection flexibility and minimized labor costs.
Model Optimization: A technique of model optimization in DL applications and
schemes is essential to balance accuracy, speed, and memory, by choosing an optimal
detector and classifier.
Detection or Classification in Videos: Real-time object classification and detection in
videos is a significant issue for video surveillance and autonomous driving. Conventional
object classifiers or detectors are usually designed for image-wise detection and classifica-
tion, while simply ignoring the correlations between video frames. An essential direction
of research is to enhance detection or classification performance by searching for spatial
and temporal correlations.
Lightweight Classification or Detection: Lightweight architectures are still
compromised by classification errors, and their detection accuracy remains
insufficient. While great efforts have been made in recent years, detection and
classification speed are not yet well balanced against accuracy.
8. Conclusions
In this paper, a comprehensive survey of some of the significant growth, successes,
and demerits associated with applying DL techniques in vehicle (object) detection and
classification is presented. To prove the efficiency of applying DL techniques in vehicle
(object) detection and classification, benchmark datasets, loss functions, activation func-
tions, and various experiments and studies recently implemented and completed in vehicle
detection and classification are reviewed. A detailed analysis of deep learning techniques,
reviews of some significant detection and classification applications in vehicle detection
and classification, an in-depth analysis of their challenges, and the promising technical
improvements of recent years are addressed. Finally, we suggest many future directions for
thoroughly understanding the object detection and classification landscape. This survey
is also meaningful for the growth of deep networks and related learning frameworks, offering
valuable insights and guidelines for future progress.
Author Contributions: Conceptualization, M.A.B., Y.F. and H.F.; investigation, M.A.B., J.M., S.J. and
H.F.; writing—original draft preparation, M.A.B., J.M. and S.J.; experiments, M.A.B., Z.U.A. and Y.F.;
review and editing, M.A.B., J.M. and A.K.; supervision, Y.F.; funding acquisition, A.K., H.F., S.J.
and S.M.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was sponsored by the Guangzhou Government Project under Grant No.
62216235 and the National Natural Science Foundation of China (Grant No. 622260-1).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare that they have no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
Appendix A
Appendix A.1
Table A1. Summary of Various Algorithms and Datasets utilized in Vehicle Detection.
Appendix A.2
Table A2. Summary of Various Algorithms and Datasets utilized in Vehicle Classification.
References
1. Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Berlin, Germany, 2022.
2. Hassaballah, M.; Hosny, K.M. Recent advances in computer vision. Stud. Comput. Intell. 2019, 804, 1–84.
3. Javaid, S.; Zeadally, S.; Fahim, H.; He, B. Medical Sensors and Their Integration in Wireless Body Area Networks for Pervasive
Healthcare Delivery: A Review. IEEE Sens. J. 2022, 22, 3860–3877. [CrossRef]
4. Berwo, M.A.; Fang, Y.; Mahmood, J.; Retta, E.A. Automotive engine cylinder head crack detection: Canny edge detection with
morphological dilation. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit
and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; pp. 1519–1527.
5. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1,
pp. 886–893.
6. Mita, T.; Kaneko, T.; Hori, O. Joint haar-like features for face detection. In Proceedings of the Tenth IEEE International Conference
on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 2, pp. 1619–1626.
7. Zhang, G.; Huang, X.; Li, S.Z.; Wang, Y.; Wu, X. Boosting local binary pattern (LBP)-based face recognition. In Proceedings of the
Chinese Conference on Biometric Recognition, Guangzhou, China, 13–14 December 2004; Springer: Berlin/Heidelberg, Germany,
2004; pp. 179–186.
8. Javaid, S.; Saeed, N.; Qadir, Z.; Fahim, H.; He, B.; Song, H.; Bilal, M. Communication and Control in Collaborative UAVs: Recent
Advances and Future Trends. IEEE Trans. Intell. Transp. Syst. 2023, 1–21. [CrossRef]
9. Fahim, H.; Li, W.; Javaid, S.; Sadiq Fareed, M.M.; Ahmed, G.; Khattak, M.K. Fuzzy Logic and Bio-Inspired Firefly Algorithm
Based Routing Scheme in Intrabody Nanonetworks. Sensors 2019, 19, 5526. [CrossRef] [PubMed]
10. Javaid, S.; Fahim, H.; Zeadally, S.; He, B. Self-powered Sensors: Applications, Challenges, and Solutions. IEEE Sens. J. 2023, 1.
[CrossRef]
11. Wen, X.; Zheng, Y. An improved algorithm based on AdaBoost for vehicle recognition. In Proceedings of the 2nd International
Conference on Information Science and Engineering, Wuhan, China, 25–26 December 2010; pp. 981–984.
12. Broggi, A.; Cardarelli, E.; Cattani, S.; Medici, P.; Sabbatelli, M. Vehicle detection for autonomous parking using a soft-cascade
AdaBoost classifier. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium Proceedings, Ypsilanti, MI, USA, 8–11 June
2014; pp. 912–917.
13. Tang, Y.; Zhang, C.; Gu, R.; Li, P.; Yang, B. Vehicle detection and recognition for intelligent traffic surveillance system. Multimed.
Tools Appl. 2017, 76, 5817–5832. [CrossRef]
14. Ali, A.M.; Eltarhouni, W.I.; Bozed, K.A. On-Road Vehicle Detection using Support Vector Machine and Decision Tree Clas-
sifications. In Proceedings of the 6th International Conference on Engineering & MIS 2020, Istanbul, Turkey, 4–6 July 2020;
pp. 1–5.
15. Javaid, S.; Wu, Z.; Fahim, H.; Fareed, M.M.S.; Javed, F. Exploiting Temporal Correlation Mechanism for Designing Temperature-
Aware Energy-Efficient Routing Protocol for Intrabody Nanonetworks. IEEE Access 2020, 8, 75906–75924. [CrossRef]
16. Wei, Y.; Tian, Q.; Guo, J.; Huang, W.; Cao, J. Multi-vehicle detection algorithm through combining Harr and HOG features. Math.
Comput. Simul. 2019, 155, 130–145. [CrossRef]
17. Shobha, B.; Deepu, R. A review on video based vehicle detection, recognition and tracking. In Proceedings of the 2018 3rd
International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru,
India, 20–22 December 2018; pp. 183–186.
18. Ren, H.; Li, Z.N. Object detection using generalization and efficiency balanced co-occurrence features. In Proceedings of the IEEE
International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 46–54.
19. Sun, Z.; Bebis, G.; Miller, R. On-road vehicle detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 694–711.
20. Ren, H. Boosted Object Detection Based on Local Features. Ph.D. Thesis, Applied Sciences, School of Computing Science,
Burnaby, BC, Canada, 2016.
21. Neumann, D.; Langner, T.; Ulbrich, F.; Spitta, D.; Goehring, D. Online vehicle detection using Haar-like, LBP and HOG feature
based image classifiers with stereo vision preselection. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los
Angeles, CA, USA, 11–14 June 2017; pp. 773–778.
22. Wang, Z.; Zhan, J.; Duan, C.; Guan, X.; Yang, K. Vehicle detection in severe weather based on pseudo-visual search and HOG–LBP
feature fusion. Proc. Inst. Mech. Eng. Part J. Automob. Eng. 2022, 7, 1607–1618. [CrossRef]
23. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 580–587.
24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf.
Process. Syst. 2017, 60, 84–90. [CrossRef]
25. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013,
104, 154–171. [CrossRef]
26. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December
2015; pp. 1440–1448.
27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28, 1137–1149. [CrossRef]
28. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 1–9.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
31. Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst.
2016, 29. Available online: https://proceedings.neurips.cc/paper_files/paper/2016/file/577ef1154f3240ad5b9b413aa7346a1e-
Paper.pdf (accessed on 25 April 2023).
32. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in
context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer:
Berlin/Heidelberg, Germany, 2014; pp. 740–755.
33. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J.
Comput. Vis. 2010, 88, 303–338. [CrossRef]
34. Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021,
51, 6400–6429. [CrossRef]
35. Wang, H.; Yu, Y.; Cai, Y.; Chen, X.; Chen, L.; Liu, Q. A comparative study of state-of-the-art deep learning algorithms for vehicle
detection. IEEE Intell. Transp. Syst. Mag. 2019, 11, 82–95. [CrossRef]
36. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg,
Germany, 2016; pp. 21–37.
37. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
38. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
39. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
40. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
41. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
42. Wen, H.; Dai, F. A Study of YOLO Algorithm for Multi-target Detection. J. Adv. Artif. Life Robot. 2021, 2, 70–73.
43. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections
on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA,
4–9 February 2017.
44. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
45. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings
of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456.
46. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern
Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855.
47. Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
48. Yang, G.; Feng, W.; Jin, J.; Lei, Q.; Li, X.; Gui, G.; Wang, W. Face mask recognition system with YOLOV5 based on image
recognition. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu,
China, 11–14 December 2020; pp. 1398–1404.
49. Javaid, S.; Wu, Z.; Hamid, Z.; Zeadally, S.; Fahim, H. Temperature-aware routing protocol for Intrabody Nanonetworks. J. Netw.
Comput. Appl. 2021, 183–184, 103057. [CrossRef]
50. Song, X.; Gu, W. Multi-objective real-time vehicle detection method based on yolov5. In Proceedings of the 2021 International
Symposium on Artificial Intelligence and its Application on Media (ISAIAM), Xi’an, China, 21–23 May 2021; pp. 142–145.
51. Snegireva, D.; Kataev, G. Vehicle Classification Application on Video Using Yolov5 Architecture. In Proceedings of the 2021
International Russian Automation Conference (RusAutoCon), Sochi, Russia, 5–11 September 2021; pp. 1008–1013.
52. Berwo, M.A.; Wang, Z.; Fang, Y.; Mahmood, J.; Yang, N. Off-road Quad-Bike Detection Using CNN Models. In Proceedings of
the Journal of Physics: Conference Series, Nanjing, China, 25-27 November 2022; IOP Publishing: Bristol, UK, 2022; Volume 2356,
p. 012026.
53. Jin, X.; Li, Z.; Yang, H. Pedestrian Detection with YOLOv5 in Autonomous Driving Scenario. In Proceedings of the 2021 5th CAA
International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–5.
54. Li, Y.; He, X. COVID-19 Detection in Chest Radiograph Based on YOLO v5. In Proceedings of the 2021 IEEE International
Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China,
24–26 September 2021; pp. 344–347.
55. Berwo, M.A.; Fang, Y.; Mahmood, J.; Yang, N.; Liu, Z.; Li, Y. FAECCD-CNet: Fast Automotive Engine Components Crack
Detection and Classification Using ConvNet on Images. Appl. Sci. 2022, 12, 9713. [CrossRef]
56. Kausar, A.; Jamil, A.; Nida, N.; Yousaf, M.H. Two-wheeled vehicle detection using two-step and single-step deep learning models.
Arab. J. Sci. Eng. 2020, 45, 10755–10773. [CrossRef]
57. Vasavi, S.; Priyadarshini, N.K.; Harshavaradhan, K. Invariant feature-based darknet architecture for moving object classification.
IEEE Sens. J. 2020, 21, 11417–11426. [CrossRef]
58. Li, Q.; Garg, S.; Nie, J.; Li, X.; Liu, R.W.; Cao, Z.; Hossain, M.S. A highly efficient vehicle taillight detection approach based on
deep learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4716–4726. [CrossRef]
59. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the
2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
60. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
[CrossRef]
61. Alvarez, J.M.; Gevers, T.; LeCun, Y.; Lopez, A.M. Road scene segmentation from a single image. In Proceedings of the European
Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 376–389.
62. Ros, G.; Alvarez, J.M. Unsupervised image transformation for outdoor semantic labelling. In Proceedings of the 2015 IEEE
Intelligent Vehicles Symposium (IV), Seoul, Republic of Korea, 28 June–1 July 2015; pp. 537–542.
63. Zhang, R.; Candra, S.A.; Vetter, K.; Zakhor, A. Sensor fusion for semantic segmentation of urban scenes. In Proceedings of the
2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1850–1857.
64. Ros, G.; Ramos, S.; Granados, M.; Bakhtiary, A.; Vazquez, D.; Lopez, A.M. Vision-based offline-online perception paradigm for
autonomous driving. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI,
USA, 5–9 January 2015; pp. 231–238.
65. Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the 4th
International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 8 December 2013.
66. Espinosa, J.E.; Velastin, S.A.; Branch, J.W. Motorcycle detection and classification in urban Scenarios using a model based on
Faster R-CNN. arXiv 2018, arXiv:1808.02299.
67. Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-sign detection and classification in the wild. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2110–2118.
68. Li, X.; Flohr, F.; Yang, Y.; Xiong, H.; Braun, M.; Pan, S.; Li, K.; Gavrila, D.M. A new benchmark for vision-based cyclist detection.
In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 1028–1033.
69. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset
for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
70. Guerrero-Gómez-Olmedo, R.; López-Sastre, R.J.; Maldonado-Bascón, S.; Fernández-Caballero, A. Vehicle tracking by simultaneous
detection and viewpoint estimation. In Proceedings of the International Work-Conference on the Interplay Between Natural and
Artificial Computation, Mallorca, Spain, 10–14 June 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 306–316.
71. Luo, Z.; Branchaud-Charron, F.; Lemaire, C.; Konrad, J.; Li, S.; Mishra, A.; Achkar, A.; Eichel, J.; Jodoin, P.M. MIO-TCD: A new
benchmark dataset for vehicle classification and localization. IEEE Trans. Image Process. 2018, 27, 5129–5141.
72. Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol
for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907. [CrossRef]
73. Hu, X.; Xu, X.; Xiao, Y.; Chen, H.; He, S.; Qin, J.; Heng, P.A. SINet: A scale-insensitive convolutional neural network for fast
vehicle detection. IEEE Trans. Intell. Transp. Syst. 2018, 20, 1010–1019. [CrossRef]
74. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of
the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
75. Li, F.F.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611.
76. Griffin, G.; Holub, A.; Perona, P. Caltech-256 object category dataset. 2007. Available online: https://authors.library.caltech.edu/
7694/ (accessed on 25 April 2023).
77. Kenk, M.A.; Hassaballah, M. DAWN: Vehicle detection in adverse weather nature dataset. arXiv 2020, arXiv:2008.05402.
78. Zuraimi, M.A.B.; Zaman, F.H.K. Vehicle Detection and Tracking using YOLO and DeepSORT. In Proceedings of the 2021 IEEE
11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 3–4 April 2021; pp. 23–29.
79. Xu, B.; Wang, B.; Gu, Y. Vehicle detection in aerial images using modified yolo. In Proceedings of the 2019 IEEE 19th International
Conference on Communication Technology (ICCT), Xi’an, China, 16–19 October 2019; pp. 1669–1672.
80. Liu, W.; Liao, S.; Hu, W.; Liang, X.; Zhang, Y. Improving tiny vehicle detection in complex scenes. In Proceedings of the 2018
IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6.
81. Nguyen, H. Improving faster R-CNN framework for fast vehicle detection. Math. Probl. Eng. 2019, 2019, 3808064. [CrossRef]
82. Dai, X. HybridNet: A fast vehicle detection system for autonomous driving. Signal Process. Image Commun. 2019, 70, 79–88.
[CrossRef]
83. Nguyen, H. Multiscale Feature Learning Based on Enhanced Feature Pyramid for Vehicle Detection. Complexity 2021,
2021, 5555121. [CrossRef]
84. Fan, Q.; Brown, L.; Smith, J. A closer look at Faster R-CNN for vehicle detection. In Proceedings of the 2016 IEEE Intelligent
Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 124–129.
85. Liu, P.; Zhang, G.; Wang, B.; Xu, H.; Liang, X.; Jiang, Y.; Li, Z. Loss function discovery for object detection via convergence-
simulation driven search. arXiv 2021, arXiv:2102.04700.
86. Muthukumar, V.; Narang, A.; Subramanian, V.; Belkin, M.; Hsu, D.; Sahai, A. Classification vs regression in overparameterized
regimes: Does the loss function matter? J. Mach. Learn. Res. 2021, 22, 1–69.
87. Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of localization confidence for accurate object detection. In Proceedings of
the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–799.
88. Sun, R. Optimization for deep learning: Theory and algorithms. arXiv 2019, arXiv:1912.08957.
89. Li, P. Optimization Algorithms for Deep Learning; Department of Systems Engineering and Engineering Management, The Chinese
University of Hong Kong: Hong Kong, 2017.
90. Soydaner, D. A comparison of optimization algorithms for deep learning. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2052013.
[CrossRef]
91. Darken, C.; Chang, J.; Moody, J. Learning rate schedules for faster stochastic gradient search. In Proceedings of the Neural
Networks for Signal Processing, 1992; Volume 2. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1
&type=pdf&doi=9db554243d7588589569aea127d676c9644d069a (accessed on 25 April 2023).
92. Nesterov, Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN USSR
1983, 269, 543–547.
93. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res.
2011, 12, 2121–2159.
94. Zeiler, M.D. Adadelta: An adaptive learning rate method. arXiv 2012, arXiv:1212.5701.
95. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA
Neural Netw. Mach. Learn. 2012, 4, 26–31.
96. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
97. Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Mao, M.; Ranzato, M.; Senior, A.; Tucker, P.; Yang, K.; et al. Large scale
distributed deep networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/
paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf (accessed on 25 April 2023).
98. Mukkamala, M.C.; Hein, M. Variants of rmsprop and adagrad with logarithmic regret bounds. In Proceedings of the International
Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2545–2553.
99. Zaheer, R.; Shaziya, H. A study of the optimization algorithms in deep learning. In Proceedings of the 2019 Third International
Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 10–11 January 2019; pp. 536–539.
100. Javaid, S.; Wu, Z.; Fahim, H.; Mabrouk, I.B.; Al-Hasan, M.; Rasheed, M.B. Feedforward Neural Network-Based Data Aggregation
Scheme for Intrabody Area Nanonetworks. IEEE Syst. J. 2022, 16, 1796–1807. [CrossRef]
101. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055.
102. Viola, P.; Jones, M. Rapid Object Detection using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, CVPR, Kauai, HI, USA, 8–14 December 2001.
103. Haselhoff, A.; Kummert, A. A vehicle detection system based on haar and triangle features. In Proceedings of the 2009 IEEE
Intelligent Vehicles Symposium, Xi’an, China, 3–5 June 2009; pp. 261–266.
104. Kim, K.J.; Kim, P.K.; Chung, Y.S.; Choi, D.H. Multi-scale detector for accurate vehicle detection in traffic surveillance data. IEEE
Access 2019, 7, 78311–78319. [CrossRef]
105. Chen, W.; Qiao, Y.; Li, Y. Inception-SSD: An improved single shot detector for vehicle detection. J. Ambient. Intell. Humaniz.
Comput. 2020, 13, 5047–5053. [CrossRef]
106. Zhao, M.; Zhong, Y.; Sun, D.; Chen, Y. Accurate and efficient vehicle detection framework based on SSD algorithm. IET Image
Process. 2021, 15, 3094–3104. [CrossRef]
107. Zhang, L.; Wang, H.; Wang, X.; Chen, S.; Wang, H.; Zheng, K. Vehicle object detection based on improved retinanet. In
Proceedings of the Journal of Physics: Conference Series, Nanchang, China, 26–28 October 2021; IOP Publishing: Bristol, UK,
2021; Volume 1757, p. 012070.
108. Wang, X.; Cheng, P.; Liu, X.; Uzochukwu, B. Focal loss dense detector for vehicle surveillance. In Proceedings of the 2018
International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2–4 April 2018; pp. 1–5.
109. Luo, J.Q.; Fang, H.S.; Shao, F.M.; Zhong, Y.; Hua, X. Multi-scale traffic vehicle detection based on faster R-CNN with NAS
optimization and feature enrichment. Def. Technol. 2021, 17, 1542–1554. [CrossRef]
110. Arora, N.; Kumar, Y.; Karkra, R.; Kumar, M. Automatic vehicle detection system in different environment conditions using fast
R-CNN. Multimed. Tools Appl. 2022, 81, 18715–18735. [CrossRef]
111. Charouh, Z.; Ezzouhri, A.; Ghogho, M.; Guennoun, Z. A resource-efficient CNN-based method for moving vehicle detection.
Sensors 2022, 22, 1193. [CrossRef] [PubMed]
112. Rajput, S.K.; Patni, J.C.; Alshamrani, S.S.; Chaudhari, V.; Dumka, A.; Singh, R.; Rashid, M.; Gehlot, A.; AlGhamdi, A.S. Automatic
Vehicle Identification and Classification Model Using the YOLOv3 Algorithm for a Toll Management System. Sustainability 2022,
14, 9163. [CrossRef]
113. Amrouche, A.; Bentrcia, Y.; Abed, A.; Hezil, N. Vehicle Detection and Tracking in Real-time using YOLOv4-tiny. In Proceedings
of the 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria,
8–9 May 2022; pp. 1–5.
114. Wang, Q.; Xu, N.; Huang, B.; Wang, G. Part-Aware Refinement Network for Occlusion Vehicle Detection. Electronics 2022, 11, 1375.
[CrossRef]
115. Farid, A.; Hussain, F.; Khan, K.; Shahzad, M.; Khan, U.; Mahmood, Z. A Fast and Accurate Real-Time Vehicle Detection Method
Using Deep Learning for Unconstrained Environments. Appl. Sci. 2023, 13, 3059. [CrossRef]
116. Huang, F.; Chen, S.; Wang, Q.; Chen, Y.; Zhang, D. Using deep learning in an embedded system for real-time target detection
based on images from an unmanned aerial vehicle: Vehicle detection as a case study. Int. J. Digit. Earth 2023, 16, 910–936.
[CrossRef]
117. Qiu, Z.; Bai, H.; Chen, T. Special Vehicle Detection from UAV Perspective via YOLO-GNS Based Deep Learning Network. Drones
2023, 7, 117. [CrossRef]
118. Zhang, Y.; Sun, Y.; Wang, Z.; Jiang, Y. YOLOv7-RAR for Urban Vehicle Detection. Sensors 2023, 23, 1801. [CrossRef]
119. Mittal, U.; Chawla, P.; Tiwari, R. EnsembleNet: A hybrid approach for vehicle detection and estimation of traffic density based on
faster R-CNN and YOLO models. Neural Comput. Appl. 2023, 35, 4755–4774. [CrossRef]
120. Gupte, S.; Masoud, O.; Martin, R.F.; Papanikolopoulos, N.P. Detection and classification of vehicles. IEEE Trans. Intell. Transp.
Syst. 2002, 3, 37–47. [CrossRef]
121. Petrovic, V.S.; Cootes, T.F. Analysis of Features for Rigid Structure Vehicle Type Recognition. In Proceedings of the BMVC,
Kingston, UK, 7–9 September 2004; Kingston University: London, UK, 2004; Volume 2, pp. 587–596.
122. Psyllos, A.; Anagnostopoulos, C.N.; Kayafas, E. Vehicle model recognition from frontal view image measurements. Comput.
Stand. Interfaces 2011, 33, 142–151. [CrossRef]
123. Peng, Y.; Jin, J.S.; Luo, S.; Xu, M.; Au, S.; Zhang, Z.; Cui, Y. Vehicle type classification using data mining techniques. In The Era of
Interactive Media; Springer: Berlin/Heidelberg, Germany, 2013; pp. 325–335.
124. Dong, Z.; Wu, Y.; Pei, M.; Jia, Y. Vehicle type classification using a semisupervised convolutional neural network. IEEE Trans.
Intell. Transp. Syst. 2015, 16, 2247–2256. [CrossRef]
125. Awang, S.; Azmi, N.M.A.N.; Rahman, M.A. Vehicle type classification using an enhanced sparse-filtered convolutional neural
network with layer-skipping strategy. IEEE Access 2020, 8, 14265–14277. [CrossRef]
126. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
127. Maungmai, W.; Nuthong, C. Vehicle classification with deep learning. In Proceedings of the 2019 IEEE 4th International
Conference on Computer and Communication Systems (ICCCS), Singapore, 23–25 February 2019; pp. 294–298.
128. Wang, K.C.; Pranata, Y.D.; Wang, J.C. Automatic vehicle classification using center strengthened convolutional neural network.
In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA
ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1075–1078.
129. Fahim, H.; Javaid, S.; Li, W.; Mabrouk, I.B.; Hasan, M.A.; Rasheed, M.B.B. An Efficient Routing Scheme for Intrabody Nanonet-
works Using Artificial Bee Colony Algorithm. IEEE Access 2020, 8, 98946–98957. [CrossRef]
130. Jahan, N.; Islam, S.; Foysal, M.F.A. Real-Time Vehicle Classification Using CNN. In Proceedings of the 2020 11th International
Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6.
131. Lee, J.T.; Chung, Y. Deep learning-based vehicle classification using an ensemble of local expert and global networks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July
2017; pp. 47–52.
132. Liu, W.; Zhang, M.; Luo, Z.; Cai, Y. An ensemble deep learning method for vehicle type classification on visual traffic surveillance
sensors. IEEE Access 2017, 5, 24417–24425. [CrossRef]
133. Jagannathan, P.; Rajkumar, S.; Frnda, J.; Divakarachari, P.B.; Subramani, P. Moving vehicle detection and classification using
gaussian mixture model and ensemble deep learning technique. Wirel. Commun. Mob. Comput. 2021, 2021, 5590894. [CrossRef]
134. Chen, W.; Chen, X.; Zhang, J.; Huang, K. A multi-task deep network for person re-identification. In Proceedings of the AAAI
Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
135. Liu, A.A.; Su, Y.T.; Nie, W.Z.; Kankanhalli, M. Hierarchical clustering multi-task learning for joint human action grouping and
recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 102–114. [CrossRef]
136. Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection.
In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer:
Berlin/Heidelberg, Germany, 2016; pp. 354–370.
137. Kanacı, A.; Li, M.; Gong, S.; Rajamanoharan, G. Multi-task mutual learning for vehicle re-identification. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
138. Phillips, J.; Martinez, J.; Bârsan, I.A.; Casas, S.; Sadat, A.; Urtasun, R. Deep multi-task learning for joint localization, perception,
and prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June
2021; pp. 4679–4689.
139. Sang, J.; Wu, Z.; Guo, P.; Hu, H.; Xiang, H.; Zhang, Q.; Cai, B. An improved YOLOv2 for vehicle detection. Sensors 2018, 18, 4272.
[CrossRef] [PubMed]
140. Mansour, A.; Hassan, A.; Hussein, W.M.; Said, E. Automated vehicle detection in satellite images using deep learning. In
Proceedings of the International Conference on Aerospace Sciences and Aviation Technology, Cairo, Egypt, 9–11 April 2019; The
Military Technical College: Cairo, Egypt, 2019; Volume 18, pp. 1–8.
141. Sowmya, V.; Radha, R. Heavy-Vehicle Detection Based on YOLOv4 featuring Data Augmentation and Transfer-Learning
Techniques. In Proceedings of the Journal of Physics: Conference Series, Nanchang, China, 26–28 October 2021; IOP Publishing:
Bristol, UK, 2021; Volume 1911, p. 012029.
142. Wang, L.; Lu, Y.; Wang, H.; Zheng, Y.; Ye, H.; Xue, X. Evolving boxes for fast vehicle detection. In Proceedings of the 2017 IEEE
International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1135–1140.
143. Kim, K.J.; Kim, P.K.; Chung, Y.S.; Choi, D.H. Performance enhancement of yolov3 by adding prediction layers with spatial
pyramid pooling for vehicle detection. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and
Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
144. Wang, X.; Wang, S.; Cao, J.; Wang, Y. Data-driven based tiny-YOLOv3 method for front vehicle detection inducing SPP-net. IEEE
Access 2020, 8, 110227–110236. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.