Intelligent Traffic-Monitoring System Based On YOLO and Convolutional Fuzzy Neural Networks
ABSTRACT With the rapid pace of urbanization, the number of vehicles traveling between cities has
increased significantly. Consequently, many traffic-related problems have emerged, such as traffic jams and an excessive number and variety of vehicles. To solve these traffic problems, road data collection is important.
Therefore, in this paper, we develop an intelligent traffic-monitoring system based on you only look once (YOLO) and a convolutional fuzzy neural network (CFNN), which records traffic volume and vehicle-type information from the road. In this system, YOLO is first used to detect vehicles and is combined
with a vehicle-counting method to calculate traffic flow. Then, two effective models (CFNN and Vector-
CFNN) and a network mapping fusion method are proposed for vehicle classification. In our experiments,
the proposed method achieved an accuracy of 90.45% on the Beijing Institute of Technology public dataset.
On the GRAM-RTM data set, the mean average precision and F-measure (F1) of the proposed YOLO-CFNN
and YOLO-VCFNN vehicle classification methods are 99%, superior to those of other methods. On actual
roads in Taiwan, the proposed YOLO-CFNN and YOLO-VCFNN methods not only have a high F1 score
for vehicle classification but also have outstanding accuracy in vehicle counting. In addition, the proposed
system can maintain a detection speed of more than 30 frames per second on the AGX embedded platform.
Therefore, the proposed intelligent traffic monitoring system is suitable for real-time vehicle classification
and counting in the actual environment.
INDEX TERMS Traffic-monitoring system, fuzzy neural network, vehicle classification, feature fusion,
deep learning.
into AdaBoost to filter important features. Then, the filtered features were input into a support vector machine (SVM) for classification to improve its recognition accuracy. Sun et al. [3] and David and Athira [4] used Gabor filters to obtain vehicle characteristics and then input them into an SVM to determine whether a vehicle is present in an image. Wei et al. [5] designed a two-step vehicle-detection method. First, they used Haar-like features and AdaBoost to obtain the region of interest with vehicles and subsequently used the histogram of oriented gradients (HOG) [6] and an SVM to reverify the region. According to their experimental results, their method exhibited improved vehicle-detection capability. Yan et al. [7] designed a vehicle-detection system that used vehicle shadows to select the boundaries of vehicles and the HOG to extract features. These features were then input into an AdaBoost classifier and an SVM classifier for verification. In this method, when vehicles block each other, they are regarded as one vehicle because their shadows are connected, which weakens the detection effect.

In terms of dynamics, Seenouvong et al. [8] proposed a vehicle-detection and counting system based on dynamic features. Background subtraction was used to obtain a difference map from a given current image to achieve segmentation of the corresponding foreground image. In addition, various morphological operations were used to obtain the outline and bounding box of a moving object, detect moving vehicles, and count the vehicles passing through a designated area. A few researchers have used Gaussian mixture models (GMMs) [9], [10] or adaptive background models [11]–[13] to model the background, with the aim of solving a common problem of background subtraction: poor foreground segmentation caused by gradual changes in brightness. The aforementioned static and dynamic methods have many limitations. For example, traditional feature-extraction methods must be manually designed by experts on the basis of their experience, meaning that the process is complicated. Moreover, the extracted features are mostly pieces of shallow vertical and horizontal information, which cannot effectively describe the changes in vehicle features and cannot be widely used. The dynamic feature method increases the complexity of subsequent image-processing operations in cases of extensive background changes, in addition to yielding poor detection results. With recent advancements in deep learning, these conventional methods have gradually been replaced by deep learning techniques.

II. LITERATURE REVIEW
In recent years, deep learning has been widely used in many fields, and good prediction results have been obtained with this method. Compared with traditional methods that require artificial feature determination, the convolutional neural network (CNN) method greatly improves the accuracy of image recognition. Initially, LeCun et al. [14] proposed the LeNet model to solve the problem of recognizing handwritten digits in the banking industry. Krizhevsky et al. [15] proposed AlexNet to improve the traditional CNN by deepening the model architecture and using the ReLU activation function and the dropout layer to increase the effectiveness of the network during learning and prevent overfitting. Szegedy et al. [16] proposed GoogLeNet, which uses multiple filters of different sizes to extract features that enrich feature information. Simonyan and Zisserman [17] proposed two models, namely VGG-16 and VGG-19. They replaced the large convolution kernel by successively using multiple small convolution kernels to perform operations and proved that increasing the depth of a model can improve its accuracy. He et al. [18] proposed the ResNet model. They used residual blocks to solve the problems of gradient disappearance and inability to converge caused by excessive network depth. Howard et al. [19] proposed MobileNet, which uses depthwise separable convolution to extract fewer and more useful features and reduces the number of redundant parameters in a CNN model.

The aforementioned studies have focused on improving the feature-description capabilities of a CNN to extend the application of CNNs to more complex problems, such as object detection. Several researchers [20]–[24] have used region-based CNN (R-CNN) series models to solve the vehicle-detection problem. R-CNN uses the region proposal network (RPN) [25] to extract the position of an object and then classifies it by using a traditional CNN. RetinaNet [26] is the latest network architecture of R-CNN models. The R-CNN framework comprises a two-stage mechanism and uses a multilayer neural network for classification [27], [28]. This architecture substantially increases the number of parameters used and decreases the execution speed; thus, it is unsuitable for real-time detection. To solve this problem, one-stage mechanism methods have been proposed for vehicle detection, such as the you-only-look-once (YOLO) framework model [29]–[31] and the single-shot multibox detector (SSD) [32] framework model. One-stage methods are fast and can detect objects in real time, but their classification accuracy is lower than that of R-CNN methods [33], [34].

The aforementioned object-detection methods have the following problems: 1) Two-stage object-detection methods have high classification accuracy, but their large number of network parameters decreases the detection speed. 2) One-stage object-detection methods have a high real-time detection speed but lower accuracy than two-stage object-detection methods. 3) To increase the number of object categories, the entire network must be retrained, which is time-consuming and reduces the scalability of the method.

Recently, fuzzy neural networks (FNNs) [35]–[39], which combine a human-like fuzzy inference mechanism with the powerful learning functions of neural networks, have been widely used in various fields, such as classification, control, and forecasting. Asim et al. [35] applied an adaptive network-based fuzzy inference system to classification problems. Compared with traditional neural networks, this method yielded higher classification accuracy. Lin et al. [36] used an interval type-2 FNN and tool chips to predict flank wear, and their method
yielded superior prediction results. A few researchers have used a locally recurrent functional link fuzzy neural network [37] and Takagi–Sugeno–Kang-type FNNs [38], [39] to solve system identification and prediction problems, and both methods have yielded good results. In this study, an FNN was embedded into a deep learning network to reduce the number of parameters used in the network and obtain superior classification results. Conventional CNNs use pooling, global pooling [40], and channel pooling [41] methods for feature fusion. Global pooling methods sum the spatial information and perform operations on each feature map to achieve feature fusion and can be divided into global average pooling (GAP) [42] and global max pooling (GMP) [43]. Thus, global pooling methods are more robust to spatial translations of the input and prevent overfitting. Channel pooling methods include channel average pooling (CAP) [44] and channel max pooling (CMP) [45], which perform feature fusion by computing the average or maximum pixel values, respectively, at the same positions in each channel of the feature maps. However, these methods only compress features and do not contain learnable weights, leading to poor classification results. In this study, a new feature fusion method named network mapping was proposed to enhance the utility of feature fusion and to explore the effectiveness of different feature fusion methods.

To design an intelligent traffic-monitoring system with fast execution speed, high classification accuracy, and high category extensibility, a two-stage object-detection method was adopted in this study. The proposed intelligent traffic-monitoring system based on YOLO and a convolutional FNN (CFNN) collects real-time information on traffic volume and vehicle type on the road. In this system, a novel modified YOLOv4-tiny (mYOLOv4-tiny) is first used to detect vehicles and is then combined with a vehicle-counting method to calculate the traffic flow. Furthermore, two effective models (CFNN and Vector-CFNN) and a network mapping fusion method that improve the computational efficiency, classification accuracy, and category extensibility were proposed for vehicle classification. The proposed model architecture has fewer network parameters compared with other models; therefore, the system can achieve real-time, high-accuracy vehicle classification with limited hardware resources and flexible extensibility for different categories.

The contributions of this study can be summarized as follows:
• An intelligent traffic-monitoring system was developed to record real-time information about traffic volume and vehicle types.
• An mYOLOv4-tiny model was proposed to achieve real-time object detection and improve detection efficiency.
• Two effective models (CFNN and Vector-CFNN) that adopt a new network mapping fusion method were implemented to increase the classification accuracy and greatly reduce the number of model parameters.
• Category extensions (e.g., a new vehicle type) only require training of the classification model (CFNN) without retraining of the object-detection model (YOLO). This not only saves substantial training time but also improves the flexibility of category extension.
• The proposed intelligent traffic-monitoring system was implemented on the NVIDIA AGX Xavier embedded platform and applied to Provincial Highway 1 (T362) in Kaohsiung, Taiwan, for real-time vehicle tracking, counting, and classification.

The remainder of this paper is organized as follows: Section III introduces the proposed YOLO-CFNN method for intelligent traffic monitoring. The experimental results of the proposed method are described in Section IV. Section V presents our conclusions and an outline of future work.

III. PROPOSED YOLO-CFNN FOR INTELLIGENT TRAFFIC MONITORING
In this section, the proposed intelligent traffic-monitoring system is introduced. The proposed system has three functions, namely (1) vehicle detection, (2) vehicle counting, and (3) vehicle classification. The system architecture is illustrated in Fig. 1.

FIGURE 1. Three functions of the proposed intelligent traffic-monitoring system.

A flowchart of the proposed intelligent traffic-monitoring system is presented in Fig. 2. First, real-time road images are obtained from traffic cameras. Then, the proposed mYOLOv4-tiny model is used to detect the position of a vehicle. To solve the problem of the repeated recording of the same car as different vehicles in different frames,

FIGURE 2. Flowchart of the proposed intelligent traffic-monitoring system.
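As a rough illustration of the counting step in the flowchart, the following is a minimal Python sketch. It assumes an IoU-based assignment of persistent track IDs (the reference list cites SORT [46], but the exact tracker and zone geometry used in the paper are not shown in this excerpt) and a hypothetical horizontal virtual detection line: a tracked vehicle is counted once, under its predicted class, when its center first crosses the line. The helper names (ZoneCounter, line_y) are illustrative, not taken from the paper.

from collections import defaultdict

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

class ZoneCounter:
    """Hypothetical per-class counter: each track ID is counted once when
    its box center crosses a horizontal virtual detection line."""
    def __init__(self, line_y, iou_threshold=0.3):
        self.line_y = line_y
        self.iou_threshold = iou_threshold
        self.tracks = {}                # track_id -> box from the previous frame
        self.counted = set()            # track IDs that have already been counted
        self.counts = defaultdict(int)  # class name -> count
        self.next_id = 0

    def update(self, detections):
        """detections: list of (box, class_name) produced by the detector."""
        unmatched = dict(self.tracks)
        new_tracks = {}
        for box, cls in detections:
            # Greedy IoU matching against tracks from the previous frame.
            best_id, best_iou = None, self.iou_threshold
            for tid, prev_box in unmatched.items():
                overlap = iou(box, prev_box)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:                  # new vehicle entering the scene
                best_id, self.next_id = self.next_id, self.next_id + 1
            else:
                del unmatched[best_id]
            new_tracks[best_id] = box
            # Count once when the box center passes the line (assumes traffic
            # moves toward the bottom of the frame).
            if best_id not in self.counted and 0.5 * (box[1] + box[3]) >= self.line_y:
                self.counted.add(best_id)
                self.counts[cls] += 1
        self.tracks = new_tracks
        return dict(self.counts)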
in Fig. 5. In the CFNN model in Fig. 5(a), at the outset, the convolutional layer is used to extract features from the image, and the max-pooling layer is then used to compress these features to reduce the amount of calculation. Convolutional and pooling layers are stacked alternately to increase the model depth and obtain various shape-feature combinations, and a feature fusion layer is added to reduce the dimensionality of the features and integrate their information. Finally, the fused feature information is sent to the FNN for classification to obtain the vehicle-type classification result. To solve the problem of multiple redundant parameters in the traditional CNN model, this study proposes a Vector-CFNN model (Fig. 5(b)). The architecture of this model is similar to that of the CFNN, but the traditional convolutional layer is replaced with a two-layer vector-kernel convolutional layer [47] to further reduce the number of parameters and the computational complexity of the model.

FIGURE 5. Schematic of the proposed network architecture: (a) CFNN model and (b) Vector-CFNN model.

Next, the feature fusion layer and the FNN classifier used in the proposed models are explained.

1) FEATURE FUSION LAYER
In the feature fusion layer, different fusion methods can be used to integrate different types of feature information to obtain more useful features. Given a large number of input features, a suitable fusion method is selected to compress the features and reduce the dimensionality of the information between them. For method selection, the features are fused using either pooling operations or network mapping. Based on the different operation rules between features, different fusion results can be obtained, as summarized in Table 1.

TABLE 1. Different fusion methods.

In this study, a network mapping fusion method is proposed. This method assigns a weight to the information of each extracted feature and then integrates these weighted values to obtain new features. The calculation method is shown in Fig. 6, and the calculation formula is as follows:

f_z = \sum_{i=1}^{n} w_{zi} x_i   (1)

where f_z is the output of the zth fusion, n is the total number of input features, x_i is the ith input feature element, and w_{zi} is the ith input weight used in the zth fusion result.

FIGURE 6. Schematic of the network mapping fusion method.

2) FNNS
FNNs mimic human logical thinking and learning abilities. In terms of network design, an FNN can be divided into the input layer, fuzzification layer, rule layer, and defuzzification layer. The fuzzy sets are contained in the fuzzification layer, and their members can have different degrees of membership on the interval [0, 1]; this is known as a membership function. The fuzzy membership function converts input data to a value in [0, 1] based on the degree of membership of a specified set, providing a measure of the degree of similarity of an element to a fuzzy set. Common fuzzy membership functions include triangular, trapezoidal, bell-shaped, and Gaussian functions; among these, the Gaussian membership function has the highest accuracy [48]. Therefore, the Gaussian function is adopted as the membership function in the proposed CFNN. The feature vectors extracted by the convolution operations are classified by an FNN. If–then rules are used to represent the fuzzy rules and perform fuzzy inference (Fig. 7).
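Eq. (1) describes each fused output as a learnable weighted sum over all input feature elements. The following is a minimal Keras sketch of such a layer under that reading; flattening the feature maps before the mapping and the output size of 64 are illustrative assumptions, since the exact arrangement of Fig. 6 is not reproduced in this excerpt.

import tensorflow as tf

class NetworkMappingFusion(tf.keras.layers.Layer):
    """Learnable fusion per Eq. (1): each output f_z is a weighted sum of all
    input feature elements x_i, with weights w_zi learned during training."""
    def __init__(self, num_outputs, **kwargs):
        super().__init__(**kwargs)
        self.flatten = tf.keras.layers.Flatten()
        # A bias-free dense layer is exactly a set of learnable weighted sums.
        self.mapping = tf.keras.layers.Dense(num_outputs, use_bias=False)

    def call(self, feature_maps):
        x = self.flatten(feature_maps)   # input feature elements x_1 ... x_n
        return self.mapping(x)           # f_z = sum_i w_zi * x_i

# Illustrative use: fuse a 14 x 14 x 64 feature map into 64 fused features.
fused = NetworkMappingFusion(64)(tf.zeros((1, 14, 14, 64)))
print(fused.shape)  # (1, 64)

Unlike global or channel pooling, this mapping keeps trainable weights, which is the property the text identifies as missing from the pooling-based fusion methods.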
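For the FNN classifier, a minimal sketch is given below, assuming a zero-order Takagi–Sugeno–Kang-style formulation: Gaussian membership functions fuzzify each input feature, rule firing strengths are products of memberships, and defuzzification is a normalized weighted sum of rule consequents. The number of rules and the consequent form are assumptions; the paper's exact rule and defuzzification definitions (Fig. 7) are not shown in this excerpt.

import tensorflow as tf

class FuzzyNeuralClassifier(tf.keras.layers.Layer):
    """Sketch of an FNN head: Gaussian fuzzification, product rule layer,
    and normalized weighted-sum defuzzification (zero-order TSK assumption)."""
    def __init__(self, num_rules, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.num_rules = num_rules
        self.num_classes = num_classes

    def build(self, input_shape):
        dim = int(input_shape[-1])
        # Gaussian membership parameters: one center and width per rule and input.
        self.centers = self.add_weight(name="centers", shape=(self.num_rules, dim),
                                       initializer="random_normal")
        self.widths = self.add_weight(name="widths", shape=(self.num_rules, dim),
                                      initializer="ones")
        # Rule consequents mapping firing strengths to class scores.
        self.consequents = self.add_weight(name="consequents",
                                           shape=(self.num_rules, self.num_classes),
                                           initializer="random_normal")

    def call(self, x):
        x = tf.expand_dims(x, axis=1)                      # (batch, 1, dim)
        # Fuzzification layer: Gaussian membership degree in [0, 1].
        membership = tf.exp(-tf.square(x - self.centers) /
                            (2.0 * tf.square(self.widths) + 1e-8))
        # Rule layer: firing strength is the product of memberships (fuzzy AND).
        firing = tf.reduce_prod(membership, axis=-1)       # (batch, num_rules)
        # Defuzzification layer: normalized weighted sum of rule consequents.
        weights = firing / (tf.reduce_sum(firing, axis=-1, keepdims=True) + 1e-8)
        return tf.matmul(weights, self.consequents)        # (batch, num_classes)

# Illustrative use on a 64-dimensional fused feature vector and 6 classes.
logits = FuzzyNeuralClassifier(num_rules=8, num_classes=6)(tf.zeros((1, 64)))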
A. EXPERIMENTAL DESIGN
To evaluate the output results of the model, this study used the category with the highest model output value (top-1) as the classification result and accuracy as the evaluation indicator. The calculation formula is as follows:

accuracy = \frac{TP + TN}{TP + FP + TN + FN}   (2)

where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively. The mean average precision (mAP), precision, recall, F-measure (F1), and detection speed (FPS) were also adopted to verify the effectiveness of the various object-detection models. The evaluation indicators can be calculated as follows:

mAP = \frac{\sum_{k=1}^{n} AP_k}{n}   (3)

precision = \frac{TP}{TP + FP}   (4)

recall = \frac{TP}{TP + FN}   (5)

F1 = \frac{2 \times precision \times recall}{precision + recall}   (6)

FPS = \frac{frames}{second}   (7)

Here, n indicates the number of classes, and AP_k denotes the average precision (AP) of class k. In the experimental environment, TensorFlow and Keras were used as the deep learning environment and development tool, respectively, and an RTX 2080 Ti graphics card was used to train the network models. The parameter settings of the proposed CFNN and Vector-CFNN models are summarized in Tables 2 and 3, respectively.

In the CFNN model, the input image size is set to 224 × 224 × 3, and four sets of convolutional and pooling layers are used to achieve feature extraction. Each convolutional layer uses a 3 × 3 (see Table 2), 3 × 1, or 1 × 3 (see Table 3) convolution kernel to extract features. Each feature map is compressed through a max-pooling layer of size 2 × 2 to reduce the computational load. In the convolutional layers, 32, 64, and 128 convolution kernels are used in the first three layers to extract various shape-feature combinations. Then, the number of convolution kernels in the last layer is set to 64, and the feature fusion layer
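A minimal Keras sketch of a backbone following the parameter description above (224 × 224 × 3 input, four convolution plus max-pooling blocks with 32, 64, 128, and 64 kernels) is given below. The activation choices, the network-mapping output size, the classifier head, and the class count are illustrative assumptions that stand in for the paper's fusion layer and FNN; switching vector_kernels to True replaces each 3 × 3 convolution with a 3 × 1 followed by a 1 × 3 convolution, in the spirit of the vector-kernel layers used by Vector-CFNN [47].

import tensorflow as tf

def build_cfnn_backbone(num_classes=6, fused_features=64, vector_kernels=False):
    """Sketch of the feature extractor described in the text; the fusion and
    classifier heads below are only illustrative stand-ins."""
    def conv_block(x, filters):
        if vector_kernels:
            # Vector-CFNN variant: 3x1 then 1x3 kernels instead of one 3x3 [47].
            x = tf.keras.layers.Conv2D(filters, (3, 1), padding="same", activation="relu")(x)
            x = tf.keras.layers.Conv2D(filters, (1, 3), padding="same", activation="relu")(x)
        else:
            x = tf.keras.layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        return tf.keras.layers.MaxPooling2D((2, 2))(x)

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = inputs
    for filters in (32, 64, 128, 64):        # four convolution + pooling blocks
        x = conv_block(x, filters)
    # Stand-in for the network mapping fusion layer (learnable weighted sums).
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(fused_features, use_bias=False, name="network_mapping")(x)
    # Stand-in for the FNN classifier head.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_cfnn_backbone()
model.summary()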
TABLE 4. Number of each vehicle type.

TABLE 5. Experimental results of the CFNN and Vector-CFNN models with various feature fusion methods.

segmented (Fig. 8), and the vehicle types and numbers after segmentation are listed in Table 4.

In the training and testing of the model, according to the processing method described in [49], 200 vehicles were randomly selected from each category to form the training and testing data. In total, 2400 images each were used as the training and test datasets for the experiment. Ten experiments were performed using these datasets, and the average of the values obtained in these experiments was used for evaluation. This study used different fusion methods to evaluate the performance of the proposed CFNN and Vector-CFNN models. The experimental results are listed in Table 5. The accuracies of the CFNN and Vector-CFNN models reached 90.20% and 90.45%, respectively, with the network mapping fusion method. Compared with the global pooling and channel pooling methods, the proposed network mapping fusion method has higher accuracy.

Moreover, the two proposed models were compared with other common models, namely AlexNet, GoogLeNet, VGG-16, VGG-19, ResNet50, Sparse Laplacian CNN [49], and PCN-Net [50]. The experimental comparison results are summarized in Table 6. According to the table, the accuracy of the two CFNN models is higher than that of the other deep learning classification methods. The accuracy of the CFNN and Vector-CFNN models is 0.89% and 1.93% higher, respectively, than that of PCN-Net, and 51.7% and 57.1% fewer parameters are used in the CFNN and Vector-CFNN models, respectively, than in PCN-Net.

C. VEHICLE CLASSIFICATION RESULTS ON THE GRAM-RTM DATA SET
The GRAM-RTM (M-30) data set [51] was used to compare the performance of the proposed YOLO-CFNN and state-of-the-art object detection methods, including RetinaNet, SSD, YOLOv4, and YOLOv4-tiny. The M-30 contains 7520 frames with a resolution of 800 × 480 at 30 fps recorded using a
TABLE 8. Vehicle classification results for the proposed YOLO-CFNN with various object detection methods.

TABLE 11. Vehicle type classification results obtained using CFNN and Vector-CFNN with different fusion methods.

TABLE 13. Traffic flow counting results obtained using actual road traffic videos at 7:00.

TABLE 14. Traffic flow counting results obtained using actual road traffic videos at 17:00.

TABLE 15. Traffic flow counting results obtained using actual road traffic videos for rainy conditions.

FIGURE 14. Precision versus recall curves of the various detection methods using actual road traffic videos at 7:00.

FIGURE 15. Precision versus recall curves of the various detection methods using actual road traffic videos at 17:00.

FIGURE 16. Precision versus recall curves of the various detection methods using actual road traffic videos for rainy conditions.

traffic scene was used for verification. Three actual road traffic videos were used to evaluate the proposed vehicle-counting method. Each video was 5 min long; two of the videos were recorded at 07:00 and 17:00, and the remaining video was taken in rainy conditions. Still images from the three videos are displayed in Fig. 11.

In the evaluation, the proposed vehicle flow counting result was divided by the manual counting result to determine the accuracy of vehicle counting. In addition, different occlusion conditions were included in the real road scene, as presented in Fig. 12. As shown in Fig. 12, a larger bus blocks a car, resulting in a missed count. The visual vehicle detection and counting results are shown in Fig. 13. The text in the first half of the green label in Fig. 13 represents the type of vehicle, and the text in the second half represents the number of counts. When a vehicle enters the virtual detection zone, the proposed intelligent traffic-monitoring system immediately performs vehicle classification and counting.

The traffic flow counting results of each video are summarized in Tables 13–15. The precision versus recall curves of the proposed YOLO-CFNN and YOLO-VCFNN models are shown in Figs. 14–16. As shown in Table 13, the mAP of RetinaNet and SSD was 94%, but their F1 scores were only 76.06% and 86.28%, respectively. The mAP and F1 score of YOLOv4 were 88.82% and 85.55%, respectively. However, the mAP for trailers was only 64.44%. Although YOLOv4-tiny has a detection speed of 145 FPS, its motorcycle detection performance was poor (65.49%). The proposed YOLO-CFNN and YOLO-VCFNN are superior to the other methods in terms of F1 score (99%). After introducing the counting method into CFNN and VCFNN, the FPS can be maintained above 30 to achieve real-time detection. The two proposed methods also had an accuracy of 97.05% in traffic flow vehicle counting.

For the afternoon road traffic video (Table 14), the mAP and F1 of YOLO-CFNN and YOLO-VCFNN were higher than those of the other methods. The accuracy of flow counting was 98.5%. For the rain video (Table 15), except for the SSD method, the mAP of motorcycle detection was lower because images captured in rainy conditions are blurry,
affecting the judgment results. However, the mAP and F1 of the two proposed methods were higher than 90%, and the counting accuracy was 100%. These scenarios reveal that the proposed intelligent traffic-monitoring system is suitable for real-time vehicle counting in actual environments and has a high counting accuracy.

V. CONCLUSION
In this study, an intelligent traffic-monitoring system was proposed to calculate traffic flows and classify vehicle types. The major contributions of this study are as follows:
• A novel intelligent traffic-monitoring system combining a YOLOv4-tiny model and a counting method was proposed for traffic volume statistics and vehicle type classification.
• The proposed CFNN and Vector-CFNN were designed by introducing the fusion method and FNN, which can not only effectively reduce the number of network parameters but also enhance the classification accuracy.
• The proposed network mapping fusion method was superior to the commonly used pooling methods, and it could effectively integrate image features and improve the classification accuracy.
• Compared with the current state-of-the-art object detection methods (RetinaNet, SSD, YOLOv4, and YOLOv4-tiny), the proposed YOLO-CFNN and YOLO-VCFNN have a high mAP, accurate counting, and real-time vehicle counting and classification ability (over 30 FPS).

The experimental results indicated that the performance of the proposed CFNN and Vector-CFNN models was superior to that of common deep learning models. On the BIT dataset, compared with the pooling methods, the proposed network mapping fusion method improved the recognition accuracy by 3.59%–5.92%. In addition, compared with the PCN-Net model, the proposed CFNN and Vector-CFNN models improved the accuracy by 1.93% and reduced the number of parameters by 57.1%. On the GRAM-RTM data set, the mAP and F1 of the two proposed vehicle classification methods were 99%, higher than those of other methods. In addition, among the FPS indicators, the proposed method was 1.65 times faster than the traditional YOLOv4. On the T362 vehicle type dataset, compared with the general pooling methods, the accuracy of the proposed network mapping fusion method was 2.3%–5.36% higher. In addition, compared with the AlexNet model, the accuracy of the proposed CFNN and Vector-CFNN models was 1.19% and
1.83% higher, respectively, and the number of parameters decreased by 98.8%. In three actual road traffic scenarios, the proposed YOLO-CFNN and YOLO-VCFNN methods yielded a high F1 score for vehicle classification and high accuracy for vehicle counting. In summary, the CFNN and Vector-CFNN models proposed in this study not only have favorable vehicle classification effects but also have fewer parameters relative to other models. Therefore, the proposed models are suitable for information analysis in environments with limited hardware performance.

In terms of the extensibility of the proposed models, many factors that affect the machining accuracy of machine tools in intelligent manufacturing have been identified, such as temperature and tool wear. Therefore, developing an accurate model of the effects of these factors is crucial. In future studies, the proposed CFNN and Vector-CFNN models and the network mapping fusion method will be applied for modeling in intelligent manufacturing.

REFERENCES
[1] A. Mohamed, A. Issam, B. Mohamed, and B. Abdellatif, "Real-time detection of vehicles using the Haar-like features and artificial neuron networks," Proc. Comput. Sci., vol. 73, pp. 24–31, Jan. 2015.
[2] X. Wen, L. Shao, W. Fang, and Y. Xue, "Efficient feature selection and classification for vehicle detection," IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 3, pp. 508–517, Mar. 2015.
[3] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection using Gabor filters and support vector machines," in Proc. 14th Int. Conf. Digit. Signal Process. (DSP), Jul. 2002, pp. 1019–1022.
[4] H. David and T. A. Athira, "Improving the performance of vehicle detection and verification by log Gabor filter optimization," in Proc. 4th Int. Conf. Adv. Comput. Commun., Aug. 2014, pp. 50–55.
[5] Y. Wei, Q. Tian, J. Guo, W. Huang, and J. Cao, "Multi-vehicle detection algorithm through combining Harr and HOG features," Math. Comput. Simul., vol. 155, pp. 130–145, Jan. 2018.
[6] S. Bougharriou, F. Hamdaoui, and A. Mtibaa, "Linear SVM classifier based HOG car detection," in Proc. 18th Int. Conf. Sci. Techn. Autom. Control Comput. Eng. (STA), Dec. 2017, pp. 241–245.
[7] G. Yan, M. Yu, Y. Yu, and L. Fan, "Real-time vehicle detection using histograms of oriented gradients and AdaBoost classification," Optik, vol. 127, no. 19, pp. 7941–7951, 2016.
[8] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon, and N. Ohnishi, "A computer vision based vehicle detection and counting system," in Proc. 8th Int. Conf. Knowl. Smart Technol. (KST), Feb. 2016, pp. 224–227.
[9] P. K. Bhaskar and S.-P. Yong, "Image processing based vehicle detection and tracking method," in Proc. Int. Conf. Comput. Inf. Sci. (ICCOINS), Jun. 2014, pp. 1–5.
[10] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon, and N. Ohnishi, "Vehicle detection and classification system based on virtual detection zone," in Proc. 13th Int. Joint Conf. Comput. Sci. Softw. Eng. (JCSSE), Jul. 2016, pp. 1–5.
[11] M. Anandhalli and V. P. Baligar, "Improvised approach using background subtraction for vehicle detection," in Proc. IEEE Int. Advance Comput. Conf. (IACC), Jun. 2015, pp. 303–308.
[12] N. S. Sakpal and M. Sabnis, "Adaptive background subtraction in images," in Proc. Int. Conf. Adv. Commun. Comput. Technol. (ICACCT), Feb. 2018, pp. 439–444.
[13] N. Shah, A. Pingale, V. Patel, and N. V. George, "An adaptive background subtraction scheme for video surveillance systems," in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), Dec. 2017, pp. 13–17.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[15] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. 25th Int. Conf. Neural Inf. Process. Syst. (NIPS), vol. 1, Dec. 2012, pp. 1097–1105.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[17] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[18] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[19] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861.
[20] K. Shi, H. Bao, and N. Ma, "Forward vehicle detection based on incremental learning and fast R-CNN," in Proc. 13th Int. Conf. Comput. Intell. Secur. (CIS), Dec. 2017, pp. 73–76.
[21] S.-C. Hsu, C.-L. Huang, and C.-H. Chuang, "Vehicle detection using simplified fast R-CNN," in Proc. Int. Workshop Adv. Image Technol. (IWAIT), Jan. 2018, pp. 1–3.
[22] S. Rujikietgumjorn and N. Watcharapinchai, "Vehicle detection with sub-class training using R-CNN for the UA-DETRAC benchmark," in Proc. 14th IEEE Int. Conf. Adv. Video Signal Based Surveill. (AVSS), Aug. 2017, pp. 1–5.
[23] W. Zhang, Y. Zheng, Q. Gao, and Z. Mi, "Part-aware region proposal for vehicle detection in high occlusion environment," IEEE Access, vol. 7, pp. 100383–100393, 2019.
[24] L. Wang, Y. Lu, H. Wang, Y. Zheng, H. Ye, and X. Xue, "Evolving boxes for fast vehicle detection," in Proc. IEEE Int. Conf. Multimedia Expo. (ICME), Jul. 2017, pp. 1135–1140.
[25] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[26] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, Feb. 2020.
[27] N. A. Al-Sammarraie, Y. M. H. Al-Mayali, and Y. A. Baker El-Ebiary, "Classification and diagnosis using back propagation artificial neural networks (ANN)," in Proc. Int. Conf. Smart Comput. Electron. Enterprise (ICSCEE), Jul. 2018, pp. 1–5.
[28] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, "State-of-the-art in artificial neural network applications: A survey," Heliyon, vol. 4, no. 11, Nov. 2018, Art. no. e00938.
[29] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[30] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[31] Z. Jiang, L. Zhao, S. Li, and Y. Jia, "Real-time object detection method based on improved YOLOv4-tiny," 2020, arXiv:2011.04244.
[32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., Amsterdam, The Netherlands, Oct. 2016, pp. 21–37.
[33] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3296–3297.
[34] P. Soviany and R. T. Ionescu, "Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction," in Proc. 20th Int. Symp. Symbolic Numeric Algorithms Sci. Comput. (SYNASC), Sep. 2018, pp. 209–214.
[35] Y. Asim, B. Raza, A. K. Malik, A. R. Shahid, M. Faheem, and Y. J. Kumar, "A hybrid adaptive neuro-fuzzy inference system (ANFIS) approach for professional bloggers classification," in Proc. 22nd Int. Multitopic Conf. (INMIC), Nov. 2019, pp. 1–6.
[36] C.-J. Lin, J.-Y. Jhang, S.-H. Chen, and K.-Y. Young, "Using an interval type-2 fuzzy neural network and tool chips for flank wear prediction," IEEE Access, vol. 8, pp. 122626–122640, 2020.
[37] D. K. Bebarta, R. Bisoi, and P. K. Dash, "Locally recurrent functional link fuzzy neural network and unscented H-infinity filter for short-term prediction of load time series in energy markets," in Proc. IEEE Power, Commun. Inf. Technol. Conf. (PCITC), Oct. 2015, pp. 663–670.
[38] J.-W. Yeh and S.-F. Su, "Efficient approach for RLS type learning in TSK neural fuzzy systems," IEEE Trans. Cybern., vol. 47, no. 9, pp. 2343–2352, Sep. 2017.
[39] C.-J. Lin, C.-H. Lin, and J.-Y. Jhang, "Dynamic system identification and prediction using a self-evolving Takagi–Sugeno–Kang-type fuzzy CMAC network," Electronics, vol. 9, no. 4, p. 631, Apr. 2020.
[40] M. Lin, Q. Chen, and S. Yan, "Network in network," 2013, arXiv:1312.4400.
[41] Z. Ma, D. Chang, J. Xie, Y. Ding, S. Wen, X. Li, Z. Si, and J. Guo, "Fine-grained vehicle classification with channel max pooling modified CNNs," IEEE Trans. Veh. Technol., vol. 68, no. 4, pp. 3224–3233, Apr. 2019.
[42] V. Christlein, L. Spranger, M. Seuret, A. Nicolaou, P. Kral, and A. Maier, "Deep generalized max pooling," in Proc. Int. Conf. Document Anal. Recognit. (ICDAR), Sep. 2019, pp. 1090–1096.
[43] Z. Li, S.-H. Wang, R.-R. Fan, G. Cao, Y.-D. Zhang, and T. Guo, "Teeth category classification via seven-layer deep convolutional neural network with max pooling and global average pooling," Int. J. Imag. Syst. Technol., vol. 29, no. 4, pp. 577–583, May 2019.
[44] Z. Gao, Y. Li, Y. Yang, N. Dong, X. Yang, and C. Grebogi, "A coincidence-filtering-based approach for CNNs in EEG-based recognition," IEEE Trans. Ind. Informat., vol. 16, no. 11, pp. 7159–7167, Nov. 2020.
[45] L. Cheng, D. Chang, J. Xie, R. Ma, C. Wu, and Z. Ma, "Channel max pooling for image classification," in Intelligence Science and Big Data Engineering. Visual Data Engineering, Z. Cui, J. Pan, S. Zhang, L. Xiao, and J. Yang, Eds. Cham, Switzerland: Springer, 2019, pp. 273–284.
[46] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple online and realtime tracking," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, p. 346.
[47] J. Ou and Y. Li, "Vector-kernel convolutional neural networks," Neurocomputing, vol. 330, pp. 253–258, Feb. 2019.
[48] N. Talpur, M. N. M. Salleh, and K. Hussain, "An investigation of membership functions on performance of ANFIS for solving classification problems," IOP Conf. Ser., Mater. Sci. Eng., vol. 226, Aug. 2017, Art. no. 012103.
[49] Z. Dong, Y. Wu, M. Pei, and Y. Jia, "Vehicle type classification using a semisupervised convolutional neural network," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2247–2256, Aug. 2015.
[50] F. C. Soon, H. Y. Khaw, J. H. Chuah, and J. Kanesan, "Semisupervised PCA convolutional network for vehicle type classification," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8267–8277, Aug. 2020.
[51] R. Guerrero-Gómez-Olmedo, R. J. López-Sastre, S. Maldonado-Bascón, and A. Fernández-Caballero, "Vehicle tracking by simultaneous detection and viewpoint estimation," in Natural and Artificial Computation in Engineering and Medical Applications, J. M. F. Vicente, J. R. Sánchez, F. de la Paz López, and F. J. T. Moreo, Eds. Berlin, Germany: Springer, 2013, pp. 306–316.

CHENG-JIAN LIN (Senior Member, IEEE) received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, in 1986, and the M.S. and Ph.D. degrees in electrical and control engineering from the National Chiao Tung University, Taiwan, in 1991 and 1996, respectively. Currently, he is a Chair Professor with the Computer Science and Information Engineering Department, National Chin-Yi University of Technology, Taichung, Taiwan, and the Dean of the Intelligence College, National Taichung University of Science and Technology, Taichung. His current research interests include machine learning, pattern recognition, intelligent control, image processing, intelligent manufacturing, and evolutionary robots.

JYUN-YU JHANG received the B.S. and M.S. degrees from the Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung, Taiwan, in 2015, and the Ph.D. degree in electrical and control engineering from the National Yang Ming Chiao Tung University, Taiwan, in 2021. He is currently an Assistant Professor with the Computer Science and Information Engineering Department, National Taichung University of Science and Technology, Taichung. His current research interests include fuzzy logic theory, type-2 neural fuzzy systems, evolutionary computation, machine learning, and computer vision and applications.