2023-IEEE TITS-Analysis of Recent Deep Learning-Based Intrusion Detection Methods For In-Vehicle Network


IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 24, NO. 2, FEBRUARY 2023

Analysis of Recent Deep-Learning-Based Intrusion Detection Methods for In-Vehicle Network
Kai Wang, Aiheng Zhang, Haoran Sun, and Bailing Wang

Abstract: The development and popularity of vehicle-to-everything communication have caused more risks to in-vehicle network security. As a result, an increasing number of various and effective intrusion detection methods have appeared to guarantee the security of in-vehicle networks, especially deep-learning-based methods. Nevertheless, the state-of-the-art deep-learning-based intrusion detection methods lack a quantitative and fair horizontal performance comparison analysis. Also, they have no comparative analysis of the detection capability for unknown attacks or of the time and hardware resource consumption of their intelligent intrusion detection models. Therefore, this paper investigates ten representative advanced deep-learning-based intrusion detection methods and illustrates the characteristics and advantages of each method. Moreover, quantitative and fair experiments are set up to make horizontal comparison analyses. Also, this study provides some significant suggestions on baseline method selection and valuable guidance for the direction of future research about lightweight models and the ability to detect unknown attacks.

Index Terms: In-vehicle intrusion detection, deep learning, neural network, vehicular networks.

Manuscript received 18 February 2022; revised 20 July 2022 and 16 September 2022; accepted 1 November 2022. Date of publication 24 November 2022; date of current version 8 February 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 62272129 and in part by the Double First-Class Scientific Research Funds of HIT under Grant IDGA10002093. The Associate Editor for this article was S. Garg. (Corresponding author: Bailing Wang.)
Kai Wang, Aiheng Zhang, and Bailing Wang are with the School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264200, China (e-mail: dr.wangkai@hit.edu.cn; zahboyos@163.com; wbl@hit.edu.cn).
Haoran Sun is with the Big Data Center, State Grid Corporation of China, Beijing 100031, China (e-mail: ran131110@163.com).
Digital Object Identifier 10.1109/TITS.2022.3222486

I. INTRODUCTION

Vehicle-to-everything (V2X) communication, as well as vehicular networking, has become a popular trend as a key functional component of the emerging intelligent transportation system in the current information society [1]. In recent years, more and more manufacturers have embedded communication protocols in cars, such as Controller Area Network (CAN), to realize various intelligent services and even autonomous driving [2]. However, more network connections provide more opportunities for attackers, and there are a large number of evolving new attacks in cyberspace, resulting in more risks to vehicle security and passenger safety.

CAN bus, although built by Bosch in 1985, is still the de-facto standard for the communications of Electronic Control Units (ECUs) in the in-vehicle networks (IVNs) of modern cars, due to its simplicity, low price, high efficiency, and stability. In detail, CAN is a message-oriented broadcast-based serial communication protocol, through which the ECUs handle the information transmission concerning all aspects of the car's behavior. Specific CAN messages contain communication priority and various control field information, but neither information about the sender or receiver ECUs nor message authentication mechanisms are embedded, which significantly degrades both security and safety [3].

To solve the security and safety problems, many recent studies about intrusion detection on CAN bus have emerged [4]. Some surveys for IVNs evaluated the comparative performance of previous intrusion detection methods, such as [5], [6] and [7], respectively based on traditional statistics, machine learning, and ensemble learning, but their performance is not as good as that of deep-learning-based methods. For example, the traditional anomaly detection methods based on time statistics have been proved quite fragile [8]. Moreover, message authentication or encryption will lead to inapplicable performance with unacceptable delay in resource-constrained CAN devices [9]. A method based on segmented federated learning [10] is lightweight and can deal with imbalanced data using servers, but the computing mechanism between vehicle and external server communication is complex and has large consumption. Therefore, we only focus on deep-learning-based methods that compute locally with the limited resources of the vehicle.

Among all, intrusion detection using deep learning technologies may be the most dominant approach for IVNs, for its ability to process an increased amount of data without requiring prior knowledge of domain-specific expertise [11], [12]. There are several surveys on the topic of intrusion detection system (IDS) using deep learning technologies for the vehicular CAN bus [3], [4], [13], [14], [15]. However, they do not have a fair horizontal performance comparison on an identical dataset with the same experimental settings, making it very difficult to infer the usage scenarios and performance differences of existing solutions. Also, the baseline methods used in the experiments of these studies are relatively traditional and lag behind in performance. Furthermore, they lack an evaluation of the models' ability to detect unknown attacks (e.g., zero-day attacks [16]), which reflects the degree of safety and security that models can provide, and of resource consumption, which matters for adapting to constrained embedded resource situations.

Hence, our main work and contribution are shown below:

• We select the most representative and advanced methods with the compared performance advantages as the representative of each class of algorithms, carry out horizontal comparison experiments, and get more fair comparative
1558-0016 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Harbin Institute of Technology. Downloaded on February 10,2023 at 05:59:35 UTC from IEEE Xplore. Restrictions apply.

results of detection effect. Thus, we can provide baseline selection suggestions for future research.
• We study and evaluate the inference delay and memory consumption of 10 representative algorithms in order to make them adapt to the limited embedded resource environment of IVNs. We also present a comprehensive analysis of the models' adaptability.
• We evaluate the detection ability of those 10 models toward unknown attacks, and find that the existing algorithms have weak detection ability for unknown attacks, or even do not take unknown attacks into consideration. Based on this finding, we put forward the future development direction of intrusion detection technology in the in-vehicle network.

The paper is organized as follows. Section II introduces the promising deep-learning-based IDSs in recent years. Section III designs the comparative experimental method. Section IV describes the experimental results and analyses. Section V gives suggestions on future research directions in the vehicular intrusion detection areas, and Section VI concludes this study. For convenient reading, we provide a list of abbreviations in Table VII in the appendix.

II. STATE-OF-THE-ART DEEP-LEARNING-BASED INTRUSION DETECTION METHODS IN RECENT YEARS

Since the "Hacking and Countermeasure Research Lab" (HCR Lab, http://ocslab.hksecurity.net) contributed to the data-driven security field by sharing their datasets (e.g., the CAN dataset for intrusion detection [17] and the car-hacking dataset [18]) with the public, data-driven security equipped with deep learning models for intrusion detection in IVNs, especially in CAN bus, has been explored extensively in past years.

In this study, we have surveyed a large body of literature about IDSs using deep learning technologies. To compare a wider variety of algorithms, we selected one algorithm from each category with comparative advantages as the representative algorithm. The algorithms selected have comprehensive coverage of a wide range of model structures and training or inference types, such as those based on convolutional neural network (CNN) structure [18], Long Short Term Memory (LSTM) structure [19], Generative Adversarial Network (GAN) structure [20] and autoencoder structure [9], [21]. They also involve supervised and unsupervised models, and binary and multi-classification models. Finally, ten up-to-date models with the best performance in each category are selected, and previous study details are shown in Table I.

For instance, the Reduced Inception-ResNet model proposed in [18] is one typical state-of-the-art solution, which is taken as the representative algorithm based on CNN structure, because of its almost perfect detection performance and ability to process large amounts of data. The CANTransfer in [22] is another advanced IDS, which is chosen as the representative algorithm using Transfer Learning based on Convolution LSTM (ConvLSTM) structure. Due to the advantage of spatial-temporal characteristics of the ConvLSTM, CANTransfer can firstly extract knowledge from plenty of normal data as well as previously known but limited intrusion data, and then use the one-shot transfer learning technology to enable the detection ability for new intrusions. The one-shot transfer learning refers to training the model with only hints of a new type of intrusion sample and then migrating the newly learned mapping relation to the original model. Hence, the generalization ability of the model can be enhanced and over-biasing problems due to the imbalanced datasets are more likely to be solved.

The CAN-ADF [23] is an ensemble framework of rule-based and recurrent neural network (RNN)-based models to detect typical in-vehicle intrusions (e.g., hazardous Denial of Service (DoS), fuzzing and replay attacks), which represents the RNN-based structures and multi-classification models. The work in [19] proposes to detect intrusion behaviors via anomaly analysis based on the time series prediction (TSP) method on the data field of every CAN message, using a typical LSTM structure, yet achieves advanced detection performance. The optimized deep denoising autoencoder (O-DAE) proposed in [9] employs an evolutionary-based optimization algorithm, namely ecogeography-based optimization (EBO), in each layer of the deep denoising autoencoder for training without premature convergence. O-DAE is selected as the representative algorithm of the autoencoder structure. The work in [20] builds the enhanced generative adversarial network (E-GAN), which is based on the basic principle of the GAN model in [27] but enhances the GAN discriminator by adding a CAN communication matrix. That is also a state-of-the-art solution as a representative algorithm of GAN structure.

The lightweight dynamic autoencoder network (LDAN) in [21] is another representative of the autoencoder structure. Its main characteristic is the lightweight design which reduces the computational cost and model size of the deep learning method. The autoencoder and classifier network in LDAN are constructed with lightweight neural units. Although the LDAN model in the original study was trained on non-V2X datasets (UNSW-15, KDD99), its unsupervised and lightweight nature is quite suitable for the limited resources of embedded IVNs, which is also the reason we selected it. We also investigated other intrusion detection algorithms not proposed for vehicular networks, which are inapplicable to vehicular network environments or have inferior performance to LDAN, and are excluded in this paper.

In addition, there are some advanced integrated structures, which are difficult to categorize into a single structure but are pioneering. We choose three models that combine typical and basic structures and have high detection performance. For example, a model named HyDL-IDS was proposed in [24], which is a combination of CNN and LSTM structures using the spatial and temporal representation of IVN traffic. Another work in [25] is CANet, which combines LSTM and autoencoder structures. Because of the characteristics of these two structures, CANet is an unsupervised learning method and can also capture the temporal dynamics of CAN messages. The Rec-CNN model in [26] is a CNN-based structure combined with recurrence plots to generate data, which adds the temporal dependency of a sequence into image data input to the model.

However, the limitation of all the above works is that they lack a horizontal comparison with each other and evaluations


TABLE I
THE REPRESENTATIVE STATE-OF-THE-ART COUNTERMEASURES IN PREVIOUS STUDIES


of resource consumption. These works, except CAN-ADF, lack consideration for unknown attack detection capabilities during the design phase. As a result, our main work is to re-implement these models in a consistent experimental environment and to obtain a fair comparison result on both performance and resource consumption. Also, we evaluate the capabilities of unknown attack detection for each model.

III. QUANTITATIVE EXPERIMENTAL METHOD

To comprehensively explore the comparative performance advantages and resource consumption in a fair experimental environment, all of the ten representative methods above are implemented from scratch in this section. The settings of the fair and quantitative experiments are described below.

A. Physical Configurations

The artificial intelligence server used for all the experiments is equipped with an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, a GPU of NVIDIA-SMI 450.80.02, and 128GB of memory. In addition, the operating system is CentOS 7 and the deep learning methods are implemented with TensorFlow 2.0 and NumPy.

B. Real Datasets

To evaluate the practical performance of related intrusion detection methods, the datasets used in experiments should be collected from real cars in real scenarios rather than theoretical hypotheses. Up to now, it has been difficult to collect CAN traffic data of IVNs. There are two different open datasets, both real and widely used: the CAN dataset for intrusion detection [17] (termed dataset-1) and the car-hacking dataset [18] (termed dataset-2), which are constructed by logging the real-time CAN messages via the on-board diagnostic (OBD-II) port of two running vehicles (KIA Soul and Hyundai Sonata) with message attacks.

However, dataset-1 lacks data label information. Among the 10 representative methods, there are both supervised models (i.e., the models must be trained with the help of labeled data) and unsupervised models. Also, we need to use the same dataset in our quantitative experiments. Thus, only dataset-2 is adopted as the benchmark dataset to carry out comparison experiments in our study.

Dataset-2 contains normal CAN messages and four typical attack data: DoS attack, fuzzy attack, spoofing the drive gear attack and spoofing the RPM gauge attack. They are stored in five .csv files respectively. Each piece of data contains four data features, including timestamp, identifier (ID, in hexadecimal format), data length code (DLC, valued from 0 to 8) and data payload (8 bytes), and the label of a CAN message. The dataset is also very imbalanced: there are 988,987 attack-free CAN messages, and 14,237,978 normal messages (except for the attack-free ones) mixed with 2,331,497 anomaly messages, as shown in Table II.

TABLE II
CAR-HACKING DATASET

C. Attack Scenarios

In this study, three types of network-based message attacks, DoS attack, fuzzy attack and impersonation attack, are used for the evaluation of selected detection methods. They can all attack vehicles via network connections and have slightly different attacking behaviors yet hugely different levels of detection difficulty. In all the comparative experiments in this paper, as shown in Table III, the DoS attack is used as training data that makes the models learn how to distinguish attacks, which we call known attacks. Fuzzy and impersonation attacks are treated as unknown threats that the models have never met before in the training process (that is, they are never used for training but directly used for inference, which is also known as the predicting process, see Table IV). They are used to test whether the current algorithm has the ability to detect not only known attacks but also unknown and sophisticated threats.

TABLE III
ATTACK CLASSIFICATION FOR INTRUSION DETECTION

As illustrated in Fig. 1, where nodes A and C are legitimate ECUs in the target vehicle while node B is the attacker, the details of these attacks are given as follows.

1) DoS attack: Since the CAN bus is a shared communication channel with a broadcast nature, all the ECU nodes (A, B and C) send messages with different priorities determined by embedded CAN IDs (e.g., 0x2C0 in the message from node A and 0x5A2 from C). Specifically, a lower CAN ID means a higher priority to use the CAN bus for communications. Based on this, a DoS attacker (node B) aims to flood the CAN bus with numerous forged messages with low ID values (even the lowest, 0x000, with the highest priority) in every short time interval. Thus, almost all the communication resources are occupied so that messages from other nodes will be delayed or denied from publishing into the same channel. In this case, communication on the affected CAN bus will incur unacceptable latency for other ECUs. This may lead to system failure, for example, being unable to respond to the driver's commands in time and causing traffic accidents.


Fig. 1. Three types of network-based attacks in dataset-1 (From [17]).

TABLE IV
THE PARTITION OF THE CAN DATASET FOR TRAINING AND TESTING

2) Fuzzing attack: In this type of attack scenario, fake messages sent from malicious ECUs intrude into the CAN bus at a slower rate than in a DoS attack. However, instead of using low ID values like the DoS attack, the fuzzy attack has random ID values, which avoids the vulnerability of obvious detection characteristics. A fuzzy attack can cause unexpected malfunctions such as an abrupt shift in a vehicle gearbox or an error reminder from the dashboard of a vehicle, so driving safety will be seriously impacted. It is very challenging to achieve simple detection based on some unique features, especially for the fuzzy attack that sends messages at the same rate and with the same IDs (e.g., 0x2C0 and 0x5A2 in Fig. 1(b)) as normal CAN messages. Therefore, more sophisticated countermeasures are required to detect fuzzy attacks.

3) Impersonation attack: This attack can realize unauthorized service access by eavesdropping or spoofing legitimate authentication credentials, such as replaying some previously sniffed CAN messages from other ECUs [22], or spoofing the drive gear and the RPM gauge [18]. Attackers can manipulate the drive gear to a given constant but valid value (e.g., 0x2C0 in Fig. 1(c)) by using an impersonation node (e.g., node B) which assumes the identity of the legitimate node A connected to the CAN bus, resulting in abnormal behavior of the legitimate gear. This type of attack generally intrudes into the CAN bus at a reasonable rate that seems perfectly normal, making it very difficult to detect.

D. Experimental Setup for Model Training and Predicting

To make a horizontal comparison among the recent state-of-the-art intrusion detection methods, the quantitative experiments are conducted, but we retain the characteristics of each model and their personalized optimal hyperparameter settings. The details of our experimental setup are described in this part.

1) The training parameters: Because all the detection methods in this comparison experiment are based on deep learning models, we need to search for the optimal hyperparameters for each model in order to achieve its best effect. Therefore, we use a 3-fold cross-validation strategy to get the best hyperparameters for each model and avoid overfitting. The average loss curves of 3-fold cross-validation are shown in Fig. 2. The point with the lowest loss value in the validation curve is the optimal epoch of the corresponding model. For other hyperparameters, we also follow the methods in the original studies to get the best ones. For example, [26] proposed to select the hyperparameters for Rec-CNN models using an algorithm called Hyperband, which is also used in our study.

2) The partition of the dataset: We designed the experiments not only to evaluate the performance of detecting known attacks but also to evaluate whether these intrusion detection methods have the ability to detect unknown attacks. As shown in Table IV, 70% of normal and DoS attack messages are taken as a training set, and we randomly take a consecutive 5% of messages from the remaining 30% as the testing set for each test. In particular, CANTransfer [22] was proposed to use one-shot transfer learning to detect unknown attacks. Therefore, we evaluate the effect of one-shot transfer learning by adding a hint (16 messages) of fuzzy attack data as the training set for transfer learning, compared with the zero-shot method (original CANTransfer without hints of data for training).

3) The data pre-processing: The data pre-processing methods fully follow the original studies of each proposed intrusion detection model. The details of pre-processing


TABLE V
THE DATA PRE-PROCESSING FOR INTRUSION DETECTION MODELS

for each model are shown in Table V. Window size means how many consecutive pieces of data make up a group and form a matrix. The 11-feature data is obtained from the original 4-feature data (described in Section III-B, Dataset-2) by subdividing the 64-bit data payload feature into eight features (D1-D8).

Fig. 2. The average loss value of training and validation process for each model under 3-fold cross-validation strategy.

E. Comparative Evaluation Metrics

In-vehicle intrusion detection technology is fundamentally designed for practical implementation in automotive industries to ensure the security of IVNs and the safety of passengers. Therefore, the detection performance should be evaluated from all aspects, including the running performance in practice and the detection performance.

1) Evaluation Metrics for Running Performance: The running performance is mainly reflected in time consumption and hardware resource consumption. In our study, the time consumption is assessed in two phases: training and predicting. The average training time per epoch is taken to evaluate whether the detection methods are lightweight enough. The average predicting time per message is crucial for the security of IVNs. For high-speed cars, serious traffic accidents may occur if they are attacked and the intrusion is not detected in time. Memory usage is taken to evaluate the hardware resource consumption. If excessive resources are consumed, it will also affect the normal operation of the ECU and cause security risks.

2) Evaluation Metrics for Detection Performance: The experimental results are recorded in confusion matrices, including four possibilities, true positive (TP), true negative (TN), false positive (FP) and false negative (FN), defined as follows.

• TP - attack samples correctly labeled anomalous.
• TN - normal samples correctly labeled normal.
• FP - normal samples incorrectly labeled anomalous.
• FN - attack samples incorrectly labeled normal.

To achieve a comprehensive evaluation of detection methods, we calculated some metrics from confusion matrices. The metrics are accuracy, precision, recall, FPR, FNR and F1-score.

Defined in Equation (1), accuracy is a global evaluation metric that is defined by the ratio of the sum of TP and TN


to all the predicted data.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Precision and recall are two important metrics, reflecting the model's ability to extract and distinguish attack samples. Precision is the ratio of TP to the number of samples detected as attacks, as defined in Equation (2). Recall, also called true positive rate (TPR), is defined by the ratio of TP to the number of actual attack samples, as defined in Equation (3).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

Besides, false positive rate (FPR) and false negative rate (FNR) are also evaluation metrics, describing the probability of detection errors. FPR is the ratio of FP to the number of actual normal samples, as defined in Equation (4). FNR is the ratio of FN to the number of actual attack samples, as defined in Equation (5), which is complementary to recall.

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{4}$$

$$\mathrm{FNR} = \frac{FN}{TP + FN} \tag{5}$$

However, the testing dataset is very imbalanced, which leads to biased results. ROC Area Under the Curve (AUC) is a common metric to deal with data imbalance. The ROC curve is a graph whose horizontal axis is FPR and whose vertical axis is TPR. It shows the relationship between FPR and TPR as the threshold value changes. When ROC AUC is close to 1, the model has high performance.

Furthermore, to evaluate the model comprehensively from a positive perspective, we used a metric called F-score, which integrates precision and recall together by adding a weight coefficient β, as defined in Equation (6).

$$F_{\beta} = \left(1 + \beta^{2}\right) \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}} \tag{6}$$

In this study, we took β = 1 and then got the standard F1-score, reflecting the comprehensive evaluation with the balance between recall and precision. Combining Equations (2), (3) and (6), F1-score is calculated as:

$$F_{1} = \frac{TP}{TP + \frac{FP + FN}{2}} \tag{7}$$

In summary, we took confusion matrices, accuracy, precision, recall, FPR, F1-score and ROC curves to comprehensively evaluate the detection performance of models. We discarded the FNR metric because it is already implied by recall (Recall = 1 − FNR).

IV. RESULTS AND ANALYSIS

A. Running Performance Evaluation

Since the running performance of intrusion detection models has an important influence on vehicle safety and security, we analyzed the time and memory consumption of the 10 representative models to evaluate their suitability under a resource-limited on-board environment. The model with less time and memory consumption is considered more lightweight, because the time consumption is related to the depth and computational complexity of models, and the memory consumption can reflect the complexity of model parameters and the size of models.

The experimental results of average training time and predicting time consumption are shown in Fig. 3. It demonstrates that the Rec-CNN model has less time consumption in both the training process and the inference process because the Rec-CNN model has only 5 network layers [26]. On the contrary, Reduced Inception-ResNet has a relatively deep and complex network structure [18]. As expected, it shows the longest training time and predicting time, about 25 min per epoch and 1.5633 ms per message respectively. Although O-DAE and LDAN are both based on autoencoder structure, LDAN uses the dynamic network completely consisting of lightweight units, which makes it less time-consuming than O-DAE.

Fig. 3. The time consumption of ten representative models (the bar chart describes the average training time per epoch of models; the line graph describes the average predicting time per message of models).

Fig. 4. The memory consumption of ten representative models during inference process.

Fig. 4 shows the memory usage of the 10 representative models when they predict CAN messages. We are more concerned with the inference process than the training process in our study under the restricted on-board conditions, since the inference process is on ECUs while the training process is offline.
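The two running-performance measurements used in this section, average predicting time per message and memory usage during inference, can be approximated along the following lines. This is only a minimal sketch and not the authors' measurement code: the paper does not state its profiling tooling, so `profile_inference`, `dummy_predict`, and the sample messages are hypothetical names and data introduced for illustration. Note also that Python's `tracemalloc` only tracks Python-level allocations (not GPU or native framework memory) and adds overhead to the timing.

```python
import time
import tracemalloc


def profile_inference(predict, messages):
    """Return (average seconds per message, peak traced bytes) for `predict`."""
    tracemalloc.start()
    start = time.perf_counter()
    for message in messages:
        predict(message)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) sizes in bytes.
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed / len(messages), peak_bytes


def dummy_predict(message):
    # Hypothetical stand-in for a trained model: flags very low CAN IDs.
    can_id, _payload = message
    return can_id < 0x010


# Hypothetical (CAN ID, 8-byte payload) messages, repeated to get a stable average.
messages = [(0x2C0, bytes(8)), (0x000, bytes(8)), (0x5A2, bytes(8))] * 1000

avg_seconds, peak_bytes = profile_inference(dummy_predict, messages)
print(f"average predicting time: {avg_seconds * 1e3:.6f} ms per message")
print(f"peak traced memory: {peak_bytes / 1e6:.3f} MB")
```

Averaging over many messages, rather than timing a single call, is what makes a per-message figure comparable across models with very different depths.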


TABLE VI
THE DETECTION PERFORMANCE OF TEN METHODS UNDER ALL TYPES OF ATTACKS

In addition, we suggest taking the predicting time and memory consumption metrics together for a comprehensive evaluation of models' running performance. From this aspect, we think that HyDL-IDS, CANet and Rec-CNN have a higher running performance due to their relatively low predicting time and memory consumption, lower than 0.5 ms per message and 70 MB respectively. On the other hand, LDAN is also considered as having a good running performance because it has the lowest memory usage, 43.7 MB, and a relatively short predicting time, 0.928 ms per message.

B. Detection Performance Evaluation

According to the experimental setup described in Section III, we tried to replicate each of the models described above and got the results shown in Table VI. Since an IDS is an active protective measure, it is sensitive to false positives, which could interrupt the normal operation of systems, and it needs high detection accuracy, which influences the effectiveness of the IDS. The detailed evaluation and specific analysis based on experimental results are illustrated from two aspects presented below.


Fig. 5. The confusion matrices of ten models for DoS attack inference.

1) The Ability of Detecting Unknown Attacks: One purpose of our study is to find out whether the representative intrusion detection methods can detect unknown attacks, that is, whether a model trained on a simple attack, such as DoS, can generalize to more complex attacks, such as fuzzy and impersonation attacks. Thus, only DoS attack samples are used as the training set, while the other attack types are treated as unknown attacks for the models.

As shown in Table VI, the accuracy of each method, regardless of the attack type, is at an acceptably high level. For example, Reduced Inception-ResNet reaches an accuracy of 0.9993, 0.8730, 0.8223 and 0.7774 for DoS, fuzzy, gear spoofing and RPM spoofing attacks respectively, which seems to be an acceptable result. However, when the other metrics are taken into account, we find that precision and recall for the unknown attacks (fuzzy, gear spoofing and RPM spoofing) are all 0, which means that there are no true positives for any type of unknown attack. The relatively high accuracy is mainly caused by the imbalanced datasets, so the accuracy metric has little reference value here.

Looking at the metrics other than accuracy, we find that only CANTransfer, CAN-ADF and HyDL-IDS can detect unknown attacks, and only barely, while the others cannot. CANTransfer gains the ability to detect the unknown fuzzy attack through its use of 1-shot transfer learning, which improves fuzzy attack detection from 0 to a precision of 0.9794, a recall of 0.0309 and an F1-score of 0.0599.

It is worth noting that CAN-ADF and HyDL-IDS gain a limited ability to detect fuzzy and RPM spoofing attacks, with similar performance. One reason is that CAN-ADF combines a heuristic algorithm with deep learning models, which defines a rule for detecting unknown attacks. Another possible reason is that both CAN-ADF and HyDL-IDS use deep learning structures that extract spatial and temporal features of CAN messages. Furthermore, the results show that no model can detect the gear spoofing attack as an unknown attack, which suggests that the gear spoofing attack may have the most complex mechanism among the four attack types.
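The effect of class imbalance on accuracy can be illustrated with a short sketch. It computes accuracy, precision, recall and F1-score from confusion-matrix counts and shows that a detector that never raises an alarm still scores high accuracy when attacks are rare. The counts (9,000 normal and 1,000 attack messages) are hypothetical and are not taken from Table VI; only the precision and recall passed to `f1_from` at the end are the CANTransfer values quoted above. Returning 0 for an undefined precision or recall is a convention assumed here.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Convention (assumed): undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: the model labels every message as normal.
    silent = detection_metrics(tp=0, fp=0, tn=9000, fn=1000)
    print(silent)  # high accuracy, but precision, recall and F1 are all zero

    # Precision and recall reported in the text for CANTransfer on the fuzzy attack.
    print(round(f1_from(0.9794, 0.0309), 4))
```

The last line reproduces the relationship between the reported precision, recall and F1-score: a very low recall dominates the harmonic mean even when precision is close to 1.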


Fig. 6. The detection performance of all methods against known DoS attack (the average value of each metric is shown in the last row of the table and at
the far right bar of each group).

Fig. 7. The ROC curves of ten models for DoS attack inference.

2) The Comparison of Detecting Known DoS Attacks: Based on our quantitative experiment, the results for the known DoS attack are used to evaluate the detection performance of all models and to compare them horizontally. The normalized confusion matrices in Fig. 5 show that all models in our study can correctly predict the labels of normal messages and DoS attacks. In addition, accuracy, precision, recall, FPR and F1-score are calculated and shown in Fig. 6, where the results can be compared intuitively. As state-of-the-art intrusion detection technologies, all methods described in this study have acceptable performance, with high accuracy (all over 0.98), high precision (all over 0.90), high recall (all over 0.97), high F1-score (all over 0.94) and low FPR (all below 0.02).

For further analysis and comparison, ROC curves are shown in Fig. 7. The results demonstrate that all models have satisfactory detection performance on imbalanced datasets. They also indicate that CANTransfer and CAN-ADF perform better under imbalanced datasets. The reason is that CANTransfer uses transfer learning and CAN-ADF uses a rule-based framework, both of which help to avoid overfitting and enhance generalization ability, so the imbalanced datasets have less influence on these two models.

Moreover, we classify the ten models into three levels according to their detection performance under the DoS attack. Reduced Inception-ResNet, CANet and CANTransfer are the top three models, with an F1-score over 0.99 and an FPR of no more than 0.002, so they belong to the first level for their excellent detection performance. CAN-ADF, O-DAE and HyDL-IDS, whose detection performance scores are around the average values, are classified into the second level. Their detection performance is also satisfactory, but their scores fall short of the top three by about 0.02. The last level includes LDAN, E-GAN, Rec-CNN and TSP, whose detection performance on DoS attacks shows a relatively large gap from the first two levels.

TABLE VII
THE ABBREVIATION
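The three-level grouping described in this subsection can be sketched as a simple rule. The first-level thresholds (F1-score over 0.99, FPR no more than 0.002) are the ones quoted in the text. The second-level cutoff is an assumption: the text only says those models trail the top three by about 0.02, so the sketch uses an F1-score of at least 0.99 minus a 0.02 margin. The function name, the model names and the scores are hypothetical placeholders, not values read from Fig. 6.

```python
def detection_level(f1: float, fpr: float, second_level_margin: float = 0.02) -> int:
    """Assign a detection-performance level (1 = best) from F1-score and FPR."""
    # First level: thresholds stated in the text.
    if f1 > 0.99 and fpr <= 0.002:
        return 1
    # Second level: ASSUMED cutoff, within `second_level_margin` of the 0.99 bar.
    if f1 >= 0.99 - second_level_margin:
        return 2
    # Everything else falls into the last level.
    return 3


if __name__ == "__main__":
    # Hypothetical (F1, FPR) pairs for placeholder models.
    scores = {
        "model_a": (0.995, 0.001),
        "model_b": (0.975, 0.008),
        "model_c": (0.945, 0.015),
    }
    for name, (f1, fpr) in scores.items():
        print(name, "-> level", detection_level(f1, fpr))
```

A rule of this kind only makes the grouping reproducible; the choice of cutoffs remains a judgment call and would need to be checked against the per-model scores in Fig. 6.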

V. DISCUSSION AND FUTURE WORK


Combining running performance and detection performance, we re-evaluated these representative intrusion detection methods based on the experimental results. Although Reduced Inception-ResNet has the best detection performance and Rec-CNN has the best running performance, neither has the comprehensive performance needed to serve as a comparative baseline for the in-vehicle environment. Our comprehensive analysis indicates that CANet is the best choice of comparative baseline, since it combines high-level running performance with first-level detection performance. In addition, considering the ability to detect unknown attacks, HyDL-IDS may also be a good alternative baseline because of its high-level running and detection performance and its existing ability to detect unknown attacks. From this perspective, we also find that considering both spatial and temporal features of attack data may help a model learn more about internal data characteristics and improve its generalization ability, so that it can obtain some ability to detect unknown attacks.

According to the results of our experiments, existing deep-learning-based intrusion detection methods have brought detection performance on CAN traffic to a very high level. For example, Reduced Inception-ResNet, CANTransfer and CANet all achieve detection scores close to 100%. However, the remaining challenges for deep-learning-based IDSs are detecting unknown and sophisticated attacks and reducing time and resource consumption on embedded systems. These factors determine whether a model is suitable for the resource-constrained in-vehicle network environment in practice and whether it can ensure the safety and security of the CAN bus.
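One way to combine running performance and detection performance when choosing a baseline is to filter candidates by a time and memory budget and then rank the survivors by detection score. The sketch below is only an illustration of that idea, not the selection procedure used in this study. The `ModelProfile` type, the budget values, the model names and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelProfile:
    name: str
    predict_ms: float  # predicting time per CAN message, in milliseconds
    memory_mb: float   # memory consumption, in MB
    f1: float          # F1-score on the known attack


def shortlist(models: List[ModelProfile], max_ms: float, max_mb: float,
              min_f1: float) -> List[ModelProfile]:
    """Keep models within the resource budget, then rank by F1 (ties: faster first)."""
    within_budget = [
        m for m in models
        if m.predict_ms <= max_ms and m.memory_mb <= max_mb and m.f1 >= min_f1
    ]
    return sorted(within_budget, key=lambda m: (-m.f1, m.predict_ms))


if __name__ == "__main__":
    # Hypothetical candidates; none of these numbers come from the experiments.
    candidates = [
        ModelProfile("model_a", predict_ms=0.4, memory_mb=60.0, f1=0.995),
        ModelProfile("model_b", predict_ms=0.9, memory_mb=45.0, f1=0.950),
        ModelProfile("model_c", predict_ms=2.5, memory_mb=300.0, f1=0.999),
    ]
    for m in shortlist(candidates, max_ms=1.0, max_mb=70.0, min_f1=0.94):
        print(m.name, m.predict_ms, m.memory_mb, m.f1)
```

In this hypothetical run, the most accurate candidate is excluded because it exceeds the resource budget, which mirrors the trade-off discussed above between the best detector and a model that is practical on embedded hardware.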
Therefore, enhancing the models' ability to detect unknown attacks, improving their running speed, and reducing their resource consumption are suggested directions for future research on deep-learning-based in-vehicle intrusion detection. We also suggest building feature extraction architectures with shallow layers that capture both temporal and spatial features into intrusion detection models, and using transfer learning to improve the generalization ability of models. In this way, the models can be lightweight enough for an in-vehicle environment and may gain the ability to detect unknown attacks.

VI. CONCLUSION

With the development and popularity of V2X communication, vehicles are more connected to powerful networks than ever, giving attackers more chances to take over a car through the CAN bus. Therefore, many intrusion detection methods have been designed to ensure the security of IVNs, especially deep-learning-based methods, which offer more capabilities and better performance than traditional algorithms. However, quantitative and fair horizontal performance comparisons of deep-learning-based intrusion detection methods are insufficient. To reach useful conclusions about selecting a proper baseline method for the embedded in-vehicle environment, this paper investigates and discusses ten representative state-of-the-art deep-learning-based IDSs: Reduced Inception-ResNet, CANTransfer, CAN-ADF, TSP, O-DAE, LDAN, E-GAN, HyDL-IDS, CANet and Rec-CNN. Through quantitative experiments, this paper discusses the ability to detect unknown attacks, the running performance and the detection performance of these intrusion detection methods. The results show that CANet and HyDL-IDS may be suitable baseline methods because of their strong comprehensive performance. We also provide suggestions and guidance on the development direction of in-vehicle intrusion detection methods: reducing time delay and resource consumption, and improving the ability to detect unknown attacks.

APPENDIX

See Table VII.

Kai Wang received the B.S. and Ph.D. degrees from Beijing Jiaotong University. He is currently an Associate Professor with the Faculty of Computing, Harbin Institute of Technology (HIT), China. Before joining HIT, he was a Post-Doctoral Researcher in computer science and technology with Tsinghua University. He has published more than 40 papers in prestigious international journals and conferences, including IEEE Network, IEEE SYSTEMS JOURNAL, IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, and ACM Transactions on Internet Technology. His current research interests include lightweight and intelligent security technologies, such as deep learning applications on industrial control network or in-vehicle network intrusion detection. He is a Senior Member of the China Computer Federation (CCF).

Aiheng Zhang received the B.S. degree in computer science and technology from the Beijing University of Technology, Beijing, China. She is currently pursuing the master's degree in computer technology with the Harbin Institute of Technology (HIT), China. Her research interests include intelligent and lightweight in-vehicle intrusion detection models.

Haoran Sun received the B.S. degree in computer science and technology from Jilin University, Changchun, China, and the master's degree in computer technology from the Harbin Institute of Technology (HIT), China. He is currently an Engineer with the Big Data Center, State Grid Corporation of China. His research interests include transfer learning and data-sample-generating methods for in-vehicle intrusion detection.

Bailing Wang received the Ph.D. degree from the School of Computer Science and Technology, Harbin Institute of Technology (HIT), China, in 2006. He is currently a Professor with the Faculty of Computing, HIT. He has published more than 80 papers in prestigious international journals and conferences, and has been selected for the China national talent plan. His research interests include information content security, industrial control network security, and V2X security.
