A Deep Learning-Based DDoS Detection Framework for Internet of Things
Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on April 18,2023 at 22:56:03 UTC from IEEE Xplore. Restrictions apply.

Abstract—An intrusion detection system (IDS) is an active defense mechanism implemented in the Internet of Things (IoT) that can identify intrusion behavior and raise alarms. However, there are concerns regarding the sustainability and feasibility of existing schemes as the number of threats in IoT increases. In particular, these concerns relate to the increasing demands on adaptive performance and to insufficient detection accuracy. In this paper, we present a novel deep learning method to address these concerns. We detail the proposed convolutional neural network model, which is based on a feature fusion mechanism that we develop. Furthermore, we propose a symmetric logarithmic loss function based on categorical cross entropy. The proposed detection framework has been implemented in GPU-enabled TensorFlow and evaluated on the benchmark NSL-KDD dataset. Extensive experimental results indicate that the developed model outperforms traditional approaches and has great potential for attack detection in IoT.

Index Terms—Internet of Things, Intrusion detection, Convolutional neural network, Symmetric logarithmic loss function

I. INTRODUCTION

The Internet of Things (IoT) paradigm is envisioned to improve the quality of modern life. Various IoT-based applications, such as automatic driving, digital health, and smart grid, are developing at a breathtaking pace [1]. According to Cisco, the number of devices connected to the Internet is expected to reach 50 million in 2020. The benefits brought by IoT systems are undeniable, but the profit-driven production of IoT devices with little consideration for security is introducing devastating issues into cyberspace [2]. Among many cases, the distributed denial-of-service (DDoS) attack has become one of the most serious threats, due to the resource-constrained characteristics of IoT networks [3].

Therefore, detecting DDoS attacks at an early stage is of great significance. In recent years, solutions based on machine learning (ML) have been investigated, including Naive Bayes, Support Vector Machines (SVM), and Decision Trees [4] [5]. However, these shallow learning techniques have limitations. To begin with, most of them require a comparatively high level of human expert interaction to extract useful data and patterns, which is often labour-intensive and impractical. In addition, while rule-based or supervised learning models can distinguish known attacks, they are inadequate to detect continuously emerging zero-day attacks, especially in dynamic and heterogeneous IoT-based systems [6].

To address these limitations, techniques based on deep learning have been successfully employed in network intrusion detection systems (NIDS) [7]. Modern techniques such as the convolutional neural network (CNN) and the recurrent neural network (RNN) facilitate deeper analysis of network data and identify attacks quickly. However, the traditional single network structure used in current models cannot adequately reflect the feature correlation between multidimensional attributes. Additionally, the impact caused by false labels in training datasets is not sufficiently considered [8].

In this paper, we aim to develop a deep learning based DDoS detection method to address the above concerns. In particular, a novel CNN model is proposed. In comparison with previous methods, a multilayer convolution feature fusion mechanism is designed to maximize the correlations between data features. In addition, we develop a symmetric logarithmic loss function based on categorical cross entropy to improve the stability of the classification model. Furthermore, extensive comparative experiments have been carried out, and the results demonstrate that the proposed system can effectively and accurately detect DDoS attacks.

The primary research contributions of this paper are summarized as follows.
• We develop a novel CNN based DDoS detection model by adding a multilayer convolution feature fusion mechanism, where the features extracted in convolution blocks of different depths can maximize the correlations between data features.
• We design a symmetric logarithmic loss function based on categorical cross entropy. The function compares the desired probability score of the output with the average score of the other categories, which can make the classification model more stable.
• We conduct extensive experiments on the benchmark NSL-KDD dataset. The evaluation results illustrate that the proposed model is superior to traditional methods in terms of DDoS attack detection in IoT networks.

The remainder of this paper is organized as follows. We present the related research in the field of NIDS in Section II. We then describe the proposed CNN-based DDoS detection framework in Section III. Section IV provides the analysis,
which is followed by the experiments and evaluations in Section V. Finally, the summary and future plans for the proposed work are provided in Section VI.

II. RELATED WORK

Traditional ML models have been widely applied to detect network intrusions. One of the earliest studies in the literature employed a Bayesian algorithm as the classifier, which has the advantages of simplicity, ease of implementation, and applicability to both binary and multi-class classification [9]. The K-nearest neighbor algorithm was also applied to detect DDoS attacks in wireless sensor networks, but it is difficult to determine the optimal K value for large datasets [10]. Ambusaidi et al. [11] employed an SVM model and developed a mutual information based feature selection algorithm to improve the detection performance. However, as the size and dimension of the dataset increase, the accuracy of the classifier decreases. Doshi et al. [12] tested five different ML detection methods on a dataset of normal and DDoS attack traffic collected from an experimental IoT-based network. Because traditional ML schemes depend heavily on feature engineering, calculating the correlation between features is often time-consuming and complex. Overall, it is impractical to detect attacks by applying traditional ML algorithms in real-time applications.

Deep learning, as an important branch of ML, has attracted increasing attention from researchers in the field of NIDS. Roy et al. [13] developed an anomaly detection method based on an artificial neural network (ANN). It can recognise abnormal patterns by describing the posterior distributions of data-constrained classes. Experimental results indicate that the accuracy of intrusion detection could be improved by applying a deep neural network. Cordero et al. [14] developed an unsupervised approach by utilizing features of network flows, together with an extended version of the RNN model. However, they did not fully disclose the exact accuracy of the proposed model. Javaid et al. [15] proposed a deep learning method that combines a sparse self-encoder with softmax regression. Evaluations on the benchmark NSL-KDD dataset showed that their 5-class detection can achieve an average F-score of 75.76%. Recently, a comprehensive survey concluded that detection strategies based on deep learning can offer better accuracy across different sample sizes and various abnormal traffic types [16].

III. PROPOSED DETECTION SCHEME

A. System Overview

The IoT intrusion detection system in this paper is shown in Fig. 1, and the proposed intrusion detection model is deployed on the server. First, the sink nodes receive the information about the end nodes accessing the network and then aggregate it to the IoT gateways. The server obtains the end nodes' data flows by interacting with the gateway nodes. We extract the features associated with DDoS attacks, including IP, port, network protocol, transmission flow, and network connection frequency. After normalization and dimension reduction, the features are input into the detection model, which is based on an improved convolution feature fusion network. By training the model, the classification into normal traffic or attacks can be achieved at the softmax layer.

Fig. 1. IoT intrusion detection system

B. CNN Model Based on Feature Fusion

We represent the feature vector as $x \in X^M$ and the corresponding class label as $y_x \in [k] = \{1, \ldots, k\}$. The labeled dataset is represented as $S = \{(x_i, y_{x_i}) : i \in \{1, \ldots, N\}\}$, where both random variables $x$ and $y_x$ in a sample come from the unknown joint distribution $D$. The expected learning function can be represented as $f : X \to Y$, which denotes the mapping from the input vector to the output vector. For a network structure that contains $l$ hidden layers, we start by feeding the feature vector $x^0$ into the network; the output vector of each convolution layer is then $x^l$. Using $W^l$ and $b^l$ to represent the network weight and bias, respectively, we get

$$ x^l = f(u^l), \qquad u^l = W^l x^{l-1} + b^l, \tag{1} $$

where $f(\cdot)$ here denotes the activation function applied to the pre-activation $u^l$. The activation function has many types, such as sigmoid, ReLU, and tanh. In each convolution layer, the output vector is obtained by convolving the feature vectors of the previous layer with the convolution kernel and passing the result through the activation function. In the feature fusion mechanism, the output vector combines multiple input vectors through convolution, so the output feature vector of the $l$-th layer is

$$ x_{i,j}^{l} = f\Big( \sum_{i=1}^{m} x_{i,j}^{l-1} * k_{i,j}^{l} + b_j^{l} \Big), \tag{2} $$

where $m$ is a selection set of input feature vectors; $i$ represents the dimension of the feature vector; $j$ denotes the convolution kernel parameter; $*$ is the convolution operation; and $b_j^l$ is an additive bias.

In the convolution process above, the training complexity of the model can be reduced by sharing weights. We apply the ReLU activation function to make the feature transformation non-linear. We introduce the pooling layer in
convolution, which includes average pooling and maximum pooling. Its purpose is to keep the position of the feature vector unchanged, so that the output dimensions correspond to the inputs of the next convolution layer. The pooling layer functions applied to the feature values of each layer are presented as follows.

$$ x_{i,j}^{l} = \mathrm{pool}(x_{i,j}^{l-1}), \quad \forall i \in M. \tag{3} $$

Based on the above operations, the feature fusion mechanism is shown in Fig. 2. First, we fuse the features of multiple convolutional layers, and then the classifier is trained on the fused features. Specifically, we connect these feature values through a fully connected layer and compose their feature vectors into a composite vector. The purpose of this operation is to enhance the semantic information of the low-depth features and the detail perception of the high-depth features. Therefore, we can maximize the correlation between the feature values.

Fig. 2. Multilayer feature fusion network

C. Optimization of Loss Function

In the proposed network structure, we can optimize the parameters, such as the weights and biases of the CNN model, by minimizing the loss function. When the difference between the expected results and the real output of the model is too large, we can apply gradient descent to the loss function to reduce the distance. The loss function is defined as the map $L : R \times Y \to R$. Accordingly, the expected loss under the joint distribution $D$ in the classification framework based on the loss function $L$ is denoted as

$$ R_{exp}(f) = E_D[L(f(x), Y)] = \int L(f(x), y_x) P(x)\, dx. \tag{4} $$

A loss function can be seen as a symmetric loss function when it satisfies

$$ R_{pre}(f) = \frac{1}{M} \sum_{i=1}^{k} L(f(x), y_x) = C, \quad \forall x \in X, \ \forall f, \tag{5} $$

where $C$ is a constant. According to the law of large numbers, if $M \to \infty$, then $R_{pre}(f) \to R_{exp}(f)$ and the classification model is more stable. Therefore, when dealing with large-scale datasets, we can minimize the expected risk under the classification framework in this way. In model architectures for binary or multi-class classification, the categorical cross entropy (CCE) loss function is usually used, as shown below.

$$ L(p, y_i) = -\sum_{i=1}^{k} \big[ y_i \log(p_i) \big]. \tag{6} $$

We note that CCE is an asymmetric loss function. To make the model more stable, we develop a novel symmetric logarithmic loss (SLL) function by introducing a constant value $\lambda$ ($\lambda > 0$) on the basis of CCE. In the classification layer, we define the output and the classification label as $p$ and $y$, respectively. Let $e_j$ denote the $j$-th coordinate unit vector. For a label $y = e_j$, with $y_j = 1$ and $y_i = 0$, $\forall i \neq j$, the SLL function is defined as follows.

$$ L(p, e_j) = -\log(\lambda + p_j) + \log(\lambda + 1) + \frac{1}{k-1} \sum_{i=1, i \neq j}^{k} \log(\lambda + p_i), \quad (\lambda \to 0). \tag{7} $$

Summing (7) over all $k$ labels gives a constant, so SLL satisfies the symmetry condition:

$$ \sum_{j=1}^{k} L(p, e_j) = \sum_{j=1}^{k} \log(\lambda + 1) - \sum_{j=1}^{k} \log(\lambda + p_j) + \frac{1}{k-1} \sum_{j=1}^{k} \Big[ \sum_{i=1}^{k} \log(\lambda + p_i) - \log(\lambda + p_j) \Big] = k \log(\lambda + 1). \tag{8} $$

IV. MODEL STABILITY ANALYSIS

The network of the CNN model in this paper is shown in TABLE I. It includes the proposed CNN fusion mechanism. The input data passes through a two-layer convolution operation and max pooling to obtain 58-dimensional feature values, which are then flattened to expand the feature dimension to 7296. Then the 56-dimensional feature values are expanded to 1856 by flattening. Finally, the 55-dimensional feature values are expanded to 14080 by flattening. We fuse the feature values after the three flattening operations, and the dimension after fusion increases to 23242. This process completes the fusion of the detection features, which greatly improves the performance of feature detection. Through the fusion mechanism, the correlation between the feature values can be maximized, and the end nodes can be detected and classified more accurately. We use the stochastic gradient descent (SGD) algorithm as the optimizer, and the classification layer uses SLL to minimize
the risk of datasets with noisy labels and optimize the stability of the model.

TABLE I
CNN NETWORK OF THIS PAPER

Layer              Type            Output Shape
input_1            Input Layer     (None, 1, 58)
conv1d_1           Conv1D          (None, 32, 58)
conv1d_2           Conv1D          (None, 32, 58)
max_pooling1d_1    MaxPooling1D    (None, 32, 58)
conv1d_3           Conv1D          (None, 128, 58)
conv1d_4           Conv1D          (None, 128, 57)
conv1d_7           Conv1D          (None, 256, 56)
conv1d_8           Conv1D          (None, 256, 55)
conv1d_5           Input Layer     (None, 128, 58)
max_pooling1d_3    MaxPooling1D    (None, 256, 55)
conv1d_6           Conv1D          (None, 128, 57)
flatten_3          Flatten         (None, 14080)
max_pooling1d_2    MaxPooling1D    (None, 128, 57)
flatten_2          Flatten         (None, 1856)
flatten_1          Flatten         (None, 7296)
concatenate_1      Concatenate     (None, 23242)
softmax_1          Softmax         (None, 2)

We also present the model analysis in this section, including the analysis of classification performance and learning performance. In the training network, a fully connected layer follows the feature fusion to find the most suitable feature value. It is followed by a softmax layer, and the dimension of the input vector is $pre\, f[f(2 \ast 1)]$, which represents the prediction probability. The values of this vector represent the predicted probability of belonging to each category (normal or DDoS attack). The output value of the posterior estimation is represented as follows.

$$ p_j = \frac{pre(f_j(x))}{\sum_{i=1}^{k} pre(f_i(x))}, \tag{9} $$

where $f(x) \in \mathbb{R}$, $p \in \mathbb{R}^k$, with $p_j \geq 0$ and $\sum_{i=1}^{k} p_i = 1$. Therefore, after training the fully connected layer, the predicted label of a sample is the index of the largest value in the softmax output layer. For the learning rate of the proposed model, the stochastic gradient of the optimized loss function SLL is

$$ \frac{\partial L(p, \theta)}{\partial p_j} = \begin{cases} -\dfrac{1}{\lambda + p_j} \cdot \dfrac{\partial p_j}{\partial \theta}, & \text{if } j = i, \\[2mm] \dfrac{1}{\lambda + p_j} \cdot \dfrac{\partial p_j}{\partial \theta}, & \text{if } j \neq i. \end{cases} \tag{10} $$
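The symmetry property of the SLL function defined in Section III-C can be checked numerically. The following is a minimal sketch in plain Python, not the authors' TensorFlow implementation; the function names `sll_loss`, `cce_loss`, and `sll_grad_wrt_p` and the sample probability vector are assumptions made for illustration. The gradient helper differentiates (7) directly with respect to each $p_i$, which gives a factor $1/(k-1)$ on the off-label terms; in the two-class setting evaluated in this paper, $k - 1 = 1$, so that factor disappears.

```python
import math


def sll_loss(p, j, lam=0.001):
    """Symmetric logarithmic loss of Eq. (7) for the true class index j."""
    k = len(p)
    # Sum of log(lambda + p_i) over every class except the true one.
    others = sum(math.log(lam + p[i]) for i in range(k) if i != j)
    return -math.log(lam + p[j]) + math.log(lam + 1.0) + others / (k - 1)


def cce_loss(p, j):
    """Categorical cross entropy of Eq. (6) for a one-hot label e_j."""
    return -math.log(p[j])


def sll_grad_wrt_p(p, j, lam=0.001):
    """Partial derivatives of Eq. (7) with respect to each p_i (illustrative helper)."""
    k = len(p)
    return [
        -1.0 / (lam + p[i]) if i == j else 1.0 / ((k - 1) * (lam + p[i]))
        for i in range(k)
    ]


# Hypothetical softmax output for k = 3 classes.
p = [0.7, 0.2, 0.1]
lam = 0.001

# Eq. (8): summing SLL over all possible labels gives k * log(lambda + 1),
# independent of p, whereas the same sum for CCE depends on p.
total = sum(sll_loss(p, j, lam) for j in range(len(p)))
print(total, len(p) * math.log(lam + 1.0))
print(sum(cce_loss(p, j) for j in range(len(p))))
```

Because the sum over labels is constant, lowering the loss for the true class necessarily raises it for the other classes, which is the behaviour the symmetry condition (5) is meant to capture.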
In (10), $\theta$ is a parameter of convolution in the learning process. The update of $\theta$ slows down as the stochastic gradient decreases, and it stops when the gradient equals 0. However, when $\lambda$ is a very small value, the learning efficiency is unaffected and all the components of $p_j$ are involved in the gradient updating process. Experiments have verified that when $\lambda = 0.001$, the network structure of the model is optimally stable.

V. PERFORMANCE EVALUATION

We designed a performance experiment to verify the CNN model proposed in this paper for IoT attack detection. The experimental device is a server with two Intel Xeon Silver 4110 CPUs with a frequency of 2.10 GHz and an NVIDIA GeForce GTX 1080 Ti GPU. The two-class classification model is trained and tested on the data related to IoT attacks extracted from the NSL-KDD dataset. In addition, comparative experiments are designed to compare the proposed intrusion detection method with traditional ML based strategies.

A. Dataset Description

The NSL-KDD dataset is a benchmark dataset for intrusion detection, which is frequently used to evaluate the effectiveness of various network intrusion detection models. Since our target is to detect related attacks in IoT, we extracted 113,270 records as training data from the KDD Train+ dataset, including normal traffic and DDoS traffic related to IoT systems. The types of DDoS attack in the training dataset include back, land, neptune, pod, smurf, and teardrop. We extracted a total of 17,171 records as test data from the KDD Test+ dataset, including the attacks apache2, mailbomb, processtable, udpstorm, and worms, which do not appear in the training data.

B. Data Preprocessing

Each training record is labeled as normal or DDoS and has 41-dimensional features, including 38 numeric features and 3 non-numeric features. To guarantee the efficiency of the proposed method, we preprocess the experimental data as follows.

1) Data type conversion: We convert the data types by one-hot encoding. Each record includes four non-numeric fields: protocol type, service, flag, and class. Specifically, the protocol type has three values: TCP, UDP, and ICMP. The one-hot encodings of these values are distinct binary vectors, namely [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively. Similarly, the service field contains 70 values, the flag field has 11 values, and the class has 2 values. After data type conversion, every record has 122-dimensional numeric features.

2) Data normalization: To bring the values of the input data into the range from 0 to 1, we normalize the data as follows.

$$ y' = \frac{y - M_{min}}{M_{max} - M_{min}}, \tag{11} $$

where $y$ is the value to be normalized, $y'$ is the normalized value, $M_{min}$ is the smallest number in a dimension, and $M_{max}$ denotes the biggest number in a dimension.
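The two preprocessing steps above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the pipeline used in the experiments: the helper names `one_hot` and `min_max_normalize`, the toy records, and the choice to map a constant column to 0.0 are assumptions. The field names `protocol_type` and `src_bytes` follow the usual NSL-KDD column naming.

```python
def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def min_max_normalize(column):
    """Scale a list of numbers into [0, 1] following Eq. (11)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


PROTOCOLS = ["tcp", "udp", "icmp"]

# Hypothetical toy records, only two of the 41 fields are shown.
records = [
    {"protocol_type": "tcp", "src_bytes": 200},
    {"protocol_type": "udp", "src_bytes": 0},
    {"protocol_type": "icmp", "src_bytes": 1000},
]

encoded = [one_hot(r["protocol_type"], PROTOCOLS) for r in records]
scaled = min_max_normalize([r["src_bytes"] for r in records])

# Dimension count after one-hot encoding, using the numbers stated above:
# 38 numeric features + 3 protocol values + 70 service values + 11 flag values.
total_dims = 38 + 3 + 70 + 11

print(encoded)
print(scaled)
print(total_dims)
```

In practice the minimum and maximum of each dimension would be computed on the training data and reused for the test data, so that both sets are scaled consistently.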
3) PCA processing: We apply the maximum variance method (also known as Principal Component Analysis, PCA) to the dataset in order to reduce similar features and avoid over-fitting in the training process. We set the similarity value between extracted features in PCA to 0.99. In this way, we not only maintain the regularity among the original features, but also shorten the training time of the designed model. In our work, the data is reduced from 122-dimensional features to 58-dimensional features by the PCA process.

C. Evaluation Indicators

In general, the performance of intrusion detection algorithms is measured by accuracy ($AC$), detection rate ($DR$), and false alarm rate ($FA$). A good detection algorithm should have high $AC$, high $DR$, and low $FA$. They are calculated as

$$ AC = \frac{TP + TN}{TP + TN + FP + FN}, \tag{12} $$

$$ DR = \frac{TP}{TP + FN}, \tag{13} $$

$$ FA = \frac{FP}{TN + FP}, \tag{14} $$

where $TP$ represents the intrusion data that is correctly classified as abnormal; $TN$ denotes normal data that is correctly classified as normal; $FP$ represents normal data that is incorrectly classified as an intrusion; and $FN$ denotes abnormal data that is incorrectly classified as normal.

D. Experiment Results

1) Comparison under different optimizers: The choice of optimizer in the CNN model has a significant impact on the experimental results. Therefore, we compare the detection accuracy under different optimizers in the CNN model. The accuracy $AC$ is shown in Fig. 3. The comparative results demonstrate that the developed scheme performs better with the SGD optimizer.

2) Comparison of different loss functions: Under the SGD optimizer, the relationship between the model performance ($AC$ and $FA$) and the number of iterations based on SLL and CCE is shown in Fig. 4 and Fig. 5, respectively. The $AC$ and $FA$ of the model under the SLL loss function are 92.99% and 0.7%, respectively. Under the CCE loss function, the $AC$ and $FA$ are 89.65% and 1.3%, respectively. The model based on the SLL loss function therefore performs better than the one based on the CCE loss function. As shown in Fig. 6, the loss value of the model applying SLL is lower than that of the model applying CCE. Furthermore, the model under SLL converges better and is more stable. Our experiments run for 20 iterations, and the loss value gradually stabilizes after 16 iterations.

Fig. 4. $AC$ comparison of different loss functions

3) Comparison of different detection algorithms: Under the same experimental conditions, we conduct IoT intrusion detection experiments separately using the classical CNN, SVM, DT, Bayes, KNN, and RNN algorithms. TABLE II presents the performance of these methods. The proposed model in this paper is obviously better than the classical CNN model and the traditional machine learning
based methods. Notably, in comparison with the RNN based method, the accuracy of the proposed model increases by about 3 percentage points (92.99% versus 90.2%). In addition, the false alarm rate decreases by as much as 3.8 percentage points (4.50% for DT versus 0.70% for the proposed model). Overall, the newly developed method achieves a high $DR$ and a low $FA$ without consuming a lot of running time.

Fig. 6. Loss comparison of different loss functions

TABLE II
PERFORMANCE COMPARISON

Method           AC/%     DR/%     FA/%    Run time/s
Proposed         92.99    84.62    0.70    15.79
Classical CNN    89.23    82.93    1.5     14.53
SVM              86.42    77.57    3.79    110.13
KNN              87.37    81.30    1.73    95.25
DT               82.55    73.10    4.50    36.15
Bayes            85.74    75.69    4.21    30.27
RNN              90.2     82.40    0.90    17.32

VI. SUMMARY AND FUTURE WORK

In this paper, we discuss the challenges faced by current IoT intrusion detection methods. We propose a novel CNN model by developing a multilayer convolution feature fusion mechanism and a loss function based on categorical cross entropy. Compared with existing deep learning based methods that mainly focus on traditional network intrusion issues, we specifically handle DDoS attacks in the IoT scenario. The proposed model is implemented in TensorFlow, and extensive experiments have been conducted on the NSL-KDD dataset. The results demonstrate that the proposed approach offers better accuracy with a low false alarm rate.

For future work, we plan to further optimize the network structure to achieve better detection results. In addition, we will try to extend the proposed model to multi-class detection tasks. Moreover, we will investigate systems for detecting other typical attacks on IoT applications.

ACKNOWLEDGMENT

REFERENCES

[1] T. Qiu, N. Chen, K. Li, M. Atiquzzaman, and W. Zhao, “How can heterogeneous internet of things build our future: A survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2011–2027, 2018.
[2] S. Yu, G. Wang, X. Liu, and J. Niu, “Security and privacy in the age of the smart internet of things: an overview from a networking perspective,” IEEE Communications Magazine, vol. 56, no. 9, pp. 14–18, 2018.
[3] S. Yu, G. Wang, and W. Zhou, “Modeling malicious activities in cyber space,” IEEE Network, vol. 29, no. 6, pp. 83–87, 2015.
[4] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, “Network intrusion detection for IoT security based on learning techniques,” IEEE Communications Surveys & Tutorials, 2019.
[5] K. Yang, J. Ren, Y. Zhu, and W. Zhang, “Active learning for wireless IoT intrusion detection,” IEEE Wireless Communications, vol. 25, no. 6, pp. 19–25, 2018.
[6] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[7] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep learning for IoT big data and streaming analytics: A survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2923–2960, 2018.
[8] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Transactions on Neural Networks and Learning Systems, 2019.
[9] S. Mishra, C. Mahanty, S. Dash, and B. K. Mishra, “Implementation of BFS-NB hybrid model in intrusion detection system,” in Recent Developments in Machine Learning and Data Analytics. Springer, 2019, pp. 167–175.
[10] D. Papamartzivanos, F. G. Mármol, and G. Kambourakis, “Dendron: Genetic trees driven rule induction for network intrusion detection systems,” Future Generation Computer Systems, vol. 79, pp. 558–574, 2018.
[11] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an intrusion detection system using a filter-based feature selection algorithm,” IEEE Transactions on Computers, vol. 65, no. 10, pp. 2986–2998, 2016.
[12] R. Doshi, N. Apthorpe, and N. Feamster, “Machine learning DDoS detection for consumer internet of things devices,” in 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 2018, pp. 29–35.
[13] S. S. Roy, A. Mallik, R. Gulati, M. S. Obaidat, and P. V. Krishna, “A deep learning based artificial neural network approach for intrusion detection,” in International Conference on Mathematics and Computing. Springer, 2017, pp. 44–53.
[14] C. G. Cordero, S. Hauke, M. Mühlhäuser, and M. Fischer, “Analyzing flow-based anomaly intrusion detection using replicator neural networks,” in 2016 14th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2016, pp. 317–324.
[15] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). ICST (Institute for Computer Sciences, Social-Informatics and ), 2016, pp. 21–26.
[16] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, “Deep learning and its applications to machine health monitoring,” Mechanical Systems and Signal Processing, vol. 115, pp. 213–237, 2019.