Network Anomaly Detection Using LSTM Based Autoencoder

This paper proposes a hybrid model using LSTM autoencoder and One-Class SVM (OC-SVM) for network anomaly detection in software-defined networking (SDN) environments. The LSTM autoencoder is trained on only normal data to learn normal traffic patterns and extract latent features, which are then fed to the OC-SVM to build a model to detect anomalies. The model was tested on the recent InSDN intrusion detection dataset for SDN networks. Experimental results showed the proposed model achieved higher detection rates and reduced processing time compared to using OC-SVM alone.



Network Anomaly Detection Using LSTM Based Autoencoder

Conference Paper · November 2020


All content following this page was uploaded by Mahmoud Said Elsayed on 18 November 2020.



Network Anomaly Detection Using LSTM Based Autoencoder
Mahmoud Said Elsayed, University College Dublin, Ireland, mahmoud.abdallah@ucdconnect.ie
Nhien-An Le-Khac, University College Dublin, Ireland, an.lekhac@ucd.ie
Soumyabrata Dev, University College Dublin, Ireland, soumyabrata.dev@ucd.ie
Anca Delia Jurcut, University College Dublin, Ireland, anca.jurcut@ucd.ie

ABSTRACT
Anomaly detection aims to discover patterns in data that do not conform to the expected normal behaviour. One of the significant issues for anomaly detection techniques is the availability of labeled data for training and validation of models. In this paper, we propose a hybrid approach based on a Long Short Term Memory (LSTM) autoencoder and a One-class Support Vector Machine (OC-SVM) to detect anomaly-based attacks in an unbalanced dataset, by training the models using only examples of the normal class. The LSTM-autoencoder is trained to learn the normal traffic pattern and the compressed representation of the input data (i.e. latent features), which is then fed to an OC-SVM. The hybrid model overcomes the shortcomings of the standalone OC-SVM, namely its low capability to operate with massive and high-dimensional datasets. Additionally, we perform our experiments using the most recent dataset (InSDN) of Intrusion Detection Systems (IDSs) for SDN environments. The experimental results show that the proposed model provides a higher detection rate and reduces the processing time significantly. Hence, our method provides great confidence in securing SDN networks from malicious traffic.

KEYWORDS
Security countermeasures, Anomaly detection, Deep Learning, LSTM, SDN, InSDN, Malicious traffic, Autoencoder

ACM Reference Format:
Mahmoud Said Elsayed, Nhien-An Le-Khac, Soumyabrata Dev, and Anca Delia Jurcut. 2020. Network Anomaly Detection Using LSTM Based Autoencoder. In The 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks (Q2SWINet ’20), November 16–20, 2020, Alicante, Spain. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3416013.3426457

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Q2SWINet ’20, November 16–20, 2020, Alicante, Spain
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-8120-8/20/11.
https://doi.org/10.1145/3416013.3426457

1 INTRODUCTION
Intrusion is the main cause of a security breach, where a malicious user can damage or steal vital information of the network system in a short time. Moreover, it can cause further financial losses and huge damage to critical IT infrastructure. For example, $350M and $70M are the sizes of the losses caused by the Yahoo and Bitcoin data breaches, respectively [18]. Intruder techniques have evolved, using sophisticated tools to create attacks exploiting vulnerabilities in the server protocols.

For these reasons, IDSs are essential tools to guarantee the availability, confidentiality, and integrity of the data. In general, IDSs are of two types: signature-based and anomaly-based detection systems. In signature-based techniques, malicious traffic is detected based on predefined rules. Although these techniques are widely used in commercial products due to their high detection rate and low false alarm rate, they cannot detect unknown or novel attacks. Attacker techniques evolve every day, and are adapted to make the anomalous activities similar to normal activities. Therefore, any change in the attack signature, even if it is very small, can help the attacker to bypass the defined rules easily. In addition, in the era of IoT and big data, an extensive number of rules is required to cover the daily attacks that can occur in the network system, making the database that stores the defined rules relatively large. Thus, the continuous updating of the database leads to a slowdown in the system performance. Anomaly-based IDSs have gained the attention of the research community, due to their ability to discover novel attacks that were not used in training the models. Machine learning techniques are used in anomaly detection to build a model that can differentiate anomalous events from the rest of the data. Anomaly detection aims to find the patterns in the data that deviate from other observations [4]. Hence, it is applied in several applications, including fraud detection [32], medical applications [1], video surveillance [20], data leakage prevention [5], and intrusion detection [17]. However, the notion of anomaly differs across the various applications and contexts. For example, an anomaly can be an equipment failure in the industrial domain, credit card fraud in the fraud detection domain, a suspicious movement in video surveillance, etc. In the cybersecurity domain, however, IT administrators narrow down the meaning of anomaly, and they consider any events that deviate from normal as anomalies, i.e., the anomaly is an indicator of malicious activities in the network traffic.

There are three different ways to train anomaly detection models: the supervised, unsupervised, and semi-supervised manner. The concept of supervised learning is to train the detection model using labeled data for both normal and anomalous events. Examples of supervised detection algorithms include Support Vector Machine (SVM), Bayesian networks, and artificial neural networks (ANN). Although supervised learning techniques perform better compared to signature-based techniques, these methods fail to detect the zero-day attacks that can occur in the network daily. In addition, such techniques always need balanced and labeled data for training. However, the availability of labeled data is usually
a major issue, as such data are often not available to researchers. Besides, the labeling process can be time-consuming, error-prone, and tedious. Unsupervised methods do not require any labeled data for training. The goal of unsupervised methods is to organize the data into separate clusters, by grouping instances based on similarities within each cluster. One of the main limitations of unsupervised learning is that the false alarm rate is relatively high, since no ground-truth labels are available. Besides, it always needs the expertise of the user to interpret and label the resulting classes. Examples of unsupervised approaches include Hidden Markov Models, k-Means clustering, and hierarchical clustering. In semi-supervised learning, only the normal data is used for training. Then, the trained model is applied to testing data, which includes both normal and anomalous events. In practice, it is easier to obtain normal data than anomalous data, since obtaining anomalous data is costly in most application domains. This is the approach we adopt. In this paper, we consider point anomaly detection, i.e. we decide whether an individual instance is anomalous compared to the remaining data.

1.1 Contribution
The main contributions of this paper are as follows: (a) We propose a deep learning model based on an LSTM-autoencoder for anomaly detection. The idea is to train the deep learning model using normal data only. In this case, the model is capable of replicating the input data at the output layer with a low reconstruction error. In the case of anomalies, the trained model fails to reconstruct anomalous instances, giving a high reconstruction error. The error is used as an indicator to differentiate between normal and anomalous instances. (b) We combine the OC-SVM algorithm with the LSTM-autoencoder to enhance the performance of the LSTM-autoencoder model. The lower-dimensional representation of the input data (i.e. extracted from the LSTM-autoencoder model) is used to train the OC-SVM algorithm to achieve better classification results, whilst significantly reducing the training time. (c) We use the recently generated dataset InSDN [9] to ensure an accurate evaluation of the proposed approach, since the InSDN dataset is representative of the attacks specific to SDN environments. The dataset used for evaluation is critical since the performance of IDSs relies on the quality of the training datasets. This is a significant contribution, since current IDSs are based on fundamentals that are not representative of SDN environments or suffer from several shortcomings related to the intrusion dataset generated for the classification process. (d) The computational overhead of the proposed DL model is also evaluated to verify the model performance for real-time intrusion detection. Based on the experimental results, the proposed model is able to identify anomalies with a high detection rate and solve the problem of unbalanced and unlabeled datasets.

The structure of the remaining part of the paper is as follows: Section 2 provides a systematic review of the various learning models for detecting attacks in network traffic. We propose our deep learning model for encoding the input feature space and the binary classification framework in Section 3. Section 4 presents our benchmarking results on the publicly available InSDN dataset. Finally, Section 5 concludes the paper and discusses our future work.

2 ABNORMAL EVENT DETECTION
In this section, we discuss the various machine learning and deep learning models that have been proposed for attack detection in network traffic. With the advent of deep learning based models, the accuracy of attack detection has further improved. Deep learning based methods are useful because the discriminative features are generated automatically, without the need for hand-crafted features.

2.1 Related Work
Most of the current works based on anomaly detection methods apply ANN for classification tasks. Labeled data is used during the training stage, and then the learned model is applied to testing data to classify it into one of the classes. In [28], the authors presented a flow-based detection approach using a Self Organizing Maps (SOMs) classifier in the network environment. The accuracy of the proposed model in the testing phase is 74.67%. For overall evaluation, the precision, recall, and f-measure obtained from the model are 83%, 76%, and 75%, respectively.

Latah et al. [19] proposed a five-stage hybrid classifier system to enhance the detection rate against malicious traffic inside the network. The model combines three different machine learning classifiers, including the K-Nearest Neighbor approach (KNN), Extreme Learning Machine (ELM), and Hierarchical Extreme Learning Machine (H-ELM). The overall accuracy of the presented approach is 84.29%, while the precision, recall, and F1-score are 94.18%, 77.18%, and 84.83%, respectively.

Prasath et al. [22] proposed a Novel Agent Program (NAP) framework to secure a communication model of the virtual switches in the network. The meta-heuristic Bayesian network classification (MHBNC) approach is used to classify the incoming packets into normal or attack traffic. The proposed MHBNC model achieved an overall accuracy of 82.99%. Besides, the precision, recall, and f-score are 77%, 74%, and 75%, respectively.

One of the main advantages of the aforementioned techniques is the ability to handle high-dimensional data sets with high performance. However, these approaches mainly rely on labeled and balanced data, and this is an issue. The majority of real data are unbalanced, where anomalous data are often more challenging to obtain and less frequent compared to normal data.

On the other side, the autoencoder, which is a specific kind of feed-forward neural network, has gained the research community's attention. The autoencoder is mainly applied in outlier-based anomaly detection rather than classification problems. One of the first studies that involved an autoencoder for outlier detection was proposed by Hawkins et al. [12], and it is used widely in the research community. In recent years, the number of studies using the autoencoder as a complementary algorithm for feature reduction tasks has increased. The autoencoder achieves great success in generating abstract features of high-dimensional data. It can significantly increase the anomaly detection accuracy in comparison to linear and kernel PCA [24]. It can detect subtle anomalies that linear PCA fails to detect. Furthermore, the autoencoder is easy to train and does not require complex computation like kernel PCA. A comprehensive study of using the autoencoder in anomaly detection approaches is discussed in [3].

In our previous work [8], we used a threshold on the reconstruction error to detect anomalies on the NSL-KDD dataset. Although the obtained results are significantly high, it seems that the experimental results are specific to the NSL-KDD dataset. The traces of the NSL-KDD dataset were generated two decades ago, and the distributions of normal and malicious traffic deviate significantly from each other. So, for that dataset, a simple threshold is an excellent choice to separate the two classes. In contrast, in modern traffic, the attacker can use very sophisticated tools to generate new attack classes that are extensively similar to legitimate traffic. Therefore, some anomaly error rates are quite close to the standard legitimate error rate, and it is not easy for anomaly detection systems to detect them with a high performance rate [27]. Thus, a simple threshold is insufficient, especially for high-dimensional data, as the reconstruction errors are not linearly separable. In addition, the high similarity between some normal and abnormal traffic creates data samples with a small distance between them, which complicates the use of a simple threshold in attack detection systems.
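The limitation of a simple threshold can be illustrated with a minimal sketch. This is not the implementation used in [8]: the error values below are hypothetical, and taking the threshold as a high percentile of the errors observed on normal training data is an assumption made only for illustration.

```python
def percentile_threshold(errors, q=0.95):
    """Return the value below which roughly a fraction q of the errors fall."""
    ordered = sorted(errors)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def flag_anomalies(errors, threshold):
    """Flag an instance as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors measured on normal training traffic.
normal_errors = [0.010, 0.012, 0.015, 0.011, 0.020,
                 0.013, 0.018, 0.014, 0.016, 0.019]
threshold = percentile_threshold(normal_errors, q=0.95)

# Hypothetical test errors: a normal flow, an attack that mimics normal
# traffic, and an attack that deviates strongly from it.
test_errors = [0.012, 0.019, 0.500]
flags = flag_anomalies(test_errors, threshold)
print(threshold, flags)
```

In this toy setting the second test instance, an attack whose error lies inside the normal range, is not flagged, which is the situation described above for modern traffic.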

2.2 Limitations in the Traditional Machine Techniques
Machine learning and shallow learning techniques, such as SVM, NB and RF, have been widely deployed in intrusion detection systems to recognize attack threats [10, 16, 26]. These approaches attempt to learn the feature representation in network traffic data for an effective classification. However, it is not easy to manually handcraft and extract the discriminatory features in intrusion detection systems. Firstly, the nature of the attacks evolves every day, and the attacker's techniques change with time. Secondly, the features which are extracted for one category of attack may not necessarily be suitable for other attack classes. As a result, selecting the significant features to identify the attack from network traffic is a cumbersome task. Therefore, existing attack detection techniques fail to discover all types of attacks. Furthermore, there is a high degree of non-linearity in the dataset, and therefore the traditional machine learning based methods fail to classify the normal and malicious data types [7]. Elsayed et al. further established this fact in [7] by generating the Andrews curve for the NSL-KDD dataset. The Andrews curve represents a high-dimensional feature space in the form of a finite Fourier series. This provides a visual understanding of the internal structure of the dataset. Figure 1 shows the Andrews curve for the NSL-KDD dataset. Each curve in Fig. 1 represents an observation in the dataset. We observe that the two labels are not clearly grouped in two separate streams. The legitimate and malicious data curves are tangled with each other, indicating a high degree of inherent non-linearity in the feature space. Therefore, traditional machine learning techniques fail to capture the non-linearity in such datasets.

Figure 1: We demonstrate the Andrews curve for the InSDN dataset. We plot the legitimate and malicious observations in green and red curves respectively.

Hence, in this work we propose a deep learning technique using an LSTM-autoencoder to model the normal traffic data. This assists in proposing a robust framework for detecting attacks in SDN network traffic. Unlike other machine learning approaches, our proposed technique can automatically learn the discriminatory features from the network traffic data. We formulate our problem as a binary classification problem, wherein we classify any network data as either normal or malicious.

3 PROPOSED MODEL
This section presents the framework elements and the system architecture of our proposed IDS model.

3.1 Autoencoder
An autoencoder is an artificial neural network that is trained with backpropagation to produce an output vector similar to its input. It compresses the input data into a lower-dimensional space, then reconstructs the original data from this representation. It uses non-linear activation functions and multiple layers to learn the non-linear relations in the data. A simple illustration of the autoencoder model architecture is shown in Fig. 2.

The autoencoder is considered an unsupervised learning technique since it does not require a separate label value to train. In practice, the autoencoder is composed of two parts: the encoder and the decoder. The main objective of the encoder is to reduce the dimensions of the input data $X$ according to Eq. 1.

$$Z = \sigma(WX + b) \tag{1}$$

Here, $Z$ is the latent representation, $\sigma$ is the activation function, $W$ is the weight matrix, and $b$ is the bias vector. In the same manner, the decoder is trained according to Eq. 2 in order to obtain output data similar to the original space, but with different bias, weights and possibly activation function.

$$X' = \sigma'(W'Z + b') \tag{2}$$
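As a minimal sketch of Eq. 1 and Eq. 2 (not the model used in this paper), the encoder and decoder mappings can be written in plain Python. The toy dimensions (3 inputs, 2 latent values), the parameter values, the choice of tanh for $\sigma$ and the identity for $\sigma'$ are all assumptions made for illustration.

```python
import math


def affine(weights, vector, bias):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def encode(x, W, b):
    """Eq. (1): Z = sigma(W X + b), with tanh as sigma (an assumption)."""
    return [math.tanh(a) for a in affine(W, x, b)]


def decode(z, W_prime, b_prime):
    """Eq. (2): X' = sigma'(W' Z + b'), with the identity as sigma' (an assumption)."""
    return affine(W_prime, z, b_prime)


# Hypothetical toy parameters: 3 input features compressed to 2 latent values.
x = [0.5, -1.0, 0.25]
W = [[0.1, 0.2, -0.1],
     [0.0, -0.3, 0.4]]
b = [0.0, 0.1]
W_prime = [[0.5, 0.0],
           [0.0, 0.5],
           [0.25, 0.25]]
b_prime = [0.0, 0.0, 0.0]

z = encode(x, W, b)                   # latent representation Z
x_rec = decode(z, W_prime, b_prime)   # reconstruction X'
print(z, x_rec)
```

With untrained parameters such as these, the reconstruction is of course far from the input; training adjusts $W$, $b$, $W'$ and $b'$ so that $X'$ approaches $X$.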
The main goal of the autoencoder is to make the output vector similar to the original space, by minimizing the reconstruction error between them. The reconstruction error can be obtained using a cross-entropy function or the sum of squared errors (SSE). In this paper, we use SSE to calculate the reconstruction error according to Eq. 3.

$$SSE = \sum_{i=1}^{n} \left( X'_i - X_i \right)^2 \tag{3}$$

The decoder part regenerates the initial data based on the encoder output. To achieve the dimensionality reduction and generate the compressed feature vector of the input data, the code layer [13] is used at the center of the autoencoder structure. The code layer can be utilized for classification activities or combined with another stacked autoencoder [30].

Figure 2: An example of a single autoencoder.

3.2 RNN to LSTM
A Recurrent Neural Network (RNN) is a class of artificial neural network with backward connections, where the output from a network layer is returned to either that layer or a previous network layer. RNN can address the problem of traditional feed-forward neural networks [31]. As a result, it can create much more powerful models with high classification accuracy. RNN is widely applied in different application domains such as language processing and speech recognition. Unlike feed-forward neural networks, the cyclic connections of the RNN can be effectively used for modeling sequences [14]. In RNN, for the given input vector sequence $X = (x_1, x_2, x_3, \ldots, x_t)$, we can compute the hidden vector sequence $Z = (z_1, z_2, z_3, \ldots, z_t)$ and the output vector sequence $F = (f_1, f_2, f_3, \ldots, f_t)$ at time $t$ using Eq. 4 and Eq. 5, respectively.

$$z_t = \sigma(W_{xz} x_t + W_{zz} z_{t-1} + b_z) \tag{4}$$

$$f_t = W_{zf} z_t + b_f \tag{5}$$

Here, $\sigma$ is the activation function, $W$ is the weight, $b$ is the bias and $z_{t-1}$ is the state at time $t - 1$.

In this work, we use the RNN based on the nature of the input data, where the temporal correlations of network traffic often generate time-series data [29]. For this reason, we use the RNN-based approach to solve the problem of simple feed-forward neural networks, since RNN considers the previous output and the current input at each stage. In addition, RNN has been applied efficiently in anomaly detection for traditional networks [2][21]. Training the model with such methods can minimize the loss and, further, provide high performance.

The main issue in RNN is the vanishing gradient problem. The gradient is used to update the weight values of the learned model. However, if the gradient is very small, the model cannot learn efficiently. Thus, layers that get a small gradient update in RNN will stop learning, and usually this issue happens in earlier layers. The LSTM algorithm [11] was explicitly proposed as a solution to avoid the vanishing gradient problem. The LSTM uses the mechanism of gates to regulate the flow of information. LSTM consists of three control gates: forget, output, and input gates. The forget gate keeps a fraction of the previous state information, the output gate is responsible for choosing how much information is output, and the input gate is responsible for taking in new information.

3.3 One-Class SVM
The OC-SVM [25] approach is a special case of support vector machines and is widely used to discover anomalies in an unsupervised fashion. It is trained only on the ‘normal’ data to learn the boundaries of these points. Then, it is able to classify any points that lie outside the boundary as outliers. The main difference between the standard SVM and the one-class SVM is that the OC-SVM provides a hyperparameter “nu”, which is used to control the sensitivity of the support vectors, instead of the usual hyperparameters like C in the standard SVM, which is used for tuning the margin.

Here, let the training samples $(x_1, x_2, x_3, \ldots, x_l)$ belong to one known class $X$ (here, “normal” traffic). Let $\phi$ be a kernel map function that transforms the training samples into another space. We need to solve the following objective function of the one-class SVM to separate the data set from the origin [23]:

$$\min \; \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu l}\sum_{i=1}^{l} \varepsilon_i - \rho \tag{6}$$

$$\text{subject to: } w \cdot \phi(x_i) \geq \rho - \varepsilon_i, \quad i = 1, 2, 3, \ldots, l, \quad \varepsilon_i \geq 0 \tag{7}$$

where $w$ defines the decision hyperplane, $\rho$ is the bias term and $\varepsilon_i$ are non-negative slack variables. The meta-parameter $\nu \in (0, 1)$ is used to control the number of samples contained in the hyper sphere. The decision function corresponding to $w$ and $\rho$ is:

$$f(x) = w \cdot \phi(x) - \rho \tag{8}$$

The main objective is to find a hyper sphere which contains most of the training samples obtained from the target region. After training, the decision boundary may allow us to choose the most appropriate candidate region.
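The role of the terms in the OC-SVM formulation can be illustrated with a small sketch that evaluates the decision function and the objective for a fixed, hypothetical choice of $w$, $\rho$ and $\nu$. It is not a solver and not the implementation used in this paper; the identity feature map (i.e. a linear kernel), the parameter values and the toy samples are assumptions. In practice, a library implementation such as scikit-learn's `OneClassSVM`, which exposes the `nu` parameter, would typically be used.

```python
def phi(x):
    """Feature map; the identity here (a linear kernel), purely for illustration."""
    return x


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision(x, w, rho):
    """Decision function f(x) = w . phi(x) - rho; f(x) >= 0 means inside the region."""
    return dot(w, phi(x)) - rho


def objective(samples, w, rho, nu):
    """OC-SVM objective, using the smallest slacks the constraints allow."""
    slacks = [max(0.0, rho - dot(w, phi(x))) for x in samples]
    return 0.5 * dot(w, w) + sum(slacks) / (nu * len(samples)) - rho


# Hypothetical parameters and "normal" training samples.
w, rho, nu = [1.0, 1.0], 1.5, 0.5
train = [[1.0, 1.0], [2.0, 1.0], [0.5, 0.5], [1.5, 2.0]]

# Classify two hypothetical test points with the fixed hyperplane.
test_points = [[1.0, 1.0], [0.1, 0.2]]
labels = ["normal" if decision(x, w, rho) >= 0 else "outlier"
          for x in test_points]
value = objective(train, w, rho, nu)
print(labels, value)
```

A smaller $\nu$ makes each unit of slack more expensive in the objective, so fewer training samples are allowed to fall outside the learned boundary.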
3.4 InSDN Dataset
The performance of IDS techniques relies on the quality of the training datasets. One of the main challenges in the deployment of detection mechanisms is the lack of available up-to-date real-world datasets. The main reason for the lack of public datasets in the intrusion detection domain is privacy and legal issues. In this work, we use the InSDN dataset to evaluate our proposed deep learning model. The InSDN dataset was generated to overcome the shortcomings of the existing datasets in the context of SDN networks [9].

The InSDN dataset contains various attack scenarios and attack classes such as DoS, DDoS, Web attacks, Password-Guessing, Botnet, Exploitation, and Probe attacks. Besides, the normal traffic in InSDN includes various popular application services such as HTTPS, HTTP, DNS, Email, FTP, SSH. The attacks in the dataset originate from both internal and external networks to mimic real attack scenarios. It contains more than 80 statistical features in CSV file format, such as Protocol, Duration, Number of bytes, Number of packets, etc. The total number of dataset instances is 343,939 for normal and attack traffic, where the normal data amounts to 68,424 instances and the attack traffic to 275,515 instances.

Using the InSDN dataset for the evaluation of our proposed model provides more accurate results, since the nature of attacks in SDN is different from those commonly affecting conventional networks. When the OpenFlow switch receives any unknown flow packets, it sends these flows to the SDN controller in the form of a packet-In message for further processing. Since both normal and malicious traffic is forwarded to the SDN controller for decision making, the attack traffic mimics the same normal behavior. Moreover, the centralized view of the SDN network and the separation of the data plane from the control plane create new opportunities for the attacker to carry out various types of attacks compared to the conventional network. These attacks are not easy to detect, as the intruder is connected to the victim server in an authorized manner. As a result, using such a dataset for the model evaluation can be a good indicator to reflect the real-world scenario. In addition, the InSDN dataset does not contain any redundant records, which prevents the learned model from being biased towards the most frequent records.

3.5 Dataset Preparation
In this paper, we focus our attention on a binary classification problem, and do not delve further into classifying the various types of attacks. The observations belonging to any attack class are categorized as anomalous traffic data. The first phase before training the IDS model is preparing the dataset for proper use. A few steps are taken for pre-processing the incoming flows, as follows:
• The generated dataset contains socket information such as Source IP, Destination IP, flow ID, etc. We remove all socket features to avoid the overfitting problem, since such data can change from network to network. The final dataset includes 77 features, besides the traffic category. During the training phase, 57956 normal samples are used for the model training, while 27697 are used for test purposes. The testing data has samples for both normal and attack traffic. We randomly selected some samples from the entire dataset, so that the size of the training and testing records is not very large. Thus, the time taken for the model training can stay reasonable.
• The features have different ranges, so they need to be standardized.
• We use one-hot encoding to convert the label strings to numerical values. In this model, we consider only binary classification to identify the malicious and normal traffic from input data. Therefore, we encode the normal label as a binary value of 0 and all malicious traffic as 1.

3.6 Modeling the normal traffic data
This section introduces the proposed architecture model for detecting network attacks. We know that deep learning can assist us in representing large-scale network traffic with a more discriminatory feature space. Such a technique uses multiple processing layers to model the input features properly. This is advantageous compared to traditional hand-crafted feature descriptors, because deep learning techniques can automatically extract the discriminatory features instead of generating them manually. Our proposed approach can estimate a good representation of the input feature space. Figure 3 describes our proposed architecture to model the normal network data.

We use the RNN and autoencoder architectures based on the nature of the input data, where the temporal correlations of network traffic often generate time-series data [29]. For this reason, we use the RNN-based approach to solve the problem of simple feed-forward neural networks, since RNN considers the previous output and the current input at each stage. In addition, RNN has been applied efficiently in anomaly detection for traditional networks [2, 6, 21]. Training the model with such methods can minimize the loss and, further, provide high performance. Additionally, the autoencoder has advantages in a number of classification problems. The reason that we decided to use the autoencoder in our proposed model for anomaly detection is the fact that the autoencoder tries to learn the best parameters to reconstruct the input at the output layer. Moreover, we adopted the LSTM algorithm for our model to solve the issues of the standard RNN technique, such as the vanishing and exploding gradient problems [15].

Figure 3: Proposed model to encode the input features. We use blocks of encoder and decoder comprising LSTM layers.

Additionally, our model uses LSTM with autoencoder to learn the representations of the network dataset in a semi-supervised
fashion, as depicted in Fig. 4. It contains multiple layers of encoder The LSTM-autoencoder network is used to model the normal
and decoder stages and each stage consists of multiple LSTM units. traffic data using the discriminatory feature X
c𝑡 . The reconstruction
The input data X𝑡 is encoded via the encoder block to generate a error for normal traffic data will be less as compared to that of
fixed range feature vector Z𝑡 . The input data X𝑡 ∈ IR77×1 is the anomalous traffic data. This behavior will greatly help in detecting
initial encoded feature vector generated from the dataset. We set the anomalous traffic since its corresponding error value will be
timestamp = 1 for our LSTM blocks. We used the LSTM blocks for considerably higher.
individual events, and not for time series. The encoder block se-
quentially reduces the dimension of the 77 dimension initial feature 4 EVALUATION AND RESULTS
vector. The dimensions are reduced to 128, 64, 32, and 16, after the This section details the evaluation process of our technique and
first, second, third and fourth layers of the encoder respectively. The shows that our method provides great confidence in securing the
final encoded feature vector Z𝑡 ∈ IR16×1 represents the compressed networks from malicious traffic.
input data. The low-dimensional representation of the input data
Z𝑡 is trained with OC-SVM for anomaly classification. The model 4.1 Loss trend of our proposed model
is trained using normal traffic only, so the malicious traffic will be
We use the generated feature X𝑡 to train our model, such that
considered as outliers.
the reconstruction loss is minimum. We use a learning rate of
0.0001, batch size of 32, tanh activation function, and train the
model using Adam optimizer. We train our model for 100 epochs.
Table 1 summaries the choice of the different hyper-parameters.
Figure 5 describes the trend of training and validation loss over
the number of epochs. We observe that the loss trend is similar
for training and validation sets, and converges after a few tens of
epochs.
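For reference, the reconstruction loss minimized during training can be written out explicitly. The form below is only a sketch: it assumes that the MSE loss listed in Table 1 is averaged over the $d = 77$ input features and over a mini-batch of $B = 32$ samples. The symbols $d$, $B$, and the indices $i$, $j$ are introduced here for illustration and do not appear in the original text.

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{B\,d} \sum_{i=1}^{B} \sum_{j=1}^{d} \left( x^{(i)}_{j} - \hat{x}^{(i)}_{j} \right)^{2}$$

Here $x^{(i)}_{j}$ is the $j$-th component of the input feature vector of the $i$-th sample in the batch, and $\hat{x}^{(i)}_{j}$ is the corresponding component of its reconstruction.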

Parameters                     Best Values
Hidden layers                  4
Hidden layer size (neurons)    128, 64, 32, and 16
Optimizer                      Adam
Loss function                  MSE
Activation function            Tanh
Learning rate                  0.0001
Number of epochs               100
Batch size                     32
Table 1: Values of the hyper-parameters. We conducted several experiments to find the best hyper-parameter values for model initiation. During these experiments, we varied the learning rate, hidden layer sizes, number of epochs, and batch size, and kept the values that provide a high accuracy rate.
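The layer sizes in Table 1, together with the mirrored decoder and the final fully connected layer described in this section, can be laid out as a simple layer plan. The sketch below is illustrative only: the function names `layer_plan` and `lstm_param_count` and the layer labels are hypothetical, and the parameter count assumes a standard LSTM cell with one bias vector per gate, which may differ from the authors' implementation.

```python
# Values taken from the text and Table 1.
INPUT_DIM = 77
HIDDEN_SIZES = [128, 64, 32, 16]


def layer_plan(input_dim, hidden_sizes):
    """Return (name, in_dim, out_dim) for every layer of the stacked autoencoder."""
    plan = []
    prev = input_dim
    # Encoder: 77 -> 128 -> 64 -> 32 -> 16
    for i, size in enumerate(hidden_sizes, start=1):
        plan.append((f"encoder_lstm_{i}", prev, size))
        prev = size
    # Decoder: layer sizes in reverse order, 16 -> 32 -> 64 -> 128
    for i, size in enumerate(reversed(hidden_sizes), start=1):
        plan.append((f"decoder_lstm_{i}", prev, size))
        prev = size
    # Fully connected output layer restores the input dimension.
    plan.append(("output_dense", prev, input_dim))
    return plan


def lstm_param_count(in_dim, units):
    """Weights of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and one bias vector (assumed convention)."""
    return 4 * units * (in_dim + units + 1)


if __name__ == "__main__":
    for name, in_dim, out_dim in layer_plan(INPUT_DIM, HIDDEN_SIZES):
        print(f"{name:16s} {in_dim:4d} -> {out_dim:4d}")
```

The plan makes explicit that the 16-dimensional output of the last encoder layer is the latent vector later passed to the OC-SVM.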

Figure 4: Flow diagram of the proposed method using the InSDN dataset.
The encoded data is then fed into the decoder block to generate the output feature vector. We denote the input feature vector of the decoder block by $\hat{Z}_t$. The layers in the decoder block are arranged in the reverse order of the encoder layers. The encoded features $\hat{Z}_t$ are passed through a series of LSTM blocks to generate the output feature vector $\hat{X}_t$. The dimensions are increased to 16, 32, 64, and 128 after the first, second, third, and fourth layers of the decoder, respectively. Finally, the output of the last decoder layer is fed to a fully connected layer to generate the output feature vector $\hat{X}_t$. We attempt to reconstruct this output feature vector $\hat{X}_t$ to be as close as possible to the input feature vector $X_t$. We use the mean squared error (MSE) to calculate the estimation error between the input data $X_t$ and the output representation $\hat{X}_t$.

4.2 Performance Metrics
We use precision, recall, F-score, and accuracy to evaluate our model performance. These metrics are calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
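Eqs. (9)-(12) translate directly into a few lines of code. The sketch below is a minimal illustration: the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's experiments. Attack traffic is treated as the positive class, and a zero denominator is mapped to 0.0 by assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F-score, and accuracy from confusion counts.

    tp: attacks classified as attack      tn: normal classified as normal
    fp: normal classified as attack       fn: attacks classified as normal
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0        # Eq. (9)
    recall = tp / (tp + fn) if (tp + fn) else 0.0           # Eq. (10)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0  # Eq. (11)
    accuracy = (tp + tn) / (tp + tn + fp + fn)              # Eq. (12)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in classification_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```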
Threshold   Precision   Recall   F1-measure   Accuracy
0.05        0.652       0.986    0.785        0.664
0.06        0.6919      0.986    0.813        0.718
0.07        0.7111      0.983    0.825        0.741
0.08        0.690       0.847    0.7611       0.669
0.09        0.667       0.760    0.710        0.615
0.1         0.587       0.539    0.562        0.477
0.2         0.584       0.438    0.500        0.456
0.3         0.202       0.065    0.099        0.257
Table 2: Evaluation metrics for different threshold values.
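Each row of Table 2 corresponds to one fixed threshold on the reconstruction error, as used in Section 4.3. The sketch below shows how such a sweep could be organized. It is not the authors' code: the helper names and the toy errors and labels are hypothetical, and a label of 1 denotes anomalous (attack) traffic.

```python
import math


def l2_error(x, x_hat):
    """l2-norm of the difference between an input vector and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def classify_by_threshold(errors, threshold):
    """Flag a sample as anomalous (1) when its error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with attack traffic as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


# Toy reconstruction errors and ground-truth labels (illustrative only).
toy_errors = [0.02, 0.04, 0.06, 0.09, 0.15, 0.30]
toy_labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for threshold in (0.05, 0.07, 0.1):
        preds = classify_by_threshold(toy_errors, threshold)
        print(threshold, confusion_counts(toy_labels, preds))
```

On the toy data, a threshold that is too low produces false positives and one that is too high produces false negatives, which is the same trade-off that Table 2 shows.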

In Eqs. (9)-(12), TP (True Positive) is the number of instances correctly classified as an attack; TN (True Negative) is the number of instances correctly classified as normal; FP (False Positive) is the number of instances incorrectly classified as an attack; and FN (False Negative) is the number of instances incorrectly classified as normal.

Figure 5: Trend of training and validation loss over the number of epochs.

4.3 Anomaly detection with a simple Threshold
We train our deep learning model using traffic data that are labeled as normal. We compute the reconstruction error as the $\ell_2$-norm error between the original feature $X_t$ and the output feature $\hat{X}_t$. The $\ell_2$-norm error $e = \lVert X_t - \hat{X}_t \rVert_2$ will be low for normal traffic data and high for anomalous traffic data. Therefore, we use a fixed threshold on the reconstruction error for the binary classification of normal and anomalous traffic data. The threshold value is used as a decision boundary for detecting anomalous data: observations with a reconstruction error greater than the threshold are classified as anomalous, whereas those with a reconstruction error less than the threshold are classified as normal traffic data.

We illustrate the effect of this approach by reporting different threshold values and their impact on precision, recall, F-score, and accuracy. Table 2 summarizes the performance of different threshold values in terms of the evaluation metrics. The best performance is obtained at a threshold value of 0.07 (F1-measure 0.825, accuracy 0.741). The evaluation metrics then drop sharply as the threshold increases.

However, using the reconstruction error as an anomaly threshold cannot clearly separate the normal and malicious data. The high degree of similarity between some malicious and legitimate traffic makes the reconstruction errors of the two classes relatively close to each other, i.e., they are not linearly separable. To overcome this gap, we integrate the One-Class SVM algorithm with the LSTM-Autoencoder to better characterize the network traffic; hence, the detection rate can significantly improve.

4.4 LSTM-Autoencoder-OC-SVM for Attack Detection
As mentioned in the previous section, the simple threshold does not perform well in our experiments. The complexity of the attacks in the employed dataset and the high similarity of some attack traffic to legitimate traffic are the main obstacles to using a simple threshold for anomaly detection, especially in SDN environments.

This section analyzes the detection performance of the LSTM-AE-OC-SVM approach. In this paper, we focus our attention on a binary classification problem, wherein we classify each observation as normal or anomalous traffic data. We analyse the precision, recall, F-score, and accuracy values for all methods considered. The results are presented in Table 3. For the OC-SVM model, the parameters gamma = 0.001 and nu = 0.4 and a Radial Basis Function (RBF) kernel are chosen for the experiment. Our approach has the best performance in terms of F-score (0.93 versus 0.91) and accuracy (90.5% versus 87.5%), while the OC-SVM technique on its own fails to reach competitive F-score and accuracy values.

Algorithm                  Precision   Recall   F1-measure   Accuracy (%)
OC-SVM                     0.89        0.93     0.91         87.5
LSTM-Autoencoder-OC-SVM    0.93        0.93     0.93         90.5
Table 3: Comparison of evaluation metrics. We report the precision, recall, F-score, and accuracy for both algorithms.

We also check the efficacy of our proposed method in terms of computational time. The computational time is very important when evaluating a classifier's performance, especially in the era of big data, since massive amounts of data need to be classified in real time. Table 4 presents the training and testing times of the OC-SVM algorithm and the hybrid approach. We observe that the time consumed by the OC-SVM algorithm is considerably higher than that of the hybrid approach for both training (479.748 s versus 147.548 s) and testing (38.355 s versus 13.546 s).

4.5 Receiver Operating Characteristic (ROC)
Further, we use the receiver operating characteristic (ROC) curve for checking the performance of the proposed approach. The ROC
curve represents the relation between the false positive rate and the true positive rate. The area under the curve (AUC) indicates the efficacy of the binary classifier: the closer the AUC is to 1, the better the classifier, whereas an AUC near 0.5 is no better than random guessing. The AUC value of the presented model is shown in Figure 6. We obtained a value of 0.906, which means that a randomly chosen attack sample is ranked above a randomly chosen normal sample about 90.6% of the time.

Figure 6: Receiver operating characteristic (ROC) curve of our proposed approach.

Algorithm                  Training Time (s)   Testing Time (s)
OC-SVM                     479.748             38.355
LSTM-Autoencoder-OC-SVM    147.548             13.546
Table 4: Training and testing times for both algorithms.

5 CONCLUSION AND FUTURE WORK
Network data can often be compromised because of malicious attacks initiated by intruders. A good practice to protect against these attacks is to deploy machine learning based frameworks to detect the anomalies caused during the attacks. In this paper, we highlighted the problems in existing techniques and proposed solutions to address them. We proposed a deep learning framework based on an LSTM-autoencoder and OC-SVM that can model the normal traffic data efficiently. Our experiments show that the proposed model can efficiently detect the anomalies present in network traffic data. In our future work, we plan to apply the proposed IDS framework in one or more realistic network settings to evaluate its performance in real-world scenarios and test its impact with regard to latency. We also plan to extend the binary classification problem into a multi-class classification problem, in order to identify the type of network attacks.

6 ACKNOWLEDGMENTS
This research is funded by the School of Computer Science, University College Dublin, Ireland. Dr. Anca Jurcut is involved in the work conducted with COST Action 17124 DigForAsp, supported by COST (European Cooperation in Science and Technology), www.cost.eu.
