
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3048198, IEEE Access.

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2020.DOI

Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning
LAN LIU1, PENGCHENG WANG1, JUN LIN2, LANGZHOU LIU1
1 School of Electronic and Information Engineering, Guangdong Polytechnic Normal University, Guangzhou 510655, China
2 China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou 510610, China
Corresponding author: Jun Lin (linjun@ceprei.com)
This research is supported in part by the National Natural Science Foundation of China under Grant 61972104, in part by the Special
project for research and development in key areas of Guangdong Province under Grant 2019B010121001, in part by the Special Fund for
Science and Technology Innovation Strategy of Guangdong Province under Grant 2020A0332.

ABSTRACT In imbalanced network traffic, malicious cyber-attacks can often hide in large amounts of normal data. They exhibit a high degree of stealth and obfuscation in cyberspace, making it difficult for a Network Intrusion Detection System(NIDS) to ensure the accuracy and timeliness of detection. This paper researches machine learning and deep learning for intrusion detection in imbalanced network traffic and proposes a novel Difficult Set Sampling Technique(DSSTE) algorithm to tackle the class imbalance problem. First, the Edited Nearest Neighbor(ENN) algorithm divides the imbalanced training set into a difficult set and an easy set. Next, the KMeans algorithm compresses the majority samples in the difficult set to reduce their number. Then, the continuous attributes of the minority samples in the difficult set are zoomed in and out to synthesize new samples and increase the minority number. Finally, the easy set, the compressed majority samples of the difficult set, and the minority samples of the difficult set together with their augmented samples are combined into a new training set. The algorithm reduces the imbalance of the original training set and provides targeted data augmentation for the minority classes that need to be learned. It enables the classifier to learn the class differences better in the training stage and improves classification performance. To verify the proposed method, we conduct experiments on the classic intrusion dataset NSL-KDD and the newer, comprehensive intrusion dataset CSE-CIC-IDS2018. We use the classical classification models Random Forest(RF), Support Vector Machine(SVM), XGBoost, Long Short-Term Memory(LSTM), AlexNet, and Mini-VGGNet, and compare against the other 24 combined methods; the experimental results demonstrate that our proposed DSSTE algorithm outperforms the other methods.

INDEX TERMS IDS, imbalanced network traffic, machine learning, deep learning, CSE-CIC-IDS2018.

I. INTRODUCTION
With the rapid development and wide application of 5G, IoT, Cloud Computing, and other technologies, network scale and real-time traffic have become more complex and massive, and cyber-attacks have also become complex and diverse, bringing significant challenges to cyberspace security. As the second line of defense behind the firewall, the Network Intrusion Detection System(NIDS) needs to accurately identify malicious network attacks, provide real-time monitoring and dynamic protection measures, and formulate strategies.
James Anderson first proposed the concept of intrusion detection in 1980, and some scholars then applied machine learning methods to intrusion detection [1]. However, due to the limitation of computer storage and computing power at that time, machine learning failed to attract attention. With the rapid development of computers and the emergence and promotion of Artificial Intelligence(AI) and other technologies, many scholars have applied machine learning methods to network security and have achieved certain results [2]–[4].
In real cyberspace, normal activities occupy the dominant position, so most traffic data are normal traffic; only a few are malicious cyber-attacks, resulting in a high imbalance of categories. In highly imbalanced and redundant network traffic data, intrusion detection faces tremendous pressure. Cyber-attacks can hide in a large amount of normal traffic.


Therefore, the machine learning algorithm cannot fully learn the distribution of the minority categories, and it easily misclassifies them [5].
Since Hinton et al. [6] proposed the theory of Deep Learning as an essential subfield of machine learning, deep learning has shown excellent performance in Computer Vision(CV) [7] and Natural Language Processing(NLP) [8]. Intrusion detection technology based on deep learning has been widely studied in academia and industry. The deep learning approach mines the potential features of high-dimensional data through training models and converts the network traffic anomaly detection problem into a classification problem [9]. By training on a large number of data samples, adaptive learning of the difference between normal and abnormal behavior effectively enhances the real-time performance of intrusion processing. However, in the multi-classification of network traffic, class imbalance still affects performance.
Faced with imbalanced network traffic data, we propose a novel Difficult Set Sampling Technique(DSSTE) algorithm to tackle the class imbalance problem in network traffic. This method effectively reduces the imbalance and lets the classification model learn the difficult samples more effectively. We use classic machine learning and deep learning algorithms to verify it on two benchmark datasets. The specific contributions are as follows.
(1) We use the classic NSL-KDD and the up-to-date CSE-CIC-IDS2018 as benchmark datasets and conduct detailed analysis and data cleaning.
(2) This work proposes a novel DSSTE algorithm, reducing the majority samples and augmenting the minority samples in the difficult set, tackling the class imbalance problem in intrusion detection so that the classifier learns the differences better in training.
(3) The classification models use Random Forest(RF), Support Vector Machine(SVM), XGBoost, Long Short-Term Memory(LSTM), AlexNet, and Mini-VGGNet. Comparing with other sampling methods, we divide the experiments into 30 method combinations.
The rest of this article is organized as follows. The second section mainly introduces related work on intrusion detection and class imbalance research. The third section introduces our proposed DSSTE algorithm and the machine learning and deep learning algorithms. The fourth section analyzes and experiments on the benchmark datasets. Finally, the paper concludes in the fifth section.

II. RELATED WORKS
A. INTRUSION DETECTION SYSTEM(IDS)
In research on network intrusion detection based on machine learning, scholars mainly distinguish normal network traffic from abnormal network traffic by dimensionality reduction, clustering, and classification to realize the identification of malicious attacks [10], [11].
Pervez proposed a new method for feature selection and classification of the multi-class NSL-KDD Cup99 dataset using the Support Vector Machine(SVM) and discussed the classification accuracy of classifiers under different dimension features [12]. Shapoorifard studied some new techniques to improve the classification performance of CANN intrusion detection methods and evaluated them on the NSL-KDD Cup99 dataset [13]. He used the K Farthest Neighbor(KFN) in addition to the K Nearest Neighbor(KNN) to classify the data and used the Second Nearest Neighbor(SNN) of the data when the nearest and farthest neighbors have the same class label. The results show that the CANN detection rate is improved and the false alarm rate is reduced, or the same performance is provided. Bhattacharya proposed a machine learning model based on a hybrid Principal Component Analysis(PCA)-Firefly algorithm [14]. The dataset used was an open dataset collected from Kaggle. First, the model performs one-hot encoding to transform the IDS dataset; then the hybrid PCA-Firefly algorithm reduces the dimensionality, and the XGBoost algorithm classifies the reduced dataset.
In recent years, with its powerful ability for automatic feature extraction, deep learning has made remarkable achievements in the fields of Computer Vision(CV), Autonomous Driving(AD), and Natural Language Processing(NLP). Many scholars apply deep learning to intrusion detection for traffic classification, which has become a hot spot of current research. The deep learning approach mines the potential characteristics of high-dimensional data through a training model and transforms network traffic anomaly detection into a classification problem [15]. Through training on a large number of data samples, adaptive learning between normal and abnormal network traffic effectively enhances real-time intrusion processing.
Torres et al. [16] first converted network traffic characteristics into a series of characters and then used a Recurrent Neural Network(RNN) to learn their temporal characteristics, which were further used to detect malicious network traffic. Wang et al. [17] proposed a malware traffic classification algorithm based on a Convolutional Neural Network(CNN). By mapping the traffic characteristics to pixels, a network traffic image is generated, and the image is used as the input of the CNN to realize traffic classification. Staudemeyer et al. [13] proposed an intrusion detection algorithm based on Long Short-Term Memory(LSTM), which detects DoS attacks and probe attacks with unique time series in the KDD Cup99 dataset. Kwon et al. [18] carried out relevant research on deep learning models, focusing on data simplification, dimensionality reduction, classification, and other techniques, and proposed a Fully Convolutional Network(FCN) model. By comparison with traditional machine learning techniques, it is proved that the FCN model is useful for network traffic analysis. Tama et al. [19] proposed an anomaly-based IDS built on a two-stage meta-classifier, which uses a hybrid feature selection method to obtain accurate feature representations. They evaluated the proposed method on the NSL-KDD and UNSW-NB15 intrusion datasets and improved detection rates.

B. CLASS BALANCING METHODS
In the field of machine learning, the problem of class imbalance has always been a challenge. Intrusion detection therefore also faces enormous challenges in network traffic with extremely imbalanced categories, and many scholars have begun to study how to improve the intrusion recognition accuracy on imbalanced network traffic data.
Piyasak proposed a method to improve the accuracy of minority classification [20]. This method combines the Synthetic Minority Over-sampling Technique(SMOTE) and the Complementary Neural Network(CMTNN) to solve imbalanced data classification. Experiments on the UCI dataset show that the proposed combination technique can improve class imbalance problems. Yan proposed an improved local adaptive composite minority sampling algorithm(LA-SMOTE) to deal with the network traffic imbalance problem and then detected network traffic anomalies based on a deep learning GRU neural network [21]. Abdulhammed et al. [22] dealt with the imbalanced dataset CIDDS-001 using data upsampling and downsampling methods and evaluated the datasets with Deep Neural Network, Random Forest, Voting, Variational Autoencoder, and Stacking machine learning classifiers. In their proposed method, the accuracy can reach 99.99%.
Recently, Chuang et al. [23] trained a deep autoencoder to establish a data generation model that generates the reasonable data needed to form a balanced dataset. Their experiments show that generating balanced datasets helps to deal with the problem of overfitting caused by imbalanced data and can prevent the training model from misjudging new data types, including those not in the training dataset. Bedi et al. [24] proposed a new type of IDS based on a Siamese Neural Network(Siamese-NN); the proposed Siam-IDS can detect R2L and U2R attacks without using traditional class balancing techniques such as over-sampling and random under-sampling. The performance of Siam-IDS was compared with a Deep Neural Network(DNN) and a CNN; Siam-IDS achieves a higher recall value for the R2L and U2R attack categories than comparable methods.
Most scholars use interpolation, oversampling, encoder-synthesized data, and other data augmentation methods to balance the training set and achieve better experimental results. Although their methods synthesize data close to the real data and effectively expand the minority class, the test data distribution may exceed the synthesized range, and the classifier cannot accurately predict that distribution. We propose the DSSTE algorithm to mine the difficult samples in the imbalanced training set, compress the majority class among them, and zoom in or out the minority class's continuous attributes. This method reduces the imbalance and produces data that conforms to the true distribution.

III. METHOD
Faced with imbalanced network traffic, we propose the Difficult Set Sampling Technique(DSSTE) algorithm to compress the majority samples and augment the number of minority samples among the difficult samples, reducing the imbalance in the training set so that the intrusion detection system can achieve better classification accuracy. We use Random Forest, SVM, XGBoost, LSTM, Mini-VGGNet, and AlexNet as the classification models.

FIGURE 1. The overall framework of the network intrusion detection model.


We propose the intrusion detection model shown in Figure 1. Data pre-processing is performed first in our intrusion detection structure, including duplicate, outlier, and missing value processing. Then the test set and training set are partitioned, and the training set is processed for data balancing using our proposed DSSTE algorithm. Before modeling, to increase the speed of convergence, we use StandardScaler to standardize the data and digitize the sample labels. Finally, the processed training set is used to train the classification model, and the model is then evaluated on the test set.

A. DSSTE ALGORITHM
In imbalanced network traffic, different traffic data types have similar representations; in particular, minority attacks can hide among a large amount of normal traffic, making it difficult for the classifier to learn the differences between them during the training process. Among the similar samples of the imbalanced training set, the majority class is redundant noise data: its number is much larger than that of the minority class, preventing the classifier from learning the distribution of the minority class, so we compress the majority class. The minority class's discrete attributes remain constant, while its continuous attributes vary. Therefore, the minority class's continuous attributes are zoomed to produce data that conforms to the true distribution. On this basis, we propose the DSSTE algorithm to reduce the imbalance.
First, the imbalanced training set is divided into a near-neighbor set and a far-neighbor set by the Edited Nearest Neighbor(ENN) algorithm. The samples in the near-neighbor set are highly similar, making it very difficult for the classifier to learn the differences between the categories, so we refer to the samples in the near-neighbor set as difficult samples and those in the far-neighbor set as easy samples. Next, we compress the majority samples in the difficult set and zoom in and out the minority samples in the difficult set. Finally, the easy set, the compressed majority samples of the difficult set, and the minority samples of the difficult set together with their augmented samples make up a new training set. We use the K neighbors of the ENN algorithm as the scaling factor of the entire algorithm. When the scaling factor K increases, the number of difficult samples increases, and the compression rate of the majority samples and the synthesis rate of the minority class also increase. The DSSTE algorithm is written as Algorithm 1.

Algorithm 1 DSSTE Algorithm

Input: Imbalanced training set S, scaling factor K
Output: New training set SN
1: Step 1: Distinguish the easy set and the difficult set
2: Take all samples from S and set them as SE
3: for each sample ∈ SE do
4:     Compute its K nearest neighbors
5:     Remove from SE every sample whose K nearest neighbors are mostly of a different class
6: end for
7: Easy set SE, difficult set SD = S − SE
8: Step 2: Compress the majority samples in the difficult set by cluster centroids
9: Take all the majority samples from SD and set them as SMaj
10: Run the KMeans algorithm with K clusters
11: Replace the majority samples in SMaj with the coordinates of the K cluster centroids
12: Obtain the compressed majority sample set SMaj
13: Step 3: Zoom augmentation
14: Take the minority samples from SD and set them as SMin
15: Take the discrete attributes from SMin and set them as XD
16: Take the continuous attributes from SMin and set them as XC
17: Take the label attributes from SMin and set them as Y
18: for n ∈ range(K, K + SMin.shape[0]) do    // zoom range is [1 − 1/K, 1 + 1/K]; SMin.shape[0] is the number of samples in SMin
19:     XD1 = XD
20:     XC1 = XC × (1 − 1/n)
21:     XD2 = XD
22:     XC2 = XC × (1 + 1/n)
23:     Append [concat(XD1, XC1, Y), concat(XD2, XC2, Y)] to SZ
24: end for
25: New training set SN = SE + SMaj + SMin + SZ
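For concreteness, the following is a minimal NumPy/scikit-learn sketch of Algorithm 1. The function name dsste, the majority_label argument, and the simplification of treating every column as continuous are our assumptions for illustration; Algorithm 1 itself keeps the discrete attributes unchanged during zooming.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def dsste(X, y, majority_label, K=10):
    # Step 1: ENN-style split -- a sample is "difficult" when most of
    # its K nearest neighbors carry a different label.
    _, idx = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X)
    neigh_labels = y[idx[:, 1:]]                      # drop the sample itself
    difficult = (neigh_labels != y[:, None]).sum(axis=1) > K / 2
    X_easy, y_easy = X[~difficult], y[~difficult]
    X_diff, y_diff = X[difficult], y[difficult]

    # Step 2: compress the difficult majority samples to K cluster centroids.
    maj = y_diff == majority_label
    X_maj = KMeans(n_clusters=K, random_state=0).fit(X_diff[maj]).cluster_centers_
    y_maj = np.full(K, majority_label, dtype=y.dtype)

    # Step 3: zoom augmentation -- scale the minority samples by factors
    # inside (1 - 1/K, 1 + 1/K) to synthesize new samples.
    X_min, y_min = X_diff[~maj], y_diff[~maj]
    zoom_X, zoom_y = [], []
    for n in range(K, K + len(X_min)):
        zoom_X += [X_min * (1 - 1 / n), X_min * (1 + 1 / n)]
        zoom_y += [y_min, y_min]

    X_new = np.vstack([X_easy, X_maj, X_min] + zoom_X)
    y_new = np.concatenate([y_easy, y_maj, y_min] + zoom_y)
    return X_new, y_new
```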


B. MACHINE LEARNING AND DEEP LEARNING ALGORITHMS
In the classifiers' design, we use Random Forest, SVM, XGBoost, LSTM, AlexNet, and Mini-VGGNet for training and testing; they are detailed in the following parts.

1) Random Forest
Leo Breiman proposed Random Forest in 2001 [25]. Random Forest is an excellent supervised learning algorithm that can train a model to predict which class a sample belongs to based on a given dataset's characteristic attributes and classification results. Random Forest is based on decision trees and adopts the Bagging(Bootstrap aggregating) method to create different training sample sets. The random subspace division strategy selects the best attribute from a randomly selected subset of attributes to split internal nodes. The decision trees thus formed are used as weak classifiers, multiple weak classifiers form a robust classifier, and a voting mechanism is used to classify the input samples. After a random forest has established a large number of decision trees according to a certain random rule, when a new set of samples is input, each decision tree in the forest makes a prediction on it separately, and the prediction results of all trees are integrated to get the final result.

2) Support Vector Machine
Cortes and Vapnik first proposed the Support Vector Machine(SVM) in 1995 [26]. It shows many unique advantages in small-sample, nonlinear, and high-dimensional pattern recognition and can be extended to other machine learning problems such as function fitting [27]. Before the rise of deep learning, SVM was considered the most successful and best-performing machine learning method of recent decades. The SVM method is based on the Vapnik-Chervonenkis(VC) dimension theory of statistical learning theory and the principle of structural risk minimization. Its basic idea is to find a separating hyperplane between different categories so that the categories can be better separated. The SVM method holds that when deciding the separating hyperplane, only the sample points closest to the hyperplane matter: as long as the support vectors are found, the hyperplane can be determined.

3) XGBoost
XGBoost is a parallel regression tree model that combines the idea of Boosting; it was improved from the gradient boosting decision tree by Tianqi Chen [28]. Compared with the GBDT(Gradient Boosting Decision Tree) model, XGBoost overcomes limitations in calculation speed and accuracy. XGBoost adds regularization to the original GBDT loss function to prevent the model from overfitting. The traditional GBDT performs a first-order Taylor expansion on the loss function and takes the negative gradient value as the residual value of the current model. In contrast, XGBoost performs a second-order Taylor expansion to ensure the accuracy of the model. Moreover, XGBoost blocks and sorts each feature, making it possible to parallelize the calculation when looking for the best split point, which significantly accelerates the calculation [29].

4) Long short-term memory
The Long Short-Term Memory(LSTM) network is a Recurrent Neural Network(RNN) structure proposed by Hochreiter and Schmidhuber in 1997 [30]. Like most RNNs, the LSTM network is universal: given a suitable weight matrix, it can compute anything a conventional computer can compute. Different from the traditional RNN, the LSTM network is very suitable for learning from experience. When there is a time lag of unknown size and boundary between important events, the time series can be classified, processed, and predicted. LSTM is not sensitive to gap length and has advantages over other RNNs, hidden Markov models, and other sequence learning methods in many applications [31]. The problems of gradient vanishing and gradient explosion are solved by introducing the gate structure and storage unit.

5) AlexNet
AlexNet is one of the classic basic networks of deep learning. It was proposed by Hinton and his student Alex Krizhevsky in 2012 [32]. Its main structure is an 8-layer deep neural network comprising 5 convolutional layers and 3 fully connected layers (activation and pooling layers are not counted). The ReLU function is used as the activation function in the AlexNet convolutional layers instead of the Sigmoid function widely used in previous networks; its introduction solves the problem of gradient dispersion when the neural network is deep. The AlexNet network uses Maxpooling in the convolutional layers to downsample the feature maps output by the convolutional layers, instead of the average pooling commonly used before. Therefore, AlexNet performs better than previous neural networks.

6) Mini-VGGNet
In 2014, researchers from the Visual Geometry Group of Oxford University and Google DeepMind jointly developed a new deep convolutional neural network, VGGNet, which won second place in the ILSVRC2014 classification task. Their paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" mainly studies the influence of a convolutional neural network's depth on the recognition accuracy of large-scale image sets [33]. The main contribution is to use a small convolution kernel (3×3) to construct convolutional neural network structures of various depths; the evaluation of these structures finally proved that a 16-19 layer network depth could achieve better recognition accuracy. VGG-16 and VGG-19 are commonly used to extract image features. VGG can be regarded as a deepened version of AlexNet: the entire network is built by stacking convolutional layers and fully connected layers, and unlike AlexNet, VGGNet uses a small-sized convolution kernel (3×3).

In this experiment, because we have fewer traffic characteristics, we used the Mini-VGGNet(miniVGG) network mentioned by Ismail for the classification experiments [34]. In general, Mini-VGGNet contains two sets of CONV => RELU => CONV => RELU => POOL, followed by FC => RELU => FC => SOFTMAX layers. The first two CONV layers learn 32 3×3 kernels; the last two CONV layers learn 64 kernels that are also 3×3. The POOL layers perform a Maxpooling operation with 2×2 kernels and a stride of 2×2.
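As a concrete reference, here is one way to express this Mini-VGGNet in Keras. The padding mode, the dropout placements, and the class count are assumptions taken from the description above and from the layer listing in Table 6 later in the paper.

```python
from tensorflow.keras import layers, models

def build_mini_vggnet(input_shape=(12, 12, 1), n_classes=14):
    """Sketch of the Mini-VGGNet described above (two CONV=>RELU pairs per block)."""
    m = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", padding="same", input_shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Dropout(0.25),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(n_classes, activation="softmax"),   # 5 classes for NSL-KDD, 14 for CSE-CIC-IDS2018
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return m
```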
46–63 Content-related traffic features
IV. EXPERIMENT
In this experiment, we use the classical classification algorithms of machine learning and deep learning, including Random Forest(RF), Support Vector Machine(SVM), XGBoost, Long Short-Term Memory(LSTM), AlexNet, and Mini-VGGNet, and compare with other sampling methods, including Random Under-sampling(RUS) [35], Random Over-sampling(ROS) [36], and the Synthetic Minority Over-sampling TEchnique(SMOTE) [37]; the experiments are divided into 30 method combinations, as sketched below.
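A hypothetical sketch of how the 30 combinations can be enumerated (the sampler and classifier names here are placeholders, not the paper's code):

```python
from itertools import product

# 5 sampling options (original data plus 4 samplers) x 6 classifiers = 30 methods.
samplers = ["none", "RUS", "ROS", "SMOTE", "DSSTE"]
classifiers = ["RF", "SVM", "XGBoost", "LSTM", "AlexNet", "miniVGGNet"]

for sampler, clf in product(samplers, classifiers):
    print(f"{sampler} + {clf}")  # each pair is trained and evaluated separately
```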
hide and confusion among them makes traditional intrusion
A. BENCHMARK DATASET
We choose NSL-KDD and CSE-CIC-IDS2018 as the benchmark datasets for the experiments.
NSL-KDD is the most classic dataset in the field of intrusion detection [38]. It is an improvement of the KDD99 dataset and is reasonably divided into different difficulty levels in the test set. Although it still has some problems and is not a perfect representation of existing real networks, it can still be used as an effective benchmark dataset to help researchers compare different intrusion detection methods. Each sample in NSL-KDD includes 41 features, listed in Table 1.

TABLE 1. Description of the NSL-KDD features.

Attributes   Description
1–9          Basic features of network connections
10–22        Content-related traffic features
23–31        Time-related traffic features
32–41        Host-based traffic features

CSE-CIC-IDS2018 is an intrusion detection dataset created by the Canadian Institute for Cybersecurity (CIC) on AWS (Amazon Web Services) in 2018. It is also the latest comprehensive intrusion dataset currently publicly available [39]. CSE-CIC-IDS2018 is a dataset collected by launching real attacks. It is an improvement of the CSE-CIC-IDS2017 dataset; it contains the necessary criteria for an attack dataset and covers various known attack types. The dataset contains six different attack scenarios: Brute Force, Botnet, DoS, DDoS, Web Attacks, and Infiltration. Each sample in CSE-CIC-IDS2018 includes 83 features, listed in Table 2.

TABLE 2. Description of the CSE-CIC-IDS2018 features.

Attributes   Description
1–4          Basic features of network connections
5–16         Features of network packets
17–22        Features of network flows
23–45        Statistics of network flows
46–63        Content-related traffic features
64–67        Features of network subflows
68–79        General purpose traffic features
80–83        Basic features of network connections

We use t-SNE to visualize NSL-KDD and CSE-CIC-IDS2018 by dimensionality reduction [40]. As shown in Figure 2, we can see that the normal samples greatly outnumber the attack samples, making some attacks easy to hide; the confusion among them makes traditional intrusion detection increasingly challenging.
In the NSL-KDD dataset, we use KDDTrain+ and KDDTest+ as the training set and test set, and the data are divided into five categories: Normal, DoS, R2L, Probe, and U2R. Since CSE-CIC-IDS2018 is a huge and redundant dataset, there is no official division between training and test sets. In order to preserve the imbalance of the traffic data and verify the effectiveness of our proposed method, we randomly selected 40,000 Benign traffic samples. For attack types with more than 20,000 samples, we randomly selected 20,000 of them; for attack types with fewer than 20,000 samples, we picked all of them. Since DoS attacks-SlowHTTPTest has only three valid records after removing features such as Timestamp, we do not add it to our experiment. Furthermore, we divide 80% of the selected data into the training set and 20% into the test set.
NSL-KDD and CSE-CIC-IDS2018 are highly imbalanced datasets, with normal traffic accounting for the vast majority, which conforms to the traffic distribution of the real network world. We performed traffic label statistics on tens of millions of samples, and it can be seen that the abnormal traffic is much smaller than the normal traffic. The specific results are shown in Table 3.

FIGURE 2. t-SNE visualization of (a) NSL-KDD and (b) CSE-CIC-IDS2018.

TABLE 3. Distribution of the benchmark datasets.

Dataset: NSL-KDD
Type                      Total    Imbalance ratio   Training set   Testing set
Normal                    77054    -                 67343          9711
DoS                       53385    1.44              45927          7458
R2L                       3882     19.85             995            2887
Probe                     14077    5.47              11656          2421
U2R                       119      647.51            52             67

Dataset: CSE-CIC-IDS2018
Type                      Total    Imbalance ratio   Training set   Testing set
Benign                    40000    -                 32000          8000
Bot                       20000    2.00              16000          4000
DDOS attack-LOIC-UDP      1730     23.12             1384           346
DDOS attack-HOIC          20000    2.00              16000          4000
DDoS attacks-LOIC-HTTP    20000    2.00              16000          4000
DoS attacks-GoldenEye     20000    2.00              16000          4000
DoS attacks-Hulk          20000    2.00              16000          4000
DoS attacks-Slowloris     9908     4.04              7926           1982
SSH-Bruteforce            20000    2.00              16000          4000
FTP-BruteForce            46       869.57            37             9
Infilteration             20000    2.00              16000          4000
Brute Force -Web          550      72.73             440            110
Brute Force -XSS          227      176.21            181            46
SQL Injection             82       487.80            66             16

B. DATA PREPROCESSING
When the dataset is extracted, part of the data contains noisy data, duplicate values, missing values, infinity values, etc., due to extraction or input errors. Therefore, we first perform data preprocessing. The main work is as follows.

(1) Duplicate values: delete duplicate samples and keep only one valid record.
(2) Outliers: in the sample data, the number of samples with missing values(Not a Number, NaN) and infinite values(Inf) is small, so we delete them.
(3) Feature deletion and transformation: In CSE-CIC-IDS2018, we delete features such as "Timestamp", "Destination Address", "Source Address", and "Source Port". If the features "Init Bwd Win Byts" and "Init Fwd Win Byts" have a value of -1, we add two check dimensions: the mark for -1 is 1, and otherwise it is 0. In NSL-KDD, we use the OneHot encoder to convert the symbolic features. For example, "TCP", "UDP", and "ICMP" are the three values of the protocol type feature; after OneHot encoding, they become the binary vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). The protocol type feature is thus divided into three categories; likewise there are 11 categories for the flag feature and 70 categories for the service feature. Therefore, the initial 41-dimension feature vector becomes 122 dimensions, as sketched below.
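For illustration, a minimal pandas sketch of this one-hot expansion (the file path and the column indices for protocol_type, service, and flag are assumptions about the raw NSL-KDD layout):

```python
import pandas as pd

df = pd.read_csv("KDDTrain+.txt", header=None)   # hypothetical local path
# Columns 1-3 hold protocol_type, service, and flag (3 + 70 + 11 values),
# so one-hot encoding grows the 41 raw attributes to 122 dimensions.
df = pd.get_dummies(df, columns=[1, 2, 3])
print(df.shape)
```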
(4) Numerical standardization: In order to eliminate the dimensional influence between indicators and accelerate gradient descent and model convergence, the data are standardized by the Z-Score method, so that the average value of each feature becomes 0 and the standard deviation becomes 1, i.e., the data are converted to a standard normal distribution. This is related to the overall sample distribution, and each sample point has an impact on the standardization. The standardization formulas are as follows, where u is the mean of each feature, s is the standard deviation of each feature, and x_i' is the standardized element corresponding to each column's feature:

u = \frac{1}{N}\sum_{i=1}^{N} x_i    (1)

s = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - u)^2}    (2)

x_i' = \frac{x_i - u}{s}    (3)
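Equations (1)-(3) are exactly what sklearn's StandardScaler computes per feature column; a quick check on an assumed toy matrix X:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_std = StandardScaler().fit_transform(X)        # x'_i = (x_i - u) / s per column
assert np.allclose(X_std, (X - X.mean(axis=0)) / X.std(axis=0))
```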
C. EXPERIMENTAL PARAMETERS
The proposed method uses Sklearn(machine learning framework) and TensorFlow(deep learning framework) and completes the related experiments on the Google Colaboratory platform. The machine learning algorithms use CPU computation, and the deep learning algorithms use a TPU for acceleration. The specific environment is shown in Table 4.

TABLE 4. Development environment.

Project     Properties
OS          Ubuntu 18.04.3 LTS
CPU         Intel(R) Xeon(R) CPU @2.30GHz
TPU         -
Memory      12.7 GiB
Disk        64.0 GiB
Framework   Sklearn 0.22.2.post1 + TensorFlow 2.2.0

To prevent overfitting, we standardized the data. In machine learning, the ensemble learning models use shallow trees to prevent overfitting. In the deep learning models, we use a TPU for acceleration, so we chose a larger batch size and increased the epochs accordingly. Overfitting was further prevented by observing the accuracy and loss changes during the training phase, using appropriate learning rates, and adding Dropout to the neural network layers.
For the machine learning algorithms, we used the RandomForestClassifier and svm.LinearSVC implementations provided in Sklearn and the XGBClassifier from the XGBoost library. The specific parameters are shown in Table 5 and instantiated in the sketch below.

TABLE 5. Machine learning model related parameters.

Classifier               Parameters
RandomForestClassifier   n_estimators=200, criterion='gini', min_samples_split=2, min_samples_leaf=1
svm.LinearSVC            penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, max_iter=1000
XGBClassifier            objective='multi:softmax', booster='gbtree', verbosity=0, silent=0, learning_rate=0.1
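Instantiated in code, the Table 5 settings look as follows (the deprecated silent flag of XGBoost is omitted here; everything else mirrors the table):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=200, criterion="gini",
                            min_samples_split=2, min_samples_leaf=1)
svm = LinearSVC(penalty="l2", loss="squared_hinge", dual=True, tol=1e-4,
                C=1.0, multi_class="ovr", fit_intercept=True,
                intercept_scaling=1, max_iter=1000)
xgb = XGBClassifier(objective="multi:softmax", booster="gbtree",
                    verbosity=0, learning_rate=0.1)
```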
For the deep learning algorithms, LSTM and GRU use the original one-dimensional sequence (144×1) of the learning dataset. For AlexNet and MiniVGGNet, we appended 41 zero-valued dimensions to the end of the feature vector and then reshaped it into a single-channel two-dimensional matrix (12×12×1). We uniformly adopt the Adam optimizer (lr=0.001), perform 100 epochs in the model training stage, and use a batch size of 1024. The model parameters are shown in Table 6.

TABLE 6. Deep learning model related parameters.

LSTM            AlexNet          miniVGGNet
Input(144,1)    Input(12,12,1)   Input(12,12,1)
LSTM-64         Conv11-96        Conv3-32
Dropout-0.2     Maxpool-3        Conv3-32
LSTM-64         Conv5-256        Maxpool-2
Dropout-0.2     Conv3-384        Dropout-0.25
LSTM-64         Conv3-384        Conv3-64
Dropout-0.2     Conv3-256        Conv3-64
FC-512          Maxpool-2        Maxpool-2
Dropout-0.5     Flatten          Dropout-0.25
                FC-2048          Flatten
                Dropout-0.5      FC-512
                FC-2048          Dropout-0.25
                Dropout-0.5
SoftMax-5 / SoftMax-14 (output layer for NSL-KDD / CSE-CIC-IDS2018)
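As an illustration, the LSTM column of Table 6 can be written in Keras roughly as below. The return_sequences handling and the loss function are our assumptions; the output size is 5 for NSL-KDD and 14 for CSE-CIC-IDS2018.

```python
from tensorflow.keras import layers, models, optimizers

def build_lstm(input_shape=(144, 1), n_classes=5):
    """Sketch of the LSTM column of Table 6 (5 output classes for NSL-KDD)."""
    m = models.Sequential([
        layers.LSTM(64, return_sequences=True, input_shape=input_shape),
        layers.Dropout(0.2),
        layers.LSTM(64, return_sequences=True),
        layers.Dropout(0.2),
        layers.LSTM(64),
        layers.Dropout(0.2),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    m.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return m

model = build_lstm()
# model.fit(X_train, y_train, epochs=100, batch_size=1024)  # settings from Section IV-C
```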

D. EVALUATION METRICS
We use the Accuracy, Precision, Recall, and F1-Score to evaluate the experimental models' performance. These evaluation criteria reflect the intrusion detection system's flow recognition accuracy rate and false alarm rate. The combination of the model prediction results and the true labels is divided into four types: False Negative(FN), a positive sample that is mistakenly judged as negative; False Positive(FP), a negative sample that is misjudged as positive; True Negative(TN), an actually negative sample that is correctly judged as negative; and True Positive(TP), an actually positive sample that is correctly judged as positive. These metrics are calculated according to Equations 4-7.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (4)

\mathrm{Precision} = \frac{TP}{TP + FP}    (5)

\mathrm{Recall} = \frac{TP}{TP + FN}    (6)

\mathrm{F1\_Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}    (7)
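A small helper, assuming the four counts are already known, shows how Equations (4)-(7) are computed:

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Equations (4)-(7) computed from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics_from_counts(tp=90, tn=95, fp=5, fn=10))
```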
E. EXPERIMENTAL RESULTS
In our experiments, we first explored the classifiers' performance on training sets treated with different scaling factors. In the proposed DSSTE algorithm, there is a scaling factor parameter K. When K increases within a certain range, the number of difficult samples also increases, but when K exceeds that range, the number of difficult samples remains constant. However, the majority compression and the minority augmentation among the difficult samples still change with K. Therefore, to ensure that the data sampling is useful, does not generate excessive noise, and achieves the best sampling results, we experimented with different scaling factors.
We processed the training sets of NSL-KDD and CSE-CIC-IDS2018 using different scaling factors K. We performed the experiments on the proposed six classifiers, and performance was evaluated using the average F1-Score of the classifiers, as shown in Figure 3.

FIGURE 3. F1-Score of the DSSTE algorithm with different scaling factor K: (a) NSL-KDD KDDTest+; (b) CSE-CIC-IDS2018.

In NSL-KDD, the classifiers achieve excellent average performance at K=50. In CSE-CIC-IDS2018, the classifiers achieve excellent average performance at K=10. Therefore, based on the average F1-Score, in NSL-KDD we used the scaling factor K=50, where the difficult samples in Normal, DoS, and Probe were compressed, and the difficult samples in R2L and U2R were augmented with data. In CSE-CIC-IDS2018, we used the scaling factor K=10 and performed a similar treatment to NSL-KDD for the difficult samples. The new training set after this treatment is shown in Table 7.
Table 8 summarizes the comparison between DSSTE and other sampling methods; our proposed DSSTE algorithm outperforms the other methods on NSL-KDD and CSE-CIC-IDS2018.
In the experimental results for the NSL-KDD dataset, LSTM achieved the highest accuracy of 78.24% and the highest F1-Score of 75.03% on the original training set. After the RUS algorithm sampled the training set, XGBoost achieved the highest accuracy rate of 78.79%, and miniVGGNet achieved the highest recall rate of 75.57%. After the ROS algorithm sampled the training set, LSTM achieved the highest accuracy rate of 78.72% and the highest recall rate of 75.82%. After the SMOTE algorithm sampled the training set, AlexNet achieved the highest accuracy rate of 78.75% and the highest recall rate of 77.27%. On the training set sampled by the DSSTE algorithm proposed in this paper, AlexNet achieved the highest accuracy rate of 82.84% and the highest recall rate of 81.66%.

TABLE 7. The new training set class distribution processed by the DSSTE algorithm.

Dataset: NSL-KDD (K=50)
Type                      Training set   New training set
Normal                    67343          61914
DoS                       45927          40425
R2L                       995            15683
Probe                     11656          7348
U2R                       52             2652

Dataset: CSE-CIC-IDS2018 (K=10)
Type                      Training set   New training set
Benign                    32000          25195
Bot                       16000          15952
DDOS attack-LOIC-UDP      1384           2891
DDOS attack-HOIC          16000          15984
DDoS attacks-LOIC-HTTP    16000          14469
DoS attacks-GoldenEye     16000          14320
DoS attacks-Hulk          16000          15496
DoS attacks-Slowloris     7926           10709
SSH-Bruteforce            16000          15976
FTP-BruteForce            37             367
Infilteration             16000          11728
Brute Force -Web          440            1947
Brute Force -XSS          181            1391
SQL Injection             66             759

In the experimental results of the CSE-CIC-IDS2018 dataset, random forest achieves the highest accuracy of 94.89% and the highest F1-Score of 94.72% on the unprocessed training set. After the RUS, ROS, and SMOTE algorithms sampled the training set, random forest achieved the highest accuracy and F1-Score, but the performance improvement was very small or even lower than that of the original dataset. On the training set sampled by the DSSTE algorithm proposed in this paper, miniVGGNet achieves the highest accuracy of 96.99% and the highest recall of 97.04%. However, the accuracy and recall of random forest are also very close. Random forest exhibits the generalization capability of ensemble learning when combined with each sampling algorithm, and it requires fewer hardware resources.
As shown in Figure 4, we counted the average accuracy and F1-Score of the classifiers for each sampling method.

FIGURE 4. Comparison of the performance of different sampling methods (Accuracy and F1-Score are the averages over the classifiers).


TABLE 8. Comparison results between DSSTE and different methods (Acc, Pre, and F1-Score are the averages over the classes, weighted by the number of samples in each class).

Model | NSL-KDD: Acc, Pre, Recall, F1 | CSE-CIC-IDS2018: Acc, Pre, Recall, F1
RF 0.7434 0.8137 0.7434 0.7015 0.9489 0.9481 0.9489 0.9481
SVM 0.7366 0.7384 0.7366 0.6966 0.9225 0.9261 0.9225 0.9126
XGBoost 0.7715 0.8107 0.7715 0.7365 0.9398 0.9449 0.9398 0.9340
LSTM 0.7824 0.7838 0.7823 0.7503 0.9375 0.9444 0.9370 0.9313
AlexNet 0.7618 0.8050 0.7611 0.7194 0.9376 0.9440 0.9369 0.9313
miniVGGNet 0.7605 0.8066 0.7594 0.7303 0.9388 0.9450 0.9384 0.9326
RUS + RF 0.7655 0.8220 0.7655 0.7304 0.9419 0.9454 0.9419 0.9428
RUS + SVM 0.7362 0.7510 0.7362 0.7058 0.8902 0.9087 0.8902 0.8926
RUS + XGBoost 0.7879 0.8177 0.7879 0.7468 0.9212 0.9362 0.9212 0.9234
RUS + LSTM 0.7705 0.7970 0.7704 0.7462 0.9150 0.9294 0.9139 0.9171
RUS + AlexNet 0.7834 0.8250 0.7814 0.7537 0.9294 0.9308 0.9268 0.9280
RUS + miniVGGNet 0.7827 0.8134 0.7826 0.7557 0.9337 0.9330 0.9329 0.9319
ROS + RF 0.7515 0.8125 0.7515 0.7066 0.9492 0.9484 0.9492 0.9483
ROS + SVM 0.7493 0.8005 0.7493 0.7300 0.9165 0.9311 0.9165 0.9073
ROS + XGBoost 0.7809 0.8196 0.7809 0.7532 0.9385 0.9448 0.9385 0.9320
ROS + LSTM 0.7872 0.8289 0.7866 0.7582 0.9348 0.9409 0.9346 0.9293
ROS + AlexNet 0.7850 0.8057 0.7849 0.7451 0.9299 0.9387 0.9295 0.9262
ROS + miniVGGNet 0.7626 0.7893 0.7626 0.7332 0.9362 0.9406 0.9359 0.9316
SMOTE + RF 0.7409 0.8070 0.7409 0.6977 0.9488 0.9481 0.9488 0.9480
SMOTE + SVM 0.7467 0.7987 0.7467 0.7275 0.9155 0.9302 0.9155 0.9062
SMOTE + XGBoost 0.7744 0.8142 0.7744 0.7421 0.9381 0.9449 0.9381 0.9318
SMOTE + LSTM 0.7509 0.7976 0.7508 0.7239 0.9345 0.9431 0.9344 0.9278
SMOTE + AlexNet 0.7875 0.8256 0.7851 0.7727 0.9324 0.9423 0.9308 0.9287
SMOTE + miniVGGNet 0.7651 0.7698 0.7646 0.7435 0.9366 0.9423 0.9366 0.9317
DSSTE + RF 0.8050 0.8468 0.8050 0.7863 0.9692 0.9739 0.9692 0.9698
DSSTE + SVM 0.7759 0.8076 0.7759 0.7658 0.9488 0.9497 0.9488 0.9463
DSSTE + XGBoost 0.8013 0.8349 0.8013 0.7761 0.9602 0.9641 0.9602 0.9611
DSSTE + LSTM 0.8178 0.8271 0.8177 0.8098 0.9638 0.9711 0.9636 0.9650
DSSTE + AlexNet 0.8284 0.8394 0.8278 0.8166 0.9653 0.9709 0.9625 0.9649
DSSTE + miniVGGNet 0.8127 0.8268 0.8132 0.8057 0.9699 0.9746 0.9697 0.9704

In the NSL-KDD dataset, the sampling performance of RUS, ROS, and SMOTE is improved compared to the original algorithms, but in terms of prediction accuracy and F1-Score, the improvement is very slight. The proposed DSSTE algorithm improves significantly: the average accuracy is improved by 4.75%, and the average F1-Score is improved by 7.1%. In the CSE-CIC-IDS2018 dataset, the performance gains are very slight or even degraded after using the RUS, ROS, and SMOTE sampling algorithms. After sampling the training set with the DSSTE algorithm proposed in this paper, the average accuracy improves by 2.54%, and the average F1-Score improves by 3.13%.
The F1-Score is a harmonic average of the precision and recall rates, which is a good indicator of a classification model's performance. So we adopt F1-Score and accuracy as the metrics to compare the different methods proposed by other authors in the face of imbalanced network traffic. As shown in Table 9, our proposed data sampling method DSSTE has a higher accuracy than other methods on KDDTest+. The F1-Score is very close to that of AESMOTE, which exhibits the advantage of reinforcement learning for automatic pairwise sequence learning, but reinforcement learning requires a lot of time to build the model. Therefore, our proposed method is more generalizable to imbalanced network traffic.
CIC-IDS-2018 is a large and redundant dataset, and data are selected and processed differently by different scholars. Therefore, we do not compare with other authors on the CIC-IDS-2018 dataset. In our experiments, we can see that the DSSTE method is significantly better than the other sampling algorithms.

FIGURE 5. Confusion matrix of CIC-IDS-2018 by DSSTE+miniVGGNet.

TABLE 9. Comparison results of DSSTE with the existing approaches on NSL-KDD KDDTest+.

Method            Year   Acc      F1-Score
DSSTE+AlexNet     2020   0.8284   0.8166
AESMOTE [41]      2020   0.8209   0.8243
I-SiamIDS [42]    2020   0.8000   0.6834
AE-RL [43]        2019   0.8016   0.7940
ADASYN [44]       2019   0.7897   /
WGAN-GP [45]      2019   0.8080   /
ID-CAVE [46]      2017   0.8010   0.7908
SMOTE-EUS [15]    2016   0.7910   0.7576

As shown in Figure 5, DSSTE+AlexNet exhibits excellent performance on the CIC-IDS-2018 dataset. It achieves a detection rate close to 100% for some attacks and also improves the identification of the Brute Force and Infilteration attacks.
To sum up, although traditional sampling methods reduce the imbalance in the training set and synthesize data close to the real data, they do not produce a distribution that matches the real distribution. The RUS algorithm leads to loss of valid information, and the ROS algorithm leads to data redundancy and overfitting. At the same time, SMOTE interpolation generates noisy traffic and data overlap, increasing the number of difficult samples in the training set. Our proposed DSSTE algorithm compresses and augments the difficult data of an imbalanced training set in a targeted way. It enables the classifier to grasp more of the data distribution, thus improving the classification performance.

V. CONCLUSION
As network intrusion continues to evolve, the pressure on network intrusion detection is also increasing. In particular, the problems caused by imbalanced network traffic make it difficult for intrusion detection systems to predict the distribution of malicious attacks, making cyberspace security face a considerable threat.

This paper proposed a novel Difficult Set Sampling Technique(DSSTE) algorithm, which enables the classification model to strengthen its learning of imbalanced network data. A targeted increase in the number of minority samples that need to be learned can reduce the imbalance of network traffic and strengthen the learning of the minority class among difficult samples, improving the classification accuracy. We used six classical classification methods in machine learning and deep learning and combined them with other sampling techniques. Experiments show that our method can accurately determine the samples that need to be expanded in imbalanced network traffic and improve attack recognition more effectively.
In the experiments, we found that deep learning performs better than machine learning after the imbalanced training set is sampled by the DSSTE algorithm. Although neural networks strengthen data expression, the current public datasets have already extracted the data features in advance, which limits deep learning to learning from the preprocessed features and prevents it from taking advantage of its automatic feature extraction. Therefore, in the next step, we plan to use the deep learning model directly for feature extraction and model training on the original network traffic data, exploit the advantages of deep learning in feature extraction, reduce the impact of imbalanced data, and achieve more accurate classification.

REFERENCES
[1] D. E. Denning, "An intrusion-detection model," IEEE Transactions on Software Engineering, no. 2, pp. 222–232, 1987.
[2] N. B. Amor, S. Benferhat, and Z. Elouedi, "Naive bayes vs decision trees in intrusion detection systems," in Proceedings of the 2004 ACM Symposium on Applied Computing, 2004, pp. 420–424.
[3] M. Panda and M. R. Patra, "Network intrusion detection using naive bayes," International Journal of Computer Science and Network Security, vol. 7, no. 12, pp. 258–263, 2007.
[4] M. A. M. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (ids)," Journal of Intelligent Learning Systems and Applications, vol. 2014, 2014.
[5] N. Japkowicz, "The class imbalance problem: Significance and strategies," in Proc. of the Int'l Conf. on Artificial Intelligence, vol. 56. Citeseer, 2000.
[6] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[7] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, "Deep learning for visual understanding: A review," Neurocomputing, vol. 187, pp. 27–48, 2016.
[8] T. Young, D. Hazarika, S. Poria, and E. Cambria, "Recent trends in deep learning based natural language processing," IEEE Computational Intelligence Magazine, vol. 13, no. 3, pp. 55–75, 2018.
[9] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[10] D. A. Cieslak, N. V. Chawla, and A. Striegel, "Combating imbalance in network intrusion datasets," in GrC, 2006, pp. 732–737.
[11] M. Zamani and M. Movahedi, "Machine learning techniques for intrusion detection," arXiv preprint arXiv:1312.2177, 2013.
[12] M. S. Pervez and D. M. Farid, "Feature selection and intrusion classification in nsl-kdd cup 99 dataset employing svms," in The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014). IEEE, 2014, pp. 1–6.
[13] H. Shapoorifard and P. Shamsinejad, "Intrusion detection using a novel hybrid method incorporating an improved knn," Int. J. Comput. Appl., vol. 173, no. 1, pp. 5–9, 2017.
[14] S. Bhattacharya, R. Kaluri, S. Singh, M. Alazab, U. Tariq et al., "A novel pca-firefly based xgboost classification model for intrusion detection in networks using gpu," Electronics, vol. 9, no. 2, p. 219, 2020.
[15] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), 2016, pp. 21–26.
[16] P. Torres, C. Catania, S. Garcia, and C. G. Garino, "An analysis of recurrent neural networks for botnet detection behavior," in 2016 IEEE Biennial Congress of Argentina (ARGENCON). IEEE, 2016, pp. 1–6.
[17] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, "Malware traffic classification using convolutional neural network for representation learning," in 2017 International Conference on Information Networking (ICOIN). IEEE, 2017, pp. 712–717.
[18] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, "A survey of deep learning-based network anomaly detection," Cluster Computing, pp. 1–13, 2019.
[19] B. A. Tama, M. Comuzzi, and K.-H. Rhee, "Tse-ids: A two-stage classifier ensemble for intelligent anomaly-based intrusion detection system," IEEE Access, vol. 7, pp. 94497–94507, 2019.
[20] P. Jeatrakul, K. W. Wong, and C. C. Fung, "Classification of imbalanced data by combining the complementary neural network and smote algorithm," in International Conference on Neural Information Processing. Springer, 2010, pp. 152–159.
[21] B. Yan and G. Han, "La-gru: Building combined intrusion detection model based on imbalanced learning and gated recurrent unit neural network," Security and Communication Networks, vol. 2018, 2018.
[22] R. Abdulhammed, M. Faezipour, A. Abuzneid, and A. AbuMallouh, "Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic," IEEE Sensors Letters, vol. 3, no. 1, pp. 1–4, 2018.
[23] P.-J. Chuang and D.-Y. Wu, "Applying deep learning to balancing network intrusion detection datasets," in 2019 IEEE 11th International Conference on Advanced Infocomm Technology (ICAIT). IEEE, 2019, pp. 213–217.
[24] P. Bedi, N. Gupta, and V. Jindal, "Siam-ids: Handling class imbalance problem in intrusion detection systems using siamese neural network," Procedia Computer Science, vol. 171, pp. 780–789, 2020.
[25] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[26] C. Cortes and V. Vapnik, "Support vector machine," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[27] L. Feng, L. Yu, L. Xueqiang, and L. Zhuo, "Research on query topic classification method," Data Analysis and Knowledge Discovery, vol. 31, no. 4, pp. 10–17, 2015.
[28] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.
[29] X. Lei and Y. Xie, "Improved xgboost model based on genetic algorithm for hypertension recipe recognition," Comput. Sci., vol. 45, pp. 476–481, 2018.
[30] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[31] A. Raghavan, F. Di Troia, and M. Stamp, "Hidden markov models with random restarts versus boosting for malware detection," Journal of Computer Virology and Hacking Techniques, vol. 15, no. 2, pp. 97–107, 2019.
[32] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[33] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[34] A. Ismail, S. A. Ahmad, A. C. Soh, K. Hassan, and H. H. Harith, "Improving convolutional neural network (cnn) architecture (minivggnet) with batch normalization and learning rate decay factor for image classification," International Journal of Integrated Engineering, vol. 11, no. 4, 2019.
[35] M. A. Tahir, J. Kittler, and F. Yan, "Inverse random under sampling for class imbalance problem and its application to multi-label classification," Pattern Recognition, vol. 45, no. 10, pp. 3738–3750, 2012.
[36] A. Liu, J. Ghosh, and C. E. Martin, "Generative oversampling for mining imbalanced datasets," in DMIN, 2007, pp. 66–72.

[37] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[38] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD Cup 99 data set," in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 2009, pp. 1–6.
[39] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in ICISSP, 2018, pp. 108–116.
[40] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[41] X. Ma and W. Shi, "AESMOTE: Adversarial reinforcement learning with SMOTE for anomaly detection," IEEE Transactions on Network Science and Engineering, 2020.
[42] P. Bedi, N. Gupta, and V. Jindal, "I-SiamIDS: An improved Siam-IDS for handling class imbalance in network-based intrusion detection systems," Applied Intelligence, pp. 1–19, 2020.
[43] G. Caminero, M. Lopez-Martin, and B. Carro, "Adversarial environment reinforcement learning algorithm for intrusion detection," Computer Networks, vol. 159, pp. 96–109, 2019.
[44] A. K. Verma, P. Kaushik, and G. Shrivastava, "A network intrusion detection approach using variant of convolution neural network," in 2019 International Conference on Communication and Electronics Systems (ICCES), 2019, pp. 409–416.
[45] J.-T. Wang and C.-H. Wang, "High performance WGAN-GP based multiple-category network anomaly classification system," in 2019 International Conference on Cyber Security for Emerging Technologies (CSET). IEEE, 2019, pp. 1–7.
[46] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, "Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT," Sensors, vol. 17, no. 9, p. 1967, 2017.
LAN LIU was born in Yiyang, China, in 1977. She received the B.S., M.S., and Ph.D. degrees in computer architecture from Huazhong University of Science and Technology, Wuhan, Hubei, China, in 1999, 2002, and 2007, respectively. She joined Guangdong Polytechnic Normal University as an Assistant in 2003 and became an Associate Professor and a Master Tutor of the Department of Electronic Information in 2008 and 2015, respectively. She was a Visiting Professor researching network security at the Computer Science Laboratory, University of Waikato, Hamilton, New Zealand, in 2016. She is the author of more than 30 articles published in IEEE conferences and journals (EI or SCI indexed). Her research interests include network security, deep learning, and software-defined networking.

PENGCHENG WANG was born in Nanchong, China. He received the B.S. degree in communication engineering from Physical and Electronic Information Engineering, Neijiang Normal University, in 2019. He is currently pursuing the M.S. degree in electronics and communication engineering with Guangdong Polytechnic Normal University. His main research interests include network information security and deep learning.

JUN LIN received the M.S. degree in computer architecture from Huazhong University of Science and Technology in 2002. He is a senior engineer and the vice director of the Software & System Research Unit of the China Electronic Product Reliability and Environmental Testing Research Institute. He is mainly engaged in computer software development and quality engineering research and services in mobile communication, computer networks, industrial control systems, information security, and other relevant fields, with a solid research foundation and rich engineering experience. He has taken charge of or participated in a dozen important scientific research projects funded by national or provincial governments in frontier fields such as new-generation broadband wireless mobile networks, the industrial internet, the IoT, and infrastructure software, and was responsible in technology for the establishment of the IEEE P360 intelligent wearable technology standard. He has published more than ten papers, obtained eight invention patents and seven software copyrights, and won six awards in science and technology.

LANGZHOU LIU was born in Hubei, China. He received a bachelor's degree in engineering and is currently pursuing a master's degree at Guangdong Polytechnic Normal University. His main research interests are network information security and deep learning.