Intrusion Detection of Imbalanced
ABSTRACT In imbalanced network traffic, malicious cyber-attacks can often hide in large amounts
of normal data. They exhibit a high degree of stealth and obfuscation in cyberspace, making it difficult
for a Network Intrusion Detection System (NIDS) to ensure the accuracy and timeliness of detection. This
paper studies machine learning and deep learning for intrusion detection in imbalanced network traffic.
It proposes a novel Difficult Set Sampling Technique (DSSTE) algorithm to tackle the class imbalance
problem. First, the Edited Nearest Neighbor (ENN) algorithm divides the imbalanced training set
into a difficult set and an easy set. Next, the KMeans algorithm compresses the majority samples
in the difficult set to reduce the majority. The continuous attributes of the minority samples in the
difficult set are then zoomed in and out to synthesize new samples and increase the minority number.
Finally, the easy set, the compressed majority set from the difficult set, and the minority samples
from the difficult set together with their augmentation samples are combined into a new training set.
The algorithm reduces the imbalance of the original training set and provides targeted data
augmentation for the minority classes that need to be learned. It enables the classifier
to learn the differences better in the training stage and improves classification performance. To verify
the proposed method, we conduct experiments on the classic intrusion dataset NSL-KDD and the newer
and more comprehensive intrusion dataset CSE-CIC-IDS2018. We use classical classification models: Random
Forest (RF), Support Vector Machine (SVM), XGBoost, Long Short-Term Memory (LSTM), AlexNet, and
Mini-VGGNet. We compare against 24 other methods; the experimental results demonstrate that our proposed
DSSTE algorithm outperforms them.
INDEX TERMS IDS, imbalanced network traffic, machine learning, deep learning, CSE-CIC-IDS2018.
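
The four steps summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is
not the authors' implementation: the majority-vote ENN rule, the cluster count `len(pts) // k`, the
zoom factors `1 - 1/n` and `1 + 1/n`, and all function and parameter names (`enn_split`,
`kmeans_centroids`, `zoom`, `dsste`, `n_steps`) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def enn_split(X, y, k):
    """Split sample indices into (difficult, easy) with an ENN-style rule
    (assumed here): a sample is difficult when the majority label among its
    k nearest neighbours (Euclidean, brute force) differs from its own."""
    difficult, easy = [], []
    for i, xi in enumerate(X):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))
        vote = Counter(y[j] for j in others[:k]).most_common(1)[0][0]
        (easy if vote == y[i] else difficult).append(i)
    return difficult, easy


def kmeans_centroids(points, n_clusters, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def zoom(sample, continuous_idx, k, n_steps):
    """Scale the continuous attributes by (1 - 1/n) and (1 + 1/n) for
    n = k, ..., k + n_steps - 1; other attributes are left unchanged."""
    out = []
    for n in range(k, k + n_steps):
        for factor in (1 - 1 / n, 1 + 1 / n):
            out.append([v * factor if i in continuous_idx else v
                        for i, v in enumerate(sample)])
    return out


def dsste(X, y, k, majority, continuous_idx, n_steps=2):
    """Return a rebalanced (X_new, y_new): easy samples are kept, difficult
    majority samples are compressed, difficult minority samples augmented."""
    difficult, easy = enn_split(X, y, k)
    X_new = [X[i] for i in easy]
    y_new = [y[i] for i in easy]
    for label in sorted(set(y[i] for i in difficult)):
        pts = [X[i] for i in difficult if y[i] == label]
        if label in majority:
            # Compress: replace the difficult majority samples by centroids.
            new_pts = kmeans_centroids(pts, max(1, len(pts) // k))
        else:
            # Augment: keep the originals and add zoomed copies.
            new_pts = pts + [z for p in pts
                             for z in zoom(p, continuous_idx, k, n_steps)]
        X_new += new_pts
        y_new += [label] * len(new_pts)
    return X_new, y_new
```

The brute-force neighbour search and the hand-written KMeans only keep the sketch self-contained; a
practical implementation would likely rely on library routines. Averaging into centroids also alters
discrete attributes, which a real implementation would need to handle.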
Therefore, the machine learning algorithm cannot fully learn the distribution of the minority
categories, and it easily misclassifies them [5].

Since Hinton et al. [6] proposed the theory of deep learning, an essential subfield of machine
learning, deep learning has shown excellent performance in Computer Vision (CV) [7] and Natural
Language Processing (NLP) [8]. Intrusion detection technology based on deep learning has been widely
studied in academia and industry. Deep learning mines the latent features of high-dimensional data
through trained models and converts network traffic anomaly detection into a classification problem [9].
By training on a large number of data samples, the model adaptively learns the difference between
normal behavior and abnormal behavior, which effectively enhances the real-time performance of
intrusion processing. However, in the multi-class classification of network traffic, class imbalance
still affects classification performance.

Faced with imbalanced network traffic data, we propose a novel Difficult Set Sampling Technique (DSSTE)
algorithm to tackle the class imbalance problem in network traffic. This method effectively reduces
the imbalance and makes the classification model learn difficult samples more effectively. We use
classic machine learning and deep learning algorithms to verify it on two benchmark datasets. The
specific contributions are as follows.

(1) We use the classic NSL-KDD and the up-to-date CSE-CIC-IDS2018 as benchmark datasets and conduct
detailed analysis and data cleaning.

(2) This work proposes a novel DSSTE algorithm, reducing the majority samples and augmenting the
minority samples in the difficult set, tackling the class imbalance problem in intrusion detection so
that the classifier learns the differences better in training.

(3) The classification model uses Random Forest (RF), Support Vector Machine (SVM), XGBoost, Long
Short-Term Memory (LSTM), AlexNet, and Mini-VGGNet. Comparing with other methods, we divide the
experiment into 30 methods.

The rest of this article is organized as follows. The second section mainly introduces the related
work on intrusion detection and class imbalance research. The third section introduces our proposed
DSSTE algorithm and the machine learning and deep learning algorithms. The fourth section presents the
experiments and analysis on the benchmark datasets. Finally, the fifth section concludes the paper.

In the research of network intrusion detection based on machine learning, scholars mainly distinguish
normal network traffic from abnormal network traffic by dimensionality reduction, clustering, and
classification to identify malicious attacks [10], [11].

Pervez proposed a new method for feature selection and classification merging of the multi-class
NSL-KDD Cup99 dataset using Support Vector Machine (SVM) and discussed the classification accuracy of
classifiers under different dimension features [12]. Shiraz studied some new techniques to improve the
classification performance of the CANN intrusion detection method and evaluated them on the NSL-KDD
Cup99 dataset [13]. He used the K Farthest Neighbor (KFN) and the K Nearest Neighbor (KNN) to classify
the data and used the Second Nearest Neighbor (SNN) of the data when the nearest and farthest
neighbors have the same class label. The results show that the CANN detection rate is improved and the
failure alert rate is reduced, or that the same performance is provided. Bhattacharya proposed a
machine learning model based on hybrid Principal Component Analysis (PCA)-Firefly [14]. The dataset
used was an open dataset collected from Kaggle. First, the model performs one-hot encoding to
transform the IDS dataset, then uses the hybrid PCA-Firefly algorithm to reduce the dimension, and
finally the XGBoost algorithm classifies the reduced dataset.

In recent years, with its powerful ability of automatic feature extraction, deep learning has made
remarkable achievements in Computer Vision (CV), Autonomous Driving (AD), and Natural Language
Processing (NLP). Many scholars apply deep learning to intrusion detection for traffic classification,
which has become a hot spot of current research. Deep learning mines the latent characteristics of
high-dimensional data through a trained model and transforms network traffic anomaly detection into a
classification problem [15]. By training on a large number of samples, the model adaptively learns the
difference between normal and abnormal network traffic, which effectively enhances real-time intrusion
processing.

Torres et al. [16] first converted network traffic characteristics into a series of characters and
then used a Recurrent Neural Network (RNN) to learn their temporal characteristics, which were further
used to detect malicious network traffic. Wang et al. [17] proposed a malicious software traffic
classification algorithm based on a Convolutional Neural Network (CNN). By mapping the traffic
characteristics to pixels, a network traffic image is generated and used as the input of the CNN to
realize traffic classification. Staudemeyer et al. [13] proposed an intrusion detection algorithm
based on Long Short-Term Memory (LSTM), which detects DoS attacks and probe attacks with unique time
series in the KDD Cup99 dataset. Kwon et al. [18] carried out research on deep learning models,
focusing on data simplification, dimension reduction, classification, and other techniques, and
proposed a Fully Convolutional Network (FCN) model. A comparison with traditional machine learning
techniques shows that the FCN model is useful for network traffic analysis. Tama et al. [19] proposed
an anomaly-based IDS based on a two-stage meta-classifier, which uses a hybrid feature selection
method to obtain accurate feature representations. They evaluated the proposed method on the NSL-KDD
and UNSW-NB15 intrusion datasets and improved detection rates.

B. CLASS BALANCING METHODS
In the field of machine learning, the problem of category imbalance has always been a challenge.
Intrusion detection therefore also faces enormous challenges in network traffic with extremely
imbalanced categories, and many scholars have begun to study how to improve the intrusion recognition
accuracy of imbalanced network traffic data.

Piyasak proposed a method to improve the accuracy of minority classification [20]. This method
combines the Synthetic Minority Over-sampling Technique (SMOTE) and Complementary Neural Network
(CMTNN) to solve imbalanced data classification. Experiments on the UCI dataset show that the proposed
combination technique can improve class imbalance problems. Yan proposed an improved local adaptive
composite minority sampling algorithm (LA-SMOTE) to deal with the network traffic imbalance problem
and then detected network traffic anomalies with a GRU-based deep neural network [21]. Abdulhammed
et al. [22] dealt with the imbalanced dataset CIDDS-001 using data upsampling and downsampling methods
and evaluated the datasets with Deep Neural Network, Random Forest, Voting, Variational Autoencoder,
and Stacking machine learning classifiers. In their proposed method, the accuracy can reach 99.99%.

Recently, Chuang et al. [23] trained a deep autoencoder to establish a data generation model that
generates the reasonable data needed to form a balanced dataset. Their experiments show that
generating balanced datasets helps to deal with the overfitting caused by imbalanced data, and it can
prevent the trained model from misjudging new data types, including those not in the training dataset.
Bedi et al. [24] proposed a new type of IDS based on a Siamese Neural Network (Siamese-NN). The
proposed Siam-IDS can detect R2L and U2R attacks without using traditional class balancing techniques,
such as over-sampling and random under-sampling. The performance of Siam-IDS was compared with Deep
Neural Network (DNN) and CNN; Siam-IDS achieves a higher recall for the R2L and U2R attack categories
than these counterparts.

Most scholars use interpolation, oversampling, encoder-synthesized data, and other data augmentation
methods to balance the training set and achieve better experimental results. Although these methods
synthesize data close to the real data and effectively expand the minority class, the test data
distribution may exceed the range of the synthesized data, and the classifier cannot accurately
predict this distribution. We propose the DSSTE algorithm to mine the difficult samples in the
imbalanced training set, compress the majority class among them, and zoom in or out the minority
class's continuous attributes. This method reduces the imbalance and produces data that conforms to
the true distribution.

III. METHOD
Faced with imbalanced network traffic, we propose the Difficult Set Sampling Technique (DSSTE)
algorithm to compress the majority samples and augment the minority samples among the difficult
samples, reducing the imbalance in the training set so that the intrusion detection system can achieve
better classification accuracy. We use Random Forest, SVM, XGBoost, LSTM, Mini-VGGNet, and AlexNet as classifiers
for classification models.

We propose the intrusion detection model shown in Figure 1. Data pre-processing is performed first in
our intrusion detection structure, including duplicate, outlier, and missing value processing. Then,
the data are partitioned into a test set and a training set, and the training set is balanced using
our proposed DSSTE algorithm. Before modeling, to speed up convergence, we use StandardScaler to
standardize the data and digitize the sample labels. Finally, the processed training set is used to
train the classification model, and the model is then evaluated on the test set.

A. DSSTE ALGORITHM
In imbalanced network traffic, different traffic data types have similar representations; in
particular, minority attacks can hide among a large amount of normal traffic, making it difficult for
the classifier to learn the differences between them during the training process. Among the similar
samples of the imbalanced training set, the majority class is redundant noise data. Its number is much
larger than that of the minority class, making the classifier unable to learn the distribution of the
minority class, so we compress the majority class. The discrete attributes of the minority class
remain constant, and there are differences in continuous attributes. Therefore, the minority class's
continuous attributes are zoomed to produce data that conforms to the true distribution. On this
basis, we propose the DSSTE algorithm to reduce the imbalance.

First, the imbalanced training set is divided into a near-neighbor set and a far-neighbor set by the
Edited Nearest Neighbor (ENN) algorithm. The samples in the near-neighbor set are highly similar,
making it very difficult for the classifier to learn the differences between the categories, so we
refer to the samples in the near-neighbor set as difficult samples and those in the far-neighbor set
as easy samples. Next, the KMeans algorithm compresses the majority samples in the difficult set, and
we zoom in and out the continuous attributes of the minority samples in the difficult set. Finally,
the easy set, the compressed majority set, and the minority samples in the difficult set together with
their augmentation samples are combined into a new training set. We use the number of neighbors K in
the ENN algorithm as the scaling factor of the entire algorithm. When the scaling factor K increases,
the number of difficult samples increases, and the compression rate of the majority samples and the
synthesis rate of the minority class also increase. The DSSTE algorithm is written as Algorithm 1.

B. MACHINE LEARNING AND DEEP LEARNING ALGORITHMS
In the classifier's design, we use Random Forest, SVM, XGBoost, LSTM, AlexNet, and Mini-VGGNet to train and
test, which are detailed in the following part.

1) Random Forest
Leo Breiman proposed Random Forest in 2001 [25]. Random Forest is a supervised learning algorithm that
trains a model to predict the class of a sample from the characteristic attributes and class labels of
a given dataset. Random Forest is based on decision trees and adopts the Bagging (Bootstrap
aggregating) method to create different training sample sets. The random subspace strategy selects the
best attribute from a randomly selected subset of attributes to split internal nodes. The resulting
decision trees are used as weak classifiers, multiple weak classifiers form a robust classifier, and a
voting mechanism is used to classify the input samples. After a random forest has built a large number
of decision trees according to a certain random rule, when a new set of samples is input, each
decision tree in the forest makes a prediction on this set of samples separately, and the predictions
of all trees are integrated to obtain the final result.

2) Support Vector Machine
Cortes and Vapnik first proposed the Support Vector Machine (SVM) in 1995 [26]. It shows many unique
advantages in small-sample, nonlinear, and high-dimensional pattern recognition and can be extended to
other machine learning problems such as function fitting [27]. Before the rise of deep learning, SVM
was considered the most successful and best-performing machine learning method in recent decades. The
SVM method is based on the Vapnik-Chervonenkis (VC) dimension theory of statistical learning theory
and the principle of structural risk minimization. Its basic idea is to find a separating hyperplane
between different categories so that the categories can be better separated. In the SVM method, only
the sample points closest to the hyperplane matter when deciding the separating hyperplane: once these
support vectors are found, the hyperplane is determined.

3) XGBoost
XGBoost is a parallel regression tree model that combines the idea of Boosting; it was developed by
Tianqi Chen as an improvement on the gradient boosting decision tree [28]. Compared with the GBDT
(Gradient Boosting Decision Tree) model, XGBoost overcomes the limited calculation speed and accuracy.
XGBoost adds regularization to the original GBDT loss function to prevent the model from overfitting.
The traditional GBDT performs a first-order Taylor expansion on the loss function and takes the
negative gradient as the residual of the current model. In contrast, XGBoost performs a second-order
Taylor expansion to ensure the accuracy of the model. Moreover, XGBoost blocks and sorts each feature,
making it possible to parallelize the calculation when looking for the best split point, which
significantly accelerates the calculation [29].

4) Long short-term memory
The Long Short-Term Memory (LSTM) network is a Recurrent Neural Network (RNN) structure proposed by
Hochreiter and Schmidhuber in 1997 [30]. Like most RNNs, the LSTM network is universal: given a
suitable weight matrix, it can compute anything that a conventional computer can compute. Unlike the
traditional RNN, the LSTM network is well suited to learning from experience to classify, process, and
predict time series when there are time lags of unknown size and bound between important events. LSTM
is not sensitive to gap length and has advantages over other RNNs, hidden Markov models, and other
sequence learning methods in many applications [31]. The problems of vanishing and exploding gradients
are addressed by introducing the gate structure and the memory cell.

5) AlexNet
AlexNet is one of the classic basic networks of deep learning. It was proposed by Hinton and his
student Alex Krizhevsky in 2012 [32]. Its main structure is an 8-layer deep neural network, including
5 convolutional layers and 3 fully connected layers; the activation and pooling layers are not
counted. The ReLU function is used as the activation function in the AlexNet convolutional layers,
instead of the Sigmoid function widely used in previous networks. The introduction of the ReLU
function alleviates the vanishing gradient problem when the neural network is deep. AlexNet uses max
pooling to downsample the feature map output by the convolutional layer, instead of the average
pooling commonly used before. Therefore, AlexNet performs better than previous neural networks.

6) Mini-VGGNet
In 2014, researchers from the Visual Geometry Group of Oxford University and Google DeepMind jointly
developed a new deep convolutional neural network, VGGNet, and won second place in the ILSVRC2014
classification task. Their paper "Very Deep Convolutional Networks for Large-Scale Image Recognition"
mainly focuses on the influence of the depth of convolutional neural networks on the recognition
accuracy of large-scale image sets [33]. The main contribution is the use of a small convolution
kernel (3×3) to construct convolutional neural network structures of various depths. It evaluated
these network structures and showed that a depth of 16-19 layers achieves better recognition accuracy.
VGG-16 and VGG-19 are commonly used to extract image features. VGG can be regarded as a deepened
version of AlexNet. The entire network is a stack of convolutional layers and fully connected layers.
Unlike AlexNet, VGGNet uses a small-sized convolution kernel (3×3).

In this experiment, because we have fewer traffic characteristics, we used the Mini-VGGNet (miniVGG)
network mentioned by Ismail for classification experiments [34].

TABLE 1. Description of the NSL-KDD features.
Attributes    Description
1–9           Basic features of network connections
10–22         Content-related traffic features
23–31         Time-related traffic features
32–41         Host-based traffic features

dataset contains six different attack scenarios: Brute Force, Botnet, DoS, DDoS, Web Attacks, and
Infiltration. Each sample in CSE-CIC-IDS2018 includes 83 features listed in Table 2.

TABLE 2. Description of the CSE-CIC-IDS2018 features.
abnormal traffic is much smaller than the normal traffic. The specific results are shown in Table 3.

TABLE 4. Development environment.

B. DATA PREPROCESSING
When the dataset is extracted, part of the data contains noisy data, duplicate values, missing values,
infinity values, etc. due to extraction errors or input errors. Therefore, we first perform data
preprocessing. The main work is as follows.
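
A minimal sketch of such cleaning, followed by the standardization and label digitization mentioned in
Section III, might look as follows. The row layout (a feature list plus a label), the choice to drop
rather than repair bad rows, and the function names are assumptions; the z-score is the transformation
a standard scaler applies.

```python
import math
from statistics import fmean, pstdev


def clean_rows(rows):
    """Drop rows with missing or infinite/NaN values, then exact duplicates.
    Each row is assumed to be a (features, label) pair."""
    seen, kept = set(), []
    for features, label in rows:
        if any(v is None or not math.isfinite(v) for v in features):
            continue
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            kept.append((list(features), label))
    return kept


def standardize(X):
    """Column-wise z-score: (x - mean) / std, population standard deviation."""
    cols = list(zip(*X))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def encode_labels(labels):
    """Map class names to integers in sorted order."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping
```

In a train/test setting the means and standard deviations would normally be estimated on the training
set only and then reused for the test set; the sketch omits that split for brevity.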
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$
$$\mathrm{F1\_Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$

E. EXPERIMENTAL RESULTS
In our experiments, we first explored the classifiers' performance on the training set treated with
different scaling factors. The proposed DSSTE algorithm has one parameter, the scaling factor K. When
K increases within a certain range, the number of difficult samples also increases, but when K exceeds
this range, the number of difficult samples remains constant. However, the majority compression and
the minority augmentation in the difficult samples still increase with K. Therefore, to ensure that
the data sampling is useful and does not generate excessive noise, and that the DSSTE algorithm
achieves the best sampling results, we experimented with different scaling factors.

We processed the training sets of NSL-KDD and CSE-CIC-IDS2018 using different scaling factors K. We
performed experiments on the six classifiers, and performance was evaluated using the average F1-Score
of each classifier, as shown in Figure 3.

In NSL-KDD, the classifiers achieve excellent average performance at K=50. In CSE-CIC-IDS2018, the
classifiers achieve excellent average performance at K=10. Therefore, based on the average F1-Score,
in NSL-KDD we used the scaling factor K=50, where the difficult samples in Normal, DoS, and Probe were
compressed, and the difficult samples in R2L and U2R were augmented. In CSE-CIC-IDS2018, we used the
scaling factor K=10 and treated the difficult samples in the same way as for NSL-KDD. The new training
set after the treatment is shown in Table 7.

Table 8 summarizes the comparison between DSSTE and other sampling methods; our proposed DSSTE
algorithm outperforms the other methods on NSL-KDD and CSE-CIC-IDS2018.

In the experimental results for the NSL-KDD dataset, LSTM achieved the highest accuracy of 78.24% and
the highest F1-Score of 75.03% on the original training set. With the training set sampled by the RUS
algorithm, XGBoost achieved the highest accuracy of 78.79%, and miniVGGNet achieved the highest
F1-Score of 75.57%. With the ROS algorithm, LSTM achieved the highest accuracy of 78.72% and the
highest F1-Score of 75.82%. With the SMOTE algorithm, AlexNet achieved the highest accuracy of 78.75%
and the highest F1-Score of 77.27%. With the training set sampled by the DSSTE algorithm proposed in
this paper, AlexNet achieved the highest accuracy of 82.84% and the highest F1-Score of 81.66%.

In the experimental results of the CSE-CIC-IDS2018
TABLE 7. The new training set class distribution processed by the DSSTE algorithm.
dataset, random forest achieves the highest accuracy of 94.89% and the highest F1-Score of 94.81% on
the unprocessed training set. After the RUS, ROS, and SMOTE algorithms sampled the training set,
random forest again achieved the highest accuracy and F1-Score. However, the performance improvement
was very small, or performance was even lower than on the original dataset. With the training set
sampled by the DSSTE algorithm proposed in this paper, miniVGGNet achieves the highest accuracy of
96.99% and the highest F1-Score of 97.04%. However, the accuracy and F1-Score of random forest are
very close to these values. Random forest exhibits the generalization capability of ensemble learning
when used in combination with each sampling algorithm, and it requires fewer hardware resources.

As shown in Figure 4, we computed the average accuracy and F1-Score of the classifiers for each
sampling method. In the NSL-KDD dataset, the performance with the RUS, ROS, and SMOTE sampling
algorithms is improved
FIGURE 4. Comparison of the performance of different sampling methods (Accuracy and F1-Score are the average of each classifier).
TABLE 8. Comparison results between DSSTE and different methods (Acc, Pre, Recall, and F1-Score are the
average of multiple classes, weighted by the number of samples in each class).
Dataset: NSL-KDD (first four value columns), CSE-CIC-IDS2018 (last four value columns)
Method Acc Pre Recall F1 Acc Pre Recall F1
RF 0.7434 0.8137 0.7434 0.7015 0.9489 0.9481 0.9489 0.9481
SVM 0.7366 0.7384 0.7366 0.6966 0.9225 0.9261 0.9225 0.9126
XGBoost 0.7715 0.8107 0.7715 0.7365 0.9398 0.9449 0.9398 0.9340
LSTM 0.7824 0.7838 0.7823 0.7503 0.9375 0.9444 0.9370 0.9313
AlexNet 0.7618 0.8050 0.7611 0.7194 0.9376 0.9440 0.9369 0.9313
miniVGGNet 0.7605 0.8066 0.7594 0.7303 0.9388 0.9450 0.9384 0.9326
RUS + RF 0.7655 0.8220 0.7655 0.7304 0.9419 0.9454 0.9419 0.9428
RUS + SVM 0.7362 0.7510 0.7362 0.7058 0.8902 0.9087 0.8902 0.8926
RUS + XGBoost 0.7879 0.8177 0.7879 0.7468 0.9212 0.9362 0.9212 0.9234
RUS + LSTM 0.7705 0.7970 0.7704 0.7462 0.9150 0.9294 0.9139 0.9171
RUS + AlexNet 0.7834 0.8250 0.7814 0.7537 0.9294 0.9308 0.9268 0.9280
RUS + miniVGGNet 0.7827 0.8134 0.7826 0.7557 0.9337 0.9330 0.9329 0.9319
ROS + RF 0.7515 0.8125 0.7515 0.7066 0.9492 0.9484 0.9492 0.9483
ROS + SVM 0.7493 0.8005 0.7493 0.7300 0.9165 0.9311 0.9165 0.9073
ROS + XGBoost 0.7809 0.8196 0.7809 0.7532 0.9385 0.9448 0.9385 0.9320
ROS + LSTM 0.7872 0.8289 0.7866 0.7582 0.9348 0.9409 0.9346 0.9293
ROS + AlexNet 0.7850 0.8057 0.7849 0.7451 0.9299 0.9387 0.9295 0.9262
ROS + miniVGGNet 0.7626 0.7893 0.7626 0.7332 0.9362 0.9406 0.9359 0.9316
SMOTE + RF 0.7409 0.8070 0.7409 0.6977 0.9488 0.9481 0.9488 0.9480
SMOTE + SVM 0.7467 0.7987 0.7467 0.7275 0.9155 0.9302 0.9155 0.9062
SMOTE + XGBoost 0.7744 0.8142 0.7744 0.7421 0.9381 0.9449 0.9381 0.9318
SMOTE + LSTM 0.7509 0.7976 0.7508 0.7239 0.9345 0.9431 0.9344 0.9278
SMOTE + AlexNet 0.7875 0.8256 0.7851 0.7727 0.9324 0.9423 0.9308 0.9287
SMOTE + miniVGGNet 0.7651 0.7698 0.7646 0.7435 0.9366 0.9423 0.9366 0.9317
DSSTE + RF 0.8050 0.8468 0.8050 0.7863 0.9692 0.9739 0.9692 0.9698
DSSTE + SVM 0.7759 0.8076 0.7759 0.7658 0.9488 0.9497 0.9488 0.9463
DSSTE + XGBoost 0.8013 0.8349 0.8013 0.7761 0.9602 0.9641 0.9602 0.9611
DSSTE + LSTM 0.8178 0.8271 0.8177 0.8098 0.9638 0.9711 0.9636 0.9650
DSSTE + AlexNet 0.8284 0.8394 0.8278 0.8166 0.9653 0.9709 0.9625 0.9649
DSSTE + miniVGGNet 0.8127 0.8268 0.8132 0.8057 0.9699 0.9746 0.9697 0.9704
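
The class-weighted averaging named in the caption of Table 8 can be sketched as below. The function
name and the toy labels in the usage note are hypothetical. Each per-class precision, recall, and
F1-Score is weighted by the number of true samples of that class; under this weighting the recall
equals the accuracy, which is consistent with the matching Acc and Recall columns in the RF, SVM, and
XGBoost rows.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1-Score."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = n - tp
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n / total             # class weight = share of true samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return acc, precision, recall, f1
```

For example, `weighted_scores(["N", "N", "N", "N", "A", "A"], ["N", "N", "N", "A", "A", "N"])` gives an
accuracy and a weighted recall of 4/6, since the weighted recall sums the per-class true positives
and divides by the total number of samples.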
compared to the original training set. In terms of accuracy and F1-Score, the improvement is very
slight. The proposed DSSTE algorithm brings a significant improvement: the average accuracy improves
by 4.75 percentage points, and the average F1-Score improves by 7.1 percentage points. In the
CSE-CIC-IDS2018 dataset, performance gains are very slight, or performance even degrades, after using
the RUS, ROS, and SMOTE sampling algorithms. With the training set sampled by the DSSTE algorithm
proposed in this paper, the average accuracy improves by 2.54 percentage points, and the average
F1-Score improves by 3.13 percentage points.

The F1-Score is the harmonic mean of precision and recall, which is a good indicator of a
classification model's performance. We therefore adopt F1-Score and accuracy as the metrics to compare
against the methods proposed by other authors for imbalanced network traffic. As shown in Table 9, our
proposed data sampling method DSSTE has a higher accuracy than other methods on KDDTest+. The F1-Score
is very close to that of AESMOTE, which exhibits the advantage of reinforcement learning for automatic
pairwise sequence learning, but reinforcement learning training requires a lot of time to build the
model. Therefore, our proposed method is more generalizable to imbalanced network traffic.

CSE-CIC-IDS2018 is a large and redundant dataset, and data are selected and processed differently by
different scholars. Therefore, we do not compare with other authors on the CSE-CIC-IDS2018 dataset.
In our experiments, we can see
that the DSSTE method is significantly better than other sampling algorithms. As shown in Figure 5,
DSSTE+AlexNet exhibits excellent performance on the CSE-CIC-IDS2018 dataset. It achieves close to a
100% detection rate for some attacks and also improves the identification of Brute Force and
Infiltration attacks.

TABLE 9. Comparison results of DSSTE with the existing approaches on
Method            Year   Acc      F1-Score
DSSTE+AlexNet     2020   0.8284   0.8166
AESMOTE [41]      2020   0.8209   0.8243
I-SiamIDS [42]    2020   0.8000   0.6834
AE-RL [43]        2019   0.8016   0.794
ADASYN [44]       2019   0.7897   /
WGAN-GP [45]      2019   0.8080   /
ID-CAVE [46]      2017   0.8010   0.7908
SMOTE-EUS [15]    2016   0.7910   0.7576

To sum up, traditional sampling methods reduce the imbalance in the training set and synthesize data
close to the real data, but they do not produce a distribution that matches the real data. The RUS
algorithm leads to loss of valid information; the ROS algorithm leads to data redundancy and
overfitting. At the same time, SMOTE interpolation generates noise traffic and data overlap,
increasing the number of difficult samples in the training set. Our proposed DSSTE algorithm
compresses and augments the difficult data of an imbalanced training set in a targeted way. It enables
the classifier to grasp more of the data distribution, thus improving the classification performance.

As network intrusion continues to evolve, the pressure on network intrusion detection is also
increasing. In particular, the problems caused by imbalanced network traffic make
