This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3211346
Abstract—As communication technology advances, diverse and heterogeneous data are communicated in distributed environments through network systems. Meanwhile, along with the development of communication technology, the attack surface has expanded, and concerns regarding network security have increased. Accordingly, to deal with potential threats, research on Network Intrusion Detection Systems (NIDS) has been actively conducted. Among the various NIDS technologies, interest has recently focused on artificial intelligence (AI)-based anomaly detection systems, and various models have been proposed to improve the performance of NIDS. However, the problem of data imbalance persists: AI models cannot sufficiently learn malicious behavior and thus fail to detect network threats accurately. In this study, we propose a novel AI-based network intrusion detection system that efficiently resolves the data imbalance problem and improves on the performance of previous systems. To address this problem, we leveraged a state-of-the-art generative model that can generate plausible synthetic data for minority attack traffic. In particular, we focused on reconstruction error- and Wasserstein distance-based generative adversarial networks and on autoencoder-driven deep learning models. To demonstrate the effectiveness of our system, we performed comprehensive evaluations over various datasets and demonstrated that the proposed systems significantly outperformed previous AI-based NIDS.

Index Terms—Network intrusion detection system, anomaly detection, network security, generative adversarial network.

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2020-0-00952, Development of 5G Edge Security Technology for Ensuring 5G+ Service Stability and Availability). (Corresponding author: Cheolhee Park.)
C. Park, J. Lee, Y. Kim, JG. Park, H. Kim are with the Electronics and Telecommunications Research Institute, Daejeon 34129, South Korea (e-mail: chpark0528@etri.re.kr; mine@etri.re.kr; blitzkrieg@etri.re.kr; queue@etri.re.kr; be.successor@etri.re.kr).
D. Hong is with the Department of Applied Mathematics, Kongju National University, Gongju 32588, South Korea (e-mail: dwhong@kongju.ac.kr).
"Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org."
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Internet of Things Journal.

I. INTRODUCTION

With the development of fifth-generation (5G) mobile communication technology, which diversifies access environments and enables distributed networks, diverse and heterogeneous data are communicated through network systems. In general, these data originate from diverse domains such as sensors, computers, and the Internet of Things (IoT), and the capacity of network systems has been expanded to process these data reliably. However, as access points diversify, the attack surface expands, leaving network systems vulnerable to potential threats. Moreover, cyber-attack techniques have become more complex and sophisticated, and the frequency of attacks has also increased. Accordingly, the importance of cybersecurity is emphasized, and various studies have been actively conducted to prevent potential network threats.

One of the fundamental challenges in cybersecurity is the detection of network threats, and various results have been reported in the field of network intrusion detection systems (NIDS). In particular, the most recent studies have focused on applying artificial intelligence (AI) technology to NIDS, and AI-based intrusion detection systems have achieved remarkable performance. Initially, research primarily focused on applying traditional machine learning models such as decision trees (DT) [1] and support vector machines (SVMs) [2] to existing intrusion detection systems, and it has since been extended to deep learning approaches [3] such as convolutional neural networks (CNNs), long short-term memory (LSTM), and autoencoders. Although these approaches have achieved remarkable performance in detecting anomalies, limitations remain in deploying them in real systems. In general, most network flow data is normal traffic, and malicious behavior that can cause service failure occurs rarely. Moreover, within the category of malicious behavior, most of the data are well-known attacks, and specific types of attacks are extremely rare. Due to this data imbalance problem, AI models deployed in NIDS cannot sufficiently learn the characteristics of specific network threats, and the resulting poor detection performance may leave network systems vulnerable to these attacks.

In this study, to address this inherent problem, we propose a novel AI-based network intrusion detection system that can resolve the data imbalance problem and improve the performance of previous systems. To this end, we leveraged a state-of-the-art deep learning architecture, generative adversarial networks (GAN) [4], to generate synthetic network traffic data. In particular, we focused on the reconstruction error and Wasserstein distance-based GAN architecture [5], which can generate plausible synthetic data for rare (minority-class) attack traffic. By combining the generative model with anomaly detection models, we demonstrated that the proposed systems outperformed previous results in terms of classification performance.

The entire architecture of our system consists of four main stages (see Fig. 1): pre-processing, generative model training, autoencoder training, and predictive model training. In the pre-processing stage, the system refines the raw dataset into a format that deep learning models can learn from. After pre-processing, the system sequentially trains generative models and an autoencoder model, where the trained generative models are utilized to train the autoencoder model. Finally, the system trains predictive models using the trained generative models and the encoder of the trained autoencoder: the generative models generate scarce data, and the encoder serves as a feature extractor. For the classifier models, we consider three deep learning models that have been widely utilized in AI-based NIDS: deep neural networks (DNN), convolutional neural networks (CNN), and long short-term memory (LSTM) models. To evaluate our system, we experimented with four network flow datasets covering different scenarios: NSL-KDD [6][7], UNSW-NB15 [8], the IoT dataset [9], and a real-world dataset. Through experiments on these datasets, we show that the proposed system outperformed previous results. Moreover, we demonstrate that our methodology can improve the performance of existing AI-based NIDS by resolving the data imbalance problem.

The main contributions of the proposed approach can be summarized as follows:
• By combining a state-of-the-art GAN model that can generate plausible synthetic data and measure the convergence of training, we show that the proposed system outperforms existing AI-based NIDS in terms of detection rate.
• Through comparative experiments with various deep learning models, we show that the detection performance for rare attacks can be improved by applying our methodology as a base module.
• By experimenting with datasets collected from various scenarios, we show that the proposed system can be effectively applied to real-world environments.

The rest of this paper is organized as follows. Section 2 briefly reviews related research on NIDS based on machine learning and deep learning approaches, and Section 3 provides background with a focus on autoencoders and generative adversarial networks. In Section 4, we describe our methodology and the proposed framework, including the four main stages, in detail. In Section 5, we evaluate the proposed system in various environments and present experimental results with detailed analysis. Finally, we present concluding remarks and future work directions in Section 6.

II. RELATED WORK

In the field of AI-based network intrusion detection systems, many studies have applied machine learning and deep learning technologies to anomaly detection. Ingre and Yadav [10] proposed a multi-layer perceptron-based intrusion detection system and showed that the proposed approach achieved 81% and 79.9% accuracy on the NSL-KDD dataset for binary and multi-class classification, respectively. Gao et al. [11] proposed a semi-supervised learning approach for network intrusion detection systems based on fuzzy and ensemble learning and reported that the proposed system achieved 84.54% accuracy on the NSL-KDD dataset. By applying the deep belief network (DBN) model, Alrawashdeh et al. [12] developed an anomaly intrusion detection system and showed that the proposed DBN-based IDS exhibited
superior classification performance on sub-sampled testing sets (subsets sampled from the original dataset). Considering the Software Defined Networking environment, Tang et al. [13] proposed a deep neural network-based anomaly detection system and reported that the DNN-based approach outperformed traditional machine learning approaches (e.g., Naïve Bayes, SVM, and Decision Tree). In [14], the authors proposed a restricted Boltzmann machine (RBM)-based intrusion detection system and showed that the Gaussian–Bernoulli RBM model outperformed other RBM-based models (such as the Bernoulli–Bernoulli RBM and DBN). Utilizing both behavioral features (network traffic characteristics) and content features (payload information), Zhong et al. [15] introduced a big data and tree architecture-driven deep learning system for intrusion detection, in which the authors combined shallow and deep learning strategies, and showed that the system is particularly effective at detecting subtle patterns of intrusion attacks. Using an ensemble-like approach, Haghighat et al. [16] proposed an intrusion detection system based on deep learning and voting mechanisms. In [16], the authors aggregated the best model results and showed that the system provides more accurate detections. Moreover, they showed that false alarms can be reduced by up to 75% compared to conventional deep learning approaches. Considering data streams in industrial IoT environments, Yang et al. [17] proposed a tree structure-based anomaly detection system, in which the authors incorporated window sliding, detection strategy changing, and model updating mechanisms into the locality-sensitive hashing-based iForest model [18][19] to handle unbounded data streams in real-time scenarios. Similarly, Qi et al. [20] proposed an intrusion detection system for multiaspect data streams by combining locality-sensitive hashing, isolation forest, and principal component analysis techniques. In [20], the authors showed that the proposed system can effectively detect group anomalies in multiaspect data and processes each data row faster than previous approaches.

For time-series data, several results have been reported on recurrent models. Kim et al. [21] proposed an LSTM-based IDS model and demonstrated its efficiency. Yin et al. [22] proposed a recurrent neural network-based intrusion detection system that achieved 83.3% and 81.3% accuracy in binary and multi-class classification, respectively. Xu et al. [23] developed a recurrent neural network-based intrusion detection model and reported that the gated recurrent unit was more suitable than the LSTM unit as a memory unit for intrusion detection. Considering supervisory control and data acquisition (SCADA) networks, Gao et al. [24] proposed an omni-intrusion detection system. In [24], the authors combined LSTM and a feedforward neural network through an ensemble approach and showed that the proposed system can effectively detect intrusion attacks regardless of temporal correlation. Moreover, through experiments on a SCADA testbed, they demonstrated that the proposed omni-IDS outperformed previous deep learning approaches.

In addition to approaches that apply supervised learning for anomaly detection, several studies have focused on unsupervised learning, especially autoencoder models. Javaid et al. [25] proposed a sparse autoencoder-based NIDS and reported that the proposed model achieved 79.1% accuracy for multi-class classification on the NSL-KDD dataset. Similarly, Yan and Han [26] leveraged the sparse autoencoder model to extract high-level feature representations of intrusive behavior and demonstrated that the stacked sparse autoencoder model could serve as an efficient feature extraction method. Shone et al. [27] proposed a stacked non-symmetric deep autoencoder-based intrusion detection system. In [27], the authors showed that the proposed model could achieve 85.42% accuracy in multi-class classification. As one of the significant results, Ieracitano et al. [28] proposed an autoencoder-driven intrusion detection model. In [28], the authors proposed autoencoder-based and LSTM-based IDS models and compared their performance with conventional machine learning models. Through experiments on the NSL-KDD dataset, they reported that the proposed autoencoder-based systems outperformed other models and achieved 84.21% and 87% accuracy for binary and multi-class classification, respectively.

As another application of unsupervised learning, several studies have investigated using generative models to improve the performance of existing NIDS. In particular, they have focused on applying the basic generative adversarial networks (GAN) [4], which are based on the Jensen-Shannon divergence (or Kullback-Leibler divergence) [29][30][31]. Subsequently, along with the development of various GAN models, studies have applied GAN models suited to specific purposes. Li et al. [32] and Lee et al. [33] utilized the Wasserstein divergence-based GAN model to generate synthetic data, and Dlamini et al. [34] proposed a conditional GAN-based anomaly detection model to improve classification performance on the minority classes. Focusing on specific industrial environments, Li et al. [35] and Alabugin et al. [36] proposed LSTM-GAN and bidirectional GAN-based anomaly detection models, respectively. Through experiments on the Secure Water Treatment (SWaT) dataset, they demonstrated that GAN models could be effectively applied to IDS. Siniosoglou et al. [37] proposed an anomaly detection model that could simultaneously detect anomalies and categorize the attack types. In [37], the authors encapsulated the autoencoder architecture into the structure of the basic GAN model (i.e., deploying the encoder as a discriminator and the decoder as a generator) and demonstrated the efficiency of the proposed model in various smart grid environments.

Unlike previous GAN approaches that are based on the distance between data distributions, we considered a reconstruction error-based GAN model to generate more plausible synthetic data. In particular, we leveraged the Boundary Equilibrium GAN (BEGAN) model [5], which is based on the concept of autoencoders and the Wasserstein distance between the reconstruction error distributions of real and synthetic samples. Moreover, we incorporated the autoencoder model into the detection models to extract meaningful features from the data and extend adaptability, and we demonstrated that the proposed framework outperforms previous AI-based network
based on the Wasserstein distance between reconstruction error distributions as follows:

$$
\begin{aligned}
L_D &= L(x;\theta_D) - k_t \cdot L(G(z;\theta_G);\theta_D) \\
L_G &= L(G(z;\theta_G);\theta_D) \\
k_{t+1} &= k_t + \lambda_k \cdot \bigl(\gamma \cdot L(x;\theta_D) - L(G(z;\theta_G);\theta_D)\bigr)
\end{aligned}
\tag{5}
$$

where the hyper-parameter γ ∈ [0, 1] is the diversity ratio¹ and λ_k serves as the learning rate for k. Note that L(·) denotes the reconstruction error of the autoencoder, and t indicates the iteration step.

IV. PROPOSED METHODOLOGY

As shown in Fig. 1, the entire architecture of the proposed AI-based NIDS consists of four main stages: pre-processing, generative model training, autoencoder training, and predictive model training. In this section, we describe the proposed methodology and each module (process) in detail.

A. Pre-processing

Before building and training AI models, the system refines a given raw dataset via the pre-processing module, which consists of three sub-processes: outlier analysis, one-hot encoding, and feature scaling.

In the outlier analysis phase, the system eliminates outliers, which can negatively affect model training. Typically, outliers are detected by quantifying the statistical distribution of the dataset via robust measures of scale, such as the interquartile range (IQR) and the median absolute deviation (MAD). Among these measures, we leveraged the MAD. For a numeric attribute A = {x_1, x_2, ..., x_n}, the MAD of the attribute is defined as follows:

$$
\mathrm{MAD} = \operatorname{median}\bigl(\lvert x_i - \operatorname{median}(A) \rvert\bigr).
\tag{6}
$$

We assume that the numeric attributes appearing in the dataset follow a normal distribution. Then, a consistent estimator σ̂ of the standard deviation is 1.4826 × MAD. Based on this estimator, values of a given numeric attribute that exceed 10 × σ̂ are regarded as outliers. Outlier analysis is performed only on the numeric attributes and is conducted independently for each class. Note that outlier removal should be performed before feature scaling, since scaling can obscure information about outliers.

After filtering out the outliers, the system transforms nominal attributes into one-hot vectors. Each nominal (categorical) attribute is represented as a binary vector whose length equals the number of attribute values, where 1 is assigned only to the position corresponding to the observed value and 0 to all others. For example, the 'protocol' attribute (commonly included in network traffic data), with the values tcp, udp, and icmp, is transformed into a binary vector of length 3, and these values are converted into [1,0,0], [0,1,0], and [0,0,1], respectively. Together with the one-hot encoding process, the system scales the numeric attributes. In general, normalization (e.g., [28]) and standardization (e.g., [24]) are common choices for scaling numeric features. Of these two approaches, we adopted min-max normalization.² The normalization function f_A(·) for a numeric attribute A, which maps every x ∈ A into the range [0, 1], is defined as follows:

$$
f_A(x_i) = \tilde{x}_i = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j},
\tag{7}
$$

where x_i denotes the i-th value of attribute A, and the minimum and maximum are taken over all values x_j of A.

In general, existing deep learning-based approaches apply feature extraction (e.g., principal component analysis or the Pearson correlation coefficient) at this step to feed the model as many informative features as possible; consequently, feature extraction can significantly impact the performance of anomaly detection models. However, we do not apply a separate computational feature extraction process, as our framework embeds an autoencoder model that takes over this role. Note that, in our framework, a model with a computational feature extraction process did not show significant improvement over the model without it. A detailed description of deploying the autoencoder as a feature extractor is presented later.

B. Synthetic data generation with generative model

The synthetic data generation module builds and trains generative models using the dataset refined in the pre-processing module. As the generative model, we utilize a state-of-the-art GAN model, BEGAN, which is based on the concept of autoencoders and a reconstruction error-based objective function. For the model architecture, we built the discriminator as a symmetric autoencoder with five layers and the generator with the same architecture as the decoder of the discriminator (autoencoder). Figure 4 illustrates the entire architecture of the BEGAN model. Before training the BEGAN model, the system first splits the given dataset by class and then builds a generative model for each resulting sub-dataset. That is, the number of generative models equals the number of classes, and (after training) each generative model produces only synthetic data corresponding to its particular class.

One of the important factors to consider when applying GAN models to NIDS is the termination criterion for training, which has a significant impact on anomaly detection performance because it directly determines the quality of the synthetic data used to train the detection model. Determining the termination criterion requires tracking training convergence, which is difficult because the objective function of GAN models is formulated as a zero-sum game. In general, training progress has been monitored indirectly through visual inspection of synthetic (generated) data. However, even this approach is not feasible

¹ Originally, the diversity ratio γ is defined as $\gamma = \mathbb{E}[L(G(z))] / \mathbb{E}[L(x)]$.
² In our experiments, there was no significant difference between the two feature scaling methods in terms of the performance of detection models.
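The MAD-based outlier filter from Section IV-A (Eq. (6), the 1.4826 scale factor, the 10 × σ̂ threshold, numeric attributes only, applied per class) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: we assume that "values exceeding 10 × σ̂" means an absolute deviation from the attribute's per-class median larger than 10σ̂, that a row is dropped when any of its numeric attributes is flagged, and that attributes whose MAD is zero are left unfiltered. The dict-per-flow row format, the "label" key, and the function names are hypothetical.

```python
from statistics import median

MAD_TO_SIGMA = 1.4826  # consistent estimator of sigma for normally distributed data
THRESHOLD = 10.0       # outlier cut-off in units of sigma-hat, as stated in the text


def mad(values):
    """Median absolute deviation of a numeric attribute, Eq. (6)."""
    m = median(values)
    return median(abs(x - m) for x in values)


def remove_outliers(rows, numeric_attrs, label_key="label", threshold=THRESHOLD):
    """Drop rows with a numeric value more than threshold * sigma_hat away
    from the per-class median. Each class is processed independently."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    kept = []
    for class_rows in by_class.values():
        # Per-class (median, limit) pair for every numeric attribute.
        limits = {}
        for attr in numeric_attrs:
            column = [r[attr] for r in class_rows]
            sigma_hat = MAD_TO_SIGMA * mad(column)
            if sigma_hat > 0:  # assumption: skip attributes with zero MAD
                limits[attr] = (median(column), threshold * sigma_hat)
        # Keep a row only if none of its numeric attributes is flagged.
        for r in class_rows:
            if all(abs(r[a] - med) <= limit for a, (med, limit) in limits.items()):
                kept.append(r)
    return kept


normal = [{"duration": d, "label": "normal"} for d in [1, 2, 2, 3, 3, 4, 500]]
attack = [{"duration": d, "label": "dos"} for d in [1000, 1001, 1002]]
cleaned = remove_outliers(normal + attack, ["duration"])
print(len(cleaned))  # 9: only the normal row with duration 500 is dropped
```

Because the median and MAD are computed per class, a value that is extreme for normal traffic can still be kept for an attack class in which such values are typical, which is the point of running the analysis independently for each class.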
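The one-hot encoding and min-max normalization steps of Section IV-A (Eq. (7)) can be illustrated with the plain-Python sketch below, which reproduces the 'protocol' example from the text. The function names, the first-appearance ordering of categories, and the choice to map a constant attribute to 0.0 (where Eq. (7) would divide by zero) are assumptions made for illustration.

```python
def one_hot(values):
    """One-hot encode a nominal attribute; categories are ordered by first appearance."""
    categories = list(dict.fromkeys(values))
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[position[v]] = 1  # 1 only at the position of the observed value
        vectors.append(vec)
    return categories, vectors


def min_max(values, lo=None, hi=None):
    """Min-max normalization, Eq. (7). lo/hi default to the observed min/max;
    pass training-set statistics to scale a test split consistently."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:  # assumption: a constant attribute is mapped to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


cats, vecs = one_hot(["tcp", "udp", "icmp", "tcp"])
print(cats)  # ['tcp', 'udp', 'icmp']
print(vecs)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(min_max([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be computed on the training split and reused for the test split, so test values may fall slightly outside [0, 1]; whether to clip them is a design choice that Eq. (7) does not settle.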
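The scalar updates in Eq. (5) can be sketched as follows, assuming that the batch-averaged reconstruction errors L(x; θ_D) and L(G(z; θ_G); θ_D) have already been computed by the discriminator autoencoder in some deep learning framework (that part is omitted). Clipping k_t to [0, 1] and the convergence measure M_global = L(x) + |γL(x) − L(G(z))| come from the original BEGAN formulation [5]; using M_global as a stopping signal here is an assumption for illustration, and the default values of γ and λ_k are illustrative rather than the authors' settings.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """Scalar bookkeeping for one BEGAN iteration, following Eq. (5).

    loss_real: L(x; theta_D), autoencoder reconstruction error on real samples
    loss_fake: L(G(z; theta_G); theta_D), reconstruction error on generated samples
    k:         current balance term k_t
    Returns (L_D, L_G, k_{t+1}, M_global).
    """
    loss_d = loss_real - k * loss_fake
    loss_g = loss_fake
    k_next = k + lambda_k * (gamma * loss_real - loss_fake)
    k_next = min(max(k_next, 0.0), 1.0)  # the original BEGAN keeps k_t in [0, 1]
    # Convergence measure from the original BEGAN paper:
    # M_global = L(x) + |gamma * L(x) - L(G(z))|; lower values indicate better convergence.
    m_global = loss_real + abs(gamma * loss_real - loss_fake)
    return loss_d, loss_g, k_next, m_global


loss_d, loss_g, k_next, m_global = began_step(1.0, 0.2, k=0.5, gamma=0.5, lambda_k=0.1)
print(loss_d, loss_g, k_next, m_global)
```

One plausible use, again an assumption, is to record M_global per epoch for each class-specific generator and stop training once it plateaus, which gives a numeric alternative to the visual inspection mentioned above.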
Fig. 5. The basic architecture of generative adversarial networks.
Fig. 6. Structure of LSTM cell.

1D-convolutional layers and one fully connected layer. For LSTM, we designed the model with two recurrent layers of LSTM units and a fully connected layer, as shown in Figure 6. LSTM is known to be particularly effective in analyzing temporally correlated features [24]. Taking these characteristics into account, we did not combine the LSTM model with the autoencoder, since the encoder may obscure the temporal features. For all models, we designed the output layer with a binary field when the task was to detect anomalies, and with multi-valued fields when the task was to distinguish not only anomalies but also specific threat types. Algorithm 2 presents a detailed workflow for training a detection model with the trained generators and the trained encoder. As with the autoencoder training process, the magnitude m_i (1 ≤ i ≤ k) of synthetic data generation can be set differently depending on the weight of each class. Note that the process of combining with the trained encoder (lines 7 and 8 in Algorithm 2) can be omitted depending on the predictive model.

Algorithm 2 Classifier training with generators
Input: training dataset D_train, a set of generators G, trained encoder θ_enc
1: Initialize classifier parameters W^0
2: for G_i ∈ G, where 1 ≤ i ≤ k do
3:     sample z = {z_j}_{j=1,...,m_i} from the latent space
4:     D̂_i = G_i(z)
5: end for
6: D̃ = D_train ∪ D̂_1 ∪ ··· ∪ D̂_k
7: Set Trainable State on θ_enc = False
8: Build W^0_{θ_enc} = Concatenate_Models(θ_AE, W^0)
9: W_{θ_enc} = Train_Classifier(W^0_{θ_enc}, D̃)
Output: trained classifier W_{θ_enc}

From the perspective of the entire framework, the system sequentially executes the data pre-processing, synthetic data generation, and detection model training modules, and we refer to the whole system as G-DNN_AE, G-CNN_AE, and G-LSTM, according to the type of detection model. Additionally, we subdivide the whole system into subsystems for a comprehensive comparison. In particular, we consider the DNN, CNN, and LSTM models as naïve deep learning models, and DNN_AE and CNN_AE, which are combined with the autoencoder, as advanced deep learning models. In the experiments, we conducted a comparative analysis of G-LSTM, G-DNN_AE, and G-CNN_AE against these subsystems.

V. EXPERIMENTS AND EVALUATIONS

In this section, we first review the target datasets and describe the detailed implementation of each component. Then, we present the experimental results with comparative analysis and evaluate the proposed systems.

A. Dataset description

In this work, we focused on three network traffic datasets that are widely used as benchmarks in the field of intrusion detection systems. Furthermore, we collected real data from a large enterprise system and analyzed the performance of the proposed model on this real dataset.

1) NSL-KDD dataset: The NSL-KDD dataset is a refined version of the KDDcup99 dataset [6][7] and consists of training and testing datasets, KDDTrain and KDDTest, with 125,973 and 22,544 rows, respectively.⁴ Each data point has 41 attributes (3 nominal, 6 binary, and 32 numeric) describing different features of the network flow, and a label indicating an attack type or normal behavior. There are four distinct attack profiles: Denial of Service (DoS), Probing, Remote to Local (R2L), and User to Root (U2R). DoS is an attack that depletes resources by sending excessive traffic to the target system, thereby rendering it incapable of handling legitimate network traffic or service access. In a probing attack, the attacker's objective is to gain information about the target system (e.g., scanning ports in use and sweeping IP addresses). R2L is an attack that attempts to obtain local access from a remote machine by sending fraudulent traffic to the target; behaviors such as password guessing and HTTP tunneling are considered R2L attacks. In U2R, an attacker first gains access to the target system as a legitimate user and then attempts to gain root privileges by causing system faults (e.g., buffer overflow and rootkit). Table I presents the distribution of the NSL-KDD dataset with respect to the classes (attack classes and normal).

TABLE I
DATA DISTRIBUTION IN NSL-KDD

Class      Training   Weight (%)   Testing   Weight (%)
Normal       67,342        53.46     9,710        43.07
DoS          45,927        36.46     7,460        33.09
Probing      11,656         9.25     2,421        10.74
R2L             995         0.79     2,885        12.79
Total       125,973          100    22,543          100

2) UNSW-NB15 dataset: Together with the NSL-KDD dataset presented above, the UNSW-NB15 dataset [8], which was created with the IXIA PerfectStorm tool, has been widely used as an experimental dataset in the field of anomaly detection systems. Similarly, UNSW-NB15 consists of training and testing datasets, UNSW-NB15 training and UNSW-NB15 testing, with 175,341 and 82,332 records, respectively. Each record has 43 attributes describing network flow features and two class attributes.⁵ The class attributes consist of an attribute that indicates whether the record is normal traffic (binary-valued attribute) and the type of attack (when the record is abnormal). There are nine attack categories: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Table II presents the distribution of the UNSW-NB15 dataset. Note that we excluded the attribute that does not affect model training (the "id" field) and combined the two class attributes into a single field. Therefore, the dataset is considered to have 42 attributes (4 nominal, 2 binary, and 36 numeric) and a class attribute.

TABLE II
DATA DISTRIBUTION IN UNSW-NB15

Class             Training   Weight (%)   Testing   Weight (%)
Normal              56,000        31.94    37,000        44.94
Generic             40,000        22.81    18,871        22.92
Exploits            33,393        19.04    11,132        13.52
Fuzzers             18,184        10.37     6,062         7.36
Reconnaissance      10,491         5.98     3,496         4.25
Backdoors            1,746         0.99       583         0.71
Shellcode            1,133         0.65       378         0.46
Worms                  130         0.07        44         0.05
Total              175,341          100    82,332          100

3) IoT dataset: In addition to the NSL-KDD and UNSW-NB15 datasets, we evaluated the performance of our system on a network traffic dataset collected from Internet of Things (IoT) devices, called IoT-23 [9]. The IoT-23 dataset consists of 20 sub-datasets collected from malicious IoT scenarios and three sub-datasets collected from benign scenarios. Of these, we utilized the dataset collected in the Mirai botnet scenario (named CTU-IoT-Malware-Capture-34-1). This dataset contains 23,145 IoT network flows, where each data point belongs to one of the following four classes: Benign, C&C, DDoS, and PortScan. Benign corresponds to the normal class, and the others are treated as threats. C&C indicates communication with a command & control server, and PortScan refers to the activity of scanning ports to gather information for further attacks. Each data point has 21 attributes (11 nominal, 2 binary, and 8 numeric) describing different features of the network flow, and we removed four features that do not affect learning, such as the id and IP addresses. To adjust the size of the normal class to reflect the data imbalance scenario, we randomly sampled 98,077 records from the benign-scenario datasets. Consequently, we configured the IoT dataset to contain 100,000 Benign, 6,706 C&C, 14,394 DDoS, and 122 PortScan records.

4) Real dataset: To evaluate the performance of our system in real-world environments, we collected raw security events from a large enterprise system. The data were collected over 5 months, during which threats were logged separately by security operations center (SOC) analysts whenever an intrusion occurred. In the dataset, we investigated 798 cyber threats, which occurred evenly over the collection period (not concentrated in a specific period), and observed 547 system attacks, 240 scanning attacks, and 11 worm attacks (as categorized by the SOC analysts). In terms of categories, the system attack category includes cross-site scripting, DDoS, brute force, and injection attacks, whereas the scanning attack includes

⁴ The original configuration of the dataset includes several sub-datasets. However, we only present the main training and testing datasets.
⁵ The raw dataset contains 47 attributes (excluding class attributes), including source/destination IPs and ports. However, we used the provided training/test dataset, in which features that do not affect AI training are excluded.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3211346
Trojan and backdoor attacks. In total, we collected 4,782,342 security events, of which 230,026 were identified as cyber threats (i.e., 4,552,316 records were labeled as "Normal," and 230,026 records were labeled as "Threat"). Each raw record has 16 basic features describing network flow information, such as the protocol type, service, and source bytes (8 nominal and 8 numeric attributes). Moreover, because the collected data are raw security events, each record includes information regarding the suspicious security event⁶. Table 3 presents the distribution of the collected dataset with respect to the suspicious security events, and it can be seen that the proportion of false positives is relatively high (see [43] for a detailed description of the collected real dataset). Note that although the detected attacks fell into several detailed classes, each record was labeled only as "Normal" or "Threat" owing to the enterprise's privacy concerns.
⁶ Note that the suspicious security event can differ from the label assigned by the SOC analysts.
TABLE III
DISTRIBUTION OF RAW SECURITY EVENTS IN THE REAL DATASET.
| Event ID | Prefix | Count | Weight (%) |
|---|---|---|---|
| E2 | UDP Packet Flooding | 1,048,926 | 21.9% |
| E4 | UDP Source-IP Flooding | 718,788 | 15.2% |
| E40 | SIP Vulnerability Scanner | 644,683 | 13.5% |
| E7 | TCP Connect DoS | 553,362 | 11.6% |
| ⋮ | ⋮ | ⋮ | ⋮ |
| E23 | HTTPD Overflow | 115,477 | 2.4% |
| E29 | NTP Amplification DDoS | 107,617 | 2.3% |
| ⋮ | ⋮ | ⋮ | ⋮ |
B. Implementation and hyperparameter tuning
As described in the previous section, we set the discriminator of the generative model to be a symmetric autoencoder with three layers. For this model, we set the first hidden layer to 80 neurons and the latent space dimension to 50. Accordingly, the generator has a latent space of size 50 and a hidden layer of size 80. Additionally, we applied batch normalization to each hidden layer to stabilize learning and used the rectified linear unit (ReLU) as the activation function. Note that because we configured the autoencoder used as a feature extractor with the same architecture as the discriminator, the above configuration applies to the autoencoder as well. For the generative model, we set the convergence threshold to 0.058 and terminated training when the convergence measure fell below this threshold or the number of epochs reached 250. For autoencoder training, we set the default number of epochs to 300 and stopped training when the reconstruction accuracy exceeded 0.97.
For the classifier models, we deployed three distinct deep learning models: DNN, CNN, and LSTM. Considering the number of features, we explored model depths of up to three layers. In the experiment, the one-layer structure showed high volatility, and the three-layer structure tended to overfit. As a result, the models were most stable with the two-layer structure, which also showed the highest performance. For the DNN model, we set the first hidden layer to 32 neurons and the second hidden layer to 16 neurons. For CNN, we used a 1D-CNN model with two convolutional layers. Each convolutional layer has 32 filters with a window size of 5, and the convolutional layers are followed by a fully connected layer of 16 neurons. Additionally, we applied a max pooling layer with a window size of 3 to the first convolutional layer and a batch normalization layer after each convolutional layer. As in the generative model, we used ReLU as the activation function. For LSTM, we used 64 LSTM cells in each layer, followed by a fully connected layer with 32 neurons. For these detection models, we set the default number of epochs to 300 and applied early stopping (we stopped training when the relative difference in loss remained below $10^{-6}$ for 35 consecutive epochs [24]).
We used two additional basic machine learning models as comparative models.
• Support Vector Machine (SVM) is a supervised learning model based on statistical learning theory that aims to find the hyperplane that optimally separates the input domain according to the classes. In the experiment, we implemented a linear-kernel SVM model [2].
• Decision Tree (DT) is a non-parametric supervised learning model that recursively splits the input domain based on the correlation between each feature and the class. In this study, we implemented the C4.5 algorithm [1].
For a more extensive comparison, we separated the components of our system (DNN, CNN, LSTM, DNNAE, and CNNAE) and used them as comparative models against the whole system. Note that we regard these sub-models as corresponding to existing AI-based NIDS. In particular, DNN, CNN, and LSTM are considered naïve deep learning approaches, whereas DNNAE and CNNAE are considered advanced deep learning approaches combined with autoencoders⁷.
⁷ Although the detailed architectures and configurations may differ from those of previous approaches, we stress that the implemented models are comparable to or outperform the existing systems in terms of performance.
In the experiment, we used four metrics to evaluate the performance of the AI models: Accuracy, Precision, Recall, and F1-score. Accuracy is the fraction of correctly inferred results and is commonly used to quantify the performance of AI models. For a given class in a dataset, Precision is the fraction of the model's positive inferences that are correct, while Recall is the fraction of actual positive data that the model correctly infers as positive. The F1-score is the harmonic mean of Precision and Recall. These metrics are defined as follows:
• Accuracy = $\frac{TP+TN}{TP+FP+TN+FN}$
TABLE IV
BINARY CLASSIFICATION RESULTS FOR THE TEST DATASET IN NSL-KDD.
| Classifier | Accuracy | Normal Recall | Normal Precision | Normal F1-score | Abnormal Recall | Abnormal Precision | Abnormal F1-score |
|---|---|---|---|---|---|---|---|
| SVM | 72.1% | 97.8% | 61.2% | 75.2% | 53.1% | 96.9% | 68.6% |
| DT | 81.5% | 97.3% | 70.8% | 81.9% | 69.6% | 97.1% | 81.0% |
| DNN | 79.5% | 96.2% | 67.7% | 79.6% | 67.8% | 96.2% | 79.6% |
| CNN | 80.5% | 96.5% | 68.7% | 80.3% | 69.5% | 96.6% | 80.8% |
| LSTM | 82.0% | 97.5% | 71.0% | 82.1% | 70.0% | 97.2% | 81.3% |
| DNNAE | 85.5% | 98.8% | 78.0% | 87.2% | 72.5% | 98.5% | 83.5% |
| CNNAE | 86.4% | 98.8% | 79.0% | 87.8% | 74.1% | 98.4% | 84.6% |
• Precision = $\frac{TP}{TP+FP}$
• Recall = $\frac{TP}{TP+FN}$
• F1-score = $2 \times \frac{Precision \times Recall}{Precision + Recall}$
where TP, TN, FN, and FP denote the numbers of true positives, true negatives, false negatives, and false positives, respectively.
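As a minimal sketch of the four formulas above, the following Python snippet computes them from lists of true and predicted labels. It assumes that per-class values (such as the Normal and Abnormal columns of Table 4) are obtained one-vs-rest, i.e., by treating the class of interest as positive and every other class as negative; the function names `confusion_counts` and `classification_metrics` are hypothetical and are not taken from the authors' implementation.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable) -> Dict[str, int]:
    """Count TP, TN, FP, and FN, treating `positive` as the positive class (one-vs-rest)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            # Predicted positive: correct if the true label is also positive.
            counts["TP" if t == positive else "FP"] += 1
        else:
            # Predicted negative: a missed positive is a false negative.
            counts["FN" if t == positive else "TN"] += 1
    return counts


def classification_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                           positive: Hashable) -> Dict[str, float]:
    """Accuracy, Precision, Recall, and F1-score following the formulas above."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against empty denominators (e.g., a class that is never predicted).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels `["Normal", "Normal", "Threat", "Threat", "Threat"]` and predictions `["Normal", "Threat", "Threat", "Threat", "Normal"]`, treating "Threat" as positive gives TP = 2, TN = 1, FP = 1, and FN = 1. The resulting accuracy is 0.6, and Precision, Recall, and F1-score are each 2/3 (up to floating-point rounding). In a binary task the one-vs-rest accuracy is the same whichever class is chosen as positive.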
Fig. 7. Comparison of binary classification results on the NSL-KDD dataset.
Using these metrics, we evaluated each model on the experimental datasets. Note that although we built the models with a stable structure, volatility remained an issue. Accordingly, for comparison and evaluation, we independently trained each model 100 times and report the results for the model with the best detection rate on the test dataset.
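The training protocol described in this section can be sketched in plain Python as follows. A detection model runs for up to 300 epochs with early stopping once the relative change in loss stays below $10^{-6}$ for 35 consecutive epochs, and this is repeated over 100 independent runs. This is an illustrative sketch rather than the authors' code. The callbacks `train_epoch` and `run_trial` are hypothetical stand-ins for one epoch of training and for one full training-plus-evaluation run. The relative difference is assumed to be $|L_t - L_{t-1}| / |L_{t-1}|$, because its exact definition is not stated here.

```python
from typing import Callable, Optional, Tuple


def train_with_early_stop(
    train_epoch: Callable[[int], float],
    max_epochs: int = 300,
    tol: float = 1e-6,
    patience: int = 35,
) -> int:
    """Run up to `max_epochs` epochs and return the number of epochs actually run.

    `train_epoch(epoch)` is a hypothetical callback that trains the model for one
    epoch and returns that epoch's loss. Training stops early once the relative
    difference between consecutive losses stays below `tol` for `patience`
    consecutive epochs.
    """
    prev_loss: Optional[float] = None
    streak = 0  # consecutive epochs whose relative loss change is below `tol`
    for epoch in range(1, max_epochs + 1):
        loss = train_epoch(epoch)
        if prev_loss is not None:
            # Assumed definition: |L_t - L_{t-1}| / |L_{t-1}|, guarded against division by zero.
            rel_diff = abs(loss - prev_loss) / max(abs(prev_loss), 1e-12)
            streak = streak + 1 if rel_diff < tol else 0
            if streak >= patience:
                return epoch
        prev_loss = loss
    return max_epochs


def best_of_n_runs(run_trial: Callable[[int], float], n_runs: int = 100) -> Tuple[int, float]:
    """Train `n_runs` independent models and return (seed, score) of the best one.

    `run_trial(seed)` is a hypothetical callback that trains a freshly initialized
    model with the given seed and returns its detection rate on the test dataset.
    """
    results = [(seed, run_trial(seed)) for seed in range(n_runs)]
    return max(results, key=lambda pair: pair[1])
```

For example, with the synthetic loss curve `lambda e: max(1.0 / e, 0.02)`, the loss stops changing after epoch 50. The 35-epoch streak therefore completes at epoch 85, and `train_with_early_stop` returns 85. Because the best run is chosen by its test-set detection rate, as described above, the reported figures correspond to the best of the 100 runs rather than to their average. The stopping rules for the generative model (convergence measure below 0.058, or 250 epochs) and the autoencoder (reconstruction accuracy above 0.97, or 300 epochs) follow the same loop shape with a different stopping test.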
C. Experiments on the NSL-KDD dataset
For the NSL-KDD dataset, we explored both binary and multi-class classification tasks. Note that NSL-KDD is provided as separate training and test datasets, as mentioned above, and we used these datasets as provided. In other words, we used KDDTrain (125,973 rows) as the training dataset and KDDTest (22,544 rows) as the test dataset, with no data shuffling between the two. In the experiments on our system (i.e., G-DNNAE, G-CNNAE, and G-LSTM), we generated synthetic data for each class via the generative model and integrated them into the training dataset. All models were evaluated on the original test dataset (KDDTest) for an unbiased comparison.
1) Binary classification: Table 4 presents the experimental results for the binary classification task on the NSL-KDD dataset. Note that in the binary classification task, data belonging to the attack classes are naturally considered anomalies (labeled as abnormal). In the experiments on our system, we generated 35,000 additional synthetic records for each class via the trained generative module. Figure 7 shows a comparison of the experimental results for the NSL-KDD dataset in the binary classification scenario.
Overall, the models produced relatively high recall values for the normal class and, conversely, relatively high precision values for the abnormal class. Among the basic machine learning models, DT outperformed SVM, with an accuracy of 81.5%. Moreover, DT performed better than the naïve DNN and CNN models, which achieved accuracies of 79.5% and 80.5%, respectively. Among the basic and naïve models, LSTM performed best, with an accuracy of 82.0%. The advanced deep learning approaches, DNNAE and CNNAE, both outperformed the basic machine learning and naïve deep learning models, achieving accuracies of 85.5% and 86.4%, respectively. The proposed models, to which the generative model and autoencoder were applied, significantly outperformed all the aforementioned models. In particular, both G-DNNAE and G-CNNAE achieved an accuracy close to 90%, and it was observed that G-CNNAE produced the
TABLE V
MULTI-CLASSIFICATION RESULTS FOR THE TEST DATASET IN NSL-KDD.
| Algorithm | Accuracy | DoS Recall | DoS Precision | DoS F1-score | Probe Recall | Probe Precision | Probe F1-score |
|---|---|---|---|---|---|---|---|
| SVM | 75.4% | 76.3% | 96.7% | 80.0% | 33.3% | 80.0% | 47.1% |
| DT | 80.5% | 84.2% | 94.1% | 88.9% | 50.0% | 85.7% | 63.2% |
| DNN | 79.6% | 83.8% | 94.8% | 89.0% | 31.5% | 65.4% | 42.5% |
| CNN | 80.1% | 89.6% | 94.4% | 91.9% | 30.8% | 69.5% | 42.7% |
| LSTM | 82.6% | 92.1% | 98.4% | 95.1% | 28.7% | 69.5% | 40.6% |
| DNNAE | 88.3% | 94.9% | 99.0% | 96.9% | 93.0% | 78.6% | 85.2% |
| CNNAE | 88.5% | 95.7% | 99.0% | 97.3% | 93.7% | 78.4% | 85.4% |

| Algorithm | R2L Recall | R2L Precision | R2L F1-score | U2R Recall | U2R Precision | U2R F1-score |
|---|---|---|---|---|---|---|
| SVM | - | - | - | - | - | - |
| DT | 11.1% | 50.0% | 18.2% | - | - | - |
| DNN | 26.0% | 48.6% | 33.9% | 4.6% | 74.9% | 8.8% |
| CNN | 24.1% | 65.1% | 35.2% | 6.2% | 79.9% | 11.5% |
| LSTM | 21.8% | 63.4% | 32.4% | 5.9% | 76.8% | 10.9% |
| DNNAE | 42.7% | 93.7% | 58.7% | 7.8% | 83.3% | 14.2% |
| CNNAE | 41.1% | 93.2% | 57.0% | 9.3% | 85.7% | 16.8% |
TABLE VI
CLASSIFICATION ACCURACY FOR EACH THREAT CLASS ON THE UNSW-NB15 DATASET.
| Algorithm | Generic | Exploit | Fuzzers | DoS | Reconnaissance | Analysis | Backdoors | Shellcode | Worms |
|---|---|---|---|---|---|---|---|---|---|
| DNN | 76.8% | 48.5% | 72.9% | 28.0% | 49.8% | 76.3% | 87.3% | 57.9% | 52.2% |
| CNN | 76.7% | 48.6% | 75.2% | 27.9% | 49.9% | 76.8% | 88.3% | 58.2% | 52.2% |
| LSTM | 76.8% | 48.6% | 73.2% | 29.4% | 49.9% | 76.6% | 87.3% | 58.0% | 52.2% |
| DNNAE | 76.8% | 48.4% | 74.1% | 28.4% | 50.1% | 77.1% | 88.5% | 58.7% | 52.2% |
| CNNAE | 76.8% | 49.1% | 74.3% | 28.4% | 49.9% | 77.5% | 88.5% | 58.2% | 54.5% |
| G-LSTM | 80.1% | 49.0% | 79.6% | 29.4% | 50.1% | 77.1% | 90.8% | 58.2% | 56.8% |
| G-DNNAE | 80.6% | 50.3% | 81.6% | 29.4% | 51.3% | 77.2% | 91.4% | 58.7% | 56.8% |
| G-CNNAE | 82.0% | 50.2% | 81.9% | 29.1% | 51.3% | 77.5% | 91.5% | 58.9% | 56.8% |
TABLE VII
EXPERIMENTAL RESULTS ON THE IOT-23 DATASET FOR MULTI-CLASSIFICATION TASKS.
G-LSTM, DNNAE, CNNAE 95.9% 100% 100% 100% 80.0% 100% 88.9% 100% 90.4% 95.0%
TABLE VIII
EXPERIMENTAL RESULTS ON THE REAL DATASET FOR BINARY-CLASSIFICATION TASKS.
| Classifier | Accuracy | Normal Recall | Normal Precision | Normal F1-score | Abnormal Recall | Abnormal Precision | Abnormal F1-score |
|---|---|---|---|---|---|---|---|
| DNN | 94.7% | 97.0% | 93.0% | 94.9% | 89.5% | 76.9% | 82.7% |
| CNN | 95.0% | 97.0% | 94.5% | 95.7% | 90.0% | 77.5% | 83.2% |
| LSTM | 95.2% | 97.4% | 94.6% | 95.9% | 89.8% | 77.4% | 83.1% |
| DNNAE | 95.2% | 97.2% | 94.7% | 95.9% | 90.0% | 77.5% | 83.2% |
| CNNAE | 95.2% | 97.3% | 94.6% | 95.9% | 90.2% | 77.3% | 83.2% |
VI. CONCLUSION
In this study, we presented a novel AI-based NIDS that efficiently resolves the data imbalance problem and improves on the classification performance of previous systems. To address the data imbalance problem, we leveraged a state-of-the-art generative model that can generate plausible synthetic data and measure the convergence of training. Moreover, we implemented autoencoder-driven detection models based on DNN and CNN and demonstrated that the proposed models outperform previous machine learning and deep learning approaches. The proposed system was evaluated on various datasets, including two benchmark datasets, an IoT dataset, and a real dataset. In particular, the proposed models achieved accuracies of up to 93.2% and 87% on the NSL-KDD and UNSW-NB15 datasets, respectively, and showed remarkable performance improvements in the minor classes. In addition, through experiments on an IoT dataset, we demonstrated that the proposed system can efficiently detect network threats in a distributed environment. Moreover, to investigate feasibility in real-world environments, we collected real data from a large enterprise system and evaluated the proposed model on the collected dataset. Through this experiment, we demonstrated that the proposed model can significantly improve the detection rate of network threats by resolving the data imbalance problem in a real environment.
In the future, considering practical distributed environments, we will focus on applying our framework to federated learning systems and ensemble AI systems to enhance network threat detection. In addition, we will study adversarial attacks that can bypass AI-based NIDS by exploiting vulnerabilities in AI models, and we will conduct research on enhanced NIDS that can resist these attacks in real-world environments.
REFERENCES
[1] J. R. Quinlan, "C4.5: Programs for machine learning," Morgan Kaufmann Ser. Mach. Learn., San Mateo, CA: Morgan Kaufmann, 1993.
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[4] I. J. Goodfellow et al., "Generative adversarial nets," in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), 2014, pp. 2672–2680.
[5] D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," 2017, arXiv:1703.10717. [Online]. Available: http://arxiv.org/abs/1703.10717
[6] S. Hettich and S. D. Bay. (1999). KDD Cup 1999 Data. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[7] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[8] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proc. Military Commun. Inf. Syst. Conf. (MilCIS), 2015, pp. 1–6.
[9] A. Parmisano, S. Garcia, and M. J. Erquiaga. (2020). A Labeled Dataset With Malicious and Benign IoT Network Traffic. [Online]. Available: https://www.stratosphereips.org/datasets-iot23
[10] B. Ingre and A. Yadav, "Performance analysis of NSL-KDD dataset using ANN," in Proc. Int. Conf. Signal Process. Commun. Eng. Syst., Andhra Pradesh, India, Jan. 2015, pp. 92–96.
[11] Y. Gao, Y. Liu, Y. Jin, J. Chen, and H. Wu, "A novel semi-supervised learning approach for network intrusion detection on cloud-based robotic system," IEEE Access, vol. 6, pp. 50927–50938, 2018.
[12] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in Proc. IEEE 15th Int. Conf. Mach. Learn. Appl. (ICMLA), Anaheim, CA, USA, 2016, pp. 195–200.
[13] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), 2016, pp. 258–263.
[14] Y. Imamverdiyev and F. Abdullayeva, "Deep learning method for denial of service attack detection based on restricted Boltzmann machine," Big Data, vol. 6, no. 2, pp. 159–169, Jun. 2018.
[15] W. Zhong, N. Yu, and C. Ai, "Applying big data based deep learning system to intrusion detection," Big Data Min. Anal., vol. 3, no. 3, pp. 181–195, Sep. 2020.
[16] M. H. Haghighat and J. Li, "Intrusion detection system using voting-based neural network," Tsinghua Sci. Technol., vol. 26, no. 4, pp. 484–495, Aug. 2021.
[17] Y. Yang, X. Yang, M. Heidari, M. A. Khan, G. Srivastava, M. Khosravi, and L. Qi, "ASTREAM: Data-Stream-Driven Scalable Anomaly Detection with Accuracy Guarantee in IIoT Environment," IEEE Trans. Netw. Sci. Eng., early access, Mar. 2022, doi: 10.1109/TNSE.2022.3157730.
[18] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation-based anomaly detection," ACM Trans. Knowl. Discov. Data, vol. 6, no. 1, pp. 1–39, Mar. 2012.
[19] X. Zhang, W. Dou, Q. He, R. Zhou, C. Leckie, R. Kotagiri, and Z. Salcic, "LSHiForest: A generic framework for fast tree isolation based ensemble anomaly analysis," in Proc. IEEE 33rd Int. Conf. Data Eng. (ICDE), Apr. 2017, pp. 983–994.
[20] L. Qi, Y. Yang, X. Zhou, W. Rafique, and J. Ma, "Fast Anomaly Identification Based on Multi-Aspect Data Streams for Intelligent Intrusion Detection Toward Secure Industry 4.0," IEEE Trans. Ind. Inf., vol. 18, no. 9, pp. 6503–6511, Sep. 2022.
[21] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proc. Int. Conf. Platform Technol. Service (PlatCon), 2016, pp. 1–5.
[22] C. Yin, Y. Zhu, J. Fei, and X. He, "A deep learning approach for intrusion detection using recurrent neural networks," IEEE Access, vol. 5, pp. 21954–21961, 2017.
[23] C. Xu, J. Shen, X. Du, and F. Zhang, "An intrusion detection system using a deep neural network with gated recurrent units," IEEE Access, vol. 6, pp. 48697–48707, 2018.
[24] J. Gao, L. Gan, F. Buschendorf, L. Zhang, H. Liu, P. Li, X. Dong, and T. Lu, "Omni SCADA intrusion detection using deep learning algorithms," IEEE Internet Things J., vol. 8, no. 2, pp. 951–961, Jan. 2021.
[25] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," EAI Endorsed Trans. Secur. Saf., vol. 3, no. 9, p. e2, May 2016.
[26] B. Yan and G. Han, "Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system," IEEE Access, vol. 6, pp. 41238–41248, 2018.
[27] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[28] C. Ieracitano, A. Adeel, F. C. Morabito, and A. Hussain, "A novel statistical analysis and autoencoder driven intelligent intrusion detection approach," Neurocomputing, vol. 387, pp. 51–62, Apr. 2020.
[29] J. Y. Kim, S. J. Bu, and S. B. Cho, "Malware detection using deep transferred generative adversarial networks," in Proc. Int. Conf. Neural Inf. Process. Guangzhou, China: Springer, 2017, pp. 556–564.
[30] M. H. Shahriar, N. I. Haque, M. A. Rahman, and M. Alonso, "G-IDS: Generative adversarial networks assisted intrusion detection system," in Proc. IEEE 44th Annu. Comput., Softw., Appl. Conf. (COMPSAC), Jul. 2020, pp. 376–385.
[31] I. Yilmaz, R. Masum, and A. Siraj, "Addressing imbalanced data problem with generative adversarial network for intrusion detection," in Proc. IEEE 21st Int. Conf. Inf. Reuse Integr. Data Sci. (IRI), Las Vegas, NV, USA, 2020, pp. 25–30.
[32] D. Li, D. Kotani, and Y. Okabe, "Improving attack detection performance in NIDS using GAN," in Proc. IEEE 44th Annu. Comput., Softw., Appl. Conf. (COMPSAC), Jul. 2020, pp. 817–825.
[33] W. Lee, B. Noh, Y. Kim, and K. Jeong, "Generation of Network Traffic Using WGAN-GP and a DFT Filter for Resolving Data Imbalance," in Proc. Int. Conf. Internet Distrib. Comput. Syst. (IDCS), Springer, Oct. 2019, pp. 306–317.
[34] G. Dlamini and M. Fahim, "DGM: A data generative model to improve minority class presence in anomaly detection domain," Neural Comput. Appl., vol. 2021, pp. 13635–13646, Apr. 2021.
[35] D. Li, D. Chen, J. Goh, and S.-k. Ng, “Anomaly detection with
generative adversarial networks for multivariate time series,” 2018,
arXiv:1809.04758. [Online]. Available: http://arxiv.org/abs/1809.04758
[36] S. K. Alabugin and A. N. Sokolov, “Applying of generative adversarial
networks for anomaly detection in industrial control systems,” in Proc.
Global Smart Ind. Conf. (GloSIC), Nov. 2020, pp. 199–203.
[37] I. Siniosoglou, P. Radoglou-Grammatikis, G. Efstathopoulos, P. Fouliras,
and P. Sarigiannidis, “A unified deep learning anomaly detection and
classification approach for smart grid environments,” IEEE Trans. Netw.
Service Manage., vol. 18, no. 2, pp. 1137–1151, Jun. 2021.
[38] D. E. Rumelhart and J. L. McClelland, “Learning internal representations
by error propagation,” in Proc. Parallel Distrib. Process., Explorations
Microstruct. Cogn., Found., vol. 1. Cambridge, MA, USA: MIT Press,
1987, pp. 318–362.
[39] G. E. Hinton and R. S. Zemel, “Autoencoders, minimum description
length and Helmholtz free energy,” in Proc. 6th Int. Conf. Neural Inf.
Process. Syst., 1993, pp. 3–10.
[40] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
learning with deep convolutional generative adversarial networks,” 2016.
[Online]. Available: https://arxiv.org/abs/1511.06434.
[41] M. Mirza and S. Osindero, “Conditional generative adversarial nets,”
2014, arXiv:1411.1784.
[42] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative ad-
versarial networks,” in Proc. 34th Int. Conf. Mach. Learn. (ICML), 2017,
pp. 214–223.
[43] J. Lee, J. Kim, I. Kim, and K. Han, “Cyber threat detection based on
artificial neural networks using event profiles,” IEEE Access, vol. 7, pp.
165607–165626, 2019.