1. Introduction
The advancement of communication technology has transformed human civilization as it enters the digital information era, and advances in information technology directly affect the convenience of daily life. With the arrival of the 5G era, human society’s level of informatization will rise even further. Compared with 4G, 5G application scenarios span mobile Internet, the Internet of Vehicles, and the Industrial Internet. At the same time, operators have set higher requirements for 5G networks, including large transmission capacity, ultra-long transmission distance, network slicing, and intelligent management and control. Among the enabling technologies, the software-defined network (SDN) is a new network design paradigm that aims to increase network flexibility and agility and can better fulfill the network slicing demands of 5G networks. The central concept of software-defined networking is to decouple the traditional network architecture into a control plane and a data plane, and to abstract network functions into applications running in the network operating system on the control plane [
1]. From top to bottom, the software-defined network architecture is split into the application plane, control plane, infrastructure layer, and physical device layer [
2,
3], as illustrated in
Figure 1. The application plane comprises application software and network management systems; these request control-plane services via the northbound interface provided by the SDN controller [
4]. The control plane consists of one or more SDN controllers. In the software-defined network architecture, the SDN controller serves as a bridge between the application plane and the infrastructure layer. On the one hand, the SDN controller exposes diverse programmable services to upper-layer applications via the northbound interface, so that network users can flexibly formulate network policies for their actual application scenarios; on the other hand, it constructs and maintains a global network view via the southbound interface to control and manage network devices at the infrastructure layer, and it implements the control plane functions. The infrastructure layer is made up of data-forwarding devices, such as switches and routers, abstracted as network devices; data flows are handled according to the instructions issued by the SDN controller, improving network device management efficiency. The physical layer includes control equipment such as field instruments, sensors, and actuators, and performs duties such as information exchange between the ICS controller and field equipment. SDN has therefore attracted broad attention across many fields.
However, software-defined networks are vulnerable to cyber-attacks in the same way that traditional networks are. As noted above, SDN introduces the SDN controller, which provides unified API services for the application plane and the infrastructure layer, making the network centralized, programmable, and open. These characteristics, such as permitting mismatched network packets to be submitted to the controller to request forwarding rules, raise security issues for SDN. A network attack frequently manifests as anomalous traffic. The term “abnormal traffic” refers to network traffic behavior that deviates from the expected typical pattern. Server overload induced by DoS attacks, privilege escalation by worms, and attacks on servers all result in anomalous traffic [
5]. SDN network security risks primarily target the control plane, with the majority of attacks targeting the network’s controller [
6]. Malicious controllers, malware, and malicious switches can all put SDN controllers at risk. The controller’s security has a direct influence on SDN security, since the controller is the centralized decision-making entity and processing hub of SDN.
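The table-miss behavior described above, in which a packet matching no flow rule is escalated to the controller for a forwarding rule, can be sketched conceptually as follows. This is a toy illustration, not a real controller API; all class and method names are hypothetical.

```python
# Toy illustration of the table-miss / packet-in mechanism: a switch matches
# packets against its flow table and escalates misses to the controller,
# which installs a forwarding rule. All names here are hypothetical.

class Controller:
    """Centralized decision point; floods of crafted table-miss packets
    against it are one of the control-plane risks discussed in the text."""

    def packet_in(self, switch, match_fields):
        # Toy policy: derive an output port from the match fields.
        return "forward_port_%d" % (hash(match_fields) % 4)

class Switch:
    def __init__(self, controller):
        self.flow_table = {}          # match fields -> action
        self.controller = controller

    def handle_packet(self, match_fields):
        if match_fields in self.flow_table:
            return self.flow_table[match_fields]   # fast path: cached rule
        # Table miss: packet-in to the controller, then install its reply.
        action = self.controller.packet_in(self, match_fields)
        self.flow_table[match_fields] = action
        return action

sw = Switch(Controller())
a1 = sw.handle_packet(("10.0.0.1", "10.0.0.2", 80))   # miss -> packet-in
a2 = sw.handle_packet(("10.0.0.1", "10.0.0.2", 80))   # hit -> cached rule
```

The cached rule means the controller is consulted only once per new flow, which is also why an attacker generating many never-before-seen flows can overload it.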
Abnormal traffic detection technology monitors network traffic transmission in real time and, when aberrant traffic is detected, immediately raises an alarm or takes active countermeasures. Real-time monitoring of SDN traffic helps maintain the security, confidentiality, and integrity of SDN network information while also promoting the development and adoption of SDN technology [
5]. As a result, research on intrusion detection systems in the context of SDN offers substantial theoretical and practical value for developing and improving SDN technology.
Hinton et al. [
7] proposed the concept of deep learning in 2006. With the continuous improvement in computing power and the continuous development of algorithms, deep learning algorithms, which demand substantial computing power, have attracted great attention from researchers and enterprises. Traditional detection algorithms based on traffic feature statistics and machine learning perform well when datasets are small and the number of features is limited; however, they still rely on the manual judgment and induction of traffic characteristics. Deep learning algorithms can compute near-optimal solutions from limited data and do not require expert knowledge to find unknown and novel abnormal traffic types, and they also perform well with large-scale datasets and many features.
We propose an abnormal traffic detection method based on the stacking method and the self-attention mechanism (TSMASAM), which combines self-attention with ensemble learning to make up for ensemble learning’s inability to learn the associations between data. First, we propose a neural network composed of a self-attention mechanism and a deep convolutional network, which aims to automatically learn the correlations between traffic samples, capture the internal structure of the feature space, and pass the result downstream as sample embeddings. Second, we design a novel stacking integration method, which detects and identifies abnormal network traffic by integrating the sample embeddings obtained above with the detection results of heterogeneous base learners. Finally, we design a new loss function, which accounts for each base learner’s influence on the model’s overall performance by introducing the base learners’ loss values and a regularization term composed of them, preventing the model from falling into an overfitting state.
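The combined loss described above can be illustrated with a minimal sketch. The paper’s exact formula is not reproduced here: the weights `alpha` and `beta` and the squared form of the regularizer are assumptions for illustration.

```python
def combined_loss(meta_loss, base_losses, alpha=0.5, beta=0.01):
    """Illustrative sketch of a stacking loss that also accounts for the
    base learners: the meta-learner's loss plus the mean base-learner loss
    and a regularization term built from the base losses. The weights
    alpha and beta and the squared regularizer are assumptions, not the
    paper's exact formula."""
    base_term = alpha * sum(base_losses) / len(base_losses)   # base learners' influence
    reg_term = beta * sum(l * l for l in base_losses)         # penalize divergent learners
    return meta_loss + base_term + reg_term
```

In this form, a base learner that performs poorly raises the total loss both linearly and quadratically, so the training signal cannot ignore weak ensemble members.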
The main contributions of this paper are as follows:
We propose a neural network composed of a self-attention mechanism and a deep convolutional network, which learns from samples and converts them into sample embeddings.
We propose a stacking ensemble learning method composed of an autoencoder and base learners, using the autoencoder to remove irrelevant information from the samples and the stacking method to integrate the sample embeddings with the base learners’ detection results.
We design a novel loss function that observes the operation of the model through the introduced regularization term and base learner loss values. We use a network traffic dataset under an SDN architecture to evaluate the model’s performance. The results show that the model detects abnormal traffic better than the comparison models.
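The stacking idea in the contributions above, in which base-learner outputs are concatenated with sample embeddings to form meta-features, can be sketched as follows. The thresholded-mean “learner” and all names are hypothetical stand-ins, not the paper’s architecture.

```python
def base_learner(threshold):
    # Hypothetical stand-in "learner": flags a sample whose mean feature
    # value exceeds a threshold. Real base learners would be trained models.
    return lambda X: [float(sum(row) / len(row) > threshold) for row in X]

def stack_features(X, embeddings, learners):
    # Base-level predictions become extra meta-features next to the
    # sample embeddings, forming the input of the final (meta) learner.
    preds = [f(X) for f in learners]             # one prediction list per learner
    return [list(emb) + [p[i] for p in preds]
            for i, emb in enumerate(embeddings)]

X = [[1.0, 2.0], [0.0, -1.0]]
emb = [[0.1, 0.2], [0.3, 0.4]]                   # stand-in sample embeddings
meta_X = stack_features(X, emb, [base_learner(0.0), base_learner(1.0)])
```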
The structure of this paper is as follows:
Section 2 briefly describes the research status of related work;
Section 3 introduces the experimental environment, model framework, and specific design of TSMASAM;
Section 4 details the experiments and performance evaluation of TSMASAM proposed in this paper. In
Section 5, we conclude the paper.
2. Related Works
Abnormal traffic detection technology dates back to the 1980s; it refers to a network security technology that monitors network traffic transmission and promptly issues an alarm or takes active countermeasures when abnormal traffic is found. Anomaly detection builds a feature model by modeling and analyzing traffic characteristics, thereby judging whether network traffic is normal. Anomaly detection techniques can be roughly divided into three categories: those based on traffic feature matching, those based on traffic feature statistics, and those based on machine learning. The three differ essentially in their modeling logic, which leads to different detection scenarios and effects. Algorithms based on traffic feature matching require professionals to analyze and summarize the characteristics of abnormal traffic and then match them against the observed traffic characteristics. Their advantage is higher accuracy in identifying known attacks; however, they rely on expert knowledge and struggle with previously unseen abnormal network traffic. Anomaly detection algorithms based on data statistics assume that network traffic characteristics follow a normal distribution; when the observed characteristics deviate from the baseline by more than a certain threshold, the traffic is regarded as abnormal. Such algorithms are simple to implement and can also identify unknown abnormal traffic, but they are prone to misjudgment. Algorithms based on machine learning have a stronger learning ability and can learn abnormal traffic patterns from incomplete traffic characteristics; however, these models generally have high computational complexity and are not suitable for latency-sensitive environments.
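The threshold rule behind statistics-based detection can be sketched in a few lines; the Gaussian baseline and the threshold `k` are illustrative assumptions, not values from any cited work.

```python
import random
import statistics

def detect_statistical(traffic, baseline, k=3.0):
    """Flag observations deviating from the baseline mean by more than
    k standard deviations. The Gaussian baseline and k = 3 are assumed
    for illustration only."""
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    return [abs(x - mu) > k * sigma for x in traffic]

random.seed(0)
baseline = [random.gauss(100.0, 5.0) for _ in range(1000)]   # "normal" traffic rate
flags = detect_statistical([101.0, 160.0], baseline)         # only the spike is flagged
```

The sketch also shows the weakness noted above: any legitimate burst that exceeds the threshold is misjudged as abnormal.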
However, with substantial improvements in computing technology, machine learning-based methods have received widespread attention and research.
Algorithms based on traffic feature matching identify known abnormal network traffic quickly but cannot effectively deal with unknown abnormal network traffic. Ref. [
8] proposed the model NADIR, a near real-time expert system, to replace manual log review. NADIR compares the network activity summarized in user profiles with expert rules that define network security policies, inappropriate or suspicious network activity, and normal network and user activity. Ref. [
9] proposed an adaptive real-time intrusion detection expert system which contains a statistical subsystem to observe the normal traffic of the computer. The statistical subsystem identifies user behavior as a potential intrusion when it observes significant deviations from expected behavior. Ref. [
9] maintains a knowledge base of statistical subject profiles and updates the observed behaviors to the knowledge base daily. Before new statistics are synchronized to the knowledge base, the previous statistics are multiplied by a decay factor so that the system adaptively learns the behavior patterns of the observed subjects. Ref. [
10] proposed an approach to specification-based and anomaly-based intrusion detection by starting from the state machine specification of the network protocol and supplementing the state machine information with statistical information. Ref. [
10] verified the effectiveness of this method on the KDD99 dataset. Furthermore, Ref. [
10] uses a protocol specification to simplify the feature selection step. Ref. [
11] described the network intrusion detection expert system (NIDX), which combines knowledge describing the target system, historical profiles of users’ past activities, and knowledge-based intrusion detection heuristics. NIDX classifies user activity through UNIX system calls and then uses knowledge and heuristics about typical intrusion and attack techniques to determine whether the activity is anomalous. Ref. [
12] built a method to augment domain knowledge with machine learning to create rules for intrusion detection expert systems. To this end, Ref. [
12] adopted a combination of genetic algorithms and decision trees to automatically generate rules for classifying network traffic. In general, algorithms based on feature matching rely on professionals’ analysis and summarization and are not sufficiently flexible in operation.
Algorithms based on data statistics can quickly identify abnormal traffic and deal with unknown abnormal network traffic, but they are prone to misjudgment. A histogram-based outlier detection (HBOS) algorithm was proposed to score data in linear time. Since HBOS assumes no dependencies between features, the algorithm is faster than other methods but less accurate [
13]. HBOS detects global outliers as well as state-of-the-art algorithms on multiple datasets but performs poorly on local outliers. Ref. [
14] described the anomaly detection problem as a binary composite hypothesis testing problem and developed a model-free and a model-based approach using large deviation theory. Both methods extract a series of probability laws representing traffic patterns over different time periods and then detect anomalies by evaluating the traffic’s deviations from these laws. Ref. [
15] proposed a statistical signal processing technique based on abrupt change detection. Thottan et al. demonstrated the method’s feasibility in [
15] and conducted related experiments to verify the usability of the method. In addition, Ref. [
15] introduced an operator matrix to correlate various indicators, finally obtaining a single variable to express all aspects of the network. Since not all abrupt changes originate from network anomalies, this may lead to model misjudgment. To cope with the new network security requirements brought by the complexity of cellular networks, Ref. [
16] proposed an anomaly detection algorithm based on Bayesian decision rules and applied it to mobile user profiles to verify the method’s feasibility. The algorithm pays particular attention to privacy protection; however, its analysis function may still compromise users’ privacy. Ref. [
17] proposed a multi-level hierarchical Kohonen network (K-Map) for intrusion detection, where each layer of the hierarchical graph is modeled as a simple winner-takes-all K-Map. This multi-level hierarchical K-Map structure has the advantage of low computational complexity, avoids costly peer-to-peer computations by organizing the data into clusters, and reduces the size of the network. Ref. [
18] proposed an anomaly detection algorithm based on an unrestricted α-stable first-order model and statistical hypothesis testing, which automatically selects a flow window to serve as a reference against which an observed flow window is compared. The algorithm of [
18] focuses on detecting two anomaly types: floods and flash crowds. Ref. [
19] proposed a flow-based aggregation technique (FSAS), which greatly reduces the amount of monitored data and handles large amounts of statistical and packet data. A flow, or IP flow, is a given series of IP packets. FSAS assembles flow-based statistical feature vectors and reports them to a neural network classification model. Ref. [
20] developed a new statistical decision-theoretic framework for network traffic using Markov chain modeling. The algorithm first formulates the optimal anomaly detection problem for the composite model of [
20] via the generalized likelihood ratio test (GLRT). However, this formulation leads to a very expensive combinatorial optimization problem. Then, Ref. [
20] developed two low-complexity anomaly detection algorithms, the cross-entropy-based and GLRT-based methods. Ref. [
21] implemented an algorithm developed in SRI’s NIDES (next-generation intrusion detection expert system) project. In addition, Ref. [
21] also developed three OSPF routing protocol insider attacks to evaluate the effectiveness of detection capabilities. Ref. [
22] developed an anomaly detection method for large networks. The algorithm first uses a Kalman filter to filter out “normal” traffic by comparing the predicted traffic matrix with the actual traffic matrix, and then detects whether any anomaly remains in the residual process. Ref. [
22] also explains how any anomaly detection method can be viewed as a problem in statistical hypothesis testing. Ref. [
23] argued that if the joint distribution of multi-dimensional data can be expressed effectively, one can estimate the tail probability of each point and thereby evaluate abnormal situations well. To this end, Ref. [
23] proposed a copula-based anomaly detection algorithm. The copula is a statistical probability function for efficiently modeling dependencies among multiple random variables. Ref. [
23] used a non-parametric method, obtaining the empirical copula through the empirical cumulative distribution function (empirical CDF) and then estimating the tail probability of the joint distribution across all dimensions through the empirical copula. The advantage of algorithms based on data feature statistics is that they are simple to implement and can also identify unknown abnormal traffic, but they are prone to misjudgment.
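The empirical-copula idea in [23] can be sketched, in simplified form, as a rank transform followed by an empirical joint tail probability. This one-sided version is an illustrative assumption, not the cited algorithm.

```python
def empirical_copula_tail_scores(X):
    """Simplified, one-sided sketch of the empirical-copula idea: map each
    dimension to [0, 1] via its empirical CDF (rank transform), then score
    each point by the fraction of points that weakly dominate it in every
    dimension. A small score means the point lies deep in the joint upper
    tail. The cited algorithm is more general; this is an illustration."""
    n, d = len(X), len(X[0])
    U = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        for rank, i in enumerate(order):
            U[i][j] = rank / (n - 1)             # empirical CDF value of X[i][j]
    # Empirical upper-tail probability of the joint distribution per point.
    return [sum(all(U[m][j] >= U[i][j] for j in range(d)) for m in range(n)) / n
            for i in range(n)]

# The point (5, 5) dominates all others, so it gets the smallest tail score.
scores = empirical_copula_tail_scores([[1, 1], [2, 3], [3, 2], [5, 5]])
```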
Algorithms based on machine learning have a strong learning ability and can effectively deal with unknown abnormal network traffic. Ref. [
24] proposed a 5G-oriented network defense architecture. To this end, Ref. [
24] used deep learning techniques to analyze network traffic by extracting features from it. In addition, the architecture allows the configuration of the network fabric to be adjusted automatically to manage traffic fluctuations. Experiments show that the method can adaptively adjust the anomaly detection system and optimize resource consumption. Ref. [
25] proposed an anomaly-based NIDS implemented using deep learning techniques. The method demonstrates the ability and adaptability to infer partial knowledge from incomplete data. With the advent of the Internet of Things, the need to process streaming data in real-time has become critical. To this end, Ref. [
26] proposed a hybrid data processing model using GWO and CNN for anomaly detection. To increase the model’s learning ability, both the GWO and CNN methods are enhanced: for the former, the abilities to explore, exploit, and initialize the population are improved; for the latter, the dropout function is improved. Ref. [
26] first used GWO for feature selection to obtain the best trade-off between two objectives, i.e., reducing the error rate and minimizing the feature set. Then, Ref. [
26] used CNN for network anomaly classification. In Ref. [
27], in order to deal with the security issues brought by SDN, a deep neural network model was constructed and trained using the NSL-KDD dataset. During training, only six of the forty-one features were selected. However, the dataset used was not collected in an SDN environment and is therefore not necessarily suitable for it. To improve the reliability of SDN, Ref. [
28] proposed a hybrid deep learning-based anomaly detection scheme for suspicious flow detection in social multimedia environments. The scheme consists of an anomaly detection module and an end-to-end data transfer module. The anomaly detection module utilizes a modified restricted Boltzmann machine and a gradient descent-based support vector machine to detect anomalous traffic. The end-to-end data transmission module is designed to meet the strict QoS requirements of SDN. Since existing anomaly detection solutions all require large datasets for offline training, Ref. [
29] proposed a neural network-based anomaly detection system with dynamically updatable training models, Griffin, which screens normal and abnormal traffic jointly with an ensemble of autoencoders and updates the autoencoders according to the mean squared error. In order to solve the problems of high memory consumption, low accuracy, and high processing overhead of detection methods in the IoT environment, Ref. [
30] proposed sFlow and adaptive round-robin-based sampling, combined with the Snort intrusion detection system and a deep learning-based model, to help protect the IoT against various types of DDoS attacks. Due to the decoupled nature of SDN, Ref. [
30] obtained the required parameters by programming network devices. First, in the data plane, in order to reduce the switches’ processing and network overhead, Ref. [
30] distributed the deployment of sFlow and adaptive round-robin-based sampling. Second, to optimize detection accuracy in the control plane, Ref. [
30] deployed the Snort IDS with the SAE deep learning model. Overall, algorithms based on machine learning have a stronger learning ability and can learn abnormal traffic patterns from incomplete traffic characteristics, but the models generally have high computational complexity and are not suitable for latency-sensitive environments.