1. Introduction
A network intrusion detection system is a process for discovering the existence of malicious or unwanted packets in the network. This process is done using real-time traffic monitoring to find out if any unusual behavior is present in the network or not. Big data, powerful computation facilities, and the expansion of the network size increase the demand for the required tasks that should be carried out simultaneously in real-time. Therefore, NIDS should be careful, accurate, and precise in monitoring, which has not been the case in the traditional methods. On the other hand, the rapid increase in the accuracy of machine learning algorithms is highly impressive. Its introduction relies on the increasing demand for improved performance on different types of network [
1]. However, software defined network (SDN) implementation of the network-based intrusion detection system (NIDS) has opened a frontier for its deployment, considering the increasing scope and typology of security risks of modern networks. The rapid growth in the volume of network data and connected devices carries inherent security risks. The adoption of technologies such as the Internet of Things (IoT), artificial intelligence (AI), and quantum computing, has increased the threat level [
2], making network security challenging and necessitating a new paradigm in its implementation. Various attacks have overwhelmed previous approaches (classified into signature-based intrusion detection systems [
3,
4] and anomaly-based intrusion detection systems [
5,
6]), increasing the need for advanced, adaptable and resilient security implementation [
1,
2,
7,
8]. For this reason, the traditional network design platform is being transformed into the evolving SDN implementation [
9]. Monitoring data and analyzing it over time are essential to the process of predicting future events, such as risks, attacks and diseases. The more details are formed, discovered and documented through analyzing very large-scale data, the more saved resources, as well as the working environment, will remain normal without any variations. Big data analytics (BDA) research in the supply chain becomes the secret of a protector for managing and preventing risks [
10]. BDA for humanitarian supply chains can aid the donors in their decision of what is appropriate in situations such as disasters, where it can improve the response and minimize human suffering and deaths. BDA and data monitoring using machine learning can help in identifying and understanding the interrelationships between the reasons, difficulties, obstacles and barriers that guide organizations in taking the most efficient and accurate decisions in risk management processes. This could impact entire organizations and countries, producing a hugely significant improvement in the process.
Network monitoring-based machine learning techniques have been involved in diverse fields. Using bi-directional long-short-term-memory neural networks, a social media network monitoring system is proposed for analyzing and detecting traffic accidents [
11]. The proposed method retrieves traffic-related information from social media (Facebook and Twitter) using query-based crawling: this process collects sentences related to any traffic events, such as jams, road closures, etc. Subsequently, several pre-processing techniques are carried out, such as steaming, tokenization, POS tagging and segmentation, in order to transform the retrieved data into structured form. Thereafter, the data are automatically labeled as ’traffic’ or ’non-traffic’, using a latent Dirichlet allocation (LDA) algorithm. Traffic- labeled data are analyzed into three types; positive, negative, and neutral. The output from this stage is a sentence labeled according to whether it is traffic or non-traffic, and with the polarity of that traffic sentence (positive, negative or neutral). Then, using the bag-of-words (BoW) technique, each sentence is transformed into a one-hot encoding representation in order to feed it to the Bi-directional LSTM neural network (Bi-LSTM). After the learning process, the neural networks perform multi-class classification using the softmax layer in order to classify the sentence in terms of location, traffic event and polarity types. The proposed method compares different classical machine learning and advanced deep learning approaches in terms of accuracy, F-score and other criteria.
Many initiatives and workshops have been conducted in order to improve and develop the healthcare systems using machine learning, such as [
12]. In these workshops several proposed machine algorithms have been used, such as K Nearest-Neighbors, logistic regression, K-means clustering, Random Forest (RF) etc, together with deep learning algorithms such as CNN, RNN, fully connected layer and auto-encoder. These varieties of techniques allow the researchers to deal with several data types, such as medical imaging, history, medical notes, video data, etc. Therefore, different topics and applications are introduced, with significant performance results such as causal inference, in investigations of Covid-19, disease prediction, such as disorders and heart diseases.
Using intelligent ensemble deep learning methods, healthcare monitoring is carried out for prediction of heart diseases [
13]. Real-time health status monitoring can prevent and predict any heart attacks before occurrence. For disease prediction, the proposed ensemble deep learning approach achieved a brilliant accuracy performance score of 98.5%. The proposed model takes two types of data that are transferred and saved on an online cloud database. The first is the data transferred from the sensors; these sensors have been placed in different places on the body in order to extract more than 10 different types of medical data. The second type is the daily electronic medical records from doctors, which includes various types of data, such as smoking history, family diseases, etc. The features are fused using the feature fusion Framingham Risk factors technique, which executes two tasks at a time, fusing the data together, and then extracting a fused and informative feature from this data. Then different pre-processing techniques are used to transform the data into a structured and well-prepared form, such as normalization, missing values filtering and feature weighting. Subsequently, an ensemble deep learning algorithm starts which learns from the data in order to predict whether a heart disease will occur or the threat is absent.
IDS refers to a mechanism capable of identifying or detecting intrusive activities. In a broader view, this encompasses all the processes used in the discovery of unauthorized uses of network devices or computers. This is achieved through software designed specifically to detect unusual or abnormal activities. IDS can be classified according to several surveys and sources in the literature [
14,
15,
16,
17] into four types (HIDS, NIDS, WIDS, NBA).
NIDS is an inline or passive-based intrusion detection technique. The scope of its detection targets network and host levels. The only architecture that fits and works with NIDS is the managed network. The advantage of using NIDS is that it costs less and is quicker in response, since there is no need to maintain sensor programming at the host level. The performance of monitoring the traffic is close to real-time; NIDS can detect attacks as they occur. However, it has the following limited features. It does not indicate if such attacks are successful or not: it has restricted visibility inside the host machine. There is also no effective way to analyze encrypted network traffic to detect the type of attack [
16,
18]. Moreover, NIDS may have difficulty capturing all packets in a large or busy network. Thus, it may fail to recognize an attack launched during a period of high traffic.
SDN provides a novel means of network implementation, stimulating the development of a new type of network security application [
19]. It adopts the concept of programmable networks through the deployment of logically centralized management. The network deployment and configuration are virtualized to simplify complex processes, such as orchestration, network optimization, and traffic engineering [
20,
21,
22]. It creates a scalable architecture that allows sufficient and reliable services based on certain types of traffic [
23]. The global view approach to a network enhances flow-level control of the underlying layers.
Implementing NIDS over SDN becomes a major effective security defense mechanism for detecting network attacks from the network entry point [
24]. NIDS has been implemented and investigated for decades to achieve optimal efficiency. It represents an application or device for monitoring network traffic for suspicious or malicious activity with policy violations [
25]. Such activities include malware attacks, untrustworthy users, security breaches, and DDoS. NIDS focuses on identifying anomalous network traffic or behavior; its efficiency means that network anomaly is adequately implemented as part of the security implementation. Since it is nearly impossible to prevent threats and attacks, NIDS will ensure early detection and mitigation [
26]. However, the advancement in NIDS has not instilled sufficient confidence among practitioners, since most solutions still use less capable, signature-based techniques [
2].
This study aims to increase the focus on several points:
choosing the right algorithm for the right tasks depends on the data types, size and network behavior and needs.
Implementing the optimized development process by preparing and selecting the benchmark dataset in order to build a promising system in NIDS.
Analyzing the data, finding, shaping, and engineering the important features, using several preprocessing techniques by stacking them together with an intelligent order to find the best accuracy with the lowest amount of data representation and size.
proposing an integration and complete development process using those algorithms and techniques from the selection of dataset to the evaluation of the algorithms using a different metric. Which can be extended to other NIDS applications.
This study enhances the implementation of NIDS by deploying different machine learning algorithms over SDN. Tree-based machine learning algorithms (XGBoost, random forest (RF), and decision tree (DT)) were implemented to enhance the monitoring and accuracy performance of NIDS. The proposed method will be trained on network traffic packet data, collected from large-scale resources and servers called NSL-KDD dataset to perform two tasks at a time by detecting whether there is an attack or not and classifying the type of attack.
This study enhances the implementation of NIDS by deploying machine learning over SDN. Tree-based machine learning algorithms (XGBoost, random forest (RF), and decision tree (DT)) are proposed to enhance NIDS. The proposed method will be trained on network traffic packet data, collected from large-scale resources and servers, called the NSL-KDD dataset to perform two tasks at a time by detecting whether there is an attack or not and classifying the type of attack.
The remainder of the paper is organized as follows.
Section 2 presents the background and related work. In
Section 3, materials and methods are introduced and illustrated.
Section 5 discusses and evaluates the results with multiple evaluation metrics. Finally,
Section 6 presents the conclusion and recommendations for future study.
2. Background and Related Work
Integrating machine learning algorithms into SDN has attracted significant attention. In [
26], a solution was proposed that solved the issues in KDD Cup 99 by performing an extensive experimental study, using the NSL-KDD dataset to achieve the best accuracy in intrusion detection. The experimental study was conducted on five popular and efficient machine learning algorithms (RF, J48, SVM, CART, and Naïve Bayes). The correlation feature selection algorithm was used to reduce the complexity of features, resulting in 13 features only in the NSL-KDD dataset.
This study tests the NSL-KDD dataset’s performance for real-world anomaly detection in network behavior. Five classic machine learning models RF, J48, SVM, CART, and Naïve Bayes were trained on all 41 features against the five normal types of attacks, DOS, probe, U2R, and R2L to achieve average accuracies of 97.7%, 83%, 94%, 85%, and 70% for each algorithm, respectively. The same models were trained again using the reduced 13 features to achieve average accuracies of 98%, 85%, 95%, 86%, and 73% for each model. In [
27], a deep neural network model was proposed to find and detect intrusions in the SDN. The NSL-KDD dataset was used to train and test the model. The neural network was constructed with five primary layers, one input layer with six inputs, three hidden layers with (12, 6, 3) neurons, and one output layer with 2D dimensions. The proposed method was trained on six features chosen from 41 features in the NSL-KDD dataset, which are basic and traffic features that can easily be obtained from the SDN environment. The proposed method calculates the accuracy, precision and recall, achieving an F1-score of 0.75. A second evaluation was conducted on seven classic machine learning models (RF, NB, NB Tree, J48, DT, MLP, and SVM) proposed in [
28] and the model achieved sixth place out of eight. The same author [
29] extended the approach using a gated recurrent unit neural network (GRU-RNN) for SDN anomaly detection, achieving accuracy up to 89%. In addition, the Min-Max normalization technique is used for feature scaling to improve and boost the learning process.
The SVM classifier [
30], integrated with the principal component analysis (PCA) algorithm, was used for an intrusion detection application. The NSL-KDD dataset is used in this approach to train and optimize the model for detecting abnormal patterns. A Min-Max normalization technique was proposed to solve the diversity data scale ranges with the lowest misclassification errors [
31]. The PCA algorithm is selected as a statistical technique to reduce the NSL-KDD dataset’s complexity, reducing the number of trainable parameters that needed to be learned. The nonlinear radial basis function kernel was chosen for SVM optimization. Detection rate (DR), false alarm rate (FAR), and correlation coefficient metrics were chosen to evaluate the proposed model, with an overall average accuracy of 95% using 31 features in the dataset. In [
32], an extreme gradient-boosting (XGBoost) classifier was used to distinguish between two attacks, i.e., normal and DoS. The detection method was analyzed and conducted over POX SDN, as a controller, which is an SDN open-source platform for prototyping and developing a technique based on SDN. Mininet was used to emulate the network topology to simulate real-time SDN-based cloud detection. Logistic regression was selected as a learning algorithm, with a regularization term penalty to prevent overfitting. The XGBoost term was added and combined with the logistic regression algorithm to boost the computations by constructing structure trees. The dataset used in this approach was KDD Cup 1999, while 400 K samples were selected for constructing the training set. Two types of normalization techniques were used; one with a logarithmic-based technique and one with a Min-Max-based technique. The average overall accuracy for XGBoost, compared to RF and SVM, was 98%, 96%, 97% respectively.
Based on DDoS attack characteristics, a detection system was simulated with the Mininet and floodlight platform using the SVM algorithm [
5]. The proposed method categorizes the characteristics into six tuples, which are calculated from the packet network. These characteristics are the speed of the source IP (SSIP), the speed of the source port, the standard deviation of flow packets, the deviation of flow bytes (SDFB), the speed of flow entries, and the ratio of pair-flow. Based on the calculated statistics from the SVM classifier’s six characteristics, the current network state is normal or attack. Attack flow (AF), DR, and FAR were chosen to achieve an average accuracy of 95%.
In TSDL [
33] a model with two stages of deep neural networks was designed and proposed for NIDS, using a stacked auto-encoder, integrated with softmax in the output layer as a classifier. TSDL was designed and implemented for Multi-class classification of attack detection. Down-sampling and other preprocessing techniques were performed over different datasets in order to improve the detection rate, as well as the monitoring efficiency. The detection accuracy for UNSW-NB15 was 89.134%. Different models of neural networks, such as variational auto-encoder, seq2seq structures using Long-Short-Term-Memory (LSTM) and fully connected networks were proposed in [
34] for NIDS. The proposed approach was designed and implemented to differentiate between normal and attack packets in the network, using several datasets, such as NSL-KDD, UNSW-NB15, KYOTO-HONEYPOT, and MAWILAB. A variety of preprocessing techniques have been used, such as one-hot-encoding, normalization, etc., for data preparation, feature manipulation and selection and smooth training in neural networks. Those factors are designed mainly, but not only, to enable the neural networks to learn complex features from different scopes of a single packet. Using 4 hidden layers, a deep neural network model [
35] was illustrated and implemented on KDD cup99 for monitoring intrusion attacks. Feature scaling and encoding were used for data preprocessing and lower data usage. More than 50 features were used to perform this task on different datasets. Therefore, complex hardware GPUs were used in order to handle this huge number of features with lower training time.
A supervised [
36] adversarial auto-encoder neural network was proposed for NIDS. It combined GANS and a variational auto-encoder. GANS consists of two different neural networks competing with each other, known as the generator and the discriminator. The result of the competition is to minimize the objective function as much as possible, using the Jensen-Shannon minimization algorithm. The generator tries to generate fake data packets, while the discriminator determined whether this data is real or fake; in other words, it checks if that packet is an attack or normal. In addition, the proposed method integrates the regularization penalty with the model structure for overfitting control behavior. The results were reasonable in the detection rate of U2RL and R2L but lower in others.
Multi-channel deep learning of features for NIDS was presented in [
37], using AE involving CNN, two fully connected layers and the output to the softmax classifier. The evaluation is done over three different datasets; KDD cup99, UNSW-NB15 and CICIDS, with an average accuracy of 94%. The proposed model provides effective results; however, the structure and the characteristics of the attack were not highlighted clearly.
The proposed method enhances the implementation of NIDS by deploying machine learning over SDN. It introduces a machine learning algorithm for network monitoring within the NIDS implementation on the central controller of the SDN. In this paper, enhanced tree-based machine learning algorithms are proposed for anomaly detection. Using only five features, a multi-class classification task is conducted by detecting whether there is an attack or not and classifying the type of attack.
5. Results and Discussion
To evaluate the performance of NIDS in terms of accuracy (AC), different metrics were used; precision (P), recall (R), and F-measure (F). These metrics can be calculated using confusion matrix parameters: true positive (the number of anomalous instances that are correctly classified); false positive (the number of normal instances that are incorrectly classified as anomalous); true negative (the number of normal instances that are correctly classified); and false negative (the number of anomalous instances that are incorrectly classified as normal). A good NIDS must achieve high DR and FAR.
Accuracy (AC): This is the percentage of correctly classified network activities. As shown in
Table 5.
Precision (P): The percentage of predicted anomalous instances that are actual anomalous instances; the higher P, the lower FAR. As shown in
Table 5.
Recall (R): the percentage of predicted attack instances versus all attack instances presented. As shown in
Table 5.
F-measure (F): measuring the performance of NIDS using the harmonic mean of the P and R. We aim to achieve a high F-score. As shown in
Table 5.
We compare XGBoost against the other two tree- based methods, RF and DT. Using the test set, which includes the four types of attacks as discussed in
Figure 5. Three different evaluation metrics are computed; F-score, precision and recall. As shown in
Figure 5, XGBoost ranked first in the evaluation, with an F1-score of 95.55%, while RF and DT achieved 94.6% and 94.5%, respectively. For precision, XGBoost outperformed RF and DT with a score of 92%, while RF and DT scored 90% and 90.2% respectively. Finally, for Recall, our proposed method with XGBoost proves its stability with a score of 98% while for RF and DT, the results were 82%, and 85%, respectively.
From these results, the proposed model with XGBoost performs with high precision and high recall, which means that the classifier returns accurate results and high precision, while, at the same time, returning a majority of all positive results (it’s an attack and the classifier detects that it’s an attack), which means high recall.
In this evaluation, the accuracy metric is chosen in order to measure the performance of the proposed method against the results of Tang [
27] on the binary classification task. The binary classification task represents the model’s ability to detect whether there is an attack or not in the network environment. The accuracy metric measures tolerance, or how close the predicted classification values are to the true values. As shown in
Figure 6, the proposed XGBoost algorithm with the selected five features achieved a high accuracy score of 95.55%, compared with the deep neural network with six features proposed by Tang [
27], which had a score of 75.75%. In addition, from the previous
Figure 5, we showed that the proposed model performs with high precision, with a score of 92%. Higher precision is related to reproducibility and repeatability; it is the degree to which repeated measurements on the classification task, under unchanged conditions, show the same results. We conclude from the previous scores that the proposed algorithm is accurate and stable in the binary classification task.
Deep inside the performance details of the proposed model, the performance of the multi-class classification against the different types of attacks is shown in
Figure 7. F1-score, precision and recall are computed on the four attack categories separately. For the F1-score, the proposed XGBoost showed an outstanding performance in detecting R2L and U2R attacks with similar scores of 96%, while against DOS and Probe, the model achieved scores of 94.6% and 94.5%, respectively, with a difference of 2%.
For precision and recall, the detection scores for R2L and U2RL achieved similar scores of 92% and 98%, respectively. For DOS and Probe, the model achieved slightly similar precision scores of 90%, 90.1%, and recall scores of 82% and 85%, respectively. We conclude from these F1-scores that the proposed model has a difference of only 2% in its performance against the four categories of attacks, while precision and recall differ by nearly 7%. This shows that the proposed model is forceful, effective and energetic, and can be applied to several types of attacks without weakening its performance.
For another evaluation of the proposed model on the four types of attacks, to make sure that the proposed method’s performance is efficient and capable, we used Receiver Operating Characteristic (ROC), which plots and measures the trade-off between TPR (recall, or probability of detection), and FPR, known as the probability of false alarm. Afterwards, we compute the Area Under Curve or AUC, to measure the overall 2-D area under the all-ROC curve. AUC gives us the accumulated performance over all the possible classification thresholds. AUC is scale-invariant and threshold-invariant: it is not a function of threshold, like F-score; it is an evaluation of the classifier as threshold versus overall possible values. As shown in
Figure 8, the AUC value achieves 99.94% for all four classes. Hence, another evaluation method proves that the proposed method is qualified and skillful in attack classification.
In the fifth evaluation (
Figure 9), we test the proposed method’s detection rate against the binary approach proposed in [
27]. The detection rate, or sensitivity, or true-positive rate, measures the proportion of the model’s predictions of whether a packet is an attack or not with respect to all positive data points or all attacks. For instance, if we have 100 samples of packets, of which 95 are attacks, a robust model shows that it can detect and classify all the 95 attacks correctly and classify the other 5 as normal. As shown in
Figure 9, the proposed method successfully accomplished an accuracy of 96.55% in the detection rate, while the deep learning method with six features achieved a score of 75.75%.
Finally, we evaluate the proposed method using an accuracy analysis against seven classical machine learning algorithms, in addition to the deep neural network [
27]. As shown in
Figure 10, the proposed method achieves an accuracy of 95.55%, while the second-best accuracy performance is 82.02 for the NB tree, showing a significant difference between the accuracy of our proposed method and the other approaches. This evaluation confirms that the proposed method is accurate and robust, even compared against other algorithms. This shows how the unambiguous steps in our approach are reliable, effective and authoritative.
We conclude that the proposed method achieves a verifiable result using several techniques. For the precise literature and comparison, we carefully chose the NSL-KDD data set, which is considered one of the most powerful benchmark datasets. Several procedures of data statistics, cleaning and verification are performed on the dataset, which are very important in order to produce a smooth learning process with no obstacles, such as over- or under-fitting issues. This stage ensures that the proposed model has unified data and increases the value of data, which helps in decision-making. Feature normalization and selection clarifies the path for clear selection and intelligent preferences, using only 5 features. Subsequently, more detailed exploration and various comparisons are carried out, based on three machine learning algorithms, i.e., DT, RF, and XGBoost, in order to test their performance with different criteria and then select the best performing algorithm for our task. This shows that the selection is dependably proven and technically verified.
6. Conclusions and Future Work
NIDS in SDN-based machine learning algorithms has attracted significant attention in the last two decades because of the datasets and various algorithms proposed in machine learning, using only limited features for better detection of anomalies better and more efficient network security. In this study, the benchmarking dataset NSL-KDD is used for training and testing. Feature normalization, feature selection and data preprocessing techniques are used in order to improve and optimize the algorithm’s performance for accurate prediction, as well as to facilitate a smooth training process with minimal time and resources. To select the appropriate algorithm, we compare three classical tree-based machine learning algorithms; Random Forest, Decision Trees and XGBoost. We examine them using a variety of evaluation metrics to find the disadvantages and advantages of using one or more. Using six different evaluation metrics, the proposed XGBoost model outperformed more than seven algorithms used in NIDS. The proposed method focused on detecting anomalies and protecting the SDN platform from attacks in real-time scenarios. The proposed methods performed two tasks simultaneously; to detect if there is an attack or not, and to determine the type of attack (Dos, probe, U2R, R2L).
In future studies, more evaluation metrics will be carried out. We plan to implement the approach using several deep neural network algorithms, such as Auto-Encoder, Generative Adversarial Networks, and Recurrent neural networks, such as GRU and LSTM. These techniques have been proven in the literature to allow convenient anomaly detection approaches in NIDS applications. Also, we plan to compare these algorithms against each other and integrate one or more neural network architectures to extract more details of how we can implement an efficient anomaly detection system in NIDS, with lower consumption of time and resources.
In addition, for a more solid basis for comparison, several benchmarking cyber security datasets, such as NSL-KDD, UNSW-NB15, CIC-IDS2017 will be conducted, in order to make sure that the selection of the proposed algorithm is not biased in any situation. These various datasets are generated in different environments and conditions, so more complex features will be available, more generalized attacks will be covered and the accuracy of the proposed algorithm will significantly increase, which could lead to a state-of-the-art approach.