Anomalous Network Traffic Detection Method Based on an Elevated Harris Hawks Optimization Method and Gated Recurrent Unit Classifier
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. GRU Neural Network
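The single-step GRU update (update gate, reset gate, candidate state, as in Cho et al.) can be sketched in NumPy; the weight names `Wz, Uz, bz, ...` follow the standard GRU notation and are illustrative, not the paper's:

```python
import numpy as np

def gru_cell(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)  # candidate state
    return (1 - z) * h_prev + z * h_cand                 # new hidden state
```

With all weights at zero, both gates sit at 0.5 and the candidate state is 0, so the new hidden state is simply half the previous one.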
3.2. Harris Hawks Optimization
3.2.1. Exploration Phase
3.2.2. Transition from Exploration to Exploitation
3.2.3. Exploitation Phase
- Soft besiege. When r ≥ 0.5 and |E| ≥ 0.5, the prey still has enough energy to escape, so the Harris hawks use a soft besiege strategy, whose main purpose is to exhaust the prey's energy before choosing the best position from which to raid and dive on it. The position update equation is $X(t+1) = \Delta X(t) - E\left|J X_{rabbit}(t) - X(t)\right|$, where $X_{rabbit}(t)$ is the position of the prey at iteration t, $\Delta X(t) = X_{rabbit}(t) - X(t)$ is the difference between the position vector of the rabbit and the current location at iteration t, and $J = 2(1 - rand)$ is the random jump strength of the rabbit throughout the escaping procedure, with rand a random number in (0, 1).
- Hard besiege. When r ≥ 0.5 and |E| < 0.5, the energy of the prey is severely depleted and exhausted, and the Harris hawks tightly encircle the intended prey before performing the surprise pounce. The position update is $X(t+1) = X_{rabbit}(t) - E\left|\Delta X(t)\right|$.
- Soft besiege with progressive rapid dives. When r < 0.5 and |E| ≥ 0.5, the prey still has a chance to escape and its escape energy is sufficient, so the Harris hawks perform a soft besiege before attacking. To simulate the escape pattern of the prey, HHO introduces the Levy flight function LF to update the position: $Y = X_{rabbit}(t) - E\left|J X_{rabbit}(t) - X(t)\right|$ and $Z = Y + S \times LF(D)$, where D is the problem dimension and S is a random vector of size 1 × D. The hawk moves to Y if $F(Y) < F(X(t))$, or to Z if $F(Z) < F(X(t))$; otherwise it keeps its current position.
- Hard besiege with progressive rapid dives. When r < 0.5 and |E| < 0.5, the prey has a chance to escape but its escape energy E is insufficient, so the Harris hawks adopt a hard besiege with progressive rapid dives, forming a hard besiege before the raid and shrinking their average distance to the prey. The update follows the same selection rule as the soft-besiege dives, with $Y = X_{rabbit}(t) - E\left|J X_{rabbit}(t) - X_m(t)\right|$, where $X_m(t)$ is the average position of the current population. HHO thus uses the escape energy E and the factor r to select among four attack mechanisms between the Harris hawks and the prey to solve the optimization problem.
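The four exploitation mechanisms above can be sketched as a single NumPy update for one hawk. Function and variable names (`exploitation_step`, `x_rabbit`, `x_mean`) are illustrative, and the Levy step uses Mantegna's method as in the original HHO paper:

```python
import numpy as np
from math import gamma

def levy(dim, beta=1.5, rng=None):
    """Levy flight step (Mantegna's method), used in the dive strategies."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0, sigma, dim) / np.abs(rng.normal(0, 1, dim)) ** (1 / beta)

def exploitation_step(x, x_rabbit, E, r, fitness, x_mean=None, rng=None):
    """One exploitation update for a single hawk, selecting among the four
    besiege strategies by the escape energy E and the random factor r."""
    rng = np.random.default_rng() if rng is None else rng
    J = 2 * (1 - rng.random())                       # random jump strength
    if r >= 0.5 and abs(E) >= 0.5:                   # soft besiege
        return (x_rabbit - x) - E * np.abs(J * x_rabbit - x)
    if r >= 0.5:                                     # hard besiege
        return x_rabbit - E * np.abs(x_rabbit - x)
    if abs(E) >= 0.5:                                # soft besiege, rapid dives
        y = x_rabbit - E * np.abs(J * x_rabbit - x)
    else:                                            # hard besiege, rapid dives
        xm = x if x_mean is None else x_mean         # population mean position
        y = x_rabbit - E * np.abs(J * x_rabbit - xm)
    z = y + rng.random(x.size) * levy(x.size, rng=rng)  # Levy-flight dive
    if fitness(y) < fitness(x):                      # greedy dive selection
        return y
    if fitness(z) < fitness(x):
        return z
    return x
```

The greedy comparison at the end means a dive is only accepted when it actually improves on the hawk's current fitness, which is what keeps the rapid-dive phases from degrading a good position.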
4. Elevated HHO
4.1. Elevated HHO for Escape Energy Function
4.2. Elevated HHO for Random Jump Distance Function
Algorithm 1: Pseudocode of standard Harris Hawks Optimization.
Input: the population size N and the maximum number of iterations T.
Output: the location of the rabbit and its fitness value.
Initialize the random population X_i (i = 1, 2, ..., N). While the stopping condition is not met: calculate the fitness of each hawk and set X_rabbit as the best location; for each hawk, update the escape energy E and jump strength J, then apply the exploration update if |E| ≥ 1, or one of the four besiege strategies (selected by r and |E|) if |E| < 1. Return X_rabbit.
Algorithm 2: Pseudocode of Elevated Harris Hawks Optimization.
Input: the population size N and the maximum number of iterations T.
Output: the location of the rabbit and its fitness value.
Initialize the random population. The flow is the same as in Algorithm 1, except that the escape energy and the random jump distance are computed with the elevated functions of Sections 4.1 and 4.2. Return X_rabbit.
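The overall loop of Algorithm 1 can be sketched in Python. This is a minimal version under stated simplifications: only the soft/hard besiege updates are used in exploitation (the rapid-dive branches are omitted for brevity), and the function and parameter names are illustrative:

```python
import numpy as np

def hho(fitness, dim, lb, ub, n_hawks=30, max_iter=200, seed=0):
    """Minimal standard HHO loop: exploration when |E| >= 1,
    soft/hard besiege when |E| < 1 (dive branches omitted)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_hawks, dim))
    best = min(X, key=fitness).copy()                     # rabbit location
    best_fit = fitness(best)
    for t in range(max_iter):
        for i in range(n_hawks):
            E = 2 * rng.uniform(-1, 1) * (1 - t / max_iter)  # decaying escape energy
            if abs(E) >= 1:                               # exploration phase
                if rng.random() >= 0.5:                   # perch on a random hawk
                    xr = X[rng.integers(n_hawks)].copy()
                    X[i] = xr - rng.random() * np.abs(xr - 2 * rng.random() * X[i])
                else:                                     # perch relative to the group
                    X[i] = ((best - X.mean(axis=0))
                            - rng.random() * (lb + rng.random() * (ub - lb)))
            else:                                         # exploitation phase
                J = 2 * (1 - rng.random())                # random jump strength
                if rng.random() >= 0.5 and abs(E) >= 0.5: # soft besiege
                    X[i] = (best - X[i]) - E * np.abs(J * best - X[i])
                else:                                     # hard besiege
                    X[i] = best - E * np.abs(best - X[i])
            X[i] = np.clip(X[i], lb, ub)
            f = fitness(X[i])
            if f < best_fit:                              # track the rabbit
                best_fit, best = f, X[i].copy()
    return best, best_fit
```

Run on a 2-D sphere function, this sketch converges close to the optimum at the origin, illustrating the energy-driven hand-off from exploration to exploitation.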
4.3. Fitness Function
4.4. Detailed Execution Flow of EHHO-GRU Abnormal Flow Detection Method
- (1) Data preprocessing. Delete the feature columns in which more than half of the values are missing in the malware detection dataset, map discrete data into feature columns with one-hot encoding, and fill the remaining missing values with 0. In binary classification, normal and anomalous records are labeled 0 and 1, respectively. Normalize the dataset to reduce the influence of differing feature scales, and delete features that are completely irrelevant to the label column using mutual-information feature selection.
- (2) Optimization algorithm initialization. Set the EHHO population size, the number of iterations, and the problem dimension, and determine the fitness function. Initialize the population and binarize the feature dimensions of each feasible solution.
- (3) Calculate the fitness. With the fitness function as the optimization objective, calculate the fitness value of each feasible solution after the iteration, take the feasible solution with the best fitness in the population as the prey, and record the optimal individual's information.
- (4) Determine whether the optimization loop has terminated: if the termination condition is met, go to step (5); otherwise, return to step (3).
- (5) Output the optimal feature subset, send it to the GRU classifier, and end the algorithm.
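The mask evaluation in steps (2)–(3) can be sketched as follows. The α-weighted combination of classification error and selected-feature ratio is a common wrapper-selection objective, not necessarily the paper's exact fitness function, and `train_eval` is a hypothetical stand-in for training the GRU on the masked features and returning validation accuracy:

```python
import numpy as np

def binarize(position, threshold=0.5):
    """Discretize a continuous EHHO position into a 0/1 feature mask."""
    return (np.asarray(position) > threshold).astype(int)

def mask_fitness(mask, X_train, y_train, X_val, y_val, train_eval, alpha=0.99):
    """Fitness of a binary feature mask: weighted sum of classification
    error and the ratio of selected features (lower is better)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:                      # empty subsets are invalid
        return np.inf
    acc = train_eval(X_train[:, idx], y_train, X_val[:, idx], y_val)
    return alpha * (1.0 - acc) + (1 - alpha) * (idx.size / mask.size)
```

With α close to 1, accuracy dominates and the feature-count term only breaks ties between subsets of similar accuracy, which is why wrapper methods of this form tend to return compact feature sets.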
5. Experiment
5.1. Experiment Environment
5.2. Use of Datasets
- NSL-KDD dataset. The NSL-KDD dataset is a simplified and improved version of the KDDCUP99 dataset; it removes redundant records so that intrusion detection models are not biased toward frequent records, making it more suitable for deep-learning-based anomalous traffic detection. The classification labels used in the multiclass test, together with the detailed composition of the dataset, are shown in Table 2. In the binary classification test, all labels are 1 except the Normal label, which is 0.
Attack Category | Description | Train | Test |
---|---|---|---|
Normal | normal flow record | 67,341 | 9711 |
Probe | Get detailed statistics on system and network configuration | 11,656 | 7456 |
DoS | Attacks are designed to degrade network resources | 45,927 | 2421 |
U2R | Attempts to gain root (superuser) privileges | 114 | 1436 |
R2L | Illegal access to a remote computer | 934 | 1520 |
Total | | 125,972 | 22,543 |
- UNSW-NB15 dataset. The UNSW-NB15 dataset is a public dataset created by the Cyber Range Lab of the Australian Centre for Cyber Security. Cybersecurity researchers often use it to address the issues identified in the NSL-KDD and KDDCUP99 datasets. The dataset is generated in a hybrid manner and includes both normal and attack traffic drawn from real network traffic, making it a comprehensive dataset of network attack traffic. Each record is described by 49 features. The dataset contains nine types of abnormal attack traffic, labeled 1, and one type of normal traffic, labeled 0; the details are shown in Table 3 below.
Attack Category | Description | Train | Test |
---|---|---|---|
Normal | normal flow record | 37,000 | 56,000 |
Backdoor | Techniques to gain access to programs or systems by bypassing security controls | 583 | 1746 |
Analysis | Intrusion methods of infiltrating web applications through ports and web scripts | 677 | 2000 |
Fuzzers | An attack that tries to find security holes by feeding a program large amounts of random data to make it crash | 6062 | 18,184 |
Shellcode | Attacks that control the target machine by sending code that exploits a specific vulnerability | 378 | 1133 |
Reconnaissance | Attacks that collect computer network information to evade security controls | 3496 | 10,491 |
Exploit | Code that takes control of the target system by triggering a bug or several bugs | 11,132 | 33,393 |
DoS | Attacks are designed to degrade network resources | 4089 | 12,264 |
Worms | Actively attacking malignant computer virus spread through the network | 44 | 130 |
Generic | A technique that works against every block cipher regardless of its structure, using hash-function collisions | 18,871 | 40,000 |
Total | | 82,332 | 175,341 |
- CICIDS2018 dataset. The CICIDS2018 dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). Earlier datasets were often heavily anonymized, did not reflect current trends, or lacked certain statistical characteristics, so no perfect dataset existed. The Canadian research team therefore devised a systematic approach to generating datasets for analyzing, testing, and evaluating intrusion detection systems, with a focus on network-based anomaly detectors. The dataset covers seven different attack scenarios, including Brute-force, Heartbleed, Botnet, DoS, DDoS, and Web attacks. The traffic was generated by an attacking infrastructure of 50 hosts against a victim organization of five departments comprising 420 machines and 30 servers. The dataset includes the captured network traffic and system logs of each machine, as well as 80 features extracted from the captured traffic with CICFlowMeter-V3. Because the full dataset, which consists of ten capture files, is too large for the experimental environment, a portion of the data from each file is selected to form a new dataset for the experiments, with 10% of each file extracted by stratified sampling to preserve generalizability. There are seven types of abnormal attack traffic, labeled 1, and one type of normal traffic, labeled 0; the detailed composition is shown in Table 4.
5.3. Dataset Preprocessing Step
- Data normalization. A min-max normalization process is used to compress the data into the (−1, 1) interval. One advantage is that it improves the convergence speed of the model; another is that it improves the convergence accuracy. It is given by: $x' = 2 \times \frac{x - x_{min}}{x_{max} - x_{min}} - 1$.
- One-hot encoding of character features. In feature engineering, categorical features appear in the data, including character-based and discontinuous features. Because classifiers do not handle attribute data well, this paper uses the preprocessing utilities of the Keras framework to one-hot encode these categorical features.
- Dataset labels. Datasets commonly used in anomalous traffic detection often carry more than one label: in addition to the attack-category label, they typically use 0 for normal network traffic and 1 for abnormal network traffic. Some datasets (e.g., UNSW-NB15) inherently have both label columns, while others (e.g., NSL-KDD) have only the attack-category label column. To better evaluate the generalizability of the anomalous traffic detection model proposed in this paper, each dataset is tested and evaluated for both multiclass and binary classification.
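The three preprocessing steps above can be sketched together in NumPy. This is a minimal illustration (the paper uses Keras preprocessing for the one-hot step); function names and the `normal_label` parameter are illustrative:

```python
import numpy as np

def min_max_scale(X, feature_range=(-1.0, 1.0)):
    """Column-wise min-max normalization into the given interval."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    x_min = X.min(axis=0)
    span = X.max(axis=0) - x_min
    span = np.where(span == 0, 1.0, span)   # constant columns map to lo
    return lo + (hi - lo) * (X - x_min) / span

def one_hot(column):
    """One-hot encode a 1-D array of categorical values; returns the
    encoded matrix and the category order."""
    column = np.asarray(column)
    cats, inverse = np.unique(column, return_inverse=True)
    encoded = np.zeros((column.size, cats.size), dtype=int)
    encoded[np.arange(column.size), inverse] = 1
    return encoded, cats

def to_binary_labels(categories, normal_label="Normal"):
    """Derive the 0/1 labels: 0 for normal traffic, 1 for any attack."""
    return (np.asarray(categories) != normal_label).astype(int)
```

For datasets that carry only the attack-category column (e.g., NSL-KDD), `to_binary_labels` produces the missing binary label column used in the two-class experiments.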
5.4. Experimental Model Parameter Settings
5.5. Experimental Evaluation Metrics
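The metrics reported in the result tables (accuracy, precision, recall, F1-score, FPR) all derive from the binary confusion matrix; a minimal sketch, assuming label 1 marks anomalous traffic as in the datasets above:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and FPR from binary labels
    (1 = anomalous, 0 = normal)."""
    y_true = np.asarray(y_true); y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    acc = (tp + tn) / max(tp + tn + fp + fn, 1)
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    fpr = fp / max(fp + tn, 1)
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "f1": f1, "fpr": fpr}
```

In the multiclass tables, the per-class TPR and FPR are the same quantities computed one-vs-rest for each attack category.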
6. Experimental Results and Analysis
6.1. EHHO Algorithm Performance Test
6.2. Analysis of the EHHO-GRU Model Results for the NSL-KDD Dataset
6.3. Analysis of the EHHO-GRU Model Results for the UNSW-NB15 Dataset
6.4. EHHO-GRU Model Analysis of CICIDS2018 Dataset Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Item | Configuration |
---|---|
CPU | AMD R5 3600X |
GPU | Nvidia RTX 2060 |
RAM | 16 GB |
Language | Python 3.9 |
Deep Learning Framework | PyTorch |
Attack Category | Description | Train | Test |
---|---|---|---|
Brute-force attack | Perform brute force and password cracking attacks | 31,767 | 21,178 |
Botnet | Remote control of compromised hosts organized as a botnet | 17,167 | 11,445 |
DoS | Attacks are designed to degrade network resources | 38,606 | 25,738 |
DDoS | Distributed Denial of Service Attack | 82,307 | 54,871 |
Infiltration | Intranet penetration attack | 13,640 | 9093 |
SQL | SQL injection attack | 8 | 5 |
Benign | Normal (benign) traffic | 69,489 | 46,326 |
Total | | 252,984 | 168,656 |
Parameter | Description | Value |
---|---|---|
N | Initial population size | 30 |
T | Maximum number of iterations of the feature selection algorithm | See the specific experiment section for details. |
Max depth | Decision tree maximum depth | 4 |
Hid dim | Number of hidden-layer units in the neural network | 128 |
Lr | Neural network learning rate | 0.0005 |
E | Number of neural network training iterations | 3000 |
Dropout | Neural network dropout (forgetting) rate | 0.5 |
Function | Equation | Variable Domain | Optimal Value |
---|---|---|---|
Ackley | $f(x) = -20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right) + 20 + e$ | [−5, 5] | 0 |
Booth | $f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2$ | [−10, 10] | 0 |
Easom | $f(x, y) = -\cos(x)\cos(y)\exp\left(-(x - \pi)^2 - (y - \pi)^2\right)$ | [−100, 100] | −1 |
Rastrigin | $f(x) = 10n + \sum_{i=1}^{n}\left(x_i^2 - 10\cos(2\pi x_i)\right)$ | [−5.12, 5.12] | 0 |
Method | GRU | PSO | WOA | GA | HHO | EHHO |
---|---|---|---|---|---|---|
Feature dimension | 41 | 11 | 5 | 5 | 5 | 6 |
accuracy | 78.34% | 77.94% | 79.77% | 81.42% | 79.97% | 82.47% |
precision | 96.84% | 96.79% | 95.02% | 94.58% | 96.07% | 96.23% |
recall | 64.03% | 63.34% | 68.03% | 71.46% | 67.58% | 72.02% |
f1-score | 77.09% | 76.57% | 79.29% | 81.41% | 79.34% | 82.38% |
FPR | 2.76% | 2.78% | 4.72% | 5.41% | 3.66% | 3.73% |
Method | GRU | PSO | WOA | GA | HHO | EHHO |
---|---|---|---|---|---|---|
Feature dimension | 42 | 9 | 8 | 10 | 9 | 8 |
accuracy | 86.27% | 85.14% | 84.14% | 86.13% | 84.13% | 86.85% |
normal tpr | 96.72% | 98.89% | 97.11% | 97.29% | 98.23% | 97.52% |
normal fpr | 24.89% | 29.56% | 29.71% | 25.81% | 30.95% | 24.55% |
DoS tpr | 98.92% | 83.23% | 93.12% | 95.21% | 77.83% | 97.46% |
DoS fpr | 0.43% | 1.06% | 0.48% | 0.38% | 0.54% | 0.36% |
r2l tpr | 0.23% | 0.00% | 0.00% | 0.00% | 7.05% | 0.00% |
r2l fpr | 0.04% | 0.00% | 0.07% | 0.03% | 0.28% | 0.02% |
probe tpr | 99.37% | 52.08% | 91.32% | 99.28% | 69.62% | 99.46% |
probe fpr | 1.67% | 5.73% | 1.32% | 2.17% | 5.27% | 1.97% |
u2r tpr | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
u2r fpr | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
Method | GRU | PSO | WOA | GA | HHO | EHHO |
---|---|---|---|---|---|---|
Feature dimension | 42 | 19 | 23 | 18 | 11 | 20 |
accuracy | 74.08% | 87.04% | 75.56% | 88.37% | 89.33% | 90.26% |
precision | 99.59% | 98.31% | 98.7% | 98.57% | 98.62% | 96.96% |
recall | 62.18% | 82.38% | 64.94% | 84.13% | 85.51% | 88.46% |
f1-score | 76.56% | 89.64% | 78.34% | 90.78% | 91.6% | 92.52% |
FPR | 0.55% | 3.02% | 1.82% | 2.59% | 2.54% | 5.92% |
Method | GRU | PSO | WOA | GA | HHO | EHHO |
---|---|---|---|---|---|---|
Feature dimension | 42 | 19 | 11 | 16 | 16 | 22 |
accuracy | 74.01% | 85.97% | 86.45% | 88.17% | 86.14% | 88.67% |
Normal TPR | 99.3% | 98.41% | 98.79% | 98.62% | 98.45% | 97.76% |
Normal FPR | 37.86% | 17.77% | 19.35% | 16.74% | 19.64% | 15.6% |
Generic TPR | 97.87% | 97.9% | 98% | 97.78% | 97.85% | 97.78% |
Generic FPR | 0.1% | 0.21% | 0.33% | 0.15% | 0.14% | 0.04% |
Exploits TPR | 7.06% | 54.3% | 54.76% | 54.95% | 51.18% | 53.66% |
Exploits FPR | 2.09% | 3.74% | 3.46% | 3.62% | 3.52% | 3.89% |
Fuzzers TPR | 4.09% | 8.66% | 6.42% | 7.8% | 9.07% | 12.4% |
Fuzzers FPR | 0.37% | 0.59% | 0.51% | 0.66% | 1.41% | 0.99% |
DoS TPR | 64.2% | 70.14% | 71.71% | 72.97% | 70.21% | 73.88% |
DoS FPR | 9.85% | 10.6% | 10.73% | 10.96% | 10.69% | 11.02% |
Reconnaissance TPR | 33.92% | 42.64% | 43.35% | 69.96% | 41.46% | 71.9% |
Reconnaissance FPR | 0.09% | 0.44% | 0.67% | 0.41% | 0.66% | 0.6% |
Analysis TPR | 0% | 0% | 0% | 0% | 0% | 0% |
Analysis FPR | 0% | 0% | 0% | 0% | 0% | 0% |
Backdoor TPR | 0% | 0% | 0% | 1.83% | 0% | 0% |
Backdoor FPR | 0% | 0% | 0% | 0% | 0% | 0% |
Shellcode TPR | 33.54% | 6.62% | 8.21% | 0% | 0% | 0% |
Shellcode FPR | 0.35% | 0.09% | 0.04% | 0% | 0% | 0.01% |
Worms TPR | 0% | 0% | 0% | 0% | 0% | 0% |
Worms FPR | 0% | 0% | 0% | 0% | 0% | 0% |
Method | GRU | PSO | WOA | GA | HHO | EHHO |
---|---|---|---|---|---|---|
Feature dimension | 80 | 18 | 6 | 5 | 11 | 11 |
accuracy | 94.57% | 93.94% | 94.06% | 93.66% | 94.15% | 94.20% |
precision | 99.87% | 98.93% | 98.33% | 98.3% | 99.38% | 99.51% |
recall | 92.64% | 92.65% | 93.41% | 92.87% | 93.41% | 92.48% |
f1-score | 96.12% | 95.69% | 95.81% | 95.51% | 95.83% | 95.86% |
FPR | 0.31% | 2.65% | 4.22% | 4.26% | 1.53% | 1.22% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, Y.; Kang, C.; Yu, H.; Fan, T.; Zhang, H. Anomalous Network Traffic Detection Method Based on an Elevated Harris Hawks Optimization Method and Gated Recurrent Unit Classifier. Sensors 2022, 22, 7548. https://doi.org/10.3390/s22197548