Feature Selection and Ensemble-Based Intrusion Detection System: An Efficient and Comprehensive Approach
Abstract
1. Introduction
- The authors propose an efficient and reliable feature selection approach that combines a genetic search technique, a rule-based engine, and CfsSubsetEval to select the features most relevant to model performance.
- The authors propose a comprehensive and practical ensemble approach that combines the four aforementioned algorithms into a single ensemble classifier.
- A comprehensive comparative analysis on several current IDS datasets, yielding an accurate, reliable, and efficient IDS.
- The proposed system outperforms comparable systems across various evaluation metrics on three current IDS datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017.
- Finally, the proposed approach records a negligible false alarm rate (FAR) and low model building and testing times (MBT/MTT) compared to many current studies, and also addresses the low detection-rate challenge of the UNSW-NB15 dataset discussed in subsequent sections.
2. Fundamental Concepts
2.1. Ensemble Learning
Synopsis of Bagging, Boosting and Stacking
2.2. Feature Selection
Summary of Filter, Wrapper and Embedded Models
Algorithm 1 A Classical Filter Algorithm
Input: D (F0, F1, …, Fn−1) // a training dataset of n features
So // a specified starting subset for the search of an optimal subset
Ω // a specified stopping condition
Output: Sbest // the desired optimal subset
01 begin
02   initialize: Sbest = So;
03   λbest = eval(So, D, α); // So is evaluated using an independent measure α
04   do begin
05     S = generate(D); // generate a candidate subset for evaluation
06     λ = eval(S, D, α); // evaluate the generated subset S with α
07     if (λ is better than λbest)
08       λbest = λ;
09       Sbest = S;
10   end until (Ω is reached);
11   return Sbest;
12 end;
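The generate-and-evaluate loop of Algorithm 1 can be made concrete. The sketch below is illustrative, not the authors' implementation: it assumes mean absolute Pearson correlation with the class as the independent measure α, random subset generation for generate(D), and a fixed iteration budget for the stopping condition Ω.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den if den else 0.0

def filter_search(X, y, start, max_iters=50, seed=0):
    """Classical filter loop: score subsets with an independent measure
    (here, mean |correlation| of each feature with the class) and keep the best."""
    rng = random.Random(seed)
    n = len(X[0])

    def eval_subset(S):
        cols = [[row[j] for row in X] for j in S]
        return statistics.fmean(abs(pearson(c, y)) for c in cols)

    best, best_score = set(start), eval_subset(start)   # Sbest = So, λbest = eval(So)
    for _ in range(max_iters):                          # Ω: fixed iteration budget
        S = {j for j in range(n) if rng.random() < 0.5} or {0}   # generate(D)
        score = eval_subset(S)                          # λ = eval(S, D, α)
        if score > best_score:                          # λ better than λbest
            best, best_score = S, score
    return sorted(best)
```

In a real filter method, α would be a principled measure such as information gain or the CFS merit, and generate(D) a systematic search (forward selection, best-first) rather than random sampling.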
Algorithm 2 A Classical Wrapper Algorithm
Input: D (F0, F1, …, Fn−1) // a training dataset of n features
So // a specified starting subset for the search of an optimal subset
Ω // a specified stopping condition
Output: Sbest // the desired optimal subset
01 begin
02   initialize: Sbest = So;
03   λbest = eval(So, D, Ψ); // So is evaluated using a specified learning algorithm Ψ
04   do begin
05     S = generate(D); // generate a candidate subset for evaluation
06     λ = eval(S, D, Ψ); // evaluate the generated subset S with algorithm Ψ
07     if (λ is better than λbest)
08       λbest = λ;
09       Sbest = S;
10   end until (Ω is reached);
11   return Sbest;
12 end;
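Algorithm 2 differs from Algorithm 1 only in the evaluation step: the independent measure α is replaced by a learning algorithm Ψ, so each candidate subset is scored by actual predictive performance. A sketch, assuming (purely for self-containment; the paper does not prescribe this) that Ψ is leave-one-out accuracy of a 1-nearest-neighbour classifier:

```python
import random

def loo_1nn_accuracy(X, y, S):
    """Ψ: leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the feature subset S (squared Euclidean distance)."""
    S = sorted(S)
    correct = 0
    for i in range(len(X)):
        best_d, best_j = float("inf"), -1
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][k] - X[j][k]) ** 2 for k in S)
            if d < best_d:
                best_d, best_j = d, j
        correct += (y[best_j] == y[i])
    return correct / len(X)

def wrapper_search(X, y, start, max_iters=40, seed=0):
    """Classical wrapper loop: same search as the filter, but λ comes from Ψ."""
    rng = random.Random(seed)
    n = len(X[0])
    best, best_score = set(start), loo_1nn_accuracy(X, y, start)
    for _ in range(max_iters):                          # Ω: iteration budget
        S = {j for j in range(n) if rng.random() < 0.5} or set(start)  # generate(D)
        score = loo_1nn_accuracy(X, y, S)               # λ = eval(S, D, Ψ)
        if score > best_score or (score == best_score and len(S) < len(best)):
            best, best_score = S, score                 # prefer smaller subsets on ties
    return sorted(best)
```

This illustrates the usual trade-off: wrappers tend to find subsets better tuned to the final classifier, at the cost of retraining Ψ for every candidate subset.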
3. Related Works
4. Materials and Methods
4.1. Summary and Statistical Details of the Benchmark Datasets
4.1.1. Summarized Description of the CIC-IDS2017 Dataset
4.1.2. Synopsis of the UNSW-NB15 Dataset
4.1.3. Outline of the NSL-KDD Dataset
4.2. Data Preprocessing
4.2.1. Data Cleaning and Removal of White Spaces
4.2.2. Label Encoding
4.2.3. Data Normalization
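Sections 4.2.2 and 4.2.3 name standard transformations. A minimal sketch of both (illustrative only, with made-up column values; the paper's exact encoding scheme is not reproduced here):

```python
def label_encode(values):
    """Label encoding (Section 4.2.2): map each distinct categorical
    value to an integer code, in sorted order for determinism."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def min_max_normalize(column):
    """Min-max normalization (Section 4.2.3): rescale a numeric column
    to [0, 1] via x' = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant column: map everything to 0
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]
```

For example, a hypothetical protocol column ["tcp", "udp", "icmp", "tcp"] encodes to [1, 2, 0, 1], and a byte-count column [0, 5, 10] normalizes to [0.0, 0.5, 1.0]. Normalization matters here because several of the base learners (K-means, One-Class SVM, DBSCAN) are distance-based and would otherwise be dominated by large-valued features.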
4.3. Utilized Base Clustering Algorithms
4.4. Feature Selection
4.4.1. Genetic Search
4.4.2. CfsSubsetEval
4.4.3. Rule-Based Engine
4.4.4. Feature Selection Steps
Algorithm 3 The Proposed Hybrid Feature Selection Algorithm
Input: S (F1, F2, …, Fk, Fc) // a training dataset of n features
Output: Class Ci // the predicted class (using the selected optimal subset)
01 Start by randomly creating an initial population P.
02 Compute the correlation (CfsSubsetEval) between feature subsets and the class.
03 Select the feature subsets with the highest correlation.
04 Compute f(x: CfsSubsetEval) for every member x ∈ P.
05 Define a probability distribution p over the members of P, where p(x) ∝ f(x).
06 Select two population members x and y according to p.
07 Apply crossover to x and y to create new population members x′ and y′.
08 Apply mutation to x′ and y′.
09 Insert x′ and y′ into P′.
10 If |P′| < |P|, go to step 4.
11 Let P ← P′.
12 If more generations remain, go to step 2.
13 Return x ∈ P for which f(x) is highest.
14 If any two feature subsets have the same fitness value,
15   return the subset with the fewest attributes (features).
16 initial weight ← random(x)
17 for all instances p and for each output node j
18   compute Activation(j)
19   for every input node i to output node j do
20     ΔW = LearningConst × Error_j × Activation_i
21     W(t) = W + ΔW, until the error is sufficiently small or the budget expires
22 for i = 1 to length of p
23   for j = 1 to length of t: if (j == i) and ci = t[j]
24   else increment i
25 return Ci
26 End;
4.5. Adopted Model Evaluation Metrics
4.5.1. Accuracy (ACC)
4.5.2. Detection Rate (DR)
4.5.3. Precision
4.5.4. F1-Measure (F1)
4.5.5. False Alarm Rate (FAR)
4.5.6. Model Building/Testing Time (MBT/MTT)
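All the metrics in Sections 4.5.1–4.5.5 derive from the confusion-matrix counts (TP, TN, FP, FN), treating attack traffic as the positive class; MBT/MTT are simply wall-clock measurements. A minimal sketch:

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute the Section 4.5 metrics from confusion-matrix counts,
    with 'attack' as the positive class."""
    acc = (tp + tn) / (tp + tn + fp + fn)                        # 4.5.1 Accuracy
    dr = tp / (tp + fn) if tp + fn else 0.0                      # 4.5.2 Detection Rate (recall)
    prec = tp / (tp + fp) if tp + fp else 0.0                    # 4.5.3 Precision
    f1 = (2 * prec * dr / (prec + dr)) if prec + dr else 0.0     # 4.5.4 F1-Measure
    far = fp / (fp + tn) if fp + tn else 0.0                     # 4.5.5 False Alarm Rate
    return {"ACC": acc, "DR": dr, "Precision": prec, "F1": f1, "FAR": far}
```

For example, with TP = 50, TN = 40, FP = 5, FN = 5, accuracy is 0.90 while FAR is 5/45 ≈ 0.111, which is why FAR is reported separately: a model can look accurate overall yet still raise an operationally unacceptable number of false alarms.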
5. Results and Discussion
5.1. Comparison of Feature Selection with No Feature Selection
5.2. Assessment of the Proposed HFS with Other Feature Selection Methods
5.3. Evaluation of KODE with (Voting) and Other Classification Methods
5.4. Comparison of Various Adopted Combination Rules
5.5. Comparison of Our Proposed Approach with Other Cutting-Edge IDS Approaches
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Park, J.H. Advances in Future Internet and the Industrial Internet of Things. Symmetry 2019, 11, 244.
- Tankard, C. Big data security. Netw. Secur. 2012, 2012, 5–8.
- Khan, M.; Karim, R.; Kim, Y. A Scalable and Hybrid Intrusion Detection System Based on the Convolutional-LSTM Network. Symmetry 2019, 11, 583.
- Meryem, A.; EL Ouahidi, B. Hybrid intrusion detection system using machine learning. Netw. Secur. 2020, 2020, 8–19.
- Sarker, I.H.; Kayes, A.S.M.; Badsha, S.; Alqahtani, H.; Watters, P.; Ng, A. Cybersecurity data science: An overview from machine learning perspective. J. Big Data 2020, 7, 1–29.
- Damaševičius, R.; Venčkauskas, A.; Toldinas, J.; Grigaliūnas, Š. Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection. Electronics 2021, 10, 485.
- Dang, Q. Studying Machine Learning Techniques for Intrusion Detection Systems. In Future Data and Security Engineering. FDSE 2019. Lecture Notes in Computer Science; Dang, T., Küng, J., Takizawa, M., Bui, S., Eds.; Springer: Cham, Switzerland, 2019; Volume 11814.
- Muñoz, A.; Maña, A.; González, J. Dynamic Security Properties Monitoring Architecture for Cloud Computing. In Security Engineering for Cloud Computing: Approaches and Tools; IGI Global: Hershey, PA, USA, 2013; pp. 1–18.
- Kagara, B.N.; Siraj, M.M. A Review on Network Intrusion Detection System Using Machine Learning. Int. J. Innov. Comput. 2020, 10, 598–607.
- Bhosale, K.S.; Nenova, M.; Iliev, G. Intrusion Detection in Communication Networks Using Different Classifiers. Technol. Soc. 2018, 2019, 19–28.
- Liu, H.; Lang, B. Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey. Appl. Sci. 2019, 9, 4396.
- Ashoor, A.S.; Gore, S. Importance of Intrusion Detection System (IDS). Int. J. Sci. Eng. Res. 2011, 2, 1–4. Available online: http://www.ijser.org/researchpaper%5CImportance_of_Intrusion_Detection_System.pdf (accessed on 15 August 2021).
- Saleh, A.I.; Talaat, F.M.; Labib, L.M. A hybrid intrusion detection system (HIDS) based on prioritized k-nearest neighbors and optimized SVM classifiers. Artif. Intell. Rev. 2017, 51, 403–443.
- Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J.; Alazab, A. Hybrid Intrusion Detection System Based on the Stacking Ensemble of C5 Decision Tree Classifier and One Class Support Vector Machine. Electronics 2020, 9, 173.
- Zhou, Y.; Cheng, G.; Jiang, S.; Dai, M. Building an efficient intrusion detection system based on feature selection and ensemble classifier. Comput. Netw. 2020, 174, 107247.
- Lyu, R.; He, M.; Zhang, Y.; Jin, L.; Wang, X. Network Intrusion Detection Based on an Efficient Neural Architecture Search. Symmetry 2021, 13, 1453.
- Zhang, Y.; Ye, X.; Xie, F.; Peng, Y. A Practical Database Intrusion Detection System Framework. In Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology, Xiamen, China, 11–14 October 2009; Volume 1, pp. 342–347.
- Song, J.; Takakura, H.; Okabe, Y.; Nakao, K. Toward a more practical unsupervised anomaly detection system. Inf. Sci. 2013, 231, 4–14.
- Ullah, I.; Mahmoud, Q.H. A filter-based feature selection model for anomaly-based intrusion detection systems. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017; pp. 2151–2159.
- Fitni, Q.R.S.; Ramli, K. Implementation of Ensemble Learning and Feature Selection for Performance Improvements in Anomaly-Based Intrusion Detection Systems. In Proceedings of the 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence and Communications Technology (IAICT), Bali, Indonesia, 7–8 July 2020; pp. 118–124.
- Vaiyapuri, T.; Binbusayyis, A. Application of deep autoencoder as a one-class classifier for unsupervised network intrusion detection: A comparative evaluation. PeerJ Comput. Sci. 2020, 6, e327.
- Wagh, S.K.; Kolhe, S. Effective semi-supervised approach towards intrusion detection system using machine learning techniques. Int. J. Electron. Secur. Digit. Forensics 2015, 7, 290.
- Hanifi, K.; Güvensan, M.A. Makine Öğrenmesi Anormal Durum Belirleme Yaklaşımı ile Ağ Üzerinde Saldırı Tespiti [Network Intrusion Detection Using Machine Learning Anomaly Detection Algorithms]. 2016. Available online: https://ieeexplore.ieee.org/document/8442693 (accessed on 15 August 2021).
- Gautam, R.K.S.; Doegar, E.A. An Ensemble Approach for Intrusion Detection System Using Machine Learning Algorithms. In Proceedings of the 2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 11–12 January 2018.
- Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the NDSS Symposium 2018, San Diego, CA, USA, 18–21 February 2018; pp. 18–21.
- Sah, G.; Banerjee, S. Feature Reduction and Classifications Techniques for Intrusion Detection System. In Proceedings of the 2020 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 28–30 July 2020; pp. 1543–1547.
- Sarnovsky, M.; Paralic, J. Hierarchical Intrusion Detection Using Machine Learning and Knowledge Model. Symmetry 2020, 12, 203.
- Mahfouz, A.; Abuhussein, A.; Venugopal, D.; Shiva, S. Ensemble Classifiers for Network Intrusion Detection Using a Novel Network Attack Dataset. Future Internet 2020, 12, 180.
- Zhou, Z.-H. Ensemble Learning. In Encyclopedia of Biometrics; Li, S.Z., Jain, A., Eds.; Springer: Boston, MA, USA, 2009; pp. 270–273.
- Li, Y.; Chen, W. A Comparative Performance Assessment of Ensemble Learning for Credit Scoring. Mathematics 2020, 8, 1756.
- Richman, R.; Wüthrich, M.V. Nagging Predictors. Risks 2020, 8, 83.
- Syarif, I.; Zaluska, E.; Prugel-Bennett, A.; Wills, G. Application of Bagging, Boosting and Stacking to Intrusion Detection. In Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science; Perner, P., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7376.
- Aburomman, A.; Reaz, M.B.I. A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput. Secur. 2017, 65, 135–152.
- Gaikwad, D.; Thool, R.C. Intrusion Detection System Using Bagging Ensemble Method of Machine Learning. In Proceedings of the International Conference on Computing Communication Control and Automation, Pune, India, 26–27 February 2015; pp. 291–295.
- Demir, N.; Dalkiliç, G. Modified stacking ensemble approach to detect network intrusion. Turk. J. Electr. Eng. Comput. Sci. 2018, 26, 418–433.
- Rajagopal, S.; Kundapur, P.P.; Hareesha, K.S. A Stacking Ensemble for Network Intrusion Detection Using Heterogeneous Datasets. Secur. Commun. Netw. 2020, 2020, 1–9.
- Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
- Aljawarneh, S.; Aldwairi, M.; Yassein, M.B. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 2018, 25, 152–160.
- Nguyen, H.T.; Petrović, S. A Comparison of Feature-Selection Methods. 2010; pp. 242–255. Available online: https://link.springer.com/chapter/10.1007/978-3-642-14706-7_19 (accessed on 15 August 2021).
- Suman, C.; Tripathy, S.; Saha, S. Building an effective intrusion detection system using unsupervised feature selection in multi-objective optimization framework. arXiv 2019, arXiv:1905.06562.
- Song, L.; Smola, A.; Gretton, A.; Borgwardt, K.M.; Bedo, J. Supervised feature selection via dependence estimation. In Proceedings of the 24th International Conference on Machine Learning, New York, NY, USA, 20–24 June 2007; pp. 823–830.
- Zhao, Z.; Liu, H. Semi-supervised Feature Selection via Spectral Analysis. In Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; pp. 641–646.
- Dy, J.G.; Brodley, C.E. Feature selection for unsupervised learning. J. Mach. Learn. Res. 2004, 5, 845–889.
- Visalakshi, S.; Radha, V. A literature review of feature selection techniques and applications: Review of feature selection in data mining. In Proceedings of the 2014 IEEE International Conference on Computational Intelligence and Computing Research, Piscataway, NJ, USA, 18–20 December 2014; pp. 1–6.
- Ambusaidi, M.A.; He, X.; Nanda, P.; Tan, Z. Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm. IEEE Trans. Comput. 2016, 65, 2986–2998.
- Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2001; p. 738.
- Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
- Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
- Khammassi, C.; Krichen, S. A GA-LR wrapper approach for feature selection in network intrusion detection. Comput. Secur. 2017, 70, 255–277.
- Bai, L.; Wang, Z.; Shao, Y.-H.; Deng, N.-Y. A novel feature selection method for twin support vector machine. Knowl.-Based Syst. 2014, 59, 1–8.
- Rani, P.; Kumar, R.; Jain, A.; Chawla, S.K. A Hybrid Approach for Feature Selection Based on Genetic Algorithm and Recursive Feature Elimination. Int. J. Inf. Syst. Model. Des. 2021, 12, 17–38.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Ma, S.; Huang, J. Penalized feature selection and classification in bioinformatics. Brief. Bioinform. 2008, 9, 392–403.
- Milenkoski, A.; Vieira, M.; Kounev, S.; Avritzer, A.; Payne, B.D. Evaluating Computer Intrusion Detection Systems. ACM Comput. Surv. 2015, 48, 1–41.
- Hota, H.S.; Shrivas, A.K. Decision Tree Techniques Applied on NSL-KDD Data and Its Comparison with Various Feature Selection Techniques. In Advanced Computing, Networking and Informatics; Springer: Berlin/Heidelberg, Germany, 2014; Volume 1.
- Gaikwad, D.; Thool, R.C. Intrusion Detection System Using Bagging with Partial Decision Tree Base Classifier. Procedia Comput. Sci. 2015, 49, 92–98.
- Thaseen, I.S.; Kumar, C.A. Intrusion detection model using fusion of chi-square feature selection and multi class SVM. J. King Saud Univ. Comput. Inf. Sci. 2017, 29, 462–472.
- Paulauskas, N.; Auskalnis, J. Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset. In Proceedings of the 2017 Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 27 April 2017; pp. 1–5.
- Abdullah, M.; Alshannaq, A.; Balamash, A.; Almabdy, S. Enhanced Intrusion Detection System using Feature Selection Method and Ensemble Learning Algorithms. Int. J. Comput. Sci. Inf. Secur. 2018, 16.
- The General Data Protection Regulation v. CCPA. 2018; pp. 1–42. Available online: https://fpf.org/wp-content/uploads/2018/11/GDPR_CCPA_Comparison-Guide.pdf (accessed on 15 August 2021).
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the ICISSP 2018, Madeira, Portugal, 22–24 January 2018; pp. 108–116.
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the CISDA 2009: IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6.
- Zhong, Y.; Chen, W.; Wang, Z.; Chen, Y.; Wang, K.; Li, Y.; Yin, X.; Shi, X.; Yang, J.; Li, K. HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning. Comput. Netw. 2020, 169, 107049.
- Devan, P.; Khare, N. An efficient XGBoost–DNN-based classification model for network intrusion detection system. Neural Comput. Appl. 2020, 32, 12499–12514.
- Rodriguez, M.Z.; Comin, C.H.; Casanova, D.; Bruno, O.M.; Amancio, D.R.; Costa, L.D.F.; Rodrigues, F. Clustering algorithms: A comparative approach. PLoS ONE 2019, 14, e0210236.
- Koryshev, N.; Hodashinsky, I.; Shelupanov, A. Building a Fuzzy Classifier Based on Whale Optimization Algorithm to Detect Network Intrusions. Symmetry 2021, 13, 1211.
- Bouhmala, N. How Good is the Euclidean Distance Metric for the Clustering Problem. In Proceedings of the 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 10–14 July 2016; pp. 312–315.
- Chou, C.H.; Su, M.C.; Lai, E. Symmetry as a new measure for cluster validity. Recent Adv. Comput. Comput. Commun. 2002, 1, 209–213.
- Bohara, A.; Thakore, U.; Sanders, W.H. Intrusion detection in enterprise systems by combining and clustering diverse monitor data. In Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, 19–21 April 2016; pp. 7–16.
- Gan, J.; Tao, Y. DBSCAN Revisited. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; pp. 519–530.
- Schölkopf, B.; Platt, J.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
- Mukhopadhyay, I.; Chakraborty, M. EMID: A Novel Expectation Maximization based Intrusion Detection Algorithm. In Proceedings of the IEMCON 2011, Kolkata, India, 5–6 January 2011; pp. 500–505.
- Ran, J.; Ji, Y.; Tang, B. A Semi-Supervised Learning Approach to IEEE 802.11 Network Anomaly Detection. In Proceedings of the 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia, 28 April–2 May 2019; pp. 1–5.
- Salo, F.; Nassif, A.B.; Essex, A. Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection. Comput. Netw. 2019, 148, 164–175.
- Bergman, D.L. Symmetry Constrained Machine Learning. Adv. Intell. Syst. Comput. 2019, 1038, 501–512.
- Umar, M.A.; Zhanfang, C.; Liu, Y. Network Intrusion Detection Using Wrapper-based Decision Tree for Feature Selection. In Proceedings of the 2020 International Conference on Internet Computing for Science and Engineering, Malé, Maldives, 14–16 January 2020; pp. 5–13.
- Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L. Feature Extraction Foundations; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–8.
- Saba, T.; Sadad, T.; Rehman, A.; Mehmood, Z.; Javaid, Q. Intrusion Detection System Through Advance Machine Learning for the Internet of Things Networks. IT Prof. 2021, 23, 58–64.
- Verma, A.; Ranga, V. Machine Learning Based Intrusion Detection Systems for IoT Applications. Wirel. Pers. Commun. 2020, 111, 2287–2310.
- Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
- Catal, C.; Nangir, M. A sentiment classification model based on multiple classifiers. Appl. Soft Comput. 2017, 50, 135–141.
- Belouch, M.; Idhammad, M.; El, S. A Two-Stage Classifier Approach using RepTree Algorithm for Network Intrusion Detection. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 389–394.
- Golrang, A.; Golrang, A.M.; Yayilgan, S.Y.; Elezaj, O. A Novel Hybrid IDS Based on Modified NSGAII-ANN and Random Forest. Electronics 2020, 9, 577.
Record distribution of the CIC-IDS2017 dataset:

Category | Training Set | Testing Set
---|---|---
Normal | 536,937 | 453,877
Botnet | 1180 | 324
DDoS | 34,880 | 25,597
DoS | 60,765 | 50,317
FTP_Patator | 4763 | 1585
Probe | 36,176 | 31,753
SSH_Patator | 3538 | 1177
Web Attack | 4320 | 423
Total Number of Records | 682,559 | 565,053
Record distribution of the UNSW-NB15 dataset:

Category | Training Set | Testing Set
---|---|---
Normal | 56,000 | 37,000
Attacks | 119,341 | 45,332
Total Number of Records | 175,341 | 82,332
Record distribution of the NSL-KDD dataset:

Category | Training Set | Testing Set
---|---|---
Normal | 67,343 | 45,000
Attack | 58,630 | 2600
Total Number of Records | 125,973 | 47,600
01 Discover the correlation of feature subsets with the class.
02 Select the feature subsets with the strongest correlation.
03 Compute the fitness value of the chosen feature subsets.
04 Return the feature subsets with the highest fitness score.
05 If two feature subsets have the same fitness score, return the subset with the fewest features.
06 Apply the proposed ensemble system (KODE) to the chosen feature subsets to classify the attacks.
Selected features per dataset:

NSL-KDD No | NSL-KDD Feature Name | CIC-IDS2017 No | CIC-IDS2017 Feature Name | UNSW-NB15 No | UNSW-NB15 Feature Name
---|---|---|---|---|---
f_4 | flag | f_2 | Bwd.Packet.Length.Min | f_2 | dur
f_5 | src_bytes | f_3 | Fwd.Packet.Length.Min | f_3 | xport
f_6 | dst_bytes | f_6 | Total.Length.of.Bwd.Packets | f_4 | xserv
f_15 | Min.Packet.Length | f_15 | Flow Bytes/s | f_23 | dwin
f_17 | radiotap.channel.type.cc | f_17 | Flow IAT Max | f_25 | tcprtt
f_26 | srv_serror_rate | f_21 | Subflow.Fwd.Bytes | f_27 | synack
f_30 | diff_srv_rate | f_14 | Min.Packet.Length | f_28 | ackdat
f_29 | same_srv_rate | f_12 | Bwd.Packet.Length.Std | f_30 | trans_depth
 | | f_13 | Bwd.Packets/s | f_31 | resp_body_len
 | | f_30 | dest_host_srv_diff_host_rate | f_35 | ct_srv_src
 | | f_39 | dest_host_srv_serror_rate | f_36 | ct_state_ttl
 | | f_45 | Down/Up Ratio | f_43 | attack_cat
 | | f_59 | Idle Max | |
Class | Actual | Predicted | Description
---|---|---|---
True Negative | (+) | (+) | Legitimate network traffic correctly classified as legitimate.
False Positive | (+) | (−) | Legitimate network traffic flagged as an attack; a major standing challenge in anomaly-based IDS.
False Negative | (−) | (+) | Malicious network traffic misclassified as legitimate; an equally serious challenge.
True Positive | (−) | (−) | Malicious network traffic correctly flagged as an attack.
Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 95.34 | 0.996 | 0.133 | 0.913 | 0.952 | 12 | 531.6
One-Class SVM | 86.03 | 0.86 | 0.089 | 0.804 | 0.832 | 128.4 | 1656.6
DBSCAN | 82.27 | 0.827 | 0.128 | 0.827 | 0.823 | 235.2 | 7.8
EM | 61.06 | 0.62 | 0.61 | 0.86 | 0.74 | 65.4 | 671.4
KODE | 92.03 | 0.9 | 0.09 | 0.902 | 0.903 | 441.6 | 2140.2

(A) Performance analysis for NSL-KDD with eight (8) selected features on the test dataset:

Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 99.72 | 0.997 | 0.011 | 0.992 | 0.992 | 154.2 | 213.6
One-Class SVM | 98.82 | 0.988 | 0.012 | 0.992 | 0.99 | 3 | 21
DBSCAN | 98.66 | 0.986 | 0.014 | 0.986 | 0.985 | 79.2 | 9
EM | 71.03 | 0.71 | 0.012 | 0.714 | 0.78 | 54 | 212.4
KODE | 99.73 | 0.999 | 0.01 | 0.992 | 0.993 | 120 | 208.8
Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 97.89 | 0.978 | 0.129 | 0.907 | 0.971 | 394.2 | 3669
One-Class SVM | 96.23 | 0.809 | 0.049 | 0.956 | 0.947 | 32.4 | 4880.4
DBSCAN | 81.24 | 0.824 | 0.106 | 0.803 | 0.812 | 185.4 | 25.8
EM | 79.21 | 0.78 | 0.102 | 0.789 | 0.792 | 125.4 | 24.6
KODE | 89.15 | 0.891 | 0.012 | 0.908 | 0.889 | 217.2 | 4958.4

(A) Performance evaluations for CIC-IDS2017 with thirteen (13) selected features on the test dataset:

Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 99.72 | 0.997 | 0.011 | 0.992 | 0.992 | 154.2 | 213.6
One-Class SVM | 98.92 | 0.989 | 0.011 | 0.982 | 0.99 | 3 | 21
DBSCAN | 97.76 | 0.977 | 0.012 | 0.986 | 0.985 | 79.2 | 9
EM | 95.32 | 0.952 | 0.013 | 0.96 | 0.949 | 87.6 | 10.2
KODE | 99.99 | 0.997 | 0.011 | 0.992 | 0.993 | 120 | 208.8
Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 96.79 | 0.967 | 0.121 | 0.965 | 0.967 | 94.2 | 189
One-Class SVM | 96.23 | 0.962 | 0.051 | 0.962 | 0.962 | 30.6 | 307.2
DBSCAN | 81.22 | 0.812 | 0.103 | 0.812 | 0.812 | 48.72 | 72.6
EM | 84.56 | 0.85 | 0.012 | 0.845 | 0.841 | 50.7 | 79.2
KODE | 89.85 | 0.898 | 0.011 | 0.898 | 0.898 | 217.2 | 192

(A) Performance comparisons for UNSW-NB15 with eleven (11) selected features on the test dataset:

Classifier | Accuracy | DR | FAR | Precision | F-Measure | Building (s) | Testing (s)
---|---|---|---|---|---|---|---
K-means | 99.92 | 0.992 | 0.08 | 0.992 | 0.992 | 155.4 | 210.6
One-Class SVM | 97.92 | 0.972 | 0.07 | 0.979 | 0.97 | 0.6 | 23.4
DBSCAN | 98.76 | 0.987 | 0.011 | 0.987 | 0.985 | 78.6 | 129
EM | 96.34 | 0.963 | 0.012 | 0.963 | 0.963 | 73.8 | 122.4
KODE | 99.99 | 0.99 | 0.01 | 0.99 | 0.99 | 120 | 204.6
Comparison of combination rules on the NSL-KDD dataset (%):

Attack Type | Average of Probabilities | Majority Voting | Product of Probability | Minimum Probability | Maximum Probability
---|---|---|---|---|---
Normal | 99.91 | 99.89 | 97.19 | 97.56 | 98.45
DoS | 99.09 | 99.78 | 99.01 | 99.31 | 99.01
Probe | 99.58 | 97.57 | 96.45 | 96.13 | 97.32
R2L | 96.57 | 96.67 | 95.60 | 90.51 | 90.56
U2R | 67.90 | 72.34 | 54.10 | 52.68 | 50.92
Comparison of combination rules on the UNSW-NB15 dataset (%):

Attack Type | Average of Probabilities | Majority Voting | Product of Probability | Minimum Probability | Maximum Probability
---|---|---|---|---|---
Benign | 99.97 | 99.89 | 97.19 | 95.09 | 94.45
DoS | 99.20 | 98.79 | 99.01 | 97.09 | 96.01
Fuzzers | 98.94 | 97.89 | 96.15 | 94.07 | 93.32
Analysis | 97.30 | 96.67 | 95.30 | 90.51 | 90.32
Backdoor | 99.34 | 98.01 | 97.54 | 95.43 | 94.92
Generic | 89.34 | 87.89 | 87.19 | 85.50 | 80.32
Shellcode | 82.83 | 82.34 | 80.23 | 78.12 | 75.65
Worms | 97.09 | 96.78 | 94.90 | 92.13 | 89.90
Reconnaissance | 78.09 | 77.23 | 75.23 | 74.21 | 71.65
Exploits | 69.12 | 68.67 | 54.56 | 50.32 | 48.43
Comparison of combination rules on the CIC-IDS2017 dataset (%):

Attack Type | Average of Probabilities | Majority Voting | Product of Probability | Minimum Probability | Maximum Probability
---|---|---|---|---|---
Benign | 99.96 | 99.67 | 98.10 | 96.90 | 95.68
Botnet | 99.15 | 98.34 | 96.98 | 95.90 | 93.41
DDoS | 99.99 | 99.04 | 97.94 | 96.07 | 95.33
DoS | 98.89 | 96.89 | 95.32 | 94.53 | 92.90
FTP_Patator | 99.09 | 97.89 | 97.54 | 96.42 | 94.92
Probe | 89.34 | 87.85 | 86.78 | 85.55 | 83.32
SSH_Patator | 81.85 | 79.86 | 78.69 | 76.16 | 75.56
Web Attack | 99.97 | 97.73 | 94.89 | 92.56 | 90.90
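The five combination rules compared above can be sketched as operations over the per-classifier class-probability vectors. The probabilities below are hypothetical placeholders, not values from the paper:

```python
import math

def combine(prob_rows, rule):
    """Combine per-classifier probability vectors (one row per base model,
    one column per class) and return the index of the winning class."""
    n_classes = len(prob_rows[0])
    if rule == "average":        # average of probabilities
        scores = [sum(r[c] for r in prob_rows) / len(prob_rows) for c in range(n_classes)]
    elif rule == "product":      # product of probabilities
        scores = [math.prod(r[c] for r in prob_rows) for c in range(n_classes)]
    elif rule == "minimum":      # minimum probability per class
        scores = [min(r[c] for r in prob_rows) for c in range(n_classes)]
    elif rule == "maximum":      # maximum probability per class
        scores = [max(r[c] for r in prob_rows) for c in range(n_classes)]
    elif rule == "majority":     # each model casts one vote for its argmax class
        votes = [max(range(n_classes), key=r.__getitem__) for r in prob_rows]
        scores = [votes.count(c) for c in range(n_classes)]
    else:
        raise ValueError(rule)
    return max(range(n_classes), key=scores.__getitem__)
```

The rules can disagree: with rows [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]], averaging picks class 0 (one very confident model outweighs two mildly confident ones), while majority voting picks class 1. This sensitivity to confidence versus head counts is what the per-rule tables above are probing.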
IDS Models | Utilized Dataset | Feature Selection | Base Classifier Used | FAR (%) | ACC (%) | DR (%)
---|---|---|---|---|---|---
[20] | CIC-IDS2018 | Spearman's rank correlation | LR, DT, and GB | N/A | 98.8 | N/A
[84] | NSL-KDD | Information Gain | RepTree | N/A | 89.85 | N/A
[84] | UNSW-NB15 | Information Gain | RepTree | N/A | 88.95 | N/A
[60] | NSL-KDD | IG-Filters | Voting (Random Forest and PART) | 0.01 | 86.697 | N/A
[50] | UNSW-NB15 and KDD99 | Wrapper (GA-LR) | C4.5, NBTree, and Random Forest | 0.105 | 99.90 | 99.81
[78] | UNSW-NB15 | DT-based | ANN, SVM, KNN, RF, and NB | 27.73 | 86.41 | 97.95
[34] | NSL-KDD | Manually selected | Bagging (REPTree) | 0.148 | 81.2988 | N/A
[85] | NSL-KDD | NSGAII-ANN | Random Forest | 6.00 | 99.4 | N/A
[85] | UNSW-NB15 | NSGAII-ANN | Random Forest | 6.00 | 94.8 | N/A
Proposed (KODE) | CIC-IDS2017 | HFS | Voting (K-means, One-Class SVM, DBSCAN, EM) | 0.09 | 99.99 | 99.75
Proposed (KODE) | NSL-KDD | HFS | Voting (K-means, One-Class SVM, DBSCAN, EM) | 0.16 | 99.73 | 96.64
Proposed (KODE) | UNSW-NB15 | HFS | Voting (K-means, One-Class SVM, DBSCAN, EM) | 0.11 | 99.997 | 99.93
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jaw, E.; Wang, X. Feature Selection and Ensemble-Based Intrusion Detection System: An Efficient and Comprehensive Approach. Symmetry 2021, 13, 1764. https://doi.org/10.3390/sym13101764