
Automated Feature Selection for Anomaly Detection in Network Traffic Data

Published: 21 June 2021
    Abstract

    Variable selection (also known as feature selection) reduces learning complexity by prioritizing features, which is particularly important for massive, high-dimensional datasets such as network traffic data. In practice, however, performing feature selection effectively is not easy, despite the availability of existing selection techniques. In our initial experiments, we observed that existing selection techniques produce different sets of features even under the same condition (e.g., a fixed size for the resulting set). In addition, individual selection techniques perform inconsistently, sometimes better and sometimes worse than others, so relying on any single technique is risky when building models from the selected features. More critically, there is a strong need to automate the selection process, since it otherwise requires laborious effort and intensive analysis by a group of experts. In this article, we explore the challenges of automated feature selection as applied to network anomaly detection. We first present an ensemble approach that incorporates existing feature selection techniques; one of the proposed ensemble techniques, based on greedy search, works highly consistently and shows results comparable to the existing techniques. We also address the problem of when to stop the feature elimination process and present a set of methods for determining the number of features in the reduced feature set. Experimental results on two recent network datasets show that the feature sets identified by the presented ensemble and stopping methods consistently yield performance comparable to conventional selection techniques with a smaller number of features.
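
    The two ideas in the abstract, combining several selection techniques and deciding how many features to keep, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out: the greedy-search ensemble is replaced here by a plain Borda-count rank aggregation, and the stopping rule (keep the smallest set whose score is within a tolerance of the best) is an assumption. The feature names, rankings, scores, and the function names borda_aggregate and smallest_adequate_size are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-first feature rankings into one consensus ranking.

    Assumption: a simple Borda count stands in for the ensemble step.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top-ranked feature in each ranking earns the most points.
            scores[feature] += n - position
    # Sort by descending total; break ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


def smallest_adequate_size(scores_by_size, tolerance=0.01):
    """Return the smallest feature count whose score is within
    `tolerance` of the best observed score (an assumed stopping rule)."""
    best = max(scores_by_size.values())
    return min(k for k, s in scores_by_size.items() if s >= best - tolerance)


# Hypothetical rankings produced by three different selection techniques.
rankings = [
    ["bytes", "duration", "port", "flags"],
    ["duration", "bytes", "flags", "port"],
    ["bytes", "port", "duration", "flags"],
]
consensus = borda_aggregate(rankings)

# Hypothetical validation scores of a detector trained on the
# top-k consensus features, for k = 1..4.
scores_by_size = {1: 0.81, 2: 0.93, 3: 0.935, 4: 0.936}
k = smallest_adequate_size(scores_by_size)
selected = consensus[:k]
print(consensus, k, selected)
```

    With these made-up inputs, the three techniques disagree on the ordering, yet the aggregate ranking is stable, and the stopping rule keeps two of the four features because adding more improves the score by less than the tolerance.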



          Published In

          ACM Transactions on Management Information Systems  Volume 12, Issue 3
          September 2021
          225 pages
          ISSN: 2158-656X
          EISSN: 2158-6578
          DOI: 10.1145/3468067
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 21 June 2021
          Accepted: 01 December 2020
          Revised: 01 December 2020
          Received: 01 June 2020
          Published in TMIS Volume 12, Issue 3

          Author Tags

          1. Feature selection
          2. ensemble approach
          3. network anomaly detection
          4. cybersecurity analytics

          Qualifiers

          • Research-article
          • Refereed

          Funding Sources

          • U.S. Department of Energy (DOE)
          • Office of Science, Office of Advanced Scientific Computing Research
          • Institute for Information & communications Technology Promotion (IITP)
          • Korea government (MSIP)
          • National Energy Research Scientific Computing Center (NERSC)

          Cited By

          • (2024) Feature Engineering and Computer Vision for Cybersecurity. Global Perspectives on the Applications of Computer Vision in Cybersecurity (155-174). DOI: 10.4018/978-1-6684-8127-1.ch006. Online publication date: 23-Feb-2024
          • (2023) A Survey on Feature Selection Techniques Based on Filtering Methods for Cyber Attack Detection. Information 14:3 (191). DOI: 10.3390/info14030191. Online publication date: 17-Mar-2023
          • (2023) The Opportunity in Difficulty: A Dynamic Privacy Budget Allocation Mechanism for Privacy-Preserving Multi-dimensional Data Collection. ACM Transactions on Management Information Systems 14:1 (1-24). DOI: 10.1145/3569944. Online publication date: 16-Jan-2023
          • (2023) Anomaly detection method of power purchase material data based on BIRCH clustering algorithm and time series. Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023) (153). DOI: 10.1117/12.3006094. Online publication date: 10-Oct-2023
          • (2023) Enhancing Network Intrusion Detection: An Online Methodology for Performance Analysis. 2023 IEEE 9th International Conference on Network Softwarization (NetSoft) (510-515). DOI: 10.1109/NetSoft57336.2023.10175465. Online publication date: 19-Jun-2023
          • (2023) MSTP Network Data Traffic Anomaly Optimization Detection Algorithm. 2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS) (32-35). DOI: 10.1109/ISCTIS58954.2023.10213019. Online publication date: 7-Jul-2023
          • (2022) Interactive Web-Based Visual Analysis on Network Traffic Data. Information 14:1 (16). DOI: 10.3390/info14010016. Online publication date: 28-Dec-2022
          • (2022) Feature Extraction of High-dimensional Data Based on J-HOSVD for Cyber-Physical-Social Systems. ACM Transactions on Management Information Systems 13:3 (1-21). DOI: 10.1145/3483448. Online publication date: 4-Feb-2022
