DOI: 10.1145/3474369.3486864

INSOMNIA: Towards Concept-Drift Robustness in Network Intrusion Detection

Published: 15 November 2021

Abstract

Despite decades of research in network traffic analysis and incredible advances in artificial intelligence, network intrusion detection systems based on machine learning (ML) have yet to prove their worth. One core obstacle is the existence of concept drift, an issue for all adversary-facing security systems. Additionally, specific challenges set intrusion detection apart from other ML-based security tasks, such as malware detection.
In this work, we offer a new perspective on these challenges. We propose INSOMNIA, a semi-supervised intrusion detector which continuously updates the underlying ML model as network traffic characteristics are affected by concept drift. We use active learning to reduce latency in the model updates, label estimation to reduce labeling overhead, and apply explainable AI to better interpret how the model reacts to the shifting distribution.
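The division of labor between active learning and label estimation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `split_batch`, the uncertainty rule (probability closest to 0.5), and the use of the model's own prediction as the estimated label are all assumptions, since the abstract does not specify them.

```python
from typing import List, Tuple


def split_batch(probs: List[float], budget: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split a batch into samples to label manually and samples to pseudo-label.

    probs[i] is the model's estimated probability that sample i is malicious.
    The `budget` most uncertain samples (probability closest to 0.5) are sent
    to a human oracle; every other sample keeps the model's own prediction as
    an estimated label. Both rules are illustrative assumptions.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank sample indices by uncertainty: smallest |p - 0.5| first.
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    to_oracle = sorted(ranked[:budget])
    chosen = set(to_oracle)
    # Remaining samples receive a 0/1 pseudo-label from the thresholded score.
    pseudo = [(i, int(probs[i] >= 0.5)) for i in range(len(probs)) if i not in chosen]
    return to_oracle, pseudo


# Hypothetical scores for a batch of five flows.
batch_probs = [0.97, 0.52, 0.08, 0.45, 0.81]
to_oracle, pseudo = split_batch(batch_probs, budget=2)
print(to_oracle)  # indices needing a manual label
print(pseudo)     # (index, estimated label) pairs for the rest
```

In a continuous-update loop, the manually labeled and pseudo-labeled samples would then be merged into the training set before the next model refresh; the labeling budget controls the trade-off between overhead and label quality.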
To evaluate INSOMNIA, we extend TESSERACT, a framework originally proposed for performing sound time-aware evaluations of ML-based malware detectors, to the network intrusion domain. Our evaluation shows that accounting for drifting scenarios is vital for effective intrusion detection systems.
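A time-aware evaluation of this kind rests on keeping every training sample strictly earlier than every test sample, and on scoring the test data window by window so that decay over time is visible. The sketch below shows only that splitting step, under stated assumptions: the record layout, the helper name `temporal_split`, and the one-day window are hypothetical and are not taken from TESSERACT's code.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (capture date, flow identifier).
Sample = Tuple[date, str]


def temporal_split(
    samples: List[Sample], train_end: date
) -> Tuple[List[Sample], Dict[date, List[Sample]]]:
    """Train on samples dated up to train_end; group later samples by day.

    No sample dated after train_end can reach the training set, so the
    evaluation never trains on data from the "future" of a test window.
    """
    train = [s for s in samples if s[0] <= train_end]
    test_by_day: Dict[date, List[Sample]] = defaultdict(list)
    for s in samples:
        if s[0] > train_end:
            test_by_day[s[0]].append(s)
    # Return test windows in chronological order.
    return train, dict(sorted(test_by_day.items()))


# Made-up flows, deliberately out of order.
flows = [
    (date(2021, 1, 1), "flow-a"),
    (date(2021, 1, 3), "flow-d"),
    (date(2021, 1, 2), "flow-b"),
    (date(2021, 1, 2), "flow-c"),
]
train, test_by_day = temporal_split(flows, train_end=date(2021, 1, 1))
for day, window in test_by_day.items():
    print(day, [name for _, name in window])
```

Scoring a detector separately on each window, rather than on one pooled test set, is what exposes performance degradation as traffic drifts.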


      Published In

      AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
      November 2021
      210 pages
      ISBN: 9781450386579
      DOI: 10.1145/3474369
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. machine learning
      2. network security

      Qualifiers

      • Research-article

      Funding Sources

      • EPSRC

      Conference

      CCS '21
      Acceptance Rates

      Overall Acceptance Rate 94 of 231 submissions, 41%

      Article Metrics

      • Downloads (last 12 months): 376
      • Downloads (last 6 weeks): 68
      Reflects downloads up to 09 Nov 2024

      Cited By

      • (2024) An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks. Big Data Mining and Analytics 7:2 (500-511). DOI: 10.26599/BDMA.2023.9020027. Online publication date: Jun-2024
      • (2024) QARF: A Novel Malicious Traffic Detection Approach via Online Active Learning for Evolving Traffic Streams. Chinese Journal of Electronics 33:3 (645-656). DOI: 10.23919/cje.2022.00.360. Online publication date: May-2024
      • (2024) Mateen: Adaptive Ensemble Learning for Network Anomaly Detection. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (215-234). DOI: 10.1145/3678890.3678901. Online publication date: 30-Sep-2024
      • (2024) SoK: Federated Learning based Network Intrusion Detection in 5G: Context, State of the Art and Challenges. Proceedings of the 19th International Conference on Availability, Reliability and Security (1-13). DOI: 10.1145/3664476.3664500. Online publication date: 30-Jul-2024
      • (2024) Fast Learning Enabled by In-Network Drift Detection. Proceedings of the 8th Asia-Pacific Workshop on Networking (129-134). DOI: 10.1145/3663408.3663427. Online publication date: 3-Aug-2024
      • (2024) ReCDA: Concept Drift Adaptation with Representation Enhancement for Network Intrusion Detection. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3818-3828). DOI: 10.1145/3637528.3672007. Online publication date: 25-Aug-2024
      • (2024) Dealing with Drift of Adaptation Spaces in Learning-based Self-Adaptive Systems Using Lifelong Self-Adaptation. ACM Transactions on Autonomous and Adaptive Systems 19:1 (1-57). DOI: 10.1145/3636428. Online publication date: 14-Feb-2024
      • (2024) Practical Cyber Attack Detection With Continuous Temporal Graph in Dynamic Network System. IEEE Transactions on Information Forensics and Security 19 (4851-4864). DOI: 10.1109/TIFS.2024.3385321. Online publication date: 2024
      • (2024) SPIDER: A Semi-Supervised Continual Learning-based Network Intrusion Detection System. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (571-580). DOI: 10.1109/INFOCOM52122.2024.10621428. Online publication date: 20-May-2024
      • (2024) Bad Design Smells in Benchmark NIDS Datasets. 2024 IEEE 9th European Symposium on Security and Privacy (EuroS&P) (658-675). DOI: 10.1109/EuroSP60621.2024.00042. Online publication date: 8-Jul-2024
