DOI: 10.1145/3649476.3658725
GLSVLSI Conference Proceedings
Research article | Open access

Comprehensive Analysis of Consistency and Robustness of Machine Learning Models in Malware Detection

Published: 12 June 2024

Abstract

Cybersecurity has gained significant attention in recent years, especially with the deployment of millions of devices across the globe and the rise of threats targeting embedded systems. Numerous cyber threats have emerged over the last few years; among them, malware attacks are considered especially prominent due to their impact on users and systems. Given the evolving nature of such threats, traditional statistical and heuristic detection approaches have proven insufficiently effective and efficient. Machine learning (ML)-based cyber-threat detection has therefore been actively researched and adopted across academia and industry to address evolving cyber threats. However, ML-based neural network techniques, though efficient, are considered black boxes because they expose too little information to deduce how they reach their decisions. The field of interpretable and explainable AI/ML, in contrast, focuses on explaining the reasoning behind the decisions made by ML models. In this paper, we experiment with different explainable AI (XAI) techniques for interpreting multiple malware detection models. Specifically, we analyze the consistency and reliability of these neural network models in distinguishing attack functions from benign ones. We provide a quantitative analysis of multiple explanation methods across different datasets. When trained with the top feature attributes (10%-35% of the whole data) generated by XAI methods, the ML classifiers (trained on Hardware Performance Counter and Mimicus PDF malware datasets) retain a malware detection accuracy of 88%-92%. The ML classifiers are also compared with state-of-the-art models, and the proposed technique (training with partial data features generated by explainable methods) produces comparable malware detection accuracy above 82%.
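The core pipeline described in the abstract (rank features by XAI attributions, keep the top 10%-35%, and retrain the classifier) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: scikit-learn's permutation importance stands in for the paper's XAI attribution methods, and a synthetic dataset stands in for the HPC and Mimicus PDF features.

```python
# Minimal sketch of attribution-guided feature selection, assuming scikit-learn.
# Permutation importance is a stand-in for the paper's XAI methods, and the
# synthetic dataset is a placeholder for the HPC / Mimicus PDF malware features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=100,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline classifier trained on all features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("full-feature accuracy:", clf.score(X_te, y_te))

# Rank features by attribution and keep the top 25% (the paper reports 10%-35%).
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][: int(0.25 * X.shape[1])]

# Retrain on the reduced feature set and compare detection accuracy.
clf_top = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf_top.fit(X_tr[:, top], y_tr)
print("top-feature accuracy:", clf_top.score(X_te[:, top], y_te))
```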



Published In

GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
June 2024, 797 pages
ISBN: 9798400706059
DOI: 10.1145/3649476
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

GLSVLSI '24: Great Lakes Symposium on VLSI 2024
June 12-14, 2024, Clearwater, FL, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%
