DOI: 10.1145/3649476.3658725
GLSVLSI Conference Proceedings
Research article | Open access

Comprehensive Analysis of Consistency and Robustness of Machine Learning Models in Malware Detection

Published: 12 June 2024

Abstract

Cybersecurity has gained significant attention in recent years, especially with the deployment of millions of devices across the globe and the rise of threats targeting embedded systems. Numerous cyber threats have emerged over the last few years; among them, malware attacks are considered especially prominent due to their impact on users and systems. Given the evolving nature of such threats, traditional statistical and heuristic detection approaches have proven insufficiently effective and efficient. Machine learning (ML)-based cyber-threat detection has therefore been actively researched and adopted across academia and industry to address evolving cyber threats. However, ML-based neural network techniques, though efficient, are considered black boxes because they expose too little information to deduce how they reach their decisions. The field of interpretable and explainable AI/ML, in contrast, focuses on explaining the reasoning behind the decisions made by ML models. In this paper, we experiment with different explainable AI (XAI) techniques for interpreting multiple malware detection models. Specifically, we analyze the consistency and reliability of these neural network models in distinguishing attack functions from benign ones. We provide a quantitative analysis of multiple explanation methods across different datasets. When trained with the top feature attributes (10%-35% of the whole data) generated by XAI methods, the ML classifiers (trained on Hardware Performance Counter and Mimicus PDF malware datasets) retain a malware detection accuracy of 88%-92%. The ML classifiers are also compared with state-of-the-art models, and the proposed technique (training with partial data features generated by explainable methods) produces comparable malware detection accuracy above 82%.
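The core pipeline described in the abstract (rank features by XAI attributions, keep the top 10%-35%, and retrain the classifier) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: scikit-learn's permutation importance stands in for the paper's XAI attribution methods, and a synthetic dataset stands in for the HPC and Mimicus PDF features.

```python
# Minimal sketch of attribution-guided feature selection, assuming scikit-learn.
# Permutation importance is a stand-in for the paper's XAI methods, and the
# synthetic dataset is a placeholder for the HPC / Mimicus PDF malware features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=100,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline classifier trained on all features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("full-feature accuracy:", clf.score(X_te, y_te))

# Rank features by attribution and keep the top 25% (the paper reports 10%-35%).
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][: int(0.25 * X.shape[1])]

# Retrain on the reduced feature set and compare detection accuracy.
clf_top = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf_top.fit(X_tr[:, top], y_tr)
print("top-feature accuracy:", clf_top.score(X_te[:, top], y_te))
```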



Published In

GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
June 2024, 797 pages
ISBN: 9798400706059
DOI: 10.1145/3649476
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

GLSVLSI '24: Great Lakes Symposium on VLSI 2024
June 12-14, 2024, Clearwater, FL, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%
