
A novel machine learning approach for detecting first-time-appeared malware

Authors: Kamran Shaukat, Suhuai Luo, Vijay Varadharajan

Volume 131, Issue C

https://doi.org/10.1016/j.engappai.2023.107801

Published: 09 July 2024

Abstract

Conventional malware detection approaches carry the overhead of feature extraction, depend on domain experts, and are time-consuming and resource-intensive. Learning-based approaches have become the mainstay of malware detection because they overcome most of these challenges, significantly improving detection effectiveness while keeping the false positive rate low. The exponential growth of malware variants and first-time-appeared malware, which includes polymorphic and zero-day attacks, remains a significant challenge for learning-based malware detectors and can severely degrade their detection effectiveness. This paper proposes a novel deep learning-based framework that detects first-time-appeared malware effectively and efficiently and performs better than conventional malware detection approaches. First, the framework translates each Windows portable executable (PE) file into a coloured image, which eliminates the overhead of feature extraction and the need for domain experts to analyse the features. Second, a fine-tuned deep learning model extracts deep features from its last fully connected layer. This step reduces the training cost that deep learning models would incur if used for end-to-end classification. Third, a feature selection algorithm selects the most important and influential features. These features are then fed to a one-class classifier for final detection. The one-class classifier constructs an enclosed boundary around the features of benign data, and anything outside the boundary is declared anomalous, that is, malicious. This design enhances the framework's ability to detect evolving, unseen, polymorphic, and zero-day attacks and reduces the problem of overfitting.
The detection effectiveness of the proposed framework is validated against state-of-the-art deep learning models and conventional approaches. The proposed framework outperforms them, achieving an accuracy of 99.30% on the Malimg dataset. The Wilcoxon signed-rank test is used to confirm the statistical significance of the results. The results indicate that the proposed framework is effective and can be used in the defence industry to build more powerful and robust solutions against zero-day and polymorphic attacks.
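The first step described in the abstract, turning a PE file into a coloured image, can be illustrated with a minimal sketch. The abstract does not specify the mapping, so the choices here are assumptions: three consecutive bytes form one (R, G, B) pixel, the tail is zero-padded, and the image is roughly square. The function name `bytes_to_rgb_image` is hypothetical and is not taken from the paper.

```python
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def bytes_to_rgb_image(data: bytes, width: Optional[int] = None) -> List[List[Pixel]]:
    """Map raw bytes to rows of (R, G, B) pixels, zero-padding the tail."""
    if not data:
        raise ValueError("empty input")
    # Pad so the byte count is a multiple of 3 (one byte per colour channel).
    padded = data + b"\x00" * (-len(data) % 3)
    pixels: List[Pixel] = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    # Assumed default: a roughly square image.
    if width is None:
        width = math.ceil(math.sqrt(len(pixels)))
    if width < 1:
        raise ValueError("width must be positive")
    # Pad the last row with black pixels so every row has the same width.
    pixels += [(0, 0, 0)] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Ten sample bytes stand in for the contents of a PE file.
image = bytes_to_rgb_image(bytes(range(10)))
print(len(image), len(image[0]))
```

In practice the bytes would be read from the executable on disk and the rows written out with an imaging library before being passed to the deep learning model; those stages are outside this sketch.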


Highlights

• A novel approach combining deep learning and machine learning is proposed. Deep learning first extracts deep features; the most influential features are then selected to train a machine learning classifier for final detection. The proposed framework eliminates the need for human effort in reverse engineering tasks.

• The proposed framework consists of four steps. First, all PE files are transformed into coloured images. Second, a deep learning model extracts deep features. Third, the most important features are selected. Finally, this lightweight set of the most influential features is passed to a machine learning classifier for malware detection.

• We demonstrate that the proposed framework is lightweight, resilient, efficient and cost-effective. An in-depth analysis validates its detection effectiveness and generalisation on multiple datasets. The results show that the proposed framework outperforms conventional and state-of-the-art malware detection approaches.
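The final step, a one-class classifier that encloses benign features in a boundary and flags anything outside it, can be sketched with a deliberately simple stand-in. This is not the classifier used in the paper, which this page does not name. The sketch assumes a hypersphere centred on the mean of the benign feature vectors, with a radius set at a chosen quantile of the benign training distances. The class name `CentroidBoundary` and the `quantile` parameter are hypothetical.

```python
import math
from typing import List, Sequence


class CentroidBoundary:
    """Toy one-class model: a hypersphere around the benign centroid."""

    def fit(self, benign: Sequence[Sequence[float]], quantile: float = 0.95) -> "CentroidBoundary":
        if not benign:
            raise ValueError("need at least one benign sample")
        n, dim = len(benign), len(benign[0])
        # Centre of the boundary: per-feature mean of the benign samples.
        self.centre: List[float] = [sum(row[j] for row in benign) / n for j in range(dim)]
        distances = sorted(self._distance(row) for row in benign)
        # Radius: the distance within which `quantile` of benign samples fall.
        index = min(n - 1, max(0, math.ceil(quantile * n) - 1))
        self.radius: float = distances[index]
        return self

    def _distance(self, row: Sequence[float]) -> float:
        return math.dist(row, self.centre)

    def is_anomaly(self, row: Sequence[float]) -> bool:
        # Anything strictly outside the boundary is treated as malicious.
        return self._distance(row) > self.radius


# Training uses benign feature vectors only; no malware samples are needed.
benign_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
model = CentroidBoundary().fit(benign_features)
print(model.is_anomaly([0.5, 0.5]), model.is_anomaly([5.0, 5.0]))
```

Because the model is fitted on benign data alone, a previously unseen malware family needs no training examples of its own to be flagged. This is the property the highlights attribute to the one-class design.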

