research-article

A novel machine learning approach for detecting first-time-appeared malware

Published: 09 July 2024

Abstract

Conventional malware detection approaches carry the overhead of feature extraction, depend on domain experts, and are time-consuming and resource-intensive. Learning-based approaches have become the mainstay of malware detection because they overcome most of these challenges, significantly improving detection effectiveness while keeping the false positive rate low. The exponential growth of malware variants and of first-time-appeared malware, which includes polymorphic and zero-day attacks, is a significant challenge for learning-based malware detectors and can have a catastrophic impact on their detection effectiveness. This paper proposes a novel deep learning-based framework that detects first-time-appeared malware effectively and efficiently and performs better than conventional malware detection approaches. First, it translates and visualises each Windows portable executable (PE) file as a coloured image, which eliminates the overhead of feature extraction and the need for domain experts to analyse the features. Second, a fine-tuned deep learning model is used to extract deep features from its last fully connected layer. This step reduces the training cost that deep learning models would incur if used for end-to-end classification. Third, a feature selection algorithm selects the most important and influential features. These features are then fed to a one-class classifier for final detection. The one-class classifier constructs an enclosed boundary around the features of benign data, and anything outside the boundary is declared anomalous, that is, malicious. This enhances the framework's ability to detect evolving, unseen, polymorphic, and zero-day attacks and reduces the problem of overfitting.
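The last two steps, feature selection followed by a one-class boundary around benign data, can be illustrated with a small sketch. It is not the framework's actual method: the abstract does not name the feature selection algorithm or the one-class classifier, so a variance ranking and a centroid-plus-radius boundary are used here as deliberately simple stand-ins. The names `select_top_variance`, `project` and `CentroidBoundary`, the `quantile` parameter and the toy vectors are all hypothetical.

```python
import math
from statistics import mean, pvariance


def select_top_variance(features, k):
    """Return the indices of the k highest-variance columns.

    Stand-in selector (assumption): the paper's own algorithm is not
    specified in the abstract.
    """
    columns = list(zip(*features))
    order = sorted(range(len(columns)),
                   key=lambda j: pvariance(columns[j]),
                   reverse=True)
    return sorted(order[:k])


def project(rows, indices):
    """Keep only the selected feature columns of every row."""
    return [[row[j] for j in indices] for row in rows]


class CentroidBoundary:
    """Toy one-class model: a hypersphere around the benign centroid."""

    def fit(self, benign, quantile=0.95):
        # Centre of the benign training vectors.
        self.center = [mean(col) for col in zip(*benign)]
        # Radius is the distance at the chosen quantile of benign samples.
        dists = sorted(math.dist(v, self.center) for v in benign)
        cut = min(len(dists) - 1, int(quantile * len(dists)))
        self.radius = dists[cut]
        return self

    def is_anomaly(self, vector):
        # Anything outside the enclosed boundary is flagged.
        return math.dist(vector, self.center) > self.radius


# Hypothetical deep-feature vectors for benign files (made-up numbers).
benign = [[0.0, 1.0, 5.0], [0.1, 2.0, 5.0], [0.0, 3.0, 5.0], [0.1, 4.0, 5.0]]
kept = select_top_variance(benign, k=2)
model = CentroidBoundary().fit(project(benign, kept))
```

The model is trained on benign vectors only, which is what lets it flag samples from families it has never seen. In practice a kernel method such as a one-class SVM would draw a tighter, non-spherical boundary than this toy hypersphere.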
The detection effectiveness of the proposed framework is validated against state-of-the-art deep learning models and conventional approaches. The proposed framework outperformed them, with an accuracy of 99.30% on the Malimg dataset. The Wilcoxon signed-rank test is used to validate the statistical significance of the results. The results indicate that the proposed framework is effective and can be used in the defence industry, resulting in more powerful and robust solutions against zero-day and polymorphic attacks.
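For readers unfamiliar with the Wilcoxon signed-rank test mentioned above, the following minimal sketch computes its rank sums for two paired sets of scores. The function name `wilcoxon_signed_rank` and the example numbers are hypothetical and are not results from the paper. The sketch stops at the statistic; a library routine such as `scipy.stats.wilcoxon` would normally be used to obtain the p-value.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y.

    Zero differences are dropped, and tied absolute differences
    share the average of the ranks they span.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of equal absolute differences starting at i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


# Made-up per-fold counts of correct detections for two detectors.
detector_a = [10, 12, 9, 14, 11]
detector_b = [8, 13, 6, 10, 11]
w_plus, w_minus, n = wilcoxon_signed_rank(detector_a, detector_b)
```

The smaller of the two rank sums is compared against the test's critical value for `n` non-zero pairs. A small value suggests the paired differences are unlikely to be centred on zero.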

Highlights

A novel approach combining deep learning and machine learning is proposed. First, deep learning is used to extract deep features. The most influential features are then selected to train the machine learning classifier for final detection. The proposed framework eliminates the need for human effort in reverse engineering tasks.
The proposed framework consists of four steps. In the first step, all PE files are transformed into coloured images. The second step uses a deep learning model to extract deep features. The third step selects the most important features. Finally, the selected lightweight, most influential features are sent to the machine learning classifier for final malware detection.
We demonstrate that the proposed framework is lightweight, resilient, efficient and cost-effective. An in-depth analysis validates the detection effectiveness and generalisation of the proposed framework on multiple datasets. Our results demonstrate that the proposed framework outperforms conventional and state-of-the-art malware detection approaches.
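The first of the four steps, turning a PE file into a coloured image, can be sketched as follows. This is only one plausible mapping, assumed here for illustration: every three consecutive bytes become one (R, G, B) pixel, rows have a fixed width, and the tail is zero-padded. The function name `bytes_to_rgb_image`, the default width and the stand-in byte string are hypothetical, and the paper may use a different layout.

```python
import math


def bytes_to_rgb_image(data: bytes, width: int = 64):
    """Map raw bytes to rows of (R, G, B) pixels, zero-padding the tail."""
    if width <= 0:
        raise ValueError("width must be positive")
    bytes_per_row = width * 3
    height = max(1, math.ceil(len(data) / bytes_per_row))
    padded = data.ljust(height * bytes_per_row, b"\x00")
    image = []
    for r in range(height):
        row_bytes = padded[r * bytes_per_row:(r + 1) * bytes_per_row]
        # Each group of three byte values forms one colour pixel.
        row = [tuple(row_bytes[i:i + 3]) for i in range(0, bytes_per_row, 3)]
        image.append(row)
    return image


# Stand-in for the contents of a PE file read with open(path, "rb").read().
sample = bytes(range(256)) * 4
image = bytes_to_rgb_image(sample, width=16)
```

Because the mapping works on raw bytes, no disassembly or hand-crafted feature extraction is needed before the image is passed to a deep learning model.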

Published In

Engineering Applications of Artificial Intelligence  Volume 131, Issue C
May 2024
1508 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Deep learning
  2. Machine learning
  3. Artificial intelligence
  4. Zero-day malware
  5. Polymorphic
  6. Malware
  7. Cybersecurity
  8. Evolving attacks
  9. Security analytics

Qualifiers

  • Research-article
