Abstract
This paper investigates image-based malware classification using machine learning techniques. In this recent approach, malware binaries are converted into images (malware images) before being fed to machine learning models such as k-nearest neighbour (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM) or Convolutional Neural Networks (CNN). The approach classifies malware by image texture rather than by signatures or behaviours collected through malware analysis, so it is not hampered when the signatures of a new malware variant have not yet been collected or its behaviours have not yet been updated.
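The abstract does not spell out how a binary becomes an image. A minimal sketch of one common layout, assuming each byte is treated as one 8-bit grayscale pixel and bytes are placed row by row at a fixed width, is shown here. The function name `bytes_to_grayscale`, the zero-padding of the last row, and the sample input are illustrative assumptions, not the authors' procedure.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Lay out raw bytes row by row as an image of the given width.

    Each byte (0-255) becomes one grayscale pixel. The last row is
    zero-padded when len(data) is not a multiple of width (an assumption
    made for this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final partial row
        rows.append(row)
    return rows


# Hypothetical 10-byte "binary" rendered at width 4.
image = bytes_to_grayscale(bytes(range(10)), width=4)
for row in image:
    print(row)
```

In practice the resulting grid would then be resized to a fixed square dimension before classification.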
This paper evaluates the classification performance of several machine learning classifiers (k-NN, NB, SVM and CNN) fed with malware images of various dimensions (128 × 128, 64 × 64, 32 × 32 and 16 × 16). Experimental results on three datasets, Malimg, Malheur and BIG2015, show that k-NN outperforms the other classifiers on all three, with accuracies of 97.9%, 94.41% and 95.63% respectively. In contrast, NB performed poorly on image-based malware classification. The results also indicate that the accuracy of k-NN peaks at an input image size of 32 × 32 and tends to decrease when larger input images (64 × 64, 128 × 128) supply too much feature information.
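To make the k-NN step concrete, the following is a minimal sketch of majority-vote k-NN over flattened images. It assumes Euclidean distance and k = 3, neither of which is stated in the abstract, and the tiny 2 × 2 "images" and family labels are made up for illustration. A 32 × 32 input would simply be a vector of 1024 pixel values.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (flattened image, family label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training images nearest to query."""
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query and keep k.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical flattened 2 x 2 grayscale images from two made-up families.
train = [
    ([0, 0, 0, 0], "family_A"),
    ([5, 5, 5, 5], "family_A"),
    ([10, 10, 10, 10], "family_A"),
    ([200, 200, 200, 200], "family_B"),
    ([210, 210, 210, 210], "family_B"),
]

print(knn_predict(train, [20, 20, 20, 20]))
print(knn_predict(train, [190, 190, 190, 190]))
```

Because every pixel is one dimension of the distance computation, larger images add dimensions without necessarily adding discriminative information, which is one plausible reading of the reported drop in accuracy above 32 × 32.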
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Son, T.T., Lee, C., Le-Minh, H., Aslam, N., Raza, M., Long, N.Q. (2020). An Evaluation of Image-Based Malware Classification Using Machine Learning. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_11
DOI: https://doi.org/10.1007/978-3-030-63119-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63118-5
Online ISBN: 978-3-030-63119-2
eBook Packages: Computer Science (R0)