Malware Detection with Sequence-Based Machine Learning and Deep Learning

Andreopoulos, William B.

doi:10.1007/978-3-030-62582-5_2

Abstract

In this chapter, we review sequence-based machine learning methods used for malware detection and classification. We start by reviewing the data types extracted from code: static features and dynamic traces of program execution. We review recent research that applies machine learning to opcode and API call sequences, call graphs, system calls, registry changes, and information flow traces, as well as hybrid and raw data, to detect and classify malware. With a focus on metamorphic malware, we discuss Hidden Markov Models (HMMs) and Long Short-Term Memory (LSTM) networks. We describe their input formats, such as one-hot encoding and vector embeddings, the architecture of the machine learning models, the training process, and the output formats. Finally, we discuss commercial and open-source tools used for data extraction from software.
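
As a rough illustration of the two input formats named in the abstract, the sketch below encodes a short opcode sequence first as one-hot vectors and then as dense embedding vectors. It is not taken from the chapter: the opcode sequence, the function names, the embedding dimension, and the randomly initialized embedding table are all assumptions made for illustration. In a real model such as an LSTM, the embedding table would be learned during training rather than drawn at random.

```python
import random


def build_vocab(sequence):
    """Map each distinct opcode to an integer index, in order of first appearance."""
    vocab = {}
    for op in sequence:
        if op not in vocab:
            vocab[op] = len(vocab)
    return vocab


def one_hot_encode(sequence, vocab):
    """Return one row per opcode; each row has a single 1 at that opcode's index."""
    rows = []
    for op in sequence:
        row = [0] * len(vocab)
        row[vocab[op]] = 1
        rows.append(row)
    return rows


def make_embedding_table(vocab_size, dim, seed=0):
    """Hypothetical stand-in for a learned embedding matrix (random values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(sequence, vocab, table):
    """Look up the dense vector for each opcode in the sequence."""
    return [table[vocab[op]] for op in sequence]


# Illustrative opcode trace (assumed, not from the chapter).
opcodes = ["mov", "push", "call", "mov", "ret"]

vocab = build_vocab(opcodes)
one_hot = one_hot_encode(opcodes, vocab)          # 5 rows, each of length 4
table = make_embedding_table(len(vocab), dim=3)   # 4 opcodes, 3 dimensions each
embedded = embed(opcodes, vocab, table)           # 5 rows, each of length 3
```

One-hot rows grow with the vocabulary size and carry no notion of similarity between opcodes, whereas embedding rows have a fixed, smaller dimension; repeated opcodes (here `mov`) map to the same vector in both representations.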

Author information

Authors and Affiliations

Department of Computer Science, San Jose State University, San Jose, USA
William B. Andreopoulos

Corresponding author

Correspondence to William B. Andreopoulos.

Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, USA
Mark Stamp
College of Engineering, IT & Environment, Charles Darwin University, Darwin, NT, Australia
Mamoun Alazab
Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Gjøvik, Norway
Andrii Shalaginov

About this chapter

Cite this chapter

Andreopoulos, W.B. (2021). Malware Detection with Sequence-Based Machine Learning and Deep Learning. In: Stamp, M., Alazab, M., Shalaginov, A. (eds) Malware Analysis Using Artificial Intelligence and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-62582-5_2

DOI: https://doi.org/10.1007/978-3-030-62582-5_2
Published: 21 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62581-8
Online ISBN: 978-3-030-62582-5
eBook Packages: Computer Science, Computer Science (R0)
