Malware Detection with Sequence-Based Machine Learning and Deep Learning

  • Chapter
Malware Analysis Using Artificial Intelligence and Deep Learning

Abstract

In this chapter, we review sequence-based machine learning methods used for malware detection and classification. We begin with the data types extracted from code: static features and dynamic traces of program execution. We then review recent research that applies machine learning to opcode and API call sequences, call graphs, system calls, registry changes, and information flow traces, as well as hybrid and raw data, to detect and classify malware. With a focus on metamorphic malware, we discuss Hidden Markov Models (HMMs) and Long Short-Term Memory (LSTM) networks. We describe their input formats, such as one-hot encoding and vector embeddings, the architecture of the machine learning models, the training process, and the output formats. Finally, we discuss commercial and open-source tools used to extract data from software.
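
As a rough illustration of two ideas named in the abstract, the sketch below one-hot encodes a short opcode sequence and scores it with the forward algorithm of a small discrete HMM. It is not taken from the chapter: the five-opcode vocabulary, the two hidden states, and every probability value are hypothetical and chosen only to keep the example small. In practice the vocabulary would come from disassembled samples and the HMM parameters would be trained, for example with Baum-Welch.

```python
import math

# Hypothetical toy opcode vocabulary (assumption, not from the chapter).
VOCAB = ["mov", "push", "call", "add", "jmp"]
INDEX = {op: i for i, op in enumerate(VOCAB)}


def one_hot(sequence):
    """Map each opcode to a vector with a single 1 at its vocabulary index."""
    vectors = []
    for op in sequence:
        vec = [0] * len(VOCAB)
        vec[INDEX[op]] = 1
        vectors.append(vec)
    return vectors


def hmm_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM.

    obs   -- list of observation indices
    start -- start[s] is the initial probability of hidden state s
    trans -- trans[r][s] is the probability of moving from state r to state s
    emit  -- emit[s][o] is the probability that state s emits symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences;
        # the log of the scale factors sums to the log-likelihood.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical two-state model; all numbers are illustrative only.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [
    [0.4, 0.3, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.3, 0.2, 0.3],
]

sequence = ["push", "mov", "call"]
encoded = one_hot(sequence)
score = hmm_log_likelihood([INDEX[op] for op in sequence], START, TRANS, EMIT)
```

A common way to use such a score, assumed here rather than drawn from the chapter, is to train one HMM per malware family and compare the length-normalized log-likelihood of an unknown sample against a threshold. The one-hot vectors in `encoded` are the kind of input an LSTM would consume directly or replace with learned embeddings.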



Author information

Correspondence to William B. Andreopoulos.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Andreopoulos, W.B. (2021). Malware Detection with Sequence-Based Machine Learning and Deep Learning. In: Stamp, M., Alazab, M., Shalaginov, A. (eds) Malware Analysis Using Artificial Intelligence and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-62582-5_2

  • DOI: https://doi.org/10.1007/978-3-030-62582-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62581-8

  • Online ISBN: 978-3-030-62582-5

  • eBook Packages: Computer Science, Computer Science (R0)
