DOI: 10.1145/3459637.3481896
research-article

AutoCombo: Automatic Malware Signature Generation Through Combination Rule Mining

Published: 30 October 2021

Abstract

Malware detection is an essential step in building trustworthy computer systems. Signature-based detection flags a sample as malware if the sample data match or contain a pre-stored malware signature. Among the detection methods that malware experts continue to explore, signature-based detection remains indispensable because of its simplicity, explainability, and efficiency. Malware signatures can take various formats, for example a substring, a subsequence, or a combination rule. A combination rule signature can be viewed as a fixed set of properties, each of which describes some characteristic of an analyzed sample. Although security experts have devoted considerable effort to extracting meaningful features from samples, the step of generating signatures from those features has remained rather ad hoc and time-consuming.
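To make the notion of a combination rule concrete, the following minimal sketch treats a sample as a set of property strings and a signature as a fixed set of properties that must all be present. The representation, the property names, and the `matches` and `detect` helpers are hypothetical illustrations and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, Set

# Assumed representation: a signature is a fixed set of property names.
Signature = FrozenSet[str]


def matches(sample_properties: Set[str], signature: Signature) -> bool:
    """Return True if the sample exhibits every property in the signature."""
    return signature <= sample_properties


def detect(sample_properties: Set[str], signatures: Iterable[Signature]) -> bool:
    """Flag the sample as malware if any stored signature matches it."""
    return any(matches(sample_properties, sig) for sig in signatures)


# Hypothetical property names, for illustration only.
signatures = [frozenset({"writes_autorun_key", "disables_firewall"})]
sample_a = {"writes_autorun_key", "disables_firewall", "opens_socket"}
sample_b = {"writes_autorun_key", "opens_socket"}

print(detect(sample_a, signatures))  # True: both properties are present
print(detect(sample_b, signatures))  # False: "disables_firewall" is missing
```

Because matching is a plain subset test, a matched signature directly names the properties that triggered the detection, which is one way to read the explainability claim above.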
This paper focuses on the generation of combination rule signatures. We abstract and formally define the problem of combination rule malware signature generation, and then systematically study how to implement it effectively and efficiently. Inspired by classic frequent itemsets mining solutions, the proposed AutoCombo approach is greedy but also complete: it generates higher-quality signatures first, yet it can also traverse all possible property combinations for a complete generation. Further optimizations and future research potential are also discussed. The proposed approach is currently used to assist the analysis of millions of files per day in a large security company. Our evaluation on large-scale production data also shows its efficacy. With the release of over 10 million real production records as well as our exploratory code, we hope this initial study will draw AI experts' attention and further advance research in this field.
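The abstract does not spell out the AutoCombo algorithm, so the sketch below is not the authors' method. It only illustrates, under stated assumptions, how a search over property combinations can be both greedy and complete: a best-first traversal ordered by malware coverage, with Apriori-style pruning. The quality criterion (at least `min_support` malware matches and zero benign matches), the function names, and the toy data are all assumptions made for illustration.

```python
import heapq
from typing import FrozenSet, Iterator, List, Set, Tuple


def support(combo: FrozenSet[str], samples: List[Set[str]]) -> int:
    """Count the samples that contain every property in the combination."""
    return sum(1 for s in samples if combo <= s)


def mine_signatures(
    malware: List[Set[str]],
    benign: List[Set[str]],
    min_support: int = 1,
) -> Iterator[Tuple[FrozenSet[str], int]]:
    """Yield (signature, malware_support) pairs, highest support first.

    Assumed criterion: a combination is a signature when it matches at
    least `min_support` malware samples and no benign sample.
    """
    properties = sorted(set().union(*malware))
    # Max-heap via negated support; the index tuple identifies the
    # combination and breaks ties deterministically.
    heap: List[Tuple[int, Tuple[int, ...]]] = []
    for i, prop in enumerate(properties):
        sup = support(frozenset({prop}), malware)
        if sup >= min_support:
            heapq.heappush(heap, (-sup, (i,)))

    while heap:
        neg_sup, idx = heapq.heappop(heap)
        combo = frozenset(properties[i] for i in idx)
        if support(combo, benign) == 0:
            # Clean combination: emit it and do not specialise it further.
            yield combo, -neg_sup
            continue
        # Extend with later properties only, so each combination is
        # generated at most once. Support never grows when a property is
        # added, so pruning below min_support discards no valid candidate.
        for j in range(idx[-1] + 1, len(properties)):
            sup = support(combo | {properties[j]}, malware)
            if sup >= min_support:
                heapq.heappush(heap, (-sup, idx + (j,)))


# Toy data with hypothetical single-letter property names.
malware = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}]
benign = [{"a"}, {"b", "d"}, {"c"}]

for sig, sup in mine_signatures(malware, benign, min_support=2):
    print(sorted(sig), sup)
# ['a', 'b'] 2
# ['a', 'c'] 2
```

The priority queue gives the greedy ordering, since the combination covering the most malware samples is always examined next. Extending each combination only with later properties keeps the traversal exhaustive without visiting any combination twice.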

Supplementary Material

MP4 File (CIKM21-fp1281.mp4)
This video recording is a presentation for our paper accepted in CIKM '21, named "AutoCombo: Automatic Malware Signature Generation Through Combination Rule Mining". In short, this paper proposes a systematic pipeline to automatically generate combination rule signatures for malware detection. Malware detection is an essential step in building trustworthy computer systems. Signature-based detection detects a sample as malware if the sample data match or contain a pre-stored malware signature. A combination rule signature could be viewed as a fixed set of properties, each of which describes some characteristic of an analyzed sample. This paper focuses on the generation of such signatures. Inspired by classic frequent itemsets mining solutions, the proposed AutoCombo approach is greedy but also complete. It generates higher quality signatures first, but is also able to traverse all possible property combinations for a complete generation. Further optimizations and future research potential are also discussed.

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. frequent itemsets mining
  2. malware detection
  3. signature generation

Qualifiers

  • Research-article

Conference

CIKM '21
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 45
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 18 Feb 2025

Cited By

  • (2024) Automatically Generate Malware Detection Rules By Extracting Risk Information. Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things (595-599). DOI: 10.1145/3670105.3670209. Online publication date: 24-May-2024
  • (2024) CIA-EBE: Class Imbalance-Aware Event-Based Embedding for SOC Log Screening. 2024 IEEE International Conference on Big Data (BigData) (2653-2662). DOI: 10.1109/BigData62323.2024.10825261. Online publication date: 15-Dec-2024
  • (2024) Understanding and Measuring Inter-process Code Injection in Windows Malware. Security and Privacy in Communication Networks (490-514). DOI: 10.1007/978-3-031-64954-7_25. Online publication date: 15-Oct-2024
  • (2022) High-utility itemsets mining based on binary particle swarm optimization with multiple adjustment strategies. Applied Soft Computing 124:C. DOI: 10.1016/j.asoc.2022.109073. Online publication date: 1-Jul-2022
