A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features

Authors:

Mourad Debbabi,

Lingyu Wang

ACM Computing Surveys (CSUR), Volume 55, Issue 1

Article No.: 19, Pages 1 - 41

https://doi.org/10.1145/3486860

Published: 17 January 2022

Abstract

Binary code fingerprinting is crucial in many security applications, including malware detection, software infringement, vulnerability analysis, and digital forensics. It is also useful for security researchers and reverse engineers, since it enables high-fidelity reasoning about binary code, such as revealing its functionality, authorship, libraries used, and vulnerabilities. Numerous studies have investigated binary code with the goal of extracting fingerprints that can illuminate the semantics of a target application. However, extracting fingerprints is challenging because a substantial amount of significant information is lost during compilation, notably variable and function names, the original data and control flow structures, comments, semantic information, and the code layout. This article provides the first systematic review of existing binary code fingerprinting approaches and the contexts in which they are used. In addition, it discusses the applications that rely on binary code fingerprints, the information that can be captured during the fingerprinting process, and the approaches used and their implementations. It also addresses limitations and open questions related to the fingerprinting process and proposes future directions.


[147]

Lucky Onwuzurike, Enrico Mariconti, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, and Gianluca Stringhini. 2019. MaMaDroid: Detecting android malware by building markov chains of behavioral models (extended version). ACM Transactions on Privacy and Security 22, 2 (2019), 1–34.

Digital Library

[148]

Ori Or-Meir, Nir Nissim, Yuval Elovici, and Lior Rokach. 2019. Dynamic malware analysis in the modern era—a state of the art survey. ACM Computing Surveys 52, 5 (2019), 1–48.

Digital Library

[149]

Karl J. Ottenstein and Linda M. Ottenstein. 1984. The program dependence graph in a software development environment. In Proceedings of the ACM Sigplan Notices, Vol. 19. ACM, 177–184.

Digital Library

[150]

Pádraig O’Sullivan, Kapil Anand, Aparna Kotha, Matthew Smithson, Rajeev Barua, and Angelos D. Keromytis. 2011. Retrofitting security in cots software with binary rewriting. In Proceedings of the Future Challenges in Security and Privacy for Academia and Industry. Springer, 154–172.

[151]

Sancheng Peng, Shui Yu, and Aimin Yang. 2013. Smartphone malware and its propagation modeling: A survey. IEEE Communications Surveys & Tutorials 16, 2 (2013), 925–941.

[152]

Roberto Perdisci, Andrea Lanzi, and Wenke Lee. 2008. Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executables. In Proceedings of the 2008 Annual Computer Security Applications Conference. IEEE, 301–310.

Digital Library

[153]

Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. 2015. Cross-architecture bug search in binary executables. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 709–724.

Digital Library

[154]

Jannik Pewny, Felix Schuster, Lukas Bernhard, Thorsten Holz, and Christian Rossow. 2014. Leveraging semantic signatures for bug search in binary programs. In Proceedings of the 30th Annual Computer Security Applications Conference. ACM, 406–415.

Digital Library

[155]

Michael Pradel and Koushik Sen. 2018. Deepbugs: A learning approach to name-based bug detection. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–25.

Digital Library

[156]

Jing Qiu, Xiaohong Su, and Peijun Ma. 2015. Library functions identification in binary code by using graph isomorphism testings. In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 261–270.

[157]

Jing Qiu, Xiaohong Su, and Peijun Ma. 2016. Using reduced execution flow graph to identify library functions in binary code. IEEE Transactions on Software Engineering 42, 2 (2016), 187–202.

Digital Library

[158]

Edward Raff, Richard Zak, Russell Cox, Jared Sylvester, Paul Yacci, Rebecca Ward, Anna Tracy, Mark McLean, and Charles Nicholas. 2018. An investigation of byte n-gram features for malware classification. Journal of Computer Virology and Hacking Techniques 14, 1 (2018), 1–20.

[159]

Ashkan Rahimian, Paria Shirani, Saed Alrbaee, Lingyu Wang, and Mourad Debbabi. 2015. Bincomp: A stratified approach to compiler provenance attribution. Digital Investigation 14, 1 (2015), S146–S155.

Digital Library

[160]

K. Raman. 2012. Selecting features to classify malware. In Proceedings of the InfoSec Southwest. 49–64.

[161]

David A. Ramos and Dawson Engler. 2015. Under-constrained symbolic execution: Correctness checking for real code. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15). 49–64.

Digital Library

[162]

Linjun Ran, Liping Lu, Hong Lin, Mushuai Han, Dongdong Zhao, Jianwen Xiang, Haiguo Yu, and Xian Ma. 2017. An experimental study of four methods for homology analysis of firmware vulnerability. In Proceedings of the 2017 International Conference on Dependable Systems and Their Applications. IEEE, 42–50.

[163]

Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan Foote, David Warren, Gustavo Grieco, and David Brumley. 2014. Optimizing seed selection for fuzzing. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14). 861–875.

Digital Library

[164]

Miranda Rodriguez. 2015. All your IP are belong to us: An analysis of intellectual property rights as applied to malware. Tex. A&m L. Rev. 3 (2015), 663.

[165]

Nathan Rosenblum, Barton P. Miller, and Xiaojin Zhu. 2011. Recovering the toolchain provenance of binary code. In Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 100–110.

Digital Library

[166]

Nathan Rosenblum, Xiaojin Zhu, and Barton P. Miller. 2011. Who wrote this code? identifying the authors of program binaries. In Proceedings of the 2011 Computer Security–ESORICS. Springer, 172–189.

Digital Library

[167]

Nathan E. Rosenblum, Barton P. Miller, and Xiaojin Zhu. 2010. Extracting compiler provenance from program binaries. In Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. ACM, 21–28.

Digital Library

[168]

Kevin A. Roundy and Barton P. Miller. 2010. Hybrid analysis and control of malware. In Proceedings of the Recent Advances in Intrusion Detection. Springer, 317–338.

Digital Library

[169]

Chanchal K. Roy, James R. Cordy, and Rainer Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming 74, 7 (2009), 470–495.

Digital Library

[170]

Brian Ruttenberg, Craig Miles, Lee Kellogg, Vivek Notani, Michael Howard, Charles LeDoux, Arun Lakhotia, and Avi Pfeffer. 2014. Identifying shared software components to support malware forensics. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 21–40.

[171]

Andreas Sæbjørnsen, Jeremiah Willcock, Thomas Panas, Daniel Quinlan, and Zhendong Su. 2009. Detecting code clones in binary executables. In Proceedings of the 18th International Symposium on Software Testing and Analysis. ACM, 117–128.

Digital Library

[172]

Brezo F. Ugarte-Pedrero X. & Bringas P. G. Santos, I.2013. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences 231 (2013), 64–82. DOI:https://doi.org/10.1016/j.ins.2011.08.020

[173]

Felix Brezo Javier Nieves Yoseba K. Penya Borja Sanz Carlos Laorden Santos, Igor and Pablo G. Bringas.2010. Idea: Opcode-sequence-based malware detection. In Proceedings of the International Symposium on Engineering Secure Software and Systems. Springer, 35–43.

Digital Library

[174]

Andrea Saracino, Daniele Sgandurra, Gianluca Dini, and Fabio Martinelli. 2016. Madam: Effective and efficient behavior-based android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing 15, 1 (2016), 83–97.

[175]

Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. 2003. Winnowing: Local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, 76–85.

Digital Library

[176]

Matthew G. Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J. Stolfo. 2001. Data mining methods for detection of new malicious executables. In Proceedings of the 2001 EEE Symposium on Security and PrivacyI. IEEE, 38–49.

Digital Library

[177]

M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, and Muddassar Farooq. 2009. Pe-miner: Mining structural information to detect malicious executables in realtime. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection. Springer, 121–141.

Digital Library

[178]

Farrukh Shahzad, Sohail Bhatti, Muhammad Shahzad, and Muddassar Farooq. 2011. In-execution malware detection using task structures of linux processes. In Proceedings of the 2011 IEEE International Conference on Communications. IEEE, 1–6.

[179]

Farrukh Shahzad and Muddassar Farooq. 2012. Elf-miner: Using structural knowledge and data mining methods to detect new (linux) malicious executables. Knowledge and Information Systems 30, 3 (2012), 589–612.

Digital Library

[180]

Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. 2015. Recognizing functions in binaries with neural networks. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15). 611–626.

Digital Library

[181]

Paria Shirani, Leo Collard, Basile L. Agba, Bernard Lebel, Mourad Debbabi, Lingyu Wang, and Aiman Hanna. 2018. BINARM: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 114–138.

[182]

Paria Shirani, Lingyu Wang, and Mourad Debbabi. 2017. Binshape: Scalable and robust binary library function identification using function shape. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 301–324.

[183]

Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2015. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware. In Proceedings of the Network and Distributed System Security. Vol. 1. 1–1.

[184]

Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SOK:(State of) The Art of War: Offensive Techniques in Binary Analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy. IEEE, 138–157.

[185]

Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. 2008. Bitblaze: A new approach to computer security via binary analysis. In Proceedings of theInformation Systems Security. Springer, 1–25.

Digital Library

[186]

Guillermo Suarez-Tangil, Santanu Kumar Dash, Mansour Ahmadi, Johannes Kinder, Giorgio Giacinto, and Lorenzo Cavallaro. 2017. Droidsieve: Fast and accurate classification of obfuscated android malware. In Proceedings of the 7th ACM on Conference on Data and Application Security and Privacy. ACM, 309–320.

Digital Library

[187]

Guillermo Suarez-Tangil, Juan E. Tapiador, Pedro Peris-Lopez, and Arturo Ribagorda. 2014. Evolution, detection and analysis of malware for smart devices. IEEE Communications Surveys & Tutorials 16, 2 (2014), 961–987.

[188]

Mingshen Sun, Xiaolei Li, John CS Lui, Richard TB Ma, and Zhenkai Liang. 2016. Monet: A user-oriented behavior-based malware variants detection system for android. IEEE Transactions on Information Forensics and Security 12, 5 (2016), 1103–1112.

Digital Library

[189]

Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788–799.

Digital Library

[190]

Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2009. Quality and efficiency in high dimensional nearest neighbor search. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM, 563–576.

Digital Library

[191]

Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2010. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems 35, 3 (2010), 1–46.

Digital Library

[192]

Cristian Ţăpuş, I-Hsin Chung, and Jeffrey K. Hollingsworth2002. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. IEEE, 1–11.

Digital Library

[193]

Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. Journal of the ACM 23, 1 (1976), 31–42.

Digital Library

[194]

Jonathan van den Berg and and Hirohide Haga2018. Matching source code using abstract syntax trees in version control systems. Journal of Software Engineering and Applications 11, 06 (2018), 318.

[195]

Maarten Van Emmerik. 1998. Identifying library functions in executable file using patterns. In Proceedings of the 1998 Australian Software Engineering Conference. IEEE, 90–97.

Digital Library

[196]

Andrew Walenstein, Michael Venable, Matthew Hayes, Christopher Thompson, and Arun Lakhotia. 2007. Exploiting similarity between variants to defeat malware. In Proceedings of the 2007 Conference on BlackHat DC.

[197]

Xinran Wang, Chi-Chun Pan, Peng Liu, and Sencun Zhu. 2010. Sigfree: A signature-free buffer overflow attack blocker. IEEE Transactions on Dependable and Secure Computing 7, 1 (2010), 65–79.

Digital Library

[198]

Zheng Wang, Ken Pierce, and Scott McFarling. 2000. Bmat-a binary matching tool for stale profile propagation. The Journal of Instruction-Level Parallelism 2 (2000), 1–20.

[199]

Daniel Weise, Roger F. Crew, Michael Ernst, and Bjarne Steensgaard. 1994. Value dependence graphs: Representation without taxation. In Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 297–310.

Digital Library

[200]

Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. 2005. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems. Springer, 365–381.

Digital Library

[201]

Zhiwu Xu, Cheng Wen, and Shengchao Qin. 2017. Learning types for binaries. In Proceedings of the International Conference on Formal Engineering Methods. Springer, 430–446.

[202]

Zhiwu Xu, Cheng Wen, and Shengchao Qin. 2018. Type learning for binaries and its applications. IEEE Transactions on Reliability 63, 3 (2018), 893–912.

[203]

Hongfa Xue, Shaowen Sun, Guru Venkataramani, and Tian Lan. 2019. Machine learning-based analysis of program binaries: A comprehensive study. IEEE Access 7 (2019), 65889–65912. DOI:https://doi.org/10.1109/ACCESS.2019.2917668

[204]

Fabian Yamaguchi, Alwin Maier, Hugo Gascon, and Konrad Rieck. 2015. Automatic inference of search patterns for taint-style vulnerabilities. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 797–812.

Digital Library

[205]

Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. CodeCMR: Cross-modal retrieval for function-level binary source code matching. Advances in Neural Information Processing Systems 33 (2020), 1–10.

[206]

Fu Y. Miller K. A. Lin Z. Zhang X. & Xu D. Zeng, J.2013. Obfuscation resilient binary code reuse through trace-oriented programming. In Proceedings of the ACM SIGSAC Conference on Computer & Communications Security. ACM, 487–498.

[207]

Yuan Zhang, Jiarun Dai, Xiaohan Zhang, Sirong Huang, Zhemin Yang, Min Yang, and Hao Chen. 2018. Detecting third-party libraries in android applications with high precision and recall. In Proceedings of the 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 141–152.

[208]

Dongdong Zhao, Hong Lin, Linjun Ran, Mushuai Han, Jing Tian, Liping Lu, Shengwu Xiong, and Jianwen Xiang. 2019. CVSkSA: Cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph. Software Quality Journal 27 3, (2019), 1045–1068.

Digital Library

[209]

Viviane Zwanger and Felix C. Freiling. 2013. Kernel mode API spectroscopy for incident response and digital forensics. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. ACM, 3.

Digital Library


Index Terms

A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features
1. Security and privacy
  1. Software and application security
    1. Software reverse engineering


Published In

ACM Computing Surveys Volume 55, Issue 1

January 2023

860 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3492451

Editor:
Albert Zomaya
University of Sydney, Australia


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 January 2022

Accepted: 01 September 2021

Received: 01 November 2020

Published in CSUR Volume 55, Issue 1


Qualifiers

Survey
Refereed

Funding Sources

United Arab Emirates University Start-up

