
A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features

Published: 17 January 2022

Abstract

Binary code fingerprinting is crucial in many security applications, including malware detection, software infringement detection, vulnerability analysis, and digital forensics. It is also useful to security researchers and reverse engineers because it enables high-fidelity reasoning about binary code, such as revealing its functionality, authorship, the libraries it uses, and its vulnerabilities. Numerous studies have analyzed binary code with the goal of extracting fingerprints that illuminate the semantics of a target application. Extracting fingerprints is challenging, however, because much of the information present in the source code is lost during compilation, notably variable and function names, the original data and control flow structures, comments, semantic information, and the code layout. This article provides the first systematic review of existing binary code fingerprinting approaches and the contexts in which they are used. It also discusses the applications that rely on binary code fingerprints, the information that can be captured during the fingerprinting process, and the approaches used and their implementations. Finally, it addresses limitations and open questions related to the fingerprinting process and proposes future directions.
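To make the notion of a binary code fingerprint concrete, here is a minimal illustrative sketch; it is not one of the surveyed approaches. It assumes a function has already been disassembled into a list of instruction strings (for example, by a disassembler), keeps only the opcode mnemonics, hashes overlapping mnemonic n-grams into a set, and compares two such sets with Jaccard similarity. The function names `mnemonics`, `fingerprint`, and `jaccard` and the default `n = 3` are hypothetical choices made for this example.

```python
import hashlib
from typing import FrozenSet, Iterable, List


def mnemonics(instructions: Iterable[str]) -> List[str]:
    """Keep only the opcode mnemonic of each instruction string.

    Dropping operands discards register names, addresses, and immediates,
    which often differ between builds of the same source code.
    """
    out = []
    for ins in instructions:
        parts = ins.strip().split(None, 1)
        if parts:  # skip empty lines
            out.append(parts[0].lower())
    return out


def fingerprint(instructions: Iterable[str], n: int = 3) -> FrozenSet[int]:
    """Return the set of hashed mnemonic n-grams for one function.

    Functions with fewer than n instructions yield an empty fingerprint.
    """
    ops = mnemonics(instructions)
    grams = [" ".join(ops[i:i + n]) for i in range(len(ops) - n + 1)]
    # Truncate SHA-1 to 64 bits so each n-gram maps to a compact integer.
    return frozenset(
        int.from_bytes(hashlib.sha1(g.encode()).digest()[:8], "big") for g in grams
    )


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity of two fingerprints (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    f1 = ["push rbp", "mov rbp, rsp", "mov eax, edi", "add eax, esi", "pop rbp", "ret"]
    f2 = ["push rbp", "mov rbp, rsp", "mov eax, ecx", "add eax, edx", "pop rbp", "ret"]
    f3 = ["push rbp", "mov rbp, rsp", "mov eax, edi", "sub eax, esi", "pop rbp", "ret"]
    print(jaccard(fingerprint(f1), fingerprint(f2)))  # same mnemonics, different registers
    print(jaccard(fingerprint(f1), fingerprint(f3)))  # one opcode changed
```

Because operands are discarded, `f1` and `f2`, which differ only in register allocation, receive identical fingerprints (similarity 1.0), whereas replacing a single `add` with `sub` in `f3` leaves only one shared trigram out of seven distinct ones (similarity 1/7). The same property is a weakness: a purely syntactic fingerprint like this one is easily defeated by instruction substitution or reordering, which is part of why extracting semantics-preserving fingerprints is hard.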

    References

    [1]
    2017. WIN32/INDUSTROYER a new threat for industrial control systems.Retrieved from https://www.welivesecurity.com/wp-content/uploads/2017/06/Win32_Industroyer.pdf. Accessed on May, 2021.
    [2]
    2019. EXEINFO PE. Retrieved from http://exeinfo.atwebpages.com/. Accessed on June, 2019.
    [3]
    2019. ghidra. Retrieved from https://www.nsa.gov/resources/everyone/ghidra/. Accessed on June, 2019.
    [4]
    2019. IDA pro disassembler. Retrieved from https://www.hex-rays.com/products/ida/tech/. Accessed on June, 2019.
    [5]
    2019. ollydbg is a 32-bit assembler level analysing debugger for microsoft windows. Retrieved from http://ollydbg.de/. Accessed on June, 2019.
    [6]
    2019. PEfile:. Retrieved from http://code.google.com/p/pefile/. Accessed on June, 2019.
    [7]
    2019. pivotal software. RabbitMQ web site. Retrieved from https://www.rabbitmq.com/. Accessed on June, 2019.
    [8]
    2019. RDG_Packer_Detector. Retrieved from http://www.rdgsoft.net/. Accessed on June, 2019.
    [9]
    2019. the paradyn project. Retrieved from http://www.paradyn.org/html/dyninst9.0.0-features.html. Accessed on June, 2019.
    [10]
    2019. tigress is a diversifying virtualizer/obfuscator for the c language. Retrieved from http://tigress.cs.arizona.edu/. Accessed on June, 2019.
    [11]
    Yousra Aafer, Wenliang Du, and Heng Yin. 2013. Droidapiminer: Mining api-level features for robust malware detection in android. In International Conference on Security and Privacy in Communication Systems. Zia T., Zomaya A., Varadharajan V., and Mao M. (Eds), Springer, 86–103.
    [12]
    Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent. 2010. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience 22, 6 (2010), 685–701.
    [13]
    Hiralal Agrawal and Joseph R. Horgan. 1990. Dynamic program slicing. In Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation. Vol. 25. ACM, 246–256.
    [14]
    Shahinur Alam, R. Nigel Horspool, and Issa Traore. 2014. MARD: A framework for metamorphic malware analysis and real-time detection. In Proceedings of the 2014 IEEE 28th International Conference on Advanced Information Networking and Applications. IEEE, 480–489.
    [15]
    Shahid Alam, Issa Traore, and Ibrahim Sogukpinar. 2015. Annotated control flow graph for metamorphic malware detection. The Computer Journal 58, 10 (2015), 2608–2621.
    [16]
    Saed Alrabaee, Mourad Debbabi, and Lingyu Wang. 2019. On the feasibility of binary authorship characterization. Digital Investigation 28, 1 (2019), S3–S11.
    [17]
    Saed Alrabaee, ElMouatez Billah Karbab, Lingyu Wang, and Mourad Debbabi. 2019. Bineye: Towards efficient binary authorship characterization using deep learning. In European Symposium on Research in Computer Security, Kazue Sako Steve SchneiderPeter Y. A. Ryan (Eds.). Springer, 47–67.
    [18]
    Saed Alrabaee, Noman Saleem, Stere Preda, Lingyu Wang, and Mourad Debbabi. 2014. OBA2: An onion approach to binary code authorship attribution. Digital Investigation 11, 1 (2014), S94–S103.
    [19]
    Saed Alrabaee, Paria Shirani, Lingyu Wang, and Mourad Debbabi. 2015. SIGMA: A semantic integrated graph matching approach for identifying reused functions in binary code. Digital Investigation 12, 2 (2015), S61–S71.
    [20]
    Saed Alrabaee, Paria Shirani, Lingyu Wang, and Mourad Debbabi. 2018. FOSSIL: A resilient and efficient system for identifying FOSS functions in malware binaries. ACM Transactions on Privacy and Security 21, 2 (2018), 1–34.
    [21]
    Saed Alrabaee, Paria Shirani, Lingyu Wang, Mourad Debbabi, and Aiman Hanna. 2018. On leveraging coding habits for effective binary authorship attribution. In European Symposium on Research in Computer Security. Lopez J., Zhou J., Soriano M. (Eds.), Springer, 26–47.
    [22]
    Saed Alrabaee, Lingyu Wang, and Mourad Debbabi. 2016. BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs). Digital Investigation 18, 7 (2016), S11–S22.
    [23]
    Hyrum S. Anderson and Phil Roth. 2018. Ember: An open dataset for training static PE malware machine learning models. ArXiv abs/1804.04637.
    [24]
    Alexandr Andoni and Piotr Indyk. 2008. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM 51, 1 (2008), 117–122.
    [25]
    Dorian C. Arnold, Dong H. Ahn, Bronis R. De Supinski, Gregory L. Lee, Barton P. Miller, and Martin Schulz. 2007. Stack trace analysis for large scale debugging. In Proceedings of the 2007 IEEE International Parallel and Distributed Processing Symposium. IEEE, 1–10.
    [26]
    Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, Konrad Rieck, and CERT Siemens. 2014. Drebin: Effective and explainable detection of android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium. Vol. 14, 23–26.
    [27]
    Saba Arshad, Munam A. Shah, Abdul Wahid, Amjad Mehmood, Houbing Song, and Hongnian Yu. 2018. Samadroid: A novel 3-level hybrid malware detection model for android operating system. IEEE Access 6 (2018), 4321–4339. DOI:https://doi.org/10.1109/ACCESS.2018.2792941
    [28]
    Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J. Schwartz, Maverick Woo, and David Brumley. 2014. Automatic exploit generation. Communications of the ACM 57, 2 (2014), 74–84.
    [29]
    Michael Backes, Sven Bugiel, and Erik Derr. 2016. Reliable third-party library detection in android and its security applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 356–367.
    [30]
    Michael Backes, Sven Bugiel, Erik Derr, Sebastian Gerling, and Christian Hammer. 2016. R-droid: Leveraging android app analysis with static slice optimization. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 129–140.
    [31]
    Jinrong Bai, Junfeng Wang, and Guozhong Zou. 2014. A malware detection scheme based on mining format information. The Scientific World Journal 2014 (2014), 1–12.
    [32]
    Gogul Balakrishnan, Radu Gruian, Thomas Reps, and Tim Teitelbaum. 2005. CodeSurfer/ \(\times\) 886—A platform for analyzing \(\times\) 886 executables. In Compiler Construction. Bodik R. (Ed.), Springer, 250–254.
    [33]
    Gogul Balakrishnan and Thomas Reps. 2010. WYSINWYX: What you see is not what you execute. ACM Transactions on Programming Languages and Systems 32, 6 (2010), 1–84.
    [34]
    Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, and David Brumley. 2014. \(\lbrace\) BYTEWEIGHT \(\rbrace\) : Learning to Recognize Functions in Binary Code. In Proceedings of the 23rd \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 14). 845–860.
    [35]
    Mayank Bawa, Tyson Condie, and Prasanna Ganesan. 2005. LSH forest: Self-tuning indexes for similarity search. In Proceedings of the 14th International Conference on World Wide Web. ACM, 651–660.
    [36]
    Laszlo A. Belady and Meir M. Lehman. 1976. A model of large program development. IBM Systems Journal 15, 3 (1976), 225–252.
    [37]
    Martial Bourquin, Andy King, and Edward Robbins. 2013. Binslayer: Accurate comparison of binary executables. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. ACM, 4.
    [38]
    Rodrigo Rubira Branco, Gabriel Negreira Barbosa, and Pedro Drimel Neto. 2012. Scientific but not academical overview of malware anti-debugging, anti-disassembly and anti-vm technologies. Black Hat 1, (2012), 1–27.
    [39]
    Murray Brand. 2007. Forensic analysis avoidance techniques of malware. In Proceedings of the 5th Australian Digital Forensics. 59.
    [40]
    Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. 2007. Code normalization for self-mutating malware. IEEE Security & Privacy2 (2007), 46–54.
    [41]
    Juan Caballero, Noah M. Johnson, Stephen McCamant, and Dawn Song. 2009. Binary Code Extraction and Interface Identification for Security Applications. Technical Report. DTIC Document.
    [42]
    Juan Caballero and Zhiqiang Lin. 2016. Type inference on executables. ACM Computing Surveys 48, 4 (2016), 1–35.
    [43]
    Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 209–224.
    [44]
    Aylin Caliskan-Islam, Richard Harang, Andrew Liu, Arvind Narayanan, Clare Voss, Fabian Yamaguchi, and Rachel Greenstadt. 2015. De-anonymizing programmers via code stylometry. In Proceedings of the 24th USENIX Conference on Security Symposium. 255–270.
    [45]
    Aylin Caliskan-Islam, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, and Arvind Narayanan. 2018. When coding style survives compilation: De-anonymizing programmers from executable binaries. In Proceedings of the Network and Distributed System Security Symposium. (2018).
    [46]
    Peter Casey, Mateusz Topor, Emily Hennessy, Saed Alrabaee, Moayad Aloqaily, and Azzedine Boukerche. 2019. Applied comparative evaluation of the metasploit evasion module. In Proceedings of the 2019 IEEE Symposium on Computers and Communications. IEEE, 1–6.
    [47]
    Silvio Cesare, Yang Xiang, and Wanlei Zhou. 2014. Control flow-based malware variant detection. IEEE Transactions on Dependable and Secure Computing 11, 4 (2014), 307–317.
    [48]
    Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing mayhem on binary code. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 380–394.
    [49]
    Sang Kil Cha, Maverick Woo, and David Brumley. 2015. Program-adaptive mutational fuzzing. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 725–741.
    [50]
    Sagar Chaki, Cory Cohen, and Arie Gurfinkel. 2011. Supervised learning for provenance-similarity of binaries. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 15–23.
    [51]
    Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. 2016. Bingo: Cross-architecture cross-OS binary search. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 678–689.
    [52]
    Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. 2012. The S2E platform: Design, implementation, and applications. ACM Transactions on Computer Systems 30, 1 (2012), 1–49.
    [53]
    Cory Cohen and Jeffrey S. Havrilla. 2009. Function hashing for malicious code analysis. CERT Research Annual Report (2009), 26–29.
    [54]
    Paolo Milani Comparetti, Guido Salvaneschi, Engin Kirda, Clemens Kolbitsch, Christopher Kruegel, and Stefano Zanero. 2010. Identifying dormant functionality in malware programs. In Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE, 61–76.
    [55]
    Emanuele Cozzi, Mariano Graziano, Yanick Fratantonio, and Davide Balzarotti. 2018. Understanding linux malware. In Proceedings of the 2018 IEEE Symposium on Security and Privacy. IEEE, 161–175.
    [56]
    Christoph Csallner and Yannis Smaragdakis. 2005. Check’n’crash: Combining static checking and testing. In Proceedings of the 27th International Conference on Software Engineering. ACM, 422–431.
    [57]
    Santanu Kumar Dash, Guillermo Suarez-Tangil, Salahuddin Khan, Kimberly Tam, Mansour Ahmadi, Johannes Kinder, and Lorenzo Cavallaro. 2016. Droidscribe: Classifying android malware based on runtime behavior. In Proceedings of the 2016 IEEE Security and Privacy Workshops. IEEE, 252–261.
    [58]
    Yaniv David, Uri Alon, and Eran Yahav. 2020. Neural reverse engineering of stripped binaries using augmented control flow graphs. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 1–28.
    [59]
    Yaniv David, Nimrod Partush, and Eran Yahav. 2016. Statistical similarity of binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 266–280.
    [60]
    Yaniv David, Nimrod Partush, and Eran Yahav. 2018. Firmup: Precise static detection of common vulnerabilities in firmware. In Proceedings of the ACM SIGPLAN Notices, Vol. 53. ACM, 392–404.
    [61]
    Yaniv David and Eran Yahav. 2014. Tracelet-based code search in executables. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. Vol. 49. ACM, 349–360.
    [62]
    Jeffrey Dean and Sanjay Ghemawat. 2008. Mapreduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (2008), 107–113.
    [63]
    Steven HH Ding, Benjamin Fung, and Philippe Charland. 2016. Kam1n0: Mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 461–470.
    [64]
    S. H. H. Ding, B. C. M. Fung, and P. Charland. 2019. Asm2Vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th International Symposium on Security and Privacy. IEEE Computer Society, 38–55.
    [65]
    Brendan F. Dolan-Gavitt, Josh Hodosh, Patrick Hulin, Tim Leek, and Ryan Whelan. 2014. Repeatable reverse engineering for the greater good with panda. Retrieved on September 23, 2021 from https://mice.cs.columbia.edu/getTechreport.php?techreportID=1588&format=pdf&.
    [66]
    Stéphane Ducasse, Oscar Nierstrasz, and Matthias Rieger. 2006. On the effectiveness of clone detection by string matching. Journal of Software Maintenance and Evolution: Research and Practice 18, 1 (2006), 37–58.
    [67]
    Thomas Dullien and Rolf Rolles. 2005. Graph-based comparison of executable objects (english version). Sstic 5 1, (2005), 1–3.
    [68]
    Tudor Dumitraş and Darren Shou. 2011. Toward a standard benchmark for computer security research: The Worldwide Intelligence Network Environment. In Proceedings of the 1st Workshop on Building Analysis Datasets and Gathering Experience Returns for Security. ACM, 89–96.
    [69]
    Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. 2012. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys 44, 2 (2012), 1–42.
    [70]
    Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. 2014. Blanket execution: Dynamic similarity testing for program binaries and components. In Proceedings of the 23rd USENIX Security Symposium. 303–317.
    [71]
    E. Eilam. 2011. Reversing: Secrets of Reverse Engineering. John Wiley & Sons.
    [72]
    Khaled ElWazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. 2013. Scalable variable and data type detection in a binary rewriter. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. Vol. 48. ACM, 51–60.
    [73]
    Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the Network and Distributed System Security Symposium.
    [74]
    Wenbin Fang, Barton P. Miller, and James A. Kupsch. 2012. Automated tracing and visualization of software security structure and properties. In Proceedings of the 9th International Symposium on Visualization for Cyber Security. ACM, 9–16.
    [75]
    Mohammad Reza Farhadi. 2013. Assembly Code Clone Detection for Malware Binaries. Ph.D. Dissertation. Concordia University.
    [76]
    Mohammad Reza Farhadi, Benjamin Fung, Philippe Charland, and Mourad Debbabi. 2014. Binclone: Detecting code clones in malware. In Proceedings of the 2014 18th International Conference on Software Security and Reliability. IEEE, 78–87.
    [77]
    Pengbin Feng, Jianfeng Ma, Cong Sun, Xinpeng Xu, and Yuwan Ma. 2018. A novel dynamic android malware detection system with ensemble learning. IEEE Access 6 (2018), 30996–31011. DOI:https://doi.org/10.1109/ACCESS.2018.2844349
    [78]
    Qian Feng, Minghua Wang, Mu Zhang, Rundong Zhou, Andrew Henderson, and Heng Yin. 2017. Extracting conditional formulas for cross-platform bug search. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 346–359.
    [79]
    Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 480–491.
    [80]
    Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems 9, 3 (1987), 319–349.
    [81]
    Halvar Flake. 2002. Graph-based binary analysis. Blackhat Briefings 2002 (2002).
    [82]
    Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 541–552.
    [83]
    Thomas Given-Wilson, Annelie Heuser, Nisrine Jafri, and Axel Legay. 2019. An automated and scalable formal process for detecting fault injection vulnerabilities in binaries. Concurrency and Computation: Practice and Experience 31, 23 (2019), e4794.
    [84]
    Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation. Vol. 40. ACM, 213–223.
    [85]
    Patrice Godefroid, Michael Y. Levin, and David Molnar. 2012. SAGE: Whitebox fuzzing for security testing. Communications of the ACM 55, 3 (2012), 40–44.
    [86]
    Jinwei Gu, Jie Zhou, and Chunyu Yang. 2006. Fingerprint recognition by combining global structure and local cues. IEEE Transactions on Image Processing 15, 7 (2006), 1952–1964.
    [87]
    Sumit Gulwani and George C. Necula. 2005. Precise interprocedural analysis using random interpretation. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Vol. 40. ACM, 324–337.
    [88]
    Archit Gupta, Pavan Kuppili, Aditya Akella, and Paul Barford. 2009. An empirical study of malware evolution. In Proceedings of the 1stInternational Communication Systems and Networks and Workshops. IEEE, 1–10.
    [89]
    Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turbo iso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 337–348.
    [90]
    Sergio Chica Juan Caballero Haq, Irfan and Somesh Jha.2018. Malware lineage in the wild. Computers & Security 78 (2018), 347–363.
    [91]
    Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, and Martin Vechev. 2018. Debin: Predicting debug information in stripped binaries. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 1667–1680.
    [92]
    Sean Heelan. 2009. Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities. Ph.D. Dissertation. University of Oxford.
    [93]
    Sean Heelan and Agustin Gianni. 2012. Augmenting vulnerability analysis of binary code. In Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 199–208.
    [94]
    Armijn Hemel, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Dolstra. 2011. Finding software license violations through binary code clone detection. In Proceedings of the 8th Working Conference on Mining Software Repositories. ACM, 63–72.
    [95]
    Susan Horwitz, Thomas Reps, and David Binkley. 1990. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems 12, 1 (1990), 26–60.
    [96]
    Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Abdulhayoglu. 2017. Hindroid: An intelligent android malware detection system based on structured heterogeneous information network. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1507–1515.
    [97]
    Zhang Y. Li J. & Gu D. Hu, Y.2016. Cross-architecture binary semantics understanding via similar code comparison. In Proceedings of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering. IEEE, 57–67.
    [98]
    He Huang, Amr M. Youssef, and Mourad Debbabi. 2017. Binsequence: Fast, accurate and scalable binary code reuse detection. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 155–166.
    [99]
    Shih-Kun Huang, Min-Hsiang Huang, Po-Yen Huang, Chung-Wei Lai, Han-Lin Lu, and Wai-Meng Leong. 2012. Crax: Software crash analysis for automatic exploit generation by modeling attacks as symbolic continuations. In Proceedings of the 2012 IEEE 6th International Conference on Software Security and Reliability. IEEE, 78–87.
    [100]
    Emily R. Jacobson, Nathan Rosenblum, and Barton P. Miller. 2011. Labeling library functions in stripped binaries. In Proceedings of the 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools. ACM, 1–8.
    [101]
    Sachin Jain and Yogesh Kumar Meena. 2011. Byte level n–gram analysis for malware detection. In Proceedings of the 2011 International Conference on Information Processing. Springer, 51–59.
    [102]
    Jiyong Jang. 2013. Scaling Software Security Analysis to Millions of Malicious Programs and Billions of Lines of Code. Ph.D. Dissertation. Carnegie Mellon University.
    [103]
    Jiyong Jang, Abeer Agrawal, and David Brumley. 2012. Redebug: Finding unpatched code clones in entire os distributions. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 48–62.
    [104]
    Jiyong Jang, David Brumley, and Shobha Venkataraman. 2011. Bitshred: Feature hashing malware for scalable triage and semantic analysis. In Proceedings of the 18th ACM Conference on Computer and Communications Security. ACM, 309–320.
    [105]
    Jiyong Jang, Maverick Woo, and David Brumley. 2013. Towards automatic software lineage inference. In Proceedings of the 22nd USENIX Security. 81–96.
    [106]
    Weiwei Jin, Sagar Chaki, Cory Cohen, Arie Gurfinkel, Jeffrey Havrilla, Charles Hines, and Priya Narasimhan. 2012. Binary function clustering using semantic hashes. In Proceedings of the 11th International Conference on Machine Learning and Applications. Vol. 1. IEEE, 386–391.
    [107]
    Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. 2015. Obfuscator-LLVM: Software protection for the masses. In Proceedings of the 1st International Workshop on Software Protection. IEEE, 3–9.
    [108]
    ElMouatez Billah Karbab, Mourad Debbabi, Saed Alrabaee, and Djedjiga Mouheb. 2016. Dysign: Dynamic fingerprinting for the automatic detection of android malware. In Proceedings of the 11th International Conference on Malicious and Unwanted Software. IEEE, 1–8.
    [109]
    Md Enamul Karim, Andrew Walenstein, Arun Lakhotia, and Laxmi Parida. 2005. Malware phylogeny generation using permutations of code. Journal in Computer Virology 1, 1–2 (2005), 13–23.
    [110]
    Iman Keivanloo, Chanchai K Roy, and Juergen Rilling. 2012. Java bytecode clone detection via relaxation on code fingerprint and semantic web reasoning. In Proceedings of the 6th International Workshop on Software Clones. IEEE, 36–42.
    [111]
    James M. Keller, Michael R. Gray, and James A. Givens. 1985. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics4 (1985), 580–585.
    [112]
    Kris Kendall. 2007. Practical malware analysis. Retrieved on September 14, 2021 from https://www.blackhat.com/presentations/bh-dc-07/Kendall_McMillan/Presentation/bh-dc-07-Kendall_McMillan.pdf.
    [113]
    Wei Ming Khoo. 2013. Decompilation as Search. Technical Report UCAM-CL-TR-844. University of Cambridge, Computer Laboratory. Retrieved from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-844.pdf.
    [114]
    Wei Ming Khoo, Alan Mycroft, and Ross Anderson. 2013. Rendezvous: A search engine for binary code. In Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE, 329–338.
    [115]
    Johannes Kinder. 2010. Static Analysis of X86 Executables. Ph.D. Dissertation. Technische Universität Darmstadt.
    [116]
    Johannes Kinder and Helmut Veith. 2008. Jakstab: A static analysis platform for binaries. In Proceedings of the International Conference on Computer Aided Verification. Springer, 423–427.
    [117]
    Benjamin Kollenda, Enes Göktaş, Tim Blazytko, Philipp Koppe, Robert Gawlik, Radhesh Krishnan Konoth, Cristiano Giuffrida, Herbert Bos, and Thorsten Holz. 2017. Towards automated discovery of crash-resistant primitives in binary executables. In Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 189–200.
    [118]
    Laszlo Kozma. 2008. k Nearest Neighbors algorithm (kNN). Retrieved on August 23, 2021 from http://www.lkozma.net/knn2.pdf.
    [119]
    Meir M. Lehman and Juan F. Ramil. 2001. Rules and tools for software evolution planning and management. Annals of Software Engineering 11, 1 (2001), 15–44.
    [120]
    Pierre Lestringant, Frédéric Guihéry, and Pierre-Alain Fouque. 2015. Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security. ACM, 203–214.
    [121]
    Ming Li and Zhi-Hua Zhou. 2007. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37, 6 (2007), 1088–1098.
    [122]
    Xingwei Li, Zheng Shan, Fudong Liu, Yihang Chen, and Yifan Hou. 2019. A consistently-executing graph-based approach for malware packer identification. IEEE Access 7 (2019), 51620–51629. DOI:https://doi.org/10.1109/ACCESS.2019.2910268
    [123]
    Yuping Li, Sathya Chandran Sundaramurthy, Alexandru G. Bardas, Xinming Ou, Doina Caragea, Xin Hu, and Jiyong Jang. 2015. Experimental study of fuzzy hashing in malware clustering analysis. In Proceedings of the 8th Workshop on Cyber Security Experimentation and Test.
    [124]
    Da Lin and Mark Stamp. 2011. Hunting for undetectable metamorphic viruses. Journal in Computer Virology 7, 3 (2011), 201–214.
    [125]
    Hong Lin, Dongdong Zhao, Linjun Ran, Mushuai Han, Jing Tian, Jianwen Xiang, Xian Ma, and Yingshou Zhong. 2017. Cvssa: Cross-architecture vulnerability search in firmware based on support vector machine and attributed control flow graph. In Proceedings of the 2017 International Conference on Dependable Systems and Their Applications. IEEE, 35–41.
    [126]
    Andreas Lindner, Roberto Guanciale, and Roberto Metere. 2019. TrABin: Trustworthy analyses of binaries. Science of Computer Programming 174 (2019), 72–89.
    [127]
    Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F Bissyandé. 2019. Avatar: Fixing semantic bugs with fix patterns of static analysis violations. In Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering. IEEE, 1–12.
    [128]
    Yingfan Liu, Jiangtao Cui, Zi Huang, Hui Li, and Heng Tao Shen. 2014. Sk-lsh: An efficient index structure for approximate nearest neighbor search. Proceedings of the VLDB Endowment 7, 9 (2014), 745–756.
    [129]
    Yuan Wang Liu, Jing and Yongjun Wang. 2016. Inferring phylogenetic networks of malware families from api sequences. In Proceedings of the International Conference on Cyber-enabled Distributed Computing and Knowledge Discovery. IEEE, 14–17.
    [130]
    Fan Long, Stelios Sidiroglou-Douskos, and Martin Rinard. 2014. Automatic runtime error repair and containment via recovery shepherding. In Proceedings of the ACM SIGPLAN Notices. Vol. 49. ACM, 227–238.
    [131]
    Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. 2014. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 389–400.
    [132]
    Matias Madou, Bertrand Anckaert, Bjorn De Sutter, and Koen De Bosschere. 2005. Hybrid static-dynamic attacks against software protection mechanisms. In Proceedings of the 5th ACM Workshop on Digital Rights Management. ACM, 75–82.
    [133]
    D. Mahajan, R. Patel, and V. Sanker. 2018. Word2Vec using character n-grams. Retrieved on September 20, 2021 from https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/reports/2761021.pdf.
    [134]
    Umme Ayda Mannan, Iftekhar Ahmed, Rana Abdullah M. Almurshed, Danny Dig, and Carlos Jensen. 2016. Understanding code smells in android applications. In Proceedings of the IEEE/ACM International Conference on Mobile Software Engineering and Systems. IEEE, 225–236.
    [135]
    Marion Marschalek and Claudio Guarnieri. 2015. Big game hunting: The peculiarities in nation-state malware research. Black Hat, Las Vegas, NV, USA.
    [136]
    Lorenzo Martignoni, Stephen McCamant, Pongsin Poosankam, Dawn Song, and Petros Maniatis. 2012. Path-exploration lifting: Hi-fi tests for lo-fi emulators. In Proceedings of the ACM SIGARCH Computer Architecture News. Vol. 40. ACM, 337–348.
    [137]
    Fabio Martinelli, Francesco Mercaldo, and Andrea Saracino. 2017. Bridemaid: An hybrid tool for accurate detection of android malware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 899–901.
    [138]
    Eitan Menahem, Asaf Shabtai, Lior Rokach, and Yuval Elovici. 2009. Improving malware detection by applying multi-inducer ensemble. Computational Statistics & Data Analysis 53, 4 (2009), 1483–1494.
    [139]
    Xiaozhu Meng and Barton P. Miller. 2016. Binary code is not easy. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 24–35.
    [140]
    Xiaozhu Meng, Barton P. Miller, and Kwang-Sung Jun. 2017. Identifying multiple authors in a binary program. In Proceedings of the European Symposium on Research in Computer Security. Springer, 286–304.
    [141]
    Ghita Mezzour, Kathleen M. Carley, and L. Richard Carley. 2015. An empirical study of global malware encounters. In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. 1–11.
    [142]
    Barton P. Miller, Mark D. Callaghan, Jonathan M. Cargille, Jeffrey K. Hollingsworth, R. Bruce Irvin, Karen L. Karavanic, Krishna Kunchithapadam, and Tia Newhall. 1995. The paradyn parallel performance measurement tool. Computer 28, 11 (1995), 37–46.
    [143]
    Jiang Ming, Dongpeng Xu, and Dinghao Wu. 2015. Memoized semantics-based binary diffing with application to malware lineage inference. In Proceedings of the IFIP International Information Security and Privacy Conference. Springer, 416–430.
    [144]
    Ned Moran and James T. Bennett. 2013. Supply Chain Analysis: From Quartermaster to Sunshop. Vol. 11. FireEye.
    [145]
    Lakshmanan Nataraj, Dhilung Kirat, BS Manjunath, and Giovanni Vigna. 2013. Sarvam: Search and retrieval of malware. In Proceedings of the Annual Computer Security Applications Conference (ACSAC) Workshop on Next Generation Malware Attacks and Defense (NGMAD).
    [146]
    Lina Nouh, Ashkan Rahimian, Djedjiga Mouheb, Mourad Debbabi, and Aiman Hanna. 2017. Binsign: Fingerprinting binary functions to support automated analysis of code executables. In Proceedings of the IFIP International Conference on ICT Systems Security and Privacy Protection. Springer, 341–355.
    [147]
    Lucky Onwuzurike, Enrico Mariconti, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, and Gianluca Stringhini. 2019. MaMaDroid: Detecting android malware by building markov chains of behavioral models (extended version). ACM Transactions on Privacy and Security 22, 2 (2019), 1–34.
    [148]
    Ori Or-Meir, Nir Nissim, Yuval Elovici, and Lior Rokach. 2019. Dynamic malware analysis in the modern era—a state of the art survey. ACM Computing Surveys 52, 5 (2019), 1–48.
    [149]
    Karl J. Ottenstein and Linda M. Ottenstein. 1984. The program dependence graph in a software development environment. In Proceedings of the ACM SIGPLAN Notices, Vol. 19. ACM, 177–184.
    [150]
    Pádraig O’Sullivan, Kapil Anand, Aparna Kotha, Matthew Smithson, Rajeev Barua, and Angelos D. Keromytis. 2011. Retrofitting security in COTS software with binary rewriting. In Proceedings of the Future Challenges in Security and Privacy for Academia and Industry. Springer, 154–172.
    [151]
    Sancheng Peng, Shui Yu, and Aimin Yang. 2013. Smartphone malware and its propagation modeling: A survey. IEEE Communications Surveys & Tutorials 16, 2 (2013), 925–941.
    [152]
    Roberto Perdisci, Andrea Lanzi, and Wenke Lee. 2008. Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executables. In Proceedings of the 2008 Annual Computer Security Applications Conference. IEEE, 301–310.
    [153]
    Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. 2015. Cross-architecture bug search in binary executables. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 709–724.
    [154]
    Jannik Pewny, Felix Schuster, Lukas Bernhard, Thorsten Holz, and Christian Rossow. 2014. Leveraging semantic signatures for bug search in binary programs. In Proceedings of the 30th Annual Computer Security Applications Conference. ACM, 406–415.
    [155]
    Michael Pradel and Koushik Sen. 2018. Deepbugs: A learning approach to name-based bug detection. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–25.
    [156]
    Jing Qiu, Xiaohong Su, and Peijun Ma. 2015. Library functions identification in binary code by using graph isomorphism testings. In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 261–270.
    [157]
    Jing Qiu, Xiaohong Su, and Peijun Ma. 2016. Using reduced execution flow graph to identify library functions in binary code. IEEE Transactions on Software Engineering 42, 2 (2016), 187–202.
    [158]
    Edward Raff, Richard Zak, Russell Cox, Jared Sylvester, Paul Yacci, Rebecca Ward, Anna Tracy, Mark McLean, and Charles Nicholas. 2018. An investigation of byte n-gram features for malware classification. Journal of Computer Virology and Hacking Techniques 14, 1 (2018), 1–20.
    [159]
    Ashkan Rahimian, Paria Shirani, Saed Alrabaee, Lingyu Wang, and Mourad Debbabi. 2015. BinComp: A stratified approach to compiler provenance attribution. Digital Investigation 14, 1 (2015), S146–S155.
    [160]
    K. Raman. 2012. Selecting features to classify malware. In Proceedings of the InfoSec Southwest. 49–64.
    [161]
    David A. Ramos and Dawson Engler. 2015. Under-constrained symbolic execution: Correctness checking for real code. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15). 49–64.
    [162]
    Linjun Ran, Liping Lu, Hong Lin, Mushuai Han, Dongdong Zhao, Jianwen Xiang, Haiguo Yu, and Xian Ma. 2017. An experimental study of four methods for homology analysis of firmware vulnerability. In Proceedings of the 2017 International Conference on Dependable Systems and Their Applications. IEEE, 42–50.
    [163]
    Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan Foote, David Warren, Gustavo Grieco, and David Brumley. 2014. Optimizing seed selection for fuzzing. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14). 861–875.
    [164]
    Miranda Rodriguez. 2015. All your IP are belong to us: An analysis of intellectual property rights as applied to malware. Tex. A&M L. Rev. 3 (2015), 663.
    [165]
    Nathan Rosenblum, Barton P. Miller, and Xiaojin Zhu. 2011. Recovering the toolchain provenance of binary code. In Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 100–110.
    [166]
    Nathan Rosenblum, Xiaojin Zhu, and Barton P. Miller. 2011. Who wrote this code? Identifying the authors of program binaries. In Proceedings of the European Symposium on Research in Computer Security (ESORICS 2011). Springer, 172–189.
    [167]
    Nathan E. Rosenblum, Barton P. Miller, and Xiaojin Zhu. 2010. Extracting compiler provenance from program binaries. In Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. ACM, 21–28.
    [168]
    Kevin A. Roundy and Barton P. Miller. 2010. Hybrid analysis and control of malware. In Proceedings of the Recent Advances in Intrusion Detection. Springer, 317–338.
    [169]
    Chanchal K. Roy, James R. Cordy, and Rainer Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming 74, 7 (2009), 470–495.
    [170]
    Brian Ruttenberg, Craig Miles, Lee Kellogg, Vivek Notani, Michael Howard, Charles LeDoux, Arun Lakhotia, and Avi Pfeffer. 2014. Identifying shared software components to support malware forensics. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 21–40.
    [171]
    Andreas Sæbjørnsen, Jeremiah Willcock, Thomas Panas, Daniel Quinlan, and Zhendong Su. 2009. Detecting code clones in binary executables. In Proceedings of the 18th International Symposium on Software Testing and Analysis. ACM, 117–128.
    [172]
    I. Santos, F. Brezo, X. Ugarte-Pedrero, and P. G. Bringas. 2013. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences 231 (2013), 64–82. DOI:https://doi.org/10.1016/j.ins.2011.08.020
    [173]
    Igor Santos, Felix Brezo, Javier Nieves, Yoseba K. Penya, Borja Sanz, Carlos Laorden, and Pablo G. Bringas. 2010. Idea: Opcode-sequence-based malware detection. In Proceedings of the International Symposium on Engineering Secure Software and Systems. Springer, 35–43.
    [174]
    Andrea Saracino, Daniele Sgandurra, Gianluca Dini, and Fabio Martinelli. 2016. Madam: Effective and efficient behavior-based android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing 15, 1 (2016), 83–97.
    [175]
    Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. 2003. Winnowing: Local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, 76–85.
    [176]
    Matthew G. Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J. Stolfo. 2001. Data mining methods for detection of new malicious executables. In Proceedings of the 2001 IEEE Symposium on Security and Privacy. IEEE, 38–49.
    [177]
    M. Zubair Shafiq, S. Momina Tabish, Fauzan Mirza, and Muddassar Farooq. 2009. Pe-miner: Mining structural information to detect malicious executables in realtime. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection. Springer, 121–141.
    [178]
    Farrukh Shahzad, Sohail Bhatti, Muhammad Shahzad, and Muddassar Farooq. 2011. In-execution malware detection using task structures of linux processes. In Proceedings of the 2011 IEEE International Conference on Communications. IEEE, 1–6.
    [179]
    Farrukh Shahzad and Muddassar Farooq. 2012. Elf-miner: Using structural knowledge and data mining methods to detect new (linux) malicious executables. Knowledge and Information Systems 30, 3 (2012), 589–612.
    [180]
    Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. 2015. Recognizing functions in binaries with neural networks. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15). 611–626.
    [181]
    Paria Shirani, Leo Collard, Basile L. Agba, Bernard Lebel, Mourad Debbabi, Lingyu Wang, and Aiman Hanna. 2018. BINARM: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 114–138.
    [182]
    Paria Shirani, Lingyu Wang, and Mourad Debbabi. 2017. Binshape: Scalable and robust binary library function identification using function shape. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 301–324.
    [183]
    Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2015. Firmalice: Automatic detection of authentication bypass vulnerabilities in binary firmware. In Proceedings of the Network and Distributed System Security Symposium (NDSS). Vol. 1. 1–1.
    [184]
    Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy. IEEE, 138–157.
    [185]
    Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. 2008. Bitblaze: A new approach to computer security via binary analysis. In Proceedings of the International Conference on Information Systems Security. Springer, 1–25.
    [186]
    Guillermo Suarez-Tangil, Santanu Kumar Dash, Mansour Ahmadi, Johannes Kinder, Giorgio Giacinto, and Lorenzo Cavallaro. 2017. Droidsieve: Fast and accurate classification of obfuscated android malware. In Proceedings of the 7th ACM on Conference on Data and Application Security and Privacy. ACM, 309–320.
    [187]
    Guillermo Suarez-Tangil, Juan E. Tapiador, Pedro Peris-Lopez, and Arturo Ribagorda. 2014. Evolution, detection and analysis of malware for smart devices. IEEE Communications Surveys & Tutorials 16, 2 (2014), 961–987.
    [188]
    Mingshen Sun, Xiaolei Li, John CS Lui, Richard TB Ma, and Zhenkai Liang. 2016. Monet: A user-oriented behavior-based malware variants detection system for android. IEEE Transactions on Information Forensics and Security 12, 5 (2016), 1103–1112.
    [189]
    Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788–799.
    [190]
    Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2009. Quality and efficiency in high dimensional nearest neighbor search. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM, 563–576.
    [191]
    Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2010. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems 35, 3 (2010), 1–46.
    [192]
    Cristian Ţăpuş, I-Hsin Chung, and Jeffrey K. Hollingsworth. 2002. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. IEEE, 1–11.
    [193]
    Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. Journal of the ACM 23, 1 (1976), 31–42.
    [194]
    Jonathan van den Berg and Hirohide Haga. 2018. Matching source code using abstract syntax trees in version control systems. Journal of Software Engineering and Applications 11, 6 (2018), 318.
    [195]
    Maarten Van Emmerik. 1998. Identifying library functions in executable file using patterns. In Proceedings of the 1998 Australian Software Engineering Conference. IEEE, 90–97.
    [196]
    Andrew Walenstein, Michael Venable, Matthew Hayes, Christopher Thompson, and Arun Lakhotia. 2007. Exploiting similarity between variants to defeat malware. In Proceedings of the 2007 Conference on BlackHat DC.
    [197]
    Xinran Wang, Chi-Chun Pan, Peng Liu, and Sencun Zhu. 2010. Sigfree: A signature-free buffer overflow attack blocker. IEEE Transactions on Dependable and Secure Computing 7, 1 (2010), 65–79.
    [198]
    Zheng Wang, Ken Pierce, and Scott McFarling. 2000. BMAT: A binary matching tool for stale profile propagation. The Journal of Instruction-Level Parallelism 2 (2000), 1–20.
    [199]
    Daniel Weise, Roger F. Crew, Michael Ernst, and Bjarne Steensgaard. 1994. Value dependence graphs: Representation without taxation. In Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 297–310.
    [200]
    Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. 2005. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems. Springer, 365–381.
    [201]
    Zhiwu Xu, Cheng Wen, and Shengchao Qin. 2017. Learning types for binaries. In Proceedings of the International Conference on Formal Engineering Methods. Springer, 430–446.
    [202]
    Zhiwu Xu, Cheng Wen, and Shengchao Qin. 2018. Type learning for binaries and its applications. IEEE Transactions on Reliability 63, 3 (2018), 893–912.
    [203]
    Hongfa Xue, Shaowen Sun, Guru Venkataramani, and Tian Lan. 2019. Machine learning-based analysis of program binaries: A comprehensive study. IEEE Access 7 (2019), 65889–65912. DOI:https://doi.org/10.1109/ACCESS.2019.2917668
    [204]
    Fabian Yamaguchi, Alwin Maier, Hugo Gascon, and Konrad Rieck. 2015. Automatic inference of search patterns for taint-style vulnerabilities. In Proceedings of the 2015 IEEE Symposium on Security and Privacy. IEEE, 797–812.
    [205]
    Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. CodeCMR: Cross-modal retrieval for function-level binary source code matching. Advances in Neural Information Processing Systems 33 (2020), 1–10.
    [206]
    J. Zeng, Y. Fu, K. A. Miller, Z. Lin, X. Zhang, and D. Xu. 2013. Obfuscation resilient binary code reuse through trace-oriented programming. In Proceedings of the ACM SIGSAC Conference on Computer & Communications Security. ACM, 487–498.
    [207]
    Yuan Zhang, Jiarun Dai, Xiaohan Zhang, Sirong Huang, Zhemin Yang, Min Yang, and Hao Chen. 2018. Detecting third-party libraries in android applications with high precision and recall. In Proceedings of the 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 141–152.
    [208]
    Dongdong Zhao, Hong Lin, Linjun Ran, Mushuai Han, Jing Tian, Liping Lu, Shengwu Xiong, and Jianwen Xiang. 2019. CVSkSA: Cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph. Software Quality Journal 27, 3 (2019), 1045–1068.
    [209]
    Viviane Zwanger and Felix C. Freiling. 2013. Kernel mode API spectroscopy for incident response and digital forensics. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. ACM, 3.

    Cited By

    • (2024) Classification of Malware Images Using Fine-Tunned ViT. Sakarya University Journal of Computer and Information Sciences 7:1 (22–35). DOI: 10.35377/saucis...1341082. Online publication date: 30-Apr-2024
    • (2024) Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets. ACM Computing Surveys 56:8 (1–36). DOI: 10.1145/3653973. Online publication date: 26-Mar-2024
    • (2024) Homomorphic encryption domain asymmetric fingerprinting scheme for 3D models of oblique photography. Transactions in GIS 28:4 (790–815). DOI: 10.1111/tgis.13160. Online publication date: 22-Mar-2024


      Published In

      ACM Computing Surveys  Volume 55, Issue 1
      January 2023
      860 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/3492451

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 January 2022
      Accepted: 01 September 2021
      Received: 01 November 2020
      Published in CSUR Volume 55, Issue 1


      Author Tags

      1. Binary code analysis
      2. reverse engineering
      3. software security

      Qualifiers

      • Survey
      • Refereed

      Funding Sources

      • United Arab Emirates University Start-up


      Article Metrics

      • Downloads (last 12 months): 734
      • Downloads (last 6 weeks): 60
      Reflects downloads up to 09 Aug 2024

