research-article

Tracelet-based code search in executables

Published: 09 June 2014

Abstract

We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
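
The matching pipeline summarized above can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions. A function is given as a control-flow graph: a dict mapping basic-block names to instruction strings, plus a dict of successor lists. A k-tracelet is taken to be a simple path of k consecutive blocks. Two tracelets are scored by a plain global alignment over their instructions. The paper's constraint-based rewriting engine is replaced by naive register renaming in order of first use. Function similarity is the fraction of the query's tracelets that some target tracelet matches above a threshold. The helper names (`tracelets`, `normalize`, `align`, `tracelet_similarity`, `function_similarity`), the scores, and the parameters `k` and `beta` are all hypothetical.

```python
import re

# Hypothetical simplification: only the eight 32-bit x86 general-purpose registers.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
GAP = -1  # hypothetical gap penalty for the alignment


def tracelets(blocks, succs, k=3):
    """Return the concatenated instructions of every simple path of exactly k blocks."""
    found = []

    def walk(path):
        if len(path) == k:
            found.append([ins for b in path for ins in blocks[b]])
            return
        for nxt in succs.get(path[-1], []):
            if nxt not in path:  # ignore back edges that would revisit a block
                walk(path + [nxt])

    for start in blocks:
        walk([start])
    return found


def normalize(tracelet):
    """Rename registers to r0, r1, ... in order of first use within the tracelet.
    A crude stand-in for the paper's constraint-based rewriting engine."""
    names = {}
    out = []
    for ins in tracelet:
        parts = re.split(r"(\W+)", ins)  # keep separators so the text can be rejoined
        parts = [names.setdefault(p, f"r{len(names)}") if p in REGISTERS else p
                 for p in parts]
        out.append("".join(parts))
    return out


def ins_score(a, b):
    """+2 for identical instructions, +1 for the same mnemonic, -1 otherwise."""
    if a == b:
        return 2
    return 1 if a.split()[0] == b.split()[0] else -1


def align(t1, t2):
    """Global (Needleman-Wunsch style) alignment score of two instruction lists."""
    n, m = len(t1), len(t2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + ins_score(t1[i - 1], t2[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    return dp[n][m]


def tracelet_similarity(t1, t2):
    """Alignment score divided by t1's self-alignment score (2 per instruction)."""
    if not t1:
        return 0.0
    return align(t1, t2) / (2 * len(t1))


def function_similarity(query, target, k=3, beta=0.8):
    """Fraction of the query's tracelets matched by at least one target tracelet."""
    q = [normalize(t) for t in tracelets(*query, k)]
    tgt = [normalize(t) for t in tracelets(*target, k)]
    if not q:
        return 0.0
    hits = sum(1 for a in q if any(tracelet_similarity(a, b) >= beta for b in tgt))
    return hits / len(q)


query = ({"A": ["push ebp", "mov ebp, esp"],
          "B": ["mov eax, [ebp+8]", "test eax, eax"],
          "C": ["add eax, 1", "pop ebp", "ret"]},
         {"A": ["B"], "B": ["C"]})
renamed = ({"A": ["push ebp", "mov ebp, esp"],
            "B": ["mov ecx, [ebp+8]", "test ecx, ecx"],
            "C": ["add ecx, 2", "pop ebp", "ret"]},
           {"A": ["B"], "B": ["C"]})
print(function_similarity(query, renamed))  # 1.0
```

Renaming registers by first use lets the `eax`/`ecx` variant match despite a different register allocation, and the alignment tolerates the changed constant. Unlike the rewriting engine described in the abstract, this stand-in does not match memory addresses, and an inserted instruction that touches a register earlier shifts every later name.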

Information

Published In

ACM SIGPLAN Notices, Volume 49, Issue 6
PLDI '14
June 2014
598 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2666356
Editor: Andy Gill

  • ACM Conferences
    PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2014
    619 pages
    ISBN:9781450327848
    DOI:10.1145/2594291
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2014
Published in SIGPLAN Volume 49, Issue 6

Author Tags

  1. static binary analysis
  2. x86
  3. x86-64

Article Metrics

  • Downloads (last 12 months): 68
  • Downloads (last 6 weeks): 7
Reflects downloads up to 01 Jan 2025.

Cited By

  • (2024) Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN. Chinese Journal of Electronics 33:1 (128-138). DOI: 10.23919/cje.2022.00.228. Online publication date: Jan-2024.
  • (2024) Vulnerabilities and Security Patches Detection in OSS: A Survey. ACM Computing Surveys 57:1 (1-37). DOI: 10.1145/3694782. Online publication date: 9-Sep-2024.
  • (2024) CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques. Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (143-154). DOI: 10.1145/3652032.3657572. Online publication date: 20-Jun-2024.
  • (2024) CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (149-161). DOI: 10.1145/3650212.3652117. Online publication date: 11-Sep-2024.
  • (2024) SICode: Embedding-Based Subgraph Isomorphism Identification for Bug Detection. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (304-315). DOI: 10.1145/3643916.3646556. Online publication date: 15-Apr-2024.
  • (2024) Dynamic Neural Control Flow Execution: an Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (1215-1225). DOI: 10.1145/3627673.3679726. Online publication date: 21-Oct-2024.
  • (2024) Cross-Inlining Binary Function Similarity Detection. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639080. Online publication date: 20-May-2024.
  • (2024) BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection. IEEE Transactions on Software Engineering 50:10 (2485-2497). DOI: 10.1109/TSE.2024.3411072. Online publication date: Oct-2024.
  • (2024) BinCodex: A comprehensive and multi-level dataset for evaluating binary code similarity detection techniques. BenchCouncil Transactions on Benchmarks, Standards and Evaluations 4:2 (100163). DOI: 10.1016/j.tbench.2024.100163. Online publication date: Jun-2024.
  • (2024) UniBin: Assembly Semantic-enhanced Binary Vulnerability Detection without Disassembly. Information Sciences (121605). DOI: 10.1016/j.ins.2024.121605. Online publication date: Oct-2024.