Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

Similarity of binaries through re-optimization

Published: 14 June 2017 Publication History

Abstract

We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity, is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.
We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures to comparable fragments and transforming them to a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework built by analyzing samples collected "in the wild" to generate a global context that quantifies the significance of each pair of fragments, and uses it to lift pairwise fragment equivalence to whole procedure similarity.
We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.

References

[1]
Esh - statistical similarity of binaries. http://binsim.com.
[2]
gcc optimizations options. https://gcc.gnu.org/onlinedocs/ gcc/Optimize-Options.html.
[3]
Llvm’s analysis and transform passes. http://llvm.org/ docs/Passes.html.
[4]
Mcsema. https://github.com/trailofbits/mcsema.
[5]
Shellshock vulnerability cve information. https://cve. mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-6271.
[6]
Yard - yet another roc drawer. http://github.com/ ntamas/yard.
[7]
zynamics bindi ff. http://www.zynamics.com/bindiff. html.
[8]
zynamics bindi ff manual - understanding bindiff. http: //www.zynamics.com/bindiff/manual/index.html# chapUnderstanding.
[9]
D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz. Bap: A binary analysis platform. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV’11, pages 463–469, Berlin, Heidelberg, 2011. Springer-Verlag.
[10]
Y. David, N. Partush, and E. Yahav. Statistical similarity of binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’16, 2016.
[11]
Y. David and E. Yahav. Tracelet-based code search in executables. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 349–360, New York, NY, USA, 2014. ACM.
[12]
J. Duan and J. Regehr. Correctness proofs for device drivers in embedded systems. In 5th International Workshop on Systems Software Verification, SSV’10, Vancouver, BC, Canada, October 6-7, 2010, 2010.
[13]
M. Egele, M. Woo, P. Chapman, and D. Brumley. Blanket execution: Dynamic similarity testing for program binaries and components. In Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20-22, 2014., pages 303–317, 2014.
[14]
S. Eschweiler, K. Yakdan, and E. Gerhards-Padilla. discovre: E fficient cross-architecture identification of bugs in binary code. In 23nd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016, 2016.
[15]
Q. Feng, R. Zhou, C. Xu, Y. Cheng, B. Testa, and H. Yin. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24- 28, 2016, pages 480–491, 2016.
[16]
C. Hawblitzel, S. K. Lahiri, K. Pawar, H. Hashmi, S. Gokbulut, L. Fernando, D. Detlefs, and S. Wadsworth. Will you still compile me tomorrow? static cross-version compiler validation. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC /FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 191–201, 2013.
[17]
E. R. Jacobson, N. Rosenblum, and B. P. Miller. Labeling library functions in stripped binaries. In Proceedings of the 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools, PASTE ’11, pages 1–8, New York, NY, USA, 2011. ACM.
[18]
J. Jang, D. Brumley, and S. Venkataraman. BitShred : Feature Hashing Malware for Scalable Triage and Semantic Analysis. Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 309–320, 2011.
[19]
W. M. Khoo, A. Mycroft, and R. Anderson. Rendezvous: A search engine for binary code. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 329–338, Piscataway, NJ, USA, 2013. IEEE Press.
[20]
S. K. Lahiri, C. Hawblitzel, M. Kawaguchi, and H. Rebêlo. Symdi ff: A language-agnostic semantic diff tool for imperative programs. In CAV, pages 712–717, 2012.
[21]
K. R. M. Leino. This is boogie 2. http://microsoft.com/ en-us/research/publication/this-is-boogie-2-2/.
[22]
N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI, pages 89–100, 2007.
[23]
B. H. Ng and A. Prakash. Expose: Discovering potential binary code re-use. In Computer Software and Applications Conference (COMPSAC), 2013 IEEE 37th Annual, pages 492–501, July 2013.
[24]
N. Partush and E. Yahav. Abstract semantic di fferencing for numerical programs. In Static Analysis: 20th International Symposium, SAS 2013, Seattle, WA, USA, June 20-22, 2013. Proceedings, pages 238–258. Springer, 2013.
[25]
N. Partush and E. Yahav. Abstract semantic di fferencing via speculative correlation. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, part of SPLASH 2014, Portland, OR, USA, October 20-24, 2014, pages 811–828, 2014.
[26]
S. Person, M. B. Dwyer, S. G. Elbaum, and C. S. Pasareanu. Di fferential symbolic execution. In SIGSOFT FSE, pages 226–237, 2008.
[27]
J. Pewny, B. Garmany, R. Gawlik, C. Rossow, and T. Holz. Cross-architecture bug search in binary executables. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 709–724, 2015.
[28]
J. Pewny, F. Schuster, L. Bernhard, T. Holz, and C. Rossow. Leveraging semantic signatures for bug search in binary programs. In Proceedings of the 30th Annual Computer Security Applications Conference, ACSAC ’14, pages 406–415, New York, NY, USA, 2014. ACM.
[29]
D. A. Ramos and D. R. Engler. Practical, low-e ffort equivalence verification of real code. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV’11, pages 669–685, Berlin, Heidelberg, 2011. Springer-Verlag.
[30]
N. Rosenblum, B. P. Miller, and X. Zhu. Recovering the Toolchain Provenance of Binary Code Categories and Subject Descriptors. 20th International Symposium on Software Testing and Analysis (ISSTA), page 11, 2011.
[31]
R. Sharma, E. Schkufza, B. Churchill, and A. Aiken. Datadriven equivalence checking. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA ’13, pages 391–406, New York, NY, USA, 2013. ACM.
[32]
Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna. Sok: (state of) the art of war: O ffensive techniques in binary analysis. 2016.
[33]
R. Smith and S. Horwitz. Detecting and measuring similarity in code clones. In Proceedings of the International Workshop on Software Clones (IWSC), 2009.
[34]
S. J. Swamidass, C. Azencott, K. Daily, and P. Baldi. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics, 26(10):1348–1356, 2010.
[35]
M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, San Diego, California, USA, March 9-12, 1981., pages 439–449, 1981.

Cited By

View all
  • (2024)A Survey of Binary Code Similarity Detection TechniquesElectronics10.3390/electronics1309171513:9(1715)Online publication date: 29-Apr-2024
  • (2024)MalFusion: Simple String Manipulations Confuse Malware Detection2024 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking62109.2024.10619782(113-121)Online publication date: 3-Jun-2024
  • (2023)OPTango: Multi-central Representation Learning against Innumerable Compiler Optimization for Binary Diffing2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE59848.2023.00013(774-785)Online publication date: 9-Oct-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 52, Issue 6
PLDI '17
June 2017
708 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3140587
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2017
    708 pages
    ISBN:9781450349888
    DOI:10.1145/3062341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017
Published in SIGPLAN Volume 52, Issue 6

Check for updates

Author Tags

  1. binary code search
  2. static binary analysis
  3. statistical similarity

Qualifiers

  • Article

Funding Sources

  • European Union's Seventh Framework Programme (FP7)

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)98
  • Downloads (Last 6 weeks)9
Reflects downloads up to 04 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Survey of Binary Code Similarity Detection TechniquesElectronics10.3390/electronics1309171513:9(1715)Online publication date: 29-Apr-2024
  • (2024)MalFusion: Simple String Manipulations Confuse Malware Detection2024 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking62109.2024.10619782(113-121)Online publication date: 3-Jun-2024
  • (2023)OPTango: Multi-central Representation Learning against Innumerable Compiler Optimization for Binary Diffing2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE59848.2023.00013(774-785)Online publication date: 9-Oct-2023
  • (2023)Binary Code Representation With Well-Balanced Instruction NormalizationIEEE Access10.1109/ACCESS.2023.325948111(29183-29198)Online publication date: 2023
  • (2023)Multi-semantic feature fusion attention network for binary code similarity detectionScientific Reports10.1038/s41598-023-31280-w13:1Online publication date: 12-Mar-2023
  • (2022)Function Representations for Binary SimilarityIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2021.305185219:4(2259-2273)Online publication date: 1-Jul-2022
  • (2021)Implementing a high-efficiency similarity analysis approach for firmware codePLOS ONE10.1371/journal.pone.024509816:1(e0245098)Online publication date: 12-Jan-2021
  • (2021)A Semantics-Based Hybrid Approach on Binary Code Similarity ComparisonIEEE Transactions on Software Engineering10.1109/TSE.2019.291832647:6(1241-1258)Online publication date: 1-Jun-2021
  • (2021)Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark ModelProceedings of the 43rd International Conference on Software Engineering10.1109/ICSE43902.2021.00084(873-884)Online publication date: 22-May-2021
  • (2020)Binary Similarity Analysis for Vulnerability Detection2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC48688.2020.0-110(1121-1122)Online publication date: Jul-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media