DOI: 10.1145/3052973.3052995
research-article

Extracting Conditional Formulas for Cross-Platform Bug Search

Published: 02 April 2017

Abstract

With the recent increase in security breaches in embedded systems and IoT devices, it is becoming increasingly important to search for vulnerabilities directly in binary executables in a cross-platform setting. However, very little has been explored in this domain. Existing efforts are prone to producing considerable false positives, and their results cannot provide explainable evidence for human analysts to eliminate these false positives. In this paper, we propose to extract conditional formulas as higher-level semantic features from the raw binary code to conduct the code search. A conditional formula explicitly captures two cardinal factors of a bug: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search on conditional formulas produces significantly higher accuracy and provides meaningful evidence for human analysts to further examine the search results. We have implemented a prototype, XMATCH, and evaluated it using well-known software, including OpenSSL and BusyBox. Experimental results show that XMATCH outperforms the existing bug search techniques in terms of accuracy. Moreover, in an evaluation on 5 recent vulnerabilities, XMATCH provides clear evidence for human analysts to determine whether a matched candidate is indeed vulnerable or has been patched.
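
The abstract does not say how XMATCH represents or compares conditional formulas, so the following is only a toy sketch of the general idea: an action (a data dependency) paired with the condition checks that guard it, with a comparison that reports which checks a candidate lacks. The class `ConditionalFormula`, the function `missing_checks`, and the string encoding of actions and conditions are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalFormula:
    """Toy stand-in (hypothetical): an action guarded by a set of conditions."""
    action: str            # a data dependency, e.g. "memcpy(dst, src, len)"
    conditions: frozenset  # checks that must hold before the action executes


def missing_checks(reference: ConditionalFormula,
                   candidate: ConditionalFormula) -> frozenset:
    """Return the reference's guarding conditions that the candidate lacks."""
    if reference.action != candidate.action:
        raise ValueError("formulas describe different actions")
    return reference.conditions - candidate.conditions


# Assumed example: a patched function checks the length before copying,
# while the candidate performs the same copy without any check.
patched = ConditionalFormula("memcpy(dst, src, len)",
                             frozenset({"len <= sizeof(dst)"}))
candidate = ConditionalFormula("memcpy(dst, src, len)", frozenset())

print(missing_checks(patched, candidate))
```

In this simplified view, a non-empty result is the kind of evidence an analyst could inspect: the same data dependency is present, but a guarding check is missing.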

Information & Contributors

Information

Published In

ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security
April 2017
952 pages
ISBN:9781450349444
DOI:10.1145/3052973
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binary analysis
  2. firmware security
  3. vulnerability search

Qualifiers

  • Research-article

Funding Sources

  • DARPA Grant
  • National Science Foundation Grant
  • Air Force Research Lab Grant

Conference

ASIA CCS '17
Acceptance Rates

ASIA CCS '17 Paper Acceptance Rate 67 of 359 submissions, 19%;
Overall Acceptance Rate 418 of 2,322 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 37
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 04 Oct 2024

Citations

Cited By

  • (2024) A Survey of Binary Code Similarity Detection Techniques. Electronics 13:9 (1715). DOI: 10.3390/electronics13091715. Online publication date: 29-Apr-2024
  • (2024) Semantic aware-based instruction embedding for binary code similarity detection. PLOS ONE 19:6 (e0305299). DOI: 10.1371/journal.pone.0305299. Online publication date: 11-Jun-2024
  • (2024) CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (149-161). DOI: 10.1145/3650212.3652117. Online publication date: 11-Sep-2024
  • (2024) HAformer: Semantic fusion of hex machine code and assembly code for cross-architecture binary vulnerability detection. Computers & Security 145 (104029). DOI: 10.1016/j.cose.2024.104029. Online publication date: Oct-2024
  • (2024) Optir-SBERT: Cross-Architecture Binary Code Similarity Detection Based on Optimized LLVM IR. Digital Forensics and Cyber Crime (95-113). DOI: 10.1007/978-3-031-56583-0_7. Online publication date: 3-Apr-2024
  • (2023) IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations. Sensors 23:18 (7789). DOI: 10.3390/s23187789. Online publication date: 11-Sep-2023
  • (2023) Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection. Electronics 12:7 (1722). DOI: 10.3390/electronics12071722. Online publication date: 4-Apr-2023
  • (2023) BlockMatch: A Fine-Grained Binary Code Similarity Detection Approach Using Contrastive Learning for Basic Block Matching. Applied Sciences 13:23 (12751). DOI: 10.3390/app132312751. Online publication date: 28-Nov-2023
  • (2023) LibAM: An Area Matching Framework for Detecting Third-Party Libraries in Binaries. ACM Transactions on Software Engineering and Methodology 33:2 (1-35). DOI: 10.1145/3625294. Online publication date: 23-Dec-2023
  • (2023) Discovering Causes of Traffic Congestion via Deep Transfer Clustering. ACM Transactions on Intelligent Systems and Technology 14:5 (1-24). DOI: 10.1145/3604810. Online publication date: 11-Aug-2023
