DOI: 10.1145/2950290.2950335
research-article
Public Access

Extracting instruction semantics via symbolic execution of code generators

Published: 01 November 2016

Abstract

Binary analysis and instrumentation form the basis of many tools and frameworks for software debugging, security hardening, and monitoring. Accurate modeling of instruction semantics is paramount in this regard, as errors can lead to program crashes or, worse, to bypassed security checks. Semantic modeling is a daunting task for modern processors such as x86 and ARM, which support over a thousand instructions, many of them with complex semantics. This paper describes a new approach that automates this modeling task by leveraging the instruction-semantics knowledge already encoded in today's production compilers such as GCC and LLVM. Such an approach can greatly reduce manual effort and, more importantly, avoid the errors that manual modeling introduces. Furthermore, it is applicable to any of the numerous architectures the compiler already supports. In this paper, we develop a new symbolic execution technique to extract instruction semantics from a compiler's source code. Unlike previous applications of symbolic execution, which focused on identifying a single program path that violates a property, our approach addresses the all-paths problem: it extracts the entire input/output behavior of the code generator. We have applied it successfully to the 120K lines of C code in GCC's code generator to extract x86 instruction semantics. To demonstrate architecture neutrality, we have also applied it to AVR, a processor used in the popular Arduino platform.
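The core idea, enumerating every path through a code generator and then reading its input/output relation backwards to recover what each emitted instruction means, can be illustrated with a toy. This is a minimal sketch under stated assumptions, not the authors' implementation (which targets GCC's C code generator). It assumes a hypothetical `toy_codegen` whose inputs are finite-domain symbolic values (`op`, `src`) and whose branches only test equality, so a per-input map of still-possible values can stand in for an SMT or constraint solver. The names `Sym`, `PathExplorer`, `explore_all_paths`, and `invert` are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


class Sym:
    """A symbolic input that ranges over a finite domain of IR choices."""

    def __init__(self, name: str, domain: Sequence[str]):
        self.name = name
        self.domain = frozenset(domain)


class PathExplorer:
    """Answers branch conditions for one run, replaying a forced prefix of decisions."""

    def __init__(self, forced: List[bool]):
        self.forced = forced
        self.decisions: List[bool] = []  # choices taken at genuine two-way branches
        self.domains: Dict[str, FrozenSet[str]] = {}  # path condition: values still possible per input

    def eq(self, sym: Sym, value: str) -> bool:
        """Symbolic test `sym == value`, forking only when both outcomes are feasible."""
        dom = self.domains.get(sym.name, sym.domain)
        can_true, can_false = value in dom, bool(dom - {value})
        if can_true and can_false:
            # Replay the forced choice if one exists; otherwise explore True first.
            i = len(self.decisions)
            choice = self.forced[i] if i < len(self.forced) else True
            self.decisions.append(choice)
        else:
            # The path condition already decides this test; no fork is needed.
            choice = can_true
        self.domains[sym.name] = frozenset({value}) if choice else dom - {value}
        return choice


def toy_codegen(ex: PathExplorer, op: Sym, src: Sym) -> str:
    """Hypothetical x86-flavored code generator: IR (op, src) -> assembly template."""
    if ex.eq(op, "plus"):
        if ex.eq(src, "const1"):
            return "inc %eax"
        return "add %ebx, %eax" if ex.eq(src, "reg") else "add $imm, %eax"
    if ex.eq(op, "minus"):
        if ex.eq(src, "const1"):
            return "dec %eax"
        return "sub %ebx, %eax" if ex.eq(src, "reg") else "sub $imm, %eax"
    return "unsupported"


Path = Tuple[Dict[str, FrozenSet[str]], str]


def explore_all_paths(codegen: Callable[..., str], inputs: Tuple[Sym, ...]) -> List[Path]:
    """Re-execute depth-first until every feasible path has run exactly once."""
    results: List[Path] = []
    pending: List[List[bool]] = [[]]
    while pending:
        forced = pending.pop()
        ex = PathExplorer(forced)
        output = codegen(ex, *inputs)
        # Schedule the untaken False side of each branch first reached on this run.
        for i in range(len(forced), len(ex.decisions)):
            pending.append(ex.decisions[:i] + [False])
        results.append((ex.domains, output))
    return results


def invert(paths: List[Path]) -> Dict[str, List[Dict[str, List[str]]]]:
    """Read the code generator backwards: assembly output -> IR conditions that produce it."""
    table: Dict[str, List[Dict[str, List[str]]]] = {}
    for cond, asm in paths:
        table.setdefault(asm, []).append({k: sorted(v) for k, v in cond.items()})
    return table


if __name__ == "__main__":
    op = Sym("op", ["plus", "minus", "mult"])
    src = Sym("src", ["const1", "reg", "imm"])
    for asm, conds in invert(explore_all_paths(toy_codegen, (op, src))).items():
        print(f"{asm:16} <- {conds}")
```

Running it enumerates seven paths, one per feasible combination of branch outcomes, and the inverted table maps, for instance, `inc %eax` back to `op = plus, src = const1` and `unsupported` to `op = mult`. Re-executing from a forced prefix keeps the explorer tiny at the cost of repeating shared prefixes; conditions richer than equality tests on single inputs would call for a real constraint solver.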

Cited By

  • (2024) libLISA: Instruction Discovery and Analysis on x86-64. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (333-361). DOI: 10.1145/3689723. Online publication date: 8-Oct-2024
  • (2024) Accurate Disassembly of Complex Binaries Without Use of Compiler Metadata. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (1-18). DOI: 10.1145/3623278.3624766. Online publication date: 7-Feb-2024
  • (2020) Benchmarking the Capability of Symbolic Execution Tools with Logic Bombs. IEEE Transactions on Dependable and Secure Computing 17:6 (1243-1256). DOI: 10.1109/TDSC.2018.2866469. Online publication date: 1-Nov-2020

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2016
1156 pages
ISBN:9781450342186
DOI:10.1145/2950290
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code generators
  2. Instruction-set semantics extraction
  3. Symbolic execution

Conference

FSE'16

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 15
Reflects downloads up to 03 Oct 2024

Citations

  • (2018) Cross-Architecture Lifter Synthesis. Software Engineering and Formal Methods (155-170). DOI: 10.1007/978-3-319-92970-5_10. Online publication date: 30-May-2018
  • (2017) Function Interface Analysis: A Principled Approach for Function Recognition in COTS Binaries. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (201-212). DOI: 10.1109/DSN.2017.29. Online publication date: Jun-2017
  • (2017) A Dictionary Sequence Model to Analyze the Security of Protocol Implementations at the Source Code Level. Trusted Computing and Information Security (126-142). DOI: 10.1007/978-981-10-7080-8_11. Online publication date: 23-Nov-2017
