Extracting instruction semantics via symbolic execution of code generators

Authors: Niranjan Hasabnis, R. Sekar

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Pages 301 - 313

https://doi.org/10.1145/2950290.2950335

Published: 01 November 2016

Abstract

Binary analysis and instrumentation form the basis of many tools and frameworks for software debugging, security hardening, and monitoring. Accurate modeling of instruction semantics is paramount in this regard, as errors can lead to program crashes, or worse, bypassing of security checks. Semantic modeling is a daunting task for modern processors such as x86 and ARM that support over a thousand instructions, many of them with complex semantics. This paper describes a new approach to automate this semantic modeling task. Our approach leverages the knowledge of instruction semantics that is already encoded in today's production compilers such as GCC and LLVM. Such an approach can greatly reduce manual effort, and more importantly, avoid errors introduced by manual modeling. Furthermore, it is applicable to any of the numerous architectures already supported by the compiler. In this paper, we develop a new symbolic execution technique to extract instruction semantics from a compiler's source code. Unlike previous applications of symbolic execution that were focused on identifying a single program path that violates a property, our approach addresses the all-paths problem, extracting the entire input/output behavior of the code generator. We have applied it successfully to the 120K lines of C code used in GCC's code generator to extract x86 instruction semantics. To demonstrate architecture-neutrality, we have also applied it to AVR, a processor used in the popular Arduino platform.
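To make the all-paths idea concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a hypothetical three-instruction code generator (`toy_codegen`) and replaces true symbolic execution with a simpler stand-in: replay-based enumeration of discrete branch decisions, with no constraint solver. All names (`SymbolicIR`, `toy_codegen`, `all_paths`, `semantics_by_instruction`) and the assembly templates are illustrative assumptions. The sketch shows how exploring every path through a code generator yields a table from each emitted instruction to the IR conditions that produce it.

```python
class SymbolicIR:
    """Stand-in for a symbolic IR input: every query about it is a branch point."""

    def __init__(self, forced):
        self.forced = list(forced)  # answers replayed from an earlier run
        self.trace = []             # (question, answer, choices) seen on this path

    def ask(self, question, choices):
        idx = len(self.trace)
        # Replay a forced answer if one exists; otherwise take the first choice.
        answer = self.forced[idx] if idx < len(self.forced) else choices[0]
        self.trace.append((question, answer, choices))
        return answer


def toy_codegen(ir):
    """Hypothetical code generator: maps an IR shape to one assembly template."""
    op = ir.ask("op", ["plus", "minus"])
    src = ir.ask("src_kind", ["reg", "const"])
    if op == "plus":
        if src == "const" and ir.ask("src_is_one", [True, False]):
            return "inc dst"
        return "add src, dst"
    return "sub src, dst"


def all_paths(codegen):
    """Enumerate every path through codegen as (constraints, output) pairs."""
    results = []
    worklist = [[]]  # each entry is a prefix of forced answers
    while worklist:
        forced = worklist.pop()
        ir = SymbolicIR(forced)
        asm = codegen(ir)
        results.append((tuple((q, a) for q, a, _ in ir.trace), asm))
        # Fork on every decision made beyond the forced prefix.
        for i in range(len(forced), len(ir.trace)):
            _, _, choices = ir.trace[i]
            prefix = [a for _, a, _ in ir.trace[:i]]
            for alt in choices[1:]:
                worklist.append(prefix + [alt])
    return results


def semantics_by_instruction(paths):
    """Invert the path table: assembly template -> IR conditions that emit it."""
    table = {}
    for constraints, asm in paths:
        table.setdefault(asm, []).append(dict(constraints))
    return table


if __name__ == "__main__":
    for asm, conds in semantics_by_instruction(all_paths(toy_codegen)).items():
        print(asm, conds)
```

In this toy setting the inverted table plays the role of the extracted semantics: it records, for each instruction template, which IR shapes the generator translates into it. A real code generator branches on arbitrary C expressions rather than a fixed menu of choices, which is where a symbolic executor and a constraint solver would be needed.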


Index Terms

Extracting instruction semantics via symbolic execution of code generators
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
    2. Software post-development issues
      1. Software reverse engineering
  2. Software notations and tools
    1. Compilers
      1. Retargetable compilers
      2. Source code generation
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

November 2016

1156 pages

ISBN:9781450342186

DOI:10.1145/2950290

General Chair: Thomas Zimmermann (Microsoft Research, USA)

Program Chairs: Jane Cleland-Huang (University of Notre Dame, USA), Zhendong Su (University of California at Davis, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

FSE'16

Sponsor: SIGSOFT

FSE'16: 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering

November 13 - 18, 2016

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Article Metrics

Total Citations: 5
Total Downloads: 722
Downloads (Last 12 months): 130
Downloads (Last 6 weeks): 15

Reflects downloads up to 03 Oct 2024


Cited By

Craaijo J, Verbeek F, Ravindran B (2024). libLISA: Instruction Discovery and Analysis on x86-64. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (333-361). DOI: 10.1145/3689723. Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689723
Priyadarshan S, Nguyen H, Sekar R, Aamodt T, Swift M, Jerger N (2024). Accurate Disassembly of Complex Binaries Without Use of Compiler Metadata. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (1-18). DOI: 10.1145/3623278.3624766. Online publication date: 7-Feb-2024
https://doi.org/10.1145/3623278.3624766
Xu H, Zhao Z, Zhou Y, Lyu M (2020). Benchmarking the Capability of Symbolic Execution Tools with Logic Bombs. IEEE Transactions on Dependable and Secure Computing 17:6 (1243-1256). DOI: 10.1109/TDSC.2018.2866469. Online publication date: 1-Nov-2020
https://doi.org/10.1109/TDSC.2018.2866469
van Tonder R, Le Goues C (2018). Cross-Architecture Lifter Synthesis. Software Engineering and Formal Methods (155-170). DOI: 10.1007/978-3-319-92970-5_10. Online publication date: 30-May-2018
https://doi.org/10.1007/978-3-319-92970-5_10
Qiao R, Sekar R (2017). Function Interface Analysis: A Principled Approach for Function Recognition in COTS Binaries. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (201-212). DOI: 10.1109/DSN.2017.29. Online publication date: Jun-2017
https://doi.org/10.1109/DSN.2017.29
Wu F, Zhang H (2017). A Dictionary Sequence Model to Analyze the Security of Protocol Implementations at the Source Code Level. Trusted Computing and Information Security (126-142). DOI: 10.1007/978-981-10-7080-8_11. Online publication date: 23-Nov-2017
https://doi.org/10.1007/978-981-10-7080-8_11
