research-article
Open access

Model-assisted machine-code synthesis

Published: 12 October 2017

Abstract

Binary rewriters are tools that modify the functionality of binaries for which source code is unavailable. They serve a variety of purposes, including optimization, hardening, and extraction of executable components. To rewrite a binary based on semantic criteria, an essential primitive is a machine-code synthesizer: a tool that synthesizes an instruction sequence from a specification of the desired behavior, often given as a formula in quantifier-free bit-vector logic (QFBV). However, state-of-the-art machine-code synthesizers such as McSynth++ employ naive search strategies: McSynth++ merely enumerates candidates of increasing length without performing any form of prioritization. This inefficiency is compounded by the huge number of unique instruction schemas in instruction sets (e.g., around 43,000 in Intel's IA-32) and by the exponential cost inherent in enumeration. The result is slow synthesis: even for relatively small specifications, McSynth++ might take several minutes or a few hours to find an implementation.
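To give a feel for the exponential cost of enumeration, the following back-of-the-envelope sketch counts the candidate sequences of length at most k over a set of instruction schemas. The function name `candidates_up_to` is hypothetical, and treating each schema as a single candidate is a simplifying assumption; it does not model any pruning that McSynth++ itself performs.

```python
def candidates_up_to(num_schemas: int, max_len: int) -> int:
    """Count the sequences of length 1..max_len over num_schemas schemas.

    Assumption: every schema may appear at every position, so there are
    num_schemas ** k sequences of length exactly k.
    """
    return sum(num_schemas ** k for k in range(1, max_len + 1))


if __name__ == "__main__":
    # "around 43,000" is the IA-32 schema count quoted in the abstract.
    for k in range(1, 4):
        print(k, candidates_up_to(43_000, k))
```

Under these assumptions, each additional instruction multiplies the number of candidates by roughly the size of the schema set, which is why unprioritized enumeration becomes slow even for short sequences.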
In this paper, we describe how we use machine learning to make the search in McSynth++ smarter and potentially faster. We converted the linear search in McSynth++ into a best-first search over the space of instruction sequences. The cost heuristic for the best-first search comes from two models, used together and built from a corpus of 〈QFBV-formula, instruction-sequence〉 pairs: (i) a language model that favors useful instruction sequences, and (ii) a regression model that correlates features of instruction sequences with features of QFBV formulas and favors instruction sequences that are more likely to implement the input formula. Our experiments for IA-32 showed that our model-assisted synthesizer enables synthesis of code for 6 out of 50 formulas on which McSynth++ times out, speeding up synthesis by at least 549X, and that it speeds up synthesis for the remaining formulas by 4.55X.
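The idea of a best-first search guided by two combined models can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the four-instruction 8-bit "instruction set", the bigram table, the `predicted_ops` set standing in for the regression model, and all function names are hypothetical, and an exhaustive check over 8-bit inputs replaces a QFBV equivalence check.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Set, Tuple

Seq = Tuple[str, ...]

# Hypothetical toy instruction set: each instruction maps an 8-bit value to an 8-bit value.
INSTRUCTIONS: Dict[str, Callable[[int], int]] = {
    "inc": lambda x: (x + 1) & 0xFF,
    "dec": lambda x: (x - 1) & 0xFF,
    "shl": lambda x: (x << 1) & 0xFF,
    "not": lambda x: ~x & 0xFF,
}


def run(seq: Seq, x: int) -> int:
    """Execute the instruction sequence on input x."""
    for name in seq:
        x = INSTRUCTIONS[name](x)
    return x


def implements(seq: Seq, spec: Callable[[int], int]) -> bool:
    """Stand-in for an equivalence check: compare on all 256 inputs."""
    return all(run(seq, x) == spec(x) for x in range(256))


def lm_cost(seq: Seq, bigram_logprob: Dict[Tuple[str, str], float],
            default: float = -5.0) -> float:
    """Negative log-probability of seq under a toy bigram language model."""
    total, prev = 0.0, "<s>"
    for name in seq:
        total -= bigram_logprob.get((prev, name), default)
        prev = name
    return total


def feature_cost(seq: Seq, predicted_ops: Set[str]) -> float:
    """Stand-in for the regression model: penalize unpredicted instructions."""
    return float(sum(1 for name in seq if name not in predicted_ops))


def best_first_synthesize(spec: Callable[[int], int],
                          cost: Callable[[Seq], float],
                          max_len: int = 4,
                          max_expansions: int = 10_000) -> Optional[List[str]]:
    """Pop the cheapest candidate first; extend it by one instruction if it fails."""
    tie = itertools.count()  # tie-breaker so the heap never compares sequences
    frontier: List[Tuple[float, int, Seq]] = [(0.0, next(tie), ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, seq = heapq.heappop(frontier)
        if seq and implements(seq, spec):
            return list(seq)
        if len(seq) < max_len:
            for name in INSTRUCTIONS:
                child = seq + (name,)
                heapq.heappush(frontier, (cost(child), next(tie), child))
    return None


# Example: synthesize code for the 8-bit specification f(x) = 2*x + 2.
def example_spec(x: int) -> int:
    return (2 * x + 2) & 0xFF


EXAMPLE_BIGRAMS = {("<s>", "inc"): -0.5, ("inc", "shl"): -0.5}  # invented values


def example_cost(seq: Seq) -> float:
    # The two model scores are simply added here; the weighting is an assumption.
    return lm_cost(seq, EXAMPLE_BIGRAMS) + feature_cost(seq, {"inc", "shl"})


if __name__ == "__main__":
    print(best_first_synthesize(example_spec, example_cost))
```

Passing `cost=len` instead of `example_cost` makes the same loop visit candidates in order of increasing length, which mirrors the unprioritized enumeration that the best-first search replaces.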

Cited By

  • (2023) Towards Porting Operating Systems with Program Synthesis. ACM Transactions on Programming Languages and Systems 45:1 (1-70). DOI: 10.1145/3563943. Online publication date: 3-Mar-2023
  • (2021) Assuage: Assembly Synthesis Using A Guided Exploration. The 34th Annual ACM Symposium on User Interface Software and Technology (134-148). DOI: 10.1145/3472749.3474740. Online publication date: 10-Oct-2021

Published In

Proceedings of the ACM on Programming Languages, Volume 1, Issue OOPSLA
October 2017
1786 pages
EISSN: 2475-1421
DOI: 10.1145/3152284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 October 2017
Published in PACMPL Volume 1, Issue OOPSLA

Author Tags

  1. IA-32 instruction set
  2. best-first search
  3. machine learning
  4. machine-code synthesis
  5. n-gram language model
  6. regression model

Qualifiers

  • Research-article
