DOI: 10.1145/3623278.3624766
research-article

Accurate Disassembly of Complex Binaries Without Use of Compiler Metadata

Published: 07 February 2024
    Abstract

    Accurate disassembly of stripped binaries is the first step in binary analysis, instrumentation and reverse engineering. Complex instruction sets such as the x86 pose major challenges in this context because it is very difficult to distinguish between code and embedded data. To make progress, many recent approaches have either made optimistic assumptions (e.g., absence of embedded data) or relied on additional compiler-generated metadata (e.g., relocation info and/or exception handling metadata). Unfortunately, many complex binaries do contain embedded data, while lacking the additional metadata needed by these techniques. We therefore present a novel approach for accurate disassembly that uses statistical properties of data to detect code, and behavioral properties of code to flag data. We present new static analysis and data-driven probabilistic techniques that are then combined using a prioritized error correction algorithm to achieve results that are 3X to 4X more accurate than the best previous results.
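To make the code-versus-data ambiguity concrete, here is a minimal sketch of one very simple statistical signal: runs of printable ASCII bytes terminated by a NUL byte, which often indicate strings embedded in a code section. This is an illustrative assumption, not the technique presented in the paper. The function name `likely_string_runs` and the `min_len` threshold are hypothetical.

```python
import string

# Printable ASCII byte values, excluding vertical tab and form feed,
# which rarely appear in real strings.
PRINTABLE = frozenset(string.printable.encode("ascii")) - {0x0B, 0x0C}


def likely_string_runs(blob: bytes, min_len: int = 6):
    """Return (start, end) offsets of NUL-terminated printable runs.

    A run is reported only if it has at least min_len printable bytes
    and is immediately followed by a 0x00 byte. This is a heuristic
    hint that the bytes are data, not proof.
    """
    runs = []
    start = None
    for i, b in enumerate(blob):
        if b in PRINTABLE:
            if start is None:
                start = i  # a new printable run begins here
        else:
            # The run ended; keep it only if NUL-terminated and long enough.
            if start is not None and b == 0 and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# x86-64 prologue bytes (push rbp; mov rbp, rsp), a string, then ret.
blob = b"\x55\x48\x89\xe5" + b"hello world\x00" + b"\xc3"
print(likely_string_runs(blob))
```

Note that the first two prologue bytes (0x55 and 0x48) are themselves printable ASCII ("U" and "H"). The length and terminator checks exclude them here, but the overlap shows why a byte-level signal alone cannot settle whether a region is code or data.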


        Published In

        ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4
        March 2023
        430 pages
        ISBN:9798400703942
        DOI:10.1145/3623278
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Conference

        ASPLOS '23

        Acceptance Rates

        Overall Acceptance Rate 535 of 2,713 submissions, 20%
