research-article
Open access

Optimizing Indirect Branches in Dynamic Binary Translators

Published: 05 April 2016

Abstract

Dynamic binary translation is a technology for transparently translating and modifying a program at the machine code level as it is running. A significant factor in the performance of a dynamic binary translator is its handling of indirect branches. Unlike direct branches, which have a known target at translation time, an indirect branch requires translating a source program counter address to a translated program counter address every time the branch is executed. This translation can impose a serious runtime penalty if it is not handled efficiently.
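As a rough illustration of the per-branch cost described above, the following sketch models the mapping from a source program counter (SPC) to a translated program counter (TPC) as a small open-addressing hash table. It is a toy model and not MAMBO-X64's implementation: the class name `IndirectBranchTable`, the hash function, the slot layout, and the pretend code-cache addresses are all assumptions made for the example.

```python
# Illustrative model only: addresses are plain integers and "translation"
# is simulated by handing out the next free slot in a pretend code cache.

class IndirectBranchTable:
    """Open-addressing hash table mapping SPC -> TPC (hypothetical layout)."""

    def __init__(self, capacity=16):
        # Capacity must be a power of two so the hash can be a bit mask.
        assert capacity & (capacity - 1) == 0
        self.mask = capacity - 1
        self.slots = [None] * capacity   # each slot: (spc, tpc) or None
        self.count = 0
        self.misses = 0
        self.next_tpc = 0x1000_0000      # pretend code-cache cursor

    def _index(self, spc):
        # Illustrative hash: discard low bits that are often zero for
        # aligned code, then mask down to the table size.
        return (spc >> 2) & self.mask

    def _translate(self, spc):
        """Slow path: stand-in for invoking the translator on a miss."""
        self.misses += 1
        tpc = self.next_tpc
        self.next_tpc += 0x40
        return tpc

    def lookup(self, spc):
        """Run on every indirect branch: return the TPC for `spc`."""
        i = self._index(spc)
        while True:
            entry = self.slots[i]
            if entry is None:            # miss: translate, then cache
                if self.count == self.mask:   # always keep one empty slot
                    raise RuntimeError("table full")
                tpc = self._translate(spc)
                self.slots[i] = (spc, tpc)
                self.count += 1
                return tpc
            if entry[0] == spc:          # hit: fast path
                return entry[1]
            i = (i + 1) & self.mask      # linear probing on collision
```

In this model the first `lookup(0x8000)` takes the slow path and every later call for the same SPC returns the cached TPC, which is the behaviour a translator wants on the hot path of an indirect branch.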
MAMBO-X64, a dynamic binary translator that translates 32-bit ARM (AArch32) code to 64-bit ARM (AArch64) code, uses three novel techniques to improve the performance of indirect branch translation. Together, these techniques allow MAMBO-X64 to achieve a very low performance overhead of only 10% on average compared to native execution of 32-bit programs. Hardware-assisted function returns use a software return address stack to predict the targets of function returns, making use of several novel optimizations while also exploiting hardware return address prediction. This technique has a significant impact on most benchmarks, reducing binary translation overhead compared to native execution by 40% on average and by 90% on some benchmarks. Branch table inference, an algorithm for detecting and translating branch tables, can reduce the overhead of translated code by up to 40% on some SPEC CPU2006 benchmarks. The remaining indirect branches are handled using a fast atomic hash table, which is optimized to work with multiple threads. This last technique translates indirect branches using a single shared hash table while avoiding expensive synchronization in performance-critical lookup code. This allows the performance to be on par with thread-private hash tables while having superior memory scalability.
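The idea of predicting return targets with a software return address stack can be sketched as follows. This is a simplified model under stated assumptions, not the paper's mechanism: it does not model the hardware return address prediction the abstract mentions, and the class `ReturnAddressStack`, its method names, and the pop-on-mismatch policy are hypothetical. The `fallback_lookup` callable stands in for whatever general SPC-to-TPC lookup the translator provides.

```python
class ReturnAddressStack:
    """Software return address stack predicting translated return targets."""

    def __init__(self, fallback_lookup):
        self.stack = []                  # entries: (spc_return, tpc_return)
        self.fallback_lookup = fallback_lookup   # general SPC -> TPC lookup
        self.hits = 0
        self.mispredictions = 0

    def on_call(self, spc_return, tpc_return):
        """Translated call: remember where the matching return should land."""
        self.stack.append((spc_return, tpc_return))

    def on_return(self, spc_target):
        """Translated return: predict the TPC, verify against the real SPC."""
        if self.stack:
            predicted_spc, predicted_tpc = self.stack.pop()
            if predicted_spc == spc_target:
                self.hits += 1
                return predicted_tpc
        # Empty stack or mismatch (for example after a non-local jump):
        # fall back to the general lookup. Assumption: the popped entry
        # is simply discarded on a mismatch.
        self.mispredictions += 1
        return self.fallback_lookup(spc_target)
```

A call that pushes `(0x8004, tpc)` followed by a return to `0x8004` is resolved from the stack alone; only returns that do not match the top entry pay for the general lookup.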


Published In

ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 1
April 2016
347 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2899032
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 April 2016
Accepted: 01 December 2015
Revised: 01 December 2015
Received: 01 June 2015
Published in TACO Volume 13, Issue 1


Author Tags

  1. Dynamic binary translation
  2. code cache
  3. indirect branch

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • UK EPSRC
  • Royal Society University Research Fellowship
