DOI: 10.1145/2597917.2597944

DTT: program structure-aware indirect branch optimization via direct-TPC-table in DBT system

Published: 20 May 2014

Abstract

Indirect branch handling is a major source of performance overhead in Dynamic Binary Translation (DBT) systems. Most existing solutions for indirect branches perform a run-time address translation from the Source Program Counter (SPC) of the branch target to the Translated Program Counter (TPC) on every execution of an indirect branch.
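As a rough illustration of that per-execution cost, the following sketch models a dispatcher that translates the target SPC to a TPC through a hash map each time an indirect branch runs. The class name, method names, and addresses are hypothetical and chosen only for illustration; they are not taken from the paper or from any particular DBT system.

```python
# Hypothetical model of hash-lookup dispatch for indirect branches.
# Names and addresses are illustrative assumptions, not from the paper.

class HashLookupDispatcher:
    def __init__(self):
        self.spc_to_tpc = {}  # translation map: source PC -> translated PC
        self.lookups = 0      # number of run-time translations performed

    def register_block(self, spc, tpc):
        """Record that the code at source address `spc` was translated to `tpc`."""
        self.spc_to_tpc[spc] = tpc

    def dispatch(self, target_spc):
        """Runs on every execution of an indirect branch."""
        self.lookups += 1
        return self.spc_to_tpc[target_spc]


dispatcher = HashLookupDispatcher()
dispatcher.register_block(0x8048100, 0x7F000010)
dispatcher.register_block(0x8048140, 0x7F000080)

# The guest program executes the same indirect branch three times.
targets = [dispatcher.dispatch(spc) for spc in (0x8048100, 0x8048140, 0x8048100)]
```

In this model every execution pays for one lookup, even when the same target recurs.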
This paper analyzes the program structures that cause indirect branches and finds that most branch targets are prestored in the program's memory as address tables of some kind. In other words, the target of an indirect branch is obtained not by "calculating" it but by "selecting" it from memory.
Based on this observation, we propose a program structure-aware indirect branch handling mechanism called Direct TPC Table (DTT). The DTT approach probes the target address table of an indirect branch with a passive exception-based scheme and generates a TPC table from the probed SPC address table at translation time. The translated program can then load the TPC of a branch target directly from the TPC table, which avoids an expensive SPC-to-TPC translation on every execution. In many cases, only two instructions are needed to handle an indirect branch execution.
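The table-based idea can be sketched as follows, under simplifying assumptions: the SPC address table is already known (the exception-based probing is not modeled, since the abstract gives no details of it), and every table entry has already been translated. The function names and addresses are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch of a direct TPC table; names and addresses are
# illustrative assumptions. Probing of the SPC table is not modeled.

def build_tpc_table(spc_table, spc_to_tpc):
    """Translation time: convert each entry of the SPC table once."""
    return [spc_to_tpc[spc] for spc in spc_table]


def dtt_dispatch(tpc_table, index):
    """Run time: load the TPC directly, with no SPC-to-TPC translation."""
    return tpc_table[index]


# A switch-style address table as it might sit in the guest program's memory.
spc_table = [0x8048100, 0x8048140, 0x8048180]
# The translator's map from source addresses to translated-code addresses.
spc_to_tpc = {
    0x8048100: 0x7F000010,
    0x8048140: 0x7F000080,
    0x8048180: 0x7F0000F0,
}

tpc_table = build_tpc_table(spc_table, spc_to_tpc)

# The guest program selects table entries 2, 0, and 2 at run time.
taken = [dtt_dispatch(tpc_table, i) for i in (2, 0, 2)]
```

The translation work happens once per table entry in `build_tpc_table`; each later branch execution is a single indexed load.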
We implemented the DTT mechanism on a public x86 DBT system. Experiments show that DTT improves system performance by 19.0% over hash lookup on a set of indirect-branch-intensive benchmarks. Furthermore, DTT does not depend on the underlying architecture or on special hardware, so it can be deployed on various platforms. DTT can also be combined with the other optimization techniques of different DBT systems to further improve performance.

      Published In

      CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers
      May 2014
      305 pages
      ISBN:9781450328708
      DOI:10.1145/2597917

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. dynamic translation
      2. indirect branch
      3. program structure

      Qualifiers

      • Research-article

      Conference

      CF'14: Computing Frontiers Conference
      May 20 - 22, 2014
      Cagliari, Italy

      Acceptance Rates

      CF '14 Paper Acceptance Rate 28 of 62 submissions, 45%;
      Overall Acceptance Rate 273 of 785 submissions, 35%

      Article Metrics

      • Downloads (Last 12 months): 9
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 04 Oct 2024

      Cited By

      • (2024) SPC-Indexed Indirect Branch Hardware Cache Redirecting Technique in Binary Translation. Journal of Circuits, Systems and Computers. DOI: 10.1142/S0218126624502426. Online publication date: 28-Mar-2024
      • (2024) BTBench: A Benchmark for Comprehensive Binary Translation Performance Evaluation. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 36-47. DOI: 10.1109/ISPASS61541.2024.00014. Online publication date: 5-May-2024
      • (2024) A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 423-434. DOI: 10.1109/CGO57630.2024.10444850. Online publication date: 2-Mar-2024
      • (2016) Short-circuit dispatch. ACM SIGARCH Computer Architecture News 44:3, 291-303. DOI: 10.1145/3007787.3001168. Online publication date: 18-Jun-2016
      • (2016) Optimizing Indirect Branches in Dynamic Binary Translators. ACM Transactions on Architecture and Code Optimization 13:1, 1-25. DOI: 10.1145/2866573. Online publication date: 5-Apr-2016
      • (2016) Short-circuit dispatch. Proceedings of the 43rd International Symposium on Computer Architecture, 291-303. DOI: 10.1109/ISCA.2016.34. Online publication date: 18-Jun-2016
