research-article

TAO: two-level atomicity for dynamic binary optimizations

Authors:

Mauricio Breternitz, Jr.,

Esfir Natanzon,

Roni Rosner

CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Pages 12 - 21

https://doi.org/10.1145/1772954.1772959

Published: 24 April 2010

Abstract

Dynamic binary translation is a key component of Hardware/Software (HW/SW) co-design, which is an enabling technology for processor microarchitecture innovation. There are two well-known dynamic binary optimization techniques based on atomic execution support. Frame-based optimizations leverage processor pipeline support to enable atomic execution of hot traces. Region-level optimizations employ transactional-memory-like atomicity support to aggressively optimize large regions of code. In this paper we propose a two-level atomic optimization scheme that not only overcomes the limitations of the two approaches but also effectively boosts their benefits. Our experiments show that the combined approach achieves a total performance improvement of 21.5% over an aggressive out-of-order baseline machine and improves performance over the frame-based approach by an additional 5.3%.

Index Terms

TAO: two-level atomicity for dynamic binary optimizations
1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging languages and compilers
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

April 2010

300 pages

ISBN: 9781605586359

DOI: 10.1145/1772954

General Chairs: Andreas Moshovos (University of Toronto), Greg Steffan (University of Toronto)

Program Chairs: Kim Hazelwood (University of Virginia), David Kaeli (Northeastern University)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

IEEE CS uArch

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CGO '10: 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization

April 24 - 28, 2010

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics

Total citations: 19

Total downloads: 594

Downloads (last 12 months): 6

Downloads (last 6 weeks): 0

Reflects downloads up to 28 Dec 2024

Citations

Cited By

Park S, Wu Y, Lee J, Aupov A, Mahlke S (2019). Multi-objective Exploration for Practical Optimization Decisions in Binary Translation. ACM Transactions on Embedded Computing Systems 18:5s (1-19). Online publication date: 7-Oct-2019.
https://dl.acm.org/doi/10.1145/3358185

Hong D, Wu J, Liu Y, Fu S, Hsu W (2018). Processor-Tracing Guided Region Formation in Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 15:4 (1-25). Online publication date: 16-Nov-2018.
https://dl.acm.org/doi/10.1145/3281664

Wang C, Wu Y (2013). TSO_ATOMICITY. ACM SIGPLAN Notices 48:4 (509-520). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2499368.2451172

Ahn W, Duan Y, Torrellas J (2013). DeAliaser. ACM SIGPLAN Notices 48:4 (167-180). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2499368.2451136

Wang C, Wu Y (2013). TSO_ATOMICITY. ACM SIGARCH Computer Architecture News 41:1 (509-520). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2490301.2451172

Ahn W, Duan Y, Torrellas J (2013). DeAliaser. ACM SIGARCH Computer Architecture News 41:1 (167-180). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2490301.2451136

Wang C, Wu Y, Sarkar V, Bodik R (2013). TSO_ATOMICITY. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (509-520). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2451116.2451172

Ahn W, Duan Y, Torrellas J, Sarkar V, Bodik R (2013). DeAliaser. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (167-180). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2451116.2451136

Mars J, Kumar N, Lu S, Torrellas J (2012). BlockChop. Proceedings of the 39th Annual International Symposium on Computer Architecture (536-547). Online publication date: 9-Jun-2012.
https://dl.acm.org/doi/10.5555/2337159.2337221

Mars J, Kumar N (2012). BlockChop. ACM SIGARCH Computer Architecture News 40:3 (536-547). Online publication date: 9-Jun-2012.
https://dl.acm.org/doi/10.1145/2366231.2337221
