DOI: 10.1145/1772954.1772959
research-article

TAO: two-level atomicity for dynamic binary optimizations

Published: 24 April 2010

Abstract

Dynamic binary translation is a key component of hardware/software (HW/SW) co-design, an enabling technology for processor microarchitecture innovation. Two well-known dynamic binary optimization techniques rely on atomic execution support. Frame-based optimizations leverage processor pipeline support to execute hot traces atomically. Region-level optimizations employ transactional-memory-like atomicity support to aggressively optimize large regions of code. In this paper we propose a two-level atomic optimization scheme that not only overcomes the limitations of each approach but also boosts the benefits of both. Our experiments show that the combined approach achieves a total performance improvement of 21.5% over an aggressive out-of-order baseline machine, an additional 5.3% over the frame-based approach.

Cited By

  • (2019) Multi-objective Exploration for Practical Optimization Decisions in Binary Translation. ACM Transactions on Embedded Computing Systems 18:5s (1-19). DOI: 10.1145/3358185. Online publication date: 7-Oct-2019
  • (2018) Processor-Tracing Guided Region Formation in Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 15:4 (1-25). DOI: 10.1145/3281664. Online publication date: 16-Nov-2018
  • (2013) TSO_ATOMICITY. ACM SIGPLAN Notices 48:4 (509-520). DOI: 10.1145/2499368.2451172. Online publication date: 16-Mar-2013

Published In

CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
April 2010
300 pages
ISBN:9781605586359
DOI:10.1145/1772954

In-Cooperation

  • IEEE CS uArch

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atomic execution
  2. dynamic binary optimization
  3. hardware/software co-design
  4. large region optimization

Qualifiers

  • Research-article

Conference

CGO '10

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
