DOI: 10.1145/1133956.1133980
Article

Fast and efficient partial code reordering: taking advantage of dynamic recompilation

Published: 10 June 2006

Abstract

Poor instruction cache locality can degrade performance on modern architectures. For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache. In this paper, we show how to take advantage of dynamic code generation in a Java Virtual Machine (VM) to improve instruction locality at run-time. We develop a dynamic code reordering (DCR) system, a low-overhead, online approach for improving instruction locality. DCR has three optimizations: (1) interprocedural method separation; (2) intraprocedural code splitting; and (3) code padding. DCR uses the dynamic call graph and an edge profile that most VMs already collect to separate hot and cold methods, and hot and cold code within a method. It also puts padding between methods to minimize conflict misses between frequent caller/callee pairs. It performs these optimizations incrementally, and only when the VM is recompiling a method at a higher optimization level. We implement DCR in Jikes RVM and show that its overhead is negligible. Extensive simulation and run-time experiments show that a simple code space improves average performance on a Pentium 4 by around 6% on SPEC and DaCapo Java benchmarks. These programs, however, have very small instruction cache footprints, which limits the opportunities for DCR to improve performance. Consequently, DCR optimizations on average show little effect, sometimes degrading performance and occasionally improving performance by up to 5%. Our work shows that the VM has the potential to dynamically improve instruction locality incrementally by simply piggybacking on hotspot recompilation.
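
As a rough illustration of two of the ideas named in the abstract (hot/cold method separation and padding between a caller and its callee), the following Java sketch may help. It is not the paper's implementation and does not use any Jikes RVM API. The class name, method names, and parameters are all hypothetical, and the padding rule assumes a direct-mapped instruction cache and a caller smaller than the cache.

```java
// Hypothetical sketch only: names and layout policy are assumptions,
// not taken from the paper or from Jikes RVM.
final class CodeLayoutSketch {

    /**
     * Hot/cold separation: assigns each method a start address, packing hot
     * methods contiguously from hotBase and cold methods from coldBase.
     */
    static long[] separateHotCold(int[] sizes, boolean[] hot,
                                  long hotBase, long coldBase) {
        long[] start = new long[sizes.length];
        long nextHot = hotBase;
        long nextCold = coldBase;
        for (int i = 0; i < sizes.length; i++) {
            if (hot[i]) {
                start[i] = nextHot;
                nextHot += sizes[i];
            } else {
                start[i] = nextCold;
                nextCold += sizes[i];
            }
        }
        return start;
    }

    /**
     * Padding: returns how many bytes to skip at nextFree so that a callee
     * placed there does not start inside the cache range occupied by its
     * caller. Assumes a direct-mapped cache of cacheBytes and
     * callerSize < cacheBytes. Only the callee's first byte is checked,
     * which is a deliberate simplification.
     */
    static long paddingToAvoidConflict(long callerStart, int callerSize,
                                       long nextFree, int cacheBytes) {
        long cache = cacheBytes;
        long callerOffset = Math.floorMod(callerStart, cache);
        long calleeOffset = Math.floorMod(nextFree, cache);
        // Distance from the caller's first byte, measured within the cache.
        long delta = Math.floorMod(calleeOffset - callerOffset, cache);
        return delta < callerSize ? callerSize - delta : 0;
    }
}
```

For instance, with a 4096-byte cache and a 100-byte caller at address 0, a callee that would start at address 4136 maps to cache offset 40, inside the caller's range, so the sketch returns 60 bytes of padding. A callee that would start at address 150 needs none.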

Cited By

  • (2013) Combining code reordering and cache configuration. ACM Transactions on Embedded Computing Systems 11:4 (1-20). DOI: 10.1145/2362336.2399177. Online publication date: 1-Jan-2013
  • (2012) Runtime adaptation. Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (1-11). DOI: 10.1145/2185475.2185476. Online publication date: 3-Mar-2012
  • (2008) Trace fragment selection within method-based JVMs. Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments (41-50). DOI: 10.1145/1346256.1346263. Online publication date: 5-Mar-2008

    Published In

    ISMM '06: Proceedings of the 5th international symposium on Memory management
    June 2006
    202 pages
    ISBN:1595932216
    DOI:10.1145/1133956
    General Chair: Erez Petrank
    Program Chair: Eliot Moss

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. JIT compilation
    2. dynamic
    3. instruction
    4. locality

    Qualifiers

    • Article

    Conference

    ISMM06
    Acceptance Rates

    Overall Acceptance Rate 72 of 156 submissions, 46%

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 14 Oct 2024
