Article

Dynamic code management: improving whole program code locality in managed runtimes

Authors:

Xianglong Huang,

Kathryn S McKinley

VEE '06: Proceedings of the 2nd international conference on Virtual execution environments

Pages 133 - 143

https://doi.org/10.1145/1134760.1134779

Published: 14 June 2006

Abstract

Poor code locality degrades application performance by increasing memory stalls due to instruction cache and TLB misses. This problem is particularly acute for large server applications written in languages such as Java and C# that provide just-in-time (JIT) compilation, dynamic class loading, and dynamic recompilation. However, managed runtimes also offer an opportunity to dynamically profile applications and adapt them to improve their performance. This paper describes a Dynamic Code Management system (DCM) in a managed runtime that performs whole program code layout optimizations to improve instruction locality.

We begin by implementing the widely used Pettis-Hansen algorithm for method layout to improve code locality. Unfortunately, this algorithm is too costly for a dynamic optimization system: it takes O(n³) time in the size of the call graph. For example, Pettis-Hansen requires a prohibitively expensive 35 minutes to lay out MiniBean, which has 15,586 methods. We propose three new code placement algorithms that target ITLB misses, which typically have the greatest impact on performance. The best of these algorithms, Code Tiling, groups methods into page-sized tiles by performing a depth-first traversal of the call graph based on call frequency. Excluding overhead, experimental results show that DCM with Code Tiling improves performance by 6% on the large MiniBean benchmark over a baseline that orders methods by invocation order, whereas Pettis-Hansen placement improves performance by only 2% over the same baseline. Furthermore, Code Tiling lays out MiniBean's 15,586 methods in just 0.35 seconds (6,000 times faster than Pettis-Hansen), which makes it suitable for high-performance managed runtimes.
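The abstract gives only a one-sentence description of Code Tiling, so the following is a minimal sketch of that idea, not the authors' implementation. It assumes a 4 KB page, a profile that maps each (caller, callee) edge to a call count, and known method sizes in bytes; the function name `code_tiling` and all parameter names are hypothetical. Methods are visited depth-first with the most frequently called callee first, and the resulting order is cut into page-sized tiles.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes; the abstract does not state one


def code_tiling(sizes, calls, roots, page_size=PAGE_SIZE):
    """Group methods into page-sized tiles (illustrative sketch only).

    sizes: dict mapping method name -> code size in bytes
    calls: dict mapping (caller, callee) -> observed call frequency
    roots: iterable of entry-point methods to start the traversal from
    Returns a list of tiles; each tile is a list of method names.
    """
    # Build adjacency lists, hottest callee first.
    callees = defaultdict(list)
    for (caller, callee), freq in calls.items():
        callees[caller].append((freq, callee))
    for edges in callees.values():
        edges.sort(key=lambda fc: -fc[0])

    # Iterative depth-first traversal that follows the hottest edge first.
    # Methods unreachable from the roots are appended afterwards.
    order, seen = [], set()
    for start in list(roots) + list(sizes):
        stack = [start]
        while stack:
            method = stack.pop()
            if method in seen:
                continue
            seen.add(method)
            order.append(method)
            # Push coldest callee first so the hottest is popped next.
            for _, callee in reversed(callees.get(method, [])):
                if callee not in seen:
                    stack.append(callee)

    # Cut the traversal order into tiles that fit in one page each.
    # A method larger than a page simply gets a tile of its own.
    tiles, current, used = [], [], 0
    for method in order:
        size = sizes[method]
        if current and used + size > page_size:
            tiles.append(current)
            current, used = [], 0
        current.append(method)
        used += size
    if current:
        tiles.append(current)
    return tiles


# Hypothetical profile data, for illustration only.
sizes = {"main": 1000, "hot": 2000, "warm": 1500, "cold": 3000, "leaf": 500}
calls = {("main", "hot"): 100, ("main", "warm"): 10,
         ("main", "cold"): 1, ("hot", "leaf"): 50}
print(code_tiling(sizes, calls, ["main"]))
```

In this toy profile the hot path main, hot, leaf lands on a single page, while the rarely called methods are pushed to later pages. That is the intuition behind targeting ITLB misses: frequently executed call chains touch fewer distinct pages.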

Index Terms

Dynamic code management: improving whole program code locality in managed runtimes
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Published In

VEE '06: Proceedings of the 2nd international conference on Virtual execution environments

June 2006

194 pages

ISBN: 1595933328

DOI: 10.1145/1134760

General Chair: Hans-J. Boehm, HP Labs, USA

Program Chair: David Grove, IBM Research, USA

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

VEE06: Second International Conference on Virtual Execution Environments

June 14 - 16, 2006

Ottawa, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 80 of 235 submissions, 34%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 18
Total Downloads: 492
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Feb 2025

Citations

Cited By

Lin W, Qin J, Chen Y, Jin Z, Xu J, Zhang Y, Cai S, Fu L, Chen Y, Chen W (2023). JACO: JAva Code Layout Optimizer Enabling Continuous Optimization without Pausing Application Services. 2023 IEEE International Conference on Cluster Computing (CLUSTER), (295-306). Online publication date: 31-Oct-2023.
https://doi.org/10.1109/CLUSTER52292.2023.00032

Muranushi T, Eisenberg R (2014). Experience report. ACM SIGPLAN Notices, 49:12, (31-38). Online publication date: 3-Sep-2014.
https://dl.acm.org/doi/10.1145/2775050.2633362

Gordon-Ross A, Vahid F, Dutt N (2013). Combining code reordering and cache configuration. ACM Transactions on Embedded Computing Systems, 11:4, (1-20). Online publication date: 1-Jan-2013.
https://dl.acm.org/doi/10.1145/2362336.2399177

McDaniel M, Hazelwood K (2012). Runtime adaptation. Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, (1-11). Online publication date: 3-Mar-2012.
https://dl.acm.org/doi/10.1145/2185475.2185476

Jeyapaul R, Shrivastava A, Deprettere E, Stefanov T (2010). B2P2. Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, (1-10). Online publication date: 28-Jun-2010.
https://dl.acm.org/doi/10.1145/1811212.1811215

Lerner B, Flower M, Grossman D, Chambers C (2007). Searching for type-error messages. ACM SIGPLAN Notices, 42:6, (425-434). Online publication date: 10-Jun-2007.
https://dl.acm.org/doi/10.1145/1273442.1250783

Zhang X, Tallam S, Gupta N, Gupta R (2007). Towards locating execution omission errors. ACM SIGPLAN Notices, 42:6, (415-424). Online publication date: 10-Jun-2007.
https://dl.acm.org/doi/10.1145/1273442.1250782

Schneider F, Payer M, Gross T (2007). Online optimizations driven by hardware performance monitoring. ACM SIGPLAN Notices, 42:6, (373-382). Online publication date: 10-Jun-2007.
https://dl.acm.org/doi/10.1145/1273442.1250777

Krishnamoorthy S, Baskaran M, Bondhugula U, Ramanujam J, Rountev A, Sadayappan P (2007). Effective automatic parallelization of stencil computations. ACM SIGPLAN Notices, 42:6, (235-244). Online publication date: 10-Jun-2007.
https://dl.acm.org/doi/10.1145/1273442.1250761

Ding C, Shen X, Kelsey K, Tice C, Huang R, Zhang C (2007). Software behavior oriented parallelization. ACM SIGPLAN Notices, 42:6, (223-234). Online publication date: 10-Jun-2007.
https://dl.acm.org/doi/10.1145/1273442.1250760
