DOI: 10.1145/1134760.1134779
Article

Dynamic code management: improving whole program code locality in managed runtimes

Published: 14 June 2006

Abstract

Poor code locality degrades application performance by increasing memory stalls due to instruction cache and TLB misses. This problem is particularly acute for large server applications written in languages such as Java and C# that provide just-in-time (JIT) compilation, dynamic class loading, and dynamic recompilation. However, managed runtimes also offer an opportunity to dynamically profile applications and adapt them to improve their performance. This paper describes a Dynamic Code Management system (DCM) in a managed runtime that performs whole program code layout optimizations to improve instruction locality.

We begin by implementing the widely used Pettis-Hansen algorithm for method layout to improve code locality. Unfortunately, this algorithm is too costly for a dynamic optimization system: it takes O(n³) time in the size of the call graph. For example, Pettis-Hansen requires a prohibitively expensive 35 minutes to lay out MiniBean, which has 15,586 methods. We propose three new code placement algorithms that target ITLB misses, which typically have the greatest impact on performance. The best of these algorithms, Code Tiling, groups methods into page-sized tiles by performing a depth-first traversal of the call graph based on call frequency. Excluding overhead, experimental results show that DCM with Code Tiling improves performance by 6% on the large MiniBean benchmark over a baseline that orders methods by invocation order, whereas Pettis-Hansen placement offers less improvement (2%) over the same baseline. Furthermore, Code Tiling lays out MiniBean's 15,586 methods in just 0.35 seconds (6000 times faster than Pettis-Hansen), which makes it suitable for high-performance managed runtimes.
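The abstract describes Code Tiling only at a high level: methods are grouped into page-sized tiles by a depth-first traversal of the call graph that follows call frequency. The Python sketch below illustrates that idea under stated assumptions and is not the paper's implementation. The function name `code_tiling`, the profile format (a dict of `(caller, callee)` call counts), the per-method code sizes, the 4 KB page size, the root-selection rule (decreasing total call-edge weight), and the hottest-callee-first visiting order are all hypothetical choices made for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical parameter)


def code_tiling(call_counts, sizes, page_size=PAGE_SIZE):
    """Sketch: group methods into page-sized tiles via a frequency-ordered DFS.

    call_counts: dict mapping (caller, callee) -> observed call count
    sizes: dict mapping method name -> compiled code size in bytes
    Returns a list of tiles; each tile is a list of method names.
    """
    # Adjacency lists of (count, callee), and total edge weight per method.
    callees = defaultdict(list)
    hotness = defaultdict(int)
    for (caller, callee), count in call_counts.items():
        callees[caller].append((count, callee))
        hotness[caller] += count
        hotness[callee] += count
    # Most frequently called callee first; ties broken by name for determinism.
    for caller in callees:
        callees[caller].sort(key=lambda pair: (-pair[0], pair[1]))

    tiles, current, used = [], [], 0
    placed = set()

    def place(method):
        nonlocal current, used
        size = sizes[method]
        # Close the current tile when this method would overflow the page.
        # A method larger than a page ends up alone in an oversized tile.
        if current and used + size > page_size:
            tiles.append(current)
            current, used = [], 0
        current.append(method)
        used += size

    # Assumption: DFS roots are tried in decreasing order of total edge weight.
    for root in sorted(sizes, key=lambda m: (-hotness[m], m)):
        if root in placed:
            continue
        stack = [root]
        while stack:
            method = stack.pop()
            if method in placed:
                continue
            placed.add(method)
            place(method)
            # Push in reverse so the hottest unplaced callee is visited next.
            for _, callee in reversed(callees[method]):
                if callee not in placed:
                    stack.append(callee)
    if current:
        tiles.append(current)
    return tiles


# Toy profile (invented numbers): main -> a -> c is the hot path.
counts = {("main", "a"): 100, ("main", "b"): 5, ("a", "c"): 90, ("b", "d"): 1}
sizes = {"main": 1000, "a": 2000, "b": 1500, "c": 1000, "d": 3000}
print(code_tiling(counts, sizes))  # [['a', 'c', 'main'], ['b'], ['d']]
```

In this toy profile the hot methods `a`, `c`, and `main` share one page while the rarely called `b` and `d` land on separate pages. After the initial sorts, the sketch visits each method and call edge once, so it runs in roughly O(E log E + N log N) time for N methods and E call edges; this illustrates why a single frequency-guided traversal can be far cheaper than Pettis-Hansen's repeated merging of nodes along the heaviest remaining edge.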

References

[1] A.-R. Adl-Tabatabai, J. Bharadwaj, D.-Y. Chen, A. Ghuloum, V. S. Menon, B. R. Murphy, M. Serrano, and T. Shpeisman. The StarJIT compiler: a Dynamic Compiler for Managed Runtime Environments. Intel Technology Journal, 7(1), February 2003.
[2] J. B. Chen and B. D. D. Leupen. Improving instruction locality with just-in-time code layout. In Proceedings of the USENIX Windows NT Workshop, pages 25-32, 1997.
[3] M. Cierniak, M. Eng, N. Glew, B. Lewis, and J. Stichnoth. Open Runtime Platform: A Flexible High-Performance Managed Runtime Environment. Intel Technology Journal, 7(1), February 2003. Available at http://intel.com/technology/itj/2003/volume07issue01/art01_orp/p01_abstract.htm.
[4] M. Cierniak, G.-Y. Lueh, and J. Stichnoth. Practicing JUDO: Java Under Dynamic Optimizations. In Proceedings of the SIGPLAN '00 Conference on Programming Language Design and Implementation, June 2000.
[5] R. Cohn, D. Goodwin, P. G. Lowney, and N. Rubin. Spike: An Optimizer for Alpha/NT Executables. In USENIX Windows NT Workshop, pages 17-24, 1997.
[6] R. Flower, C.-K. Luk, R. Muth, H. Patil, J. Shakshober, R. Cohn, and P. G. Lowney. Kernel Optimizations and Prefetch with the Spike Executable Optimizer. In Proceedings of the 4th Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4), 2001.
[7] N. Gloy and M. D. Smith. Procedure Placement Using Temporal-Ordering Information. ACM Transactions on Programming Languages and Systems, 21(5):977-1027, 1999.
[8] A. H. Hashemi, D. R. Kaeli, and B. Calder. Efficient Procedure Mapping Using Cache Line Coloring. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 171-182, 1997.
[9] K. Hazelwood and J. E. Smith. Exploring code cache eviction granularities in dynamic optimization systems. In International Symposium on Code Generation and Optimization, pages 89-99, Palo Alto, CA, March 2004.
[10] X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The Garbage Collection Advantage: Improving Program Locality. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 69-80, 2004.
[11] C.-K. Luk, R. Muth, H. Patil, R. S. Cohn, and P. G. Lowney. Ispike: A Post-link Optimizer for the Intel® Itanium® Architecture. In IEEE/ACM International Symposium on Code Generation and Optimization, pages 15-26, 2004.
[12] S. McFarling. Program Optimization for Instruction Caches. In ACM Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191, 1989.
[13] K. Pettis and R. C. Hansen. Profile-guided code positioning. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 16-27, New York, NY, USA, 1990. ACM Press.
[14] A. Ramirez, J.-L. Larriba-Pey, C. Navarro, J. Torrellas, and M. Valero. Software Trace Cache. In International Conference on Supercomputing, pages 119-126, 1999.
[15] E. Rotenberg, S. Bennett, and J. E. Smith. A Trace Cache Microarchitecture and Evaluation. IEEE Transactions on Computers, 48(2):111-120, 1999.
[16] D. Scales. Efficient Dynamic Procedure Placement. Technical Report WRL-98/5, Compaq WRL Research Lab, May 1998.
[17] J. Whaley. Dynamic Optimization Through the Use of Automatic Runtime Specialization. Master's thesis, Massachusetts Institute of Technology, May 1999.
[18] B. Zorn. Performance in the Age of Trustworthy Computing, January 2004. Slides for a presentation at the University of Colorado and other universities.

Information

Published In

VEE '06: Proceedings of the 2nd international conference on Virtual execution environments
June 2006
194 pages
ISBN:1595933328
DOI:10.1145/1134760
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code generation
  2. code layout
  3. dynamic optimization
  4. locality
  5. performance monitoring
  6. virtual machines

Qualifiers

  • Article

Conference

VEE06

Acceptance Rates

Overall acceptance rate: 80 of 235 submissions (34%)

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Feb 2025.

Cited By

  • (2023) JACO: JAva Code Layout Optimizer Enabling Continuous Optimization without Pausing Application Services. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 295-306. DOI: 10.1109/CLUSTER52292.2023.00032. Online publication date: 31-Oct-2023.
  • (2014) Experience report. ACM SIGPLAN Notices 49:12, pp. 31-38. DOI: 10.1145/2775050.2633362. Online publication date: 3-Sep-2014.
  • (2013) Combining code reordering and cache configuration. ACM Transactions on Embedded Computing Systems 11:4, pp. 1-20. DOI: 10.1145/2362336.2399177. Online publication date: 1-Jan-2013.
  • (2012) Runtime adaptation. Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, pp. 1-11. DOI: 10.1145/2185475.2185476. Online publication date: 3-Mar-2012.
  • (2010) B2P2. Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, pp. 1-10. DOI: 10.1145/1811212.1811215. Online publication date: 28-Jun-2010.
  • (2007) Searching for type-error messages. ACM SIGPLAN Notices 42:6, pp. 425-434. DOI: 10.1145/1273442.1250783. Online publication date: 10-Jun-2007.
  • (2007) Towards locating execution omission errors. ACM SIGPLAN Notices 42:6, pp. 415-424. DOI: 10.1145/1273442.1250782. Online publication date: 10-Jun-2007.
  • (2007) Online optimizations driven by hardware performance monitoring. ACM SIGPLAN Notices 42:6, pp. 373-382. DOI: 10.1145/1273442.1250777. Online publication date: 10-Jun-2007.
  • (2007) Effective automatic parallelization of stencil computations. ACM SIGPLAN Notices 42:6, pp. 235-244. DOI: 10.1145/1273442.1250761. Online publication date: 10-Jun-2007.
  • (2007) Software behavior oriented parallelization. ACM SIGPLAN Notices 42:6, pp. 223-234. DOI: 10.1145/1273442.1250760. Online publication date: 10-Jun-2007.
