DOI: 10.1145/1640089.1640100

How a Java VM can get more from a hardware performance monitor

Published: 25 October 2009

Abstract

This paper describes a sampling-based profiler that exploits a processor's HPM (Hardware Performance Monitor) to collect information on running Java applications for use by the Java VM. The profiler provides two novel features: Java-level event profiling and lightweight context-sensitive event profiling. For Java-level events, we propose new techniques that use the sampling facility of the HPM to generate object creation profiles and lock activity profiles. HPM sampling is the key to achieving lower overhead than profilers that do not rely on hardware support. The HPM can sample only hardware events, such as executed instructions or cache misses, so to sample object creations we correlate them with the store instructions that write Java object headers. For the lock activity profile, we introduce an instrumentation-based technique called ProbeNOP, which inserts a special NOP instruction whose executions are counted by the HPM. For context-sensitive event profiling, we propose a new technique called CallerChaining, which detects the calling context of HPM events based on the call stack depth (the value of the stack frame pointer). We show that CallerChaining can detect the calling contexts in many programs, including a large commercial application. Together, these techniques let both programmers and runtime systems get more valuable information from the HPM to understand and optimize programs without adding significant runtime overhead.
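
The abstract gives only the core idea behind CallerChaining: the stack depth at the time of a sample can serve as a cheap signature of the calling context. The following Java sketch illustrates that general idea only; it is not the paper's algorithm, whose details the abstract does not state. It assumes each sample carries a method name and a frame pointer value, and that the stack grows downward from a known base. The class `StackDepthProfile`, the `Sample` type, and all addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StackDepthProfile {
    /** Hypothetical HPM sample: the method containing the sampled PC, plus the frame pointer. */
    static final class Sample {
        final String method;
        final long framePointer;

        Sample(String method, long framePointer) {
            this.method = method;
            this.framePointer = framePointer;
        }
    }

    /**
     * Counts samples per (method, stack depth). Assumes a downward-growing stack,
     * so depth = stackBase - framePointer. Different depths for the same method
     * suggest different calling contexts; equal depths may still hide different callers.
     */
    static Map<String, Map<Long, Integer>> countByDepth(List<Sample> samples, long stackBase) {
        Map<String, Map<Long, Integer>> counts = new TreeMap<>();
        for (Sample s : samples) {
            long depth = stackBase - s.framePointer;
            counts.computeIfAbsent(s.method, k -> new TreeMap<>())
                  .merge(depth, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long stackBase = 0x8000L; // made-up address for illustration
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample("HashMap.put", 0x7F80L)); // depth 128
        samples.add(new Sample("HashMap.put", 0x7F80L)); // depth 128
        samples.add(new Sample("HashMap.put", 0x7F00L)); // depth 256
        System.out.println(countByDepth(samples, stackBase));
    }
}
```

In this toy input, the samples in one method split into two depth buckets, which a profiler could then treat as two candidate calling contexts. Mapping a depth back to an actual caller needs extra information, such as frame sizes, which this sketch leaves out.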

    Published In

    OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
    October 2009
    590 pages
    ISBN:9781605587660
    DOI:10.1145/1640089
    • ACM SIGPLAN Notices, Volume 44, Issue 10
      OOPSLA '09
      October 2009
      554 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/1639949

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. calling context
    2. hardware performance monitor
    3. profiling

    Qualifiers

    • Research-article

    Conference

    OOPSLA09

    Acceptance Rates

    OOPSLA '09 Paper Acceptance Rate 25 of 144 submissions, 17%;
    Overall Acceptance Rate 268 of 1,244 submissions, 22%
