Article

Statistically rigorous Java performance evaluation

Authors:

Dries Buytaert,

Lieven Eeckhout

OOPSLA '07: Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems, languages and applications

Pages 57 - 76

https://doi.org/10.1145/1297027.1297033

Published: 21 October 2007

Abstract

Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exists a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.
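To make the difference between these reporting choices concrete, here is a minimal Python sketch. The `summarize` helper and the timing values are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
import statistics


def summarize(times):
    """Summarize execution times (lower is better) the way different
    methodologies might report them. Requires at least two measurements."""
    if len(times) < 2:
        raise ValueError("need at least two measurements")
    ordered = sorted(times)
    return {
        "mean": statistics.fmean(ordered),  # average over all runs
        "best": ordered[0],                 # fastest run
        "second_best": ordered[1],          # second fastest run
        "worst": ordered[-1],               # slowest run
    }


if __name__ == "__main__":
    # Hypothetical execution times in seconds for one benchmark.
    runs = [4.0, 5.0, 6.0, 9.0]
    for name, value in summarize(runs).items():
        print(f"{name}: {value:.2f} s")
```

The same set of runs yields a different headline number under each convention, which is why two studies can disagree even when their raw measurements are similar.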

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
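As an illustration of what a statistically rigorous comparison can look like, the following Python sketch computes a 95% confidence interval for the mean execution time of each alternative and checks whether the two intervals overlap. This is a standard textbook technique, shown under the assumption that each measurement comes from an independent VM invocation. The function names and sample data are hypothetical, and this is not the JavaStats implementation.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 1..29 degrees of freedom.
T_95 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045,
]


def confidence_interval_95(samples):
    """Return (low, high) bounds of a 95% confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    df = n - 1
    if df <= len(T_95):
        crit = T_95[df - 1]  # small sample: Student's t
    else:
        crit = statistics.NormalDist().inv_cdf(0.975)  # large sample: normal
    return mean - crit * sem, mean + crit * sem


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical execution times in seconds, one per VM invocation.
    config_a = [10.2, 10.5, 9.9, 10.4, 10.1]
    config_b = [9.1, 9.4, 9.0, 9.3, 9.2]
    ci_a = confidence_interval_95(config_a)
    ci_b = confidence_interval_95(config_b)
    print("A:", ci_a)
    print("B:", ci_b)
    print("overlap:", intervals_overlap(ci_a, ci_b))
```

Non-overlapping intervals suggest a real difference at the chosen confidence level. When the intervals overlap, this check alone does not support a conclusion, and the observed gap may be due to run-to-run variation.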

Published In

OOPSLA '07: Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems, languages and applications

October 2007

728 pages

ISBN:9781595937865

DOI:10.1145/1297027

General Chair: Richard P. Gabriel (IBM Research, USA)

Program Chairs: David F. Bacon (IBM Research, USA), Cristina Videira Lopes (University of California, Irvine, USA), Guy L. Steele (Sun Labs, USA)

ACM SIGPLAN Notices Volume 42, Issue 10
Proceedings of the 2007 OOPSLA conference
October 2007
686 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1297105

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

OOPSLA07

OOPSLA07: ACM SIGPLAN Object Oriented Programming Systems and Applications Conference

October 21 - 25, 2007

Montreal, Quebec, Canada

Acceptance Rates

OOPSLA '07 Paper Acceptance Rate 33 of 156 submissions, 21%;

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Bibliometrics

Total Citations: 150

Total Downloads: 2,880

Downloads (Last 12 months): 112

Downloads (Last 6 weeks): 16

Reflects downloads up to 15 Feb 2025
