STABILIZER: Statistically Sound Performance Evaluation

Published: 16 March 2013

Abstract

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.
Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.
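
A small sketch makes the sampling requirement concrete. The Python fragment below (all timing numbers and the 5% significance level are hypothetical illustrations, not data from the paper) applies a two-sample t-test to run times collected before and after a change. The test is only meaningful if each measurement comes from an independently randomized layout; repeated runs of one fixed binary would all share the same layout-induced bias.

    # Hypothetical sketch: testing the null hypothesis that a change
    # did not alter performance. Each timing sample is assumed to come
    # from a run under an independently randomized memory layout.
    from scipy import stats

    before = [12.31, 12.05, 12.48, 11.97, 12.22, 12.40, 12.10, 12.35]  # invented timings (s)
    after  = [11.90, 12.28, 11.85, 12.02, 11.78, 12.15, 11.95, 12.08]  # invented timings (s)

    t_stat, p_value = stats.ttest_ind(before, after)
    if p_value < 0.05:
        print(f"Reject the null hypothesis (p = {p_value:.3f}): run times differ")
    else:
        print(f"Cannot reject the null hypothesis (p = {p_value:.3f})")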
This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.
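
As an illustration of the ANOVA-based comparison described above, the sketch below runs a one-way ANOVA over hypothetical run times for a benchmark compiled at -O1, -O2, and -O3, with each sample assumed to be taken under an independently randomized layout (the re-randomization that, per the abstract, makes layout effects approximately Gaussian and so makes ANOVA applicable). All numbers are invented and merely mimic the shape of the paper's finding.

    # Hypothetical one-way ANOVA: do -O1, -O2, and -O3 differ at all?
    from scipy import stats

    o1 = [14.8, 15.1, 14.6, 15.0, 14.9, 15.2]  # invented timings (s)
    o2 = [12.9, 13.2, 12.7, 13.0, 13.1, 12.8]
    o3 = [13.0, 12.8, 13.1, 12.9, 13.2, 12.9]

    f_stat, p_value = stats.f_oneway(o1, o2, o3)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

    # Pairwise follow-up: in this invented data -O2 and -O3 overlap,
    # mirroring the finding that -O3's effect over -O2 is
    # indistinguishable from noise, while -O1 vs -O2 differs clearly.
    _, p_12 = stats.ttest_ind(o1, o2)
    _, p_23 = stats.ttest_ind(o2, o3)
    print(f"-O1 vs -O2: p = {p_12:.4f}; -O2 vs -O3: p = {p_23:.4f}")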



Published In

ACM SIGARCH Computer Architecture News, Volume 41, Issue 1 (ASPLOS '13), March 2013, 540 pages
ISSN: 0163-5964
DOI: 10.1145/2490301

ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2013, 574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 16 March 2013
Published in SIGARCH Volume 41, Issue 1


Author Tags

  1. measurement bias
  2. performance evaluation
  3. randomization

Qualifiers

  • Research-article


