
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

Published: 15 June 2015

Abstract

    Parallel application benchmarks are indispensable for evaluating and optimizing HPC software and hardware. However, obtaining high-fidelity benchmarks that reflect the scale and complexity of state-of-the-art parallel applications is challenging and costly. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often simply inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication and I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks: they retain the original applications' performance characteristics, in particular their relative performance across platforms. Moreover, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
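    The statistical-regeneration idea described above can be sketched in miniature. The snippet below is a hypothetical illustration, not APPrime's actual implementation: it fits a first-order Markov chain over event types in a toy communication/I/O trace and then samples a synthetic event sequence from the fitted chain. The function names and the toy trace are invented for illustration only.

```python
import random
from collections import defaultdict

def fit_markov_chain(events):
    """Count first-order transitions between consecutive trace events
    and normalize them into transition probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(events, events[1:]):
        counts[prev][curr] += 1
    return {state: {nxt: c / sum(succ.values()) for nxt, c in succ.items()}
            for state, succ in counts.items()}

def generate(chain, start, length, rng):
    """Sample a synthetic event sequence from the fitted chain,
    stopping early if a terminal event (no outgoing edges) is reached."""
    seq, state = [start], start
    while len(seq) < length and state in chain:
        targets, probs = zip(*chain[state].items())
        state = rng.choices(targets, weights=probs)[0]
        seq.append(state)
    return seq

# Toy communication/I/O trace (event types only; real traces would
# also carry parameters such as message sizes and timestamps).
trace = ["MPI_Isend", "MPI_Irecv", "MPI_Wait", "compute",
         "MPI_Isend", "MPI_Irecv", "MPI_Wait", "MPI_File_write"]
chain = fit_markov_chain(trace)
synthetic = generate(chain, "MPI_Isend", 6, random.Random(0))
```

    A real trace-driven generator would additionally model per-event parameters (message sizes, targets, inter-event delays), typically with per-phase histograms rather than a single global chain.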




    Published In

    ACM SIGMETRICS Performance Evaluation Review, Volume 43, Issue 1
    June 2015
    468 pages
    ISSN: 0163-5999
    DOI: 10.1145/2796314
    • SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
      June 2015
      488 pages
      ISBN: 9781450334860
      DOI: 10.1145/2745844
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. asynchronous I/O
    2. benchmark generation
    3. HPC applications
    4. Markov chain model
    5. phase identification
    6. traces

    Qualifiers

    • Research-article


    Cited By

    • (2021) Lossy Compression of Communication Traces Using Recurrent Neural Networks. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2021.3132417
    • (2020) GIFT. Proceedings of the 18th USENIX Conference on File and Storage Technologies, 103-120. 10.5555/3386691.3386702
    • (2020) Uncovering access, reuse, and sharing characteristics of I/O-intensive files on large-scale production HPC systems. Proceedings of the 18th USENIX Conference on File and Storage Technologies, 91-102. 10.5555/3386691.3386701
    • (2020) Taming I/O variation on QoS-less HPC storage. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-13. 10.5555/3433701.3433715
    • (2019) Automatic generation of benchmarks for I/O-intensive parallel applications. Journal of Parallel and Distributed Computing, 124, 1-13. 10.1016/j.jpdc.2018.10.004
    • (2018) BenchBox: A User-Driven Benchmarking Framework for Fat-Client Storage Systems. IEEE Transactions on Parallel and Distributed Systems, 29(10), 2191-2205. 10.1109/TPDS.2018.2819657
    • (2017) P4. Proceedings of the 36th International Conference on Computer-Aided Design, 683-690. 10.5555/3199700.3199791
    • (2017) P4: Phase-based power/performance prediction of heterogeneous systems via neural networks. 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 683-690. 10.1109/ICCAD.2017.8203843
    • (2016) Replicating HPC I/O workloads with proxy applications. Proceedings of the 1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 13-18. 10.5555/3019046.3019049
    • (2016) Replicating HPC I/O Workloads with Proxy Applications. 2016 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS), 13-18. 10.1109/PDSW-DISCS.2016.007
