PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node

Authors:

Weimin Zheng

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 305 - 314

https://doi.org/10.1145/1693453.1693493

Published: 09 January 2010

Abstract

Designers of large-scale parallel computers would greatly benefit from being able to predict the performance of parallel applications at the design phase. This is difficult, however, because the execution time of a parallel application is determined by several factors, including the sequential computation time in each process, the communication time, and their convolution. Despite previous efforts, accurately and efficiently estimating the sequential computation time of each process for large-scale parallel applications on target machines that do not yet exist remains an open problem.

This paper proposes a novel approach to predict the sequential computation time accurately and efficiently. We assume that at least one node of the target platform is available, but the whole target system need not be. We make two main technical contributions. First, we employ deterministic replay techniques to execute any process of a parallel application on a single node at real speed. As a result, we can simply measure the real sequential computation time on a target node for each process, one by one. Second, we observe that the computation behavior of processes in parallel applications can be clustered into a few groups, with the processes in each group behaving similarly. This observation reduces measurement time significantly, because only representative processes need to be executed rather than all of them.
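
The grouping idea can be illustrated with a minimal sketch. It is not the clustering procedure used in the paper, which the abstract does not specify. It assumes each process is described by a hypothetical feature vector (for example, work counts per computation segment) and uses a simple greedy, threshold-based grouping. The names `group_processes`, `extrapolate_times`, the threshold, and all numbers are illustrative assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_processes(features, threshold):
    """Greedy grouping (an assumption, not the paper's algorithm).

    features: dict mapping rank -> hypothetical feature vector.
    A process joins the first group whose representative lies within
    `threshold`; otherwise it becomes the representative of a new group.
    Returns a dict mapping representative rank -> list of member ranks.
    """
    groups = {}
    for rank in sorted(features):
        for rep in groups:
            if distance(features[rank], features[rep]) <= threshold:
                groups[rep].append(rank)
                break
        else:
            # No existing representative was close enough.
            groups[rank] = [rank]
    return groups


def extrapolate_times(groups, measured):
    """Give every member the time measured for its group's representative."""
    return {
        rank: measured[rep]
        for rep, members in groups.items()
        for rank in members
    }


# Made-up feature vectors for four processes.
features = {
    0: [100.0, 50.0],
    1: [101.0, 49.0],
    2: [400.0, 10.0],
    3: [399.0, 11.0],
}
groups = group_processes(features, threshold=5.0)

# Only the representatives would be replayed and timed on the target node;
# these timings are placeholders, not measurements.
measured = {0: 1.0, 2: 1.5}
times = extrapolate_times(groups, measured)
print(groups)
print(times)
```

With four processes and two groups, only two replays are needed instead of four; the saving grows with the number of processes per group.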

We have implemented a performance prediction framework, called PHANTOM, which integrates the above computation-time acquisition approach with a trace-driven network simulator. We validate our approach on several platforms. For ASCI Sweep3D, the prediction error of our approach is less than 5% on 1024 processor cores. Compared to a recent regression-based prediction approach, PHANTOM achieves better prediction accuracy across different platforms.
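
To make the combination of measured computation times and trace-driven network simulation concrete, the following is a toy sketch, not PHANTOM's simulator. It assumes a hypothetical trace format of `compute`, `send`, and `recv` events, eager (non-blocking) sends, and a simple latency-plus-size-over-bandwidth cost model. All names and parameter values are illustrative.

```python
from collections import deque


def simulate(traces, latency, bandwidth):
    """Replay per-process traces and return each process's finish time.

    traces: list indexed by rank; each entry is a list of events:
        ("compute", seconds), ("send", dst, nbytes), ("recv", src)
    Assumptions of this sketch: sends never block, and a receive completes
    at max(local clock, arrival time of the matching message).
    """
    n = len(traces)
    clock = [0.0] * n          # simulated time of each process
    pos = [0] * n              # index of the next event per process
    inflight = {}              # (src, dst) -> deque of arrival times
    progressed = True
    while progressed:
        progressed = False
        for rank in range(n):
            while pos[rank] < len(traces[rank]):
                event = traces[rank][pos[rank]]
                if event[0] == "compute":
                    clock[rank] += event[1]
                elif event[0] == "send":
                    _, dst, nbytes = event
                    arrival = clock[rank] + latency + nbytes / bandwidth
                    inflight.setdefault((rank, dst), deque()).append(arrival)
                else:  # "recv"
                    queue = inflight.get((event[1], rank))
                    if not queue:
                        break  # sender has not reached its send yet
                    clock[rank] = max(clock[rank], queue.popleft())
                pos[rank] += 1
                progressed = True
    if any(pos[r] < len(traces[r]) for r in range(n)):
        raise RuntimeError("a receive has no matching send")
    return clock


# Two processes exchanging one message each; the compute durations stand in
# for times measured on a single target node.
traces = [
    [("compute", 1.0), ("send", 1, 1000), ("recv", 1)],
    [("recv", 0), ("compute", 2.0), ("send", 0, 1000)],
]
finish = simulate(traces, latency=0.001, bandwidth=1e6)
predicted_runtime = max(finish)
print(finish, predicted_runtime)
```

The predicted application runtime is taken here as the latest finish time over all processes; a real simulator would also model contention, collectives, and blocking semantics.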

Published In

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

January 2010

372 pages

ISBN:9781605588773

DOI:10.1145/1693453

General Chairs: R. Govindarajan (Indian Institute of Science), David Padua (UIUC)
Program Chair: Mary Hall (University of Utah)

ACM SIGPLAN Notices Volume 45, Issue 5
PPoPP '10
May 2010
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1837853

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PPoPP '10

Sponsor:

SIGPLAN

PPoPP '10: ACM SIGPLAN Principles and Practice of Parallel Computing

January 9 - 14, 2010

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
