DOI: 10.1145/1693453.1693493
research-article

PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node

Published: 09 January 2010

Abstract

For designers of large-scale parallel computers, it is highly desirable to predict the performance of parallel applications at the design phase. However, this is difficult because the execution time of a parallel application is determined by several factors, including the sequential computation time in each process, the communication time, and their convolution. Despite previous efforts, accurately and efficiently estimating the sequential computation time of each process for large-scale parallel applications on target machines that do not yet exist remains an open problem.
This paper proposes a novel approach to predicting the sequential computation time accurately and efficiently. We assume that at least one node of the target platform is available, but the whole target system need not be. We make two main technical contributions. First, we employ deterministic replay techniques to execute any process of a parallel application on a single node at real speed. As a result, we can simply measure the real sequential computation time on a target node for each process, one by one. Second, we observe that the processes of a parallel application can be clustered into a few groups such that the processes in each group have similar computation behavior. This observation lets us reduce measurement time significantly because we need to execute only representative processes rather than all of them.
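The second contribution can be illustrated with a minimal sketch. The abstract does not say which features or which clustering algorithm are used, so everything here is an assumption made for illustration: each process is described by a hypothetical feature vector, groups are formed by a simple greedy distance threshold, and `fake_measure` stands in for replaying a representative process on a target node and timing it.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_processes(features, threshold):
    """Greedy clustering (an assumed stand-in, not the paper's algorithm).

    A process joins the first group whose representative lies within
    `threshold`; otherwise it starts a new group and becomes its
    representative. Returns {representative rank: [member ranks]}.
    """
    groups = {}
    for rank, vec in enumerate(features):
        for rep in groups:
            if distance(features[rep], vec) <= threshold:
                groups[rep].append(rank)
                break
        else:
            groups[rank] = [rank]
    return groups

def predict_times(groups, measure):
    """Measure only the representatives and reuse each result for its group."""
    times = {}
    for rep, members in groups.items():
        t = measure(rep)
        for rank in members:
            times[rank] = t
    return times

# Hypothetical per-process feature vectors (for example, work per code region).
features = [
    [100.0, 20.0],  # rank 0
    [101.0, 21.0],  # rank 1
    [300.0, 80.0],  # rank 2
    [299.0, 79.0],  # rank 3
    [100.5, 20.5],  # rank 4
]

measured = []  # records which ranks were actually "executed"

def fake_measure(rank):
    """Placeholder for a timed replay of one process on a target node."""
    measured.append(rank)
    return 1.0 if rank == 0 else 3.0

groups = cluster_processes(features, threshold=5.0)
times = predict_times(groups, fake_measure)
print(groups)    # representative -> members
print(measured)  # only the representatives were measured
```

In this toy input, five processes fall into two groups, so only two of them need to be executed; the saving grows with the number of processes per group.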
We have implemented a performance prediction framework, called PHANTOM, which integrates the above computation-time acquisition approach with a trace-driven network simulator. We validate our approach on several platforms. For ASCI Sweep3D, the prediction error of our approach is less than 5% on 1024 processor cores. Compared to a recent regression-based prediction approach, PHANTOM achieves better prediction accuracy across different platforms.
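To make the idea of combining measured computation times with a trace-driven network simulator concrete, here is a toy sketch. It is not PHANTOM's or any real simulator's design: the event format, the eager-send assumption, and the `latency + nbytes / bandwidth` cost model are all simplifications chosen for illustration.

```python
from collections import deque

def simulate(traces, latency, bandwidth):
    """Replay per-rank event traces and return each rank's finish time.

    Assumed event format (hypothetical):
      ("compute", seconds)    measured sequential computation time
      ("send", dst, nbytes)   eager, non-blocking send
      ("recv", src)           blocks until the matching message arrives
    Assumed network cost: latency + nbytes / bandwidth.
    """
    n = len(traces)
    clock = [0.0] * n   # simulated time per rank
    pc = [0] * n        # index of the next event per rank
    in_flight = {}      # (src, dst) -> deque of message arrival times
    progressed = True
    while progressed:
        progressed = False
        for rank in range(n):
            while pc[rank] < len(traces[rank]):
                ev = traces[rank][pc[rank]]
                if ev[0] == "compute":
                    clock[rank] += ev[1]
                elif ev[0] == "send":
                    _, dst, nbytes = ev
                    arrival = clock[rank] + latency + nbytes / bandwidth
                    in_flight.setdefault((rank, dst), deque()).append(arrival)
                else:  # "recv"
                    queue = in_flight.get((ev[1], rank))
                    if not queue:
                        break  # sender has not reached its send yet
                    clock[rank] = max(clock[rank], queue.popleft())
                pc[rank] += 1
                progressed = True
    if any(pc[r] < len(traces[r]) for r in range(n)):
        raise RuntimeError("trace deadlock: unmatched receive")
    return clock

# Two ranks exchanging one message each; all numbers are made up.
traces = [
    [("compute", 1.0), ("send", 1, 1000), ("recv", 1)],
    [("recv", 0), ("compute", 2.0), ("send", 0, 1000)],
]
finish = simulate(traces, latency=0.5, bandwidth=1000.0)
predicted = max(finish)  # predicted application execution time
print(finish, predicted)
```

The computation entries are where per-process times measured on a single target node would be plugged in, while the communication events come from a trace; the predicted execution time is the latest finish time across ranks.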

Published In

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2010
372 pages
ISBN:9781605588773
DOI:10.1145/1693453
  • ACM SIGPLAN Notices, Volume 45, Issue 5
    PPoPP '10
    May 2010
    346 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1837853
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deterministic replay
  2. parallel application
  3. performance prediction
  4. trace-driven simulation

Qualifiers

  • Research-article

Conference

PPoPP '10
Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
