research-article

FACT: fast communication trace collection for parallel applications through program slicing

Authors:

Weimin Zheng

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Article No.: 27, Pages 1 - 12

https://doi.org/10.1145/1654059.1654087

Published: 14 November 2009

Abstract

A proper understanding of the communication patterns of parallel applications is important for optimizing application performance and designing better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces must execute the entire parallel application on a full-scale system, which is time-consuming and expensive.

In this paper, we propose a novel technique, called Fact, which performs FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to a program slice through static analysis, and then to execute the program slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program that are relevant to the spatial and volume communication attributes. This idea rests on the observation that most computation and message contents in message-passing parallel applications are independent of these attributes, and can therefore be removed from the program for the purpose of communication trace collection.
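To make the slicing idea concrete, here is a minimal sketch in Python. It is not the algorithm used by Fact: it is a plain backward data-dependence slice over straight-line code, and the `Stmt` record, the `comm_uses` field, and the toy statements (`comm_rank`, `init_grid`, `relax`, `send`) are all hypothetical names introduced for illustration. The sketch assumes that only the peer and message-size arguments of a communication call matter, so the message buffer and the computation that fills it drop out of the slice.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One statement of a toy straight-line program (hypothetical model)."""
    text: str
    defs: set = field(default_factory=set)   # variables written
    uses: set = field(default_factory=set)   # variables read
    # Variables that fix the peer or the size of a message.
    # Non-empty only for communication calls.
    comm_uses: set = field(default_factory=set)


def slice_for_comm(program):
    """Keep communication calls and the statements their peer/size depend on."""
    live = set()   # variables still needed by statements already kept
    kept = []
    for stmt in reversed(program):
        if stmt.comm_uses:
            # Always keep the call; only its peer/size arguments become live,
            # so the message buffer is deliberately ignored.
            kept.append(stmt)
            live = (live - stmt.defs) | stmt.comm_uses
        elif stmt.defs & live:
            # This statement computes something a kept statement needs.
            kept.append(stmt)
            live = (live - stmt.defs) | stmt.uses
    kept.reverse()
    return kept


program = [
    Stmt("rank = comm_rank()", defs={"rank"}),
    Stmt("nprocs = comm_size()", defs={"nprocs"}),
    Stmt("n = 1024", defs={"n"}),
    Stmt("grid = init_grid(n)", defs={"grid"}, uses={"n"}),
    Stmt("grid = relax(grid)", defs={"grid"}, uses={"grid"}),
    Stmt("dest = (rank + 1) % nprocs", defs={"dest"}, uses={"rank", "nprocs"}),
    Stmt("send(grid, count=n, to=dest)",
         uses={"grid", "n", "dest"}, comm_uses={"n", "dest"}),
]

sliced = slice_for_comm(program)
for stmt in sliced:
    print(stmt.text)
```

In this toy program the two statements that build and update `grid` are dropped, while everything that determines the destination and the message size is retained. A real slicer would also have to handle control flow, procedure calls, and aliasing, none of which this sketch attempts.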

We have implemented Fact and evaluated it with the NPB programs and Sweep3D. The results show that Fact preserves the spatial and volume communication attributes of the original programs and reduces resource consumption by two orders of magnitude in most cases. For example, Fact collects the communication traces of Sweep3D for 512 processes on a 4-node (32-core) platform in just 6.79 seconds, consuming 1.25 GB of memory, while the original program takes 256.63 seconds and consumes 213.83 GB of memory on a 32-node (512-core) platform. Finally, we present an application of Fact.
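The Sweep3D figures above can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract; the variable names are ours, and the core-seconds ratio (elapsed time multiplied by core count) is an assumed way of combining time and machine size, not a metric taken from the paper.

```python
# Figures quoted in the abstract for Sweep3D with 512 processes.
orig_time_s, orig_mem_gb, orig_cores = 256.63, 213.83, 512
fact_time_s, fact_mem_gb, fact_cores = 6.79, 1.25, 32

# Simple ratios of original cost to Fact cost.
time_ratio = orig_time_s / fact_time_s
mem_ratio = orig_mem_gb / fact_mem_gb
core_ratio = orig_cores / fact_cores

# Assumed combined measure: elapsed seconds times cores occupied.
core_seconds_ratio = (orig_time_s * orig_cores) / (fact_time_s * fact_cores)

print(f"elapsed time:  {time_ratio:.1f}x lower")
print(f"memory:        {mem_ratio:.1f}x lower")
print(f"cores needed:  {core_ratio:.0f}x fewer")
print(f"core-seconds:  {core_seconds_ratio:.0f}x lower")
```

Under these assumptions, memory and core-seconds both fall by more than a factor of 100, which is consistent with the "two orders of magnitude" claim, while elapsed time alone falls by a smaller factor because the sliced run also uses far fewer cores.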

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

November 2009

778 pages

ISBN: 9781605587448

DOI: 10.1145/1654059

Conference Chair:
Wilfred Pinfold

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 20, 2009

Portland, Oregon

Acceptance Rates

SC '09 Paper Acceptance Rate: 59 of 261 submissions, 23%

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
