Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1654059.1654087acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

FACT: fast communication trace collection for parallel applications through program slicing

Published: 14 November 2009 Publication History

Abstract

A proper understanding of communication patterns of parallel applications is important to optimize application performance and design better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generate communication traces need to execute the entire parallel applications on full-scale systems that are time-consuming and expensive.
In this paper, we propose a novel technique, called Fact, which can perform FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to obtain a program slice through static analysis, and to execute the program slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program relevant to spatial and volume communication attributes. Our idea is based on an observation that most computation and message contents in message-passing parallel applications are independent of these attributes, and therefore can be removed from the programs for the purpose of communication trace collection.
We have implemented Fact and evaluated it with NPB programs and Sweep3D. The results show that Fact can preserve the spatial and volume communication attributes of original programs and reduce resource consumptions by two orders of magnitude in most cases. For example, Fact collects the communication traces of the Sweep3D for 512 processes on a 4-node (32 cores) platform in just 6.79 seconds, consuming 1.25 GB memory, while the original program takes 256.63 seconds and consumes 213.83 GB memory on a 32-node (512 cores) platform. Finally, we present an application of Fact.

References

[1]
A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: principles, techniques, and tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.
[2]
A. W. Appel. Modern Compiler Implementation in C: Basic Techniques. Cambridge University Press, New York, USA, 1997.
[3]
Argonne National Laboratory. MPICH2. http://www.mcs.anl.gov/research/projects/mpich2.
[4]
D. Bailey, T. Harris, W. Saphir, R. V. D. Wijngaart, A. Woo, and M. Yarrow. The NAS Parallel Benchmarks 2.0. NAS Systems Division, NASA Ames Research Center, Moffett Field, CA, 1995.
[5]
J. Banning. An efficient way to find side effects of procedure calls and aliases of variables. In POPL, pages 29--41, 1979.
[6]
D. Binkley. The application of program slicing to regression testing. Information and Software Technology, 40(11--12):583--594, 1998.
[7]
G. Bronevetsky. Communication-sensitive static dataflow for parallel message passing applications. In CGO, 2009.
[8]
H. Chen, W. G. Chen, J. Huang, B. Robert, and H. Kuhn. MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters. In ICS, 2006.
[9]
Y. Chen, S. Byna, X. Sun, R. Thakur, and W. Gropp. Hiding I/O latency with pre-execution prefetching for parallel applications. In SC'08, pages 1--10, 2008.
[10]
S. Chodnekar, V. Srinivasan, A. S. Vaidya, A. Sivasubramaniam, and C. R. Das. Towards a communication characterization methodology for parallel applications. In HPCA, 1997.
[11]
Z. Ding, R. Hoare, A. Jones, D. Li, S. Shao, S. Tung, J. Zheng, and R. Melhem. Switch design to enable predictive multiplexed switching. In IPDPS, page 100.1, 2005.
[12]
J. Duato, S. Yalamanchili, and L. Ni. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers, 2003.
[13]
A. Faraj and X. Yuan. Communication characteristics in the NAS parallel benchmarks. In International Conference on Parallel and Distributed Computing Systems, 2002.
[14]
J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.
[15]
K. B. Gallagher and J. R. Lyle. Using program slicing in software maintenance. IEEE Transactions on Software Engineering, 17(8):751--761, 1991.
[16]
M. Harman and S. Danicic. Using program slicing to simplify testing. Journal of Software Testing, Verification and Reliability, 5:143--162, 1995.
[17]
S. Ho and N. Lin. Static analysis of communication structures in parallel programs. In International Computer Symposium, 2002.
[18]
S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Trans. Program. Lang. Syst., 12(1):26--60, 1990.
[19]
Intel Ltd. Intel trace analyzer&collector. http://www.intel.com/cd/software/products/asmo-na/eng/244171.htm.
[20]
D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings. Predictive performance and scalability modeling of a large-scale application. In Supercomputing, pages 37--48, 2001.
[21]
J. Kim and D. J. Lilja. Characterization of communication patterns in message-passing parallel scientific application programs. In CANPC, pages 202--216, 1998.
[22]
J. Labarta, S. Girona, V. Pillet, T. Cortes, and L. Gregoris. DiP: A parallel program development environment. In Euro-Par'96, pages 665--674, 1996.
[23]
LLNL. ASCI purple benchmark. https://asc.llnl.gov/computing_resources/purple/archive/benchmarks.
[24]
B. Mohr and F. Wolf. KOJAK--A tool set for automatic performance analysis of parallel programs. In Euro-Par, 2003.
[25]
S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.
[26]
W. E. Nagel, A. Arnold, M. Weber, H. C. Hoppe, and K. Solchenbach. VAMPIR: Visualization and analysis of MPI resources. Supercomputer, 12(1), Jan. 1996.
[27]
Ohio State University. MVAPICH: MPI over infiniband and iWARP. http://mvapich.cse.ohio-state.edu.
[28]
R. Preissl, T. Köckerbauer, M. Schulz, D. Kranzlmüller, B. R. de Supinski, and D. J. Quinlan. Detecting patterns in MPI communication traces. In ICPP, pages 230--237, 2008.
[29]
R. Preissl, M. Schulz, D. Kranzlmüller, B. R. de Supinski, and D. J. Quinlan. Using MPI communication patterns to guide source code transformations. In ICCS, pages 253--260, 2008.
[30]
M. Schulz. Extracting critical path graphs from MPI applications. In CLUSTER, pages 1--10, 2005.
[31]
SGI. Open64 compiler and tools. http://www.open64.net.
[32]
S. Shao, A. K. Jones, and R. G. Melhem. A compiler-based communication analysis approach for multiprocessor systems. In IPDPS, 2006.
[33]
S. Shende and A. D. Malony. TAU: The tau parallel performance system. International Journal of High Performance Computing Applications, 20(2), 2006.
[34]
A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia, and A. Purkayastha. A framework for application performance modeling and prediction. In SC, pages 1--17, 2002.
[35]
M. M. Strout, B. Kreaseck, and P. Hovland. Data-flow analysis for MPI programs. In ICPP, pages 175--184, 2006.
[36]
Tsinghua University. Proof of live-propagation slicing algorithm. http://www.hpctest.org.cn/paper/Thu-HPC-TR20090717.pdf, 2009.
[37]
J. S. Vetter and M. O. McCracken. Statistical scalability analysis of communication operations in distributed applications. In PPoPP, pages 123--132, 2001.
[38]
J. S. Vetter and F. Mueller. Communication characteristics of large-scale scientific applications for contemporary cluster architectures. In IPDPS, pages 853--865, 2002.
[39]
M. Weiser. Programmers use slices when debugging. Communications of the ACM, 25(7):446--52, 1982.
[40]
M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10(4):352--357, 1984.
[41]
R. Xue, X. Liu, M. Wu, Z. Guo, W. Chen, W. Zheng, Z. Zhang, and G. M. Voelker. MPIWiz: subgroup reproducible replay of mpi applications. In PPoPP, pages 251--260, 2009.
[42]
R. Zamani and A. Afsahi. Communication characteristics of message-passing scientific and engineering applications. In International Conference on Parallel and Distributed Computing Systems, 2005.
[43]
J. Zhang, J. Zhai, W. Chen, and W. Zheng. Process mapping for mpi collective communications. In Euro-Par, 2009.

Cited By

View all
  • (2024)BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core SystemProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673131(262-272)Online publication date: 12-Aug-2024
  • (2023)Structure-Based Communication Trace CompressionPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_3(43-69)Online publication date: 19-Jun-2023
  • (2023)Fast Communication Trace CollectionPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_2(9-41)Online publication date: 19-Jun-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN:9781605587448
DOI:10.1145/1654059
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. communication pattern
  2. communication trace
  3. message passing program
  4. parallel application

Qualifiers

  • Research-article

Conference

SC '09
Sponsor:

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core SystemProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673131(262-272)Online publication date: 12-Aug-2024
  • (2023)Structure-Based Communication Trace CompressionPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_3(43-69)Online publication date: 19-Jun-2023
  • (2023)Fast Communication Trace CollectionPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_2(9-41)Online publication date: 19-Jun-2023
  • (2021)PredCom: A Predictive Approach to Collecting Approximated Communication TracesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.301112132:1(45-58)Online publication date: 1-Jan-2021
  • (2018)Parallel Application Performance Prediction Using Analysis Based Models and HPC SimulationsProceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3200921.3200937(49-59)Online publication date: 14-May-2018
  • (2018)P3S: A Methodology to Analyze and Predict Application ScalabilityIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2017.276314829:3(642-658)Online publication date: 1-Mar-2018
  • (2016)Performance Prediction for Large-Scale Parallel Applications Using Representative ReplayIEEE Transactions on Computers10.1109/TC.2015.247963065:7(2184-2198)Online publication date: 1-Jul-2016
  • (2016)Synthetic Signature Program for Performance ScalabilityParallel Processing and Applied Mathematics10.1007/978-3-319-32149-3_33(345-355)Online publication date: 2-Apr-2016
  • (2015)Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark GenerationACM SIGMETRICS Performance Evaluation Review10.1145/2796314.274587643:1(309-320)Online publication date: 15-Jun-2015
  • (2015)Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark GenerationProceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems10.1145/2745844.2745876(309-320)Online publication date: 15-Jun-2015
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media