DOI: 10.1145/3458817.3476151

Pilgrim: scalable and (near) lossless MPI tracing

Published: 13 November 2021

Abstract

Traces of MPI communications are used by many performance analysis and visualization tools. Storing exhaustive traces of large-scale MPI applications is infeasible due to their volume. Aggregated or lossy MPI traces are smaller, but provide much less information. In this paper, we present Pilgrim, a near-lossless MPI tracing tool that incurs moderate overhead and generates small trace files at large scales by using sophisticated compression techniques. Furthermore, for codes with regular communication patterns, Pilgrim can store their traces in constant space regardless of the problem size, the number of processors, and the number of iterations. In comparison with existing tools, Pilgrim preserves more information with less space in all the programs we tested.
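The constant-space claim for regular communication patterns can be illustrated with a toy loop-detecting encoder. This is a deliberately simplified sketch, not Pilgrim's actual algorithm (the paper's compression is far more sophisticated); the event names and the `compress_loop` helper are purely illustrative:

```python
# Toy sketch: if a trace is an exact repetition of a short block of MPI
# call events, store only (block, repeat_count). The stored size then
# depends on the pattern length, not on the number of iterations.
# NOTE: illustrative only; this is not Pilgrim's actual compression scheme.

def compress_loop(events):
    """Return (repeating block, repeat count) if the whole sequence is an
    exact repetition of its shortest period; otherwise return it verbatim."""
    n = len(events)
    for p in range(1, n + 1):
        if n % p == 0 and events == events[:p] * (n // p):
            return events[:p], n // p
    return events, 1

# A regular iterative pattern, repeated for many timesteps...
trace = [("MPI_Isend",), ("MPI_Irecv",), ("MPI_Waitall",)] * 100_000
block, count = compress_loop(trace)

# ...is stored as a 3-event block plus a counter, independent of the
# 100,000 iterations.
print(len(block), count)  # → 3 100000
```

Real traces also carry per-call arguments (ranks, tags, sizes) that vary across iterations, which is exactly what makes near-lossless compression at constant space a nontrivial contribution rather than simple run-length encoding.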

Supplementary Material

MP4 File (Pilgrim: Scalable and (Near) Lossless MPI Tracing.mp4)
Presentation video


Cited By

  • (2022) An Overhead Analysis of MPI Profiling and Tracing Tools. Proceedings of the 2nd Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn Strategy. DOI: 10.1145/3526063.3535353, pp. 5-13. Online publication date: 30-Jun-2022.


Published In
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. communication tracing
  2. lossless MPI tracing
  3. trace compression

Qualifiers

  • Research-article

Funding Sources

  • NSF OAC

Conference

SC '21

Acceptance Rates

Overall Acceptance Rate 1,447 of 6,132 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 72
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 18 Aug 2024

