DOI: 10.1145/3458817.3476151

Pilgrim: scalable and (near) lossless MPI tracing

Published: 13 November 2021

Abstract

Traces of MPI communications are used by many performance analysis and visualization tools. Storing exhaustive traces of large-scale MPI applications is infeasible due to their volume. Aggregated or lossy MPI traces are smaller, but provide much less information. In this paper, we present Pilgrim, a near-lossless MPI tracing tool that incurs moderate overhead and generates small trace files at large scales by using sophisticated compression techniques. Furthermore, for codes with regular communication patterns, Pilgrim can store their traces in constant space regardless of the problem size, the number of processors, and the number of iterations. In comparison with existing tools, Pilgrim preserves more information with less space in all the programs we tested.
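The constant-space claim for regular communication patterns can be illustrated with a toy loop-detecting encoder. This is a deliberately simplified sketch, not Pilgrim's actual algorithm (the paper's compression is far more sophisticated); the event names and the `compress_loop` helper are purely illustrative:

```python
# Toy sketch: if a trace is an exact repetition of a short block of MPI
# call events, store only (block, repeat_count). The stored size then
# depends on the pattern length, not on the number of iterations.
# NOTE: illustrative only; this is not Pilgrim's actual compression scheme.

def compress_loop(events):
    """Return (repeating block, repeat count) if the whole sequence is an
    exact repetition of its shortest period; otherwise return it verbatim."""
    n = len(events)
    for p in range(1, n + 1):
        if n % p == 0 and events == events[:p] * (n // p):
            return events[:p], n // p
    return events, 1

# A regular iterative pattern, repeated for many timesteps...
trace = [("MPI_Isend",), ("MPI_Irecv",), ("MPI_Waitall",)] * 100_000
block, count = compress_loop(trace)

# ...is stored as a 3-event block plus a counter, independent of the
# 100,000 iterations.
print(len(block), count)  # → 3 100000
```

Real traces also carry per-call arguments (ranks, tags, sizes) that vary across iterations, which is exactly what makes near-lossless compression at constant space a nontrivial contribution rather than simple run-length encoding.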

Supplementary Material

MP4 File (Pilgrim: Scalable and (Near) Lossless MPI Tracing.mp4)
Presentation video


Cited By

  • (2022) An Overhead Analysis of MPI Profiling and Tracing Tools. Proceedings of the 2nd Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn Strategy. DOI: 10.1145/3526063.3535353, pp. 5-13. Online publication date: 30-Jun-2022.


Published In
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. communication tracing
  2. lossless MPI tracing
  3. trace compression

Qualifiers

  • Research-article

Funding Sources

  • NSF OAC

Conference

SC '21

Acceptance Rates

Overall Acceptance Rate 1,447 of 6,132 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 72
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 18 Aug 2024

