DOI: 10.1145/3624062.3624207
Research article

Enabling Agile Analysis of I/O Performance Data with PyDarshan

Published: 12 November 2023

Abstract

    Modern scientific applications utilize numerous software and hardware layers to efficiently access data. This approach poses a challenge for I/O optimization because of the need to instrument and correlate information across those layers. The Darshan characterization tool seeks to address this challenge by providing efficient, transparent, and compact runtime instrumentation of many common I/O interfaces. It also includes command-line tools to generate actionable insights and summary reports. However, the extreme diversity of today’s scientific applications means that not all applications are well served by one-size-fits-all analysis tools.
    In this work we present PyDarshan, a Python-based library that enables agile analysis of I/O performance data. PyDarshan caters to both novice and advanced users by offering ready-to-use HTML reports as well as a rich collection of APIs to facilitate custom analyses. We present the design of PyDarshan and demonstrate its effectiveness in four diverse real-world analysis use cases.


Cited By

• (2024) ION: Navigating the HPC I/O Optimization Journey using Large Language Models. In Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 86–92. https://doi.org/10.1145/3655038.3665950. Online publication date: 8 July 2024.


Published In
            SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023, 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States



            Author Tags

            1. High-Performance Computing
            2. Input/Output
            3. Performance Analysis
            4. Storage

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            SC-W 2023

            Article Metrics

• Downloads (last 12 months): 85
• Downloads (last 6 weeks): 20

Reflects downloads up to 12 Aug 2024
