DOI: 10.1145/3624062.3624207
Research article

Enabling Agile Analysis of I/O Performance Data with PyDarshan

Published: 12 November 2023

Abstract

    Modern scientific applications utilize numerous software and hardware layers to efficiently access data. This approach poses a challenge for I/O optimization because of the need to instrument and correlate information across those layers. The Darshan characterization tool seeks to address this challenge by providing efficient, transparent, and compact runtime instrumentation of many common I/O interfaces. It also includes command-line tools to generate actionable insights and summary reports. However, the extreme diversity of today’s scientific applications means that not all applications are well served by one-size-fits-all analysis tools.
    In this work we present PyDarshan, a Python-based library that enables agile analysis of I/O performance data. PyDarshan caters to both novice and advanced users by offering ready-to-use HTML reports as well as a rich collection of APIs to facilitate custom analyses. We present the design of PyDarshan and demonstrate its effectiveness in four diverse real-world analysis use cases.


Cited By

• (2024) ION: Navigating the HPC I/O Optimization Journey using Large Language Models. In Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 86–92. https://doi.org/10.1145/3655038.3665950. Online publication date: 8 July 2024.


Published In
            SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023, 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States



            Author Tags

            1. High-Performance Computing
            2. Input/Output
            3. Performance Analysis
            4. Storage

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            SC-W 2023

            Article Metrics

• Downloads (last 12 months): 85
• Downloads (last 6 weeks): 20

Reflects downloads up to 12 Aug 2024
