Research article · Open access
DOI: 10.1145/3624062.3624141

REMORA Resource Monitor: Usability, Performance and User Interface Improvements

Published: 12 November 2023

Abstract

As modern HPC systems grow in complexity, assessing the performance of an application in a user’s job may require collecting metrics about various system components, such as processors, memory, file systems, and the network. Although monitoring tools exist that let system administrators and managers collect such information, end users often have limited access to the site’s monitoring system of choice, or may be overwhelmed by its interface and by the number of performance metrics to choose from. For novices and non-HPC experts, Remora (REsource Monitoring for Remote Applications) provides simple tools for quick diagnostic assessments of their jobs, as well as flexible tools that adapt to various types of workflows.
This paper presents the latest improvements to the Remora resource monitoring tool, a user-oriented, lightweight job-monitoring tool that offers a high-level profile of system resource utilization and provides insight into the efficiency of a user’s application through timeline and statistical visualizations and intuitive reports. In addition, two new tools are introduced: RemoraPy, a Python wrapper for Remora, and RP-Stats, a JupyterLab-based GUI for RemoraPy. These tools are being developed to broaden Remora’s capabilities in data collection, visualization, and analysis. The Remora Python API will also benefit the growing Python community by letting users include resource monitoring directly within their Python workflows.
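The abstract describes resource monitoring from inside a Python workflow but does not show RemoraPy’s interface. As a purely illustrative sketch of the general idea, and explicitly not RemoraPy’s actual API, the hypothetical `TimelineMonitor` class below samples the calling process’s CPU utilization at a fixed interval in a background thread, keeps the samples as a timeline, and reports simple summary statistics. It uses only the Python standard library; the class name, the `interval` parameter, and the choice of CPU utilization as the single metric are assumptions made for illustration.

```python
import statistics
import threading
import time


class TimelineMonitor:
    """Hypothetical illustration, NOT RemoraPy's API: periodically samples the
    current process's CPU utilization in a background thread and keeps the
    samples as a timeline plus simple summary statistics."""

    def __init__(self, interval=0.05):
        self.interval = interval  # sampling period in seconds (assumed default)
        self.samples = []  # timeline of (elapsed_seconds, cpu_utilization) pairs
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        start = last_wall = time.perf_counter()
        last_cpu = time.process_time()
        # Event.wait returns False on timeout, so this loops once per interval
        # until __exit__ sets the stop event.
        while not self._stop.wait(self.interval):
            wall, cpu = time.perf_counter(), time.process_time()
            # CPU seconds used by the whole process per wall-clock second
            # (1.0 is roughly one fully busy core).
            util = (cpu - last_cpu) / max(wall - last_wall, 1e-9)
            self.samples.append((wall - start, util))
            last_wall, last_cpu = wall, cpu

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False  # do not suppress exceptions from the monitored block

    def summary(self):
        values = [util for _, util in self.samples]
        if not values:
            return {"samples": 0}
        return {
            "samples": len(values),
            "min": min(values),
            "mean": statistics.fmean(values),
            "max": max(values),
        }


if __name__ == "__main__":
    with TimelineMonitor(interval=0.05) as mon:
        sum(i * i for i in range(2_000_000))  # CPU-bound phase
        time.sleep(0.2)  # idle phase
    print(mon.summary())
```

A job-level monitor would track many more components (the abstract lists processors, memory, file systems, and the network) across every node of a job; this sketch tracks only one process-level metric on one node, which is enough to show the timeline-plus-statistics pattern.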



Published In

ACM Other conferences
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. HPC tools
  2. resource utilization monitoring

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSF

Conference

SC-W 2023


Article Metrics

  • Total citations: 0
  • Total downloads: 161
  • Downloads (last 12 months): 161
  • Downloads (last 6 weeks): 26
Reflects downloads up to 22 Sep 2024
