DOI: 10.1145/2016741.2016759
research-article

Performance metrics and auditing framework for high performance computer systems

Published: 18 July 2011

Abstract

This paper describes XDMoD, a comprehensive auditing framework that lets high performance computing centers readily provide metrics on resource utilization (CPU hours, job size, wait time, etc.), resource performance, and the center's impact in terms of scholarship and research. This role-based auditing framework is designed to meet the following objectives: (1) provide the user community with an easy-to-use tool to oversee their allocations and optimize their use of resources, (2) provide staff with easy access to performance metrics and diagnostics to monitor and tune resource performance for the benefit of users, (3) provide senior management with a tool to easily monitor utilization, user base, and performance of resources, and (4) help ensure that the resources are effectively enabling research and scholarship. XDMoD is initially focused on the NSF TeraGrid (TG) and the follow-on XSEDE (XD) program, where it will become a key component of the TG/XSEDE User Portal. However, the auditing system is intended to be generally applicable to any HPC system or center.
The XDMoD auditing system is architected as a set of modular components that facilitate the use of community-contributed components. It includes an active and reactive (as opposed to passive) service set accessible through a variety of endpoints, such as a web-based user interface, RESTful web services, and provided development tools. One component is a computationally lightweight and flexible application kernel auditing system that uses best-in-class performance kernels to measure overall system performance with respect to the applications that users actually run. This allows continuous resource auditing to monitor all aspects of system performance, most critically from a completely user-centric point of view.

          Published In

          TG '11: Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery
          July 2011
          256 pages
          ISBN:9781450308885
          DOI:10.1145/2016741

          Sponsors

          • University of Illinois

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Conference

          TG'11: TeraGrid 2011
          July 18-21, 2011
          Salt Lake City, Utah
