DOI: 10.1145/2063348.2063352
research-article

Sustained systems performance monitoring at the U. S. Department of Defense high performance computing modernization program

Published: 12 November 2011

Abstract

The U. S. Department of Defense High Performance Computing Modernization Program (HPCMP) has implemented sustained systems performance (SSP) testing on the high performance computing systems in use at DoD Supercomputing Resource Centers. The intent is to monitor performance improvements from updates to the operating system, compiler suites, and numerical and communications libraries, and to monitor penalties arising from security patches. In practice, each system's workload is simulated by user application codes chosen to represent the HPCMP computational technical areas. Past successes include surfacing an imminent failure of an OST in a Cray XT3, an incomplete configuration of a scheduler update on an SGI Altix 4700, performance issues associated with a communications library update for a Linux Networx Advanced Technology Cluster, and intermittent resetting of Intel Nehalem cores from turbo mode to standard mode. This history demonstrates that SSP testing is critical to delivering the highest quality of service to HPCMP users.
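
The monitoring idea in the abstract can be illustrated with a minimal sketch. It is not the HPCMP procedure, which the abstract does not specify. It assumes that each application code is timed repeatedly before a system update and once after it, and that a run is flagged when its wall-clock time exceeds the median baseline by more than a chosen tolerance. The function name `flag_regressions`, the 5% tolerance, the code names, and all timings are hypothetical.

```python
from statistics import median


def flag_regressions(baseline, current, tolerance=0.05):
    """Return {code: relative_change} for codes that slowed by more than `tolerance`.

    baseline: maps a code name to a list of wall-clock times (seconds)
              recorded before the system update.
    current:  maps a code name to one wall-clock time recorded after it.
    """
    flagged = {}
    for code, base_times in baseline.items():
        if code not in current:
            continue  # code was not rerun after the update
        base = median(base_times)
        change = (current[code] - base) / base
        if change > tolerance:
            flagged[code] = change
    return flagged


# Hypothetical timings in seconds, invented for illustration only.
baseline = {"app_a": [100.0, 102.0, 98.0], "app_b": [250.0, 248.0, 252.0]}
current = {"app_a": 101.0, "app_b": 300.0}

flagged = flag_regressions(baseline, current)
for code, change in flagged.items():
    print(f"{code}: {change:.0%} slower than baseline")
```

With these invented numbers only `app_b` is flagged, since its time rose from a median of 250 s to 300 s. Using the median of several baseline runs is one way to limit the influence of a single noisy run; a production setup would likely also track variance over time.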


Published In

SC '11: State of the Practice Reports
November 2011
242 pages
ISBN: 9781450311397
DOI: 10.1145/2063348
Publisher

Association for Computing Machinery, New York, NY, United States
Author Tag

sustained systems performance