A Quantitative Analysis of Node Sharing on HPC Clusters Using XDMoD Application Kernels

Published: 17 July 2016

Abstract

In this investigation, we study how application performance is affected when jobs are permitted to share compute nodes. A series of application kernels consisting of a diverse set of benchmark calculations were run in both exclusive and node-sharing modes on the Center for Computational Research's high-performance computing (HPC) cluster. Very little increase in runtime was observed due to job contention among application kernel jobs run on shared nodes. The small differences in runtime were quantitatively modeled in order to characterize the resource contention and attempt to determine the circumstances under which it would or would not be important. A machine learning regression model applied to the runtime data successfully fitted the small differences between the exclusive and shared node runtime data; it also provided insight into the contention for node resources that occurs when jobs are allowed to share nodes. Analysis of a representative job mix shows that runtime of shared jobs is affected primarily by the memory subsystem, in particular by the reduction in the effective cache size due to sharing; this leads to higher utilization of DRAM. Insights such as these are crucial when formulating policies proposing node sharing as a mechanism for improving HPC utilization.
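The core comparison described above can be loosely illustrated in code. The sketch below is not the authors' pipeline (they fit a machine learning regression model to the runtime data); it only shows the simpler first step of quantifying, per application kernel, the relative slowdown of shared-node runs over exclusive-node runs. All kernel names and runtimes here are made-up example values.

```python
from statistics import mean

# Hypothetical runtimes in seconds for each application kernel, measured in
# both exclusive and node-sharing modes (numbers are illustrative only).
runtimes = {
    "NAMD":   {"exclusive": [100.0, 102.0, 101.0], "shared": [103.0, 104.0, 105.0]},
    "NWChem": {"exclusive": [200.0, 198.0, 202.0], "shared": [205.0, 207.0, 204.0]},
}

def relative_slowdown(kernel: str) -> float:
    """Mean shared-mode runtime divided by mean exclusive-mode runtime, minus 1.

    A value near zero indicates little contention from node sharing.
    """
    r = runtimes[kernel]
    return mean(r["shared"]) / mean(r["exclusive"]) - 1.0

for k in runtimes:
    print(f"{k}: {relative_slowdown(k):+.1%}")
```

With runtime differences this small (a few percent), attributing them to specific subsystems requires the kind of regression modeling the paper performs, using hardware counters such as cache and DRAM utilization as predictors.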




Published In

XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale
July 2016
405 pages
ISBN: 9781450347556
DOI: 10.1145/2949550
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. HPC
  2. SUPReMM
  3. TACC_Stats
  4. XDMoD
  5. node sharing
  6. performance co-pilot

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE16

Acceptance Rates

Overall Acceptance Rate 129 of 190 submissions, 68%


Cited By

  • (2024) Software Resource Disaggregation for HPC with Serverless Computing. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 139-156. DOI: 10.1109/IPDPS57955.2024.00021
  • (2023) Are We Ready for Broader Adoption of ARM in the HPC Community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems. Proceedings of the HPC Asia 2023 Workshops, pp. 78-86. DOI: 10.1145/3581576.3581618
  • (2021) Satori. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 292-305. DOI: 10.1109/ISCA52012.2021.00031
  • (2019) Spread-n-Share. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3295500.3356152
  • (2019) Contention Aware Workload and Resource Co-Scheduling on Power-Bounded Systems. 2019 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1-8. DOI: 10.1109/NAS.2019.8834721
  • (2019) Opportunities for Partitioning Non-volatile Memory DIMMs Between Co-scheduled Jobs on HPC Nodes. Euro-Par 2019: Parallel Processing Workshops, pp. 82-94. DOI: 10.1007/978-3-030-48340-1_7
  • (2018) Slurm Simulator. Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity, pp. 1-8. DOI: 10.1145/3219104.3219111
  • (2018) Tangram: Colocating HPC Applications with Oversubscription. 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC.2018.8547644
  • (2017) Co-locating Graph Analytics and HPC Applications. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 659-660. DOI: 10.1109/CLUSTER.2017.111
  • (2017) A Slurm Simulator: Implementation and Parametric Analysis. High Performance Computing Systems: Performance Modeling, Benchmarking, and Simulation, pp. 197-217. DOI: 10.1007/978-3-319-72971-8_10
