DOI: 10.1145/2907294.2907316

Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters

Published: 31 May 2016

Abstract

The economics of flash vs. disk storage is driving HPC centers to incorporate faster solid-state burst buffers into the storage hierarchy in exchange for smaller parallel file system (PFS) bandwidth. In systems with an underprovisioned PFS, avoiding I/O contention at the PFS level will become crucial to achieving high computational efficiency. In this paper, we propose novel batch job scheduling techniques that reduce such contention by integrating I/O awareness into scheduling policies such as EASY backfilling. We model the available bandwidth of links between each level of the storage hierarchy (i.e., burst buffers, I/O network, and PFS), and our I/O-aware schedulers use this model to avoid contention at any level in the hierarchy. We integrate our approach into Flux, a next-generation resource and job management framework, and evaluate the effectiveness and computational costs of our I/O-aware scheduling. Our results show that by reducing I/O contention for underprovisioned PFSes, our solution reduces job performance variability by up to 33% and decreases I/O-related utilization losses by up to 21%, which ultimately increases the amount of science performed by scientific workloads.
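
The abstract's central mechanism is checking for bandwidth headroom at every level of the storage hierarchy before a job is started or backfilled. The Python below is a minimal sketch of that idea, not the authors' Flux-based implementation; the class and method names (StorageLevel, IOAwareScheduler, can_start) and the bandwidth figures are invented for illustration.

# Sketch (illustrative, not the paper's Flux code): before starting or
# backfilling a job, require unreserved bandwidth for the job's expected
# I/O rate at every storage level (burst buffers, I/O network, PFS).

from dataclasses import dataclass

@dataclass
class StorageLevel:
    name: str
    total_bw: float           # aggregate bandwidth of this level (e.g., GB/s)
    reserved_bw: float = 0.0  # bandwidth already promised to running jobs

    def available(self) -> float:
        return self.total_bw - self.reserved_bw

@dataclass
class Job:
    jobid: int
    nodes: int
    io_bw: float  # peak I/O bandwidth the job is expected to drive

class IOAwareScheduler:
    """EASY-style admission test extended with an I/O-contention check."""

    def __init__(self, free_nodes: int, levels: list[StorageLevel]):
        self.free_nodes = free_nodes
        self.levels = levels

    def can_start(self, job: Job) -> bool:
        # Plain EASY backfilling would only test node availability; the
        # I/O-aware variant also requires headroom at every storage level.
        if job.nodes > self.free_nodes:
            return False
        return all(level.available() >= job.io_bw for level in self.levels)

    def start(self, job: Job) -> None:
        assert self.can_start(job)
        self.free_nodes -= job.nodes
        for level in self.levels:
            level.reserved_bw += job.io_bw

# Example: an underprovisioned PFS becomes the binding constraint.
sched = IOAwareScheduler(
    free_nodes=1024,
    levels=[
        StorageLevel("burst buffers", total_bw=1000.0),
        StorageLevel("I/O network", total_bw=500.0),
        StorageLevel("PFS", total_bw=100.0),
    ],
)
sched.start(Job(jobid=1, nodes=256, io_bw=80.0))
print(sched.can_start(Job(jobid=2, nodes=128, io_bw=40.0)))  # False: PFS headroom exhausted

In this toy run the cluster still has plenty of free nodes, but the second job is held back because the PFS link has only 20 GB/s of unreserved bandwidth left, which is the kind of contention avoidance the paper integrates into backfilling policies.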




    Information

    Published In

    HPDC '16: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
    May 2016
    302 pages
    ISBN:9781450343145
    DOI:10.1145/2907294
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 May 2016


    Author Tags

    1. high performance computing
    2. nonvolatile memory
    3. resource management
    4. scheduling algorithms

    Qualifiers

    • Research-article

    Funding Sources

    • NSF
    • LLNL

    Conference

    HPDC'16

    Acceptance Rates

    HPDC '16 paper acceptance rate: 20 of 129 submissions (16%)
    Overall acceptance rate: 166 of 966 submissions (17%)


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 253
    • Downloads (last 6 weeks): 35

    Reflects downloads up to 03 Feb 2025

    Cited By
    • (2025) User-based I/O Profiling for Leadership Scale HPC Workloads. Proceedings of the 26th International Conference on Distributed Computing and Networking. DOI: 10.1145/3700838.3700865, pp. 181-190. Online publication date: 4-Jan-2025.
    • (2024) Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1109/SC41406.2024.00031, pp. 1-15. Online publication date: 17-Nov-2024.
    • (2024) Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources. 2024 IEEE International Conference on Cluster Computing (CLUSTER). DOI: 10.1109/CLUSTER59578.2024.00033, pp. 297-309. Online publication date: 24-Sep-2024.
    • (2024) Improving I/O-aware Workflow Scheduling via Data Flow Characterization and trade-off Analysis. 2024 IEEE International Conference on Big Data (BigData). DOI: 10.1109/BigData62323.2024.10825855, pp. 3674-3681. Online publication date: 15-Dec-2024.
    • (2023) Leveraging extreme scale analytics, AI and digital twins for maritime digitalization: the VesselAI architecture. Frontiers in Big Data, vol. 6. DOI: 10.3389/fdata.2023.1220348. Online publication date: 27-Jul-2023.
    • (2023) IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy. The International Journal of High Performance Computing Applications, 37(3-4):213-228. DOI: 10.1177/10943420231175854. Online publication date: 15-May-2023.
    • (2023) Fine-grained Policy-driven I/O Sharing for Burst Buffers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1145/3581784.3607041, pp. 1-12. Online publication date: 12-Nov-2023.
    • (2023) I/O Burst Prediction for HPC Clusters Using Darshan Logs. 2023 IEEE 19th International Conference on e-Science (e-Science). DOI: 10.1109/e-Science58273.2023.10254871, pp. 1-10. Online publication date: 9-Oct-2023.
    • (2023) IO-Sets: Simple and Efficient Approaches for I/O Bandwidth Management. IEEE Transactions on Parallel and Distributed Systems, 34(10):2783-2796. DOI: 10.1109/TPDS.2023.3305028. Online publication date: Oct-2023.
    • (2023) Improving Progressive Retrieval for HPC Scientific Data using Deep Neural Network. 2023 IEEE 39th International Conference on Data Engineering (ICDE). DOI: 10.1109/ICDE55515.2023.00209, pp. 2727-2739. Online publication date: Apr-2023.
