DOI: 10.1145/2907294.2907316

Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters

Published: 31 May 2016

Abstract

The economics of flash vs. disk storage is driving HPC centers to incorporate faster solid-state burst buffers into the storage hierarchy in exchange for smaller parallel file system (PFS) bandwidth. In systems with an underprovisioned PFS, avoiding I/O contention at the PFS level will become crucial to achieving high computational efficiency. In this paper, we propose novel batch job scheduling techniques that reduce such contention by integrating I/O awareness into scheduling policies such as EASY backfilling. We model the available bandwidth of links between each level of the storage hierarchy (i.e., burst buffers, I/O network, and PFS), and our I/O-aware schedulers use this model to avoid contention at any level in the hierarchy. We integrate our approach into Flux, a next-generation resource and job management framework, and evaluate the effectiveness and computational costs of our I/O-aware scheduling. Our results show that by reducing I/O contention for underprovisioned PFSes, our solution reduces job performance variability by up to 33% and decreases I/O-related utilization losses by up to 21%, which ultimately increases the amount of science performed by scientific workloads.
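
The abstract's central mechanism is checking for bandwidth headroom at every level of the storage hierarchy before a job is started or backfilled. The Python below is a minimal sketch of that idea, not the authors' Flux-based implementation; the class and method names (StorageLevel, IOAwareScheduler, can_start) and the bandwidth figures are invented for illustration.

# Sketch (illustrative, not the paper's Flux code): before starting or
# backfilling a job, require unreserved bandwidth for the job's expected
# I/O rate at every storage level (burst buffers, I/O network, PFS).

from dataclasses import dataclass

@dataclass
class StorageLevel:
    name: str
    total_bw: float           # aggregate bandwidth of this level (e.g., GB/s)
    reserved_bw: float = 0.0  # bandwidth already promised to running jobs

    def available(self) -> float:
        return self.total_bw - self.reserved_bw

@dataclass
class Job:
    jobid: int
    nodes: int
    io_bw: float  # peak I/O bandwidth the job is expected to drive

class IOAwareScheduler:
    """EASY-style admission test extended with an I/O-contention check."""

    def __init__(self, free_nodes: int, levels: list[StorageLevel]):
        self.free_nodes = free_nodes
        self.levels = levels

    def can_start(self, job: Job) -> bool:
        # Plain EASY backfilling would only test node availability; the
        # I/O-aware variant also requires headroom at every storage level.
        if job.nodes > self.free_nodes:
            return False
        return all(level.available() >= job.io_bw for level in self.levels)

    def start(self, job: Job) -> None:
        assert self.can_start(job)
        self.free_nodes -= job.nodes
        for level in self.levels:
            level.reserved_bw += job.io_bw

# Example: an underprovisioned PFS becomes the binding constraint.
sched = IOAwareScheduler(
    free_nodes=1024,
    levels=[
        StorageLevel("burst buffers", total_bw=1000.0),
        StorageLevel("I/O network", total_bw=500.0),
        StorageLevel("PFS", total_bw=100.0),
    ],
)
sched.start(Job(jobid=1, nodes=256, io_bw=80.0))
print(sched.can_start(Job(jobid=2, nodes=128, io_bw=40.0)))  # False: PFS headroom exhausted

In this toy run the cluster still has plenty of free nodes, but the second job is held back because the PFS link has only 20 GB/s of unreserved bandwidth left, which is the kind of contention avoidance the paper integrates into backfilling policies.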




    Information

    Published In

    HPDC '16: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
    May 2016
    302 pages
    ISBN:9781450343145
    DOI:10.1145/2907294
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 May 2016


    Author Tags

    1. high performance computing
    2. nonvolatile memory
    3. resource management
    4. scheduling algorithms

    Qualifiers

    • Research-article

    Funding Sources

    • NSF
    • LLNL

    Conference

    HPDC'16

    Acceptance Rates

    HPDC '16 paper acceptance rate: 20 of 129 submissions (16%)
    Overall acceptance rate: 166 of 966 submissions (17%)


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 253
    • Downloads (last 6 weeks): 35

    Reflects downloads up to 03 Feb 2025

    Cited By
    • (2025) User-based I/O Profiling for Leadership Scale HPC Workloads. Proceedings of the 26th International Conference on Distributed Computing and Networking. DOI: 10.1145/3700838.3700865, pp. 181-190. Online publication date: 4-Jan-2025.
    • (2024) Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1109/SC41406.2024.00031, pp. 1-15. Online publication date: 17-Nov-2024.
    • (2024) Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources. 2024 IEEE International Conference on Cluster Computing (CLUSTER). DOI: 10.1109/CLUSTER59578.2024.00033, pp. 297-309. Online publication date: 24-Sep-2024.
    • (2024) Improving I/O-aware Workflow Scheduling via Data Flow Characterization and trade-off Analysis. 2024 IEEE International Conference on Big Data (BigData). DOI: 10.1109/BigData62323.2024.10825855, pp. 3674-3681. Online publication date: 15-Dec-2024.
    • (2023) Leveraging extreme scale analytics, AI and digital twins for maritime digitalization: the VesselAI architecture. Frontiers in Big Data, vol. 6. DOI: 10.3389/fdata.2023.1220348. Online publication date: 27-Jul-2023.
    • (2023) IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy. The International Journal of High Performance Computing Applications, 37(3-4):213-228. DOI: 10.1177/10943420231175854. Online publication date: 15-May-2023.
    • (2023) Fine-grained Policy-driven I/O Sharing for Burst Buffers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1145/3581784.3607041, pp. 1-12. Online publication date: 12-Nov-2023.
    • (2023) I/O Burst Prediction for HPC Clusters Using Darshan Logs. 2023 IEEE 19th International Conference on e-Science (e-Science). DOI: 10.1109/e-Science58273.2023.10254871, pp. 1-10. Online publication date: 9-Oct-2023.
    • (2023) IO-Sets: Simple and Efficient Approaches for I/O Bandwidth Management. IEEE Transactions on Parallel and Distributed Systems, 34(10):2783-2796. DOI: 10.1109/TPDS.2023.3305028. Online publication date: Oct-2023.
    • (2023) Improving Progressive Retrieval for HPC Scientific Data using Deep Neural Network. 2023 IEEE 39th International Conference on Data Engineering (ICDE). DOI: 10.1109/ICDE55515.2023.00209, pp. 2727-2739. Online publication date: Apr-2023.
