Fine-grained Policy-driven I/O Sharing for Burst Buffers

Authors:

Daniel S. Katz,

Zhao Zhang

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 95, Pages 1 - 12

https://doi.org/10.1145/3581784.3607041

Published: 11 November 2023

Abstract

A burst buffer is a common method to bridge the performance gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution, each of which significantly limits the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness, so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications based purely on real-time I/O behavior, without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair and size-fair, as well as composite policies such as group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5–13.7% higher I/O throughput and 19.5–40.4% lower performance variation than existing algorithms. For real applications, ThemisIO reduces the slowdown caused by I/O interference by 59.1–99.8%.
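The policy names and the statistical token idea in the abstract can be illustrated with a small sketch. This is not the ThemisIO implementation: the `Job` record, the three share functions, and `pick_next` are hypothetical names introduced here, and the share definitions are assumptions (size-fair taken as proportional to node count, user-fair as an equal split across users and then across each user's jobs, group-then-user-fair as an equal split across groups followed by user-fair within each group). The selection step draws the next job to serve at random, weighted by share, and considers only jobs that currently have pending requests, which is one simple way to model opportunity fairness.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; the field names are illustrative assumptions."""
    name: str
    user: str
    group: str
    nodes: int


def size_fair(jobs):
    # Assumed meaning: share proportional to the job's node count.
    total = sum(j.nodes for j in jobs)
    return {j.name: j.nodes / total for j in jobs}


def user_fair(jobs):
    # Assumed meaning: equal share per user, split evenly among that user's jobs.
    by_user = defaultdict(list)
    for j in jobs:
        by_user[j.user].append(j)
    return {
        j.name: 1 / len(by_user) / len(members)
        for members in by_user.values()
        for j in members
    }


def group_then_user_fair(jobs):
    # Assumed composite policy: equal share per group, then user-fair inside it.
    by_group = defaultdict(list)
    for j in jobs:
        by_group[j.group].append(j)
    shares = {}
    for members in by_group.values():
        for name, share in user_fair(members).items():
            shares[name] = share / len(by_group)
    return shares


def pick_next(jobs, pending, policy, rng):
    """Pick the job whose request is served next.

    Only jobs with pending requests compete, so the share of an idle job is
    redistributed to the active ones. The draw is random, weighted by share.
    """
    active = [j for j in jobs if pending.get(j.name, 0) > 0]
    if not active:
        return None
    shares = policy(active)
    names = list(shares)
    weights = [shares[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    jobs = [
        Job("a", "alice", "g1", 1),
        Job("b", "alice", "g1", 1),
        Job("c", "bob", "g1", 2),
        Job("d", "carol", "g2", 12),
    ]
    rng = random.Random(0)
    pending = {j.name: 1 for j in jobs}
    for policy in (size_fair, user_fair, group_then_user_fair):
        print(policy.__name__, policy(jobs), pick_next(jobs, pending, policy, rng))
```

Because the choice is statistical, each job's fraction of served requests approaches its share only over many draws; a single draw can go to any active job.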

Published In

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2023

1428 pages

ISBN:9798400701092

DOI:10.1145/3581784

Chair: Dorian Arnold
Program Chair: Rosa M Badia
Program Co-chair: Kathryn Mohror

Copyright © 2023 Owner/Author(s).

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC '23

Sponsor: SIGHPC

SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 2023

Denver, CO, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%