research-article

Pattern-driven parallel I/O tuning

Authors:

Marc Snir

PDSW '15: Proceedings of the 10th Parallel Data Storage Workshop

Pages 43 - 48

https://doi.org/10.1145/2834976.2834977

Published: 15 November 2015

Abstract

The contemporary parallel I/O software stack is complex because of the large number of configuration options available for tuning I/O performance. Without proper configuration, I/O becomes a performance bottleneck. As high-performance computing (HPC) moves toward exascale, poor I/O performance significantly affects the runtime of large-scale simulations that produce massive amounts of data. In this paper, we develop a framework that tunes parallel I/O configurations automatically. The auto-tuning framework first traces high-level I/O accesses and analyzes data write patterns. Based on these patterns and on tuning parameters recorded previously for similar patterns, the framework selects the best-performing configurations at runtime. If no history is available for a pattern, the framework initiates model-based training to obtain an efficient set of tuning parameters. The framework includes a runtime system that applies the selected configurations through dynamic linking, without changes to application source code. We describe the framework, evaluate it using multiple I/O kernels extracted from real applications, and demonstrate substantial I/O performance improvement.
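
The selection flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `WriteOp`, `extract_pattern`, `select_config`, and `toy_train` are hypothetical names, the pattern signature is a deliberately coarse assumption, the history lookup is exact-match rather than similarity-based, and the hint names `striping_factor` and `cb_nodes` are borrowed from common MPI-IO hints purely as example tunables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class WriteOp:
    """One traced high-level write (hypothetical record layout)."""
    rank: int
    offset: int
    size: int


# Assumed signature: (number of ranks, request size or -1 if mixed, layout label)
Pattern = Tuple[int, int, str]
Config = Dict[str, int]


def extract_pattern(trace: List[WriteOp]) -> Pattern:
    """Reduce a write trace to a coarse pattern signature."""
    ranks = {op.rank for op in trace}
    sizes = {op.size for op in trace}
    size = next(iter(sizes)) if len(sizes) == 1 else -1  # -1 marks mixed sizes
    ordered = sorted(trace, key=lambda op: op.offset)
    # Contiguous means each write ends exactly where the next one starts.
    contiguous = all(a.offset + a.size == b.offset
                     for a, b in zip(ordered, ordered[1:]))
    return (len(ranks), size, "contiguous" if contiguous else "noncontiguous")


def select_config(pattern: Pattern,
                  history: Dict[Pattern, Config],
                  train: Callable[[Pattern], Config]) -> Config:
    """Reuse a stored configuration if one exists; otherwise train and record."""
    config = history.get(pattern)
    if config is None:
        config = train(pattern)      # stand-in for model-based training
        history[pattern] = config    # remember the result for later runs
    return config


def toy_train(pattern: Pattern) -> Config:
    """Placeholder for training; the formulas here are arbitrary assumptions."""
    n_ranks, _, _ = pattern
    return {"striping_factor": min(n_ranks, 64), "cb_nodes": max(1, n_ranks // 4)}


if __name__ == "__main__":
    trace = [WriteOp(rank=r, offset=r * 1024, size=1024) for r in range(4)]
    history: Dict[Pattern, Config] = {}
    print(select_config(extract_pattern(trace), history, toy_train))
```

In this sketch the first run for a pattern pays the training cost and later runs with the same signature reuse the stored configuration. Applying the chosen values to the I/O stack through dynamic linking is outside the scope of the example.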

Published In

PDSW '15: Proceedings of the 10th Parallel Data Storage Workshop

November 2015

59 pages

ISBN:9781450340083

DOI:10.1145/2834976

Program Chairs: Ali Butt (Virginia Tech), Jay Lofstead (Sandia National Labs)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC15

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15, 2015

Austin, Texas

Acceptance Rates

PDSW '15 Paper Acceptance Rate 9 of 25 submissions, 36%;

Overall Acceptance Rate 17 of 41 submissions, 41%

Bibliometrics

Total Citations: 26
Total Downloads: 202
Downloads (Last 12 months): 31
Downloads (Last 6 weeks): 2

Reflects downloads up to 26 Sep 2024

Cited By

Jeannot E, Lemarinier P, Mercier G, Robert-Hayek S, Sartori R (2024). Application-Agnostic Auto-Tuning of Open MPI Collectives Using Bayesian Optimization. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 771-781. Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00141
Robert-Hayek S, Zertal S, Couvée P (2024). EVADyR: A New Dynamic Resampling Algorithm for Optimizing Noisy Expensive Systems. Metaheuristics and Nature Inspired Computing, pages 261-278. Online publication date: 15-Sep-2024
https://doi.org/10.1007/978-3-031-69257-4_19
Kim S, Sim A, Wu K, Byna S, Son Y (2023). Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis. Journal of Big Data, 10:1. Online publication date: 17-May-2023
https://doi.org/10.1186/s40537-023-00741-4
Bez J, Byna S, Ibrahim S (2023). I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys, 56:2, pages 1-41. Online publication date: 15-Sep-2023
https://dl.acm.org/doi/10.1145/3611007
Liu Z, Zhang C, Wu H, Fang J, Peng L, Ye G, Tang Z (2023). Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pages 234-246. Online publication date: 31-Oct-2023
https://doi.org/10.1109/CLUSTER52292.2023.00027
Li Y, Xiao L, Feng J, Zhang J, Zheng G, Yuan Y (2023). IOScout: an I/O Characteristics Prediction Method for the Supercomputer Jobs. 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), pages 205-210. Online publication date: 26-May-2023
https://doi.org/10.1109/CCAI57533.2023.10201270
Kim S, Sim A, Wu K, Byna S, Son Y (2022). Design and implementation of dynamic I/O control scheme for large scale distributed file systems. Cluster Computing, 25:6, pages 4423-4438. Online publication date: 30-Jul-2022
https://doi.org/10.1007/s10586-022-03640-0
Costa E, Patel T, Schwaller B, Brandt J, Tiwari D, de Supinski B, Hall M, Gamblin T (2021). Systematically inferring I/O performance variability by examining repetitive job behavior. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. Online publication date: 14-Nov-2021
https://dl.acm.org/doi/10.1145/3458817.3476186
Bağbaba A, Wang X (2021). Improving the MPI-IO Performance of Applications with Genetic Algorithm based Auto-tuning. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 798-805. Online publication date: Jun-2021
https://doi.org/10.1109/IPDPSW52791.2021.00118
Bagbaba A, Wang X, Niethammer C, Gracia J (2021). Improving the I/O Performance of Applications with Predictive Modeling based Auto-tuning. 2021 International Conference on Engineering and Emerging Technologies (ICEET), pages 1-6. Online publication date: 27-Oct-2021
https://doi.org/10.1109/ICEET53442.2021.9659711