DOI: 10.1145/2834976.2834977
research-article

Pattern-driven parallel I/O tuning

Published: 15 November 2015

Abstract

The contemporary parallel I/O software stack is complex because it exposes a large number of configuration parameters for tuning I/O performance. Without a proper configuration, I/O becomes a performance bottleneck. As high-performance computing (HPC) moves toward exascale, poor I/O performance significantly increases the runtime of large-scale simulations that produce massive amounts of data. In this paper, we develop a framework for tuning parallel I/O configurations automatically. This auto-tuning framework first traces high-level I/O accesses and analyzes data write patterns. Based on these patterns and on historically available tuning parameters for similar patterns, the framework selects the best-performing configurations at runtime. If no previous history is available for a pattern, the framework initiates model-based training to obtain an efficient set of tuning parameters. Our framework includes a runtime system that applies the selected configurations using dynamic linking, without requiring changes to application source code. We describe this framework, evaluate it using multiple I/O kernels extracted from real applications, and demonstrate substantial I/O performance improvement.
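The selection step described in the abstract (reuse tuning parameters from history when a pattern has been seen, otherwise fall back to training) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the pattern signature, the parameter names `stripe_count` and `stripe_size_mb`, and the `toy_train` placeholder are all assumptions, and the sketch matches patterns exactly, whereas the abstract speaks of similar patterns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pattern signature: (number of writers, bytes per write, contiguous?)
Pattern = Tuple[int, int, bool]
# Hypothetical tuning configuration: parameter name -> value
Config = Dict[str, int]


@dataclass
class TuningHistory:
    """Maps previously seen write patterns to their best-known configuration."""
    best: Dict[Pattern, Config] = field(default_factory=dict)

    def lookup(self, pattern: Pattern) -> Optional[Config]:
        # Exact-match lookup; a similarity search is out of scope for this sketch.
        return self.best.get(pattern)

    def record(self, pattern: Pattern, config: Config) -> None:
        self.best[pattern] = config


def select_config(
    pattern: Pattern,
    history: TuningHistory,
    train: Callable[[Pattern], Config],
) -> Config:
    """Return a stored configuration, or train one and remember it."""
    config = history.lookup(pattern)
    if config is None:
        # No history for this pattern: fall back to training.
        config = train(pattern)
        history.record(pattern, config)
    return config


def toy_train(pattern: Pattern) -> Config:
    """Placeholder for model-based training; the values are illustrative only."""
    writers, _, _ = pattern
    return {"stripe_count": min(writers, 64), "stripe_size_mb": 1}
```

In this sketch, calling `select_config` twice with the same pattern runs the training function only once; the second call is served from `TuningHistory`.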



Published In

PDSW '15: Proceedings of the 10th Parallel Data Storage Workshop
November 2015
59 pages
ISBN: 9781450340083
DOI: 10.1145/2834976

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC15

Acceptance Rates

PDSW '15 Paper Acceptance Rate 9 of 25 submissions, 36%;
Overall Acceptance Rate 17 of 41 submissions, 41%

Cited By

  • (2024) Application-Agnostic Auto-Tuning of Open MPI Collectives Using Bayesian Optimization. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 771-781. DOI: 10.1109/IPDPSW63119.2024.00141. Online publication date: 27-May-2024
  • (2024) EVADyR: A New Dynamic Resampling Algorithm for Optimizing Noisy Expensive Systems. Metaheuristics and Nature Inspired Computing, pages 261-278. DOI: 10.1007/978-3-031-69257-4_19. Online publication date: 15-Sep-2024
  • (2023) Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis. Journal of Big Data, 10:1. DOI: 10.1186/s40537-023-00741-4. Online publication date: 17-May-2023
  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys, 56:2, pages 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023
  • (2023) Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pages 234-246. DOI: 10.1109/CLUSTER52292.2023.00027. Online publication date: 31-Oct-2023
  • (2023) IOScout: an I/O Characteristics Prediction Method for the Supercomputer Jobs. 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), pages 205-210. DOI: 10.1109/CCAI57533.2023.10201270. Online publication date: 26-May-2023
  • (2022) Design and implementation of dynamic I/O control scheme for large scale distributed file systems. Cluster Computing, 25:6, pages 4423-4438. DOI: 10.1007/s10586-022-03640-0. Online publication date: 30-Jul-2022
  • (2021) Systematically inferring I/O performance variability by examining repetitive job behavior. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. DOI: 10.1145/3458817.3476186. Online publication date: 14-Nov-2021
  • (2021) Improving the MPI-IO Performance of Applications with Genetic Algorithm based Auto-tuning. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 798-805. DOI: 10.1109/IPDPSW52791.2021.00118. Online publication date: Jun-2021
  • (2021) Improving the I/O Performance of Applications with Predictive Modeling based Auto-tuning. 2021 International Conference on Engineering and Emerging Technologies (ICEET), pages 1-6. DOI: 10.1109/ICEET53442.2021.9659711. Online publication date: 27-Oct-2021
