Characterization and modeling of PIDX parallel I/O for performance optimization

Authors:

Sidharth Kumar,

Venkatram Vishwanath,

John A. Schmidt,

Giorgio Scorzelli,

Michael E. Papka,

Jacqueline Chen,

Valerio Pascucci

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 67, Pages 1 - 12

https://doi.org/10.1145/2503210.2503252

Published: 17 November 2013

Abstract

Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.
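The workflow outlined in the abstract (discard parameter combinations that violate a hard memory constraint, then rank the remaining ones with a learned performance model) can be illustrated with a minimal Python sketch. This is not the authors' model: the k-nearest-neighbour regressor, the `Config` fields, the even-split memory formula, and every number in `TRAINING_RUNS` are hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass
from itertools import product
from math import log2


@dataclass(frozen=True)
class Config:
    aggregators: int  # number of aggregator processes (tunable)
    files: int        # number of output files (tunable)


# Hypothetical (configuration, throughput in GiB/s) pairs.
# Invented placeholder values, NOT measurements from the paper.
TRAINING_RUNS = [
    (Config(64, 8), 1.0),
    (Config(128, 16), 2.0),
    (Config(256, 32), 3.0),
    (Config(512, 64), 2.5),
]


def distance(a, b):
    """Distance in log2 space, since the counts vary by powers of two."""
    return (abs(log2(a.aggregators) - log2(b.aggregators))
            + abs(log2(a.files) - log2(b.files)))


def predict_throughput(cfg, runs=TRAINING_RUNS, k=2):
    """Stand-in model: mean throughput of the k nearest training runs."""
    nearest = sorted(runs, key=lambda run: distance(cfg, run[0]))[:k]
    return sum(throughput for _, throughput in nearest) / len(nearest)


def buffer_bytes_per_aggregator(cfg, dataset_bytes):
    """Assumed memory model: the dataset is split evenly over aggregators."""
    return dataset_bytes / cfg.aggregators


def select_config(candidates, dataset_bytes, memory_limit_bytes):
    """Drop candidates that exceed the memory limit, then pick the one
    with the highest predicted throughput."""
    viable = [
        c for c in candidates
        if buffer_bytes_per_aggregator(c, dataset_bytes) <= memory_limit_bytes
    ]
    if not viable:
        raise ValueError("no configuration satisfies the memory constraint")
    return max(viable, key=predict_throughput)


if __name__ == "__main__":
    candidates = [
        Config(a, f)
        for a, f in product([64, 128, 256, 512], [8, 16, 32, 64])
    ]
    best = select_config(
        candidates,
        dataset_bytes=64 * 2**30,       # 64 GiB dataset (assumed)
        memory_limit_bytes=512 * 2**20  # 512 MiB per aggregator (assumed)
    )
    print(best, predict_throughput(best))
```

In this sketch the memory limit acts as a filter applied before any prediction is made, so the model is only ever asked to rank configurations that could actually run. Any regression model trained on real measurements could replace `predict_throughput` without changing the selection logic.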

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN:9781450323789

DOI:10.1145/2503210

General Chair: William Gropp, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Satoshi Matsuoka, Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13

Sponsors: SIGHPC, SIGARCH, IEEE-CS

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17 - 21, 2013

Denver, Colorado

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
