DOI: 10.1145/2503210.2503252
research-article

Characterization and modeling of PIDX parallel I/O for performance optimization

Published: 17 November 2013

Abstract

Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.
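The selection step described in the abstract can be illustrated with a minimal sketch: enumerate candidate parameter values, discard those that violate a hard memory constraint, and keep the configuration a performance model scores highest. Everything named here is an assumption made for illustration: the candidate values, the strategy names `strategy_a` and `strategy_b`, the even split of data across aggregators, and above all `predict_throughput`, which is an invented formula standing in for a trained model. None of it is taken from the paper or from the PIDX API.

```python
from itertools import product

# Hypothetical candidate values; real ranges depend on the machine and dataset.
AGGREGATOR_COUNTS = [8, 16, 32, 64, 128]
FILE_COUNTS = [1, 4, 16, 64]
STRATEGIES = ["strategy_a", "strategy_b"]  # placeholder names, not PIDX options


def predict_throughput(aggregators, files, strategy):
    """Stand-in for a trained performance model (arbitrary units).

    The formula is invented for illustration and is not from the paper.
    """
    # Assumed: gains grow up to 64 aggregators, then decline.
    base = min(aggregators, 64) * 0.05 - max(0, aggregators - 64) * 0.01
    # Assumed: a file-count sweet spot at 16.
    contention = 1.0 / (1.0 + abs(files - 16) / 16.0)
    bonus = 1.2 if strategy == "strategy_b" else 1.0
    return base * contention * bonus


def fits_in_memory(aggregators, total_bytes, mem_per_core_bytes):
    """Hard constraint, assuming data is split evenly across aggregators."""
    return total_bytes / aggregators <= mem_per_core_bytes


def select_parameters(total_bytes, mem_per_core_bytes):
    """Return the feasible (aggregators, files, strategy) with the best score."""
    grid = product(AGGREGATOR_COUNTS, FILE_COUNTS, STRATEGIES)
    feasible = [c for c in grid
                if fits_in_memory(c[0], total_bytes, mem_per_core_bytes)]
    if not feasible:
        raise ValueError("no configuration satisfies the memory constraint")
    return max(feasible, key=lambda c: predict_throughput(*c))


if __name__ == "__main__":
    gib = 2**30
    # 64 GiB of data with 2 GiB per core rules out fewer than 32 aggregators.
    print(select_parameters(64 * gib, 2 * gib))
```

In practice the stand-in scoring function would be replaced by a model fitted to measured runs on the target system; the feasibility filter is what keeps the search within the memory limit the abstract mentions.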


Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
General Chair: William Gropp
Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. I/O & network characterization
  2. performance modeling

Qualifiers

  • Research-article

Conference

SC13
Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) Configurable Algorithms for All-to-All Collectives. ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pp. 1-12. DOI: 10.23919/ISC.2024.10528936. Online publication date: May-2024
  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys 56:2, pp. 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023
  • (2023) Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 234-246. DOI: 10.1109/CLUSTER52292.2023.00027. Online publication date: 31-Oct-2023
  • (2022) SnuQS. Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1-13. DOI: 10.1145/3524059.3532375. Online publication date: 28-Jun-2022
  • (2021) Offloading the Training of an I/O Access Pattern Detector to the Cloud. 2021 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), pp. 15-19. DOI: 10.1109/SBAC-PADW53941.2021.00013. Online publication date: Oct-2021
  • (2019) I/O Scheduling Strategy for Periodic Applications. ACM Transactions on Parallel Computing 6:2, pp. 1-26. DOI: 10.1145/3338510. Online publication date: 23-Jul-2019
  • (2019) Optimizing I/O Performance of HPC Applications with Autotuning. ACM Transactions on Parallel Computing 5:4, pp. 1-27. DOI: 10.1145/3309205. Online publication date: 8-Mar-2019
  • (2019) Detecting I/O Access Patterns of HPC Workloads at Runtime. 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 80-87. DOI: 10.1109/SBAC-PAD.2019.00025. Online publication date: Oct-2019
  • (2019) Applying Machine Learning to Understand Write Performance of Large-scale Parallel Filesystems. 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), pp. 30-39. DOI: 10.1109/PDSW49588.2019.00008. Online publication date: Nov-2019
  • (2019) On server-side file access pattern matching. 2019 International Conference on High Performance Computing & Simulation (HPCS), pp. 217-224. DOI: 10.1109/HPCS48598.2019.9188092. Online publication date: Jul-2019
