research-article

A cost-intelligent application-specific data layout scheme for parallel file systems

Authors:

Xian-He Sun

HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing

Pages 37 - 48

https://doi.org/10.1145/1996130.1996138

Published: 08 June 2011

Abstract

I/O data access is a recognized performance bottleneck of high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease the performance bottleneck. These advanced file systems perform well on some applications but may not perform well on others. They have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the application-specific optimization principle, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and choose an appropriate layout policy for the application. A complex application may consist of different data access patterns. Averaging the data access patterns may not be the best solution for those complex applications that do not have a dominant pattern. We then further propose a hybrid data replication strategy for those applications, so that a file can have replications with different layout policies for the best performance. Theoretical analysis and experimental testing have been conducted to verify the newly proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and the application-specific data layout approach achieved up to 74% performance improvement for data-intensive applications.
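
The layout-selection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's cost model: the formula, the constants, and every name below (`Layout`, `Access`, `access_cost`, `choose_layout`, `hybrid_plan`) are hypothetical and chosen only to show the shape of the approach. The sketch assumes each request pays a fixed startup overhead per server piece plus a transfer time, sums that cost over an application's access trace, and picks the cheapest layout. The per-pattern variant mirrors the hybrid replication idea, where each access pattern is served by the replica whose layout suits it best.

```python
from dataclasses import dataclass

# All names and constants are hypothetical, for illustration only.
STARTUP_S = 0.001        # assumed fixed overhead per server piece (seconds)
BANDWIDTH_BPS = 100e6    # assumed per-server transfer rate (bytes/second)
NUM_SERVERS = 4          # assumed number of storage servers


@dataclass(frozen=True)
class Layout:
    name: str
    servers_per_request: int  # how many servers one request is striped across


@dataclass(frozen=True)
class Access:
    size_bytes: int   # bytes moved by one request
    concurrency: int  # processes issuing such a request at the same time


def access_cost(layout: Layout, access: Access) -> float:
    """Estimated seconds for one round of concurrent requests (toy model)."""
    k = layout.servers_per_request
    # Each request is split into k pieces; pieces queue up on the servers.
    pieces_per_server = max(1.0, access.concurrency * k / NUM_SERVERS)
    piece_time = STARTUP_S + (access.size_bytes / k) / BANDWIDTH_BPS
    return pieces_per_server * piece_time


def total_cost(layout: Layout, trace: list) -> float:
    """Overall I/O cost of an application: sum over its access trace."""
    return sum(access_cost(layout, a) for a in trace)


def choose_layout(layouts: list, trace: list) -> Layout:
    """Pick the single layout with the lowest total estimated cost."""
    return min(layouts, key=lambda layout: total_cost(layout, trace))


def hybrid_plan(layouts: list, trace: list) -> list:
    """Pick the cheapest layout separately for each access pattern."""
    return [choose_layout(layouts, [a]).name for a in trace]


LAYOUTS = [Layout("single-server", 1), Layout("striped", NUM_SERVERS)]
SMALL_CONCURRENT = Access(size_bytes=4096, concurrency=16)
LARGE_SERIAL = Access(size_bytes=64 * 1024 * 1024, concurrency=1)
```

Under these assumed constants, many small concurrent requests favor keeping each request on one server, a single large request favors striping, and a trace that mixes both has no layout that is best for every pattern, which is the situation the hybrid replication strategy targets.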

Published In

HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing

June 2011

296 pages

ISBN:9781450305525

DOI:10.1145/1996130

General Chair: Arthur "Barney" Maccabe, Oak Ridge National Lab, USA
Program Chair: Douglas Thain, University of Notre Dame, USA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

University of Arizona
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HPDC '11

HPDC '11: The 20th International Symposium on High-Performance Parallel and Distributed Computing

June 8 - 11, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Total Citations: 45
Total Downloads: 299
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Bez J, Byna S, Ibrahim S (2023). I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys 56:2 (1-41). Online publication date: 15-Sep-2023.
https://dl.acm.org/doi/10.1145/3611007
Kunas C, Serpa M, Bez J, Padoin E, Navaux P (2021). Offloading the Training of an I/O Access Pattern Detector to the Cloud. 2021 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) (15-19). Online publication date: Oct-2021.
https://doi.org/10.1109/SBAC-PADW53941.2021.00013
He S, Li Z, Zhou J, Yin Y, Xu X, Chen Y, Sun X (2020). A Holistic Heterogeneity-Aware Data Placement Scheme for Hybrid Parallel I/O Systems. IEEE Transactions on Parallel and Distributed Systems (1-1). Online publication date: 2020.
https://doi.org/10.1109/TPDS.2019.2948901
Zhou J, Chen Y, Xie W, Dai D, He S, Wang W (2020). PRS: A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage. IEEE Transactions on Computers 69:4 (591-605). Online publication date: 1-Apr-2020.
https://doi.org/10.1109/TC.2019.2954089
Kang D, Rubel O, Byna S, Blanas S (2020). Predicting and Comparing the Performance of Array Management Libraries. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (906-915). Online publication date: May-2020.
https://doi.org/10.1109/IPDPS47924.2020.00097
He S, Yin Y, Sun X, Zhang X, Li Z (2019). Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication. IEEE Transactions on Computers (1-1). Online publication date: 2019.
https://doi.org/10.1109/TC.2019.2946135
Bez J, Boito F, Nou R, Miranda A, Cortes T, Navaux P (2019). Detecting I/O Access Patterns of HPC Workloads at Runtime. 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (80-87). Online publication date: Oct-2019.
https://doi.org/10.1109/SBAC-PAD.2019.00025
Boito F, Nou R, Pilla L, Bez J, Mehaut J, Cortes T, Navaux P (2019). On server-side file access pattern matching. 2019 International Conference on High Performance Computing & Simulation (HPCS) (217-224). Online publication date: Jul-2019.
https://doi.org/10.1109/HPCS48598.2019.9188092
Zhou J, Chen Y, Wang W, Evripidou S, Stenström P, O'Boyle M (2018). Attributed consistent hashing for heterogeneous storage systems. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-12). Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243202
Boito F, Inacio E, Bez J, Navaux P, Dantas M, Denneulin Y (2018). A Checkpoint of Research on Parallel I/O for High-Performance Computing. ACM Computing Surveys 51:2 (1-35). Online publication date: 12-Mar-2018.
https://dl.acm.org/doi/10.1145/3152891
