The Vesta parallel file system

Published: 01 August 1996

Abstract

The Vesta parallel file system is designed to provide parallel file access to application programs running on multicomputers with parallel I/O subsystems. Vesta uses a new abstraction of files: a file is not a sequence of bytes, but rather it can be partitioned into multiple disjoint sequences that are accessed in parallel. The partitioning, which can also be changed dynamically, reduces the need for synchronization and coordination during the access. Some control over the layout of data is also provided, so the layout can be matched with the anticipated access patterns. The system is fully implemented and forms the basis for the AIX Parallel I/O File System on the IBM SP2. The implementation does not compromise scalability or parallelism. In fact, all data accesses are done directly to the I/O node that contains the requested data, without any indirection or access to shared metadata. Disk mapping and caching functions are confined to each I/O node, so there is no need to keep data coherent across nodes. Performance measurements show good scalability with increased resources. Moreover, different access patterns are shown to achieve similar performance.
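A minimal Python sketch of the partitioned-file idea, assuming a file made of `num_cells` cells of `cell_len` units each, split into `num_subfiles` groups of consecutive cells. The `Partition` class, its `locate` method, and the round-robin order within a group are hypothetical choices for illustration; they do not reproduce Vesta's own partitioning parameters or interface. What the sketch shows is that each process can map offsets in its own subfile to file locations by local arithmetic, and because the subfiles are disjoint, no two processes touch the same data and no locking is needed between them. Repartitioning is only a change of view: the same cells are regrouped without moving any data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical view of a file as num_cells cells of cell_len units,
    split into num_subfiles subfiles of consecutive cells (illustrative,
    not Vesta's actual partitioning parameters)."""
    num_cells: int
    cell_len: int
    num_subfiles: int

    def __post_init__(self) -> None:
        # Assumption: every subfile gets the same number of whole cells.
        if self.num_subfiles <= 0 or self.num_cells % self.num_subfiles != 0:
            raise ValueError("num_cells must be a positive multiple of num_subfiles")

    @property
    def cells_per_subfile(self) -> int:
        return self.num_cells // self.num_subfiles

    @property
    def subfile_len(self) -> int:
        return self.cells_per_subfile * self.cell_len

    def locate(self, subfile: int, index: int) -> tuple[int, int]:
        """Map element `index` of `subfile` to (cell, offset within cell).

        Consecutive elements are striped round-robin over the subfile's own
        cells, so the mapping is pure arithmetic with no shared state.
        """
        if not 0 <= subfile < self.num_subfiles:
            raise IndexError("subfile out of range")
        if not 0 <= index < self.subfile_len:
            raise IndexError("index out of range")
        g = self.cells_per_subfile
        return subfile * g + index % g, index // g


def covers_file_exactly_once(part: Partition) -> bool:
    """True if the subfiles are pairwise disjoint and together cover the file."""
    seen = set()
    for j in range(part.num_subfiles):
        for i in range(part.subfile_len):
            loc = part.locate(j, i)
            if loc in seen:
                return False
            seen.add(loc)
    return len(seen) == part.num_cells * part.cell_len


# The same 8-cell file viewed as 4 subfiles, then repartitioned into 2.
four_way = Partition(num_cells=8, cell_len=4, num_subfiles=4)
two_way = Partition(num_cells=8, cell_len=4, num_subfiles=2)
print(four_way.locate(1, 5))  # (3, 2)
print(two_way.locate(1, 5))   # (5, 1)
```

The same logical index lands in a different cell under each view, yet both views cover every unit of the file exactly once, which is the property that lets processes work on their subfiles without coordinating.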

Cited By

  • (2023) An Adaptive Metadata Management Scheme Based on Deep Reinforcement Learning for Large-Scale Distributed File Systems. IEEE/ACM Transactions on Networking 31:6, 2840-2853. DOI 10.1109/TNET.2023.3266400. Online publication date: Dec-2023.
  • (2022) The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability. IEEE Transactions on Parallel and Distributed Systems 33:12, 3850-3869. DOI 10.1109/TPDS.2022.3170574. Online publication date: 1-Dec-2022.
  • (2022) LaMeta: An efficient locality aware metadata management technique for an ultra-large distributed storage system. Journal of King Saud University - Computer and Information Sciences 34:10, 8323-8335. DOI 10.1016/j.jksuci.2022.08.012. Online publication date: Nov-2022.

Reviews

David Michael Bowen

As multiprocessors become the norm, we need file systems that can take advantage of parallelism in the application. Typically, we have seen parallelism at the device level, with disk striping and redundant array of inexpensive disks (RAID) devices. This paper reports on a different approach to I/O parallelism. Instead of treating a file as a linear collection of bytes, the Vesta parallel file system considers a file as a two-dimensional structure, a linear array of cells where each cell is itself a linear array of bytes. Once the number of cells is specified, each individual process can access rows, columns, or rectangular subsections of the file as needed. The parallelism arises in the mapping of the cells onto the I/O processors. After the essence of the Vesta approach is explained, the authors discuss implementation details, such as caching strategies, concurrency issues, and storage of the file metadata; report performance results for a 16-node IBM SP1 platform; and conclude with a summary of the lessons learned. The performance results and the fact that the Vesta file system formed the basis for the AIX parallel I/O file system for the IBM SP2 suggest that the Vesta approach may be worth including in other systems.
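To make the two-dimensional view concrete, here is a minimal Python sketch of how a request for a rectangular region (a range of rows across a range of cells) could be split into per-I/O-node requests. It assumes, hypothetically, that cells are placed on I/O nodes round-robin starting from a `base_node`, and that a row is a fixed `row_bytes`-long slice of every cell; the review states neither detail. The functions `io_node_of` and `rectangle_requests` are illustrative names, not Vesta calls. Because placement is a pure function of the cell number, a client can send each piece straight to the node that holds it, consistent with the abstract's point that accesses need no indirection or shared metadata.

```python
from collections import defaultdict


def io_node_of(cell: int, num_io_nodes: int, base_node: int = 0) -> int:
    """Hypothetical placement rule: cells go to I/O nodes round-robin,
    starting at base_node. Any rule the client can evaluate locally works."""
    return (base_node + cell) % num_io_nodes


def rectangle_requests(
    row_start: int,
    row_stop: int,
    cell_start: int,
    cell_stop: int,
    row_bytes: int,
    num_io_nodes: int,
    base_node: int = 0,
) -> dict[int, list[tuple[int, int, int]]]:
    """Split rows [row_start, row_stop) of cells [cell_start, cell_stop)
    into per-node requests {node: [(cell, byte_offset, length), ...]}.

    Each cell is a linear array of bytes, and row r of a cell is assumed to
    be the bytes [r * row_bytes, (r + 1) * row_bytes). A rectangle is thus
    one contiguous byte range inside each selected cell.
    """
    if row_start >= row_stop or cell_start >= cell_stop:
        return {}
    offset = row_start * row_bytes
    length = (row_stop - row_start) * row_bytes
    requests: dict[int, list[tuple[int, int, int]]] = defaultdict(list)
    for cell in range(cell_start, cell_stop):
        node = io_node_of(cell, num_io_nodes, base_node)
        requests[node].append((cell, offset, length))
    return dict(requests)


# A column (one cell) versus a row (one slice of every cell), 4 I/O nodes.
print(rectangle_requests(0, 8, 3, 4, row_bytes=512, num_io_nodes=4))
# {3: [(3, 0, 4096)]}
print(rectangle_requests(5, 6, 0, 4, row_bytes=512, num_io_nodes=4))
# {0: [(0, 2560, 512)], 1: [(1, 2560, 512)], 2: [(2, 2560, 512)], 3: [(3, 2560, 512)]}
```

Under these assumptions a column is a single cell and becomes one request to one node, while a row touches every cell and is spread across all nodes; this spreading is where the parallelism described in the review comes from.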

Published In

ACM Transactions on Computer Systems, Volume 14, Issue 3
Aug. 1996
86 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/233557
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 1996
Published in TOCS Volume 14, Issue 3

Author Tags

  1. data partitioning
  2. parallel computing
  3. parallel file system

Qualifiers

  • Article

Bibliometrics

  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 8
Reflects downloads up to 01 Sep 2024.