HFS: a performance-oriented flexible file system based on building-block compositions

Published: 01 August 1997 Publication History
    Abstract

    The Hurricane File System (HFS) is designed for (potentially large-scale) shared-memory multiprocessors. Its architecture is based on the principle that, in order to maximize performance for applications with diverse requirements, a file system must support a wide variety of file structures, file system policies, and I/O interfaces. Files in HFS are implemented using simple building blocks composed in potentially complex ways. This approach yields great flexibility, allowing an application to customize the structure and policies of a file to exactly meet its requirements. As an extreme example, HFS allows a file's structure to be optimized for concurrent random-access write-only operations by 10 threads, something no other file system can do. Similarly, the prefetching, locking, and file cache management policies can all be chosen to match an application's access pattern. In contrast, most parallel file systems support a single file structure and a small set of policies. We have implemented HFS as part of the Hurricane operating system running on the Hector shared-memory multiprocessor. We demonstrate that the flexibility of HFS comes with little processing or I/O overhead. We also show that for a number of file access patterns, HFS is able to deliver to the applications the full I/O bandwidth of the disks on our system.

    Reviews

    Gerald David Chandler

    The title of this paper is an especially accurate one-line summary of the contents, omitting only that it is intended primarily for shared-memory multiprocessor systems. Based on the first author's 1994 thesis with an almost identical title, and augmented by later work, this paper describes how small components, some as short as a few lines, with well-defined interfaces, can be combined in multiple ways to provide file-processing algorithms that vary from file to file and even over time for the same file. This “made to measure” quality allows performance to be optimized according to whether the file is most often read or written; access is contiguous, striped, or sparse; locking is important or unimportant; or any other conceivable set of conditions. This is an interesting example of the programming organization that results from the use of well-defined objects with a well-defined communication protocol between them and that is applied at all levels.

    The HFS building-block model is based on three layers. At the bottom, there is a physical layer, which directly accesses disks. HFS includes predefined blocks for orthogonal handling of shape (for example, striped or contiguous), properties (such as size unit, parity, and read-write ratio), and locations (distribution and replication). Above the physical layer is what the authors call the logical layer, which could better be called the performance or system layer. This provides common disk-independent services: naming (directories), authentication, and locking. The highest, or application, layer provides an interface that is a superset of the standard Unix file I/O interface. It also provides building blocks for latency hiding and for compression and decompression.

    Data are given that show that this flexible multilayer system (extending to 11 layers in one example) does not significantly delay file access. The apparent reasons for this are that, while there are indeed losses due to interlayer communication, they are small because the protocols are chosen for their low overhead and the underlying interprocess communication facility of the authors' research operating system is very fast. What losses there are, are nearly compensated for by the elimination of conditional and other code from the specific blocks that are used for a particular file system (this shortens both execution time and loading time) and by the reduction of cross-address space communication. Interestingly, a related paper reports that a specially tailored compiler was able to generate efficient compositions of predefined blocks.

    The paper is clear but overly long. Some of the space given to repeated expositions of the same points could have been used, for example, to specify the standard interface for the pre-implemented physical layer building blocks (as given in Krieger's thesis).
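
    The building-block idea described in the review can be illustrated with a minimal sketch. It is not HFS code and does not reproduce the HFS interfaces: the class names (`BuildingBlock`, `ContiguousBlock`, `StripedBlock`, `LockingBlock`), the method signatures, and the in-memory "disks" are all assumptions made for illustration. The sketch only shows how blocks sharing one small interface can be stacked differently for different files.

```python
import threading
from abc import ABC, abstractmethod


class BuildingBlock(ABC):
    """Hypothetical common interface shared by every block (not the HFS API)."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class ContiguousBlock(BuildingBlock):
    """Stand-in for a physical-layer block: one 'disk' held in memory."""

    def __init__(self) -> None:
        self.store = bytearray()

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.store[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self.store):
            # Grow the simulated disk with zero bytes up to the write's end.
            self.store.extend(bytes(end - len(self.store)))
        self.store[offset:end] = data


class StripedBlock(BuildingBlock):
    """Distributes fixed-size stripe units round-robin over child blocks."""

    def __init__(self, children, unit: int) -> None:
        self.children = list(children)
        self.unit = unit

    def _pieces(self, offset: int, length: int):
        # Yield (child, offset within child, fragment length) per fragment.
        while length > 0:
            stripe, within = divmod(offset, self.unit)
            child = self.children[stripe % len(self.children)]
            child_offset = (stripe // len(self.children)) * self.unit + within
            n = min(self.unit - within, length)
            yield child, child_offset, n
            offset += n
            length -= n

    def read(self, offset: int, length: int) -> bytes:
        return b"".join(c.read(o, n) for c, o, n in self._pieces(offset, length))

    def write(self, offset: int, data: bytes) -> None:
        pos = 0
        for child, child_offset, n in self._pieces(offset, len(data)):
            child.write(child_offset, data[pos:pos + n])
            pos += n


class LockingBlock(BuildingBlock):
    """Stand-in for a logical-layer service: serializes access to the block below."""

    def __init__(self, inner: BuildingBlock) -> None:
        self.inner = inner
        self.lock = threading.Lock()

    def read(self, offset: int, length: int) -> bytes:
        with self.lock:
            return self.inner.read(offset, length)

    def write(self, offset: int, data: bytes) -> None:
        with self.lock:
            self.inner.write(offset, data)


# File A: locking on top of striping over two simulated disks.
disks = [ContiguousBlock(), ContiguousBlock()]
file_a = LockingBlock(StripedBlock(disks, unit=4))
file_a.write(0, b"abcdefghijkl")

# File B: a single contiguous block with no locking layer at all.
file_b = ContiguousBlock()
file_b.write(0, b"plain")
```

    In this sketch a file that needs neither striping nor locking simply omits those blocks, so it pays nothing for them. That mirrors, in a very simplified way, the review's point that removing unneeded conditional code from a composition offsets part of the interlayer overhead.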

    Published In

    ACM Transactions on Computer Systems, Volume 15, Issue 3
    Aug. 1997
    138 pages
    ISSN: 0734-2071
    EISSN: 1557-7333
    DOI: 10.1145/263326
    Editor: Ken Birman

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. customization
    2. data partitioning
    3. data replication
    4. flexibility
    5. parallel computing
    6. parallel file system
