research-article

Achieving Strong Consistency in a Distributed File System

Authors: Peter Triantafillou, Carl Neilson

IEEE Transactions on Software Engineering, Volume 23, Issue 1

Pages 35 - 55

https://doi.org/10.1109/32.581328

Published: 01 January 1997

Abstract

Distributed file systems must provide fault tolerance, which is typically achieved by replicating files. Existing approaches to constructing replicated file systems sacrifice strong semantics (i.e., the guarantees the system makes to running computations when failures occur and/or files are accessed concurrently), mainly for efficiency reasons. This paper puts forward a replicated file system protocol that enforces strong consistency semantics. Enforcing strong semantics allows distributed systems to behave more like their centralized counterparts, an essential feature for providing the transparency that distributed computing systems strive for. One fundamental characteristic of our protocol is its distributed nature. Because of it, the extra cost of ensuring stronger consistency is kept low: the bottleneck problem observed in primary-copy systems is avoided, load balancing is facilitated, clients can choose physically close servers, and the work required during failure handling and recovery is reduced. Another characteristic is that, instead of optimizing each operation type on its own, we view file system activity at the level of a file session, so that the costs of individual operations can be spread over the life of the session. We have developed a prototype and compared its performance with both NFS and a nonreplicated version of the prototype that also achieves strong consistency semantics. These comparisons show the cost of replication and the cost of enforcing strong consistency semantics.

Reviews

Reviewer: Jason Gait

The authors develop a protocol for replication in distributed filesystems. Each replica supports read and asynchronous write operations. Whole file caching is integrated with the protocol, and Unix semantics are supported. The cost of replication is confined to the open and close operations, update propagation occurs at close, and shared write access disables caching. Each file is served from one server, and servers exchange state information during open and change-of-server operations. The criteria for selecting a server are client proximity and availability. During the open, the selected server communicates with a majority of the other servers to exchange state. This makes open an expensive operation, especially since lookup is incorporated in open. The authors have confirmed this in their (incomplete) benchmarks, which group the servers within a LAN and benchmark in a flat filesystem. The authors believe that the purpose of replication is availability. In my opinion, however, the efficacy of replication in commercial replicated filesystems, such as AFS and DFS, is in load distribution and access locality in wide area environments. The authors' benchmarks indicate that the cost of replication (in a LAN and with a flat filesystem) is negligible. Their comparison to NFS-2 indicates a 20-to-1 penalty for their protocol. These results are even more unfavorable when we recall that NFS-2 does not support caching and requires synchronous writes.
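
The session-level behavior the review describes (state exchange with a majority at open, update propagation at close) can be illustrated with a minimal sketch. This is not the authors' protocol: the `Replica` class, the version counter, the `open_file` and `close_file` functions, and the choice to count the serving replica toward the majority are all assumptions made for illustration. The sketch only shows why two majority steps always overlap in at least one replica, so an open sees the most recent close.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica holding one whole file and a version number."""
    name: str
    version: int = 0
    data: bytes = b""
    up: bool = True


class QuorumError(Exception):
    """Raised when a majority of replicas cannot be reached."""


def majority(n: int) -> int:
    return n // 2 + 1


def _reachable(serving: Replica, replicas: list[Replica]) -> list[Replica]:
    # Assumption: the serving replica counts toward the majority.
    live = [r for r in replicas if r.up]
    if not serving.up or len(live) < majority(len(replicas)):
        raise QuorumError("majority of replicas not reachable")
    return live


def open_file(serving: Replica, replicas: list[Replica]) -> bytes:
    """Open: exchange state with a majority and adopt the newest version."""
    live = _reachable(serving, replicas)
    newest = max(live, key=lambda r: r.version)
    if newest.version > serving.version:
        serving.version, serving.data = newest.version, newest.data
    return serving.data


def close_file(serving: Replica, replicas: list[Replica], new_data: bytes) -> int:
    """Close: propagate the session's update to every reachable replica."""
    live = _reachable(serving, replicas)
    serving.version += 1
    serving.data = new_data
    for r in live:
        r.version, r.data = serving.version, serving.data
    return serving.version
```

Reads and writes between open and close would touch only the serving replica in this sketch, which is how the replication cost stays confined to the two session boundaries.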

Published In

IEEE Transactions on Software Engineering Volume 23, Issue 1

January 1997

61 pages

ISSN:0098-5589

Editor:
Richard A. Kemmerer
Univ. of California, Santa Barbara

Copyright © 1997 IEEE. All Rights Reserved.

Publisher

IEEE Press
