research-article

Achieving Strong Consistency in a Distributed File System

Published: 01 January 1997

Abstract

Distributed file systems need to provide fault tolerance, which is typically achieved by replicating files. Existing approaches to constructing replicated file systems sacrifice strong semantics (i.e., the guarantees the systems make to running computations when failures occur and/or files are accessed concurrently), mainly for efficiency reasons. This paper puts forward a replicated file system protocol that enforces strong consistency semantics. Enforcing strong semantics allows distributed systems to behave more like their centralized counterparts, an essential feature for providing the transparency that distributed computing systems strive for. One fundamental characteristic of our protocol is its distributed nature. Because of it, the extra cost of ensuring the stronger consistency is kept low: the bottleneck problem noticed in primary-copy systems is avoided, load balancing is facilitated, clients can choose physically close servers, and the work required during failure handling and recovery is reduced. Another characteristic is that, instead of optimizing each operation type on its own, we view file system activity at the level of a file session, so the costs of individual operations can be spread over the life of a file session. We have developed a prototype and compared its performance to both NFS and a nonreplicated version of the prototype that also achieves strong consistency semantics. These comparisons show the cost of replication and the cost of enforcing the strong consistency semantics.

Reviews

Jason Gait

The authors develop a protocol for replication in distributed filesystems. Each replica supports read and asynchronous write operations. Whole file caching is integrated with the protocol, and Unix semantics are supported. The cost of replication is confined to the open and close operations, update propagation occurs at close, and shared write access disables caching. Each file is served from one server, and servers exchange state information during open and change-of-server operations. The criteria for selecting a server are client proximity and availability. During the open, the selected server communicates with a majority of the other servers to exchange state. This makes open an expensive operation, especially since lookup is incorporated in open. The authors have confirmed this in their (incomplete) benchmarks, which group the servers within a LAN and benchmark in a flat filesystem. The authors believe that the purpose of replication is availability. In my opinion, however, the efficacy of replication in commercial replicated filesystems, such as AFS and DFS, is in load distribution and access locality in wide area environments. The authors' benchmarks indicate that the cost of replication (in a LAN and with a flat filesystem) is negligible. Their comparison to NFS-2 indicates a 20-to-1 penalty for their protocol. These results are even more unfavorable when we recall that NFS-2 does not support caching and requires synchronous writes.
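
The open and close behavior described in this review can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Replica` class, the `open_file` and `close_file` functions, the integer version numbers, and the rule that close also needs a reachable majority are all hypothetical choices made for the example. Caching, lookup, shared-write handling, and change-of-server are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica server storing whole files with version numbers."""
    name: str
    up: bool = True
    files: dict = field(default_factory=dict)  # path -> (version, data)

    def state(self, path):
        # An unknown file is treated as version 0 with empty contents (assumption).
        return self.files.get(path, (0, b""))

    def store(self, path, version, data):
        # Only ever move forward to a newer version.
        if version > self.state(path)[0]:
            self.files[path] = (version, data)


def majority(n):
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def reachable_majority(serving, replicas):
    """Return the reachable replicas, or raise if they do not form a majority."""
    reachable = [r for r in replicas if r.up]
    if not serving.up or len(reachable) < majority(len(replicas)):
        raise RuntimeError("no majority reachable; operation refused")
    return reachable


def open_file(serving, replicas, path):
    """Open: exchange state with a majority and adopt the newest version seen."""
    reachable = reachable_majority(serving, replicas)
    version, data = max((r.state(path) for r in reachable), key=lambda s: s[0])
    serving.store(path, version, data)
    return version, data


def close_file(serving, replicas, path, base_version, new_data):
    """Close: install the session's writes as a new version on reachable replicas."""
    reachable = reachable_majority(serving, replicas)
    new_version = base_version + 1
    for r in reachable:
        r.store(path, new_version, new_data)
    return new_version
```

In this sketch, any two majorities share at least one server, so an open that reaches a majority always sees the version installed by the most recent successful close. The replication cost falls on open and close only, which matches the review's summary; reads and writes inside a session would touch only the serving replica.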

Published In

IEEE Transactions on Software Engineering, Volume 23, Issue 1
January 1997
61 pages
ISSN: 0098-5589

Publisher

IEEE Press

Publication History

Published: 01 January 1997

Author Tags

  1. availability
  2. caching
  3. concurrency
  4. consistency semantics
  5. distributed file systems
  6. recovery
  7. replication

Qualifiers

  • Research-article

Cited By

  • (2019) Logically Clustered Architectures for Networked Databases. Distributed and Parallel Databases 10:2 (161-198). DOI: 10.1023/A:1019284429578. Online publication date: 1-Jun-2019
  • (2018) TBFR. International Journal of Information and Communication Technology 5:2 (97-121). DOI: 10.1504/IJICT.2013.053110. Online publication date: 19-Dec-2018
  • (2018) Distributed Context Retrieval and Consistency Control in Pervasive Computing. Journal of Network and Systems Management 15:1 (57-74). DOI: 10.1007/s10922-006-9053-6. Online publication date: 24-Dec-2018
  • (2010) Caching and Materialization for Web Databases. Foundations and Trends in Databases 2:3 (169-266). DOI: 10.1561/1900000005. Online publication date: 1-Mar-2010
  • (2005) Context modelling and management in ambient-aware pervasive environments. Proceedings of the First international conference on Location- and Context-Awareness (2-15). DOI: 10.1007/11426646_2. Online publication date: 12-May-2005
