research-article

Independent Recovery in Large-Scale Distributed Systems

Published: 01 November 1996 Publication History

Abstract

In large systems, replication can become an important means of improving data access times and availability. Existing recovery protocols, on the other hand, were proposed for small-scale distributed systems. Such protocols typically bring stale, newly recovered sites up to date with the replicated data and resolve the commit uncertainty of recovering sites. Because failures are more frequent and data accesses are costlier in large systems, such protocols can introduce large overheads there and should be avoided where possible. We call these protocols dependent recovery protocols, since they require a recovering site to consult other sites. Independent recovery has been studied in the context of one-copy systems and has been proven unattainable. This paper offers independent recovery protocols for large-scale systems with replicated data. It shows how the protocols can be incorporated into several well-known replication protocols and proves that the resulting protocols continue to ensure data consistency. The paper then addresses the issue of nonblocking atomic commitment, presenting mechanisms that can reduce both the overhead of termination protocols and the probability of blocking. Finally, the performance impact of the proposed recovery protocols is studied through simulation and analysis. The results show that the significant benefits of independent recovery can be obtained with a very small loss in data availability and a very small increase in the number of transaction aborts.


Cited By

  • (2000) Performance Modeling of Distributed and Replicated Databases. IEEE Transactions on Knowledge and Data Engineering 12:4 (645-672). DOI: 10.1109/69.868912. Online publication date: 1-Jul-2000
  • (1997) Achieving Strong Consistency in a Distributed File System. IEEE Transactions on Software Engineering 23:1 (35-55). DOI: 10.1109/32.581328. Online publication date: 1-Jan-1997

Published In

IEEE Transactions on Software Engineering  Volume 22, Issue 11
November 1996
62 pages
ISSN:0098-5589

Publisher

IEEE Press

Author Tags

  1. availability
  2. blocking atomic commitment
  3. concurrency control
  4. crash recovery
  5. distributed computing
  6. independent recovery
  7. replication
  8. transactions

Qualifiers

  • Research-article
