Seamless Paxos coordinators

Published: 01 June 2014

Abstract

The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovering the newly elected coordinator's state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem: its adoption entails uninterrupted execution of the application. Our solution removes the restriction that coordinator replacements are something to be avoided, allowing the application execution to be decoupled from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures. It is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures.

Published In

Cluster Computing, Volume 17, Issue 2
June 2014
432 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Consensus
  2. Failure detector
  3. Fault tolerance
  4. Paxos
  5. Replication

Qualifiers

  • Article
