Seamless Paxos coordinators

Authors:

Gustavo M. Vieira,

Islene C. Garcia,

Luiz E. Buzato

Cluster Computing, Volume 17, Issue 2

Pages 463 - 473

https://doi.org/10.1007/s10586-013-0264-9

Published: 01 June 2014

Abstract

The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to a temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovering the newly elected coordinator's state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem: its adoption entails uninterrupted execution of the application. Our solution removes the restriction that coordinator replacements are something to be avoided, decoupling the application's execution from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures, and it is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures.

Published In

Cluster Computing Volume 17, Issue 2

June 2014

432 pages

ISSN:1386-7857

Copyright © 2014 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States
