Strengthening Atomic Multicast for Partitioned State Machine Replication

Authors:

Leandro Pacheco,

Fernando Dotti,

Fernando Pedone

LADC '22: Proceedings of the 11th Latin-American Symposium on Dependable Computing

Pages 51 - 60

https://doi.org/10.1145/3569902.3569909

Published: 17 January 2023

Abstract

Partitioned state machine replication extends classical state machine replication with state partitioning (or sharding) to provide both fault tolerance and performance scalability. The crux of the technique is ordering client requests both within a partition, among the replicas that implement the partition, and across partitions, among all the replicas accessed by the request. To cope with the complexity of ordering requests, partitioned state machine replication can use atomic multicast, a communication abstraction that propagates requests reliably and consistently to one or more groups of replicas, where each replica group implements one partition. The paper revisits atomic multicast from the perspective of partitioned state machine replication and makes the following contributions. First, we show that if partitioned state machine replication is implemented using an atomic multicast with global total order, a strong order property, replicas still need to coordinate further during the execution of requests to ensure correctness. Second, we introduce a stronger version of atomic multicast that accounts for real-time dependencies between requests. Our proposed atomic multicast can be used to order requests within and across partitions so that replicas do not need to coordinate further to ensure linearizability. Third, we extend a well-known implementation of atomic multicast to ensure the stronger order property.
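
The first contribution can be made concrete with a small scenario. The sketch below is an illustration built for this page, not code or an example taken from the paper. It assumes two partitions `g1` and `g2`, single-partition writes `m1` (to `g1`) and `m2` (to `g2`), and a multi-partition read `m3` (to both). All names, including `has_cycle` and `delivery_edges`, are hypothetical. It also assumes that a global total order can be modelled as acyclicity of the union of the per-group delivery orders, and that a real-time dependency is an extra edge from a request that completed to one issued afterwards.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph given as (u, v) pairs has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for nxt in graph[n]:
            if color[nxt] == GREY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def delivery_edges(deliveries):
    """Turn per-group delivery sequences into 'delivered before' edges."""
    edges = []
    for seq in deliveries.values():
        edges.extend(zip(seq, seq[1:]))  # consecutive pairs are enough
    return edges


# Hypothetical run: g1 delivers the read m3 before the write m1,
# while g2 delivers the write m2 before the read m3.
deliveries = {"g1": ["m3", "m1"], "g2": ["m2", "m3"]}

# Assumed real-time dependency: m1 completed before m2 was issued.
real_time = [("m1", "m2")]

order_only = delivery_edges(deliveries)
print(has_cycle(order_only))              # False: m2, m3, m1 is a valid total order
print(has_cycle(order_only + real_time))  # True: m3 -> m1 -> m2 -> m3
```

In this run the deliveries admit the total order `m2`, `m3`, `m1`, yet `m3` observes the later write `m2` without the earlier write `m1`. Adding the real-time edge exposes the cycle, which is the kind of dependency the stronger order property is meant to rule out.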

Cited By

C. Alves, T. Idalino, and O. Mendizabal. 2024. Extending State Machine Replication through Composition. In Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing. 231–240. Online publication date: 26-Nov-2024. https://dl.acm.org/doi/10.1145/3697090.3697106

Index Terms

Strengthening Atomic Multicast for Partitioned State Machine Replication
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. Theory of computation
  1. Design and analysis of algorithms
    1. Distributed algorithms

Published In

LADC '22: Proceedings of the 11th Latin-American Symposium on Dependable Computing

November 2022

167 pages

ISBN:9781450397377

DOI:10.1145/3569902

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

CNPq - Brazil
CAPES - Brazil - PUCRS PrInt
FAPERGS - RS - Brazil

Conference

LADC 2022: Latin-American Symposium on Dependable Computing

November 21 - 24, 2022

Fortaleza/CE, Brazil
