research-article

A library for services transparent replication

Authors:

Paola Martins Pereira,

Fernando Luís Dotti,

Cristina Meinhardt,

Odorico Machado Mendizabal

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

Pages 268 - 275

https://doi.org/10.1145/3297280.3297308

Published: 08 April 2019

Abstract

State Machine Replication (SMR) is a well-known approach to developing fault-tolerant applications. Although it seems conceptually simple, building replicated state machines is not a trivial task. To correctly develop and deploy the replicated service (and auxiliary processes, e.g. Paxos roles), the developer has to be acquainted with the inner workings of the specific agreement protocol, instead of focusing on the service itself. In this work we propose a replication library that facilitates the development and deployment of fault-tolerant services and provides replication transparency to service builders. The library allows a base SMR to be deployed, on top of which new services can be registered at runtime. A service builder focuses on service implementation and registers the service with the base SMR to enjoy the benefits of replication. Besides separating the complexity of providing a replicated infrastructure from service implementation, the approach lets multiple services share the same consensus and replication infrastructure, amortizing its cost. According to our evaluation, this approach leads to higher overall throughput compared to the separate deployment of different SMRs over the same resources.
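
The idea of registering services at runtime on a shared base SMR can be illustrated with a small sketch. This is not the library's API, which the abstract does not describe: the names `BaseSMR`, `Replica`, `register`, and `invoke` are hypothetical, and a plain in-memory list stands in for the order that a consensus protocol such as Paxos would decide. The sketch only shows that service code contains no replication logic and that several services can share one ordered log.

```python
from typing import Any, Callable, Dict, List, Tuple

# A command is (kind, service name, payload); all names here are hypothetical.
Command = Tuple[str, str, Any]


class Replica:
    """One replica: applies commands deterministically, in log order."""

    def __init__(self) -> None:
        self.services: Dict[str, Any] = {}

    def apply(self, command: Command) -> Any:
        kind, name, payload = command
        if kind == "register":
            # The payload is a factory, so each replica builds its own instance.
            self.services[name] = payload()
            return None
        operation, args = payload
        return getattr(self.services[name], operation)(*args)


class BaseSMR:
    """Stand-in for the shared replicated infrastructure.

    Assumption: appending to a local list replaces the total order that a
    real deployment would obtain from a consensus protocol.
    """

    def __init__(self, n_replicas: int = 3) -> None:
        self.log: List[Command] = []
        self.replicas = [Replica() for _ in range(n_replicas)]

    def _submit(self, command: Command) -> Any:
        self.log.append(command)
        results = [replica.apply(command) for replica in self.replicas]
        return results[0]  # deterministic services yield identical results

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        # Registration is itself an ordered command, so it can happen at runtime.
        self._submit(("register", name, factory))

    def invoke(self, name: str, operation: str, *args: Any) -> Any:
        return self._submit(("invoke", name, (operation, args)))


# Two independent services, written without any replication code.
class Counter:
    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> int:
        self.value += n
        return self.value


class KeyValueStore:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> str:
        self.data[key] = value
        return value

    def get(self, key: str) -> Any:
        return self.data.get(key)


smr = BaseSMR()
smr.register("counter", Counter)
smr.register("kv", KeyValueStore)
smr.invoke("counter", "add", 5)
result = smr.invoke("counter", "add", 2)
smr.invoke("kv", "put", "x", "1")
```

Both services go through the same log here, which is the sharing that the abstract credits with amortizing the cost of consensus. A real implementation would also have to handle failures, recovery, and client communication, none of which this sketch attempts.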

Cited By

Alves C, Idalino T, Mendizabal O (2024). Extending State Machine Replication through Composition. Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing, pp. 231-240. DOI: 10.1145/3697090.3697106. Online publication date: 26-Nov-2024
https://dl.acm.org/doi/10.1145/3697090.3697106

Index Terms

A library for services transparent replication
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
      2. Software usability

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

April 2019

2682 pages

ISBN: 9781450359337

DOI: 10.1145/3297280

Conference Chairs:
Chih-Cheng Hung, Kennesaw State University, Marietta, Georgia
George A. Papadopoulos, University of Cyprus, Nicosia, Cyprus

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES)
Fundacao de Amparo a Pesquisa do Estado do RS (FAPERGS)

Conference

SAC '19

Sponsor:

SIGAPP

SAC '19: The 34th ACM/SIGAPP Symposium on Applied Computing

April 8 - 12, 2019

Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 1
Total Downloads: 91
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025
