DOI: 10.1145/3297280.3297308
research-article

A library for services transparent replication

Published: 08 April 2019

Abstract

State Machine Replication (SMR) is a well-known approach to developing fault-tolerant applications. Although it seems conceptually simple, building replicated state machines is not a trivial task. To develop and deploy a replicated service correctly (along with its auxiliary processes, e.g. Paxos roles), the developer has to be acquainted with the inner workings of the specific agreement protocol instead of focusing on the service itself. In this work we propose a replication library that facilitates the development and deployment of fault-tolerant services and provides replication transparency to service builders. The library allows a base SMR to be deployed, on top of which new services can be registered at runtime. A service builder focuses on the service implementation and registers the service with the base SMR to enjoy the benefits of replication. Besides separating the complexity of providing a replicated infrastructure from service implementation, this approach lets multiple services share the same consensus and replication infrastructure, allowing cost amortization. According to our evaluation, this approach leads to higher overall throughput than deploying separate SMRs over the same resources.
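
The idea of a base SMR that hosts services registered at runtime can be illustrated with a minimal sketch. It is not the library's API: every name below (`BaseReplica`, `FACTORIES`, the `_base` pseudo-service, the command tuple layout) is hypothetical, and the agreement protocol is replaced by a plain Python list standing in for an already-agreed, totally ordered log. The sketch only shows that, if registration is itself an ordered command, services written without any replication logic end up in the same state on every replica.

```python
from typing import Any, Dict, List, Tuple

# A command is (service_name, operation, argument). Hypothetical layout.
Command = Tuple[str, str, Any]


class KeyValueService:
    """Toy deterministic service; it contains no replication logic."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def apply(self, op: str, arg: Any) -> Any:
        if op == "put":
            key, value = arg
            self.data[key] = value
            return None
        if op == "get":
            return self.data.get(arg)
        raise ValueError(f"unknown operation: {op}")


class CounterService:
    """Second toy service sharing the same ordered log."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, op: str, arg: Any) -> Any:
        if op == "add":
            self.value += arg
            return self.value
        raise ValueError(f"unknown operation: {op}")


# Assumption: the service code is already available on every replica.
FACTORIES = {"kv": KeyValueService, "counter": CounterService}


class BaseReplica:
    """Applies an agreed log in order; consensus itself is out of scope here."""

    def __init__(self) -> None:
        self.services: Dict[str, Any] = {}
        self.applied = 0  # number of log entries already executed

    def execute(self, log: List[Command]) -> List[Any]:
        results = []
        for service, op, arg in log[self.applied:]:
            if service == "_base" and op == "register":
                # Registration is an ordered command like any other, so all
                # replicas create the service at the same log position.
                name, kind = arg
                self.services[name] = FACTORIES[kind]()
                results.append(None)
            else:
                results.append(self.services[service].apply(op, arg))
            self.applied += 1
        return results


# One shared log drives two services on three replicas.
log: List[Command] = [
    ("_base", "register", ("store", "kv")),
    ("store", "put", ("x", 1)),
    ("_base", "register", ("hits", "counter")),
    ("hits", "add", 5),
    ("store", "get", "x"),
    ("hits", "add", 2),
]
replicas = [BaseReplica() for _ in range(3)]
outputs = [replica.execute(log) for replica in replicas]
```

In this sketch both services consume entries from a single log, which mirrors the abstract's point about sharing one consensus and replication infrastructure; how the actual library orders commands and ships service code to replicas is not described in the abstract and is not modeled here.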

Cited By

  • (2024) Extending State Machine Replication through Composition. Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing, 231-240. DOI: 10.1145/3697090.3697106. Online publication date: 26-Nov-2024

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
April 2019
2682 pages
ISBN: 9781450359337
DOI: 10.1145/3297280

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. consensus protocol
  2. reliability
  3. state machine replication
  4. transparent replication

Qualifiers

  • Research-article

Funding Sources

  • Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES)
  • Fundacao de Amparo a Pesquisa do Estado do RS (FAPERGS)

Conference

SAC '19

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
