
Niobe: A practical replication protocol

Published: 25 February 2008
Abstract

    The task of consistently and reliably replicating data is fundamental in distributed systems, and numerous existing protocols are able to achieve such replication efficiently. When called on to build a large-scale enterprise storage system with built-in replication, we were therefore surprised to discover that no existing protocols met our requirements. As a result, we designed and deployed a new replication protocol called Niobe. Niobe is in the primary-backup family of protocols, and shares many similarities with other protocols in this family. But we believe Niobe is significantly more practical for large-scale enterprise storage than previously published protocols. In particular, Niobe is simple, flexible, has rigorously proven yet simply stated consistency guarantees, and exhibits excellent performance. Niobe has been deployed as the backend for a commercial Internet service; its consistency properties have been proved formally from first principles, and further verified using the TLA+ specification language. We describe the protocol itself, the system built to deploy it, and some of our experiences in doing so.




    Published In

    ACM Transactions on Storage  Volume 3, Issue 4
    February 2008
    156 pages
    ISSN:1553-3077
    EISSN:1553-3093
    DOI:10.1145/1326542

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 February 2008
    Accepted: 01 December 2007
    Received: 01 July 2007
    Published in TOS Volume 3, Issue 4


    Author Tags

    1. Replication
    2. enterprise storage

    Qualifiers

    • Research-article
    • Research
    • Pre-selected
