DOI: 10.1145/1629575.1629602
research-article

UpRight Cluster Services

Published: 11 October 2009

    Abstract

    The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, our BFT Zookeeper and BFT HDFS implementations perform comparably to the originals while providing additional robustness.
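    The abstract contrasts BFT with crash fault tolerance but gives no numbers. As a rough, hedged illustration of what that trade-off usually costs, the sketch below encodes the textbook replica bounds: majority-quorum crash-tolerant protocols such as Paxos need 2f + 1 replicas to survive f crashes, while PBFT-style Byzantine agreement needs 3f + 1. It also shows the common client-side rule of accepting a result only after f + 1 matching replies. These are generic bounds, not UpRight's own replica configuration, which the abstract does not state; the function names and the replica-id dictionary are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def crash_ft_replicas(f: int) -> int:
    """Replicas a majority-quorum protocol (e.g. Paxos) needs to survive f crashes."""
    return 2 * f + 1


def bft_replicas(f: int) -> int:
    """Replicas a PBFT-style agreement protocol needs to survive f Byzantine faults."""
    return 3 * f + 1


def accept_reply(replies: Dict[str, Hashable], f: int) -> Optional[Hashable]:
    """Client-side voting: accept a result only once f + 1 distinct replicas agree.

    `replies` maps a hypothetical replica id to the result that replica returned.
    With at most f faulty replicas, any f + 1 matching replies include at least
    one correct replica, so faulty replicas alone cannot forge the accepted value.
    """
    if not replies:
        return None
    # Most frequently reported value and how many distinct replicas reported it.
    value, count = Counter(replies.values()).most_common(1)[0]
    return value if count >= f + 1 else None


if __name__ == "__main__":
    # Compare replica counts for crash vs. Byzantine fault tolerance.
    for f in (1, 2):
        print(f, crash_ft_replicas(f), bft_replicas(f))
    # Three of four replicas agree, so with f = 1 the client accepts "ok".
    print(accept_reply({"r0": "ok", "r1": "ok", "r2": "bad", "r3": "ok"}, f=1))
```

    For f = 1 this gives 3 crash-tolerant replicas versus 4 Byzantine-tolerant ones, and the gap grows by one replica per additional tolerated fault; that extra hardware and messaging is one reason BFT is often viewed as costlier than crash fault tolerance.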


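    Cited By

    • (2024) Distributed Transaction Processing in Untrusted Environments. Companion of the 2024 International Conference on Management of Data, pp. 570-579. DOI: 10.1145/3626246.3654684. Published online 9 June 2024.
    • (2024) Research and Application of Multi-processor Fault-Tolerant Algorithms for China Space Station Full Digital Simulation Platform. Signal and Information Processing, Networking and Computers, pp. 200-208. DOI: 10.1007/978-981-97-2120-7_25. Published online 3 May 2024.
    • (2023) Joining Parallel and Partitioned State Machine Replication Models for Enhanced Shared Logging Performance. Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, pp. 90-99. DOI: 10.1145/3615366.3615422. Published online 16 October 2023.
    • (2023) Flexible Advancement in Asynchronous BFT Consensus. Proceedings of the 29th Symposium on Operating Systems Principles, pp. 264-280. DOI: 10.1145/3600006.3613164. Published online 23 October 2023.
    • (2023) Understanding Silent Data Corruptions in a Large Production CPU Population. Proceedings of the 29th Symposium on Operating Systems Principles, pp. 216-230. DOI: 10.1145/3600006.3613149. Published online 23 October 2023.
    • (2023) Dissecting BFT Consensus: In Trusted Components we Trust! Proceedings of the Eighteenth European Conference on Computer Systems, pp. 521-539. DOI: 10.1145/3552326.3587455. Published online 8 May 2023.
    • (2023) Making Intrusion Tolerance Accessible: A Cloud-Based Hybrid Management Approach to Deploying Resilient Systems. 2023 42nd International Symposium on Reliable Distributed Systems (SRDS), pp. 254-267. DOI: 10.1109/SRDS60354.2023.00033. Published online 25 September 2023.
    • (2023) Reliable Transactions in Serverless-Edge Architecture. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 301-314. DOI: 10.1109/ICDE55515.2023.00030. Published online April 2023.
    • (2023) Trustworthy Web Service-Business Activities (WSBA) using an Efficient Byzantine Fault Tolerance Algorithm. 2023 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-6. DOI: 10.1109/ICCCI56745.2023.10128413. Published online 23 January 2023.
    • (2023) Micro Replication. 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 123-137. DOI: 10.1109/DSN58367.2023.00024. Published online June 2023.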


    Published In

    ACM Conferences
    SOSP '09: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
    October 2009
    346 pages
    ISBN:9781605587523
    DOI:10.1145/1629575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Byzantine fault tolerance
    2. cluster services
    3. reliability

    Conference

    SOSP '09

    Acceptance Rates

    Overall Acceptance Rate 131 of 716 submissions, 18%
