research-article

SmartStream: towards efficient byzantine resilient data streaming through speculation and sharding

Published: 20 October 2021

    Abstract

    Data streaming platforms connect heterogeneous services through the publish-subscribe paradigm. Currently available platforms protect against crash faults but not against Byzantine faults such as arbitrary hardware faults and intrusions. State machine replication can provide this protection, but its higher resource requirements and more elaborate communication primitives usually result in higher overall complexity and a non-negligible performance degradation. Because data streaming operates on highly partitionable, append-only state, some of these performance losses can be counteracted by applying speculative execution and sharding. We show the effectiveness of these concepts in a prototype implementation that, under average system utilization, incurs only a reasonable degradation in system throughput and latency compared to state-of-the-art data streaming platforms like Apache Kafka, while providing stronger resilience guarantees.


    Published In

    ACM SIGAPP Applied Computing Review  Volume 21, Issue 3
    September 2021
    56 pages
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. byzantine fault tolerance
    2. message broker
    3. replication
    4. sharding
    5. speculation
    6. state machine
    7. streaming platform

    Qualifiers

    • Research-article
