Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Safe replication through bounded concurrency verification

Published: 24 October 2018 Publication History

Abstract

High-level data types are often associated with semantic invariants that must be preserved by any correct implementation. While having implementations enforce strong guarantees such as linearizability or serializability can often be used to prevent invariant violations in concurrent settings, such mechanisms are impractical in geo-distributed replicated environments, the platform of choice for many scalable Web services. To achieve high-availability essential to this domain, these environments admit various forms of weak consistency that do not guarantee all replicas have a consistent view of an application's state. Consequently, they often admit difficult-to-understand anomalous behaviors that violate a data type's invariants, but which are extremely challenging, even for experts, to understand and debug.
In this paper, we propose a novel programming framework for replicated data types (RDTs) equipped with an automatic (bounded) verification technique that discovers and fixes weak consistency anomalies. Our approach, implemented in a tool called Q9, involves systematically exploring the state space of an application executing on top of an eventually consistent data store, under an unrestricted consistency model but with a finite concurrency bound. Q9 uncovers anomalies (i.e., invariant violations) that manifest as finite counterexamples, and automatically generates repairs for such anamolies by selectively strengthening consistency guarantees for specific operations. Using Q9, we have uncovered a range of subtle anomalies in implementations of well-known benchmarks, and have been able to apply the repairs it mandates to effectively eliminate them. Notably, these benchmarks were written adopting best practices suggested to manage distributed replicated state (e.g., they are composed of provably convergent RDTs (CRDTs), avoid mutable state, etc.). While the safety guarantees offered by our technique are constrained by the concurrency bound, we show that in practice, proving bounded safety guarantees typically generalize to the unbounded case.

Supplementary Material

WEBM File (a164-kaki.webm)

References

[1]
Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. 2010. Fences in Weak Memory Models. In Proceedings of the 22Nd International Conference on Computer Aided Verification (CAV’10). Springer-Verlag, Berlin, Heidelberg, 258–272.
[2]
Peter Alvaro, Peter Bailis, Neil Conway, and Joseph M. Hellerstein. 2013. Consistency Without Borders. In Proceedings of the 4th Annual Symposium on Cloud Computing (SOCC ’13). ACM, New York, NY, USA, Article 23, 10 pages.
[3]
Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2014. Coordination Avoidance in Database Systems. Proc. VLDB Endow. 8, 3 (Nov. 2014), 185–196.
[4]
Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2015. Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA, 1327–1342.
[5]
P Bailis and A Ghodsi. 2013. Eventual consistency Today: Limitations, Extensions, and Beyond. Commun. ACM (2013).
[6]
Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2013. Bolt-on Causal Consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13). ACM, New York, NY, USA, 761–772.
[7]
Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. 2012. Probabilistically Bounded Staleness for Practical Partial Quorums. Proc. VLDB Endow. 5, 8 (April 2012), 776–787.
[8]
Valter Balegas, Nuno Preguiça, Rodrigo Rodrigues, Sérgio Duarte, Carla Ferreira, Mahsa Najafzadeh, and Marc Shapiro. 2015. Putting the Consistency back into Eventual Consistency. In Proceedings of the Tenth European Conference on Computer System (EuroSys ’15). Bordeaux, France. http://lip6.fr/Marc.Shapiro/papers/putting-consistency-back-EuroSys-2015.pdf
[9]
James Bornholt and Emina Torlak. 2017. Synthesizing Memory Models from Framework Sketches and Litmus Tests. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 467–481.
[10]
Ahmed Bouajjani, Constantin Enea, Rachid Guerraoui, and Jad Hamza. 2017. On Verifying Causal Consistency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 626–638.
[11]
Eric Brewer. 2000. Towards Robust Distributed Systems (Invited Talk). (2000).
[12]
Lucas Brutschy, Dimitar Dimitrov, Peter Müller, and Martin Vechev. 2017. Serializability for Eventual Consistency: Criterion, Analysis, and Applications. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 458–472.
[13]
Sebastian Burckhardt, Alexey Gotsman, Hongseok Yang, and Marek Zawirski. 2014. Replicated Data Types: Specification, Verification, Optimality. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 271–284.
[14]
Sebastian Burckhardt, Daan Leijen, Jonathan Protzenko, and Manuel Fähndrich. 2015. Global Sequence Protocol: A Robust Abstraction for Replicated Shared State. In Proceedings of the 29th European Conference on Object-Oriented Programming (ECOOP ’15). Prague, Czech Republic. http://research.microsoft.com/pubs/240462/gsp-tr-2015-2.pdf
[15]
Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90.
[16]
Seth Gilbert and Nancy Lynch. 2002. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services. SIGACT News 33, 2 (June 2002), 51–59.
[17]
Alexey Gotsman, Hongseok Yang, Carla Ferreira, Mahsa Najafzadeh, and Marc Shapiro. 2016. ’Cause I’m Strong Enough: Reasoning About Consistency Choices in Distributed Systems. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2016). ACM, New York, NY, USA, 371–384.
[18]
Shachar Itzhaky, Anindya Banerjee, Neil Immerman, Ori Lahav, Aleksandar Nanevski, and Mooly Sagiv. 2014. Modular Reasoning About Heap Paths via Effectively Propositional Formulas. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 385–396.
[19]
Jepsen 2018. (2018). https://jepsen.io/
[20]
Gowtham Kaki, Kartik Nagar, Mahsa Najafzadeh, and Suresh Jagannathan. 2017. Alone Together: Compositional Reasoning and Inference for Weak Isolation. Proc. ACM Program. Lang. 2, POPL, Article 27 (Dec. 2017), 34 pages.
[21]
Charles Killian, James W. Anderson, Ranjit Jhala, and Amin Vahdat. 2007. Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. In Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation (NSDI’07). USENIX Association, Berkeley, CA, USA, 18–18. http://dl.acm.org/citation.cfm?id=1973430. 1973448
[22]
Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Operating Systems Review 44, 2 (April 2010), 35–40.
[23]
Mohsen Lesani, Christian J. Bell, and Adam Chlipala. 2016. Chapar: Certified Causally Consistent Distributed Key-value Stores. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). ACM, New York, NY, USA, 357–370.
[24]
Madanlal Musuvathi and Shaz Qadeer. 2007. Iterative Context Bounding for Systematic Testing of Multithreaded Programs. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). ACM, New York, NY, USA, 446–455.
[25]
Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gerard Basler, Piramanayagam Arumuga Nainar, and Iulian Neamtiu. 2008. Finding and Reproducing Heisenbugs in Concurrent Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, USA, 267–280. http: //dl.acm.org/citation.cfm?id=1855741.1855760
[26]
Oded Padon, Jochen Hoenicke, Giuliano Losa, Andreas Podelski, Mooly Sagiv, and Sharon Shoham. 2017a. Reducing Liveness to Safety in First-order Logic. Proc. ACM Program. Lang. 2, POPL, Article 26 (Dec. 2017), 33 pages.
[27]
Oded Padon, Giuliano Losa, Mooly Sagiv, and Sharon Shoham. 2017b. Paxos Made EPR: Decidable Reasoning About Distributed Protocols. Proc. ACM Program. Lang. 1, OOPSLA, Article 108 (Oct. 2017), 31 pages.
[28]
Oded Padon, Kenneth L. McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham. 2016. Ivy: Safety Verification by Interactive Generalization. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 614–630.
[29]
Riak 2018. (2018). docs.basho.com/riak/kv/2.2.3/ Riak NoSQL Database.
[30]
RUBiS 2014. Rice University Bidding System. (2014). http://rubis.ow2.org/ Accessed: 2014-11-4 13:21:00.
[31]
M. Shapiro, A. Bieniusa, N. Preguiça, V. Balegas, and C. Meiklejohn. 2018. Just-Right Consistency: Reconciling Availability and Safety. (Jan. 2018). ArXiv e-prints.
[32]
Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011a. Conflict-free Replicated Data Types. In Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems (SSS’11). Springer-Verlag, Berlin, Heidelberg, 386–400. http://dl.acm.org/citation.cfm?id=2050613.2050642
[33]
Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011b. Conflict-Free Replicated Data Types. In Stabilization, Safety, and Security of Distributed Systems, Xavier Défago, Franck Petit, and Vincent Villain (Eds.). Lecture Notes in Computer Science, Vol. 6976. Springer Berlin Heidelberg, 386–400.
[34]
Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011c. Conflict-Free Replicated Data Types. In Stabilization, Safety, and Security of Distributed Systems, Xavier Défago, Franck Petit, and Vincent Villain (Eds.). Lecture Notes in Computer Science, Vol. 6976. Springer Berlin Heidelberg, 386–400.
[35]
KC Sivaramakrishnan, Gowtham Kaki, and Suresh Jagannathan. 2015. Declarative Programming over Eventually Consistent Data Stores. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). ACM, New York, NY, USA, 413–424.
[36]
Swaminathan Sivasubramanian. 2012. Amazon dynamoDB: A Seamlessly Scalable Non-relational Database Service. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA, 729–730.
[37]
Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. 2011. Transactional Storage for Geo-replicated Systems. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP ’11). ACM, New York, NY, USA, 385–400.
[38]
Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike Spreitzer, Marvin Theimer, and Brent W. Welch. 1994. Session Guarantees for Weakly Consistent Replicated Data. In Proceedings of the Third International Conference on Parallel and Distributed Information Systems (PDIS ’94). IEEE Computer Society, Washington, DC, USA, 140–149. http://dl.acm.org/ citation.cfm?id=645792.668302
[39]
D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. 1995. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP ’95). ACM, New York, NY, USA, 172–182.
[40]
TPC 2018. (2018). http://www.tpc.org/information/benchmarks.asp TPC Benchmarks.
[41]
Twissandra 2014. Twitter clone on Cassandra. (2014). http://twissandra.com/ Accessed: 2014-11-4 13:21:00.
[42]
Paolo Viotti and Marko Vukolic. 2015. Consistency in Non-Transactional Distributed Storage Systems. CoRR abs/1512.00168 (2015). http://arxiv.org/abs/1512.00168
[43]
Voldemort 2009. (2009). http://www.project-voldemort.com/voldemort/design.html Voldemort Distributed Database.
[44]
James R. Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Thomas Anderson. 2015. Verdi: A Framework for Implementing and Formally Verifying Distributed Systems. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). ACM, New York, NY, USA, 357–368.

Cited By

View all
  • (2025)Consistent Local-First Software: Enforcing Safety and Invariants for Local-First ApplicationsIEEE Transactions on Software Engineering10.1109/TSE.2024.347772351:1(53-65)Online publication date: Jan-2025
  • (2024)IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store ApplicationsProceedings of the ACM on Programming Languages10.1145/36563918:PLDI(343-367)Online publication date: 20-Jun-2024
  • (2024)LoRe: A Programming Model for Verifiably Safe Local-first SoftwareACM Transactions on Programming Languages and Systems10.1145/363376946:1(1-26)Online publication date: 15-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 2, Issue OOPSLA
November 2018
1656 pages
EISSN:2475-1421
DOI:10.1145/3288538
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2018
Published in PACMPL Volume 2, Issue OOPSLA

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. CRDTs
  2. Concurrency
  3. Model Checking
  4. Replication
  5. Weak Consistency

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)121
  • Downloads (Last 6 weeks)18
Reflects downloads up to 08 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Consistent Local-First Software: Enforcing Safety and Invariants for Local-First ApplicationsIEEE Transactions on Software Engineering10.1109/TSE.2024.347772351:1(53-65)Online publication date: Jan-2025
  • (2024)IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store ApplicationsProceedings of the ACM on Programming Languages10.1145/36563918:PLDI(343-367)Online publication date: 20-Jun-2024
  • (2024)LoRe: A Programming Model for Verifiably Safe Local-first SoftwareACM Transactions on Programming Languages and Systems10.1145/363376946:1(1-26)Online publication date: 15-Jan-2024
  • (2023)Dynamic Partial Order Reduction for Checking Correctness against Transaction Isolation LevelsProceedings of the ACM on Programming Languages10.1145/35912437:PLDI(565-590)Online publication date: 6-Jun-2023
  • (2022)Katara: synthesizing CRDTs with verified liftingProceedings of the ACM on Programming Languages10.1145/35633366:OOPSLA2(1349-1377)Online publication date: 31-Oct-2022
  • (2022)Certified mergeable replicated data typesProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523735(332-347)Online publication date: 9-Jun-2022
  • (2022)Hamband: RDMA replicated data typesProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523426(348-363)Online publication date: 9-Jun-2022
  • (2022)Automated replication of tuple spaces via static analysisScience of Computer Programming10.1016/j.scico.2022.102863223(102863)Online publication date: Nov-2022
  • (2021)MonkeyDB: effectively testing correctness under weak isolation levelsProceedings of the ACM on Programming Languages10.1145/34855465:OOPSLA(1-27)Online publication date: 15-Oct-2021
  • (2021)ECROs: building global scale systems from sequential codeProceedings of the ACM on Programming Languages10.1145/34854845:OOPSLA(1-30)Online publication date: 15-Oct-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media