Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2737924.2737958acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article
Open access

Verdi: a framework for implementing and formally verifying distributed systems

Published: 03 June 2015 Publication History

Abstract

Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi's utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.

References

[1]
Amazon. Summary of the Amazon EC2 and Amazon RDS service disruption in the US East Region, Apr. 2011. http://aws.amazon.com/ message/65648/.
[2]
S. Bishop, M. Fairbairn, M. Norrish, P. Sewell, M. Smith, and K. Wansbrough. Engineering with logic: HOL specification and symbolic-evaluation testing for TCP implementations. In Proceedings of the 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 55–66, Charleston, SC, Jan. 2006.
[3]
T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the 26th ACM SIGACTSIGOPS Symposium on Principles of Distributed Computing (PODC), pages 398–407, Portland, OR, Aug. 2007.
[4]
A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 234–245, San Jose, CA, June 2011.
[5]
R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics with The Nuprl Proof Development System. Prentice Hall, 1986.
[6]
CoreOS. etcd: A highly-available key value store for shared configuration and service discovery, 2014.
[7]
https://github.com/coreos/etcd.
[8]
S. J. Garland and N. Lynch. Using I/O automata for developing distributed systems. In Foundations of Component-based Systems. Cambridge University Press, 2000.
[9]
D. Geels, G. Altekar, P. Maniatis, T. Roscoe, and I. Stoica. Friday: Global comprehension for distributed replay. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 285–298, Cambridge, MA, Apr. 2007.
[10]
A. Guha, M. Reitblatt, and N. Foster. Machine-verified network controllers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 483–494, Seattle, WA, June 2013.
[11]
Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, P. Bodik, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In Proceedings of the HotOS XIV, Santa Ana Pueblo, NM, May 2013.
[12]
M. Hayden. The Ensemble System. PhD thesis, Cornell University, 1998.
[13]
M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, 1990.
[14]
J. J. Hickey, N. Lynch, and R. van Renesse. Specifications and proofs for Ensemble layers. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 119–133, York, UK, Mar. 2009.
[15]
High Scalability. The updated big list of articles on the Amazon outage, May 2011. http://highscalability.com/blog/2011/5/2/theupdated-big-list-of-articles-on-the-amazon-outage.html.
[16]
D. Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, Feb. 2012.
[17]
C. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: Language support for building distributed systems. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 179–188, San Diego, CA, June 2007.
[18]
C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 243–256, Cambridge, MA, Apr. 2007.
[19]
G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, M. Norrish, R. Kolanski, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 207–220, Big Sky, MT, Oct. 2009.
[20]
L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional, July 2002.
[21]
L. Lamport. Thinking for programmers, Apr. 2014. http://channel9.msdn.com/Events/Build/2014/3-642.
[22]
V. Le, M. Afshari, and Z. Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 216–226, Edinburgh, UK, June 2014.
[23]
X. Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, July 2009.
[24]
X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, K. Birman, and R. L. Constable. Building reliable, high-performance communication systems from components. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), pages 80–92, Kiawah Island, SC, Dec. 1999.
[25]
X. Liu, Z. Guo, X. Wang, F. Chen, X. Lian, J. Tang, M. Wu, M. F. Kaashoek, and Z. Zhang. D 3 S: Debugging deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 423–437, San Francisco, CA, Apr. 2008.
[26]
N. A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996. ISBN 1558603484.
[27]
A. Nanevski, G. Morrisett, A. Shinnar, P. Govereau, and L. Birkedal. Ynot: Dependent types for imperative programs. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP), Victoria, British Columbia, Canada, Sept. 2008.
[28]
C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff. Use of formal methods at Amazon Web Services, Sept. 2014. http://research.microsoft.com/enus/um/people/lamport/tla/formal-methods-amazon.pdf.
[29]
NYTimes. Amazon’s trouble raises cloud computing doubts, Apr. 2011. http://www.nytimes.com/2011/04/23/technology/23cloud.html.
[30]
D. Ongaro. Consensus: Bridging Theory and Practice. PhD thesis, Stanford University, Aug. 2014.
[31]
D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, pages 305–319, Philadelphia, PA, June 2014.
[32]
J. L. Peterson. Petri nets. ACM Computing Surveys, pages 223–252, Sept. 1977.
[33]
V. Rahli. Interfacing with proof assistants for domain specific programming using EventML. In Proceedings of the 10th International Workshop On User Interfaces for Theorem Provers, Bremen, Germany, July 2012.
[34]
D. Ricketts, V. Robert, D. Jang, Z. Tatlock, and S. Lerner. Automating formal proofs for reactive systems. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 452–462, Edinburgh, UK, June 2014.
[35]
T. Ridge. Verifying distributed systems: The operational approach. In Proceedings of the 36th ACM Symposium on Principles of Programming Languages (POPL), pages 429–440, Savannah, GA, Jan. 2009.
[36]
D. Sangiorgi and D. Walker. PI-Calculus: A Theory of Mobile Processes. Cambridge University Press, New York, NY, USA, 2001. ISBN 0521781779.
[37]
N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 395–406, Atlanta, GA, June 2014.
[38]
Verdi. Verdi, 2014. https://github.com/uwplse/verdi.
[39]
M. Yabandeh, N. Kneževi´c, D. Kosti´c, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 229–244, San Francisco, CA, Apr. 2008.
[40]
J. Yang and C. Hawblitzel. Safe to the last instruction: Automated verification of a type-safe operating system. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 99–110, Toronto, Canada, June 2010.
[41]
J. Yang, T. Chen, M. Wu, Z. Xu, X. Liu, H. Lin, M. Yang, F. Long, L. Zhang, and L. Zhou. M O D IST : Transparent model checking of unmodified distributed systems. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI), pages 213–228, Boston, MA, Apr. 2009.
[42]
X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, San Jose, CA, June 2011.
[43]
D. Yuan, Y. Luo, X. Zhuang, G. R. Rodrigues, X. Zhao, Y. Zhang, P. U. Jain, and M. Stumm. Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 249–265, Broomfield, CO, Oct. 2014.
[44]
P. Zave. Using lightweight modeling to understand Chord. ACM SIGCOMM Computer Communication Review, 42(2):49–57, Apr. 2012.

Cited By

View all
  • (2024)Verus: A Practical Foundation for Systems VerificationProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695952(438-454)Online publication date: 4-Nov-2024
  • (2024)Reward Augmentation in Reinforcement Learning for Testing Distributed SystemsProceedings of the ACM on Programming Languages10.1145/36897798:OOPSLA2(1928-1954)Online publication date: 8-Oct-2024
  • (2024)Erla⁺: Translating TLA⁺ Models into Executable Actor-Based ImplementationsProceedings of the 23rd ACM SIGPLAN International Workshop on Erlang10.1145/3677995.3678190(13-23)Online publication date: 28-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015
630 pages
ISBN:9781450334686
DOI:10.1145/2737924
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 50, Issue 6
    PLDI '15
    June 2015
    630 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2813885
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2015

Check for updates

Author Tags

  1. Coq
  2. Formal verification
  3. Verdi
  4. distributed systems
  5. proof assistants

Qualifiers

  • Research-article

Funding Sources

Conference

PLDI '15
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)590
  • Downloads (Last 6 weeks)76
Reflects downloads up to 25 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Verus: A Practical Foundation for Systems VerificationProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695952(438-454)Online publication date: 4-Nov-2024
  • (2024)Reward Augmentation in Reinforcement Learning for Testing Distributed SystemsProceedings of the ACM on Programming Languages10.1145/36897798:OOPSLA2(1928-1954)Online publication date: 8-Oct-2024
  • (2024)Erla⁺: Translating TLA⁺ Models into Executable Actor-Based ImplementationsProceedings of the 23rd ACM SIGPLAN International Workshop on Erlang10.1145/3677995.3678190(13-23)Online publication date: 28-Aug-2024
  • (2024)Compositional Verification of Composite Byzantine ProtocolsProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3690355(34-48)Online publication date: 2-Dec-2024
  • (2024)AdoB: Bridging Benign and Byzantine Consensus with Atomic Distributed ObjectsProceedings of the ACM on Programming Languages10.1145/36498268:OOPSLA1(419-448)Online publication date: 29-Apr-2024
  • (2024)Models for Storage in Database BackendsProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653036(58-66)Online publication date: 22-Apr-2024
  • (2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
  • (2024)Mostly Automated Verification of Liveness Properties for Distributed Protocols with Ranking FunctionsProceedings of the ACM on Programming Languages10.1145/36328778:POPL(1028-1059)Online publication date: 5-Jan-2024
  • (2024)SandTable: Scalable Distributed System Model Checking with Specification-Level State ExplorationProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650077(736-753)Online publication date: 22-Apr-2024
  • (2024)Formal Model-Driven Analysis of Resilience of GossipSub to Attacks from Misbehaving Peers2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00017(2142-2160)Online publication date: 19-May-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media