An Empirical Study on the Correctness of Formally Verified Distributed Systems

Authors:

Arvind Krishnamurthy

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems

Pages 328 - 343

https://doi.org/10.1145/3064176.3064183

Published: 23 April 2017

Abstract

Recent advances in formal verification techniques have enabled the implementation of distributed systems with machine-checked proofs. While the results are encouraging, the importance of distributed systems warrants a large-scale evaluation of the results and verification practices.

This paper thoroughly analyzes three state-of-the-art, formally verified implementations of distributed systems: IronFleet, Verdi, and Chapar. Through code review and testing, we found a total of 16 bugs, many of which produce serious consequences, including crashing servers, returning incorrect results to clients, and invalidating verification guarantees. These bugs were caused by violations of a wide range of assumptions on which the verified components relied. Our results revealed that these assumptions referred to a small fraction of the trusted computing base, mostly at the interface of verified and unverified components. Based on our observations, we have built a testing toolkit called PK, which focuses on testing these parts and is able to automate the detection of 13 (out of 16) bugs.
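
To make the idea of an assumption at the interface of verified and unverified components concrete, here is a minimal, purely hypothetical sketch. It is not taken from the paper, from PK, or from any of the three systems; the message format, `satisfies_precondition`, `parse_unchecked`, `parse_checked`, and `find_violation` are all names invented for illustration. The sketch assumes a verified handler whose proof relies on each message carrying a key of exactly the declared length, and it randomly tests an unverified parsing shim against that assumption.

```python
import random
import struct

# Hypothetical precondition assumed by a verified handler: the key in each
# message is exactly `key_len` bytes long.
def satisfies_precondition(msg):
    key_len, key = msg
    return len(key) == key_len

# Hypothetical unverified shim: parses "<2-byte big-endian length><key bytes>".
# It trusts the length field and never checks it against the payload size.
def parse_unchecked(packet):
    (key_len,) = struct.unpack(">H", packet[:2])
    return key_len, packet[2:2 + key_len]

# Same parser with the missing check added; malformed packets yield None.
def parse_checked(packet):
    if len(packet) < 2:
        return None
    (key_len,) = struct.unpack(">H", packet[:2])
    if len(packet) - 2 != key_len:
        return None
    return key_len, packet[2:]

def find_violation(parser, trials=1000, seed=0):
    """Feed random packets (2 to 15 bytes) to `parser` and return the first
    packet whose parsed result breaks the precondition, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randrange(2, 16)
        packet = bytes(rng.randrange(256) for _ in range(size))
        msg = parser(packet)
        if msg is not None and not satisfies_precondition(msg):
            return packet
    return None

if __name__ == "__main__":
    print("unchecked shim violates assumption:",
          find_violation(parse_unchecked) is not None)
    print("checked shim violates assumption:",
          find_violation(parse_checked) is not None)
```

The point of the sketch is only that a proof about the handler says nothing about inputs the shim lets through; testing the shim against the handler's stated precondition is one simple way such a gap could surface.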


Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems

April 2017

648 pages

ISBN:9781450349383

DOI:10.1145/3064176

Copyright © 2017 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

EuroSys '17

Sponsor:

SIGOPS

EuroSys '17: Twelfth EuroSys Conference 2017

April 23 - 26, 2017

Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

