DOI: 10.1145/3064176.3064183
research-article
Open access

An Empirical Study on the Correctness of Formally Verified Distributed Systems

Published: 23 April 2017

Abstract

Recent advances in formal verification techniques have enabled the implementation of distributed systems with machine-checked proofs. While the results are encouraging, the importance of distributed systems warrants a large-scale evaluation of the results and verification practices.
This paper thoroughly analyzes three state-of-the-art, formally verified implementations of distributed systems: IronFleet, Verdi, and Chapar. Through code review and testing, we found a total of 16 bugs, many of which produce serious consequences, including crashing servers, returning incorrect results to clients, and invalidating verification guarantees. These bugs were caused by violations of a wide range of assumptions on which the verified components relied. Our results revealed that these assumptions referred to a small fraction of the trusted computing base, mostly at the interface of verified and unverified components. Based on our observations, we have built a testing toolkit called PK, which focuses on testing these parts and is able to automate the detection of 13 (out of 16) bugs.


Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
ISBN:9781450349383
DOI:10.1145/3064176
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '17
EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
