DOI: 10.1145/3394450.3397466
research-article
Open access

Generating correctness proofs with neural networks

Published: 15 June 2020

Abstract

Foundational verification allows programmers to build software which has been empirically shown to have high levels of assurance in a variety of important domains. However, the cost of producing foundationally verified software remains prohibitively high for most projects, as it requires significant manual effort by highly trained experts. In this paper we present Proverbot9001, a proof search system using machine learning techniques to produce proofs of software correctness in interactive theorem provers. We demonstrate Proverbot9001 on the proof obligations from a large practical proof project, the CompCert verified C compiler, and show that it can effectively automate what were previously manual proofs, automatically producing proofs for 28% of theorem statements in our test dataset, when combined with solver-based tooling. Without any additional solvers, we exhibit a proof completion rate that is a 4X improvement over prior state-of-the-art machine learning models for generating proofs in Coq.

MAPL ’20, June 15, 2020, London, UK
Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner

      Published In

      MAPL 2020: Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages
      June 2020
      44 pages
      ISBN:9781450379960
      DOI:10.1145/3394450

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Machine-learning
      2. theorem proving

      Qualifiers

      • Research-article

      Conference

      PLDI '20

      Cited By

      • (2024) Graph2Tac. Proceedings of the 41st International Conference on Machine Learning, 4046-4076. DOI: 10.5555/3692070.3692234. Online publication date: 21-Jul-2024
      • (2024) Vision Paper: Proof-Carrying Code Completions. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, 35-42. DOI: 10.1145/3691621.3694932. Online publication date: 27-Oct-2024
      • (2024) Proof Automation with Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1509-1520. DOI: 10.1145/3691620.3695521. Online publication date: 27-Oct-2024
      • (2024) CoqPilot, a plugin for LLM-based generation of proofs. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2382-2385. DOI: 10.1145/3691620.3695357. Online publication date: 27-Oct-2024
      • (2024) LLM-Enhanced Theorem Proving with Term Explanation and Tactic Parameter Repair. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 21-30. DOI: 10.1145/3671016.3674823. Online publication date: 24-Jul-2024
      • (2024) CoqPyt: Proof Navigation in Python in the Era of LLMs. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 637-641. DOI: 10.1145/3663529.3663814. Online publication date: 10-Jul-2024
      • (2024) Towards AI-Assisted Synthesis of Verified Dafny Methods. Proceedings of the ACM on Software Engineering 1:FSE, 812-835. DOI: 10.1145/3643763. Online publication date: 12-Jul-2024
      • (2024) Learning Guided Automated Reasoning: A Brief Survey. Logics and Type Systems in Theory and Practice, 54-83. DOI: 10.1007/978-3-031-61716-4_4. Online publication date: 22-May-2024
      • (2023) Baldur: Whole-Proof Generation and Repair with Large Language Models. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1229-1241. DOI: 10.1145/3611643.3616243. Online publication date: 30-Nov-2023
      • (2023) Passport: Improving Automated Formal Verification Using Identifiers. ACM Transactions on Programming Languages and Systems 45:2, 1-30. DOI: 10.1145/3593374. Online publication date: 26-Jun-2023
