Open access

Generating correctness proofs with neural networks

Authors:

Alex Sanchez-Stern,

Yousef Alhessi,

Sorin Lerner

MAPL 2020: Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages

Pages 1 - 10

https://doi.org/10.1145/3394450.3397466

Published: 15 June 2020

Abstract

Foundational verification allows programmers to build software which has been empirically shown to have high levels of assurance in a variety of important domains. However, the cost of producing foundationally verified software remains prohibitively high for most projects, as it requires significant manual effort by highly trained experts. In this paper we present Proverbot9001, a proof search system using machine learning techniques to produce proofs of software correctness in interactive theorem provers. We demonstrate Proverbot9001 on the proof obligations from a large practical proof project, the CompCert verified C compiler, and show that it can effectively automate what were previously manual proofs, automatically producing proofs for 28% of theorem statements in our test dataset, when combined with solver-based tooling. Without any additional solvers, we exhibit a proof completion rate that is a 4X improvement over prior state-of-the-art machine learning models for generating proofs in Coq.

Index Terms

Generating correctness proofs with neural networks
1. Computing methodologies
  1. Machine learning
  2. Symbolic and algebraic manipulation

Information & Contributors

Information

Published In

MAPL 2020: Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages

June 2020

44 pages

ISBN:9781450379960

DOI:10.1145/3394450

General Chair: Koushik Sen

Program Chair: Mayur Naik

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2020

Qualifiers

Research-article

Conference

PLDI '20: 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation

Sponsor: SIGPLAN

June 15, 2020

London, UK

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 17

Total Downloads: 761

Downloads (Last 12 months): 207

Downloads (Last 6 weeks): 20

Reflects downloads up to 24 Jan 2025

Citations

Cited By

Blaauwbroek L, Olšák M, Rute J, Massolo F, Piepenbrock J, Pestun V, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F. (2024). Graph2Tac. Proceedings of the 41st International Conference on Machine Learning, 4046-4076. Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3692234
Kamran P, Devanbu P, Stanford C, Filkov V, Ray B, Zhou M. (2024). Vision Paper: Proof-Carrying Code Completions. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, 35-42. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691621.3694932
Lu M, Delaware B, Zhang T, Filkov V, Ray B, Zhou M. (2024). Proof Automation with Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1509-1520. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695521
Kozyrev A, Solovev G, Khramov N, Podkopaev A, Filkov V, Ray B, Zhou M. (2024). CoqPilot, a plugin for LLM-based generation of proofs. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2382-2385. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695357
Liu X, Liu H, Yi X, Wang J. (2024). LLM-Enhanced Theorem Proving with Term Explanation and Tactic Parameter Repair. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 21-30. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3674823
Carrott P, Saavedra N, Thompson K, Lerner S, Ferreira J, First E, d'Amorim M. (2024). CoqPyt: Proof Navigation in Python in the Era of LLMs. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 637-641. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663814
Misu M, Lopes C, Ma I, Noble J. (2024). Towards AI-Assisted Synthesis of Verified Dafny Methods. Proceedings of the ACM on Software Engineering 1:FSE, 812-835. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3643763
Blaauwbroek L, Cerna D, Gauthier T, Jakubův J, Kaliszyk C, Suda M, Urban J. (2024). Learning Guided Automated Reasoning: A Brief Survey. Logics and Type Systems in Theory and Practice, 54-83. Online publication date: 22-May-2024.
https://doi.org/10.1007/978-3-031-61716-4_4
First E, Rabe M, Ringer T, Brun Y, Chandra S, Blincoe K, Tonella P. (2023). Baldur: Whole-Proof Generation and Repair with Large Language Models. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1229-1241. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616243
Sanchez-Stern A, First E, Zhou T, Kaufman Z, Brun Y, Ringer T. (2023). Passport: Improving Automated Formal Verification Using Identifiers. ACM Transactions on Programming Languages and Systems 45:2, 1-30. Online publication date: 26-Jun-2023.
https://dl.acm.org/doi/10.1145/3593374
