1 Introduction

Interactive theorem provers (ITPs) include facilities for computing within the hosted logic. To illustrate what we mean by such a feature, consider the following function, sum, which sums a list of natural numbers:

[Figure: definition of sum]

A facility for computing within the logic can be used to automatically produce theorems such as the following, where \({\textsf { sum}\;[\textsf {5};\;\textsf {9};\;\textsf {1}]}\) was given as input, and the following equation is the output, showing that the input reduces to 15:

\(\vdash {\textsf { sum}\;[\textsf {5};\;\textsf {9};\;\textsf {1}]} = \textsf{15}\)  (1)

The ability to compute such equations in ITPs is essential for use of verified decision procedures, for proving ground cases in proofs, and for running a parser, pretty printer or even compiler inside the logic for a smaller trusted computing base (TCB).

Higher-order logic (HOL) does not have a primitive rule for (or notion of) computation. Instead, HOL ITPs such as HOL Light [13], HOL4 [20], and Isabelle/HOL [18] implement computation as a derived rule using rewriting, which in turn is a derived rule implemented outside their trusted kernels. As a result, computation is slow in these systems.

To understand why computation is so sluggish in HOL ITPs, it is worth noting that the primitive steps taken for the computation of Example (1) are numerous:

  • At each step, rewriting has to match the subterm that is to be reduced next (according to a call-by-value order) against each pattern it knows (the left-hand sides of the definitions of sum, hd, tl, if-then-else and more); when a match is found, it needs to instantiate the equation whose left-hand side matched, and then reconstruct the surrounding term.

  • Computation over natural numbers is far from constant-time, since \(\textsf{5}\), \(\textsf{9}\) and \(\textsf{1}\) are syntactic sugar for numerals built using the constructor-like functions and constants Bit0, Bit1 and 0. For example, \(\textsf{5}\) is \(\textsf{Bit1}\;(\textsf{Bit0}\;(\textsf{Bit1}\;\textsf{0}))\). Deriving equations describing the evaluation of simple operations such as \(+\) requires rewriting with lemmas such as these:

    [Figure: lemmas for evaluating \(+\) on Bit0/Bit1 numerals]
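To make this cost concrete, the following Python sketch (an illustration only, not code from any HOL system) adds two numerals represented as nested Bit0/Bit1 terms, using one standard binary-arithmetic identity per step. The identities in the comments are our own choice and are not claimed to be the exact lemmas used by HOL Light or HOL4. Counting the calls shows that the number of steps grows with the number of binary digits rather than being constant.

```python
# Hypothetical term encoding for this sketch: numerals are nested tuples
# built from ZERO, ("Bit0", t) and ("Bit1", t), mirroring 0, Bit0 and Bit1.
ZERO = ("0",)

def bit0(t):
    # Bit0 0 denotes 0, so avoid producing it to keep numerals canonical.
    return ZERO if t == ZERO else ("Bit0", t)

def bit1(t):
    return ("Bit1", t)

def value(t):
    """Read a numeral back as an int: Bit0 x = 2x, Bit1 x = 2x + 1."""
    if t == ZERO:
        return 0
    tag, inner = t
    return 2 * value(inner) + (1 if tag == "Bit1" else 0)

def numeral(n):
    """Build the canonical numeral term for a natural number n."""
    if n == 0:
        return ZERO
    return bit1(numeral(n // 2)) if n % 2 else bit0(numeral(n // 2))

def suc(t, steps):
    steps[0] += 1
    if t == ZERO:
        return bit1(ZERO)                  # Suc 0 = Bit1 0
    tag, inner = t
    if tag == "Bit0":
        return bit1(inner)                 # Suc (Bit0 n) = Bit1 n
    return bit0(suc(inner, steps))         # Suc (Bit1 n) = Bit0 (Suc n)

def add(a, b, steps):
    """Add two numerals; each call corresponds to one rewrite step."""
    steps[0] += 1
    if a == ZERO:
        return b                           # 0 + n = n
    if b == ZERO:
        return a                           # m + 0 = m
    (ta, ia), (tb, ib) = a, b
    rest = add(ia, ib, steps)
    if ta == "Bit0" and tb == "Bit0":
        return bit0(rest)                  # Bit0 m + Bit0 n = Bit0 (m + n)
    if ta == "Bit1" and tb == "Bit1":
        return bit0(suc(rest, steps))      # Bit1 m + Bit1 n = Bit0 (Suc (m + n))
    return bit1(rest)                      # mixed bits: Bit1 (m + n)
```

A native machine addition does the same work in one operation, which is the gap the fast compute function exploits.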

HOL ITPs employ such laborious methods for computation in order to keep their soundness-critical kernel as small as possible: the small size and simplicity of the kernel is key to the soundness argument.

1.1 Fast, Verified Computation

We want computation that is fast, that does not compromise soundness compared with the slower approaches, and that remains user-friendly.

First, we develop our fast computation feature for Candle. Candle is a port of HOL Light which has been proved sound (in HOL4) with respect to a formal semantics of higher-order logic [1]. By extending this proof, we can increase the size and complexity of Candle’s kernel without compromising soundness.

Second, we copy this verified fast computation feature to HOL4’s kernel, and implement automation outside of HOL4’s kernel to ease usage of our new feature. The fast computation feature modifies HOL4’s kernel, but is trustworthy because we have proved it sound for Candle’s kernel and copied it over with only minor changes; the automation does not modify HOL4’s kernel, so it requires no additional trust.

1.1.1 Candle: Implementation and Verification of Fast Computation

The first part of this paper (Sects. 2–8) describes how we have added a fast function for computation to the Candle HOL ITP and updated Candle’s soundness proof.

With this new function for computation, proving equations via computation is cheap. For the sum example:

  • The input term is traversed once, and is converted to a datatype better suited for fast computation. In this representation, each occurrence of sum, hd, tl, etc. can be expanded directly without pattern-matching.

  • The representation makes use of host-language integers, so \({{5}\;{{+}}\;({9}\;{{+}}\;({1}\;{{+}}\;{0}))}\) can be computed using three native addition operations.

  • Once the computation is complete, the result is converted back to a HOL term and an equation similar to (1) is returned to the user.
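The three steps can be sketched in Python as follows. The tuple encoding of terms and the constructor names sum, CONS and NIL are assumptions made for this illustration. The sketch only returns the pair (input, result), whereas in Candle the corresponding equation is turned into a theorem by the kernel, justified by the soundness proof described in later sections.

```python
from typing import List, Optional, Tuple

Term = tuple
ZERO: Term = ("0",)
NIL: Term = ("NIL",)

def mk_num(n: int) -> Term:
    """int -> numeral term, least significant bit outermost."""
    if n == 0:
        return ZERO
    return ("Bit1" if n % 2 else "Bit0", mk_num(n // 2))

def dest_num(t: Term) -> Optional[int]:
    """Numeral term -> int, or None if t is not a numeral."""
    if t == ZERO:
        return 0
    if len(t) == 2 and t[0] in ("Bit0", "Bit1"):
        rest = dest_num(t[1])
        return None if rest is None else 2 * rest + (1 if t[0] == "Bit1" else 0)
    return None

def mk_list(ns: List[int]) -> Term:
    """Build a hypothetical list term ("CONS", head, tail) ... ("NIL",)."""
    t = NIL
    for n in reversed(ns):
        t = ("CONS", mk_num(n), t)
    return t

def compute_sum(term: Term) -> Optional[Tuple[Term, Term]]:
    """Convert ("sum", list_term) to native ints, add them, and convert back."""
    if len(term) != 2 or term[0] != "sum":
        return None
    xs, t = [], term[1]
    while t[0] == "CONS":                # step 1: traverse the input term once
        x = dest_num(t[1])
        if x is None:
            return None
        xs.append(x)
        t = t[2]
    if t != NIL:
        return None
    total = 0
    for x in reversed(xs):               # step 2: 5 + (9 + (1 + 0)), three native additions
        total = x + total
    return (term, mk_num(total))         # step 3: back to a term; stands in for |- term = result
```

For the input sum [5; 9; 1], the result term is the numeral for 15.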

Our function for computation works on a first-order, untyped, monomorphic subset of higher-order logic. Our implementation interprets terms of this subset using a call-by-value strategy and host-language (CakeML [15]) features such as arbitrary precision integer arithmetic.

In our experiments, we observe speed gains of several orders of magnitude when comparing Candle’s new compute function against established in-logic computation implementations used by other HOL ITPs (Sect. 8).

1.1.2 HOL4: Automated Translation into Code Equations for Fast Computation

The second part of this paper (Sects. 9–12) is about how we have added the same fast computation to the kernel of the HOL4 ITP, and implemented automated tooling outside the kernel to improve its usability. This tooling allows users to invoke fast computation without writing their functions in the first-order subset. In particular, our automation:

  1. Automatically translates ordinary input to the first-order subset outside the kernel.

  2. Applies the fast kernel function for computation.

  3. Automatically translates the result back to ordinary HOL4 outside the kernel.

The addition of the fast computation feature modifies HOL4’s kernel, but all of our automation is implemented outside the kernel, requiring no additional trust. In fact, the automation is proof-producing: on each invocation, it constructs a theorem like (1), equating the input HOL4 term to the result of its fast computation. We evaluate our automation by applying it to previously written, ordinary HOL4 functions: the implementation of the CakeML compiler backend [22].

1.2 Contributions

We make the following contributions:

  • We implement a fast interpreter for terms as a user-accessible primitive in the Candle and HOL4 kernels. The implementation allows users to supply code equations dictating how user-defined (recursive) functions are to be interpreted.

  • We prove the new primitive correct with respect to the inference rules of Candle’s higher-order logic, and fully integrate it into the existing end-to-end soundness proof of the Candle ITP.

  • We implement an automatic translator from ordinary HOL4 user definitions to functions operating over the datatype expected by the new kernel function.

  • We show that our compute function is significantly faster than in-logic compute facilities provided by other HOL ITPs. Within HOL4, it speeds up significant benchmarks by an order of magnitude: in-logic execution of the CakeML type inferencer, and in-logic compilation using the CakeML compiler backend.

In this extended version of our original conference paper, the italicised contributions in the list above are new.

1.3 Notation: \(\textsf { =}\) and \(\textsf { =}_{\textsf {c}}\), \(\vdash \) and \(\vdash _{\textsf {c}}\), etc.

This paper contains syntax at multiple, potentially confusing levels. The Candle logic is formalised inside the HOL4 logic. Symbols that exist in both logics carry a subscript \(_\textsf{c}\) in their Candle version; for example, \(\textsf { =}\) denotes equality in the HOL4 logic, and \(\textsf { =}_{\textsf {c}}\) denotes equality in the embedded Candle logic. Likewise, a theorem in HOL4 is prefixed by \(\vdash \), while a Candle theorem is prefixed by \(\vdash _{\textsf {c}}\).

1.4 Source Code and Proofs

Our Candle sources are on GitHub, and the Candle project is hosted on the CakeML website. Our HOL4 sources are also on GitHub.

2 Approach: Candle

This section explains, at a high level, the approach we have taken to add and verify a new function for computation to Candle. We begin with some background on Candle itself; readers familiar with Candle can skip ahead to Sect. 2.2.

2.1 Background: Candle

Candle is a port of HOL Light. Whereas HOL Light is implemented in OCaml, Candle is formalised in HOL4, where it has also been proved sound: Candle can only output theorems constructed by the primitive inference rules of its higher-order logic. As these inference rules are known to be sound, Candle can only output valid theorems. As in all LCF-style theorem provers, Candle’s primitive inference rules are implemented in its kernel; unlike in other provers, however, Candle’s kernel is not merely trusted to be sound but verified. Candle therefore provides a valuable testing environment for making changes to the kernels of higher-order logic provers: as long as the changes can be incorporated into its soundness proof, they do not undermine trust.

Candle also has a verified implementation in CakeML. This implementation is automatically synthesised from the HOL4 functions of Candle’s formalisation using a tool described in prior work [4]. This tool is proof-producing: when invoked, it outputs not only CakeML code, but also a HOL4 theorem that relates the behaviour of the output CakeML code with the input HOL4 functions. This can be considered a form of verified code extraction from a theorem prover.

2.2 Overview

First, we introduce a new computation-friendly internal representation (IR) for expressions that we want to do computation on. On entry to the new compute primitive, the given input term is translated into this new IR. This step corresponds to the downwards arrow in Fig. 1. We use an IR that is separate from the syntax of HOL (theorems, terms and types), since the datatypes used by HOL ITPs are badly suited for efficient computation.

Fig. 1

Diagram illustrating the approach we take to embedding logical terms into compute expressions and evaluating them using an interpreter

We perform computation on the terms of our IR via interpretation. This step is the solid right arrow in Fig. 1. On termination, this interpretation arrives at a return value, which is translated to a HOL term r. This step is the up arrow in Fig. 1. The new compute primitive returns, to the user, a theorem stating that the input term is equal to the result r. The theorem states an equality between the points connected with a dashed arrow in Fig. 1.

The new compute primitive is a user-accessible function in the Candle kernel and must therefore be proved to be sound, i.e., every theorem it returns must follow by the primitive inference rules of higher-order logic (HOL).

We prove the soundness of our computation function by showing that there is some way of using the inference rules of HOL to mimic the operations of the interpreter. Our use of the inference rules amounts to showing that there is some proof by rewriting that establishes the desired equation. Since Candle performs no proof recording of any kind, it suffices, for the soundness proof, to prove (in HOL4) that there exists some derivation in the Candle logic.

The connection established by the existentially quantified proof is illustrated by the dashed arrow in Fig. 1. All reasoning about the interpreter (the lower horizontal arrow) must be w.r.t. the view of the interpreter provided by the translations to and from the IR (the vertical arrows). Nearly all of our theorems are stated in terms of the arrow upwards, i.e. from IR to HOL.

2.3 Staging

The development of our new compute primitive for Candle was staged into increasingly complex versions.

  1. Version 1 (Sect. 3) was a proof-of-concept Candle function for computing the result of additions of concrete natural numbers. This function was implemented as a conversion in the Candle kernel that, given a term \(m~{+}_{\textsf{c}}~n\), computes the result of the addition r and returns a theorem \(\vdash _\textsf{c} {{m}~{{\textsf { +}}_{\textsf{c}}}\;{n}\;{{{\textsf { =}}_{\textsf{c}}}}\;{r}}\) to the user. Internally, the implementation makes use of the arbitrary precision integer arithmetic of the host language, i.e. CakeML. The purpose of Version 1 was to establish the concepts needed for this work rather than to produce something that is actually useful from a user’s point of view.

  2. Version 2 (Sect. 4) improved on Version 1 by replacing the type of natural numbers with a datatype for binary trees with natural numbers at the leaves, and by supporting structured control flow (if-then-else), projections (fst, snd) and the usual arithmetic operations. This version supports nesting of expressions.

  3. Version 3 (Sect. 5) extended Version 2 with support for user-supplied code equations for user-defined constants. The code equations are allowed to be recursive, and thus the interpreter had to support recursion. This extension also brought with it variables: from Version 3 onwards, all interpreters are able to interpret input terms containing variables.

  4. Version 4 (Sect. 6) replaced the naive interpreter with one that is designed to evaluate with less overhead. This version uses O(1) operations to look up code equations and uses environments rather than substitutions for variable bindings.

  5. The final Version 5 (Sect. 7) is, at the time of writing, left as future work for Candle. In Version 5, our intention is to speed up the interpreter by using partial evaluation at compile time, so that less work is done during actual interpreter execution.

At the time of writing, Version 4 (Sect. 6) is integrated into the existing Candle implementation and end-to-end soundness proof. This is the version we perform Candle benchmarks on (Sect. 8). The interpreter of Version 4 (Sect. 6) has also been integrated into the HOL4 implementation (Sect. 10). This is the version we perform HOL4 benchmarks on, in particular execution of the CakeML compiler backend (Sect. 12).

3 Addition of Natural Numbers (Version 1)

In this section, we describe how we implemented and verified a function for computing addition on natural numbers in the Candle kernel. This is the first step towards a proven-correct function for computation. The approach can be reused to produce computation functions for other kinds of binary operations (multiplication, subtraction, division, etc.) on natural numbers, and it can be used to build evaluators for arithmetic inside more general expressions (Sect. 4).

3.1 Input and Output

In Version 1, the user can input terms such as \(3~{+}_{\textsf{c}}~5\) or \(100~{+}_{\textsf{c}}~0\), i.e., terms consisting of one addition applied to two concrete numbers. The numbers are shown here as 3, 5, 100, 0, even though they are actually terms in a binary representation based on the constant \({\textsf { 0}}_{\textsf{c}}\), and the functions \({\textsf { Bit0}}_{\textsf{c}}\) and \({\textsf { Bit1}}_{\textsf{c}}\) in the Candle logic.

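The output is a theorem equating the input with a concrete natural number. For the examples above, the function returns the following equations. The subscript \( _{\textsf{c}}\) is used below to highlight that these are theorems in the Candle logic.

\(\vdash_{\textsf{c}}\; 3~{+}_{\textsf{c}}~5 \;{=}_{\textsf{c}}\; 8\)

\(\vdash_{\textsf{c}}\; 100~{+}_{\textsf{c}}~0 \;{=}_{\textsf{c}}\; 100\)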

The results 8 and 100 are computed using addition outside the logic. The challenge is to show that the same computation can be derived from the equations defining \({+}_{\textsf{c}}\) (in Candle) using the primitive inference rules of the Candle logic.

3.2 Key Soundness Lemma

In order to prove the soundness of Version 1 (required for its inclusion in the Candle kernel), we need to prove the following theorem, which states: if the arithmetic operations are defined as expected (num_thy_ok) in the current Candle theory \(\Gamma \), then the addition \(({+}_{\textsf{c}})\) of the binary representations (mk_num) of two natural numbers m and n is equal (\({=}_{\textsf{c}}\)) to the binary representation of \(m + n\), where + is HOL4 addition.

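\(\textsf{num\_thy\_ok}\;\Gamma \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} \textsf{mk\_num}\;m \;{+}_{\textsf{c}}\; \textsf{mk\_num}\;n \;{=}_{\textsf{c}}\; \textsf{mk\_num}\;(m + n)\)  (2)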

To understand the theorem statement above, let us look at the definitions of mk_num and num_thy_ok. The function mk_num converts a HOL4 natural number into the corresponding Candle natural number in binary representation:

[Figure: definition of mk_num]

The definition of num_thy_ok asserts that various characterising equations hold for the Candle constants \({+}_{\textsf{c}}\), \({\textsf { Bit0}}_{\textsf{c}}\) and \({\textsf { Bit1}}_{\textsf{c}}\) (the complete definition is not shown below). Here m and n are natural-number-typed variables in Candle’s logic:

[Figure: excerpt of the definition of num_thy_ok]

We use num_thy_ok as an assumption in Theorem (2), since the computation function is part of the Candle kernel, which does not include these definitions when the prover starts from its initial state (and thus the user might define them differently).

A closer look at num_thy_ok reveals that \({+}_{\textsf{c}}\) is characterised by its simple Suc-based equations and \({\textsf { Bit1}}_{\textsf{c}}\) is characterised in terms of Suc and \({+}_{\textsf{c}}\). As a result, a direct proof of Theorem (2) would be awkward at best.

To keep the proof of Theorem (2) as neat as possible, we defined the expansion of a HOL number into a tower of \({\textsf { Suc}}_{\textsf{c}}\) applications to \({\textsf { 0}}_{\textsf{c}}\):

[Figure: definition of mk_suc]

and split the proof of Theorem (2) into two lemmas. The first lemma is a mk_suc variant of Theorem (2):

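\(\textsf{num\_thy\_ok}\;\Gamma \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} \textsf{mk\_suc}\;m \;{+}_{\textsf{c}}\; \textsf{mk\_suc}\;n \;{=}_{\textsf{c}}\; \textsf{mk\_suc}\;(m + n)\)  (3)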

and the second lemma \({\textsf { =}}_{\textsf{c}}\)-equates mk_num with mk_suc:

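\(\textsf{num\_thy\_ok}\;\Gamma \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} \textsf{mk\_num}\;n \;{=}_{\textsf{c}}\; \textsf{mk\_suc}\;n\)  (4)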

The proof of Theorem (3) was done by induction on m, and involved manually constructing the \({\vdash \!\!}_{\textsf{c}}\)-derivation that connects the two sides of \({\textsf { =}}_{\textsf{c}}\) in Theorem (3). The proof of Theorem (4) is a complete induction on n and uses Theorem (3) when \({+}_{\textsf{c}}\) is encountered. Finally, the proof of Theorem (2) is a manually constructed \({\vdash \!\!}_{\textsf{c}}\)-derivation that uses Theorems (4) and (3), and symmetry of \({\textsf { =}}_{\textsf{c}}\).
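Since the definitions of mk_num and mk_suc are not reproduced here, the following Python sketch gives hypothetical versions of both, with terms as nested tuples. The function denote gives the intended meaning of 0, Suc, Bit0 and Bit1, so the loop in the usage check is only a semantic sanity test of what Theorem (4) expresses, not a substitute for the Candle derivation. It also hints at why the lemmas are split: mk_suc n has n nested Suc applications, which lines up with the Suc-based equations for addition, whereas mk_num n has only about log2 n constructors.

```python
ZERO = ("0",)

def mk_num(n):
    """Hypothetical mk_num: binary numeral, least significant bit outermost."""
    if n == 0:
        return ZERO
    return ("Bit1" if n % 2 else "Bit0", mk_num(n // 2))

def mk_suc(n):
    """Hypothetical mk_suc: n applications of Suc to 0."""
    t = ZERO
    for _ in range(n):
        t = ("Suc", t)
    return t

def denote(t):
    """Intended meaning of a term built from 0, Suc, Bit0 and Bit1."""
    if t == ZERO:
        return 0
    tag, inner = t
    v = denote(inner)
    return {"Suc": v + 1, "Bit0": 2 * v, "Bit1": 2 * v + 1}[tag]
```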

3.3 From Candle Terms to Natural Numbers

The development described above is in terms of functions (mk_num, mk_suc) that map HOL4 natural numbers into Candle terms, but the implementation also converts in the opposite direction: on initialisation, the computation function converts the given input term into its internal representation (see the leftmost arrow in Fig. 1).

We use the following function, dest_num, to extract a natural number from a Candle term. This function traverses terms, and recognises the function symbols used in Candle’s binary representation of natural numbers:

[Figure: definition of dest_num]

One should read the application \({\textsf { Bit1}}_{\textsf{c}}\;x\) as a natural number in binary with least significant bit 1 and other bits x.

The correctness of dest_num is captured by the following theorem, which states that \({\textsf { =}}_{\textsf{c}}\) is preserved when moving from Candle terms to natural numbers in HOL4, and back:

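\(\textsf{num\_thy\_ok}\;\Gamma \;\wedge\; \textsf{dest\_num}\;t = \textsf{Some}\;n \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} t \;{=}_{\textsf{c}}\; \textsf{mk\_num}\;n\)  (5)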

Version 1 of the computation function also has a function for taking apart a Candle term with a top-level addition \({+}_{\textsf{c}}\):

[Figure: definition of dest_add]

Equipped with the functions dest_num and dest_add, and Theorems (2) and (5), it is easy to prove the following soundness result. This theorem states: if a term t can be taken apart using dest_add and dest_num, then the term constructed by mk_num and the HOL4 addition, \(+\), can be used as the right-hand side of an equation that is \({\vdash \!\!}_{\textsf{c}}\)-derivable.

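\(\textsf{num\_thy\_ok}\;\Gamma \;\wedge\; \textsf{dest\_add}\;t = \textsf{Some}\;(t_1, t_2) \;\wedge\; \textsf{dest\_num}\;t_1 = \textsf{Some}\;m \;\wedge\; \textsf{dest\_num}\;t_2 = \textsf{Some}\;n \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} t \;{=}_{\textsf{c}}\; \textsf{mk\_num}\;(m + n)\)  (6)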

This theorem can be used as the blueprint for an implementation that uses dest_add, dest_num and mk_num.
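Read as a blueprint, Theorem (6) suggests an implementation of roughly the following shape, sketched here in Python. The tuple encoding of terms is hypothetical, and the returned pair only stands in for the theorem that the Candle kernel constructs. Python’s unbounded integers play the role of CakeML’s arbitrary precision arithmetic.

```python
from typing import Optional, Tuple

Term = tuple
ZERO: Term = ("0",)

def mk_num(n: int) -> Term:
    """int -> numeral term built from 0, Bit0 and Bit1."""
    if n == 0:
        return ZERO
    return ("Bit1" if n % 2 else "Bit0", mk_num(n // 2))

def dest_num(t: Term) -> Optional[int]:
    """Recognise a numeral term; None for anything else."""
    if t == ZERO:
        return 0
    if len(t) == 2 and t[0] in ("Bit0", "Bit1"):
        rest = dest_num(t[1])
        return None if rest is None else 2 * rest + (1 if t[0] == "Bit1" else 0)
    return None

def dest_add(t: Term) -> Optional[Tuple[Term, Term]]:
    """Take apart a top-level addition ("+", lhs, rhs)."""
    if len(t) == 3 and t[0] == "+":
        return t[1], t[2]
    return None

def add_conv(t: Term) -> Optional[Tuple[Term, Term]]:
    """Return (t, rhs), standing in for |-c t =c rhs, or None if t does not fit."""
    parts = dest_add(t)
    if parts is None:
        return None
    m, n = dest_num(parts[0]), dest_num(parts[1])
    if m is None or n is None:
        return None
    return t, mk_num(m + n)      # the addition itself is done by the host language
```

Each branch that returns a result corresponds to the premises of Theorem (6) holding, which is what makes the blueprint reading straightforward.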

3.4 Checking num_thy_ok

Note that Theorem (6) assumes num_thy_ok, which requires certain equations to be true in the current theory \(\Gamma \). To be sound, an implementation of our computation function must check that this assumption holds.

We deal with this issue in a pragmatic manner, by requiring that the user provides a list of theorems corresponding to the equations of num_thy_ok on each invocation of our computation function. This approach makes num_thy_ok easy to establish, but causes extra overhead on each call to the computation function.
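A minimal Python sketch of this pragmatic check follows. It assumes a hypothetical Thm record with hypotheses and a conclusion, and uses strings for terms. The listed equations are illustrative placeholders, not the actual contents of num_thy_ok, and the fixed-order comparison is a simplification. A real LCF-style kernel compares terms structurally and relies on its abstract theorem type to guarantee that only proved theorems can be passed in, which a plain Python class cannot enforce.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Thm:
    """Hypothetical stand-in for a kernel theorem: hyps |- concl."""
    hyps: FrozenSet[str]
    concl: str

# Illustrative placeholders only, not the actual equations of num_thy_ok.
EXPECTED: List[str] = [
    "0 + n = n",
    "Suc m + n = Suc (m + n)",
    "Bit0 n = n + n",
    "Bit1 n = Suc (n + n)",
]

def check_num_thy_ok(thms: List[Thm]) -> bool:
    """Accept only the expected equations, in order and without hypotheses."""
    if len(thms) != len(EXPECTED):
        return False
    return all(not th.hyps and th.concl == eq for th, eq in zip(thms, EXPECTED))
```

The check is cheap but repeated on every call, which is the overhead mentioned above.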

3.5 Soundness of CakeML Implementation

Throughout this section, we have treated functions in the logic of HOL4 as if they were the implementation of the Candle kernel. We do this because the actual CakeML implementation of the Candle kernel is automatically synthesised from these functions in the HOL4 logic (Sect. 2.1).

Updating the entire Candle soundness proof for the addition of Version 1 of the compute function was straightforward, once Theorem (6) was proved and the code for checking num_thy_ok was verified.

4 Compute Expressions (Version 2)

This section describes Version 2, which generalises the very limited Version 1. While Version 1 only computed addition of natural numbers, Version 2 can compute the value of any term that fits in a subset of Candle terms that we call compute expressions. Compute expressions operate over a Lisp-inspired datatype which we call compute values; in Candle, this type is called cval.

Even though this second version might at first seem significantly more complicated than the first, it is merely a further development of Version 1. The approach is the same, and the soundness theorems we prove look very similar. Technically, the most significant change is the introduction of a datatype, cexp, that is the internal representation of all valid input terms, i.e., compute expressions.

4.1 Compute Values

To the Candle user, the following cval datatype is important, since all terms supplied to the new compute function must be of this type. The cval datatype is a Lisp-inspired binary tree with natural numbers (num) at the leaves:

[Figure: definition of the cval datatype]
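For illustration, here is a Python rendering of a cval-like datatype with the constructor names Num and Pair used later in this section, together with a hypothetical Lisp-style encoding of lists. The encoding, with a terminating Num 0, is our assumption and not something the compute function prescribes.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Pair:
    fst: "CVal"
    snd: "CVal"

CVal = Union[Num, Pair]

def encode_list(xs: List[int]) -> CVal:
    """[x1; ...; xk] as Pair (Num x1) (... (Pair (Num xk) (Num 0)))."""
    v: CVal = Num(0)
    for x in reversed(xs):
        v = Pair(Num(x), v)
    return v

def decode_list(v: CVal) -> List[int]:
    """Inverse of encode_list on values that encode_list can produce."""
    out: List[int] = []
    while isinstance(v, Pair):
        if not isinstance(v.fst, Num):
            raise ValueError("list element is not a number")
        out.append(v.fst.n)
        v = v.snd
    return out
```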

4.2 Compute Expressions

The other important datatype is cexp, which is the internal representation that user input is translated into:

[Figure: definition of the cexp datatype]

The cexp datatype is extended with more constructors in Version 3, described in Sect. 5.

4.3 Input Terms

On start-up, the compute function maps the given term into the cexp type. For example, given this term as input:

[Figure: example input term]

the function will create this cexp expression:

[Figure: the resulting cexp expression]

This mapping assumes that certain functions in the Candle logic (e.g. \({\textsf { fst}}_{\textsf{c}}\)) correspond to certain constructs in the cexp datatype (e.g. \({\textsf { Uop}\;\textsf { Fst}}\)). Note that there is nothing strange about this: in Version 1, we assumed that \({+}_{\textsf{c}}\) corresponds to addition. We formalise the assumptions about \({\textsf { fst}}_{\textsf{c}}\), etc., next.

4.4 Context Assumption: cexp_thy_ok

Just as in Version 1, Version 2 also has an assumption on the current theory context. In Version 1, the assumption num_thy_ok ensured that the Candle definition of \({+}_{\textsf{c}}\) satisfied the relevant characterising equations. For Version 2, this assumption was extended to cover characterising equations for all names that the conversion from user input to cexp recognises: \({\textsf { cif}}_{\textsf{c}}\), \({\textsf { fst}}_{\textsf{c}}\), etc. These characterising equations fix a semantics for the Candle functions that correspond to constructs of the cexp type. For simplicity, all of the Candle functions take inputs of type cval and produce outputs of type cval.

Our implementation makes no attempt at ensuring that functions are applied to sensible inputs. Consequently, it is perfectly possible to write strange terms in this syntax, such as a projection applied to a number, or an addition of two pairs. We resolve such cases in a systematic way:

  • Operations that expect numbers as input treat \({\textsf { Pair}}_{\textsf{c}}\) values as \({\textsf { Num}}_{\textsf{c}}~{{{\textsf { 0}}_{\textsf{c}}}}\).

  • Operations that expect a pair as input return \({\textsf { Num}}_{\textsf{c}}~{{{\textsf { 0}}_{\textsf{c}}}}\) when applied to \({\textsf { Num}}_{\textsf{c}}\) values.

This treatment of the primitives can be seen in the assumption, called cexp_thy_ok, that we make about the context for Version 2. Below, x and y are variables in the Candle logic with type cval. The lines specifying \({\textsf { add}}_{\textsf{c}}\) are:

[Figure: lines of cexp_thy_ok specifying \({\textsf { add}}_{\textsf{c}}\)]

The lines specifying \({\textsf { fst}}_{\textsf{c}}\) are:

[Figure: lines of cexp_thy_ok specifying \({\textsf { fst}}_{\textsf{c}}\)]

The following characteristic equations for \({\textsf { cif}}_{\textsf{c}}\) illustrate that we treat \({\textsf { Num}}_{\textsf{c}}~{{{\textsf { 0}}_{\textsf{c}}}}\) as false and all other values as true:

[Figure: characteristic equations for \({\textsf { cif}}_{\textsf{c}}\)]

Comparison primitives return \({\textsf { Num}}_{\textsf{c}}~1\) for true.

4.5 Soundness

The following theorem summarises the operations and soundness of Version 2. If a term t can be successfully converted (using dest_term) into a compute expression cexp, then t is equal to a Candle term created (using mk_term) from the result of evaluating cexp using a straightforward evaluation function (cexp_eval):

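\(\textsf{cexp\_thy\_ok}\;\Gamma \;\wedge\; \textsf{dest\_term}\;t = \textsf{Some}\;\textit{cexp} \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} t \;{=}_{\textsf{c}}\; \textsf{mk\_term}\;(\textsf{cexp\_eval}\;\textit{cexp})\)  (7)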

Note the similarity between Theorems (6) and (7). Where Theorem (6) uses \(+\), Theorem (7) calls cexp_eval. The evaluation function cexp_eval is defined to traverse the \(\textsf { cexp}\) bottom-up in the most obvious manner, respecting the evaluation rules set by the characterising equations of cexp_thy_ok.
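The following Python sketch evaluates a cexp-like syntax bottom-up. Only Uop and Fst are names taken from the text. The other constructors (Const, Binop, If), the selection of operations, and the use of a Binop for building pairs are assumptions made for this illustration. The semantics follows the conventions described for cexp_thy_ok as we read them: numeric operations read a pair as 0, projections of a number return Num 0, a conditional treats Num 0 as false, comparisons return Num 1 or Num 0, and subtraction is truncated as on natural numbers.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Pair:
    fst: "CVal"
    snd: "CVal"

CVal = Union[Num, Pair]

@dataclass(frozen=True)
class Const:
    v: CVal

@dataclass(frozen=True)
class Uop:
    op: str              # "Fst" or "Snd"
    arg: "CExp"

@dataclass(frozen=True)
class Binop:
    op: str              # "Add", "Sub", "Mul", "Less" or "Pair" (hypothetical)
    left: "CExp"
    right: "CExp"

@dataclass(frozen=True)
class If:
    cond: "CExp"
    then: "CExp"
    orelse: "CExp"

CExp = Union[Const, Uop, Binop, If]

NUM_OPS = {
    "Add": lambda x, y: x + y,
    "Sub": lambda x, y: max(x - y, 0),        # natural-number subtraction
    "Mul": lambda x, y: x * y,
    "Less": lambda x, y: 1 if x < y else 0,   # comparisons return Num 1 for true
}

def to_num(v: CVal) -> int:
    return v.n if isinstance(v, Num) else 0   # a Pair is read as 0

def cexp_eval(e: CExp) -> CVal:
    if isinstance(e, Const):
        return e.v
    if isinstance(e, Uop):
        v = cexp_eval(e.arg)
        if not isinstance(v, Pair):
            return Num(0)                     # projecting from a number gives Num 0
        return v.fst if e.op == "Fst" else v.snd
    if isinstance(e, Binop):
        a, b = cexp_eval(e.left), cexp_eval(e.right)
        if e.op == "Pair":
            return Pair(a, b)
        return Num(NUM_OPS[e.op](to_num(a), to_num(b)))
    if isinstance(e, If):
        c = cexp_eval(e.cond)
        return cexp_eval(e.then if c != Num(0) else e.orelse)   # only Num 0 is false
    raise TypeError("not a cexp: %r" % (e,))
```

Because every case returns some cval, evaluation never gets stuck on ill-typed input, which mirrors the "no sanity checking of arguments" design described above.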

4.6 CakeML Code and Integration

The functions dest_term, cexp_eval and mk_term are the main workhorses of the implementation of Version 2. Corresponding CakeML implementations are synthesised from these functions (Sect. 2.1). The definition of the evaluator function cexp_eval uses arithmetic operations (\(+\), −, \(\times \), div, mod, <, \(=\)) over the natural numbers. Such arithmetic operations translate into arbitrary precision arithmetic operations in CakeML.

Updating the Candle proofs for Version 2 was a straightforward exercise, given the prior integration of Version 1.

5 Recursion and User-Supplied Code Equations (Version 3)

Version 3 of our compute function for Candle adds support for (mutually) recursive user-defined functions. The user supplies function definitions in the form of code equations.

5.1 Code Equations

In our setting, a code equation for a user-defined constant c is a Candle theorem of the form:

\(\vdash_{\textsf{c}}\; c\;v_1\;\cdots\;v_n \;{=}_{\textsf{c}}\; e\)

where each variable \(v_i\) has type cval and the expression e has type cval. Furthermore, the free variables of e must be a subset of \(\{v_1, \ldots, v_n\}\). Note that any user-defined constants, including c, are allowed to appear in e in fully applied form. Every user-defined constant appearing in some right-hand side e must have a code equation describing that constant.
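These side conditions can be phrased as a simple syntactic check. The Python sketch below uses a hypothetical tuple syntax for right-hand sides (Var, Const, Let, Prim for built-in operations, App for user-defined constants) and represents an equation \(c\;v_1 \cdots v_n = e\) as a triple. Beyond the conditions stated above, it also rejects two equations for the same constant and repeated parameter names, which is our own assumption.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical syntax: ("Var", x) | ("Const", v) | ("Let", x, e1, e2)
#                    | ("Prim", op, [args]) | ("App", c, [args])
Expr = tuple
CodeEq = Tuple[str, List[str], Expr]      # c v1 ... vn = e

def free_vars(e: Expr) -> Set[str]:
    tag = e[0]
    if tag == "Var":
        return {e[1]}
    if tag == "Const":
        return set()
    if tag == "Let":
        return free_vars(e[2]) | (free_vars(e[3]) - {e[1]})
    return set().union(*(free_vars(a) for a in e[2]))

def calls(e: Expr) -> List[Tuple[str, int]]:
    """(constant, number of arguments) for every user-constant application in e."""
    tag = e[0]
    if tag in ("Var", "Const"):
        return []
    if tag == "Let":
        return calls(e[2]) + calls(e[3])
    inner = [c for a in e[2] for c in calls(a)]
    return ([(e[1], len(e[2]))] if tag == "App" else []) + inner

def check_code_eqs(eqs: List[CodeEq]) -> bool:
    arity: Dict[str, int] = {}
    for name, params, _ in eqs:
        if name in arity or len(set(params)) != len(params):
            return False      # assumption: one equation per constant, distinct parameters
        arity[name] = len(params)
    for _, params, body in eqs:
        if not free_vars(body) <= set(params):
            return False      # free variables of e must be among v1 ... vn
        for c, n in calls(body):
            if arity.get(c) != n:
                return False  # called constants need an equation and must be fully applied
    return True
```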

5.2 Updated Compute Expressions

We updated the cexp datatype to allow variables (Var), applications of user-supplied constants (App), and, at the same time, we added let-expressions (Let):

[Figure: the cexp datatype extended with Var, App and Let]

Variables are present to capture the values bound by the left-hand sides of code equations and by let-expressions.

The interpreter for Version 3 of our compute function uses a substitution-based semantics, and keeps track of code equations as a simple list. This style of semantics maps well to the Candle logic’s substitution primitive, thus simplifying verification, but at a price:

  • At each let-expression or function application, the entire body of the let-expression or the code equation corresponding to the function is traversed an additional time, to substitute out variables.

  • At each function application, the code equation corresponding to the function name is found using linear search, making the interpreter’s performance degrade as more code equations are added.

We address these shortcomings in Version 4 of our compute function, in Sect. 6.

5.3 Soundness

The following theorem is the essential part of the soundness argument for Version 3. The user supplies the Version 3 compute function with: a list of theorems that allows it to establish cexp_thy_ok, a list eqs of code equations, and a term t to evaluate. Every theorem in eqs must be a Candle theorem (\({\vdash \!\!}_{\textsf{c}}\)). Definitions defs are extracted from the given code equations eqs. A compute expression cexp is extracted from the given input term w.r.t. the available definitions defs. An interpreter, interpret, is run on the cexp, and its execution is parameterised by defs and a clock which is initialised to a large number init_ck. If the interpreter returns a result res, i.e. Some res, then an equation between the input term t and \(\textsf { mk}\_\textsf { term}~{{res}}\) can be returned to the user.

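\(\textsf{cexp\_thy\_ok}\;\Gamma \;\wedge\; (\forall \textit{eq} \in \textit{eqs}.\; \Gamma \vdash_{\textsf{c}} \textit{eq}) \;\wedge\; \textsf{dest\_eqs}\;\textit{eqs} = \textsf{Some}\;\textit{defs} \;\wedge\; \textsf{dest\_term}\;\textit{defs}\;t = \textsf{Some}\;\textit{cexp} \;\wedge\; \textsf{interpret}\;\textit{defs}\;\textsf{init\_ck}\;\textit{cexp} = \textsf{Some}\;\textit{res} \;\Rightarrow\; \Gamma \vdash_{\textsf{c}} t \;{=}_{\textsf{c}}\; \textsf{mk\_term}\;\textit{res}\)  (8)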

There are a few subtleties hidden in this theorem that we will comment on next.

First, the statement of Theorem (8) includes an assumption that the user-provided code equations are theorems in the context \(\Gamma \). This holds trivially for Candle’s implementation: the user can only pass in the code equations as theorems, and can only construct these theorems if they are valid in the current context. Therefore, Candle’s soundness result allows us to discharge this assumption where Theorem (8) is used.

Second, the functions dest_eqs and dest_term perform sanity checks on their inputs. For example, dest_eqs checks that all right-hand sides in the equations eqs mention only constants for which there are code equations in eqs.

Third, the interpret function, which is used for the actual computation, takes a clock (sometimes called a fuel parameter) in order to guarantee termination. The clock is decremented by interpret on each function application (i.e. App), and, due to the substitution semantics, also on each Let. If the clock is exhausted, interpret returns None. This clock is not strictly necessary for soundness: we could have implemented interpret directly in CakeML and verified its soundness by appealing to either CakeML’s semantics or its program logic (which supports reasoning about diverging programs [19]). In particular, interpret can diverge without introducing unsoundness, because then no theorem is ever constructed. For simplicity, we chose to keep in line with the rest of Candle, by specifying and verifying interpret in HOL4 before synthesising a verified CakeML implementation (Sect. 5.4). This approach requires a clock argument to guarantee termination.
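A minimal Python sketch of a substitution-based interpreter with a clock, in the spirit of Version 3 but not the authors’ definition, follows. It simplifies values to natural numbers and uses a hypothetical tuple syntax (Var, Const, Let, Prim, App). Code equations are found by linear search, and the clock is decremented on each App and Let. Running out of clock returns None, as interpret does; the synthesised CakeML code raises an exception instead.

```python
from typing import List, Optional, Tuple

Expr = tuple
Defs = List[Tuple[str, List[str], Expr]]        # code equations, searched linearly

PRIMS = {
    "Add": lambda a, b: a + b,
    "Sub": lambda a, b: max(a - b, 0),            # natural-number subtraction
    "Mul": lambda a, b: a * b,
    "Less": lambda a, b: int(a < b),
}

def subst(e: Expr, x: str, v: int) -> Expr:
    """Replace the free occurrences of variable x in e by the constant v."""
    tag = e[0]
    if tag == "Var":
        return ("Const", v) if e[1] == x else e
    if tag == "Const":
        return e
    if tag == "Let":
        body = e[3] if e[1] == x else subst(e[3], x, v)   # x is shadowed in the body
        return ("Let", e[1], subst(e[2], x, v), body)
    return (tag, e[1], [subst(a, x, v) for a in e[2]])

def interpret(defs: Defs, ck: int, e: Expr) -> Optional[Tuple[int, int]]:
    """Return (value, remaining clock), or None if the clock runs out."""
    tag = e[0]
    if tag == "Const":
        return e[1], ck
    if tag == "Var":
        raise ValueError("unbound variable: " + e[1])      # ruled out by sanity checks
    if tag == "Prim" and e[1] == "If":
        r = interpret(defs, ck, e[2][0])
        if r is None:
            return None
        c, ck = r
        return interpret(defs, ck, e[2][1] if c != 0 else e[2][2])
    vals = []
    for a in ([e[2]] if tag == "Let" else e[2]):            # Let: evaluate the bound expression
        r = interpret(defs, ck, a)
        if r is None:
            return None
        v, ck = r
        vals.append(v)
    if tag == "Prim":
        return PRIMS[e[1]](*vals), ck
    if ck == 0:
        return None                                         # clock exhausted on Let or App
    if tag == "Let":
        return interpret(defs, ck - 1, subst(e[3], e[1], vals[0]))
    for name, params, body in defs:                         # linear search for the equation
        if name == e[1]:
            for p, v in zip(params, vals):
                body = subst(body, p, v)                    # re-traverses the whole body
            return interpret(defs, ck - 1, body)
    raise ValueError("no code equation for " + e[1])
```

Both costs discussed in Sect. 5.2 are visible here: each call re-traverses the body to substitute, and each call scans the list of equations.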

5.4 CakeML Code

As with previous versions, the CakeML implementation of the computation function is synthesised from the HOL4 functions (Sect. 2.1). For efficiency, the generated CakeML code for interpret avoids returning an option and instead signals running out of clock using an ML exception. We note that a user is very unlikely to have the patience to wait for a timeout, since the value of init_ck is very large (the maximum smallnum).

5.5 Integration

Updating the Candle proofs for Version 3 required more work than Versions 1 and 2, since we had to verify the correctness of the sanity checks performed on the user-provided list of code equations.

6 Efficient Interpreter (Version 4)

For Version 4, we replaced the interpreter function, interpret, with compilation to a different datatype for which we have a faster interpreter.

The new datatype for representing programs is called ce, shown below. It uses de Bruijn indexing for local variables, and represents function names as indices into a vector of function bodies, which means lookups happen in constant time during interpretation. Rather than representing primitive functions by names, the ce datatype represents primitive functions as (shallowly embedded) function values that can immediately be applied to the result of evaluating the argument expressions.

[Figure: definition of the ce datatype]

The new faster interpreter exec, shown in Fig. 2, for the \(\textsf { ce}\) datatype addresses the two main shortcomings of Version 3. First, it drops the substitution semantics in favour of de Bruijn variables and an explicit environment, so that variable substitution can be deferred until (and if) the value bound to a variable is needed. Second, all function names are replaced by an index into a vector which stores all user-provided code equations.
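The following Python sketch illustrates the two changes on the same kind of hypothetical named syntax as a substitution-based interpreter would use, again with natural numbers as values. It is not the authors’ ce datatype or exec. compile_exp replaces variable names with de Bruijn indices, user-defined constants with positions in a list of compiled bodies, and primitive names with Python function values. exec_ce then looks variables up in an environment instead of substituting, and signals clock exhaustion with an exception. In this sketch only calls consume clock.

```python
from typing import Dict, List, Tuple

PRIMS = {
    "Add": lambda a, b: a + b,
    "Sub": lambda a, b: max(a - b, 0),
    "Mul": lambda a, b: a * b,
    "Less": lambda a, b: int(a < b),
}

class OutOfClock(Exception):
    """Raised instead of returning an option when the clock runs out."""

def compile_exp(e: tuple, scope: List[str], index_of: Dict[str, int]) -> tuple:
    """Named syntax -> indexed syntax (de Bruijn variables, numbered functions)."""
    tag = e[0]
    if tag == "Const":
        return e
    if tag == "Var":
        return ("Var", scope.index(e[1]))             # innermost binding comes first
    if tag == "Let":
        return ("Let", compile_exp(e[2], scope, index_of),
                compile_exp(e[3], [e[1]] + scope, index_of))
    args = [compile_exp(a, scope, index_of) for a in e[2]]
    if tag == "App":
        return ("Call", index_of[e[1]], args)
    if e[1] == "If":
        return ("If", *args)
    return ("Prim", PRIMS[e[1]], args)                # primitive as a function value

def compile_all(defs) -> Tuple[List[tuple], Dict[str, int]]:
    index_of = {name: i for i, (name, _, _) in enumerate(defs)}
    bodies = [compile_exp(body, list(params), index_of) for _, params, body in defs]
    return bodies, index_of

def exec_ce(bodies: List[tuple], env: List[int], ck: int, e: tuple) -> Tuple[int, int]:
    tag = e[0]
    if tag == "Const":
        return e[1], ck
    if tag == "Var":
        return env[e[1]], ck                          # look-up, no substitution
    if tag == "If":
        c, ck = exec_ce(bodies, env, ck, e[1])
        return exec_ce(bodies, env, ck, e[2] if c != 0 else e[3])
    if tag == "Let":
        v, ck = exec_ce(bodies, env, ck, e[1])
        return exec_ce(bodies, [v] + env, ck, e[2])   # extend the environment
    vals = []
    for a in e[2]:
        v, ck = exec_ce(bodies, env, ck, a)
        vals.append(v)
    if tag == "Prim":
        return e[1](*vals), ck
    if ck == 0:
        raise OutOfClock()
    return exec_ce(bodies, vals, ck - 1, bodies[e[1]])  # constant-time body lookup
```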

Fig. 2

Definition of the fast interpreter as functions in HOL

Fig. 3

CakeML code synthesised from definition of exec, as described in Sect. 2.1. It is verified to implement the HOL functions in Fig. 2

When updating Version 3 to Version 4, we simply replaced the following line in the implementation:

with the line below, which calls the compilers compile_all and compile (these translate cexp into ce, turning variables and function names into indices) and then runs exec, which interprets the program represented in terms of ce:

Updating the proofs for Version 4 was a routine exercise in proving the correctness of the compilers compile_all and compile. In this proof, compiler correctness is an equality: the new line computes exactly the same result as the line that it replaced (under some assumptions that are easily established in the surrounding proof). The adjustments required in the existing proofs were minimal.

7 Partial Evaluation (Version 5)

At the time of writing, Version 5 is not yet implemented in Candle. However, the plan is to replace the exec function of Fig. 3 with code along the lines of the code shown in Fig. 4. While the old version walks the AST as it interprets it, the new version performs all case splitting on the AST before the actual execution starts. When the new version executes, it always has at hand a closure value that will perform the relevant next step of the computation.

In some preliminary stress tests, Version 5 is almost twice as fast as Version 4. We intend to verify Version 5 and upgrade the Candle implementation to use it.
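The idea behind Version 5 can be illustrated by a closure-generating Python sketch: all case splitting on the AST happens once, in compile_closure, and the returned closures never inspect a tag again. This is our illustration of the general technique, in the spirit of Fig. 4 rather than a transcription of it; the tuple encoding and the names compile_closure and compile_program are hypothetical.

```python
def compile_closure(table, e):
    """Case-split on e once and return a closure from environments to values."""
    tag = e[0]
    if tag == "Const":
        value = e[1]
        return lambda env: value
    if tag == "Var":
        i = e[1]
        return lambda env: env[i]
    if tag == "Prim":
        op, args = e[1], [compile_closure(table, a) for a in e[2]]
        return lambda env: op(*[a(env) for a in args])
    if tag == "If":
        c, t, f = (compile_closure(table, x) for x in e[1:])
        return lambda env: t(env) if c(env) != 0 else f(env)
    if tag == "Let":
        rhs, body = compile_closure(table, e[1]), compile_closure(table, e[2])
        return lambda env: body([rhs(env)] + env)
    if tag == "Call":
        idx, args = e[1], [compile_closure(table, a) for a in e[2]]
        # table[idx] is read when the call runs, so recursive calls work.
        return lambda env: table[idx]([a(env) for a in args][::-1])
    raise ValueError(f"unknown tag {tag!r}")


def compile_program(funs):
    """Turn every function body into a closure before execution starts."""
    table = []
    table.extend([compile_closure(table, body) for body in funs])
    return table


# fact, in the tuple encoding: ("Var", 0) is the argument n
FACT = ("If", ("Prim", lambda n: int(n == 0), (("Var", 0),)),
        ("Const", 1),
        ("Prim", lambda a, b: a * b,
         (("Var", 0), ("Call", 0, (("Prim", lambda n: n - 1, (("Var", 0),)),)))))
table = compile_program([FACT])
print(table[0]([5]))  # 120
```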

Fig. 4
figure 4

Interpreter exec sped up using partial evaluation before execution

8 Evaluation: Candle

In this section, we report on experiments comparing our new verified compute function (Version 4) to the existing in-logic interpreters of HOL4, HOL Light, and Isabelle/HOL.Footnote 6 These are implemented as derived rules, as alluded to in Sect. 1. We tested the performance of each on the following four example programs written as functions in the logic of HOL.

  • the factorial function,

  • enumeration of primes,

  • generating and reversing a list of numbers,

  • simulation of a 100-by-100 grid of cells in Conway’s Game of Life.

The tests were run on an Intel i7-7700K 4.2GHz with 64 GiB RAM running Ubuntu 20.04. The code used for these experiments is available on the CakeML website.Footnote 7

The results, in Table 1, show that Candle’s new compute function runs orders of magnitude faster than the existing in-logic interpreters of HOL4, HOL Light, and Isabelle/HOL on all four examples. In fact, it was difficult to choose input sizes large enough for us to gather meaningful measurements from our computation function while keeping the runtimes of its derived counterparts within minutes. For this reason, we added one large data point to the end of each experiment. In Table 1, a dash, —, indicates that we did not test this.

Table 1 Running times for Candle’s compute primitive, HOL4’s Eval, HOL Light’s Eval, and Isabelle/HOL’s in-logic Code_Simp.dynamic_conv

The first two examples, factorial and primes, demonstrate the speed of computing basic arithmetic, while the latter two examples, list reversal and Conway’s Game of Life, highlight that Candle’s compute primitive is also well suited for structural computations, such as tree traversals, that do not involve much arithmetic.

8.1 Factorial

The first example is a standard, non-tail-recursive factorial function, tested on inputs of various sizes. The results of the tests are shown in the upper left corner of Table 1. This is the only test where HOL Light outperforms HOL4. We suspect HOL Light benefits from the effort that has gone into making its basic arithmetic evaluate fast.

8.2 Prime Enumeration

The second example, primes_upto, enumerates all primes up to n and returns them as a list. We chose to implement the checks for primality using trial division, since it is challenging to compute division and remainder efficiently inside the logic. The results of the tests are shown in the upper right corner of Table 1.
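For readers unfamiliar with the benchmark, a Python sketch of primality by trial division follows. The exact HOL definition of primes_upto is not reproduced here, so the bound on candidate divisors (stopping once d * d exceeds n) is our assumption, and loops stand in for the recursion a HOL definition would use.

```python
def is_prime(n):
    """Trial division: n >= 2 and no d with d >= 2 and d * d <= n divides n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primes_upto(n):
    """All primes p with p <= n, in increasing order."""
    return [p for p in range(2, n + 1) if is_prime(p)]


print(primes_upto(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```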

8.3 List Reversal

The third example performs repeated list reversals. The function rev_enum creates a list of the natural numbers \(\left[ 1,2,\dots ,n\right] \) and then calls a tail-recursive list reverse function \(\textsf { rev}\) on this list 1000 times. The results of the tests are shown in the lower left corner of Table 1. On this and the next benchmark HOL Light performs much worse than HOL4 and Isabelle/HOL.

8.4 Conway’s Game of Life

The fourth example simulates a 100-by-100 grid of cells in Conway’s Game of Life. The grid is initialised with five interacting Gosper glider generators. The setup is self-contained, i.e., it never touches the boundaries of the 100-by-100 grid. The simulation runs for n steps of Conway’s Game of Life. The results of the tests are shown in the lower right corner of Table 1.
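One step of such a simulation can be sketched in Python as follows, assuming the standard Game of Life rules (birth with exactly three live neighbours, survival with two or three) and treating cells outside the grid as dead. The name life_step is hypothetical, and the Gosper-generator setup of the benchmark is not reproduced.

```python
def life_step(grid):
    """One generation on a bounded grid of 0/1 cells; off-grid cells count as dead."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        count = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                    count += grid[rr][cc]
        return count

    def next_state(alive, n):
        # Birth with exactly 3 neighbours; survival with 2 or 3.
        return 1 if n == 3 or (alive and n == 2) else 0

    return [[next_state(grid[r][c], live_neighbours(r, c)) for c in range(cols)]
            for r in range(rows)]


# A "blinker": three cells in a row oscillate with period 2.
blinker = [[0] * 5 for _ in range(5)]
blinker[2][1] = blinker[2][2] = blinker[2][3] = 1
```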

9 Approach: HOL4

This section describes, at a high level, how we have added our fast computation feature to the HOL4 theorem prover and built automation to make it as usable as possible there.

In particular, we implement Version 4 (Sect. 6) of the fast computation feature in HOL4’s kernel. Evaluation proofs in HOL4 can then benefit from fast computation without compromising trust due to our verification in Candle. In HOL4, the function is implemented in Standard ML, in contrast to Candle’s CakeML implementation (which is itself automatically synthesised from HOL4 definitions—Sect. 2.1). It also requires minor modifications due to incompatibilities between HOL4 and HOL Light, on which Candle is based (Sect. 10).

However, the new computation feature is not very user-friendly when used directly: it only accepts terms and code equations which are stated in terms of the cval type. We want to perform fast computation using ordinary user-defined functions which do not conform to this subset of the logic. Therefore, we build an automatic tool which translates common (mostly first-order) HOL functions into the cval datatype (Sect. 11). The tool provides a user-friendly flow for invoking the new compute function: it hides almost all details of cval from the user. Critically, this flow is proof-producing: it returns a top-level theorem which equates the input term with the result of its fast computation, with no mention of cval. We evaluate our flow by applying it to previously written, ordinary HOL4 functions: the implementations of CakeML’s type inferencer and compiler backend (Sect. 12). We then demonstrate significant speedups when executing both the inferencer and compiler backend within HOL4’s logic. In future work, we hope to port the tool back to Candle too.

9.1 Notation

All logical syntax from here on is within the HOL4 logic. In HOL4, the cval datatype is renamed to cv. Its constructors are Num and Pair (cf. \({\textsf { Num}}_{\textsf{c}}\) and \({\textsf { Pair}}_{\textsf{c}}\)), and its “primitive” functions are prefixed by “cv_” (e.g. cv_fst and cv_if instead of \({\textsf { fst}}_{\textsf{c}}\) and \({\textsf { cif}}_{\textsf{c}}\)).

10 Fast Computation for HOL4

The implementation of Version 4 (Sect. 6) in HOL4 had to contend with two key differences between HOL4 and Candle. Both are inherited from HOL Light, on which Candle is based. The first concerns HOL4’s theory structure: each HOL4 constant belongs to a theory, so HOL4 namespaces are structured; however, HOL Light and Candle have flat namespaces.

The second concerns the representation of natural numbers: recall that Candle (and HOL Light) uses the constants \({\textsf { 0}}_{\textsf{c}}\), \({\textsf { Bit0}}_{\textsf{c}}\) and \({\textsf { Bit1}}_{\textsf{c}}\). HOL4 instead uses 0, Bit1, and Bit2. The difference here is illustrated by the following theorems:

This discrepancy slightly alters the translation between logical and native numbers performed by our new compute function.
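The two numeral encodings can be compared with a short Python sketch. It assumes Bit0 m = 2m and Bit1 m = 2m + 1 for Candle (as in HOL Light), and Bit1 m = 2m + 1 and Bit2 m = 2m + 2 for HOL4, i.e. bijective base 2, so no Bit0 is needed. It ignores the NUMERAL tag that HOL Light and HOL4 wrap around numerals, and the function names are hypothetical.

```python
def candle_numeral(n):
    """Binary: Bit0 m = 2m, Bit1 m = 2m + 1, least significant bit outermost."""
    if n == 0:
        return "0"
    inner = candle_numeral(n // 2)
    return f"Bit1 ({inner})" if n % 2 == 1 else f"Bit0 ({inner})"


def hol4_numeral(n):
    """Bijective base 2: Bit1 m = 2m + 1, Bit2 m = 2m + 2."""
    if n == 0:
        return "0"
    if n % 2 == 1:
        return f"Bit1 ({hol4_numeral((n - 1) // 2)})"
    return f"Bit2 ({hol4_numeral((n - 2) // 2)})"


def value(numeral):
    """Evaluate a numeral in either encoding back to an int."""
    if numeral == "0":
        return 0
    head, rest = numeral.split(" ", 1)
    m = value(rest[1:-1])  # strip the outer parentheses
    return {"Bit0": 2 * m, "Bit1": 2 * m + 1, "Bit2": 2 * m + 2}[head]


print(candle_numeral(5))  # Bit1 (Bit0 (Bit1 (0)))
print(hol4_numeral(5))    # Bit1 (Bit2 (0))
```

Both encodings put the least significant digit outermost, but the digits differ, so converting a logical numeral to a native number is not a mere renaming of constructors.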

To bring behaviour in line with Candle (and HOL Light), we also had to extend HOL4’s natural number division and modulo operators to specify dividing by zero and using a modulus of zero:

This is also in line with several other systems (e.g. Isabelle/HOL, Lean, and Coq). Previously, there were no equations defining the behaviour of these two constants when applied to zero.
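As a Python sketch of the convention we assume these equations follow, m DIV 0 = 0 and m MOD 0 = m: with these choices the identity m = (m DIV n) * n + m MOD n holds for every n, including 0. The function names are hypothetical.

```python
def hol_div(m, n):
    """Natural-number division made total: m DIV 0 = 0."""
    return 0 if n == 0 else m // n


def hol_mod(m, n):
    """Natural-number modulus made total: m MOD 0 = m."""
    return m if n == 0 else m % n


# The division identity m = (m DIV n) * n + m MOD n now also covers n = 0.
print(hol_div(7, 0), hol_mod(7, 0))  # 0 7
```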

We also removed the clock argument to the interpreter in HOL4: unlike in Candle’s HOL4 formalisation, we do not need to consider termination.

11 Automated Translation into Code Equations

This section describes automation that allows users to apply fast computation without first writing definitions in the cv subset.

This automation centres around canonical in-logic translations for each supported HOL4 type: a translation from HOL4 terms of the type to untyped cv terms; and a translation to HOL4 terms of the type from cv terms. To perform fast computation of an ordinary HOL4 term, the automation does the following:

  1. Automatically translates from the ordinary HOL4 term to its cv version.

  2. Invokes the new kernel function to perform fast computations over the cv version.

  3. Translates the resulting cv term back to an ordinary HOL4 term.

In this way, almost all details of cv are hidden from the user.

Critically, the automation is written outside of HOL4’s kernel and demands no additional trust. In particular, the automation is proof-producing: at a high level, each of the three steps above produces a theorem as follows, where term is the input term.

figure s

We can compose these theorems as follows. Here we rely on the key property of from and to translations, namely that to is a left-inverse of from: \({\textsf { to}\;(\textsf { from}\;{x}) = {x}}\).

figure t

The top-level theorem presented to the user is therefore \(\vdash \textsf { term} = \textsf { term'}\). Overall, our automation has invoked the new kernel function without exposing the cv datatype to the user.

Therefore, our automation must generate the from and to translations automatically for each supported type. It must further translate ordinary HOL4 functions to corresponding cv code equations using from translations, so that it can handle user-defined functions. The remainder of this section describes how we have designed our automation in more detail.

11.1 To and From cv

The key property of the from and to functions mentioned above is that to is a left-inverse of from. We define this property as from_to:

figure u

Note that for simplicity, from and to are translation functions, not relations. This choice has a convenient side effect: every representation in the cv datatype is unique, which plays well with HOL4 equality:

figure v

Below, we show from and to functions for the following types: bool, num, and list.

figure w

These satisfy the following from_to theorems. Note how the type variable \({\alpha }\) in the list type \({\alpha }\) list is represented by the \({{{{ from}}\_a}}\) (resp. \({{to\_a}}\)) function argument to from_list (resp. to_list).

figure x

We derive from_to theorems for a variety of “primitive” HOL4 types: booleans, natural numbers, characters, integers, rationals, machine words, options, pairs, sums, and lists. We also implement a library that automatically defines from and to functions for user-defined datatypes and derives the corresponding from_to theorems.
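To make the from and to functions concrete, the following Python sketch mirrors them for bool, num, and list. The encodings (T as Num 1, F as Num 0, [] as Num 0, and x :: xs as a Pair) are our assumptions, since the HOL4 definitions themselves are not reproduced here; to_bool in particular is a guess at how non-canonical values are read back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object


def from_bool(b):
    return Num(1 if b else 0)


def to_bool(c):
    return c == Num(1)  # assumption: every other cv value reads back as F


def from_num(n):
    return Num(n)


def to_num(c):
    return c.n if isinstance(c, Num) else 0


def from_list(from_a, xs):
    """[] becomes Num 0; x :: rest becomes Pair (from_a x) (from_list from_a rest)."""
    c = Num(0)
    for x in reversed(xs):
        c = Pair(from_a(x), c)
    return c


def to_list(to_a, c):
    xs = []
    while isinstance(c, Pair):
        xs.append(to_a(c.fst))
        c = c.snd
    return xs


# from_to for num list: to_list to_num is a left inverse of from_list from_num
assert to_list(to_num, from_list(from_num, [5, 9, 1])) == [5, 9, 1]
```

Because to_a inverts from_a, from_a is injective: two values are equal exactly when their cv representations are, which is the uniqueness property noted earlier in this section.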

11.2 Using from Theorems

Using these from_to theorems, we can demonstrate our high-level approach (see the start of this section) on a concrete example. Suppose the user wants to evaluate the term . As always, the input term must be closed. The automation first derives a from theorem: its left-hand side is the input term wrapped in the appropriate from function (in this case, from_bool); and its right-hand side consists only of cv functions.

figure y

The tool then uses the new compute function to prove that equals T, relying on the from_to theorem for the bool type and Theorem (9) above:

figure z

This concrete example directly mirrors the high-level approach described at the start of this section. Though it is very simple, this example showcases the overall approach which remains unchanged regardless of the complexity of the input term.

11.3 Deriving from Theorems

The example in Sect. 11.2 shows that our automation must be able to derive from theorems of the form:

We define a judgement for such from equations, cv_rep, to better structure the terms our automation will handle:

figure aa

This can be read as follows: “given precondition pre, the function from translates term e to the cv term \({{cv\_e}}\)”.

Our automation derives cv_rep judgements bottom-up, using various helper lemmas. For example, to derive cv_rep for a natural number literal, it instantiates the following judgement:

figure ab

To derive cv_rep for natural number addition (\(+\)), it uses the following result to establish a connection with cv_add:

figure ac

By way of example, consider the term . Lemmas (11) and (12) are used to derive a cv_rep judgement as follows:

figure ad

In this example, the preconditions are trivial. We will see the need for non-trivial preconditions when translating recursive and partially specified functions (Sects. 11.5 and 11.7).

We have derived lemmas akin to (11) and (12) for the various “primitive” literals and operations of the types mentioned in Sect. 11.1.

11.4 Translating Functions

Any interesting user input will contain functions—our automation must be able to translate them into code equations. We first describe the translation of non-recursive functions, taking add1 below as an example.

figure ae

We first derive a cv_rep judgement for the right-hand side of the function definition: \({{n}\;{+}\;\textsf { 1}}\). Unlike the example in Sect. 11.3, this term has a free variable, namely the argument n to the function. To handle such free variables, our automation produces a trivial cv_rep judgement on the fly:

figure af

To see why this is trivial, simply unfold the definition of cv_rep (10).

The resulting cv_rep judgement for the right-hand side is therefore as follows, where we simplify the precondition to T for brevity:

figure ag

Now, the second argument of the cv_rep judgement above (13) is essentially the right-hand side of the cv version of add1 we wish to define. We need only generalise \({\textsf { from}\_\textsf { num}\;{n}}\) to \({{cv\_n}}\) to define cv_add1:

figure ah

We now rewrite (13) using the definitions of cv_add1 and add1:

figure ai

Unfolding the definition of cv_rep shows precisely the correspondence between cv_add1 and add1 (this theorem is derived by our automation and saved for the user):

figure aj

A postprocessing step derives a cv_rep judgement suitable for use in future translations which refer to add1. In particular, we transform the judgement to mirror Theorem (12), so that each argument variable results in a separate assumption.

figure ak
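The correspondence between add1 and cv_add1 can be checked on sample inputs with a Python sketch. It assumes add1 n = n + 1 and that cv_add treats a non-Num argument as 0, which makes it total on cv; both are assumptions for illustration. The HOL4 automation proves the equation for all n, whereas the sketch only tests it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object


def from_num(n):
    return Num(n)


def cv_add(x, y):
    """Assumed total addition on cv: a Pair argument counts as 0."""
    def as_num(c):
        return c.n if isinstance(c, Num) else 0
    return Num(as_num(x) + as_num(y))


def add1(n):
    """Ordinary function, assumed to be add1 n = n + 1."""
    return n + 1


def cv_add1(cv_n):
    """cv version: the generalised right-hand side, cv_add cv_n (Num 1)."""
    return cv_add(cv_n, Num(1))


# Sampled instance of: from_num (add1 n) = cv_add1 (from_num n)
assert all(cv_add1(from_num(n)) == from_num(add1(n)) for n in range(1000))
```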

11.5 Translating Recursive Functions

Translating recursive functions is more involved. We describe the process by example, using the following input definition:

figure al

First, we show the cv_rep judgement for if, which is unchanged from non-recursive translations. Note how the final precondition uses the condition of the if to guard the preconditions of the two branches (\({{{p}_{\textrm{2}}}}\) and \({{{p}_{\textrm{3}}}}\)).

figure am

The key challenge with recursive functions is to produce cv_rep judgements for their recursive calls. We create trivial cv_rep judgements, much like the ones we produce for variables (Sect. 11.4). For example, the recursive call to num_list produces the judgement below. Here, the variable is a placeholder for the cv_num_list function we will soon define.

figure an

Using this cv_rep judgement, we continue the bottom-up derivation as usual to produce the following judgement for the right-hand side of num_list:

figure ao

Now, we can define the cv function cv_num_list as before, by considering the second argument to the cv_rep judgement:

figure ap

This function is recursive, so it may require a termination proof. Our automation can derive simple termination proofs (or avoid them for tail-recursive functions); for more complicated situations the user may need to supply one.

We instantiate Theorem (14) with the newly defined cv_num_list, and rewrite it using the definitions of cv_num_list and num_list:

figure aq

If we make explicit the universal quantification over and expand the definition of cv_ rep, we can see that this theorem has a familiar structure:

figure ar

This matches the antecedent of the induction theorem arising from the termination proof of num_list, namely:

figure as

A simple application of modus ponens gives the following:

figure at

Again, postprocessing reformulates this for use in future translations of terms which refer to num_list:

figure au

11.6 Translating Let-Bindings and Pattern Matching

Our examples so far have only bound variables at the top level as function arguments. Interesting user input may contain local variable bindings due to let-bindings and pattern matching using case-expressions. We require slightly more involved cv_rep judgements to support these.

In HOL4, let-bindings are syntactic sugar for applications of the LET constant, which is simply defined as function application: \({\textsf { LET}\;{f}\;{x} = {f}\;{x}}\). For example, the binding desugars to LET . We use the following cv_rep judgement to translate \({\textsf { LET}\;{f}\;{x}}\). Here, \({{from\_a}}\) is the from function for the type of the let-bound variable, and the second precondition, \({{{p}_{\textrm{2}}}}\), is a function.

figure av

Note that the second assumption (\(\forall \,v.\; {\textsf { cv}\_\textsf { rep}}\;\ldots \)) is universally quantified: it must hold for any application of f to a variable v. Note too that we lift the application \({{from\_a}\;{v}}\) into the assumptions, not just v.

We use similar cv_rep judgements to translate pattern matching via case-expressions. In HOL4, case-expressions desugar to functions such as \({\textsf { list}\_\textsf { case}\;{x}\;{nil}\;{cons}}\), in the case of lists. Here, x is the list being pattern matched: if it is empty, the pattern match is equal to nil; if it is non-empty, then \({{x} = {h}\;{::}\;{t}}\) and the pattern match is equal to \({{cons}\;{h}\;{t}}\).

figure aw

The cv_rep judgement for list_case is shown below. It handles local variable bindings in the non-empty case in much the same way as the cv_rep judgement for LET above.

figure ax

Our tooling automatically derives such cv_rep judgements for the case constant of each datatype it encounters.

11.7 Translating Partially Specified Functions

All functions in higher-order logic are total; however, they can be partially specified if they leave certain cases unspecified. For example, the function that extracts the head of a list is partially specified because it has no equation for the empty list case:

figure ay

To translate such examples, we first transform the pattern match on the left-hand side of the definition to a case-expression on its right-hand side. For partially specified functions (such as hd above), this produces an assumption which ensures we are not in any of the unspecified cases. As the unspecified cases are now impossible, they can simply take the value arb, the HOL4 constant denoting a fixed but unspecified value of any type. For example, the definition of hd above becomes:

figure az

There is no value in the cv type which corresponds to arb, so we use a trivial cv_rep theorem with a false precondition:

figure ba

The result of translating the hd function (as described in Sect. 11.4) remains a fully specified cv function, cv_hd:

figure bb

However, the partiality becomes clear in the user-presentable theorem relating hd and cv_hd. This theorem has a precondition which requires the input to satisfy a new constant defined by our automation, hd_pre:

figure bc

The definition of hd_pre has two equivalent conjuncts, which seems verbose until we consider their origins: the first is simply the precondition of (15); the second arises when translating the arb on the right-hand side of (15). To work with hd_pre, users should derive a readable lemma, such as \({\textsf { hd}\_\textsf { pre}\;{xs} \iff {xs} \neq [\,]}\).
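The effect of the precondition can be illustrated in Python. The cv_hd below is hypothetical: we assume it returns the first component of a Pair and Num 0 otherwise, which stands in for whatever the translation of arb produces. The cv list encoding ([] as Num 0, cons as Pair) and hd_pre as simply xs != [] are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object


def from_num(n):
    return Num(n)


def from_list(from_a, xs):
    c = Num(0)                   # [] as Num 0 (assumed encoding)
    for x in reversed(xs):
        c = Pair(from_a(x), c)   # x :: rest as a Pair
    return c


def hd(xs):
    """HOL's hd has no equation for []; here that case is simply an error."""
    if not xs:
        raise ValueError("hd []: unspecified")
    return xs[0]


def hd_pre(xs):
    return xs != []              # the readable form of the precondition


def cv_hd(c):
    """Hypothetical total cv version: defined on every cv value."""
    return c.fst if isinstance(c, Pair) else Num(0)


# The user-facing theorem only promises agreement when hd_pre holds.
for xs in ([3], [1, 2], [7, 8, 9]):
    assert hd_pre(xs) and cv_hd(from_list(from_num, xs)) == from_num(hd(xs))
```

cv_hd still returns a value for the empty list; the precondition is what stops that value from being related to hd.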

Partial specification and recursion interact to produce more interesting preconditions. For example, consider a function to extract the last element of a list:

figure bd

During translation, our tooling automatically defines the precondition for such a function as an inductive relation. For last, this relation has only one rule:

figure be

In manual proof, one can easily show that \({\textsf { last}\_\textsf { pre}\;{xs} \iff {xs} \neq [\,]}\) by induction.

During development of this part of our tool, we discovered a neat trick: the induction theorem arising from the definition of last_pre closely resembles the induction theorem required when translating a fully specified recursive function, as described at the end of Sect. 11.5, and can be automatically transformed to match exactly. This allows a mode of operation in which users can instruct our tooling to treat an input recursive function as partially specified: this has the effect of outsourcing the induction proof of Sect. 11.5 to the user. This can be useful when the induction theorem required for the proof in Sect. 11.5 cannot be found, or does not quite suffice.

11.8 Translating Higher-Order Functions

The cv type is unable to represent any function type with an infinite domain (e.g. any function which accepts a natural number as input). Therefore from_to (Sect. 11.1) cannot hold of most interesting function types, making it impossible for our tool to translate higher-order functions in general.

However, we can handle functions with finite domains, as well as uses of higher-order functions that can be rephrased as first-order characterisations.

11.8.1 Functions with Finite Domains

Consider a HOL function f whose domain is bool. We can represent it in the cv type as a pair whose first element is the result of applying the function to true, and whose second element is the result of applying it to false:

In other words, the cv representation is a lookup table for the function: an exhaustive enumeration of its input–output behaviour. We can then represent the application \(\textsf { f}\;{\textsf { T}}\) using cv_ fst (similarly cv_ snd for the false case).

This idea generalises to any function with a finite domain: we can represent it as a pair which encapsulates a lookup table, and represent its application as a projection from the pair. Our automation is sufficiently extensible that users can define the from/to functions for such representations, and pass the corresponding from_to theorem to the tooling for use in translation.

We have exercised this ability on a small example of addressable memory.Footnote 8 In particular, we represent a memory with 256-bit addresses which stores natural number values as the HOL4 type \({256\;\textsf { word}\;{\rightarrow }\;\textsf { num}}\). We define lookup and update functions for this memory, manually define cv versions of these, and derive cv_rep theorems which relate the two. Again, our automation is sufficiently extensible that users can supply such manually derived cv_rep theorems for use in translation.

However, we cannot exhaustively enumerate all possible input–output pairs for a function with \(2^{256}\) possible input values. Instead, we make a small optimisation: our lookup table consists of a default output value, and a series of input–output pairs for which the output value differs from the default. In this way, we can efficiently represent a sparsely populated memory as a cv value.
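Both representations can be sketched in Python: a bool-domain function as a pair of outputs, and a sparse table as a default value plus exceptions. The names and the exact layout (a default followed by a list of address–value pairs) are our assumptions about the memory example, not its actual definitions; in the cv type the list would itself be nested Pairs.

```python
def table_of_bool_fun(f):
    """A function on bool as the pair (f T, f F)."""
    return (f(True), f(False))


def apply_bool_table(table, b):
    """Application is a projection: first component (cv_fst) for T, second (cv_snd) for F."""
    return table[0] if b else table[1]


def sparse_lookup(mem, addr):
    """mem = (default, entries); entries lists only addresses with a non-default value."""
    default, entries = mem
    for a, v in entries:
        if a == addr:
            return v
    return default


def sparse_update(mem, addr, value):
    default, entries = mem
    rest = [(a, v) for a, v in entries if a != addr]  # drop any old entry
    if value != default:
        rest = [(addr, value)] + rest                 # store only non-default values
    return (default, rest)


mem = sparse_update((0, []), 2**255, 42)  # one write into a 2^256-address memory
print(sparse_lookup(mem, 2**255), sparse_lookup(mem, 7))  # 42 0
```

Writing the default value back removes the entry, so every memory has exactly one representation, which keeps the from function injective.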

11.8.2 First-Order Characterisations

We have implemented automation that recognises common patterns (e.g. mapping over a list with a concrete function), and proves them equivalent to first-order characterisations. A preprocessing phase rewrites the original pattern to its first-order characterisation before translation. In practice, we have found that the preprocessing allows our tool to remove most common uses of higher-order functions such as map, filter, and so on.

For example, suppose an input function contains \({\textsf { map}\;\textsf { add1}\;{xs}}\) for some list \({xs}\). The preprocessing defines a copy of map with its function argument specialised to add1:

figure bf

It then uses the definition of map to derive first-order equations for map_add1:

figure bg

Preprocessing then rewrites any uses of map add1 to map_add1 so that the main part of the translator never sees this higher-order function.
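In Python terms, the specialised copy and its first-order equations look like this. The definition add1 n = n + 1 is assumed; map_add1 follows the two equations obtained by unfolding map on [] and on x :: xs, and never takes a function as an argument.

```python
def add1(n):
    return n + 1  # assumed definition of add1


def map_add1(xs):
    """First-order copy of map add1:
    map_add1 [] = []
    map_add1 (x :: xs) = add1 x :: map_add1 xs
    """
    if not xs:
        return []
    return [add1(xs[0])] + map_add1(xs[1:])


print(map_add1([5, 9, 1]))  # [6, 10, 2]
```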

12 Evaluation: HOL4

In this section, we evaluate the HOL4 implementation of our fast new computation feature (Sect. 10) and its associated automation (Sect. 11). We demonstrate its performance by exercising it on significant benchmarks: in-logic evaluation of the CakeML type inferencer [23] and in-logic self-compilation of the CakeML compiler backend [22]. We demonstrate its usability by example: we showcase the user experience for the in-logic evaluation of several small examples. These include partially specified and recursive functions.

12.1 Performance

Using our new fast compute feature and associated automation, we have enabled fast in-logic evaluation of existing ordinary HOL4 functions: CakeML’s type inferencer and CakeML’s compiler backend. In particular, we first fed these functions to our automation so that it could produce cv code equations. Then, we invoked our automation on several existing evaluation proofs which use these functions. Previously, the proofs relied on HOL4’s existing in-logic evaluation facilities, known as Eval (Sect. 13). We have been able to speed them up significantly using our fast computation, without significantly exposing the cv type. The table below summarises the improvement.

 

| | Eval | This work |
|---|---|---|
| Type inferencer | \(\sim 2\) hours | \(<1\) second |
| Compiler backend | \(\sim 14.5\) hours | \(\sim 45\) minutes |

12.1.1 CakeML Type Inferencer

In-logic evaluation of CakeML’s type inferencer is crucial to the proof of soundness of the Candle theorem prover. Previously, this took more than two hours on an Intel® desktop machine with 64 GB RAM. Now, evaluation takes less than a second on the same machine. Of course, we incur the cost of our automation translating the inferencer functions to cv code equations. However, this takes less than 2 min on the same Intel® machine, producing 90 cv equations.

One interesting aspect of this translation was the need to first re-express the (type) unification algorithm (ultimately from [14]) tail-recursively. The input algorithm’s natural expression involves both nested recursion (when a type has more than one sub-type) and partiality (it requires a well-formed input substitution). Transforming it into tail-recursive, continuation-passing style (CPS) is semi-automatic (mostly by rewriting), but because cv values cannot include abstractions, the continuations need to be defunctionalised. We were guided by Danvy and Nielsen [8], and ultimately generated a (verified equivalent) work-list version of the unification algorithm.

This process can be illustrated by the handling of the (slightly less) complicated substitution function used in the unification algorithm (Fig. 5). This recursively applies a substitution map (the s parameter) over the three possible forms of inference terms: constants (UConst), variables (UVar), and applications of operators (identified by numbers n) to lists of terms (UApp). Constants are left alone, applications have the algorithm applied recursively to the list argument, and variables are looked up in the substitution map with the function cvwalk before the substitution can be applied again.

The first step of the transformation (to CPS) is handled by judicious application of the contify function, which has the almost trivial definition:

figure bh

More important are two illustrative consequences, one handling “normal” function applications and one handling if-then-else:

figure bi

The first equation refers to the cwc constant, which is semantically identical to contify but takes its two arguments in the opposite order. Using a second constant means that a naïve rewriting strategy using contify behaves well. The second equation makes it clear that when evaluating conditional forms, it doesn’t make sense to chain the continuations through the evaluation of the \(\textsf {then}\) and \(\textsf {else}\) branches. Instead, both sub-terms are transformed with the same continuation k. Similar forms are required for case-constants that discriminate on data type constructors (these lie behind the \(\textsf {case}\;\cdots \;\textsf {of}\;\cdots \) syntax, as mentioned in Sect. 11.6).

The transformation of csubst starts by defining an auxiliary (cwalkstarl) to handle the recursive calls to , and allows the definition of a continuation-passing \({{{\textsf { csubst}_{\textsf{k}}}}}\), subsuming both. The definition of is then no more than

Using the “contification” rewrites above, it is then straightforward to derive the characterisation in Fig. 6.

Fig. 5
figure 5

csubst: the equation defining the substitution function. Its precondition, cwfs , requires that the lookup-tree be well-formed

Fig. 6
figure 6

\({{{\textsf { csubst}_{\textsf{k}}}}}\): the equation defining the continuation-passing substitution function. Here and later we omit the assumption. The earlier lookup function cvwalk has also been replaced by a tail-recursive reformulation tcvwalk

The last step is defunctionalisation, generating “work-list” versions of these functions. There are four different continuation abstractions in Fig. 6: one for each constructor form, and one (beginning ) that wraps the inner UApp continuation. Each abstraction has just one continuation variable free ( in all cases), so the concrete type we use to represent continuations is naturally a list, where each element of the list bundles the other free variables of the abstraction (except the unvarying substitution map ). Thus the form for the UApp continuation is APk, which takes a list of terms and a number (corresponding to the variables and respectively). The fourth form has the same type, but in this case the list of terms corresponds to the variable.

This leads to the definition of the \({\textsf { kclkont}}\) type:

figure bj

With the continuation abstractions encoded as the type \({\textsf { kclkont}\;\textsf { list}}\), the next step of defunctionalisation is to define what it is to apply such a continuation to a result (a list of inference terms). The combination of this application function and the original is presented in Fig. 7.

Fig. 7
figure 7

The final, tail-recursive, first-order formulation of substitution over inference terms. Rather than present this as two mutually recursive functions, the fact that they have the same type allows us to have one function with a boolean flag to pick between the two “modes”. When is true, the reified continuation is applied to argument ; when false, the list is processed

Once tail-recursive and first-order, the side-conditions around partiality are cleanly handled by the methods above (Sect. 11.7) and the cv-translation proceeds without further difficulty.
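The overall effect of CPS conversion followed by defunctionalisation can be sketched in Python. The term constructors mirror UConst, UVar, and UApp, but the frame layout and all function names (including walk, a stand-in for cvwalk) are our own assumptions: instead of the kclkont type and the boolean-flag function of Fig. 7, a while loop plays the role of tail calls, and each stack frame records a pending UApp (its operator number, the remaining siblings, and the results so far). A directly recursive csubst_rec is included as the specification to compare against. As with the cwfs precondition, termination assumes the substitution has no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UConst:
    name: str


@dataclass(frozen=True)
class UVar:
    v: int


@dataclass(frozen=True)
class UApp:
    n: int
    args: tuple


def walk(s, t):
    """Follow bindings in s to a non-variable or an unbound variable (loops if s is cyclic)."""
    while isinstance(t, UVar) and t.v in s:
        t = s[t.v]
    return t


def csubst_rec(s, t):
    """Directly recursive specification (nested, non-tail recursion)."""
    if isinstance(t, UConst):
        return t
    if isinstance(t, UVar):
        w = walk(s, t)
        return w if isinstance(w, UVar) else csubst_rec(s, w)
    return UApp(t.n, tuple(csubst_rec(s, a) for a in t.args))


def csubst_worklist(s, t):
    """First-order, loop-only version: an explicit stack replaces the continuation."""
    todo, done, stack = [t], [], []
    while True:
        if todo:
            head, todo = todo[0], todo[1:]
            if isinstance(head, UConst):
                done = done + [head]
            elif isinstance(head, UVar):
                w = walk(s, head)
                if isinstance(w, UVar):
                    done = done + [w]      # unbound: keep the variable
                else:
                    todo = [w] + todo      # substitute inside the bound term
            else:
                # Reified continuation: "wrap the results in UApp n,
                # then carry on with the remaining siblings".
                stack.append((head.n, todo, done))
                todo, done = list(head.args), []
        elif stack:
            n, todo, parent_done = stack.pop()
            done = parent_done + [UApp(n, tuple(done))]
        else:
            return done[0]
```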

12.1.2 CakeML Compiler Backend

We extended our use of fast computation for CakeML’s entire compiler backend, once again using our automation to replace a previous Eval-based approach. As with the type inferencer above, we needed to re-express some functions to fit the new setting; however, the necessary changes were much more limited. In particular, various compiler functions operated over lists of expressions, using a nested pattern match to deconstruct the expression at the head of the list. We expressed (verified equivalent) versions as a mutual recursion between two functions: one which operates over single expressions, and one which operates over lists of expressions.

We incorporated our use of fast computation into CakeML’s regression testing framework. Each regression test bootstraps the compiler by verified self-compilation [17], that is, by in-logic evaluation of the compiler backend on itself. The process is repeated for multiple targets: x64 (64- and 32-bit) and Silver, a custom ISA for a verified processor [16]. Previously, this took around 2.5 days per run; now it takes less than a day. This has made it feasible to add an additional target (Arm 64-bit) without significantly bloating run times.

The table below summarises run times for these parts of the regression test.

 

| | Eval | This work |
|---|---|---|
| x64 64-bit | \(\sim 14.5\) hours | \(\sim 45\) minutes |
| x64 32-bit | \(\sim 7\) hours | \(\sim 50\) minutes |
| Silver | \(\sim 8\) hours | \(\sim 40\) minutes |
| Arm 64-bit | | \(\sim 55\) minutes |

12.2 Usability

We have already demonstrated that our automation is usable in significant projects by applying it to CakeML. Here, we give a further flavour of its usability using three small examples:

  • Parity checking, defined in mutual recursion;

  • Incrementing each element of a list: the add1 and map_add1 examples from Sects. 11.4 and 11.8 respectively; and

  • Functional quicksort.

For each example, we show how the user can invoke our automation in a HOL4 REPL. We display user input on the left, and HOL4 output on the right. We superficially simplify HOL4 output for ease of reading, and elide output that is not interesting.

12.2.1 Parity Checking: even and odd

Consider the following mutually recursive definition, where the monospaced metalanguage identifier on the left is bound to the object language definition on the right:

The user can invoke our automation using the cv_trans and cv_eval entrypoints as follows:

figure bk

Using cv_trans, our automation has successfully translated even and odd into code equations: it has defined cv versions cv_even and cv_odd, using their tail-recursion to establish termination. The saved theorem cv_even_thm is then as follows:

The user can then perform fast computation with even and odd via the cv_eval entrypoint, as demonstrated by the theorem even_999.
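Since the HOL4 definition is not reproduced in the text, the Python sketch below assumes the standard mutually recursive equations (even 0 = T, even (SUC n) = odd n, odd 0 = F, odd (SUC n) = even n). The direct transcription makes every recursive call a tail call. Python does not eliminate tail calls, so the second version runs the same mutual tail recursion as a loop that switches between the two functions while the argument strictly decreases, which is the measure that makes termination evident.

```python
def even_rec(n: int) -> bool:
    # Direct transcription of the assumed equations: even 0 = T, even (SUC n) = odd n.
    return True if n == 0 else odd_rec(n - 1)


def odd_rec(n: int) -> bool:
    # odd 0 = F, odd (SUC n) = even n.
    return False if n == 0 else even_rec(n - 1)


def _parity(n: int, in_even: bool) -> bool:
    """Run the mutual tail recursion as a loop: each iteration is one tail call."""
    while n > 0:
        in_even = not in_even  # even (SUC n) calls odd n, and vice versa
        n -= 1                 # the argument strictly decreases, so the loop ends
    return in_even             # base cases: even 0 = True, odd 0 = False


def even(n: int) -> bool:
    return _parity(n, True)


def odd(n: int) -> bool:
    return _parity(n, False)
```

For example, `even(999)` evaluates to `False` in constant stack space.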

12.2.2 Incrementing Each Element of a List: add1_list

Consider the following definitions:

The user can invoke our automation as follows.

figure bl

Here, we have used the cv_auto_trans entrypoint, which is a more automatic version of cv_trans. In particular, it has:

  • Determined that to translate add1_list_def, it must first translate add1_def.

  • Followed the procedure described in Sect. 11.8 to produce and translate a first-order characterisation of \({\textsf { map}\;\textsf { add1}}\).
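The first bullet amounts to translating definitions in dependency order. One way to picture this is a depth-first walk over a map from each definition to the definitions it uses, as in the Python sketch below. The function translation_order and its inputs are hypothetical, and we do not claim the sketch matches the internals of cv_auto_trans; it also simply rejects cycles, whereas mutually recursive definitions (such as even and odd above) would have to be grouped and translated together.

```python
from typing import Dict, List, Set


def translation_order(target: str, uses: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Return the definitions to translate, dependencies first (depth-first post-order).

    `uses` maps each definition to the definitions it refers to; names in `done`
    are already translated and are skipped.
    """
    order: List[str] = []
    emitted: Set[str] = set()
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in done or name in emitted:
            return
        if name in visiting:
            raise ValueError(f"cyclic dependency through {name}")
        visiting.add(name)
        for dep in uses.get(name, []):
            visit(dep)  # translate what this definition needs first
        visiting.discard(name)
        emitted.add(name)
        order.append(name)

    visit(target)
    return order


uses = {"add1_list_def": ["add1_def"], "add1_def": []}
print(translation_order("add1_list_def", uses, set()))
```

This prints `['add1_def', 'add1_list_def']`; if add1_def had already been translated, only add1_list_def would remain.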

The printed output logs the steps the automation has taken to produce the final translation of add1_list_def. The final saved theorem is as follows:

As before, cv_eval can then be used to perform fast computation using add1_list.
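The second bullet replaces the higher-order expression \({\textsf { map}\;\textsf { add1}}\) by a first-order function. The Python sketch below shows the shape of such a characterisation, assuming that add1 adds one to a natural number; the name map_add1 is our own. The specialised function no longer takes a function-valued argument, yet agrees with the higher-order original.

```python
from typing import List


def add1(n: int) -> int:
    return n + 1


def map_add1(xs: List[int]) -> List[int]:
    """First-order specialisation of map(add1, xs): no function-valued argument.

    Mirrors the equations  map_add1 [] = []  and
    map_add1 (x :: xs) = add1 x :: map_add1 xs.
    """
    if not xs:
        return []
    return [add1(xs[0])] + map_add1(xs[1:])
```

In the logic, the agreement between map_add1 and \({\textsf { map}\;\textsf { add1}}\) is a theorem; here it can only be tested, e.g. `map_add1([1, 2, 3]) == list(map(add1, [1, 2, 3]))`.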

12.2.3 Functional Quicksort: qsort

Consider the following standard definition of functional quicksort:

We elide the definition of partition for brevity. There are two key features to note:

  • This function is not obviously terminating, so HOL4 cannot automatically infer its termination. To convince HOL4, the user must prove that partition preserves list lengths, and so the recursive calls to qsort are on strictly smaller lists.

  • This function is not obviously total, due to its use of hd, which is not specified on empty lists. Fortunately, the guard on the length of the input list ensures that hd is only applied to a non-empty list.
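Because the definitions of qsort and partition are not reproduced here, the Python sketch below assumes a common shape: lists shorter than two are returned unchanged; otherwise the head is the pivot and the tail is partitioned by comparison with it. The helpers partition and hd are hypothetical stand-ins. The sketch shows both features above: hd is partial (here it raises on the empty list) but is guarded by the length test, and both recursive calls receive lists shorter than the input because partition preserves total length.

```python
from typing import Callable, List, Tuple


def partition(p: Callable[[int], bool], xs: List[int]) -> Tuple[List[int], List[int]]:
    """Split xs into (elements satisfying p, the rest); the lengths add up to len(xs)."""
    yes = [x for x in xs if p(x)]
    no = [x for x in xs if not p(x)]
    return yes, no


def hd(xs: List[int]) -> int:
    # Partial, like HOL's hd: unspecified on [] (modelled here as an error).
    if not xs:
        raise ValueError("hd of empty list")
    return xs[0]


def qsort(xs: List[int]) -> List[int]:
    if len(xs) < 2:  # the guard: past this point xs has at least two elements
        return xs
    pivot = hd(xs)   # safe, because xs is non-empty here
    smaller, larger = partition(lambda x: x < pivot, xs[1:])
    # len(smaller) + len(larger) == len(xs) - 1, so each call is on a shorter list.
    return qsort(smaller) + [pivot] + qsort(larger)
```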

The user cannot invoke cv_trans or cv_auto_trans to translate qsort: these entrypoints will be unable to prove termination of the cv version cv_qsort. Moreover, both entrypoints fail if their translations produce any preconditions, ensuring that preconditions cannot be silently introduced. The use of cv_hd in cv_qsort would produce such a precondition, as described in Sect. 11.7. Therefore we use a different entrypoint, cv_trans_pre_rec:

figure bm

First, the user translates partition_def as usual. Then they invoke cv_trans_pre_rec with the additional argument termination_tactic, a tactic that must convince HOL4 that cv_qsort terminates. Proofs of termination for cv functions can be a little fiddly, but they mostly echo the proofs for the ordinary functions from which the cv versions were derived. In this case, the tactic relies on a lemma that cv_partition preserves the sizes of its cv terms (a 3-line proof), and shows that cv_qsort is always called on strictly smaller arguments (a 4-line proof).
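The cv-level termination argument can be pictured with a small model. The Python sketch below assumes, for illustration only, that cv values are built from natural numbers and pairs (Num and Pair), that lists are encoded as right-nested pairs ending in Num 0, and that size counts constructors; the size function, the encoding and cv_partition_less are hypothetical and may differ from what the automation uses. In this model, the sizes of the two partition outputs add up to one more than the size of its input (each output carries its own terminator), and each output is strictly smaller than the list that cv_qsort was called on.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Num:
    n: int


@dataclass(frozen=True)
class Pair:
    fst: "CV"
    snd: "CV"


CV = Union[Num, Pair]


def size(v: CV) -> int:
    """Hypothetical size measure: the number of constructors in v."""
    if isinstance(v, Num):
        return 1
    return 1 + size(v.fst) + size(v.snd)


def encode(xs: List[int]) -> CV:
    """Encode a list of naturals as Pair(Num x1, Pair(Num x2, ... Num 0))."""
    v: CV = Num(0)
    for x in reversed(xs):
        v = Pair(Num(x), v)
    return v


def cv_partition_less(pivot: int, v: CV) -> Tuple[CV, CV]:
    """Model of a cv-level partition: split an encoded list by `< pivot`."""
    if isinstance(v, Num):
        return Num(0), Num(0)
    smaller, larger = cv_partition_less(pivot, v.snd)
    head = v.fst
    assert isinstance(head, Num)
    if head.n < pivot:
        return Pair(head, smaller), larger
    return smaller, Pair(head, larger)


v = encode([3, 1, 4, 1, 5])           # the argument of a call to cv_qsort
tail = v.snd                          # partitioning happens on the tail
s, l = cv_partition_less(3, tail)
print(size(v), size(tail), size(s), size(l))
```

This prints `11 9 5 5`: both recursive arguments (size 5) are smaller than the original argument (size 11).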

Note that during interactive development, the user would first invoke cv_trans_pre qsort_def to discover the necessary termination goal. This entrypoint does not fail if the translation produces a precondition, and will return the precondition and any termination goal to the user. After establishing an appropriate tactic to solve the termination goal, the user can then pass it to cv_trans_pre_rec.

This leaves the precondition, qsort_pre. The user must prove a theorem establishing the precondition before they can use qsort for fast computation. Here, we show the necessary theorem using standard HOL4 syntax with a cv_pre annotation. Theorems with this annotation are passed to our automation, which ensures that the precondition is appropriately discharged. The short proof above is all that is required, because the precondition boils down to showing that a list of length at least two is non-empty. The proof uses induction because it must also discharge the preconditions arising from the recursive calls to qsort. Here, we rewrite (rw) using qsort_pre_cases: the automatically derived inversion lemma for the inductive relation qsort_pre, which is defined as described in Sect. 11.7.
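The structure of this precondition can be modelled outside the logic. The Python sketch below is our own model, not the generated qsort_pre, and assumes the same qsort shape as before (pivot at the head, tail split by `< pivot`). It mirrors the two parts of the proof: a local obligation from the call to hd, which follows from the length guard, and the preconditions of both recursive calls, which is why the HOL4 proof needs induction. The check only covers small inputs, whereas the theorem covers all lists.

```python
from itertools import product
from typing import List


def qsort_pre(xs: List[int]) -> bool:
    """Model of the precondition: every hd call made by qsort is on a non-empty list."""
    if len(xs) < 2:
        return True  # no call to hd on this path
    local_ok = xs != []  # obligation from `hd xs`; implied by len(xs) >= 2
    pivot = xs[0]
    tail = xs[1:]
    smaller = [x for x in tail if x < pivot]
    larger = [x for x in tail if not x < pivot]
    # Preconditions of the two recursive calls (the inductive part of the proof).
    return local_ok and qsort_pre(smaller) and qsort_pre(larger)


def check_small(max_len: int, num_values: int) -> bool:
    """Check qsort_pre on every list of length <= max_len over range(num_values)."""
    return all(
        qsort_pre(list(xs))
        for k in range(max_len + 1)
        for xs in product(range(num_values), repeat=k)
    )
```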

13 Related Work

This section discusses related work in the area of computation in ITPs.

13.1 HOL4

Barras implemented a fast interpreter for terms in HOL4 [6], usually referred to as Eval. This is effectively a derived rule as described in Sect. 1. Eval implements an extended version of Crégut’s abstract machine KN [7], and performs strong reduction of open terms. It supports user-defined datatypes and pattern-matching, as well as rewriting using user-supplied conversions. It is this Eval function that was used when benchmarking HOL4 in Sect. 8 and evaluating CakeML’s type inferencer and compiler backend in Sect. 12.

Unlike our work in Candle, Eval operates directly on HOL terms, though the automation described in Sect. 11 reduces this gap in HOL4. The HOL4 kernel was modified by Barras to make this as efficient as possible: the HOL4 kernel uses de Bruijn terms and explicit substitutions to ensure that Eval runs fast. However, true to LCF tradition, all interpreter steps are implemented using basic kernel inferences.

13.2 HOL Light

A HOL Light port of Eval exists [21] and was used in the performance comparisons of Sect. 8. However, unlike the HOL4 kernel, the HOL Light kernel has not been optimised for running Eval: HOL Light uses name-carrying terms without explicit substitutions, which makes this port comparatively slow.

13.3 Isabelle/HOL

Isabelle/HOL supports two mechanisms for efficient evaluation, both due to Haftmann and Nipkow. A code generation feature [11, 12] can be used to synthesise ML, Haskell and Scala programs from closed terms, which can then be compiled and executed efficiently. We borrow the concept of code equations (Sect. 5) from their work, but note that Isabelle’s code equations are more general than ours.

The second option is based on a normalisation-by-evaluation (NBE) mechanism [5] and synthesises ad-hoc ML interpreters over an untyped lambda calculus datatype from (possibly open) HOL terms. The ML code is compiled and executed by an ML compiler, and the resulting values are reinterpreted as HOL terms.
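To make the idea of normalisation by evaluation concrete, the Python sketch below normalises closed untyped lambda terms with de Bruijn indices. It is our own minimal illustration, not Isabelle's implementation, which generates and compiles ML code. Terms are evaluated into host-language functions (or stuck "neutral" applications of free variables) and then read back ("reified") into terms by applying each function to a fresh variable. Like any such procedure, it does not terminate on terms without a normal form.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Syntax: untyped lambda terms with de Bruijn indices.
@dataclass(frozen=True)
class Var:
    index: int


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


# Semantic domain: host-language functions, or neutral values headed by a free variable.
@dataclass(frozen=True)
class VFun:
    apply: Callable[["Value"], "Value"]


@dataclass(frozen=True)
class VVar:
    level: int  # de Bruijn level of a free variable introduced during read-back


@dataclass(frozen=True)
class VApp:
    fn: "Value"  # always neutral (VVar or VApp)
    arg: "Value"


Value = Union[VFun, VVar, VApp]


def evaluate(t: Term, env: List[Value]) -> Value:
    """Interpret a term as a host value; env[0] is the innermost binder."""
    if isinstance(t, Var):
        return env[t.index]
    if isinstance(t, Lam):
        return VFun(lambda v: evaluate(t.body, [v] + env))
    f = evaluate(t.fn, env)
    a = evaluate(t.arg, env)
    return f.apply(a) if isinstance(f, VFun) else VApp(f, a)


def reify(v: Value, depth: int) -> Term:
    """Read a value back as a normal-form term under `depth` binders."""
    if isinstance(v, VFun):
        # Apply the function to a fresh variable and read back the result.
        return Lam(reify(v.apply(VVar(depth)), depth + 1))
    if isinstance(v, VVar):
        return Var(depth - v.level - 1)  # convert de Bruijn level to index
    return App(reify(v.fn, depth), reify(v.arg, depth))


def normalise(t: Term) -> Term:
    """Normalisation by evaluation for closed terms."""
    return reify(evaluate(t, []), 0)


def church(n: int) -> Term:
    """Church numeral: \\f. \\x. f (f ... (f x))."""
    body: Term = Var(0)
    for _ in range(n):
        body = App(Var(1), body)
    return Lam(Lam(body))


# Addition on Church numerals: \m. \n. \f. \x. m f (n f x).
ADD = Lam(Lam(Lam(Lam(App(App(Var(3), Var(1)), App(App(Var(2), Var(1)), Var(0)))))))
```

For example, `normalise(App(App(ADD, church(2)), church(3)))` yields `church(5)`.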

Both methods support a rich, higher-order, computable fragment of HOL. However, both also escape the logic, make use of unverified functions for synthesising functional programs, and rely on unverified compilers and language runtimes for execution.

13.4 Dependent Type Theories

Computation is an integral part of ITPs based on higher-order type theories, such as Coq [24] and Lean [9]. Their logics identify terms up to normal form, so they must reduce terms as part of proof checking (i.e., type checking). Consequently, their trusted kernels must implement an interpreter or compiler of some sort.

Coq supports proof by computation using its interpreter (accessible via vm_compute), as well as native code generation to OCaml (accessible via native_compute). Internally, Coq’s interpreter implements a version of the ZAM machine, used in the interactive mode of the OCaml compiler [10], extended with support for open terms.

A formalisation of the abstract machine used in the interpreter exists [10], but the actual Coq implementation is completely unverified.

13.5 First-Order Logic

ACL2 is an ITP for a quantifier-free first-order logic with recursive, untyped functions. It axiomatises a purely functional fragment of Common Lisp, which doubles as term syntax and host language for the system. As a consequence, some terms can be compiled and executed at native speed. However, this execution speed comes at a cost: no verified Lisp compiler exists that can host ACL2, and its soundness-critical code encompasses essentially the entire theorem prover.

14 Conclusion

We have added efficient functions for computation to two HOL ITPs: Candle and HOL4. Using Candle, we developed our computation function in stages and verified its soundness: it can only produce theorems that follow from the inference rules of higher-order logic. For HOL4, we further built automation to provide a simple user interface to the function.

Our new compute function requires all input to be first-order computations over a Lisp-inspired datatype for compute values (cval/cv). Our automation provides a user-friendly route to invoking the new compute function: users can write ordinary definitions and evaluate them without interacting with the first-order format.

In our experiments, this new computation functionality performed several orders of magnitude faster than in-logic evaluation mechanisms provided by mainstream HOL ITPs. Using our automation, we have successfully exercised it on a significant existing benchmark, the CakeML compiler backend, with an order of magnitude speedup. At present, the performance numbers suggest that we do not need to go to the trouble of replacing our interpreter-based solution with a solution that compiles the given input to native machine code for extra performance. However, future case studies might lead us to explore such options too.

We envision future case studies exploring how fast in-logic computation can open the door to verified decision procedures (for linear arithmetic, linear algebra, or word problems) in HOL provers. Such proof procedures have typically been programmed in the meta language (SML or OCaml) of HOL provers.