Abstract
We introduce and elaborate a novel formalism for the manipulation and analysis of proofs as objects in a global manner. In this first approach the formalism is restricted to first-order problems characterized by condensed detachment. It is applied in an exemplary manner to a coherent and comprehensive formal reconstruction and analysis of historical proofs of a widely-studied problem due to Łukasiewicz. The underlying approach opens the door towards new systematic ways of generating lemmas in the course of proof search, with the effect of reducing the search effort and finding shorter proofs. Among the numerous reported experiments along this line, a proof of Łukasiewicz’s problem was automatically discovered that is much shorter than any proof found before by man or machine.
1 Introduction
In Automated Theorem Proving (ATP)—or Automated Deduction, or Automated Reasoning—the general research topic consists in the search for proofs of formulas in order to establish their validity or theoremhood. We consider proofs as syntactic objects defined on the basis of some formal system. There is a variety of such formal proof systems; hence the formal objects representing proofs in these differ widely and in consequence also the methods for finding proofs.
In popular proof methods such as the resolution method, superposition, or the tableau methods, proofs are sets of formulas arranged in a structured way. This could be, for instance, in the form of a tree or graph with the formulas—or clauses—labeling its nodes. Connections in the graph indicate that a succeeding formula is derived from preceding formulas by some manipulation such as forming the resolvent out of two clauses. Let us here refer to this subclass of proof systems as formula-manipulative ones.
From the point of view of proofs as a whole, formula manipulation of this kind is a local operation. For both representing as well as finding proofs, more global operations might be helpful. The use of lemmas may be regarded as such a global operation. If a proof of a lemma is known, this proof may be inserted into the overall proof wherever the formula representing the lemma occurs. For formula-manipulative proof systems such a replacement operation is performed implicitly by associating with an inferred formula pointers to the parent formulas from which it was inferred. The proof structure as a whole is made available in retrospect after a proof has been found, as a DAG (directed acyclic graph) or as a tree formed by such pointers.
There are proof systems beyond the purely formula-manipulative ones. One such system has been introduced and applied by Carew A. Meredith, e.g., in a paper from 1963 jointly authored with Arthur Prior [48]. It became known under the label of condensed detachment (CD). A proof in this system is represented as a list of pairs of a formula and a proof term. The focus is on the proof-structural part represented as a term. The formula-manipulative aspect is reduced to presenting intermediate lemmas.
The proof system underlying the connection method (CM) [4] is even more extreme in this sense. Proofs consist there exclusively of structural information on the given formula without any manipulative part as in formula-manipulative systems.
In ATP, CD was so far considered mainly as a special case of hyperresolution, not taking into account its non-formula-manipulative characteristics. So far, no adequate formal account of CD from the perspective of ATP could be found in the literature.
The mutual advantages or disadvantages of these different kinds of formal systems for proof search or proof representation are not at all clear at this point in the development of ATP. Lemma-related techniques of general importance for saturation-based provers such as the advanced use of weighting templates, e.g., [81], and hints [70] were initially devised for CD problems. There are several approaches to integrating forms of lemma generation into variants of the CM [2, 15, 20, 32, 50, 61]. Nevertheless, for the more structurally complicated systems such as CD or CM global operations like the use of lemmas have never been studied systematically.
The work reported in this paper provides first results in exactly this direction. Since a comparative analysis of different proof systems such as those just mentioned is a truly complex enterprise, the task has to be drastically restricted in this first approach. We thus focus on the simplest nontrivial class of first-order formulas: a structurally simple goal statement to be derived from an axiom and a rule with two premises and a single conclusion. The obvious generalizations are deferred to future work: more than one axiom, more and more complex rules, and so forth, up to arbitrary first-order formulas.
Even under the drastic restriction just specified, our comparative task turns out to be rather involved and proof search for this class of formulas is not at all trivial for leading ATP systems. Global techniques for directing proof search such as the use of lemmas or the replacement of proof parts appear to be particularly intricate for systems that are not formula-manipulative.
The required extensive formal basis is worked out in this paper. Proofs are represented as terms, which offers advantages not present in formula-manipulative systems. Altogether, we open here the door towards a better understanding of the distinctive features of known formal proof systems with regard to their better or worse suitability for proof search, taking first steps in this important direction.
Since CD falls into the considered class of first-order formulas, our work includes, as something of a side result, the first comprehensive formalization of Meredith’s proof system from an ATP perspective. At the same time this amounts to a very detailed reconstruction of the historical proofs of a much-studied problem first stated and proved by Łukasiewicz. Our paper also gives a rather comprehensive account of the work reported in the literature about this well-known problem. This account includes numerous experimental results achieved with a variety of systems. Incorporating the presented original insights, one of our systems (SGCD) discovered in a few seconds a new proof of this problem, which is shorter than all previously known ones.
This work extends the results presented at CADE 2021 [77]. The concepts and techniques described here are backed by an implemented system, CD Tools [74], a library for experimenting with CD and related techniques, which is written in SWI Prolog [79] and available as free software. CD Tools includes two provers, SGCD [76] (the name suggesting Structure-Generating proving for Condensed Detachment) for CD problems, and CCS [75] (the name suggesting Compressed Combinatory Structures) for CD problems and first-order Horn problems. In the paper we will discuss particular features of these provers and report experimental results obtained with them. For more details on SGCD and CCS we refer to [76] and [75], respectively.Footnote 1
The contributions of the paper can be summarized as follows.
1. A new formal characterization of CD with the proof structure as a whole in the focus, based on concepts and techniques known from the CM.
2. New aspects concerning the interplay of tree and DAG structures in ATP. They relate the tree-oriented proceeding of clausal tableau methods with the DAG-oriented structure of CD and resolution proofs.
3. New regularity properties of proof structures and new criteria for shortening proofs by rewriting. Some of these are consequences of the interplay of tree and DAG structures.
4. Identifying and systematizing a set of ATP-relevant features of proofs on the basis of our formal framework.
5. A detailed analysis of a historic formal proof by Jan Łukasiewicz and a variation by Meredith, from an ATP perspective, with respect to the identified proof features.
6. Generalizing specific structural features observed in the historic proofs to novel proof-structure-oriented techniques for proof search and lemma generation in ATP.
7. Providing the basis for an implemented system to experiment with CD problems and their proof structures [74, 75, 76]. It includes two provers, each addressing a specific main aspect. One of them, SGCD, realizes the newly discovered structure-oriented techniques.
8. A new short proof of Łukasiewicz’s problem, found by SGCD with one of the new techniques. It is substantially shorter than the human-made proofs and drastically shorter than known proofs by first-order provers. Although the proofs by Prover9 [43] can be substantially shortened with our new proof rewritings, they still remain drastically larger.
9. Foundation for follow-up work, including a novel approach to proof search over compressed combinatory structures [75] and studying the generation, selection and application of lemmas [54], also with machine learning. As described in the latter reference, lemmas utilizing the new techniques already led to remarkable success in improving competitive first-order provers and solving a challenge problem.Footnote 2
The paper is organized as follows. In Sect. 2, after a very brief illustration of the CM, we introduce Łukasiewicz’s problem as well as different representations of it. We also compare different formal representations of proofs, in particular the representation by Meredith and the ATP-oriented representation of the CM. Section 3 presents Meredith’s proof of the problem. There we reconstruct the historical method of CD in a novel way as a restricted variation of the CM where proof structures are represented as terms. The section introduces the formal basis for the comparative analysis described above. On this basis, Sect. 4 focuses on global features to support proof search. It presents the underlying formalism and results on reducing the size of such proof terms in order to shorten proofs and to restrict the search space. The formalism worked out in the preceding two sections is applied in Sect. 5 to provide a comprehensive analysis of the two historical proofs by Łukasiewicz and by Meredith of our widely studied guiding problem. The results are summarized in detailed feature tables for each proof. In Sect. 6 we contrast these proofs with proofs of the same problem that were obtained as outputs of ATP systems, general first-order provers as well as postprocessors and specialized provers that realize observations and new techniques discussed in the paper. Section 7 concludes the paper.
2 Relating Formal Human Proofs with ATP Proofs
Our investigations are centered around a historic formal proof, a landmark result by Jan Łukasiewicz from 1936, published in 1948 [37]. It is expressed with the method of substitution and detachment. In the early 1960s Łukasiewicz’s proof was modified and slightly shortened by Carew A. Meredith with his method of condensed detachment (CD) [48]. Thus, our basis consists of two slightly different versions of an advanced human-made formal proof. The proven problem was, upon suggestion in 1988 by Frank Pfenning [51], a prominent challenge problem for ATP [18, 41, 44, 84]. Also the background technique of CD, translated to hyperresolution [27], led to many successes of ATP in the 1990s [44, 67].
Although the problem can be solved by modern ATP systems, the current state is not satisfying. For implemented provers that operate in a goal-driven way with the CM or with clausal tableaux the problem is still completely out of reach. Its difficulty rating in the TPTP [64, 65] has not stabilized at “most easy” but fluctuates, and recent versions of two competition champions fail on it.Footnote 3 Since the problem was proven formally by humans, this indicates that proof search in ATP remains in need of further improvement. Also, the proofs obtained with ATP systems are much longer than the human-made proofs, which indicates a general weakness in our methods with negative effects on their performance, let alone the annoyance this causes ATP users.
Our aim here is to improve on these issues of general relevance for ATP. Nevertheless, we focus on a single problem, which is solvable, yet remains a challenge for both humans and ATP systems. Its basic structure and features are common to many first-order problems such that results obtained for the problem can be assumed to apply also more generally. What justifies the particular choice of the problem is that we have two related human-made formal proofs at hand, developed by world-leading masters in the field. Their proofs by far improve on those by today’s ATP systems with respect to invested search effort and size. Hence, inspecting the human-made proofs in depth should lead through some sort of “reverse engineering” to the discovery of techniques that were used—intentionally or intuitively—by Łukasiewicz and Meredith and are useful to advance modern ATP.
In this section we introduce the problem proven originally by Łukasiewicz and indicate a new adaptation of CD, the technique used by Meredith, for ATP, which will be elaborated in later sections. In contrast to most previous accounts of CD in ATP and type theory [24, 25, 27, 44] our focus is not on regarding CD as an inference rule, but rather on the aspect that CD originally comes with explicitly reified proof structures that are as a whole accessible as trees or terms, or in compacted form as DAGs. In this respect our modeling of CD is oriented at the CM, and could in fact be understood as a simple special case of the CM. For the background concerning the CM we refer to [4, 5, 8]. Here we just briefly illustrate the CM in the following subsection.
2.1 A Very Short Illustration of the Connection Method (CM)
Proof systems in ATP are designed to establish the validity of statements represented as formulas in some logic, like the first-order logic formula
In order to simplify the involved mechanisms, often the number of different logical operators is minimized, e.g., restricted to \(\lnot ,\vee ,\exists \), in accordance with well-known logical rules. In the present example this leads to the formula \(\lnot (\exists a\, \textsf{P} a\vee \lnot \exists y\, \textsf{Q} y)\vee \exists z\, \textsf{P} z\vee \lnot \exists b\, \textsf{Q} \textsf{f} b\).
Gentzen introduced his traditional calculus of natural deduction for the same purpose. In a simplified variant of this calculus the validity of this formula is established as its derivation shown in Fig. 1.
This kind of proof representation is extremely redundant. The CM is designed to eliminate this redundancy; it represents the relevant information given by this derivation in a structure attached to the formula, as shown in Fig. 2.
The details of this redundancy elimination are formally presented in the paper [8]. The upper indices attached to the predicates indicate whether the literal occurs positively (0) or negatively (1) in the formula. A connection, i.e., a pair of occurrences in the formula, links a positive with a negative literal, both with the same predicate symbol. The CM proof structure can as well be attached to the formula in its original presentation (i.e., before reducing the number of used logical operators) in a straightforward way.
CM calculi test such structures according to a certain criterion which is best illustrated in the matrix representation of the same formula shown in Fig. 3.
The matrix features three columns or clauses. A path through such a matrix (or the corresponding formula) is a set of literals such that exactly one is picked from each clause. There are exactly two different paths in the example. A set of connections is called spanning for a formula if each path contains (as a subset) at least one of the set’s connections. If the attached substitution unifies each pair of connected literals then the set of connections is called complementary. The mentioned criterion for validity of a formula is the existence of a spanning and complementary set of connections.
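The notions of path and spanning set can be illustrated with a small sketch. The following Python code is only an illustration under stated assumptions: it uses a ground (propositional) toy matrix that merely mimics the shape described above (three clauses, two paths) and is not the matrix of Fig. 3; it ignores unification, so complementarity reduces to equal predicate names with opposite polarity. The names `paths`, `is_connection` and `is_spanning` are hypothetical.

```python
from itertools import product

# A literal is (predicate name, polarity); a clause is a list of literals;
# a matrix is a list of clauses. A literal occurrence is identified by the
# pair (clause index, literal index).
# Assumption: a ground toy matrix with three clauses and two paths.
matrix = [
    [("P", 1), ("Q", 0)],
    [("P", 0)],
    [("Q", 1)],
]

def paths(matrix):
    """All paths: exactly one literal occurrence is picked from each clause."""
    choices = product(*(range(len(clause)) for clause in matrix))
    return [frozenset(enumerate(choice)) for choice in choices]

def is_connection(matrix, occ1, occ2):
    """Two occurrences with the same predicate and opposite polarity."""
    name1, pol1 = matrix[occ1[0]][occ1[1]]
    name2, pol2 = matrix[occ2[0]][occ2[1]]
    return name1 == name2 and pol1 != pol2

def is_spanning(matrix, connections):
    """Each path must contain (as a subset) at least one connection."""
    return all(any(conn <= path for conn in connections)
               for path in paths(matrix))

# Two connections: P in clause 0 with P in clause 1, Q in clause 0 with Q in clause 2.
conn_p = frozenset({(0, 0), (1, 0)})
conn_q = frozenset({(0, 1), (2, 0)})

print(len(paths(matrix)))                      # number of paths
print(is_spanning(matrix, {conn_p, conn_q}))   # both connections together
print(is_spanning(matrix, {conn_p}))           # one connection alone
```

In the first-order case the spanning test stays the same, but each connection additionally has to be made complementary by a common substitution, which this sketch deliberately leaves out.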
Clauses may be needed multiple times for achieving a proof. This may be realized by listing them explicitly with different variables or by indexing just a single occurrence resulting in indexed variables as indicated in Figs. 1 and 3. The latter variant complicates the illustrated concepts accordingly.
CM calculi search for spanning and complementary connection sets. A popular method does this in a systematic manner starting from the goal of the formula (top-down). More involved search strategies mix top-down with bottom-up search (see, e.g., [7]). For numerous further refinements see [4, 5, 6, 8].
Proofs in the CM are formal structures attached to the given formula. For ease of understanding the present paper focuses on equivalent, less compact but more familiar proof structures like trees and DAGs. Yet the results of the present paper are also evidence for the advantages of compact proof structures such as those of the CM.
2.2 Łukasiewicz’s Shortest Single Axiom for the Implicational Fragment of Propositional Logic
Classical propositional logic can be formalized with different sets of logic operators such as, for example, implication and negation, \(\{\rightarrow , \lnot \}\). Abandoning \(\lnot \) and leaving \(\rightarrow \) as the only logic operator yields a restricted propositional logic, the implicational fragment IF. The original investigations of this logic use Łukasiewicz’s so-called Polish notation where the implication \(p \rightarrow q\) is written as \( Cpq \). Following Pfenning [51] we formalize IF in the setting of modern first-order ATP with a single unary predicate \(\textsf{P}\) to be interpreted as something like “provable” and represent the IF formulas by terms using the binary function symbol \(\textsf{i}\) (instead of \(\rightarrow \)) for implication. Implicational propositional logic is characterized by the Tarski-Bernays Axioms, that is, the set of the following three axioms called simplification (Simp), Peirce’s law (Peirce) and hypothetical syllogism (Syll).Footnote 4
Nickname | Łukasiewicz’s notation | First-order representation |
---|---|---|
Simp | \( CpCqp \) | \(\forall pq\, \textsf{P}(\textsf{i}(p,\textsf{i}qp))\) |
Peirce | \( CCCpqpp \) | \(\forall pq\, \textsf{P}(\textsf{i}(\textsf{i}(\textsf{i}pq,p),p))\) |
Syll | \( CCpqCCqrCpr \) | \(\forall pqr\, \textsf{P}(\textsf{i}(\textsf{i}pq,\textsf{i}(\textsf{i}qr,\textsf{i}pr)))\) |
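The correspondence between the two notations in the table can be sketched in code. The following Python fragment is a minimal illustration, assuming that variables are single lowercase letters and that `C` is the only operator; the names `Imp`, `parse_polish` and `show` are hypothetical and not taken from any of the cited systems.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Imp:
    """Implication term i(left, right)."""
    left: "Term"
    right: "Term"

# A term is a variable (a single lowercase letter) or an implication.
Term = Union[str, Imp]

def parse_polish(s: str) -> Term:
    """Parse a formula in Łukasiewicz's parenthesis-free notation."""
    def go(pos: int) -> Tuple[Term, int]:
        if pos >= len(s):
            raise ValueError("unexpected end of formula")
        ch = s[pos]
        if ch == "C":
            # C is binary: read antecedent, then consequent.
            left, pos = go(pos + 1)
            right, pos = go(pos)
            return Imp(left, right), pos
        if ch.islower():
            return ch, pos + 1
        raise ValueError(f"unexpected symbol {ch!r}")
    term, end = go(0)
    if end != len(s):
        raise ValueError("trailing symbols after formula")
    return term

def show(t: Term) -> str:
    """Render a term with the binary function symbol i, fully parenthesized."""
    if isinstance(t, str):
        return t
    return f"i({show(t.left)},{show(t.right)})"

for name, polish in [("Simp", "CpCqp"),
                     ("Peirce", "CCCpqpp"),
                     ("Syll", "CCpqCCqrCpr")]:
    print(name, show(parse_polish(polish)))
```

The output is fully parenthesized, whereas the table abbreviates, e.g., \(\textsf{i}(p,q)\) as \(\textsf{i}pq\); wrapping the result in \(\textsf{P}(\ldots)\) and quantifying the variables gives the first-order representation.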
Alfred Tarski in 1925 raised the problem of characterizing IF by a single axiom and solved it with a general technique for packaging axioms together,Footnote 5 which inherently produced very long axioms. Jan Łukasiewicz worked on shortening them, initially by modifying Tarski’s packaging method [37]. As of 1926, the shortest known single axioms, found by Łukasiewicz and Mordechaj Wajsberg, had 25 letters in Łukasiewicz’s parenthesis-free notation [40, p. 43]. Two further single axioms consisting of 17 letters were found by Łukasiewicz in 1930 and 1932 [37, 56, 62]. In 1936 he then found the shortest single axiom [37], which in the literature is nicknamed after him.Footnote 6
Nickname | Łukasiewicz’s notation | First-order representation |
---|---|---|
Łukasiewicz | \( CCCpqrCCrpCsp \) | \(\forall pqrs\, \textsf{P}(\textsf{i}(\textsf{i}(\textsf{i}pq,r),\textsf{i}(\textsf{i}rp,\textsf{i}sp)))\) |
In order to show that the formula Łukasiewicz is a single axiom for implicational propositional logic, Łukasiewicz derived Simp, Peirce, and Syll from it with the proof method of substitution and detachment, used by him and other logicians since about 1930. Detachment is also familiar as modus ponens. His formal proof published in 1948 [37]Footnote 7 is presented in 29 steps, most of them corresponding to a single application of substitution and detachment, but some to two consecutive applications, such that the proof in total involves 34 applications of detachment. Among the three Tarski-Bernays axioms Syll is by far the most challenging to prove, so that Łukasiewicz’s proof is centered around the proof of Syll, with Simp and Peirce spinning off as side results. In 1963 Carew A. Meredith [48] presented a “very slight abridgement” of Łukasiewicz’s proof, expressed in his framework of CD [52], where the performed substitutions are no longer explicitly presented but implicitly assumed through unification. Meredith’s variation involves only 33 applications of detachment, one less than Łukasiewicz’s original proof.
2.3 The First-Order ATP View on Detachment
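In our first-order setting, detachment can be modeled with the following axiom.
Nickname | First-order representation |
---|---|
Det | \(\forall xy\, (\textsf{P}x \wedge \textsf{P}\textsf{i}xy \rightarrow \textsf{P}y)\) |
In Det the atom \(\textsf{P}x\) is called the minor premise, \(\textsf{P}\textsf{i}xy\) the major premise, and \(\textsf{P}y\) the conclusion. Let us now focus on the following particular formula.
Nickname | First-order representation |
---|---|
ŁDS | \({\textit{Łukasiewicz}} \wedge {\textit{Det}} \rightarrow {\textit{Syll}}\) |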
Showing that Łukasiewicz together with the detachment axiom implies Syll is then the problem of proving the validity of the first-order formula ŁDS. This formula features a rather simple structure: it asserts that from the proper axiom Łukasiewicz at the left the goal Syll at the right can be derived via Det, the rule in the middle, which encodes the well-known modus ponens, or detachment. Although it looks so simple, finding its proof amounts to a real challenge, both for humans and machines. Since formulas of a similar structure with axiom(s), rule(s) and goal(s) are quite frequent, progress in finding their proofs automatically is clearly desirable. We believe that a deeper understanding of the underlying proof structure is indispensable for such progress. The study in this paper aims exactly at such an understanding.
In view of the CM [4, 5, 8], a formula is valid if and only if there is a spanning and complementary set of connections in it. In Fig. 4 the formula ŁDS is presented again, with nicknames dereferenced and quantifiers omitted as usual in ATP, together with the five unifiable connections in it. The symbols \(\textsf{a},\textsf{b},\textsf{c}\) in the conclusion are Skolem constants introduced for the universal variables in Syll. The pair consisting of axiom Łukasiewicz and the conclusion might be seen as a further connection 0, but is not depicted because it is not unifiable and thus irrelevant for any proof. Any CM proof of ŁDS consists of a number of instances of the five shown connections. For example, Meredith’s proof of Syll from Łukasiewicz involves 491 instances of Det (as shown in more detail in Sect. 5.2), each linked with three instances of its five incident connections. This large number already demonstrates that such a proof cannot be found and surveyed by humans except with some structural concept for reducing the sheer proof size.
Such a reduction is achieved by the well-known device of lemmas. In terms of the shown connection structure this means that a certain number of rule instances along with their connection instances are noted as a lemma in some abbreviated form that can be referenced several times in the presentation of the final proof. This way the size of the proof may be reduced substantially without dispensing with the basic characterization of proofs in the CM. By the use of lemmas that permit reusing subproofs with the same structure but different instantiations, Meredith’s proof of \({\textit{Syll}} \) reduces from 491 to 31 detachment steps. With two more steps, the proof also yields \({\textit{Peirce}} \) and \({\textit{Simp}} \), resulting in the total number of 33 detachment steps mentioned above.
Under this extended view, our aim for a deeper understanding of such proofs raises further questions. Can lemmas that are useful for such reductions be characterized by syntactic features of the re-used formulas? Or by features of the proof structure, the re-used subproofs of lemmas in the context of the overall proof? If we find such features, how could they be utilized to support the automated search for proofs?
2.4 Comparing Proof Representations
Figure 5 compares different representations of a short formal proof with the Det axiom. There is a single proper axiom, Syll-SimpFootnote 8 defined as follows.
Nickname | Łukasiewicz’s notation | First-order representation |
---|---|---|
Syll-Simp | \( CCCpqrCqr \) | \(\forall pqr\, \textsf{P}\textsf{i}(\textsf{i}(\textsf{i}pq,r),\textsf{i}qr)\) |
The goal theorem is \(\forall abcdef\, \textsf{P}\textsf{i}(a,\textsf{i}(b,\textsf{i}(c,\textsf{i}(d,\textsf{i}(e,\textsf{i}fd)))))\).Footnote 9 Figure 5a shows the structure of a CM proof. It involves seven instances of Det, shown in columns \(D_1, \ldots , D_7\).Footnote 10 The major premise \(\textsf{P}\textsf{i}x_i y_i\) is displayed there on top of the minor premise \(\textsf{P}x_i\), and the (negated) conclusion \(\lnot \textsf{P}y_i\), where \(x_i, y_i\) are variables. Instances of the axiom appear as literals \(\lnot \textsf{P}a_i\), with \(a_i\) a shorthand for the term \(\textsf{i}(\textsf{i}(\textsf{i}p_i q_i,r_i),\textsf{i}q_i r_i)\). The rightmost literal \(\textsf{P}g\) is a shorthand for the Skolemized goal theorem. The clause instances are linked through edges representing connection instances. The edge labels identify the respective connections as in Fig. 4. An actual connection proof is obtained by supplementing this structure with a substitution under which all pairs of literals related through a connection instance become complementary.
Figure 5b represents the tree implicit in the CM proof. Its inner nodes \(D_1,\ldots ,D_7\) correspond to the seven instances of Det, and its leaf nodes \(A_1,\ldots ,A_8\) to the instances of the axiom. Edges appear ordered to the effect that those originating in a major premise of Det are directed to the left and those from a minor premise to the right. The goal clause \(\textsf{P}g\) is dropped. The resulting tree is a full binary tree, i.e., a binary tree where each node has 0 or 2 children. We observe that the ordering of the children makes the connection labeling redundant as it directly corresponds to the tree structure.
Figure 5c presents the proof in Meredith’s notation for CD. Each line shows a formula, line 1 the axiom and lines 2–4 derived formulas, with proofs annotated in the last column. Proofs are written as terms in Polish notation with the binary function symbol \(\textsf{D}\) for detachment where the subproofs of the major and minor premise are supplied as first and second argument, respectively. Formula 4, for example, is obtained as conclusion of Det applied to formula 2 as major premise and another formula not made explicit in the presentation as minor premise, namely the conclusion of Det applied to formula 3 as both major and minor premise. An asterisk marks the goal theorem.
Figure 5d is like Fig. 5b, but with a different labeling: node labels now refer to the line in Fig. 5c that corresponds to the subproof rooted at the node. The blank node represents the mentioned subproof of the formula that is not made explicit in Fig. 5c. An inner node represents a CD step applied to the subproof of the major premise (left child) and minor premise (right child).
Figure 5e shows a DAG representation of Fig. 5d. It is the unique minimal, or maximally factored, DAG representation of the tree, i.e., it has no multiple occurrences of the same subtree. The presentation layout of the DAG reflects a tree compacting procedure, the value-number method, which computes unique identifiers for all subtrees in a post-order tree traversal [1, 22]. A straight edge corresponds to the first visit of the subtree rooted at its endpoint, and a bent edge to a pointer to a previously identified subtree. Observe that each of the four proof line labels of Meredith’s representation (Fig. 5c) appears exactly once in the DAG. In fact, the structural component of the textual proof representation (that is, if we disregard the displayed formulas) can be considered as a compact notation for such a DAG.
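The tree compaction described above can be sketched as follows. This Python code is an illustration only: proof terms are nested tuples `("D", major, minor)` with the integer `1` standing for the axiom, and the function names are hypothetical. The shape of line 4 as D applied to line 2 and to D of line 3 with itself follows the text; the proof terms chosen for lines 2 and 3 are assumptions made for the example, since the figure is not reproduced here.

```python
def compact(tree):
    """Value-number method: a post-order traversal that assigns one
    identifier to each distinct subtree. Returns (root id, node table)."""
    table = {}   # structural key over child identifiers -> identifier
    nodes = []   # identifier -> structural key
    def visit(t):
        if isinstance(t, tuple):
            # Children first (post-order), then the node itself.
            key = ("D", visit(t[1]), visit(t[2]))
        else:
            key = ("axiom", t)
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]
    return visit(tree), nodes

def tree_size(t):
    """Number of inner nodes, i.e., instances of Det in the tree."""
    if not isinstance(t, tuple):
        return 0
    return 1 + tree_size(t[1]) + tree_size(t[2])

def leaf_count(t):
    """Number of leaves, i.e., instances of the axiom in the tree."""
    if not isinstance(t, tuple):
        return 1
    return leaf_count(t[1]) + leaf_count(t[2])

# Assumed proof terms for lines 2 and 3 (illustrative only).
line2 = ("D", 1, 1)
line3 = ("D", 1, line2)
# Line 4 as described in the text: D applied to line 2 and D(line 3, line 3).
line4 = ("D", line2, ("D", line3, line3))

root, nodes = compact(line4)
d_nodes = [n for n in nodes if n[0] == "D"]
print(tree_size(line4), leaf_count(line4), len(d_nodes))  # prints "7 8 4"
```

Under these assumptions the tree has seven inner nodes and eight leaves, as expected for a full binary tree, while the compacted DAG needs only four distinct detachment nodes: the three derived lines and the one unlabeled subproof.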
3 Condensed Detachment and a Formal Basis
With Fig. 5c we have already seen a small example of a CD proof in Meredith’s notation. Figure 6 shows Meredith’s CD proof that Łukasiewicz entails Syll, Peirce and Simp, taken from a 1963 paper by Meredith and Prior [48]. There is a single axiom, 1, which is Łukasiewicz. The proven theorems are Syll (17), Peirce (18) and Simp (19), marked by asterisks. In addition to line numbers, the symbol “\(\textrm{n}\)” also appears in some of the proof terms. We will discuss its meaning in Sect. 4.4 and, for now, read it just as “1”. Dots are used in the Polish notation to disambiguate numeric identifiers with more than a single digit, for example in line 11.
Following Martin W. Bunder [10], the idea of CD can be described as follows: Given premises \(F \rightarrow G\) and H, we can conclude \(G'\), where \(G'\) is the most general result that can be obtained by using a substitution instance \(H'\) of H as minor premise with the substitution instance \(F' \rightarrow G'\) of \(F \rightarrow G\) as major premise in modus ponens. CD was introduced by Meredith in the mid-1950s as an evolution of the earlier method of substitution and detachment, where the involved substitutions were explicitly given.
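The idea just described can be made concrete with a small sketch. The following Python code is an illustration under stated assumptions, not the implementation of any cited system: formulas are nested tuples `("i", a, b)` with variables as strings, the premises are renamed apart by appending suffixes, and the result is obtained from a most general unifier computed with an occurs check. All function names are hypothetical, and the case of a major premise that is a bare variable is deliberately not handled.

```python
def imp(a, b):
    """Implication term i(a, b)."""
    return ("i", a, b)

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Does variable v occur in term t under substitution s?"""
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return occurs(v, t[1], s) or occurs(v, t[2], s)

def unify(a, b, s):
    """Extend s to a most general unifier of a and b, or return None."""
    a, b = walk(a, s), walk(b, s)
    if isinstance(a, str) and a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return None if occurs(b, a, s) else {**s, b: a}
    s = unify(a[1], b[1], s)
    return None if s is None else unify(a[2], b[2], s)

def resolve(t, s):
    """Apply substitution s exhaustively to term t."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return ("i", resolve(t[1], s), resolve(t[2], s))

def rename(t, suffix):
    """Rename all variables by appending a suffix."""
    if isinstance(t, str):
        return t + suffix
    return ("i", rename(t[1], suffix), rename(t[2], suffix))

def canonical(t):
    """Rename variables to x0, x1, ... in order of first occurrence."""
    names = {}
    def go(u):
        if isinstance(u, str):
            return names.setdefault(u, f"x{len(names)}")
        return ("i", go(u[1]), go(u[2]))
    return go(t)

def condensed_detachment(major, minor):
    """Most general conclusion G' from major premise F -> G and minor H."""
    if isinstance(major, str):
        return None  # simplification: variable major premise not handled
    major, minor = rename(major, "_1"), rename(minor, "_2")
    s = unify(major[1], minor, {})
    return None if s is None else canonical(resolve(major[2], s))

simp = imp("p", imp("q", "p"))           # CpCqp
print(condensed_detachment(simp, simp))  # a variant of CpCqCrq
```

Renaming the premises apart before unification is what allows the same formula to serve as both major and minor premise, as in the example with Simp.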
The original presentations of CD are informal, by means of examples [30, 47, 52, 53]. Formal specifications were given only later. John A. Kalman [27] provides two characterizations, one in terms of resolution. CD was then considered in the context of type theory, the formulas-as-types view, where J. Roger Hindley and David Meredith [24, 25] notice and fix an inaccuracy related to the notion of most general unifier in the early formalizations of CD and Bunder [10] provides a formalization that is independent of this notion. A particular investigated topic concerning CD in type theory is the relationship to substitution and detachment.
Unfortunately it seems that not much has been handed down about the methods by which humans found advanced CD proofs. Łukasiewicz [37, § 4] discusses an important intermediate step for his proof by substitution and detachment. Legend has it that Meredith often sent his finished CD proofs as postcards [9, 52].
In ATP, the rendering of CD by positive hyperresolution with the clausal form of axiom Det is so far the prevalent view. As surveyed by William McCune and Larry Wos [44], and Dolph Ulrich [67], many of the early successes of ATP were based on CD. Starting from the hyperresolution view, structural aspects of CD have been considered by Robert Veroff [71] with the use of term representations of proofs and linked resolution. Results of ATP systems on deriving the Tarski-Bernays axioms from Łukasiewicz are reported in several papers [18, 41, 44, 51, 84]. The problems of deriving Syll, Peirce and Simp from Łukasiewicz are in the TPTP as LCL038-1, LCL083-1 and LCL082-1, respectively. In general, many refinements of the OTTER prover [42] in the 1990s, some of which have found their way into modern saturating provers, were originally conceived and explored in the setting of CD [18, 44, 71, 80, 81, 82, 83, 84]. Various sources compile open and challenge problems concerning CD, along with some solutions or partial solutions [67, 68, 72, 73]. A sustaining and far-reaching application of CD is Metamath [45, 46], a successful computer-processable language for verifying, archiving, and presenting mathematical proofs. “Simple by design”, it is entirely based on CD extended by a second rule for condensed generalization.
From the viewpoint of general first-order ATP, CD basically offers a simplified, streamlined setting for investigations and developments that nevertheless retains core characteristics of first-order ATP: first-order variables, binary function symbols and cyclic predicate dependency. The simplifications concern the restricted application domain, namely axiomatizations of propositional logics (a restriction that is, however, not difficult to lift to Horn problems in general),Footnote 11 no explicit consideration of non-Horn problems,Footnote 12 and no explicit use of equality.Footnote 13 The TPTP contains both easy and still very hard CD problems.
But CD offers more. It integrates various features of relevance to ATP in a natural and formally accessible way, which we outline in the following paragraphs.
CD is distinguished from its predecessor, the method of substitution and detachment, by applying most general substitutions that are obtained through unification. In CD proof presentations, only the most general formulas resulting from unification are written; the involved substitutions are left implicit. Remarkably, unification was applied with CD extensively in formal deduction a decade before it became popular in the context of resolution through John Alan Robinson [59].
CD proofs are presented in the literature as a sequence of pairs of a lemma and a proof structure term that describes how the lemma is proven from previous lemmas. The structure terms can be combined to form a tree for each goal theorem or a DAG that represents the set of these trees more compactly, such that subtrees with multiple occurrences appear only once. Both representations have their merits. The explicit tree view facilitates associating semantic properties and formula substitutions in an inductive fashion. It permits understanding variables in a particularly simple way as scoped over the whole structure, known as rigid variables in tableaux. The compacted view in particular provides an adequate notion of proof size and, in printed form, is much easier for humans to comprehend.
A related separation of concerns regarding proof structure and associated formulas is provided among the modern approaches to ATP by the CM. In fact, as illustrated with Fig. 5 above, CD can be understood as an adaptation of the CM to inputs of a specific simple form: a single clause with three literals, which represents the Det axiom, and otherwise just unit clauses, representing proper axioms and the theorem to be proven.
The separation of a deductive derivation into a formula part and a proof structural part, as illustrated in Fig. 6, can be seen as a precursor of the CM. Namely, the CM has carried this separation to the extreme in that it keeps the formula part completely unchanged within such a derivation and shifts all deductive information into the proof structural part (see, e.g., [4, Section III.6]).
In the traditional presentation of a CD proof the members of the sequence of lemma-structure pairs are labeled with numbers, where the labeling turns out to be useful for the following two purposes. For a lemma that is referenced multiple times in the overall proof, a label is necessary to represent the proof structure compactly as a DAG. For a lemma that is referenced only once, the presentation by a labeled pair is optional and serves the convenience of a human reader or points out some special significance of the lemma. Otherwise, lemmas that are referenced only once do not appear explicitly in the proof presentation but could be obtained as the most general formulas proven by the substructures of the structure components of the labeled lemma-structure pairs.
The term view of proof structures lets the replacement of subproof occurrences appear as a form of term rewriting, with shorter subproofs that preserve equivalence in some sense. A suitable notion of equivalence can be based on the most general formula that can be proven with a given proof term by applying detachment steps according to the term structure from given axioms. Such proof reductions can be applied to simplify given proofs, or in proof search, to justify that a subproof recognized as reducible can be immediately discarded, because there must exist a different preferable subproof.
The term view of proof structures is also the basis of a recent technique where combinators are applied to express stronger compressions of the proof structure than just to DAGs [75]. Such compressions can be applied to shorten given proofs and in proof search. They correspond to more complex lemma formulas than the unit lemmas considered in the DAG compression, and can express simulations of other calculi.
Search for a CD proof can be performed goal- or axiom-driven. Consideration of a goal (e.g., a ground atom resulting from Skolemizing a universally quantified atom) in the unifying substitution to determine the formulas involved in the proof is optional. Taking the goal into account restricts the search space, as in the conventional goal-driven realizations of the CM. Nevertheless, an axiom-driven procedure without a supplied goal is also possible with very similar search mechanisms, enumerating proof structures interwoven with unification. The results then are consequences derived from axioms, which optionally may be used as lemmas to improve proof search in a second goal-driven phase [61, 76].
In a wider perspective, the consideration of the proof structure as a whole, for example as a term, which may be compacted into a DAG, introduces an important separation of concerns for proof search. Namely, the way in which the concrete structure is built up in proof search is not obliged to follow the inductive specification of the structure. The concrete structure can be built up in various ways, including rewriting of subproofs as indicated above, or by combining given proof fragments. This contrasts with calculi such as typical tableau methods where proof construction rules are directly taken to build up the proof structures.
Our goal in this section is to provide a formal framework that takes account of these aspects and provides a basis for experiments and future developments in ATP.
3.1 Notation
Most of our notation follows common practice [12]. The set of variables occurring in a term s is denoted by \( \mathcal {V}\hspace{-0.11em}ar (s)\). We extend this to other objects s such as, e.g., sets of terms. A substitution is a mapping from variables to terms which is almost everywhere equal to identity. If \(\sigma \) is a substitution, then the domain of \(\sigma \) is the set of variables \( \mathcal {D}\hspace{-0.08em}om (\sigma ) = \{x \mid x\sigma \ne x\}\), the range of \(\sigma \) is \( \mathcal {R}\hspace{-0.02em}ng (\sigma ) = \{x\sigma \mid x \in \mathcal {D}\hspace{-0.08em}om (\sigma )\}\), and the restriction of \(\sigma \) to a set X of variables, denoted by \(\sigma |_X\), is the substitution which is equal to the identity everywhere except over \(X \cap \mathcal {D}\hspace{-0.08em}om (\sigma )\), where it is equal to \(\sigma \). The identity substitution is denoted by \(\epsilon \). We write the set \( \mathcal {V}\hspace{-0.11em}ar ( \mathcal {R}\hspace{-0.02em}ng (\sigma ))\) of variables in the range of substitution \(\sigma \) also as \( \mathcal{V}\mathcal{R}\hspace{-0.02em}ng (\sigma )\). A substitution can be represented by a set of assignments of the variables in its domain, e.g., \(\{x_1 \mapsto t_1, \ldots , x_n \mapsto t_n\}\). The application of a substitution \(\sigma \) to a term s is written as \(s\sigma \); \(s\sigma \) is called an instance of s and s is said to subsume \(s\sigma \). That s subsumes t, or synonymously, that t is an instance of s, is expressed symbolically by \(s \mathrel{\dot{\ge }} t\).
If both \(s \mathrel{\dot{\ge }} t\) and \(t \mathrel{\dot{\ge }} s\) hold, we say that s and t are variants of each other, expressed symbolically as \(s \doteq t\). Composition of substitutions is written as juxtaposition. Hence, if \(\sigma \) and \(\theta \) are both substitutions, then \(E\sigma \theta \) stands for \((E\sigma )\theta \). A substitution \(\sigma \) is idempotent if \(\sigma \sigma = \sigma \), or, equivalently, \( \mathcal {D}\hspace{-0.08em}om (\sigma ) \cap \mathcal{V}\mathcal{R}\hspace{-0.02em}ng (\sigma ) = \emptyset \). A substitution \(\sigma \) is called more general than a substitution \(\theta \), in symbols \(\sigma \mathrel{\dot{\ge }} \theta \), if there exists a substitution \(\rho \) such that \(\sigma \rho = \theta \). That both \(\sigma \mathrel{\dot{\ge }} \theta \) and \(\theta \mathrel{\dot{\ge }} \sigma \) hold is expressed by \(\sigma \doteq \theta \).
A position is a sequence of positive integers that specifies the occurrence of a subterm in a term as a path in Dewey decimal notation starting from the root of the term. The set of all positions of a term s is denoted by \( \mathcal {P}\!os (s)\). For example, \( \mathcal {P}\!os (\textsf{f}(x,\textsf{g}(y))) = \{\epsilon , 1, 2, 2.1\}\). If position p is a prefix of position q, we write \(p \le q\) and say that p is above q, and q is below p. We also use \(p \not \le q\), \(p < q\) and \(p \not < q\) for positions p, q with the obvious analogous meanings. For \(p \in \mathcal {P}\!os (s)\), the subterm of s at position p is denoted by \(s|_p\). For example, if \(s = \textsf{f}(x,\textsf{g}(y))\), then \(s|_{\epsilon } = s = \textsf{f}(x,\textsf{g}(y))\), \(s|_{1} = x\), \(s|_{2} = \textsf{g}(y)\) and \(s|_{2.1} = y\). That s is a subterm of t is expressed symbolically as \(s \unlhd t\) and that s is a strict subterm of t as \(s \lhd t\). For \(p \in \mathcal {P}\!os (s)\), the expression \(s[t]_p\) denotes the term obtained from s by replacing the subterm occurrence at position p with term t, or, in case \(s|_p = t\), denotes s while indicating that t occurs at position p in s.
In addition to common notation, we use a few special symbols and conventions: The set of positions \(p \in \mathcal {P}\!os (s)\) such that \(s|_p\) is a variable or a constant is denoted by \( \mathcal {L}\hspace{-0.05em}eaf\mathcal {P}\!os (s)\) and the set of positions \(p \in \mathcal {P}\!os (s)\) such that \(s|_p\) is a compound term by \( \mathcal {I}\hspace{-0.05em}nner\mathcal {P}\!os (s)\). We use the postfix notation for the application of a substitution \(\sigma \) also for sets M of pairs of terms: \(M\sigma \) stands for \(\{\{s\sigma , t\sigma \} \mid \{s, t\} \in M\}\). For terms s, t, u, the expression \(s[t \mapsto u]\) denotes s after simultaneously replacing all occurrences of t with u. If F is a formula, then \(\forall F\) denotes the universal closure of F.
3.2 Proof Structures: D-Terms
In this subsection (as well as in Sect. 4.1 below) we consider only the purely structural aspects of CD proofs. Emphasis is on a twofold view on the proof structure, as a tree and as a DAG (directed acyclic graph), which factorizes multiple occurrences of the same subtree. Both representation forms are useful: the compacted DAG form captures that lemmas can be repeatedly used in a proof, whereas the tree form facilitates specifying properties in an inductive manner.
3.2.1 Basic Definitions: Term View and Tree View
We call the tree representation of proofs by terms with the binary function symbol \(\textsf{D}\) D-terms.
Definition 1
(i) We assume a distinguished set \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \) of symbols, called the primitive D-terms.
(ii) A D-term is specified inductively as follows.
1. Any member of \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \) is a D-term.
2. If \(d_1\) and \(d_2\) are D-terms, then \(\textsf{D}(d_1,d_2)\) is a D-term.
(iii) A D-term of the form \(\textsf{D}(d_1,d_2)\) is called compound.
(iv) For D-terms d define \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d) \mathrel{\overset{\text{def}}{=}} \{d|_p \mid p \in \mathcal {L}\hspace{-0.05em}eaf\mathcal {P}\!os (d)\}\).
A D-term d is a full binary tree (a binary tree where every inner node has exactly two children, its left and its right child) whose leaves are labeled by primitive D-terms. \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)\) denotes the set of the primitive D-terms that occur in d, or, in other words, the set of leaf labels of d.
Example 2
Assume that \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \) contains the numeral 1. Then \(d = \textsf{D}(\textsf{D}(1,1),\textsf{D}(\textsf{D}(1,\textsf{D}(1,1)),\textsf{D}(1,\textsf{D}(1,1))))\) is a D-term with \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d) = \{1\}\) that represents the structure of the proof shown in Fig. 5. Its visualization is shown in Fig. 7 (which is identical to Fig. 5d after removing all labels with the exception of the leaf labels).
Example 3
The proof annotations in Fig. 5c and Fig. 6 are D-terms written in Polish notation, where \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \) is a set \(\{1,2,3,\ldots \}\) of numerals. The expression \(\textsf{D}2\textsf{D}33\) in line 4 of Fig. 5, for example, stands for the D-term \(\textsf{D}(2,\textsf{D}(3,3))\). Its set \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (\textsf{D}(2,\textsf{D}(3,3)))\) of primitive D-terms is \(\{2,3\}\).
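The correspondence between Polish notation and nested D-terms can be illustrated with a small Python sketch. The representation is an assumption made for illustration only: primitive D-terms are strings, compound D-terms are tuples `("D", left, right)`, and the hypothetical parser `parse_polish` accepts only single-character primitives.

```python
def parse_polish(text):
    """Parse a D-term written in Polish notation, e.g. "D2D33".

    Assumption: every primitive D-term is a single character other than "D".
    Primitive D-terms are returned as strings, compound ones as
    tuples ("D", left, right).
    """
    def parse(i):
        if i >= len(text):
            raise ValueError("unexpected end of input")
        if text[i] == "D":
            left, i = parse(i + 1)   # first argument of D
            right, i = parse(i)      # second argument of D
            return ("D", left, right), i
        return text[i], i + 1        # a primitive D-term

    term, end = parse(0)
    if end != len(text):
        raise ValueError("trailing characters after D-term")
    return term


def dprim(d):
    """Set of primitive D-terms (leaf labels) occurring in d."""
    if isinstance(d, tuple):
        return dprim(d[1]) | dprim(d[2])
    return {d}


line4 = parse_polish("D2D33")   # the expression discussed in Example 3
```

With this encoding, `line4` is the nested tuple for \(\textsf{D}(2,\textsf{D}(3,3))\) and `dprim(line4)` yields the two leaf labels.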
3.2.2 Tree Size and Height
The following definition specifies two basic size measures of D-terms.
Definition 4
(i) The tree size of a D-term d, in symbols \(\textsf {t-size} (d)\), is the number of occurrences of the function symbol \(\textsf{D}\) in d.
(ii) The height of a D-term d, in symbols \(\textsf{height}(d)\), is, viewing the term as a tree, the number of edges of the longest downward path from the root to a leaf.
The tree size of a D-term can equivalently be characterized as the number of its inner nodes. Veroff [71] calls it CDcount. As will be explicated in more detail in Sect. 3.3, each occurrence of the function symbol \(\textsf{D}\) in a D-term corresponds to an instance of the axiom \({\textit{Det}} \) in the represented proof. Hence the tree size measures the number of instances, or multiplicity, of \({\textit{Det}} \) in the proof. Another view is that each occurrence of \(\textsf{D}\) in a D-term corresponds to a detachment step, without re-using already proven lemmas and instead again re-proving each lemma whenever it is used. The tree size of the D-term d of Example 2 is \(\textsf {t-size} (d) = 7\).
The height of a D-term is just its height according to the conventional notion of the height of a tree. Applied to terms it is often also called depth. For D-terms, it is called level by Veroff [71]. The height of the D-term d of Example 2 is \(\textsf{height}(d) = 4\).
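Both measures follow the inductive structure of D-terms directly. The following Python sketch assumes, for illustration, that primitive D-terms are integers and compound D-terms are tuples `("D", left, right)`; the sample term `d` is chosen so that its measures agree with the values stated above for Example 2.

```python
def t_size(d):
    """Tree size: number of occurrences of D, i.e. number of inner nodes."""
    if not isinstance(d, tuple):
        return 0
    return 1 + t_size(d[1]) + t_size(d[2])


def height(d):
    """Height: number of edges on the longest path from the root to a leaf."""
    if not isinstance(d, tuple):
        return 0
    return 1 + max(height(d[1]), height(d[2]))


# Sample D-term over the single primitive 1 (an illustrative assumption).
lemma2 = ("D", 1, 1)
lemma3 = ("D", 1, lemma2)
d = ("D", lemma2, ("D", lemma3, lemma3))

tree_size_of_d = t_size(d)
height_of_d = height(d)
```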
3.2.3 DAG Representation and Compacted Size
A finite tree and, more generally, a finite set of finite trees can be represented as a DAG, where each node in the DAG corresponds to a subtree (Footnote 14) of a tree in the given set. It is well known that there is a unique (modulo isomorphism) minimal such DAG, which is maximally factored (it has no multiple occurrences of the same subtree) or, equivalently, is minimal with respect to the number of nodes, and, moreover, can be computed in linear time [13]. The number of nodes of the minimal DAG is the number of distinct subtrees of the members of the set of trees. This can be used as the basis for proof size measures defined as follows.
Definition 5
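(i) For D-terms d define \( \mathcal {S}{\hspace{-0.12em}ubeq} (d) \mathrel{\overset{\text{def}}{=}} \{d|_p \mid p \in \mathcal {I}\hspace{-0.05em}nner\mathcal {P}\!os (d)\}\).
(ii) For D-terms d define the compacted size of d as \(\textsf {c-size} (d) \mathrel{\overset{\text{def}}{=}} | \mathcal {S}{\hspace{-0.12em}ubeq} (d)|\).
(iii) For finite sets D of D-terms define the compacted size of D as \(\textsf {c-size} (D) \mathrel{\overset{\text{def}}{=}} |\bigcup _{d \in D} \mathcal {S}{\hspace{-0.12em}ubeq} (d)|\).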
\( \mathcal {S}{\hspace{-0.12em}ubeq} (d)\) denotes the set of all compound subterms of a D-term d. The compacted size (Footnote 15) of a D-term, called length by Veroff [71], is the number of its distinct compound subterms, reflecting the view that the size of the proof of a lemma is only counted once, even if the lemma is used multiple times in the proof. It can equivalently be characterized as the number of the inner nodes of its minimal DAG.
Example 6
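Consider the D-term \(d = \textsf{D}(\textsf{D}(1,1),\textsf{D}(\textsf{D}(1,\textsf{D}(1,1)),\textsf{D}(1,\textsf{D}(1,1))))\) from Example 2. Its compacted size is \(\textsf {c-size} (d) = 4\). This is the number of inner nodes of the minimal DAG of d, which is shown in Fig. 8 (which is identical to Fig. 5e after removing all labels with the exception of the leaf label), or, equivalently, the cardinality of the set \( \mathcal {S}{\hspace{-0.12em}ubeq} (d) = \{\textsf{D}(1,1),\; \textsf{D}(1,\textsf{D}(1,1)),\; \textsf{D}(\textsf{D}(1,\textsf{D}(1,1)),\textsf{D}(1,\textsf{D}(1,1))),\; d\}\) of compound subterms of d.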
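The compacted size can be computed by collecting distinct compound subterms in a set. The Python sketch below is a naive illustration under the assumption that D-terms are encoded as nested tuples `("D", left, right)` with integer leaves; it walks the full tree and therefore does not achieve the linear-time bound mentioned above for computing the minimal DAG.

```python
def compound_subterms(d):
    """Set of all compound subterms of d (including d itself if compound)."""
    if not isinstance(d, tuple):
        return set()
    return {d} | compound_subterms(d[1]) | compound_subterms(d[2])


def c_size(*dterms):
    """Compacted size of one D-term or of a finite set of D-terms."""
    found = set()
    for d in dterms:
        found |= compound_subterms(d)
    return len(found)


# Sample D-term with tree size 7 (an illustrative assumption).
lemma2 = ("D", 1, 1)
lemma3 = ("D", 1, lemma2)
d = ("D", lemma2, ("D", lemma3, lemma3))

compacted = c_size(d)
```

Because Python tuples compare structurally, equal subtrees collapse to one set element, which mirrors the factoring of repeated subtrees in the minimal DAG.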
A textual representation of D-terms that respects the compacted size, that is, is at most linearly larger than the compacted size, is possible by introducing labels and references for subterms with multiple occurrences, which can be done with a variety of concrete mechanisms. Our approach is to extend the set of primitive D-terms with labels used for referencing subproofs. Formally, we view a compacted D-term as a special kind of substitution whose domain members are primitive D-terms. Written out as a set of bindings, as common for substitutions, a compacted D-term provides the desired compact textual representation of a set of D-terms.
Definition 7
(i) A compacted D-term is a mapping \(\delta \) whose domain is a finite set of primitive D-terms and whose range is a set of compound D-terms such that the relation \(<_{\delta }\), called label dependency ordering of \(\delta \), defined as the transitive closure of \(\{\langle l, l'\rangle \mid l,l' \in \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \text { and } l \in \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (l'\delta )\}\) is a strict partial order.
(ii) The roots of a compacted D-term \(\delta \) are the elements of \( \mathcal {D}\hspace{-0.08em}om (\delta )\) that are maximal with respect to \(<_{\delta }\).
(iii) The binary function \(\textsf{expand}\) from compacted D-terms \(\delta \) and primitive D-terms \(l \in \mathcal {D}\hspace{-0.08em}om (\delta )\) to D-terms is defined as \(\textsf{expand}_{\delta }(l) \mathrel{\overset{\text{def}}{=}} l\delta \{l_n \mapsto l_n\delta \} \cdots \{l_1 \mapsto l_1\delta \}\), where \(l_1,l_2,\ldots ,l_n\) is some \(<_{\delta }\)-linearization of the set \(\{l' \in \mathcal {D}\hspace{-0.08em}om (\delta ) \mid l' <_{\delta } l\}\).
We write the application of a compacted D-term (a special kind of substitution) in postfix notation. A compacted D-term \(\delta \) represents the finite set of D-terms, or trees, that correspond to its roots, that is, \(\{\textsf{expand}_{\delta }(l) \mid l \text { is a root of } \delta \}\). If \(\delta \) has a single root l, we also say that it represents the D-term \(\textsf{expand}_{\delta }(l)\).
Example 8
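The D-term d from Examples 2 and 6 is represented by the compacted D-term \(\delta = \{2 \mapsto \textsf{D}(1,1),\; 3 \mapsto \textsf{D}(1,2),\; 4 \mapsto \textsf{D}(2,\textsf{D}(3,3))\}\).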
The label dependency ordering \(<_\delta \) can be described as \(1<_{\delta } 2<_{\delta } 3 <_{\delta } 4\) and \(\delta \) has a single root, namely 4.
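A compacted D-term can be modeled as a dictionary from labels to compound D-terms. The following Python sketch is an illustration under assumptions: D-terms are nested tuples `("D", left, right)` with integer leaves, the sample bindings are chosen to agree with the label ordering and sizes stated in Example 8, and the function names `expand` and `roots` are hypothetical. Expansion is written recursively instead of via an explicit linearization; it terminates because the label dependency ordering is a strict partial order.

```python
def prims(d):
    """Primitive D-terms occurring in d."""
    if isinstance(d, tuple):
        return prims(d[1]) | prims(d[2])
    return {d}


def expand(delta, d):
    """Replace every label from the domain of delta by its expansion."""
    if isinstance(d, tuple):
        return ("D", expand(delta, d[1]), expand(delta, d[2]))
    if d in delta:
        return expand(delta, delta[d])
    return d   # a primitive D-term that is not a label


def roots(delta):
    """Labels of delta that occur in no right side, i.e. the maximal ones."""
    used = set()
    for body in delta.values():
        used |= prims(body)
    return {label for label in delta if label not in used}


# Sample compacted D-term (an illustrative assumption).
delta = {2: ("D", 1, 1), 3: ("D", 1, 2), 4: ("D", 2, ("D", 3, 3))}

expanded = expand(delta, 4)
root_labels = roots(delta)
```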
Example 9
Consider Meredith’s proof shown in Fig. 6. Its structure can be represented by the compacted D-term \(\delta _{\textsf{mer}} = \{i \mapsto d_i \mid 2 \le i \le 19\}\), where \(d_i\) is the D-term representation of the proof term in line i. Thus, \(\delta _{\textsf{mer}} = \{2 \mapsto \textsf{D}(\textsf{D}(\textsf{D}(1,\textsf{D}(1,1)),1),n),\; 3\mapsto \textsf{D}(\textsf{D}(\textsf{D}(1,\textsf{D}(1,\textsf{D}(1,2))),1),n),\; 4\mapsto \textsf{D}(3,1),\; \ldots ,\; 19\mapsto \textsf{D}(3,3)\}\). The label dependency ordering is visualized in Fig. 9. The compacted D-term \(\delta _{\textsf{mer}}\) has three roots, 17, 18 and 19. Meredith’s representation of the proof structure can be reconstructed in full as a linearization of the label dependency ordering from the compacted D-term \(\delta _{\textsf{mer}}\).
A compacted D-term directly represents a DAG: The DAG of a compacted D-term \(\delta = \{l_1 \mapsto d_1, \ldots , l_n \mapsto d_n\}\) is the graph obtained from the trees \(d_1, \ldots , d_n\) by considering any edge to a leaf labeled with \(l_i\) as an edge to the root of \(d_i\), and any edge to a leaf labeled with a symbol not in \( \mathcal {D}\hspace{-0.08em}om (\delta )\) as an edge to a unique node representing that symbol in the DAG. Figure 10 shows an example. The DAGs of compacted D-terms inherit from D-terms, full binary trees, the property that each inner node has exactly two children, a left and a right child (Footnote 16).
The number of inner nodes of the DAG of a compacted D-term is \(\sum _{\,l \in \mathcal {D}\hspace{-0.08em}om (\delta )} \textsf {t-size} (l\delta )\). If the compacted D-term is written as a set of bindings as in Example 8, it can be read off as the total number of occurrences of \(\textsf{D}\) in the bindings’ right sides.
An alternative possible technical understanding of a compacted D-term with a single root is as a regular tree grammar where the domain forms the set of nonterminals. Each nonterminal there has exactly one production and the grammar describes a single tree [34, 35]. If the regularity condition is dropped, the grammar framework generalizes to tree representations that are more strongly compressed than DAGs, offering further compression possibilities also for D-terms [75].
3.2.4 Comparing the Number of D-terms of a Given Size for Different Size Measures
The number of distinct D-terms for increasing values of some size measure like tree size, height or compacted size, gives an upper bound of the number of trees to consider in proof search by enumerating D-terms with iterative deepening upon that size measure. This number is just an upper bound of the actual structures to consider, because it does not take into account that D-term enumeration may be interwoven with unification constrained by given axioms and possibly a given goal where fragments of D-terms for which unifiability fails are immediately discarded. Heuristic restrictions may in practice further restrict the considered number of structures. The number of distinct D-terms for increasing values of a size measure also indicates a measure-specific size value up to which it is easily possible to compute for given axioms all proofs, together with the lemmas proven by them.
If we assume a single proper axiom such that we can identify compacted D-terms with full binary trees without any additional labeling, the sequences of the number of distinct D-terms for increasing tree size, height or compacted size are well-known and can be found in The On-Line Encyclopedia of Integer Sequences (https://oeis.org/) [49], with identifiers A000108, A001699, and A254789, respectively, as shown in Table 1.
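For the tree size and height measures, the counts for a single proper axiom can be recomputed from simple recurrences. The Python sketch below is an illustration with hypothetical function names. A D-term with n inner nodes splits into a left part with k and a right part with n-1-k inner nodes, which gives the Catalan recurrence; for height, the number of D-terms of height at most h is one leaf plus all pairs of D-terms of height at most h-1. The compacted size measure is omitted here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_by_tree_size(n):
    """Number of D-terms over one primitive with exactly n occurrences of D."""
    if n == 0:
        return 1   # just the primitive D-term
    return sum(count_by_tree_size(k) * count_by_tree_size(n - 1 - k)
               for k in range(n))


def count_by_height(h):
    """Number of D-terms over one primitive with height exactly h."""
    at_most = 1    # D-terms of height <= 0: the primitive alone
    below = 0      # D-terms of height <= -1: none
    for _ in range(h):
        # a term of height <= k is a leaf or a pair of terms of height <= k-1
        below, at_most = at_most, 1 + at_most * at_most
    return at_most - below


by_tree_size = [count_by_tree_size(n) for n in range(6)]
by_height = [count_by_height(h) for h in range(5)]
```

The rapid growth of the height-indexed counts compared with the tree-size-indexed counts indicates how strongly the choice of size measure affects the depth to which exhaustive enumeration is feasible.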
3.2.5 Node Labels for Proof Modularization
That a compacted D-term \(\delta \) represents a set \(D = \{d_1, \ldots , d_n\}\) of D-terms does not imply that the DAG of \(\delta \) is the minimal DAG corresponding to D. If the number of inner nodes of the DAG is larger than the compacted size of D, this indicates that not all subtrees of D with multiple occurrences have properly been factored out in \(\delta \). Although obviously burdened with redundancy, such non-minimal DAGs cannot be excluded from the outset because automated theorem provers might produce them, as in general they do not always detect different subproof occurrences with identical structure.
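Whether the DAG of a compacted D-term is minimal can be checked by comparing its number of inner nodes, the sum of the tree sizes of the right sides, with the compacted size of the represented D-terms. The Python sketch below illustrates this under assumptions: D-terms are nested tuples `("D", left, right)` with integer leaves, compacted D-terms are dictionaries, and all function names are hypothetical.

```python
def t_size(d):
    return 1 + t_size(d[1]) + t_size(d[2]) if isinstance(d, tuple) else 0


def prims(d):
    return prims(d[1]) | prims(d[2]) if isinstance(d, tuple) else {d}


def expand(delta, d):
    if isinstance(d, tuple):
        return ("D", expand(delta, d[1]), expand(delta, d[2]))
    return expand(delta, delta[d]) if d in delta else d


def compound_subterms(d):
    if not isinstance(d, tuple):
        return set()
    return {d} | compound_subterms(d[1]) | compound_subterms(d[2])


def dag_inner_nodes(delta):
    """Total number of occurrences of D in the right sides of the bindings."""
    return sum(t_size(body) for body in delta.values())


def is_minimal(delta):
    """True if the DAG of delta has as many inner nodes as the minimal DAG."""
    used = set().union(*(prims(body) for body in delta.values()))
    represented = [expand(delta, l) for l in delta if l not in used]
    distinct = set().union(*(compound_subterms(d) for d in represented))
    return dag_inner_nodes(delta) == len(distinct)


# Two sample compacted D-terms for the same D-term (illustrative assumptions).
factored = {2: ("D", 1, 1), 3: ("D", 1, 2), 4: ("D", 2, ("D", 3, 3))}
redundant = {2: ("D", 1, 1), 4: ("D", 2, ("D", ("D", 1, 2), ("D", 1, 2)))}
```

In `redundant` the subproof \(\textsf{D}(1,2)\) is written out twice instead of being referenced through a label, so its DAG has one inner node more than the minimal DAG.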
A compacted D-term comprises not just the representation of a DAG, but also a labeling of some of its inner nodes. Nodes that receive such a label include in particular all root nodes and all nodes that have more than a single incoming edge. Figure 10 shows these labelings for the D-term \(\delta \) of Example 8: The unlabeled blank node corresponds to the subtree \(\textsf{D}(3,3)\) which has only a single incoming edge. In addition to labels that are necessary to describe the structure, a compacted D-term can provide labels for further nodes. In other words, its domain may include primitive D-terms that are neither a root nor occur “multiple times” in its range, where occurring “multiple times” in the range means occurring in different members of the range or with multiple occurrences in some member of the range.
Example 10
Consider the compacted D-term \(\delta \) of Example 8, whose DAG is shown in Fig. 5e and which represents the D-term d from Examples 2 and 6. The root of the following compacted D-term \(\delta '\) represents the same D-term as \(\delta \) and has the same number of inner nodes, but has with \(3'\) one more primitive D-term in its domain, which it maps to the subterm \(\textsf{D}(3,3)\) and which has just a single occurrence in its range. This occurrence is in \(\textsf{D}(2,3')\), which is the value of \(4\delta '\). The compacted D-term is \(\delta ' = \{2 \mapsto \textsf{D}(1,1),\; 3 \mapsto \textsf{D}(1,2),\; 3' \mapsto \textsf{D}(3,3),\; 4 \mapsto \textsf{D}(2,3')\}\).
Example 11
The compacted D-term \(\delta _{\textsf{mer}}\) from Example 9, which represents the structure of Meredith’s proof from Fig. 6, is a compacted D-term where not all non-root members of the domain occur multiple times in the range, which is not difficult but somewhat tedious to verify: The primitive D-terms 2, 7, 11 and 15 each have only a single occurrence in the range of \(\delta _{\textsf{mer}}\).
Such node labels or domain members of a compacted D-term, which are superfluous for the purpose of describing the proof structure, can nevertheless be meaningful for the presentation of a proof, because they indicate a modularization into subproofs that is motivated by other reasons than the multiple occurrence of a subproof or multiple use of a lemma. Examples are exhibiting a subproof obtained with a specific inference technique, or explicitly showing the lemma proven by a subproof as an intermediate proof stage for better comprehension by humans.
3.3 Proof Structures, Formula Substitutions and Semantics
A CD proof combines structural aspects represented by a D-term, a full binary tree, with atomic formulas associated with its nodes. Similar to a CM proof of a clausal formula, a CD proof involves different instances of the input clauses, specifically the proper axioms and the detachment axiom Det. The atomic formulas associated with nodes are induced through unification from the axioms and, via instances of Det, the tree structure of the D-term. The atomic formula associated with the root of the tree is the “most general” formula proven by the D-term with respect to the given proper axioms. In particular, it proves all ground formulas that are instances of it and are obtained from Skolemizing a universally quantified goal formula. For goal-driven proof search, such a ground formula is taken into account from the beginning, such that fragments of D-terms whose root-associated formula does not subsume the goal can be excluded early through failure of unification.
We call the most general formula proven by a D-term with respect to given proper axioms the most general theorem (MGT) of the D-term. The MGT of a subproof \(d|_p\) of a proof d represents the lemma used in d at position p. This MGT is determined just by the subproof \(d|_p\) and the proper axioms. Thus, occurrences of the same subproof at other positions in d have the same MGT. There is a second useful way to associate formulas with positions in a D-term, the in-place theorem (IPT) of a D-term d at position p, which represents the actual instance of the lemma used in d at position p. Like the MGT, the IPT is determined through most general unification but, in addition to the subtree \(d|_p\), also with respect to the context of p in d. The notions of MGT and IPT as well as their interplay will be made precise in this subsection.
3.3.1 Most General Unifiers
CD involves the implicit characterization of proven lemmas as formulas that are most general in a certain sense, which can be specified with the notion of most general unifier, a standard concept in modern ATP. We use it here in a version that applies to a set of pairs of terms, as convenient in discussions based on the CM [4, 14, 16], and assume useful restricting properties suggested by Elmar Eder [14], gathered here under the label clean.
Definition 12
Let M be a set of pairs of terms and let \(\sigma \) be a substitution.
(i) \(\sigma \) is said to be a unifier of M if for all \(\{s,t\} \in M\) it holds that \(s\sigma = t\sigma \).
(ii) \(\sigma \) is called a most general unifier of M if \(\sigma \) is a unifier of M and for all unifiers \(\sigma '\) of M it holds that \(\sigma \) is more general than \(\sigma '\).
(iii) \(\sigma \) is called a clean most general unifier of M if it is a most general unifier of M and, in addition, is idempotent and satisfies \( \mathcal {D}\hspace{-0.08em}om (\sigma ) \cup \mathcal{V}\mathcal{R}\hspace{-0.02em}ng (\sigma ) \subseteq \mathcal {V}\hspace{-0.11em}ar (M)\).
(iv) If M has a unifier, then \(\textsf{mgu}(M)\) denotes some clean most general unifier of M. M is called unifiable and \(\textsf{mgu}(M)\) is called defined in this case, otherwise it is called undefined.
Convention 13
Proposition, lemma and theorem statements implicitly assert their claims only for the case where occurrences of \(\textsf{mgu}\) in them are defined.
Although a unifier of a finite set of pairs \(\{\{s_1,t_1\},\ldots ,\{s_n,t_n\}\}\) can be expressed as unifier of the single pair \(\{\textsf{f}(s_1,\ldots ,s_n), \textsf{f}(t_1,\ldots ,t_n)\}\), the explicit definition for a set of pairs fits well with the graphs in the CM and the related D-terms, where such sets of pairs naturally arise.
The additional properties required for a clean most general unifier do not hold for all most general unifiers (Footnote 17). However, the unification algorithms known from the literature produce clean most general unifiers [14, Remark 4.2]. If a set of pairs of terms has a unifier, then it has a most general unifier and, moreover, also a clean most general unifier. Since we define \(\textsf{mgu}(M)\) as a clean most general unifier, whenever necessary, we can assume that it is idempotent and that all variables occurring in its domain and range occur in M. Convention 13 serves to reduce clutter in proposition, lemma and theorem statements.
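A unification procedure for a set of pairs that returns a clean most general unifier can be sketched in a few lines of Python. This is a textbook-style illustration, not the implementation used for the reported experiments. The encoding is an assumption: variables are strings, compound terms and constants are tuples `(functor, arg1, ..., argn)`, and a substitution is a dictionary. Idempotence is maintained by eliminating each newly bound variable from the existing bindings; no fresh variables are introduced, so domain and range variables stem from the input.

```python
def is_var(t):
    return isinstance(t, str)


def apply(t, sigma):
    """Apply the (idempotent) substitution sigma to the term t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(a, sigma) for a in t[1:])


def occurs(x, t):
    return t == x if is_var(t) else any(occurs(x, a) for a in t[1:])


def mgu(pairs):
    """Return a clean most general unifier of the pairs, or None if undefined."""
    sigma = {}
    todo = list(pairs)
    while todo:
        s, t = todo.pop()
        s, t = apply(s, sigma), apply(t, sigma)
        if s == t:
            continue
        if not is_var(s) and is_var(t):
            s, t = t, s
        if is_var(s):
            if occurs(s, t):
                return None                      # occurs check fails
            # eliminate s from the existing bindings, then bind it
            sigma = {x: apply(u, {s: t}) for x, u in sigma.items()}
            sigma[s] = t
        elif s[0] == t[0] and len(s) == len(t):
            todo.extend(zip(s[1:], t[1:]))       # decompose arguments
        else:
            return None                          # functor clash
    return sigma


sigma = mgu([(("i", "x", "y"), ("i", ("a",), "x"))])
```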
3.3.2 Positional Variables
The atomic formulas associated with the nodes of a D-term are based on instances of the proper axioms and Det, which may conceptually be considered as obtained in two steps: first, “copies”, i.e., variants with fresh variables, are created; second, a substitution determined by the proof structure is applied to these copies. Let us begin with describing the first step. We only need formulas with specific variables, which we call positional variables because each of them is firmly tied to a term position. They are defined as follows.
Definition 14
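(i) For all positions p and positive integers i let \(x^{i}_{p}\) and \(y_p\) denote pairwise different variables. We call the variables \(x^{i}_{p}\) and \(y_p\) positional variables.
(ii) For all sets P of positions define \( \mathcal {P}\!os \mathcal {V}ar(P) \mathrel{\overset{\text{def}}{=}} \{x^{i}_{p} \mid i \ge 1 \text { and } p \in P\} \cup \{y_p \mid p \in P\}\).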
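With each leaf of a D-term d a dedicated copy of some proper axiom is associated. The variables \(x^{i}_{p}\) are for use in these copies, where the subscript p is the position of the leaf node in d. The upper index i serves to distinguish different variables within the copies, as indicated with the right side of the following exemplary equivalence, which holds for all positions p: (i) Łukasiewicz \(\equiv \forall x^{1}_{p} x^{2}_{p} x^{3}_{p} x^{4}_{p}\, \textsf{P}(\textsf{i}(\textsf{i}(\textsf{i}(x^{1}_{p},x^{2}_{p}),x^{3}_{p}),\textsf{i}(\textsf{i}(x^{3}_{p},x^{1}_{p}),\textsf{i}(x^{4}_{p},x^{1}_{p}))))\).
A variable \(y_p\) can be associated with each position p of a D-term. That each inner node of a D-term corresponds to a dedicated copy of the \({\textit{Det}} \) axiom is reflected in the following equivalence, which holds for all positions p: (ii) \({\textit{Det}} \equiv \forall y_{p} y_{p.2}\, (\textsf{P}(\textsf{i}(y_{p.2}, y_{p})) \wedge \textsf{P}(y_{p.2}) \rightarrow \textsf{P}(y_{p}))\).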
Here the major premise of Det is written to the left of the minor one, matching the argument order of the \(\textsf{D}\) function symbol for proof tree construction. \( \mathcal {P}\!os \mathcal {V}ar(P)\) provides notation for referring to all positional variables associated with members of a given set P of positions, regardless of whether they are of the form \(y_p\) or \(x^{i}_{p}\).
The following substitution \(\textsf{shift}_{p}\) is a tool to systematically rename positional variables while preserving the internal relationships between the index-referenced positions.
Definition 15
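For all positions p define the substitution \(\textsf{shift}_{p}\) as \(\textsf{shift}_{p} \mathrel{\overset{\text{def}}{=}} \{y_q \mapsto y_{p.q} \mid q \text { is a position}\} \cup \{x^{i}_{q} \mapsto x^{i}_{p.q} \mid i \ge 1 \text { and } q \text { is a position}\}\).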
The application of \(\textsf{shift}_{p}\) to a term s effects that p is prepended to the position indexes of all the positional variables occurring in s.
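The effect of \(\textsf{shift}_{p}\) can be made concrete with a small Python sketch. The encoding is an assumption for illustration: a positional variable is a record `PosVar` with a kind (`"x"` or `"y"`), a position given as a tuple of integers, and an upper index that is only meaningful for x-variables; compound terms are tuples `(functor, arg1, ..., argn)`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PosVar:
    kind: str                 # "x" or "y"
    pos: Tuple[int, ...]      # the position subscript
    index: int = 0            # upper index i of x-variables, unused for y


def shift(p, term):
    """Prepend position p to the position of every positional variable."""
    if isinstance(term, PosVar):
        return PosVar(term.kind, tuple(p) + term.pos, term.index)
    if isinstance(term, tuple):
        return (term[0],) + tuple(shift(p, a) for a in term[1:])
    return term               # constants without arguments stay unchanged


# y at position 2.1.2 and x with upper index 1 at the empty position
term = ("i", PosVar("y", (2, 1, 2)), PosVar("x", (), 1))
shifted = shift((1, 1), term)
```

Since distinct positions remain distinct after prepending the same prefix, the shifted term differs from the original only by a renaming of variables.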
Example 16
In the second equality, observe that position 2.1.2 refers to the right child of position 2.1. After applying \(\textsf{shift}_{1.1}\), it is position 1.1.2.1.2 that, again, refers to the right child of position 1.1.2.1.
Applying a \(\textsf{shift}_{p}\) substitution to a term always yields a variant, as stated in the following proposition.
Proposition 17
For all terms s whose variables are positional variables (Definition 14.i) and for all positions p it holds that \(s\,\textsf{shift}_{p}\) is a variant of s.
Proof
Easy to see. \(\square \)
3.3.3 Axiom Assignments
The association of axioms with primitive D-terms is represented by a mapping which we call axiom assignment, defined as follows.
Definition 18
An axiom assignment \(\alpha \) is a mapping whose domain is a set of primitive D-terms and whose range is a set of terms whose variables are in \(\{x^{i}_{\epsilon } \mid i \ge 1\}\). We say that \(\alpha \) is for a D-term d if \( \mathcal {D}\hspace{-0.08em}om (\alpha ) \supseteq \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)\).
We write the application of an axiom assignment in postfix notation.
Example 19
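The mapping \(\alpha = \{1 \mapsto \textsf{i}(\textsf{i}(\textsf{i}(x^{1}_{\epsilon },x^{2}_{\epsilon }),x^{3}_{\epsilon }),\textsf{i}(\textsf{i}(x^{3}_{\epsilon },x^{1}_{\epsilon }),\textsf{i}(x^{4}_{\epsilon },x^{1}_{\epsilon })))\}\) is an axiom assignment for all D-terms d with \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d) = \{1\}\). Its sole range element is a variant of the argument term of Łukasiewicz in the form of the right side of (i), with p instantiated to the empty position \(\epsilon \). The application of \(\alpha \) to the primitive D-term 1 is written in postfix notation as \(1\alpha \).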
Example 20
In Meredith’s proof presentation the axiom assignment is represented by the steps with no trailing D-term, such as line 1 in Fig. 5c, or line 1 in Fig. 6. The latter actually represents the same axiom assignment as Example 19.
3.3.4 Pairings
As noted at the beginning of Sect. 3.3.2, the clause instances involved in a CD proof may, similarly to the CM, conceptually be considered as obtained in two steps. We now turn to the second step, the application of a substitution determined by the proof structure to the previously created clause copies. This substitution is characterized as the most general unifier of a set of term pairs that contains exactly one pair for each node, or position, of the D-term. The following definition specifies this pair for a given position.
Definition 21
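For D-terms d, axiom assignments \(\alpha \) and positions \(p \in \mathcal {P}\!os (d)\) define the pair \(\textsf{pairing}_{\alpha }(d, p)\) of terms as
$$\begin{aligned}\textsf{pairing}_{\alpha }(d, p)\; \overset{\textrm{def}}{=}\; {\left\{ \begin{array}{ll} \{y_{p},\, d|_{p}\alpha \,\textsf{shift}_{p}\} &amp; \text{if } p \in \mathcal {L}\hspace{-0.05em}eaf\mathcal {P}\!os (d),\\ \{y_{p.1},\, \textsf{i}(y_{p.2}, y_{p})\} &amp; \text{if } p \in \mathcal {I}\hspace{-0.05em}nner\mathcal {P}\!os (d). \end{array}\right. }\end{aligned}$$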
A unifier of the set of pairings of all positions of a D-term d equates for each leaf position p the variable \(y_p\) with the value of the axiom assignment \(\alpha \) for the primitive D-term at p, after “shifting” variables by p. This “shifting” means that the position subscript \(\epsilon \) of the variables in \(d|_p\alpha \) is replaced by p, yielding a dedicated copy of the axiom for the leaf position p. For inner positions p, which represent detachment steps, the unifier equates \(y_{p.1}\) and \(\textsf{i}(y_{p.2}, y_p)\), reflecting that the major premise of Det is proven by the left child of p. With respect to the connections shown for the case of a single axiom in Fig. 4, the pairing \(\{y_{p.1},\, \textsf{i}(y_{p.2}, y_p)\}\) for an inner position p is induced by connection 2 or 4, respectively, depending on whether \(y_{p.1}\) is an inner node or a leaf. Connections 3 and 5 would just induce the void requirement \(\{y_{p.2}, y_{p.2}\}\) and thus have no explicit correspondent in the specification of \(\textsf{pairing}\). An example of a set of pairings and its most general unifier is included in Example 25 below.
The following proposition shows an interplay of \(\textsf{pairing}\) and \(\textsf{shift}\), which is useful as a lemma in further derivations.
Proposition 22
For all D-terms d, axiom assignments \(\alpha \) for d and positions \(p \in \mathcal {P}\!os (d)\) it holds that
Proof
Easy to see. \(\square \)
3.3.5 In-Place Theorem (IPT) and Most General Theorem (MGT)
Based on the most general unifier of the set of pairings of all positions of a D-term, a specific formula can be associated with each position of the D-term, called the in-place theorem (IPT). The case where the position is the top position \(\epsilon \) is distinguished as most general theorem (MGT).
Definition 23
For D-terms d, axiom assignments \(\alpha \) and positions \(p \in \mathcal {P}\!os (d)\) define the in-place theorem (IPT) of d at p for \(\alpha \), \( Ipt _{\alpha }(d,p)\), and the most general theorem (MGT) of d for \(\alpha \), \( Mgt _{\alpha }(d)\), as
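(i) \( Ipt _{\alpha }(d,p)\; \overset{\textrm{def}}{=}\; \textsf{P}(y_{p}\textsf{mgu}(\{\textsf{pairing}_{\alpha }(d, q) \mid q \in \mathcal {P}\!os (d)\})).\)
(ii) \( Mgt _{\alpha }(d)\; \overset{\textrm{def}}{=}\; Ipt _{\alpha }(d,\epsilon ).\)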
Since \( Ipt \) and \( Mgt \) are defined on the basis of \(\textsf{mgu}\), they are undefined if the set of pairs of terms underlying the respective application of \(\textsf{mgu}\) is not unifiable. Hence, we apply Convention 13 for \(\textsf{mgu}\) also to occurrences of \( Ipt \) and \( Mgt \). If \( Ipt \) and \( Mgt \) are defined, they both denote an atom whose variables are constrained by the clean property of the underlying application of \(\textsf{mgu}\).
Let us illustrate the two formulas specified in Definition 23 in a more informal way, beginning with the conceptually simpler MGT. We assume that the axiom assignment \(\alpha \) is the one from Example 19, that is, we have just a single proper axiom, Łukasiewicz, which is labeled by 1. The argument d of \( Mgt \) is a D-term. If it is a primitive D-term, that is, if \(d=1\), then \( Mgt _{\alpha }(d)\) is just a variant of the axiom Łukasiewicz, corresponding to the value of 1 in the axiom assignment. Otherwise d refers to some instance of the detachment axiom \(\textsf{P}x \wedge \textsf{P}\textsf{i}xy \rightarrow \textsf{P}y\). If, for example, \(d = \textsf{D}(1,1)\), then both premises of d are connected with two different instances of the axiom Łukasiewicz resulting in a substitution \(\sigma \) for x and y such that \( Mgt _\alpha (d)=\textsf{P}y\sigma \). In other words, the resulting MGT is the derived conclusion of the detachment axiom, applied to two copies of the proper axiom as premises.
In the general case we have more instances of the detachment axiom and more instances of the proper axiom involved; but the resulting MGT is still the derived conclusion of the applications of the detachment axiom, one application for each inner node of d. In such a more general case, we could be interested in the conclusion of some instance of the detachment axiom within the derivation other than the final one, say the one at position p. This situation is captured by the IPT, which renders exactly such a conclusion formula. The substitution to obtain the IPT is induced not only by the pairing constraints of the subtree rooted at position p, but also by the pairing constraints of its context in the overall proof.
In accounts of CD in type theory [24, 25] the MGT is considered as principal type-scheme or principal type. A primitive D-term is identified there with the associated axiom. A compound D-term \(\textsf{D}(d_1,d_2)\) is identified with the principal type of the application of a function with principal type \(d_1\) to an argument with principal type \(d_2\).
The following proposition relates IPT and MGT with respect to subsumption.
Proposition 24
For all D-terms d, axiom assignments \(\alpha \) for d and positions \(p \in \mathcal {P}\!os (d)\) it holds that \( Ipt _{\alpha }(d,p)\) is subsumed by \( Mgt _{\alpha }(d|_p)\).
Proof
Can be shown in the following steps, explained below.
Step (3) follows easily from the definition of most general unifier. Step (4) is justified by Proposition 22, step (5) by Proposition 17. The remaining steps are obtained by expanding and contracting definitions. \(\square \)
By Proposition 24, the IPT at some position p of a D-term d is subsumed by the MGT of the subterm \(d|_p\) of d rooted at position p. An intuitive argument is that the only constraints that determine the most general unifier underlying the MGT are induced by positions of \(d|_p\), that is, below p (including p itself). In contrast, the most general unifier underlying the IPT is determined by all positions of d, including those that are not below p.
Example 25
This example shows for a given D-term the set of associated pairings (Definition 21) and its most general unifier (Definition 12), as well as the IPT and MGT for a specific position in the D-term (Definition 23). Let
That is, \(\alpha \) is an axiom assignment that maps the primitive D-term 1 to a variant of the argument term of axiom Simp whose variables are positional variables \(x_{\epsilon }^i\). Consider the D-term \(d = \textsf{D}(\textsf{D}(1,1),1)\).
Then \( \mathcal {P}\!os (d) = \{\epsilon , 1, 1.1, 1.2, 2\}\) and
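Let \(\sigma = \textsf{mgu}(\{\textsf{pairing}_{\alpha }(d, q) \mid q \in \mathcal {P}\!os (d)\})\). We can then calculate that
We are going to compare the IPT and MGT of \(d' = d|_1\), that is, the subterm of d at position 1. Then \(d' = \textsf{D}(1,1)\), \( \mathcal {P}\!os (d') = \{\epsilon , 1, 2\}\), and
Let \(\sigma ' = \textsf{mgu}(\{\textsf{pairing}_{\alpha }(d', q) \mid q \in \mathcal {P}\!os (d')\})\). We can calculate that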
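Now \( Ipt (d,1)\) and \( Mgt (d|_1)\) can be determined as follows, where, to increase readability, we state them as variants with variable names p, q, r, s.
$$\begin{aligned} Ipt (d,1)&amp;\text{ is a variant of } \textsf{P}(\textsf{i}(\textsf{i}(p,\textsf{i}(q,p)),\textsf{i}(r,\textsf{i}(s,r)))),\\ Mgt (d|_1)&amp;\text{ is a variant of } \textsf{P}(\textsf{i}(p,\textsf{i}(q,\textsf{i}(r,q)))).\end{aligned}$$
By Proposition 24, \( Ipt (d,1)\) is subsumed by \( Mgt (d|_1)\), that is, \(\textsf{P}(\textsf{i}(\textsf{i}(p,\textsf{i}(q,p)),\textsf{i}(r,\textsf{i}(s,r))))\) is an instance of \(\textsf{P}(\textsf{i}(p,\textsf{i}(q,\textsf{i}(r,q))))\),
which is easy to verify.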
Side remark: In this simple example the MGT of d is a variant of the axiom Simp. There is some apparent redundancy inherent in d, because it proves just what a strict subterm of it, the primitive D-term 1, proves. Such redundancies will be discussed in Sect. 4.
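The computation behind Example 25 can be replayed with a small unification routine. The following is a minimal sketch under stated assumptions, not the authors' implementation: terms and positional variables are encoded as tuples, the predicate wrapper \(\textsf{P}\) is omitted, the D-term is taken to be \(\textsf{D}(\textsf{D}(1,1),1)\) as determined by the listed positions, and the argument term of Simp is assumed to be \(\textsf{i}(x^1_{\epsilon }, \textsf{i}(x^2_{\epsilon }, x^1_{\epsilon }))\). All function names are hypothetical.

```python
# Variables: ("y", None, pos) for y_pos, ("x", i, pos) for x^i_pos.
# Terms: ("i", a, b). D-terms: an int (primitive) or ("D", d1, d2).
# Positions: tuples over 1/2, () is the top position.

def is_var(t):
    return t[0] in ("x", "y")

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (not is_var(t) and any(occurs(v, a, s) for a in t[1:]))

def unify(pairs):
    """Most general unifier in triangular form, or None if not unifiable."""
    s, todo = {}, list(pairs)
    while todo:
        a, b = todo.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if not is_var(a):
            a, b = b, a
        if is_var(a):
            if occurs(a, b, s):
                return None
            s[a] = b
        elif a[0] == b[0] and len(a) == len(b):
            todo.extend(zip(a[1:], b[1:]))
        else:
            return None
    return s

def resolve(t, s):
    t = walk(t, s)
    return t if is_var(t) else (t[0], *(resolve(a, s) for a in t[1:]))

def shift(p, t):
    if is_var(t):
        return (t[0], t[1], p + t[2])
    return (t[0], *(shift(p, a) for a in t[1:]))

def positions(d, p=()):
    yield p
    if isinstance(d, tuple):
        yield from positions(d[1], p + (1,))
        yield from positions(d[2], p + (2,))

def subterm(d, p):
    for k in p:
        d = d[k]
    return d

def pairing(alpha, d, p):
    y = lambda q: ("y", None, q)
    if isinstance(subterm(d, p), tuple):             # inner position
        return (y(p + (1,)), ("i", y(p + (2,)), y(p)))
    return (y(p), shift(p, alpha[subterm(d, p)]))    # leaf position

def ipt(alpha, d, p):
    s = unify(pairing(alpha, d, q) for q in positions(d))
    return None if s is None else resolve(("y", None, p), s)

def mgt(alpha, d):
    return ipt(alpha, d, ())

def canonical(t, names=None):
    """Rename variables by first occurrence; variants get equal results."""
    names = {} if names is None else names
    if is_var(t):
        return names.setdefault(t, "v%d" % len(names))
    return (t[0], *(canonical(a, names) for a in t[1:]))

def subsumes(general, instance, binding=None):
    """One-way matching: is `instance` an instance of `general`?"""
    binding = {} if binding is None else binding
    if is_var(general):
        return binding.setdefault(general, instance) == instance
    return (not is_var(instance) and general[0] == instance[0]
            and len(general) == len(instance)
            and all(subsumes(g, t, binding)
                    for g, t in zip(general[1:], instance[1:])))

SIMP = ("i", ("x", 1, ()), ("i", ("x", 2, ()), ("x", 1, ())))  # assumed shape
alpha = {1: SIMP}
d = ("D", ("D", 1, 1), 1)
print(canonical(ipt(alpha, d, (1,))))
print(canonical(mgt(alpha, subterm(d, (1,)))))
print(subsumes(mgt(alpha, subterm(d, (1,))), ipt(alpha, d, (1,))))
```

Under these assumptions the IPT at position 1 has the shape i(i(p,i(q,p)),i(r,i(s,r))), the MGT of the subterm at position 1 has the shape i(p,i(q,i(r,q))), and the MGT of the whole D-term is again a variant of the assumed Simp term.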
Semantics now enters the stage with the entailment relationship \(\models \). By universally closing the atoms on both sides of Proposition 24 we can relate MGT and IPT through entailment.
Proposition 26
For all D-terms d, axiom assignments \(\alpha \) for d and positions \(p \in \mathcal {P}\!os (d)\) it holds that
Proof
Follows from Proposition 24. \(\square \)
The following lemma expresses the core relationships between the structure of a proof (a D-term), the unifying substitution of the pairings (underlying the specification of IPTs) and semantic entailment of the formulas associated with positions in the structure (IPTs).
Lemma 27
[Junction Core Lemma] For all D-terms d, axiom assignments \(\alpha \) for d and positions \(p \in \mathcal {P}\!os (d)\) it holds that
(i) If \(p \in \mathcal {L}\hspace{-0.05em}eaf\mathcal {P}\!os (d)\), then the universal closure of \(\textsf{P}(d|_{p}\alpha )\) entails \( Ipt _{\alpha }(d,p)\).
(ii) If \(p \in \mathcal {I}\hspace{-0.05em}nner\mathcal {P}\!os (d)\), then
$$\begin{aligned}{\textit{Det}} \wedge Ipt _{\alpha }(d,p.1) \wedge Ipt _{\alpha }(d,p.2)\; \models \; Ipt _{\alpha }(d,p).\end{aligned}$$
Proof
Let \(\sigma = \textsf{mgu}(\{\textsf{pairing}_{\alpha }(d, q) \mid q \in \mathcal {P}\!os (d)\})\) and assume it is defined.
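(27.i) From Definition 23.i and Definition 21 we can conclude \( Ipt (d,p)\, =\, \textsf{P}(y_{p}\sigma )\, =\, \textsf{P}(d|_p\alpha \,\textsf{shift}_{p}\sigma )\), which implies the proposition to be proven.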
(27.ii) From Definition 23.i and Definition 21 we can conclude \( Ipt (d,p.1)\, =\, \textsf{P}(y_{p.1}\sigma )\, =\, \textsf{P}(\textsf{i}(y_{p.2}, y_p)\sigma )\), \( Ipt (d,p.2)\, =\, \textsf{P}(y_{p.2}\sigma )\), and \( Ipt (d,p)\, =\, \textsf{P}(y_{p}\sigma )\). Hence, we can rephrase the proposition statement as
By expanding the definition of \({\textit{Det}} \) and rearranging formula components, this entailment can be brought into the following form, which obviously holds as its right side is obtained from instantiating universal quantifiers on the left side.
\(\square \)
Based on Lemma 27, the following theorem expresses how Det together with the axioms referenced in leaves entails the MGT of a D-term.
Theorem 28
[MGT Entailment] For all D-terms d and axiom assignments \(\alpha \) for d it holds that
Proof
By induction on the structure of d it follows from Lemma 27 that
Contracting the definition of \( Mgt \), the right side of this entailment can be written as \( Mgt _{\alpha }(d)\). Since the left side of the entailment has no free variables, we can replace the right side with its universal closure and obtain the statement to be proven. \(\square \)
Theorem 28 states that Det together with the “axioms referenced in the proof”, that is, the values of the axiom assignment \(\alpha \) for the leaf nodes of the D-term d considered as universally closed atoms, entail the universal closure of the MGT of d for \(\alpha \).
In Meredith’s proof notation, the displayed formulas represent the universal closure of the MGT. In a line without trailing D-term, the formula is an axiom. In a line with a trailing D-term, the formula can be understood as derived in two alternate ways, both yielding the same result. First, as the universal closure of the MGT of the D-term after expanding the numeric labels into their defining trees, exhaustively until all primitive D-terms are axiom labels. Second, as the universal closure of the MGT of the trailing D-term as is, where its primitive D-terms are taken as labels of displayed formulas in the role of axioms.
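The first reading, expanding numeric labels into their defining trees, can be sketched as follows. The proof lines used here are hypothetical and serve only to illustrate the expansion; D-terms are encoded as integers (labels) or tuples `("D", d1, d2)`.

```python
def expand(dterm, defs):
    """Replace labels by their defining D-terms until only axiom labels remain."""
    if isinstance(dterm, tuple):
        return ("D", expand(dterm[1], defs), expand(dterm[2], defs))
    return expand(defs[dterm], defs) if dterm in defs else dterm

# Hypothetical proof lines: label 1 is an axiom (no trailing D-term),
# labels 2 and 3 carry trailing D-terms over earlier labels.
defs = {2: ("D", 1, 1), 3: ("D", 2, ("D", 2, 1))}
tree = expand(3, defs)
print(tree)
```

The second reading corresponds to leaving `defs[3]` unexpanded and treating the labels 1 and 2 as primitive D-terms whose displayed formulas play the role of axioms.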
4 Reducing the Proof Size by Replacing Subproofs
The term view on proof trees suggests shortening proofs by rewriting subterms, that is, replacing occurrences of subproofs by other ones, with three main aims:
1. To shorten given proofs, with respect to the tree size or the compacted size.
2. To investigate whether given proofs, by humans or machines, can be shortened by certain rewritings or are closed under these.
3. To develop notions of redundancy for use in proof search. A proof fragment constructed during search may be rejected if it can be rewritten to a shorter one.
Of course, any given proof of some theorem could be trivially shortened by enumerating all smaller structures and checking whether one of them provides a proof of the theorem. Here our interest is in techniques for shortening proofs that require less computational effort because they are based on properties of subproofs of the given proof and involve criteria that can be evaluated on the basis of a smaller search space than the set of all smaller proofs. As in Sect. 3, we consider purely structural aspects separated from aspects involving formulas.
4.1 Structural Criteria for Reducing the Compacted Size
To convert a proof to a smaller one or to detect that a proof is redundant because of the existence of a smaller proof, it is essential to compare the size of proofs before and after replacing occurrences of subproofs. While for tree size the replacement of a subproof by a smaller one evidently results in a smaller overall proof, for compacted size the effects of subproof replacements are more intricate. In this subsection, a replacement criterion for reducing the compacted size is developed, which is stated as Theorem 38 below. The theorem is based on ordering relations on D-terms that are defined in terms of a strict version of \( \mathcal {S}{\hspace{-0.12em}ubeq} (d)\) (Definition 5.i), the set of all compound subterms of a D-term d.
4.1.1 Compaction Orderings
Definition 29
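For D-terms d define \( \mathcal {S}{\hspace{-0.12em}ub} (d)\; \overset{\textrm{def}}{=}\; \mathcal {S}{\hspace{-0.12em}ubeq} (d) \setminus \{d\}\), the set of all compound strict subterms of d.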
Definition 30
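For D-terms d, e define
(i) \(d \mathrel {\ge _{\textrm{c}}}e\) if and only if \( \mathcal {S}{\hspace{-0.12em}ub} (d) \supseteq \mathcal {S}{\hspace{-0.12em}ub} (e)\).
(ii) \(d \mathrel {>_{\textrm{c}}}e\) if and only if \(d \mathrel {\ge _{\textrm{c}}}e\) holds and \(e \mathrel {\ge _{\textrm{c}}}d\) does not hold.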
We call the ordering relations \(d \mathrel {\ge _{\textrm{c}}}e\) and \(d \mathrel {>_{\textrm{c}}}e\) compaction orderings because they relate to compacted size rather than tree size. They compare D-terms d and e with respect to the superset relationship of their sets of those strict subterms that are compound terms. For example, \(\textsf{D}(\textsf{D}(\textsf{D}(1,1),1),1) \mathrel {>_{\textrm{c}}}\textsf{D}(1,\textsf{D}(1,1))\) because \(\{\textsf{D}(1,1),\, \textsf{D}(\textsf{D}(1,1),1)\} \supset \{\textsf{D}(1,1)\}\). The relation \(d \mathrel {>_{\textrm{c}}}e\) (Definition 30) can equivalently be characterized as \( \mathcal {S}{\hspace{-0.12em}ub} (d) \supset \mathcal {S}{\hspace{-0.12em}ub} (e)\). Hence, the underlying comparison is for \(\mathrel {\ge _{\textrm{c}}}\) the non-strict superset relationship and for \(\mathrel {>_{\textrm{c}}}\) the strict superset relationship. The \(\mathrel {\ge _{\textrm{c}}}\) relation is a preorder on the set of D-terms, while \(\mathrel {>_{\textrm{c}}}\) is a strict partial order. The subterm relationship implies the compaction orderings, as noted by the following proposition.
Proposition 31
For all D-terms d, e, f it holds that
(i) If e is a subterm of d, then \(d \mathrel {\ge _{\textrm{c}}}e\).
(ii) If e is a strict subterm of d and d is not of the form \(\textsf{D}(l_1,l_2)\) where both of \(l_1, l_2\) are primitive D-terms, then \(d \mathrel {>_{\textrm{c}}}e\).
(iii) If e is a subterm of d and \(e \mathrel {\ge _{\textrm{c}}}f\), then \(d \mathrel {\ge _{\textrm{c}}}f\).
(iv) If e is a subterm of d and \(e \mathrel {>_{\textrm{c}}}f\), then \(d \mathrel {>_{\textrm{c}}}f\).
Proof
Easy to verify. \(\square \)
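According to Propositions 31.i and 31.ii the subterm relationship implies the compaction orderings, with one exception, stated in the precondition of Proposition 31.ii. An example for this exception is \(d = \textsf{D}(1,1)\) and \(e = 1\), where e is a strict subterm of d but \(d \mathrel {>_{\textrm{c}}}e\) does not hold, because neither term has a compound strict subterm. However, \(d \mathrel {\ge _{\textrm{c}}}e\) or \(d \mathrel {>_{\textrm{c}}}e\) also holds in cases where e is not a subterm of d, as demonstrated by the following example.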
Example 32
The following table shows counterexamples for the converse statements of Propositions 31.i and 31.ii, that is, D-terms d and e where \(d \mathrel {\ge _{\textrm{c}}}e\) or \(d \mathrel {>_{\textrm{c}}}e\) holds but e is not a subterm of d. The respective values of \( \mathcal {S}{\hspace{-0.12em}ub} (d)\) and \( \mathcal {S}{\hspace{-0.12em}ub} (e)\) underlying the definition of \(\mathrel {\ge _{\textrm{c}}}\) are shown in a second table.
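The compaction orderings are easy to experiment with. The sketch below assumes an illustrative encoding of D-terms as integers (primitive) or tuples `("D", d1, d2)` and reads the orderings as superset tests on the sets of compound strict subterms; the function names are not taken from the paper.

```python
def subeq(d):
    """Set of compound subterms of d, including d itself if it is compound."""
    if not isinstance(d, tuple):
        return set()
    return {d} | subeq(d[1]) | subeq(d[2])

def sub(d):
    """Set of compound strict subterms of d."""
    return subeq(d) - {d}

def geq_c(d, e):
    return sub(d) >= sub(e)   # non-strict superset

def gt_c(d, e):
    return sub(d) > sub(e)    # strict superset

D = lambda a, b: ("D", a, b)

d = D(D(D(1, 1), 1), 1)
e = D(1, D(1, 1))
print(gt_c(d, e), e in subeq(d))             # ordering holds, e is no subterm of d
print(geq_c(D(1, 1), 1), gt_c(D(1, 1), 1))   # the exception of Proposition 31.ii
```

The first pair is the example given for \(\mathrel {>_{\textrm{c}}}\) in the text: the ordering holds although e does not occur in d, so the orderings relate strictly more pairs than the subterm relationship does.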
The following proposition relates the compaction orderings to the compacted size of the compared D-terms.
Proposition 33
For all D-terms d, e it holds that
(i) If d is compound and \(d \mathrel {\ge _{\textrm{c}}}e\), then \(\textsf {c-size} (d) \ge \textsf {c-size} (e)\).
(ii) If \(d \mathrel {>_{\textrm{c}}}e\), then \(\textsf {c-size} (d) > \textsf {c-size} (e)\).
Proof
Easy to verify. \(\square \)
The converse statements of Propositions 33.i and 33.ii do not hold, as demonstrated by the following example.
Example 34
The following table shows two counterexamples for the converse statements of Propositions 33.i and 33.ii, that is, D-terms d and e such that \(\textsf {c-size} (d) > \textsf {c-size} (e)\) and \(d \mathrel {\ge _{\textrm{c}}}e\) does not hold. The respective values of \( \mathcal {S}{\hspace{-0.12em}ub} (d)\) and \( \mathcal {S}{\hspace{-0.12em}ub} (e)\) underlying the definition of \(\mathrel {\ge _{\textrm{c}}}\) are shown in a second table.
4.1.2 The SC Size Measure of D-Terms
Before we can state the main result of this subsection, Theorem 38, we need to define a further size measure of D-terms, which we call SC size, suggesting Sum of Compacted subterm sizes. This auxiliary measure is useful in termination arguments of repeated subterm replacement: The theorem shows a criterion under which replacing subterm occurrences of a D-term reduces the compacted size, but just non-strictly, whereas the SC size is reduced strictly. The SC size is defined as follows.
Definition 35
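For D-terms d define the SC size of d as
$$\begin{aligned}\textsf {sc-size} (d)\; \overset{\textrm{def}}{=}\; \sum _{e \in \mathcal {S}{\hspace{-0.12em}ubeq} (d)} \textsf {c-size} (e).\end{aligned}$$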
The following two examples illustrate the SC size measure.
Example 36
Let d be the D-term
Then the set of subterms of d is
and \(\textsf {sc-size} (d) = 0+1+2+2+4 = 9\).
Example 37
If d, e are D-terms such that \(\textsf {c-size} (d) > \textsf {c-size} (e)\), then it does not necessarily hold that \(\textsf {sc-size} (d) \ge \textsf {sc-size} (e)\). The following D-terms provide an example.
It holds that \(\textsf {c-size} (d) = 8 > 7 = \textsf {c-size} (e)\) but \(\textsf {sc-size} (d) = 27 \not \ge 28 = \textsf {sc-size} (e)\). The calculations of these values are based on the sets of subterms of d and of e, shown in the following, where the compacted size of the respective member is annotated at the right.
Hence \(\textsf {c-size} (d) = 8\), \(\textsf {sc-size} (d) = 0+1+2+2+3+3+4+4+8 = 27\), \(\textsf {c-size} (e) = 7\) and \(\textsf {sc-size} (e) = 0+1+2+3+4+5+6+7 = 28\).
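Both size measures can be computed directly from the set of compound subterms. The following sketch assumes the same illustrative tuple encoding of D-terms as above. The concrete D-terms are assumptions: they were chosen only because they reproduce the size values reported in Examples 36 and 37, and they need not be the D-terms used there.

```python
def subeq(d):
    """Set of compound subterms of d, including d itself if it is compound."""
    if not isinstance(d, tuple):
        return set()
    return {d} | subeq(d[1]) | subeq(d[2])

def c_size(d):
    """Compacted size: number of distinct compound subterms."""
    return len(subeq(d))

def sc_size(d):
    """SC size: sum of the compacted sizes of all compound subterms."""
    return sum(c_size(e) for e in subeq(d))

D = lambda a, b: ("D", a, b)

def left_chain(n):
    d = 1
    for _ in range(n):
        d = D(d, 1)
    return d

def right_chain(n):
    d = 1
    for _ in range(n):
        d = D(1, d)
    return d

d36 = D(D(1, D(1, 1)), D(D(1, 1), 1))     # candidate matching Example 36
d37 = D(right_chain(4), left_chain(4))    # candidate matching d in Example 37
e37 = right_chain(7)                      # candidate matching e in Example 37
for t in (d36, d37, e37):
    print(c_size(t), sc_size(t))
```

Primitive D-terms have compacted size 0, so it makes no difference whether they are included in the sum.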
4.1.3 Reducing the Compacted Size by Replacing Subproofs
We are now ready to state the following theorem, the main result of this subsection.
Theorem 38
[Reducing the Compacted Size by Replacing Subproofs] Let \(d,d',e,e'\) be D-terms such that e occurs in d, and \(d' = d{[}e \mapsto e'{]}\). It holds that
(i) If e is compound and \(e \mathrel {\ge _{\textrm{c}}}e'\), then \(\textsf {c-size} (d) \ge \textsf {c-size} (d')\).
(ii) If \(e \mathrel {>_{\textrm{c}}}e'\), then \(\textsf {sc-size} (d) > \textsf {sc-size} (d')\).
Proof
We begin with shared aspects of the proofs of both subtheorems. We can assume that the D-term e is compound, which is explicitly stated as precondition for Theorem 38.i and implied by the precondition \(e \mathrel {>_{\textrm{c}}}e'\) of Theorem 38.ii. There must exist a set \(\{d_1,\ldots ,d_n\}\) of compound D-terms for some \(n \ge 0\) such that the set of compound subterms of d can be characterized as the disjoint union of three particular subsets in the following way.
Let T be the set of those strict subterms of e that are compound and have in d an occurrence in a position other than as subterm of e. Clearly \( \mathcal {S}{\hspace{-0.12em}ub} (e) \supseteq T\). Thus, by (1) we can characterize S also as
Let \( \mathcal {C}\hspace{-0.12em}omp\mathcal {D} \) denote the set of all compound D-terms. The set of compound subterms of \(d'\) can then be characterized as follows.
From \(e \mathrel {\ge _{\textrm{c}}}e'\), which is a precondition of Theorem 38.i as well as Theorem 38.ii, it follows that \( \mathcal {S}{\hspace{-0.12em}ub} (e) \supseteq \mathcal {S}{\hspace{-0.12em}ub} (e')\). Since \( \mathcal {S}{\hspace{-0.12em}ub} (e) \supseteq T\) we can conclude from (3) that
We now turn to the two individual subtheorems.
(38.i) Since \(\textsf {c-size} (d) = |S|\) and \(\textsf {c-size} (d') = |S'|\) we have to show that \(|S| \ge |S'|\). From (4) it follows that \(1 + | \mathcal {S}{\hspace{-0.12em}ub} (e)| + |\{d_1{[}e \mapsto e'{]}, \ldots , d_n{[}e \mapsto e'{]}\}| \;\ge \; |S'|\). Since clearly \(n \ge |\{d_1{[}e \mapsto e'{]}, \ldots , d_n{[}e \mapsto e'{]}\}|\) it follows that \(1 + | \mathcal {S}{\hspace{-0.12em}ub} (e)| + n \;\ge \; |S'|\). Since (1) implies \(|S| = 1 + | \mathcal {S}{\hspace{-0.12em}ub} (e)| + n\), that is, |S| equals the left side of the previous inequality, it follows that \(|S| \ge |S'|\), which concludes the proof of the subtheorem.
(38.ii) From (4) it follows that
Given the precondition \(e \mathrel {>_{\textrm{c}}}e'\) we can conclude by Theorem 38.i that for each \(i \in \{1,\ldots ,n\}\) it holds that \(\textsf {c-size} (d_i) \ge \textsf {c-size} (d_i{[}e \mapsto e'{]})\). Hence
From the precondition \(e \mathrel {>_{\textrm{c}}}e'\) and Proposition 33.ii it follows that \(\textsf {c-size} (e) > \textsf {c-size} (e')\). From (4) and (6) we can then conclude
By (1), \(\textsf {sc-size} (d)\) can be characterized as follows.
Since the right side of (8) is identical to the left side of (7) it follows that \(\textsf {sc-size} (d) > \textsf {sc-size} (d')\), the conclusion of the subtheorem to be shown. \(\square \)
Theorem 38.i states that if \(d'\) is the D-term obtained from d by simultaneously replacing all occurrences of a compound D-term e with a “c-smaller” D-term \(e'\), i.e., \(e \mathrel {\ge _{\textrm{c}}}e'\), then the compacted size of \(d'\) is less than or equal to that of d. Both the precondition and the conclusion of this subtheorem involve non-strict comparisons, so one may ask whether the strict precondition \(e \mathrel {>_{\textrm{c}}}e'\) would imply the strict conclusion \(\textsf {c-size} (d) > \textsf {c-size} (d')\). This is, however, not the case, as demonstrated by Example 39 below. Nevertheless, as stated with the supplementary Theorem 38.ii, the SC size is a measure that strictly decreases under the strict precondition \(e \mathrel {>_{\textrm{c}}}e'\). By this subtheorem, the number of replacements according to Theorem 38 that can be performed in succession with strict preconditions \(e \mathrel {>_{\textrm{c}}}e'\) is finite. The SC size by itself, however, does not seem suitable as a size measure that refines the compacted size because, as already demonstrated by Example 37, there are D-terms \(d,d'\) with \(\textsf {c-size} (d) > \textsf {c-size} (d')\) but \(\textsf {sc-size} (d) < \textsf {sc-size} (d')\). Both of the following two examples exhibit particularities of subproof replacements according to Theorem 38.
Example 39
This example shows that strengthening the precondition \(e \mathrel {\ge _{\textrm{c}}}e'\) of Theorem 38.i to \(e \mathrel {>_{\textrm{c}}}e'\) does not in general permit the stronger conclusion \(\textsf {c-size} (d) > \textsf {c-size} (d')\). Let
Then e occurs in d and \(d' = d{[}e \mapsto e'{]}\), matching the preconditions of Theorem 38. Moreover, it holds that \(e \mathrel {>_{\textrm{c}}}e'\). By Theorem 38.i it follows that \(\textsf {c-size} (d) \ge \textsf {c-size} (d')\). Indeed, \(\textsf {c-size} (d) = \textsf {c-size} (d') = 4\). By Theorem 38.ii it follows that \(\textsf {sc-size} (d) > \textsf {sc-size} (d')\). Indeed, \(\textsf {sc-size} (d) = 10\) and \(\textsf {sc-size} (d') = 9\). These properties and values can be determined on the basis of the following intermediate results. That \(e \mathrel {>_{\textrm{c}}}e'\) follows since
The sets of subterms of d and of \(d'\) underlying the calculation of \(\textsf {c-size} (d)\), \(\textsf {sc-size} (d)\), \(\textsf {c-size} (d')\) and \(\textsf {sc-size} (d')\) are as follows, where the compacted size of the respective member is annotated at the right.
Example 40
This example illustrates that the simultaneous replacement of all occurrences of e in d by \(e'\) is essential for Theorem 38 and that \(d'\), the D-term after the replacement, can contain occurrences of e again. Let
Then e occurs in d and \(d' = d{[}e \mapsto e'{]}\), matching the preconditions of Theorem 38. Moreover, it holds that \(e \mathrel {>_{\textrm{c}}}e'\). By Theorem 38.i it follows that \(\textsf {c-size} (d) \ge \textsf {c-size} (d')\). Indeed, \(\textsf {c-size} (d) = 5\) and \(\textsf {c-size} (d') = 4\). Notice that e occurs in \(d'\), actually twice. The D-term \(d''\) is obtained from d by replacing just a single occurrence of e with \(e'\). Its compacted size is \(\textsf {c-size} (d'') = 6\), thus not less than or equal to that of d, \(\textsf {c-size} (d) = 5\). The sets of compound subterms of d, \(d'\) and \(d''\), which underlie the determination of their compacted size, are as follows.
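Replacements of the kind covered by Theorem 38 can be checked mechanically. The sketch below uses the same illustrative tuple encoding of D-terms; the function `replace` is a hypothetical helper for the simultaneous replacement \(d{[}e \mapsto e'{]}\). The concrete D-terms are an assumption: they were chosen because they reproduce the values reported in Example 39 (compacted size 4 before and after, SC size 10 before and 9 after), not because they are known to be the terms of that example.

```python
def subeq(d):
    if not isinstance(d, tuple):
        return set()
    return {d} | subeq(d[1]) | subeq(d[2])

def sub(d):
    return subeq(d) - {d}

def c_size(d):
    return len(subeq(d))

def sc_size(d):
    return sum(c_size(e) for e in subeq(d))

def replace(d, e, e2):
    """d[e -> e2]: simultaneously replace all occurrences of e in d by e2."""
    if d == e:
        return e2
    if isinstance(d, tuple):
        return ("D", replace(d[1], e, e2), replace(d[2], e, e2))
    return d

D = lambda a, b: ("D", a, b)

e = D(D(D(1, 1), 1), 1)
e2 = D(1, D(1, 1))          # plays the role of e'
d = D(e, D(D(1, 1), 1))
d2 = replace(d, e, e2)      # plays the role of d'

print(sub(e) > sub(e2))               # strict precondition e >c e'
print(c_size(d), c_size(d2))          # compacted size does not shrink strictly
print(sc_size(d), sc_size(d2))        # SC size shrinks strictly
```

Occurrences of e in d cannot overlap, so the top-down traversal in `replace` reaches every occurrence exactly once, and results of a replacement are not rewritten again.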
The following proposition characterizes the number of D-terms that are smaller than a given D-term with respect to the compaction ordering \(\mathrel {\ge _{\textrm{c}}}\).
Proposition 41
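For all compound D-terms d it holds that
$$\begin{aligned}|\{e \mid d \mathrel {\ge _{\textrm{c}}}e \text{ and } \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (e) \subseteq \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)\}|\; =\; (\textsf {c-size} (d) - 1 + | \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)|)^2 + | \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)|.\end{aligned}$$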
Proof
Let S be the set whose cardinality is denoted by the left side of the proposition. Then
Since \( \mathcal {S}{\hspace{-0.12em}ub} (d) = \mathcal {S}{\hspace{-0.12em}ubeq} (d) \setminus \{d\}\) and \(\textsf {c-size} (d)\) is defined as \(| \mathcal {S}{\hspace{-0.12em}ubeq} (d)|\) it follows that
From the representation of S in the form (4) and (5) it follows that \(|S| = (\textsf {c-size} (d) - 1 + | \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)|)^2 + | \mathcal{D}\mathcal{P}\hspace{-0.12em}rim (d)|\), that is, the proposition statement. \(\square \)
By Proposition 41, for a given compound D-term d, the number of D-terms e that are smaller than or equal to d with respect to \(\mathrel {\ge _{\textrm{c}}}\) is at most quadratic in the compacted size of d plus the number of its distinct primitive D-terms, and thus also at most quadratic in the tree size of d. Hence techniques that inspect all these smaller D-terms for a given D-term can be used efficiently in practice. One application is to find D-terms that can be replaced according to Theorem 38, that is, in view of the preconditions of the theorem, to find D-terms \(e'\) for a given D-term e. Another is to classify a D-term as redundant because there exists a smaller D-term that proves the same.
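The count given in the proof of Proposition 41 can be cross-checked by brute force on small D-terms. The sketch below rests on one assumption that should be stated loudly: the candidates e are restricted to D-terms built from the primitive D-terms of d, since otherwise every further primitive D-term would also qualify. The encoding and all function names are illustrative.

```python
from itertools import product

def subeq(d):
    return set() if not isinstance(d, tuple) else {d} | subeq(d[1]) | subeq(d[2])

def sub(d):
    return subeq(d) - {d}

def prims(d):
    return {d} if not isinstance(d, tuple) else prims(d[1]) | prims(d[2])

def height(d):
    return 0 if not isinstance(d, tuple) else 1 + max(height(d[1]), height(d[2]))

def terms_up_to(h, ps):
    """All D-terms over the primitive D-terms ps with height at most h."""
    ts = set(ps)
    for _ in range(h):
        ts = set(ps) | {("D", a, b) for a, b in product(ts, repeat=2)}
    return ts

def count_smaller(d):
    # Any e with Sub(d) >= Sub(e) has height at most height(d),
    # so enumerating up to that height is exhaustive.
    return sum(1 for e in terms_up_to(height(d), prims(d)) if sub(d) >= sub(e))

def formula(d):
    return (len(subeq(d)) - 1 + len(prims(d))) ** 2 + len(prims(d))

d1 = ("D", ("D", 1, 1), 2)
d2 = ("D", ("D", 1, ("D", 1, 1)), ("D", 1, 1))
for d in (d1, d2):
    print(count_smaller(d), formula(d))
```

The brute-force enumeration grows doubly exponentially with the height and is meant only as a check; in practice the candidates can be generated directly as the primitive D-terms of d together with all pairs formed from these and the compound strict subterms of d.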
4.2 Formula-Related Criteria for Subproof Replacement
According to Theorem 28, a CD proof, that is, a D-term d together with an axiom assignment \(\alpha \), proves the MGT of d for \(\alpha \) along with all instances of the MGT. If d is shortened by replacing subterms, the general objective is that at least these theorems are still proven. That is, the MGT of the modified D-term subsumes that of the original one. In this subsection we identify conditions that ensure that subterm replacement steps yield proofs with an MGT that subsumes the MGT before the replacement. These conditions will be stated as Theorems 45 and 46, which are both consequences of a central underlying property that will be stated as Lemma 44.
4.2.1 Decomposition of the MGU Associated with a D-Term
The proof of Lemma 44 involves several applications of a decomposition of the most general unifier “associated” with a D-term, that is, the most general unifier of the set of pairings of all its positions, with respect to a given axiom assignment \(\alpha \). This decomposition is specified now with Lemma 43, preceded by an auxiliary proposition, which shows a specific way to pass between sets of pairs of terms and most general unifiers.
Proposition 42
[14, Lemma 4.6] If M, N are sets of pairs of terms and \(\sigma \) is a most general unifier of M, then
(i) \(M \cup N\) is unifiable if and only if \(N\sigma \) is unifiable.
(ii) If \(\tau \) is a most general unifier of \(N\sigma \), then \(\sigma \tau \) is a most general unifier of \(M \cup N\).
Lemma 43
[Decomposition of the MGU Associated with a D-Term] Let d be a D-term and let \(p_1, \ldots , p_n, q\), where \(n \ge 0\), be positions in \( \mathcal {P}\!os (d)\) such that for all \(i \in \{1,\ldots ,n\}\) it holds that \(p_i \not < q\). Then
where
Proof
Let
Then \(\sigma = \textsf{mgu}(S)\), \(\tau = \textsf{mgu}(T)\), and \(\gamma = \textsf{mgu}(G)\). From the definition of \(\textsf{pairing}\) (Definition 21) and the precondition \(p_i \not < q\) for all \(i \in \{1,\ldots ,n\}\) it follows that
The lemma can now be shown in the following steps, explained below.
Step (5) is obtained by expanding the definition of \(\sigma \), and step (6) follows since \(S = T \cup G\). Step (7) is obtained by Proposition 42.ii. By (2) and (1) it follows that \( \mathcal {V}\hspace{-0.11em}ar (G) \cap \mathcal {V}\hspace{-0.11em}ar (T) \subseteq \{y_{p_1},\ldots ,y_{p_n}\}\) and by (3) and (1) that \(\{y_{q}\} \cap \mathcal {V}\hspace{-0.11em}ar (T) \subseteq \{y_{p_1},\ldots ,y_{p_n}\}\). Since \( \mathcal {D}\hspace{-0.08em}om (\tau ) \subseteq \mathcal {V}\hspace{-0.11em}ar (T)\) we can replace \(\tau \) in (7) with its restriction to \(\{y_{p_1},\ldots ,y_{p_n}\}\) and obtain (8). Step (9) follows from Proposition 42.ii since . Finally, step (10) is obtained by Proposition 42.ii and the definition of \(\gamma \). \(\square \)
4.2.2 The Subproof Replacement Monotonicity Core Lemma
Lemma 44, stated and proven in this subsubsection, shows how the subsumption relationship of associated formulas transfers from subterm occurrences in a D-term to the D-term itself. The setting of the lemma is illustrated in Fig. 11.
Lemma 44
[Subproof Replacement Monotonicity Core Lemma] Let d, e be D-terms, let \(\alpha \) be an axiom assignment for d and for e, and let \(p_1, \ldots , p_n, q\), where \(n \ge 0\), be positions in \( \mathcal {P}\!os (d)\) such that for all \(i,j \in \{1,\ldots ,n\}\) with \(i \ne j\) it holds that \(p_i \not \le p_j\) and for all \(i \in \{1,\ldots ,n\}\) it holds that \(p_i \not < q\). If for all \(i \in \{1, \ldots , n\}\) it holds that
then
Proof
Define the shorthand \(d' = d[e]_{p_1}[e]_{p_2}\ldots [e]_{p_n}\). That is, \(d'\) is d with the subterm occurrences at \(p_1, \ldots , p_n\) replaced by e. Define the following sets of pairs of terms and substitutions.
Because the detailed proof is lengthy, we present it modularized into four parts, (I) Conversion of the Preconditions, (II) Determining the Instantiating Substitution \(\rho \), (III) Contexts where \(\rho \) is Void, and (IV) Deriving the Conclusion. Figure 11 may help to get an intuitive overview of the parameters of the lemma statement.
Part I. Conversion of the Preconditions
The following step is a precondition of the lemma to be proven.
The following statements, whose proofs are described below, show that \(\sigma \) when applied to \(y_q\) and \(y_{p_i}\) can be decomposed into \(\gamma \) followed by \(\mu \).
Step (2) follows from Lemma 43 with its parameters \(p_1, \ldots , p_n\) instantiated by the positions of the same name in the lemma to be proven but its parameter q instantiated to \(p_i\) for an arbitrary \(i \in \{1,\ldots ,n\}\). The precondition \(p_i \not < q\) for all \(i \in \{1,\ldots ,n\}\) of Lemma 43 then instantiates to \(p_j \not < p_i\) for all \(j \in \{1,\ldots ,n\}\), which follows from (1). Step (3) follows from Lemma 43 with all of its parameters \(p_1, \ldots , p_n, q\) instantiated by the positions of the same names in the lemma to be proven.
Let us consider now the precondition for an arbitrary \(i \in \{1,\ldots ,n\}\). Its left side can be converted by expanding and contracting definitions and step (2) as follows.
The conversion of the right side of the considered precondition is based on some auxiliary definitions and statements. For all \(i \in \{1,\ldots ,n\}\) define the following sets of pairs of terms and substitutions.
Then, as explained below, for all \(i,j \in \{1,\ldots ,n\}\) the following holds.
Step (9) follows immediately from the definitions of \(T'_i\), \(\overline{T}'_i\) and T. Step (10) follows from the definition of \(T'_i\) and the definition of \(\textsf{pairing}\) (Definition 21). Step (11) follows from (10) and (1). Step (12) follows from the definition of \(\overline{T}'_i\) and steps (10) and (1). Step (13) follows from the definition of \(\overline{T}'_i\) and step (11). Step (14) follows from the definition of \(\tau '\) and steps (9), (12) and (13). Step (15) follows from (14) and (10). Step (16) follows from (14), (11) and (1).
The right side of the precondition can now be converted in the following steps described below.
Step (18) is obtained from (17) by expanding the definition of \( Mgt \). Step (19) follows from Proposition 17, step (20) since by the definition of \(d'\) it holds that \(d'|_{p_i} = e\), and step (21) from Proposition 22. Step (22) is obtained by contracting the definition of \(T'_i\). Step (23) follows from (14). Note that (17) is independent of i and the conversion of (17) to (23) is possible for any \(i \in \{1,\ldots ,n\}\).
Because (4) and (8) as well as (17) and (23) are equal, we can now reformulate the precondition that for all \(i \in \{1,\ldots ,n\}\) it holds that as
Part II. Determining the Instantiating Substitution \(\rho \)
We show, as explained below, that for all \(i \in \{1,\ldots ,n\}\) there exists a substitution \(\rho _i\) with the following properties.
Steps (25) and (26) follow from (24). Step (27) follows from (26) and (15), step (28) from (26) and (16). Step (29) follows from (26) since the idempotence of \(\tau '\) is equivalent to \( \mathcal {D}\hspace{-0.08em}om (\tau ') \cap \mathcal{V}\mathcal{R}\hspace{-0.02em}ng (\tau ') = \emptyset \), which implies \( \mathcal {V}\hspace{-0.11em}ar (y_{p_i}\tau ') \cap \mathcal {D}\hspace{-0.08em}om (\tau ') = \emptyset \).
Step (28) justifies defining a substitution \(\rho \) that combines the substitutions \(\rho _i\) by forming their union:
The substitution \(\rho \) has the following properties, whose derivation is described below.
Step (30) follows from the definition of \(\rho \), given that for all \(i,j \in \{1,\ldots ,n\}\) with \(i \ne j\) it holds that \( \mathcal {V}\hspace{-0.11em}ar (y_{p_i}\tau ') \cap \mathcal {D}\hspace{-0.08em}om (\rho _j) = \emptyset \), which follows from (26) and (16). Step (31) follows from (30) and (25). Step (32) follows from the definition of \(\rho \) and steps (26), (14), and (10). Step (33) follows from the definition of \(\rho \) and step (29).
Part III. Contexts where \(\rho \) is Void
The variables occurring in members of the range of \(\gamma \) as well as \(y_q\) are contained in the same set of positional variables.
Step (34) follows from the definitions of \(\gamma \) and G and the definition of \(\textsf{pairing}\) (Definition 21). Step (35) follows from the precondition that for all \(i \in \{1,\ldots ,n\}\) it holds that \(p_i \not < q\). Now, let y be a positional variable and let v be a variable such that
From (34) and (35) it follows that
As proven below, then
Step (37) is proven by considering three cases (the first two overlap, the third applies if none of the first two applies):
1. Case \(v \notin \{y_{p_1}, \ldots , y_{p_n}\}\). Then, by (36) and (32), \(v \notin \mathcal {D}\hspace{-0.08em}om (\rho )\), hence \(v\mu = v\rho \mu \).
2. Case \(v \in \mathcal {D}\hspace{-0.08em}om (\tau ^{\prime })\). Then, by (33), \(v \notin \mathcal {D}\hspace{-0.08em}om (\rho )\), hence \(v\mu = v\rho \mu \).
3. Case \(v \in \{y_{p_1}, \ldots , y_{p_n}\} \setminus \mathcal {D}\hspace{-0.08em}om (\tau ^{\prime })\). Then, by (31), \(v\gamma \mu = v\rho \). Since \(v \in \mathcal {V}\hspace{-0.11em}ar (y\gamma )\) and \(\gamma \) is idempotent it follows that \(v = v\gamma \). Hence \(v\mu = v\rho \), and, since \(\mu \) is idempotent, \(v\mu = v\rho \mu \).
Given the definition of v and y we can instantiate (37) to the following statements about the \(y_{p_i} \in \mathcal {D}\hspace{-0.08em}om (\tau ')\) for \(i \in \{1,\ldots , n\}\) and \(y_q\).
Part IV. Deriving the Conclusion
The conclusion of the lemma to be proven, that is,
can be reformulated as
For the left side, the reformulation follows since \( Ipt _{\alpha }(d,q) = \textsf{P}(y_{q}\gamma \mu )\), which can be derived analogously to steps (4)–(8), but by applying (3) instead of (2). For the right side it follows since , which can be derived by expanding definitions and, for the last step, applying Lemma 43.
To prove (40), we need a further auxiliary statement, which is derived along with an intermediate step about the domain of \(\gamma \) as explained below.
Step (41) follows from the definitions of \(\gamma \) and G and the definition of \(\textsf{pairing}\) (Definition 21). Step (42) can be shown as follows. Assume \(y_{p_i} \in \mathcal {D}\hspace{-0.08em}om (\tau ^{\prime })\). By (15) it follows that \( \mathcal {V}\hspace{-0.11em}ar (y_{p_i}\tau ^{\prime }) \subseteq \mathcal {P}\!os \mathcal {V}ar(\{r \mid p_i < r\})\). With (41) it follows that \( \mathcal {V}\hspace{-0.11em}ar (y_{p_i}\tau ^{\prime }) \cap \mathcal {D}\hspace{-0.08em}om (\gamma ) = \emptyset \), which implies (42). We can now proceed to prove the goal (40) as follows, explained below.
Step (43) follows from (38) and (31). Step (44) follows from (43) and (42). Step (45) follows from (44). Step (46) follows from (45) since \(\mu \) is idempotent. Step (47) holds since if \(y_{p_i} \notin \mathcal {D}\hspace{-0.08em}om (\tau ^{\prime })\), then \(y_{p_i}\tau ^{\prime }= y_{p_i}\). Step (48) follows from (46) and (47). Step (49) follows from (48). Step (50) follows from (49) and the definition of \(\nu \). Step (51) follows from (50). Finally, step (52), which is the goal to be proven listed above as (40), follows from (51) and (39). \(\square \)
4.2.3 Subproof Replacement Based on IPT and MGT
Lemma 44 is now applied to justify the following two theorems, which may be practically applied to modify proofs represented by a D-term together with an axiom assignment.
Theorem 45
[IPT-Based Subproof Replacement] Let d, e be D-terms, let \(\alpha \) be an axiom assignment for d and for e, and let \(p_1, \ldots , p_n\), where \(n \ge 0\), be positions in \( \mathcal {P}\!os (d)\) such that for all \(i,j \in \{1,\ldots ,n\}\) with \(i \ne j\) it holds that \(p_i \not \le p_j\). If for all \(i \in \{1, \ldots , n\}\) it holds that
then
Proof
The theorem expresses the special case of Lemma 44 with \(q = \epsilon \). The precondition of that lemma that for all \(i \in \{1,\ldots ,n\}\) it holds that \(p_i \not < q\) then holds trivially. The remaining preconditions are the same as those of Lemma 44. The conclusion is obtained from the conclusion of Lemma 44 by contracting the definition of \( Mgt \). \(\square \)
Theorem 45 states that simultaneously replacing a number of occurrences of possibly different subterms in a D-term by the same subterm with the property that its MGT subsumes each of the IPTs of the original occurrences results in an overall D-term whose MGT subsumes that of the original overall D-term. The following theorem is like Theorem 45, but restricted to the case of a single replaced subterm occurrence and with a stronger precondition, which refers to the MGT of that subterm instead of the IPT.
Theorem 46
[MGT-Based Subproof Replacement] Let d, e be D-terms and let \(\alpha \) be an axiom assignment for d and for e. For all positions \(p \in \mathcal {P}\!os (d)\) it then holds that if
then
Proof
Follows from Theorem 45 and Proposition 24. \(\square \)
Simultaneous replacements of subterm occurrences are essential for reducing the compacted size of proofs according to Theorem 38. For replacements according to Theorem 46 these can be achieved by successive replacements of individual occurrences. In Theorem 45 simultaneous replacements are explicitly considered because the replacement of one occurrence according to this theorem can invalidate the preconditions of another occurrence. Specifically, replacing an occurrence at some position \(p_1\) may result in a value of \( Ipt _{\alpha }(d,p_2)\) for another position \(p_2\) that subsumes its original value such that the precondition then fails. Hence, Theorem 46 is formulated just for a single subterm occurrence, while in Theorem 45 simultaneous replacement of multiple occurrences is explicitly taken into account.
The precondition of Theorem 46 is stronger than that of Theorem 45, permitting rewriting according to the theorem in fewer situations. Nevertheless, Theorem 46 can be useful in practice, in particular because its precondition can be evaluated on the basis of \(\alpha \), e and just the subterm \(d|_p\) of d, whereas determining \( Ipt _{\alpha }(d,p)\) for the precondition of Theorem 45 also requires consideration of the context of \(d|_p\) in d.
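The structural side of the replacement operation used in Theorems 45 and 46 can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: D-terms are encoded as the integer 1 or as tuples `("D", major, minor)`, positions as tuples over {1, 2}, and the function names `subterm`, `replace` and `replace_all` are hypothetical. The sketch covers the term manipulation only; it does not check the MGT or IPT preconditions of the theorems.

```python
# Assumed encoding (not from the paper): the primitive D-term is 1,
# a compound D-term is the tuple ("D", major, minor).
# A position is a tuple over {1, 2}; the empty tuple () is the root.

def subterm(d, p):
    """Return d|_p, the subterm of d at position p."""
    for i in p:
        d = d[i]          # index 1 is the major, index 2 the minor premise
    return d

def replace(d, p, e):
    """Return d[e]_p, i.e. d with the occurrence at position p replaced by e."""
    if not p:
        return e
    i, rest = p[0], p[1:]
    if i == 1:
        return ("D", replace(d[1], rest, e), d[2])
    return ("D", d[1], replace(d[2], rest, e))

def incomparable(positions):
    """True if no position is a prefix of another (distinct) one."""
    return all(p[:len(q)] != q
               for p in positions for q in positions if p is not q)

def replace_all(d, positions, e):
    """Simultaneous replacement at pairwise incomparable positions."""
    assert incomparable(positions)
    for p in positions:
        d = replace(d, p, e)   # order is irrelevant for incomparable positions
    return d

d11 = ("D", 1, 1)
d = ("D", d11, ("D", 1, d11))
print(replace_all(d, [(1,), (2, 2)], 1))
```

Because the positions are pairwise incomparable, replacing them one after the other yields the same D-term as replacing them simultaneously; what may differ between the two views is only whether the precondition on the IPTs still holds after each individual step, as discussed above.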
4.3 Specific Reductions and Regularities
Regularity is a well-known important device in tableau-based theorem proving (see, e.g., [23]): A clausal tableau is regular if none of its branches contains more than one occurrence of the same literal. Regularity is usually considered with respect to completeness, that is, it is shown that if there exists a proof (closed clausal tableau), then there exists one that is regular. Intuitively, this is justified because in a proof that is not regular, the subproof attached at the upper occurrence of the repeated literal can be replaced by the smaller subproof rooted at the lower one. Hence, a non-regular proof can be reduced to a smaller proof by replacing a subproof. From this point of view, regularity is just the failure of a particular form of reducibility. In theorem proving, proofs that are not regular can be excluded from the search space. If the objective is to shorten given proofs, the reductions associated with regularities can be applied.
Building on the tools developed in the previous sections, several related forms of reduction can be specified naturally. They are suitable for proof shortening and, viewed as regularities, for proof search. Some of these are stronger than others; the weaker ones are often easier and more efficient to implement.
We group the considered reductions into two families, depending on whether they are based on the replacement of a single subproof occurrence with a subproof of itself, discussed in Sect. 4.3.1, or based on the replacement of all occurrences of a subproof by proofs that are smaller with respect to the compaction ordering, discussed in Sect. 4.3.2.
4.3.1 Reductions Based on Replacement by a Subterm
We consider the following reductions based on the replacement of a single subproof occurrence with a subproof of itself.
Definition 47
Let d be a D-term and let \(\alpha \) be an axiom assignment for d. For positions \(p, p' \in \mathcal {P}\!os (d)\) such that \(p < p'\), we say that the D-term \(d[d|_{p'}]_p\) is obtained from d for \(\alpha \) by
(i) IS-reduction, if \( Ipt _{\alpha }(d,p') = Ipt _{\alpha }(d,p)\).
(ii) MS-reduction, if \( Mgt _{\alpha }(d|_{p'})\) subsumes \( Mgt _{\alpha }(d|_p)\).
(iii) S-reduction, if \( Mgt _{\alpha }(d|_{p'})\) subsumes \( Ipt _{\alpha }(d,p)\).
The D-term d is called X-reducible (where X is IS, MS or S) for \(\alpha \) if and only if there exist positions \(p,p'\) such that \(d[d|_{p'}]_p\) is obtained by X-reduction from d for \(\alpha \). Otherwise, d is called X-regular.
In the names of the defined reductions, I and M indicate characterization solely in terms of IPTs and MGTs, respectively, and S indicates replacement of a single subproof occurrence, contrasted with C discussed below in Sect. 4.3.2.
Example 48
The D-term when considered for Syll-Simp (see Sect. 2.4) as axiom is IS-, MS- as well as S-reducible. For all three reductions the respective positions are \(p = \epsilon \) and \(p' = 1.2\). Hence \(d|_p = d\) and \(d|_{p'} = 1\) and the D-term \(d'\) obtained from the reduction is just 1. As a tree in indentation representation [28, Section 2.3, Figure 20c] d with associated MGTs and IPTs can be depicted as follows.
The nodes of the tree appear here from top to bottom in the order in which they are visited by pre-order traversal. We represent each node at a position q with the argument term of the MGT of \(d|_q\) and, separated by a slash, the argument term of the IPT of d at q. These argument terms are written in Łukasiewicz’s notation. Variables are renamed to \(p,q,r,\ldots \) (unrelated to the use of p, q as symbols for positions in D-terms), starting freshly in each MGT and globally (corresponding to the notion of rigid variables; see, e.g., [23]) for the IPTs. Long terms are only partially presented. The nodes at positions p and \(p'\) are highlighted by framing and gray background, respectively. Observe that for 2 as position \(p'\), represented by the bottom line in the tree presentation, MS- and S-reduction would also be applicable, but not IS-reduction.
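To make the MGT and IPT annotations of such a tree more tangible, the following Python sketch computes them by unification. It rests on assumptions of ours rather than on the formal definitions: each position p gets a variable `y<p>`, a compound node contributes the equation y_{p.1} = i(y_{p.2}, y_p), a leaf contributes an equation with a fresh copy of the axiom, and the IPT at p is the value of y_p under the most general unifier (the MGT being the IPT at the root). All function names are hypothetical, and the formula CpCqp (Simp) serves only as an illustrative single axiom.

```python
# Formulas: variables are strings, implications are tuples ("i", a, b).
# D-terms: 1 (the axiom) or ("D", major, minor). Positions: strings over "12".

def walk(t, s):
    """Follow variable bindings of the substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return occurs(v, t[1], s) or occurs(v, t[2], s)

def unify(a, b, s):
    """Extend s to a unifier of a and b; return None if there is none."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    s = unify(a[1], b[1], s)
    return None if s is None else unify(a[2], b[2], s)

def resolve(t, s):
    """Apply the substitution s exhaustively to t."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return ("i", resolve(t[1], s), resolve(t[2], s))

def rename(t, suffix):
    if isinstance(t, str):
        return t + suffix
    return ("i", rename(t[1], suffix), rename(t[2], suffix))

def equations(d, pos, axiom):
    """Yield the equations between positional variables and formulas."""
    if d == 1:
        yield ("y" + pos, rename(axiom, "@" + pos))   # fresh axiom copy
    else:
        yield ("y" + pos + "1", ("i", "y" + pos + "2", "y" + pos))
        yield from equations(d[1], pos + "1", axiom)
        yield from equations(d[2], pos + "2", axiom)

def canonical(t, names=None):
    """Rename variables to v0, v1, ... in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(t, str):
        if t not in names:
            names[t] = f"v{len(names)}"
        return names[t]
    return ("i", canonical(t[1], names), canonical(t[2], names))

def ipt(d, pos, axiom):
    """Argument term of the IPT of d at pos (up to renaming), or None."""
    s = {}
    for a, b in equations(d, "", axiom):
        s = unify(a, b, s)
        if s is None:
            return None
    return canonical(resolve("y" + pos, s))

def mgt(d, axiom):
    return ipt(d, "", axiom)

simp = ("i", "p", ("i", "q", "p"))        # CpCqp
print(mgt(("D", 1, 1), simp))             # corresponds to CpCqCrq
print(ipt(("D", 1, 1), "2", simp))        # the minor premise stays CpCqp
```

Note that `canonical` renames each result separately, so this sketch is suitable for comparing formulas up to renaming only. A check for IS-reduction would have to compare the IPTs under one shared (rigid) variable naming instead.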
All three reductions specified in Definition 47 effect that a subterm occurrence \(d|_p\) of d is replaced by a D-term \(d|_{p'}\), which, because of the precondition \(p < p'\), is a strict subterm of \(d|_p\). Concerning structure, it follows that, if \(d'\) is obtained from d by one of these reductions, then \(\textsf {t-size} (d) > \textsf {t-size} (d')\), \(\textsf {c-size} (d) \ge \textsf {c-size} (d')\) and \(\textsf {sc-size} (d) \ge \textsf {sc-size} (d')\). Concerning the associated formulas, for all three reductions it holds that if \( Mgt _{\alpha }(d)\) is defined, then also \( Mgt _{\alpha }(d')\) is defined and subsumes \( Mgt _{\alpha }(d)\). This subsumption relationship follows for IS-reduction from Theorem 45 together with Proposition 24, for MS-reduction directly from Theorem 46, and for S-reduction directly from Theorem 45.
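The structural effect can be checked on a small example. The following Python sketch assumes our own encoding of D-terms as 1 or `("D", major, minor)` and takes tree size as the number of compound subterm occurrences and compacted size as the number of distinct compound subterms. The example D-term is hypothetical, and the sketch says nothing about whether the formula-related preconditions of any of the three reductions hold for it.

```python
def subterms(d):
    """Yield all subterm occurrences of d in pre-order."""
    yield d
    if d != 1:
        yield from subterms(d[1])
        yield from subterms(d[2])

def t_size(d):
    """Tree size: number of compound subterm occurrences."""
    return sum(1 for s in subterms(d) if s != 1)

def c_size(d):
    """Compacted size: number of distinct compound subterms."""
    return len({s for s in subterms(d) if s != 1})

def height(d):
    return 0 if d == 1 else 1 + max(height(d[1]), height(d[2]))

d11 = ("D", 1, 1)
d = ("D", ("D", 1, d11), d11)      # hypothetical D-term
d_reduced = ("D", d11, d11)        # d with d|_1 replaced by its strict subterm d|_{1.2}

for name, term in (("d", d), ("d'", d_reduced)):
    print(name, t_size(term), c_size(term), height(term))
```

In this instance the tree size drops from 4 to 3 and the compacted size from 3 to 2, in line with the strict and non-strict inequalities stated above.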
From Proposition 24 it follows that IS-reduction and MS-reduction both are special cases of S-reduction. It can be shown with examples that of IS- and MS-reduction neither one is more general than the other. S-reduction, however, is strictly more general than both of IS- and MS-reduction, as demonstrated with the following example.
Example 49
The D-term when considered for Łukasiewicz as axiom is S-reducible but neither MS- nor IS-reducible. Analogously to Example 48, the D-term with associated MGTs and IPTs can be presented as follows.
IS-reduction corresponds to the well-known notion of regularity for rigid-variable tableaux (see, e.g., [23]), while MS-reduction corresponds to forms of regularity considered for tableaux with universal variables such as hypertableaux [3]. The strictly more general S-reduction combines aspects of both.
4.3.2 Reductions Based on the Compaction Ordering
We now turn to the second family of reductions that are based on the replacement of all occurrences of a subproof by proofs that are smaller with respect to the compaction ordering. The underlying justifications are Theorems 45 and 38. We define the following notions of reduction and regularity.
Definition 50
Let d be a D-term, let e be a subterm of d and let \(\alpha \) be an axiom assignment for d. For D-terms \(e'\), we say that the D-term \(d{[}e \mapsto e'{]}\) is obtained from d for \(\alpha \) by
(i) MC-reduction, if \(e \mathrel {>_{\textrm{c}}}e'\), \( Mgt _{\alpha }(e')\) is defined and \( Mgt _{\alpha }(e')\) subsumes \( Mgt _{\alpha }(e)\).
(ii) C-reduction, if \(e \mathrel {>_{\textrm{c}}}e'\), \( Mgt _{\alpha }(e')\) is defined, and for all positions \(p \in \mathcal {P}\!os (d)\) such that \(d|_p = e\) it holds that \( Mgt _{\alpha }(e')\) subsumes \( Ipt _{\alpha }(d,p)\).
The D-term d is called X-reducible (where X is MC or C) for \(\alpha \) if and only if there exists a D-term \(e'\) such that \(d{[}e \mapsto e'{]}\) is obtained by X-reduction from d for \(\alpha \). Otherwise, d is called X-regular.
In the names of the defined reductions, M indicates, as for MS-reduction, characterization solely in terms of MGTs, and C indicates replacement based on the compaction ordering. MC-reduction and C-reduction are similar to MS-reduction and S-reduction in that they compare two MGTs or an IPT with an MGT, respectively. They differ from these in that they are not based on the replacement of a single subproof by a subproof of itself, but on the replacement of all occurrences of a subproof by a subproof that is smaller with respect to the compaction ordering. They aim at reducing the compacted size. Unlike the IS-, MS- and S-reductions, they do not transfer from subterms to containing D-terms. It is, for example, possible that a subterm of a D-term is C-reducible while the D-term itself is not C-reducible. This does not come as a surprise, because a proof with smallest compacted size among all proofs of the same theorem may have a subproof of a lemma that does not have the smallest compacted size among all proofs of the lemma. One way to implement MC- and C-reduction is to enumerate the set \(\{f \mid e \mathrel {>_{\textrm{c}}}f\}\) as indicated in the proof of Proposition 41.
If \(d'\) is obtained from d by MC- or C-reduction, then by Theorem 46 or Theorem 45, respectively, it follows that \( Mgt _{\alpha }(d')\) subsumes \( Mgt _{\alpha }(d)\). Concerning structural properties, by Theorem 38 it follows that \(\textsf {c-size} (d) \ge \textsf {c-size} (d')\) and \(\textsf {sc-size} (d) > \textsf {sc-size} (d')\). Combining this with the structural effects of the reductions from Definition 47, we can conclude that for all the reductions specified in Definitions 47 and 50 it holds that
where the triples of numbers are compared lexically. Hence any succession of replacement steps with these reductions, intermingling them arbitrarily, terminates after a finite number of steps.
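The termination argument can be illustrated with Python tuples, which are compared lexicographically. The component order (sc-size, c-size, t-size) used below is our assumption, chosen so that both families of reductions decrease the triple; the concrete numbers are hypothetical and the helper name `strictly_decreases` is ours.

```python
def strictly_decreases(before, after):
    """True if the size triple gets strictly smaller in lexicographic order."""
    return after < before          # tuple comparison is lexicographic

# Hypothetical triples (sc-size, c-size, t-size).
s_step = ((5, 5, 9), (5, 5, 7))    # S-type step: only the tree size shrinks
c_step = ((5, 5, 7), (4, 5, 11))   # C-type step: sc-size shrinks; nothing is
                                   # assumed here about the tree size

print(strictly_decreases(*s_step), strictly_decreases(*c_step))
```

Since all components are natural numbers, the lexicographic order on such triples is well-founded, which is why no infinite chain of reduction steps can exist.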
4.4 Removing Irrelevant Minor Premises: N-Simplification
Proofs may involve applications of Det where the conclusion \(\textsf{P}y\) is actually independent from the minor premise \(\textsf{P}x\). Any axiom can then serve as a trivial minor premise. Meredith expresses this with the symbol \(\textrm{n}\) as second argument of the respective D-term. The following function \(\textsf {simp-n} \) specifies a simplification of D-terms with respect to an axiom assignment \(\alpha \) that replaces subterms with \(\textrm{n}\) accordingly on the basis of the preservation of the MGT.
Definition 51
Let d be a D-term and let \(\alpha \) be an axiom assignment for d. Then the n-simplification of d with respect to \(\alpha \) is the D-term \(\textsf {simp-n} _{\alpha }(d)\), where \(\textsf {simp-n} \) is the following function.
N-simplification preserves the MGT of subterms in all positions, except for those that are replaced by \(\textrm{n}\). That is, if \(d' = \textsf {simp-n} _{\alpha }(d)\), then for all positions \(p \in \mathcal {P}\!os (d')\) such that \(d'|_p \ne \textrm{n}\) it holds . The particular effect of n-simplification is that occurrences of complex subterms of a D-term may be replaced by the primitive D-term \(\textrm{n}\), resulting in a shortened proof. We will see examples of the effect of n-simplification in Sects. 5.4 and 6.2.
In some applications it is undesirable to have \(\textrm{n}\) as a special primitive D-term symbol. For example, if there is originally a single proper axiom like Łukasiewicz, the D-terms then can have two different leaf symbols, altering combinatoric properties such as the number of different D-terms of a given tree size or compacted size. This can be addressed by using instead of \(\textrm{n}\) just other primitive D-terms that identify an arbitrary axiom, such as the numeral 1 in previously considered example proofs. The size reduction achieved by n-simplification is then retained, only the explicit marking of independence from the minor premise expressed by \(\textrm{n}\) is lost. When required, however, this marking can easily be restored with an application of conventional n-simplification, which then has just the effect of replacing occurrences of primitive D-terms by \(\textrm{n}\).
5 Inspecting Łukasiewicz’s Proof and Its Variation by Meredith
As noted in Sect. 2.2, Łukasiewicz [37] has formally proven that his axiom Łukasiewicz entails Syll, Peirce and Simp, and Meredith [48] presented a variation of this proof in his framework of CD, reproduced here as Fig. 6 (p. 13). Can we learn something from these proofs that helps to improve ATP? Developed with only human resources, do they lie among the vast combinatory possibilities within some smaller space that can be characterized by certain features, regular patterns and size restrictions of involved components? To approach these questions, we take a close look at these proofs, inspecting each of their subproofs for various properties.
This section provides a comprehensive analysis of these historic proofs. It takes into account the accumulated knowledge from nearly a century of research as well as new insights. The latter as well as the entire analysis have been made possible by the formal basis established in the preceding sections. The results of our analysis are presented in condensed form in Tables 2 and 4, which are discussed in detail throughout this section.
5.1 The Considered Proofs
We call the two proofs considered in this section \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\). Basically, each proof can be understood as a set of three D-terms, one D-term for each of the goal theorems, Syll, Peirce and Simp, which are proven from the axiom Łukasiewicz. The set of the three trees is represented by a single DAG with three roots, one for each goal theorem. The DAG represents the proofs of the three goals simultaneously such that subproofs used for more than a single goal can be shared. In the TPTP the three goals appear separated as problems LCL038-1, LCL083-1 and LCL082-1, respectively. On occasion, we consider information about the modularization of the original proof presentations by Meredith and Łukasiewicz, which is not captured by the set of D-terms and the respective minimal DAG alone, but would be rendered formally by a representation as a compacted D-term, as discussed in Sect. 3.2.5.
Proof \(D_{\textsf {MER}}\) is Meredith’s variation [48] of Łukasiewicz’s proof [37] and is expressed with CD. Figure 6 reproduces the presentation by Meredith. Proof \(D_{\textsf {ŁUK}}\) is a CD proof that results from a conversion of Łukasiewicz’s proof [37], originally expressed by the method of substitution and detachment, with explicitly annotated formula substitutions. We first converted Łukasiewicz’s original proof straightforwardly to CD. In the result the structure of the detachment applications is strictly retained, while the formula substitutions are considered only implicitly by unification with most general unifiers. The lemma formulas of the intermediate stages, or “theses” [37], are then most general theorems of the respective subproofs. These differ slightly from Łukasiewicz’s original theses: in most cases both are identical modulo renaming of variables, and in some cases Łukasiewicz’s thesis is a strict instance of the most general theorem. As a second conversion step we applied n-simplification (Sect. 4.4) to eliminate trivial redundancies. Figure 12 shows the resulting proof \(D_{\textsf {ŁUK}}\), in the notation by Meredith [48], arranged into intermediate steps that match Łukasiewicz’s original presentation. Figure 13 shows the label dependency ordering of that proof.
5.2 Examined Properties
In the following subsections we inspect each subproof of \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\) in view of various properties. We use the term property there informally in a generic sense. More precisely, we consider properties of the subproof’s structure, properties of the formula proven by the subproof and properties which take contexts into account, specifically the embedding of occurrences of the subproof into the overall proof and global contexts such as other proofs of the formula proven by the subproof or uses of this formula in the relevant literature.
The considered properties can be grouped into several families. We start with discussing aspects around labeling and naming: which lemmas are explicitly exposed and which are taken as implicit intermediate steps; what cross correspondences exist among proofs and with formulas well known in other contexts. Next we examine structural properties of the D-term and then syntactic properties of the MGTs, that is, the lemmas proven by the subproofs. We continue with considering properties that relate a subproof to all possible proofs of its MGT, for example, to compare with a minimal D-term measure such as compacted size required to prove the MGT. Then we will look at regularity properties as discussed in Sect. 4.3 and finally at properties of the IPTs, which are associated with each occurrence of a subproof in the overall proof when viewed as a tree.
Values of the properties are shown for \(D_{\textsf {MER}}\) in Table 2 and for \(D_{\textsf {ŁUK}}\) in Table 4. These tables contain a row for each distinct subproof, that is, for each subterm of at least one of the three D-terms corresponding to the three goal theorems. Even if a subproof is referenced multiple times in the proof, it is represented just by a single line. The number of rows is thus the compacted size of the set of the three D-terms, plus one for the axiom.
In the following subsections we specify these properties and discuss their values for \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\). Because in both proofs we have Łukasiewicz as a single axiom, we quietly assume the corresponding axiom assignments, not distinguishing between proofs and D-terms, their structural component. For reference and use in the table headers, each property is given a short identifier.
5.3 Labels and Names of Formulas
Properties concerning labels and names of formulas refer to the MGT proven by the subproof, independently of the subproof itself. We consider the concordance with the presentations by Meredith and Łukasiewicz as well as appearances of the MGT formula in the literature, independently from the particular proofs.
5.3.1 MER, ŁUK: Corresponding Step in Meredith’s and Łukasiewicz’s Proof Presentation
Properties MER and ŁUK show the number of the corresponding step in the original proof presentations by Meredith [48], indicated by the prefix M, and Łukasiewicz [37], indicated by the prefix Ł. In some cases the referenced lemma in the original presentation is not the MGT but just a strict instance of the MGT, which is indicated by prefixing the reference with .Footnote 18
Through presence or absence of an entry in the table, the respective columns show for which of the subproofs the proof goal is explicitly displayed as a lemma in the original presentation and which are considered just as implicit “unnamed” intermediate steps, as discussed in Sect. 3.2.5.
To indicate a specific correspondence of two steps in \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\) we use M16 \(\,'\) as an additional reference to the most general theorem of subproof \(\textsf{D}16.16\) of the proof of step 17 in \(D_{\textsf {MER}}\) (Fig. 6, p. 13), which is strictly more general than Łukasiewicz’s [37] thesis 26. M16 \(\,'\) appears as the most general theorem of subproofs 30 and 31 in Tables 2 and 4, respectively.
The cross reference columns \({{\textbf {ŁUK}}}\) in Table 2 and \({{\textbf {MER}}}\) in Table 4 include gray bullets to indicate that the MGT of the respective row is also the MGT of some subproof of the referenced proof, but is not made explicit with a label there. Actually all fields of the cross reference columns that do not contain a label are filled with a gray bullet, with the exception of subproof 26 of \(D_{\textsf {ŁUK}}\), whose MGT does not appear in \(D_{\textsf {MER}}\).
5.3.2 NN: Pointer to Nicknames if it is a Generally Often Used Formula
As noted in Sect. 2, the Łukasiewicz school introduced nicknames for important and often referenced formulas [53, p. 319], [67]. With regard to ATP it was conjectured that these may be of special importance for guiding proof search [67, p. 112]. Table 3 lists all those MGTs of subproofs of \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\) that are known under such a name. The names considered there include those collected by Ulrich [67]. In addition, beyond the combinators appearing already there, also short combinator terms with well-known combinators are listed here as names of their principal type-scheme.Footnote 19 As a further source of “names” we took Theses 1–68 from a textbook by Łukasiewicz [38]. These were often used for experiments in ATP [44, 74, 80,81,82]. If the MGT of a subproof is a named formula in this sense, this is indicated as property NN, whose value points to the respective row of Table 3.
5.4 Structural Properties of the D-Term
Structural properties of the D-term refer to the respective subproof as a D-term or full binary tree.
5.4.1 DC, DT, DH: Compacted Size, Tree Size and Height
The properties DC, DT, DH describe the basic dimensions of the subproof’s structure: compacted size, tree size and height. For the proofs of \({\textit{Syll}} \) (subproof 32 of \(D_{\textsf {MER}}\) and subproof 33 of \(D_{\textsf {ŁUK}}\)), the compacted size of \(D_{\textsf {MER}}\) is 31, improving by one upon that of \(D_{\textsf {ŁUK}}\), which is 32. Both proofs have the same height, 29. However, with respect to tree size, \(D_{\textsf {MER}}\) with 491 is larger than \(D_{\textsf {ŁUK}}\) with only 435.
N-simplification has no reducing effect on \(D_{\textsf {MER}}\), but was applied to obtain \(D_{\textsf {ŁUK}}\) from the straightforward conversion of Łukasiewicz’s original proof to CD. There, n-simplification effected on the subproof of \({\textit{Syll}} \) a reduction of the tree size from 563 to 435, while compacted size and height remained unchanged, 32 and 29, respectively.
Dimensions for the whole proof of the three goal theorems \({\textit{Syll}} \), \({\textit{Peirce}} \) and \({\textit{Simp}} \) as a set of the three D-terms can be determined as follows. The compacted size is the total number of compound subterms, that is, the number of rows in the respective tables, not counting the first row, which represents the primitive D-term 1 that corresponds to the axiom. Thus the compacted size of \(D_{\textsf {MER}}\) is 33, improving by one on that of \(D_{\textsf {ŁUK}}\), which is 34.
As tree size of the overall proof, the set of the three D-terms for the goal theorems, we take the sum over the tree sizes of its members, which is \(491+159+19 = 669\) for \(D_{\textsf {MER}}\) and \(435+131+19 = 585\) for \(D_{\textsf {ŁUK}}\). As height of the overall proof we take the maximum of the height of its members, which is \(\textsf{max}(\{29,25,10\}) = 29\) for \(D_{\textsf {MER}}\) and \(\textsf{max}(\{29,27,10\}) = 29\) for \(D_{\textsf {ŁUK}}\). For the CD conversion of Łukasiewicz’s proof before n-simplification, the overall dimensions are: compacted size 34, tree size 751, and height 29.
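The aggregation just described amounts to a sum and a maximum over the three member D-terms. The short Python sketch below merely recomputes the figures quoted above; the dictionary keys and variable names are ours, and the per-member values are taken from the text in the order in which they are given there.

```python
# Per-member tree sizes and heights as quoted in the text.
members = {
    "MER": {"tree_sizes": [491, 159, 19], "heights": [29, 25, 10]},
    "LUK": {"tree_sizes": [435, 131, 19], "heights": [29, 27, 10]},
}

overall = {
    name: (sum(m["tree_sizes"]), max(m["heights"]))
    for name, m in members.items()
}

for name, (tree_size, height) in overall.items():
    print(f"{name}: overall tree size {tree_size}, overall height {height}")
```

The difference of 84 between the two overall tree sizes contrasts with the difference of only one in compacted size, which goes in the opposite direction.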
5.4.2 DI: Number of Incoming DAG Edges
With DI we refer to the number of incoming edges in the DAG representation of the overall proof of all theorems. The roots of the DAG, corresponding to the goal theorems, can be identified by the DI value 0. In both tables it can be observed that there are rows such as the row for subproof 5 of \(D_{\textsf {MER}}\) in Table 2 where the DI value is 1 and there is nevertheless an entry in the column with the labels of the original proof presentation, \({{\textbf {MER}}}\) for Table 2 and \({{\textbf {ŁUK}}}\) for Table 4. These rows exemplify the use of labels by Meredith and Łukasiewicz to modularize proofs as addressed in Sect. 3.2.5.
5.4.3 DR: Repeats
DR denotes the total number of occurrences in the set of expanded trees of all roots of the DAG. Because leaves labeled by n-simplification with \(\textrm{n}\) are not considered here, the number of occurrences of the primitive subproof 1 shown in the tables is smaller than the total number of leaves of the three trees, which is the overall tree size plus one, that is, 670 for \(D_{\textsf {MER}}\) and 586 for \(D_{\textsf {ŁUK}}\).
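Counting such repeats is a plain traversal of the expanded trees. The following Python sketch assumes our own encoding, with D-terms as 1, the string `"n"`, or tuples `("D", major, minor)`; the function name `repeats` and the two example roots are hypothetical and unrelated to the proofs in the tables.

```python
from collections import Counter

def repeats(roots):
    """Count occurrences of every subterm in the expanded trees of all roots.

    Leaves labeled "n" are skipped, as they are not considered for DR.
    """
    counts = Counter()
    stack = list(roots)
    while stack:
        d = stack.pop()
        if d == "n":
            continue
        counts[d] += 1
        if isinstance(d, tuple):       # compound D-term ("D", major, minor)
            stack.extend((d[1], d[2]))
    return counts

d11 = ("D", 1, 1)
roots = [("D", d11, ("D", 1, "n")), d11]   # hypothetical DAG with two roots
for term, k in repeats(roots).items():
    print(k, term)
```

In this example the subterm `("D", 1, 1)` is counted twice, once as a root and once as a subterm of the other root, and the primitive D-term 1 is counted five times.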
5.4.4 DS: Structural Relationship between the Subproofs of Major and Minor Premise
DS describes special cases of the structural relationship between the subproofs of major and minor premise. Possible values are identity, expressed with \(=\), the strict subterm and superterm relationships expressed with \(\lhd \) and \(\rhd \), respectively, and the strict compaction ordering relationship (if none of the other relationships holds) expressed with \(\mathrel {<_{\textrm{c}}}\) and \(\mathrel {>_{\textrm{c}}}\).Footnote 20 In addition, it is indicated if a premise is the axiom or is \(\textrm{n}\).
Consideration of this property was motivated by the empirical observation that for most subproofs of \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\) the subproofs of both premises are related by the subterm relationship. In fact, in each of the proofs \(D_{\textsf {MER}}\) and \(D_{\textsf {ŁUK}}\) the value of \({{\textbf {DS}}}\) is, for all compound subproofs with the exception of two, either \(=\), \(\lhd \) or \(\rhd \). This observed pattern can actually be reversed into a proof construction method that succeeds for many CD problems, also with multiple axioms, and leads in some cases to proofs with small compacted size where the exhaustive search for proofs with guaranteed smallest compacted size appears infeasible [74]. In Sect. 6.3 we will specify this method and show a particularly short proof of Syll from Łukasiewicz obtained with it.
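A rough classifier for the first three DS values can be sketched as follows in Python. The encoding of D-terms as 1 or `("D", major, minor)` and the function names are our assumptions. We also assume that the relation is read from the major to the minor premise, so that "subterm" means the major premise's subproof is a strict subterm of the minor premise's subproof. The compaction ordering and the special marking of axiom or \(\textrm{n}\) premises are not modeled; such cases are reported as "other" or fall under the subterm cases.

```python
def is_strict_subterm(s, d):
    """True if s occurs in d at some position other than the root."""
    if not isinstance(d, tuple):
        return False
    return any(s == c or is_strict_subterm(s, c) for c in (d[1], d[2]))

def ds(d):
    """Classify the relation between the major and minor subproof of d."""
    major, minor = d[1], d[2]
    if major == minor:
        return "="
    if is_strict_subterm(major, minor):
        return "subterm"
    if is_strict_subterm(minor, major):
        return "superterm"
    return "other"        # would be decided by the compaction ordering

d11 = ("D", 1, 1)
examples = [
    ("D", d11, d11),
    ("D", d11, ("D", 1, d11)),
    ("D", ("D", 1, d11), d11),
    ("D", ("D", 1, d11), ("D", d11, 1)),
]
print([ds(d) for d in examples])
```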
5.4.5 DP: Is Prime
DP expresses that DT and DC are the same. We call D-terms with this property prime, because they do not have repeated subterms that can be “factored” in a DAG representation. Assuming a singleton set \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim = \{1\}\) of primitive D-terms, the property can be characterized in different ways: (1) DT and DC are the same. (2) DT and DH are the same. (3) Every compound subterm of the given D-term has only a single occurrence in it. (4) The given D-term is a member of \(\bigcup _{i = 0}^\omega \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (i)\), where for natural numbers \(n \ge 0\) the set \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (n)\) of D-terms is specified inductively as
1. \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (0) = \{1\}\).
2. \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (1) = \{\textsf{D}(1,1)\}\).
3. \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (n+1) = \{\textsf{D}(d,1),\, \textsf{D}(1,d) \mid d \in \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (n)\}\) for \(n \ge 1\).
Characterization (4) suggests performing proof search by enumerating D-terms for increasing values of \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel \). Members of \( \mathcal {P}\hspace{-0.05em}rime\mathcal {L}\hspace{-0.05em}evel (n)\) have size (compacted size, tree size or height, which are identical for them) n. The number of distinct prime D-terms of a given size n grows according to the sequence oeis:A011782 of integers [49] (Table 5), i.e., 1 for \(n=0\) and \(2^{n-1}\) for \(n > 0\), which is much slower than the growth for compacted size, tree size or height shown in Table 1 on p. 22.
For \(D_{\textsf {MER}}\) we observe in Table 2 that the subproofs 1–18 are exactly those that are prime. Moreover, all these prime proofs in \(D_{\textsf {MER}}\) are subproofs of a single subproof, subproof 18. This suggests that proof search may be decomposed into two phases. The first phase identifies a small number of “maximal prime proofs” or “prime cores” [77], such as subproof 18 in \(D_{\textsf {MER}}\) for axiom Łukasiewicz. It operates in a search space that, narrowed through the prime property and possibly further properties, relatively quickly leads beyond the small proof sizes for which all structures can be trivially explored. The second phase searches further, with the MGTs of the prime cores available as proven lemma formulas. Such experiments were performed for deriving Syll from Łukasiewicz with Prover9 [43] as prover for the second phase, leading to proofs with much smaller compacted size (44) than obtained by Prover9 alone (80–94, see Sect. 6.2) [77, 78]. This is, however, still above the size of the human-made proofs (31–32) and of a machine proof obtained with another technique (22), described in Sect. 6.3.
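The growth of the number of prime D-terms can be checked with a small enumeration. The sketch assumes that a prime D-term of size \(n+1\) is obtained by pairing a prime D-term of size n with the primitive 1 on either side, which is one reading consistent with characterization (2) and with the stated counts. The encoding (integer `1`, pairs for \(\textsf{D}\)) and all function names are illustrative.

```python
def prime_level(n):
    """Prime D-terms of size n over the single primitive D-term 1."""
    level = {1}
    for _ in range(n):
        level = {t for d in level for t in ((d, 1), (1, d))}
    return level

def tree_size(d):
    return 0 if d == 1 else 1 + tree_size(d[0]) + tree_size(d[1])

def height(d):
    return 0 if d == 1 else 1 + max(height(d[0]), height(d[1]))

def compacted_size(d):
    """Number of distinct compound subterms, i.e. inner nodes of the minimal DAG."""
    seen = set()
    def visit(t):
        if isinstance(t, tuple) and t not in seen:
            seen.add(t)
            visit(t[0])
            visit(t[1])
    visit(d)
    return len(seen)

counts = [len(prime_level(n)) for n in range(7)]
print(counts)
```

The resulting counts follow 1, 1, 2, 4, 8, and so on, and for every enumerated term the three size measures coincide with the level.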
5.4.6 DK \(_L\), DK \(_R\): Left and Right Successive Height
DK\(_L\), DK\(_R\) are the maximal number of successive edges going to the left and right, respectively, on any path from the root to a leaf. These properties were motivated by the observation that in \(D_{\textsf {MER}}\) and in the D-term of Łukasiewicz’s proof these values are relatively low compared to the height of the subproof. This suggests that limiting them could restrict the number of candidate structures during proof search. Both proofs would, for example, satisfy the constraint \({{\textbf {DK}}}_L^2 \le 2.5 \cdot {{\textbf {DH}}}\) and \({{\textbf {DK}}}_R^2 \le 2.5 \cdot {{\textbf {DH}}}\). Whether such restrictions can indeed be successfully used in proof search has not yet been settled. Empirical observations obtained in our experiments suggest that with structure enumeration for increasing tree size they lead to a linear reduction of the number of considered trees. Namely, while the numbers of full binary trees of tree sizes 13 and 14 are 742,900 and 2,674,440, respectively (oeis:A000108), with the above constraints these numbers are roughly halved to 385,234 and 1,405,546, respectively. With enumeration for increasing height the reduction seems stronger: the numbers of binary trees of heights 4 and 5 are 651 and 457,653, respectively (oeis:A001699). With the above constraints, the numbers are reduced to 231 and 9,153, respectively.
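For small heights, the effect of the constraint can be reproduced by brute force. The following sketch uses an illustrative encoding (integer `1` for a leaf, pairs for \(\textsf{D}\)) and hypothetical function names. It computes DK\(_L\) and DK\(_R\), the unconstrained tree counts, and the number of trees of height 4 that satisfy the constraint.

```python
from math import comb

def height(d):
    """DH: number of edges on the longest path from the root to a leaf."""
    return 0 if d == 1 else 1 + max(height(d[0]), height(d[1]))

def successive_heights(d):
    """Return (DK_L, DK_R): longest runs of left edges and of right edges."""
    def walk(t):
        # -> (best left run, best right run, left run from t, right run from t)
        if t == 1:
            return 0, 0, 0, 0
        bl0, br0, l0, _ = walk(t[0])
        bl1, br1, _, r1 = walk(t[1])
        left, right = 1 + l0, 1 + r1
        return max(bl0, bl1, left), max(br0, br1, right), left, right
    return walk(d)[:2]

def satisfies_constraint(d, factor=2.5):
    dk_l, dk_r = successive_heights(d)
    bound = factor * height(d)
    return dk_l ** 2 <= bound and dk_r ** 2 <= bound

def trees_of_size(n):
    """Number of full binary trees with n inner nodes (Catalan number)."""
    return comb(2 * n, n) // (n + 1)

def trees_up_to_height(h):
    """All full binary trees of height at most h."""
    trees = [1]
    for _ in range(h):
        trees = [1] + [(a, b) for a in trees for b in trees]
    return trees

height4 = [d for d in trees_up_to_height(4) if height(d) == 4]
kept = sum(satisfies_constraint(d) for d in height4)
print(trees_of_size(13), trees_of_size(14), len(height4), kept)
```

For height 4 the bound is 10, so the constraint excludes exactly the trees with a run of four left edges or four right edges. The enumeration confirms the reduction from 651 to 231 trees.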
5.5 Properties of the MGT
Here we discuss properties of the argument term f of the MGT \(\textsf{P}(f)\) of the respective subproof.
5.5.1 FC, FT, FH: Compacted Size, Tree Size and Height
The properties FC, FT, FH describe the basic dimensions of f. They are defined for terms in full analogy to the respective measures for D-terms (Definitions 5.ii and 4): the compacted size FC is the number of inner nodes of the minimal DAG representing the term; the tree size FT is the number of inner nodes of the term viewed as a tree, in other words, the number of occurrences of function symbols of arity larger than 0; the height FH is the length (number of edges) of the longest downward path from the root to a leaf. In the literature, the height of a term is also called term depth.
The maximal FT value in \(D_{\textsf {MER}}\) as well as in the D-term of Łukasiewicz’s proof is 15. It pertains in both proofs to the same formula, which, moreover, happens to appear in both proofs as MGT of the respective subproof number 15. In Meredith’s presentation it is just an implicit intermediate formula, indicated by the empty value of \({{\textbf {MER}}}\) in Table 2, whereas in Łukasiewicz’s presentation it is made explicit as thesis number Ł13. With respect to the tree size, this formula stands out: the next largest value of \({{\textbf {FT}}}\) is 12, which pertains in both proofs to two subproofs. The maximal value of FH in both proofs is 6 and pertains in each of the proofs to two subproofs, including that with the \({{\textbf {FT}}}\) value 15.
Deleting inferred formulas whose tree size or height exceeds a threshold is a basic technique to restrict the search space of resolution provers. Corresponding Prover9 options are for example max_weight and max_depth [43]. The default measure used as term weight by Prover9 is linearly related to the tree size as defined here. CD problems are processed by Prover9 in default settings with positive hyperresolution. The inferred resolvents are then actually MGTs of D-terms that can be associated with the hyperresolution derivations. In contrast, clausal tableau provers with rigid variables do not explicitly construct these MGTs; they only construct the deeper instantiated IPTs associated with particular nodes of the tableau tree. Hence, restricting the search space by limiting term dimensions of MGTs is usually not available for clausal tableau provers.
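The three measures can be sketched for implicational formulas as follows. The encoding is an assumption for illustration: variables are strings and an implication \(\textsf{i}(x,y)\) is the tuple `("i", x, y)`; constants are not handled.

```python
def tree_size(t):
    """FT: number of occurrences of function symbols of arity larger than 0."""
    return 0 if isinstance(t, str) else 1 + sum(tree_size(a) for a in t[1:])

def height(t):
    """FH: number of edges on the longest path from the root to a leaf."""
    return 0 if isinstance(t, str) else 1 + max(height(a) for a in t[1:])

def compacted_size(t):
    """FC: number of distinct compound subterms (inner nodes of the minimal DAG)."""
    seen = set()
    def visit(u):
        if not isinstance(u, str) and u not in seen:
            seen.add(u)
            for a in u[1:]:
                visit(a)
    visit(t)
    return len(seen)

def i(x, y):
    return ("i", x, y)

# Axiom Łukasiewicz, CCCpqrCCrpCsp.
lukasiewicz = i(i(i("p", "q"), "r"), i(i("r", "p"), i("s", "p")))
# A term with a repeated compound subterm.
shared = i(i("p", "q"), i("p", "q"))
print(tree_size(lukasiewicz), height(lukasiewicz), compacted_size(lukasiewicz))
print(tree_size(shared), height(shared), compacted_size(shared))
```

For the axiom the tree size is 6 and the height is 3. Since no compound subterm repeats, the compacted size is also 6. In the second term the repeated subterm is counted once in FC but twice in FT.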
Blending goal-driven structure enumeration with axiom-driven structure enumeration that permits the application of heuristic limitations to MGTs was recently studied for CD problems; it led to a drastic improvement compared to conventional clausal tableau provers [76].
5.5.2 FV: Number of Distinct Variables
Like FT and FH, the property FV, that is, the number of distinct variables, is commonly used in resolution provers as a threshold to delete inferred formulas that exceed it. In Prover9 this threshold can be specified with the max_vars option. The discussion in Sect. 5.5.1 on the availability of MGTs for heuristic restrictions applies here as well.
5.5.3 FO: Is [Weakly] Organic
The organic property FO of a propositional formula, with respect to a set of axioms, says that it has no strict subformula that is itself a theorem entailed by the axioms. With our wrapper predicate \(\textsf{P}\) this means that an MGT \(\textsf{P}(f)\) is organic if f has no strict subterm \(f'\) such that \(\forall \textsf{P}(f')\) is entailed by the given axioms. Łukasiewicz and his collaborators aimed at finding axiomatizations of propositional logics with axioms that are organic [37, 40]. For axiomatizations of fragments of propositional logic, the organic property can be checked by a SAT solver. In \(D_{\textsf {MER}}\) and in the D-term of Łukasiewicz’s proof we observe that, with a few exceptions, the MGTs of all subproofs actually are organic. The exceptions can, however, be ascribed a weakened form of the organic property that is specified as follows: We call an atomic formula \(\textsf{P}(f)\) weakly organic if it is not organic and f is an implication \(\textsf{i}(p,g)\) (or \( Cpg \) in Łukasiewicz’s notation) where p is a variable that does not occur in g and \(\textsf{P}(g)\) is organic. The weakly organic property is indicated in the property tables by a gray bullet.
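For the implicational fragment, both properties can be checked mechanically. The sketch below replaces the SAT solver by a plain truth-table test, which suffices for small formulas. It assumes that the theorems entailed by the axioms are exactly the classical implicational tautologies, as is the case for axiom Łukasiewicz together with detachment. The term encoding (strings for variables, `("i", x, y)` for implications) and the function names are illustrative.

```python
from itertools import product

def variables(t):
    return {t} if isinstance(t, str) else variables(t[1]) | variables(t[2])

def evaluate(t, env):
    if isinstance(t, str):
        return env[t]
    return (not evaluate(t[1], env)) or evaluate(t[2], env)

def is_tautology(t):
    vs = sorted(variables(t))
    return all(
        evaluate(t, dict(zip(vs, values)))
        for values in product([False, True], repeat=len(vs))
    )

def strict_subterms(t):
    if not isinstance(t, str):
        for a in t[1:]:
            yield a
            yield from strict_subterms(a)

def is_organic(t):
    """No strict subterm is itself a theorem (here: a tautology)."""
    return not any(is_tautology(s) for s in strict_subterms(t))

def is_weakly_organic(t):
    """Not organic, but of the form i(p, g) with p a variable not in g and g organic."""
    if isinstance(t, str) or is_organic(t):
        return False
    _, p, g = t
    return isinstance(p, str) and p not in variables(g) and is_organic(g)

simp = ("i", "p", ("i", "q", "p"))   # Simp, CpCqp
padded = ("i", "s", simp)            # CsCpCqp
print(is_organic(simp), is_organic(padded), is_weakly_organic(padded))
```

Here Simp is organic, whereas CsCpCqp contains Simp as a strict subterm and is therefore only weakly organic.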
5.6 Comparisons with all Proofs of the MGT
The properties considered in this subsection apply to all proofs of the MGT of the respective subproof, regarded as a set of D-terms.
5.6.1 MC, MT: Minimal Compacted and Tree Size of a Proof
The values of MC and MT are the minimal compacted size of a proof of the MGT and the minimal tree size of a proof of the MGT, respectively. These values may be hard to determine such that they often can only be narrowed down to an integer interval. Values of these properties were found with the provers CCS [75] and SGCD [76] in configurations that exhaustively search for proofs with a given compacted size or tree size, respectively.
In particular for the goals Peirce and Simp (subproofs 33 and 34 in Table 2, subproofs 34 and 35 in Table 4) it can be observed that the compacted size DC and tree size DT are much larger than the respective minimal values MC and MT. This is understandable because the apparent aim of Meredith and Łukasiewicz was to reduce the overall compacted size. Peirce and Simp are thus proven in both proofs not as standalone problems but as side results of the given proof of \({\textit{Syll}} \). Subproofs of that proof are permitted to be re-used there without increase of the overall compacted size.
5.7 Regularity
The regularity properties hold for the respective subproof as D-term.
5.7.1 RS, RC: Is S-Regular, Is C-Regular
These properties are regularities as specified in Definitions 47.iii and 50.ii. In both proofs all subproofs are S-regular, with the exception of a single subproof that derives Peirce as a side result. In \(D_{\textsf {MER}}\) there is just a single subproof that is not C-regular, while in the D-term of Łukasiewicz’s proof C-regularity fails for nine subproofs, indicating a greater redundancy.
5.8 Properties of Occurrences of the IPTs
The respective subproof has DR (see Sect. 5.4.3) occurrences in the overall proof as a set of trees. The following properties refer to the multiset of the arguments f of the IPTs \(\textsf{P}(f)\) of all these occurrences.
5.8.1 IT \(_U\), IT \(_M\): Tree Size of the IPTs—Maximum and Rounded Median
IT\(_U\) and IT\(_M\) indicate the tree size of the members of the considered multiset by the values of the maximum and the rounded median. These values may be compared with FT, the tree size of (the argument term of) the MGT. In particular for subproofs that appear at deeper levels in the overall proof, IT\(_U\) and IT\(_M\) are much larger than FT, illustrating Proposition 24. The largest tree size of (the argument term of) an IPT in \(D_{\textsf {MER}}\) as well as in the D-term of Łukasiewicz’s proof is 4451. It is the value of an instance of the axiom, where the tree size of (the argument term of) the MGT, that is, the axiom formula itself, is just 6.
5.8.2 IH \(_U\), IH \(_M\): Height of the IPTs—Maximum and Rounded Median
IH\(_U\) and IH\(_M\) indicate the height of the members of the considered multiset by the values of the maximum and the rounded median. As in the comparison of IT\(_U\) and IT\(_M\) with FT, these values are much larger than FH, the height of (the argument term of) the MGT, for subproofs appearing at deeper levels, although on a quite different scale: the largest height of (the argument term of) an IPT in \(D_{\textsf {MER}}\) as well as in the D-term of Łukasiewicz’s proof is 18, for an instance of the axiom, where the height of (the argument term of) the MGT is 3.
6 Proofs of Syll from Łukasiewicz by ATP Systems
Deriving Syll from Łukasiewicz and Det, that is, showing the validity of ŁDS (Sect. 2), or solving TPTP problem LCL038-1, which was achieved without a computer by Łukasiewicz [37], was brought up as a challenge problem for ATP by Frank Pfenning in 1988 [51]. In this section we summarize the achievements of ATP systems on the problem since then and report the dimensions of proofs found by Prover9 [43], which in essence are CD proofs. For the proofs by Prover9 we show the effects of the novel reductions introduced in Sect. 4. Finally we present a new proof, which is much shorter than all known ones. It has been obtained with a novel technique inspired by observations made during the investigation of the human-made proofs.
6.1 From a Challenge Problem to a Not-that-Easy Zero-Rated Problem
According to Larry Wos et al. [84], Syll, Peirce and Simp could be derived in 1990 by OTTER [42] in about 11 h. The techniques used were weighting formulas by symbol count and hyperresolution as inference rule. In 1992 OTTER needed about 8 h, generating 6.7 million clauses and keeping about 20 thousand clauses, to derive Syll, while the parallel prover Roo achieved a nearly linear speedup for the problem, solving it with 24 processes in about 21 min [41]. The inference rule was hyperresolution, and forward subsumption (but not back subsumption) was applied. In addition, to conserve memory, generated clauses with more than 20 symbols were discarded. Also in 1992, strategies for CD with OTTER were compared [44]. Depending on the strategy, OTTER could derive Syll in about 2–4 h. As mentioned there, proving Syll from Łukasiewicz was the first truly difficult CD theorem proved by OTTER and has been used extensively as a benchmark for parallel deduction programs. CODE [21], a dedicated solver for CD from 1997, apparently could also solve the problem.
Branden Fitelson and Wos [18] studied various classes of “missing” proofs. Łukasiewicz’s proof is there the leading example of a proof with omissions, where subproofs of some steps are missing. Łukasiewicz’s presentation shows 28 steps. The objective of Fitelson and Wos was to produce from these displayed steps a proof that contains all of these, but is entirely formed by the more fine-grained CD steps. OTTER succeeded, finding a proof of length (i.e., compacted size) 36. Actually, our proof (Fig. 12) is another such completion, but was obtained without proof search just from a detailed transcription of Łukasiewicz’s presentation, as described in Sect. 5.1. Its compacted size is 34.
The problem of deriving Syll from Łukasiewicz and Det entered the TPTP as LCL038-1. Its first documented difficulty rating in TPTP version 2.0.0, 1997, is 1.00, meaning that the problem is hard because no state-of-the-art ATP system in a specific sense [65] can solve it. A value of 0.00, meaning that the problem is easy, or all “state-of-the-art ATP systems” can solve the problem, first appeared with version 3.2.0 in 2006. Since then the difficulty rating fluctuated between 0.00 and 0.81. Its current value in version 9.0.0 is 0.60.
According to the ProblemAndSolutionStatistics file of TPTP 9.0.0 from 2024 the two well-known powerful provers E [60] and Vampire [29] fail on it in their recent versions 3.2.0 and 4.9, respectively. Nevertheless, in earlier versions they succeed, as documented in the ProblemAndSolutionStatistics file of TPTP 7.5.0 from 2021 and replicable with versions downloadable from the systems’ Web pages (footnote 21). E 2.6 (footnote 22) finds a proof with 88 steps and Vampire 4.5.1 (footnote 23) a proof with 148 steps (in both cases not counting the three initial clauses as steps). It is not evident how these proofs would be translated to CD proofs and thus how their size actually compares to that of the human proofs. For a rough estimate, however, we can observe that the compacted size, which is 32 and 31 for the proof by Łukasiewicz and Meredith’s variation, respectively, is the exact number of positive hyperresolution steps to build the proof. If the hyperresolution is modeled by binary resolution, the number of steps doubles to 64 or 62, respectively.
For the goal-driven first-order provers such as leanCoP [50], SETHEO [33] or PTTP [63], which may be described as based on clausal tableaux [31], the CM [4, 8] or model elimination [36], the problem remains out of reach. This is not surprising, given that these systems in essence enumerate tree structures whose size is linearly related to the tree size of D-terms, 435 and 491 for Łukasiewicz’s proof and Meredith’s variation, respectively, and 64 as the smallest currently known value (Sect. 6.3). The only known solutions of the problem with this approach are with a recent generalization where the goal-driven structure enumeration is interwoven with heuristically restricted axiom-driven structure enumeration [76]. We will discuss a proof obtained in this way below in Sect. 6.3.
6.2 Prover9’s Proofs and Reductions by Replacing Subproofs
Prover9, like OTTER [42], succeeds on LCL038-1. Moreover, by default it applies positive hyperresolution to CD problems, where proofs directly translate to CD proofs, that is, D-terms. It appears that in applications with axiomatizations of logics it is often desired to have CD proofs in contrast to arbitrary resolution proofs [72]. CD Tools [74], a SWI Prolog library to support experimenting with CD, provides a conversion of Prover9’s hyperresolution proofs to D-terms. This is implemented using Prooftrans, a proof conversion tool, which comes with Prover9. The availability of Prover9’s proofs as D-terms permits comparing their dimensions with those of the human proofs and experimenting with the reductions introduced in Sect. 4.3.
Prover9 in default settings returns for LCL038-1 different proofs, although of roughly similar size, depending on whether in the clause Det the major premise appears before the minor premise, as in the original TPTP problem file, or Det is reordered such that the major premise appears after the minor premise (footnote 24). Tables 6 and 7 show properties of the respective proofs: (1) in its original form as obtained from Prover9; (2) after n-simplification (Definition 51); (3) and (4) after exhaustively applying S-reduction (Definition 47.iii) and C-reduction (Definition 50.ii), respectively, to (2); and (5) after applying C-reduction to (3). The proofs (4) and (5) within each table are identical.
The shown properties are as those specified in Sect. 5.2 with the following additions. \({{\textbf {DX}}}\) is the SC size (Definition 35) of the D-term. FT\(_{ Max }\) and FH\(_{ Max }\) are the maximal values of \({{\textbf {FT}}}\) and \({{\textbf {FH}}}\) among all subproofs of the given proof, i.e., the maximal tree size and maximal height of the MGT of a subproof. Red. indicates the number of reduction steps performed to obtain the proof as described in the Source of the D-term column. Specifically, for n-simplification Red. shows the number of occurrences of \(\textrm{n}\) in the D-term and for S- and C-reduction it shows the actual number of rewriting steps according to Definitions 47.iii and 50.ii, respectively.
We also experimented with configuring Prover9 such that it continues to search for further proofs after a proof was found, but this did not lead to finding a second proof within several minutes. In another experiment we tried Prover9 with increasing values of max_depth, which limits FH\(_{ Max }\). The lowest number where it succeeds is 7, corresponding in our scale, not counting the predicate, to term height 6. The prover then succeeds very quickly, in 7 s, compared to 44 s without max_depth restriction, but the proofs are larger, with compacted size 110 (tree size 315,246, height 50) if the major premise of Det appears after the minor premise, and compacted size 131 (tree size 400,792, height 50) if it appears before. Also the value of FT\(_{ Max }\) with 14 is in both cases larger.
The most striking values in Tables 6 and 7 are the vast tree sizes DT of the original proofs, which are drastically reduced by n-simplification. It is not clear whether this apparent redundancy has a negative effect on proof search.
Actually, tree size seems to be not much taken into consideration in the context of resolution. Being closely related to the multiplicity of a clause in a proof, it may be seen as a fundamental measure for clausal tableaux with rigid variables. While it is considered by Veroff as CDcount in the investigation of finding shortest proofs [71], it is, in contrast to compacted size and height, not even mentioned in a CD-related work by Wos [82]. On the other hand, it appears that compacted size—underlying DAGs as proof structures—is considered in the context of clausal tableaux only rarely, for example in [15, 75]. The deeper reason for these preferences lies in the fact that any resolvent may be regarded as a lemma. The use of lemmas leads to DAGs, hence the focus on these in resolution.
6.3 PSP Level Enumeration and a Short Proof
Column DS in Tables 2 and 4 shows that steps in the human-made proofs can often be described in a proof-structural way as a D-term \(\textsf{D}(d,d')\) where either d is the proof of some previously proven lemma and \(d'\) is a subterm of d, or vice versa. The question is then whether this observed pattern can be turned into a proof construction method that is useful for proof search. As a basis for such a method we define an inductive characterization of sets of D-terms by PSP level, with “PSP” suggesting “Proof-SubProof”.
Definition 52
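We assume a singleton set \( \mathcal{D}\mathcal{P}\hspace{-0.12em}rim = \{1\}\) of primitive D-terms. For natural numbers \(n \ge 0\), the PSP level of n, in symbols \( \mathcal {PSPL}\hspace{-0.05em}evel (n)\), is a set of D-terms specified inductively as
1. \( \mathcal {PSPL}\hspace{-0.05em}evel (0) = \{1\}\).
2. \( \mathcal {PSPL}\hspace{-0.05em}evel (n+1) = \{\textsf{D}(d_1,d_2),\, \textsf{D}(d_2,d_1) \mid d_1 \in \mathcal {PSPL}\hspace{-0.05em}evel (n) \text{ and } d_2 \text{ is a (not necessarily strict) subterm of } d_1\}\).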
Assuming a procedure that enumerates the subterms of a given D-term, we can associate with Definition 52 straightforwardly a procedure that enumerates D-terms interwoven with unification in an axiom-driven way for increasing PSP levels. The procedure may be improved by caching computed PSP levels instead of recomputing them.
PSP levels are disjoint. All D-terms in PSP level n have compacted size n. However, the number of D-terms at PSP level n grows more slowly than the number of D-terms of compacted size n, namely according to the sequence oeis:A001147 [49] of integers in contrast to oeis:A254789. Table 8 shows the initial values of both sequences. It follows that the enumeration of D-terms according to the PSP level is “incomplete”, that is, there are D-terms that are not a member of any PSP level.
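The stated properties of PSP levels can be observed on the first few levels with a purely structural enumeration, that is, without the interwoven unification. The sketch assumes the reading suggested at the beginning of this section: a D-term of the next level pairs a D-term d of the current level with one of its subterms (d itself included), in either argument order. Encoding and function names are illustrative.

```python
def subterms(d):
    """All subterms of d, including d itself."""
    result = {d}
    if isinstance(d, tuple):
        result |= subterms(d[0]) | subterms(d[1])
    return result

def psp_levels(max_level):
    """List of PSP levels 0..max_level, each a set of D-terms; computed levels are kept."""
    levels = [{1}]
    for _ in range(max_level):
        nxt = set()
        for d in levels[-1]:
            for s in subterms(d):
                nxt.add((d, s))
                nxt.add((s, d))
        levels.append(nxt)
    return levels

def compacted_size(d):
    seen = set()
    def visit(t):
        if isinstance(t, tuple) and t not in seen:
            seen.add(t)
            visit(t[0])
            visit(t[1])
    visit(d)
    return len(seen)

levels = psp_levels(4)
sizes = [len(level) for level in levels]
a = (1, 1)
outside = ((1, a), (a, 1))   # compacted size 4, premises unrelated by subterm
print(sizes, outside in levels[4])
```

The level sizes follow 1, 1, 3, 15, 105. The D-term `outside` has compacted size 4 but is not a member of level 4, which illustrates the incompleteness of the enumeration.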
Enumeration by PSP level is not just growing slower than by compacted size, but also apparently simpler to realize. In contrast to DAG enumeration based on variations of the value-number method [1, 75], enumeration by PSP level does not require an interplay of rigid variables with copies of MGTs [75] or forgetting of variables [15]. For enumeration by PSP level it is straightforward to maintain just MGTs.
Most importantly for proof search, the maintenance of MGTs permits simple incorporation of heuristic restrictions based on their properties as discussed in Sect. 5.5. This includes discarding D-terms whose MGT dimensions exceed configured thresholds, discarding D-terms whose MGT already appeared as MGT of a D-term produced earlier in the enumeration, and limiting the overall size of cached solutions by deleting entries according to heuristic criteria based on properties of the MGTs.
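How MGTs can be maintained and used for such restrictions is sketched below for the single axiom Łukasiewicz. This is a minimal illustration with textbook unification, not the implementation of SGCD or CCS. The term encoding, all function names and the default thresholds in `keep` (taken from the maximal FT and FH values observed in the human-made proofs) are assumptions made for the example.

```python
import itertools

# Axiom Łukasiewicz, CCCpqrCCrpCsp, with i for implication.
AXIOM = ("i", ("i", ("i", "p", "q"), "r"), ("i", ("i", "r", "p"), ("i", "s", "p")))
_fresh = itertools.count()

def fresh_var():
    return f"v{next(_fresh)}"

def rename(t, mapping):
    """Copy t, replacing each variable by a globally fresh one."""
    if isinstance(t, str):
        if t not in mapping:
            mapping[t] = fresh_var()
        return mapping[t]
    return (t[0],) + tuple(rename(a, mapping) for a in t[1:])

def walk(t, subst):
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst):
    """Extend subst to a most general unifier of s and t, or return None."""
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return None if occurs(t, s, subst) else {**subst, t: s}
    if s[0] != t[0] or len(s) != len(t):
        return None
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

def resolve(t, subst):
    t = walk(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(a, subst) for a in t[1:])

def mgt(d):
    """Argument term of the MGT of D-term d, or None if d has no MGT."""
    if d == 1:
        return rename(AXIOM, {})
    major, minor = mgt(d[0]), mgt(d[1])
    if major is None or minor is None:
        return None
    result = fresh_var()
    subst = unify(major, ("i", minor, result), {})
    return None if subst is None else resolve(result, subst)

def tree_size(t):
    return 0 if isinstance(t, str) else 1 + sum(tree_size(a) for a in t[1:])

def height(t):
    return 0 if isinstance(t, str) else 1 + max(height(a) for a in t[1:])

def canonical(t):
    """Rename variables by first occurrence, so that variants become equal."""
    mapping = {}
    def go(u):
        if isinstance(u, str):
            return mapping.setdefault(u, f"x{len(mapping)}")
        return (u[0],) + tuple(go(a) for a in u[1:])
    return go(t)

def keep(d, seen, max_ft=15, max_fh=6):
    """Heuristic filter: MGT exists, is within the thresholds, and is new."""
    f = mgt(d)
    if f is None or tree_size(f) > max_ft or height(f) > max_fh:
        return False
    key = canonical(f)
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()
print(keep(1, seen), keep((1, 1), seen), keep((1, 1), seen))
print(tree_size(mgt((1, 1))), height(mgt((1, 1))))
```

The second call for \(\textsf{D}(1,1)\) is rejected because its MGT was already recorded. Comparing MGTs up to variable renaming is a simplification here; a prover might instead use subsumption.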
Experiments showed that the enumeration of D-terms by PSP level indeed succeeds on many CD problems. For problems with more than a single axiom, the definition of \( \mathcal {PSPL}\hspace{-0.05em}evel (n+1)\) was there extended to include also \(\textsf{D}(d,a)\) and \(\textsf{D}(a,d)\) for \(d \in \mathcal {PSPL}\hspace{-0.05em}evel (n)\) and arbitrary axiom identifiers \(a \in \mathcal{D}\mathcal{P}\hspace{-0.12em}rim \), not just those occurring in d. SGCD [76] can operate with enumeration by PSP level. In five such configurations with different heuristic restrictions, SGCD enumeration succeeded for 153 of the 196 “basic” CD problems in TPTP 8.0.0 (footnote 25) [74]. Among the 196 problems of the corpus there are 189 rated \(< 1.00\). Among the 153 solutions obtained with enumeration by PSP level there are 12 problems rated 0.25 and two rated 0.50 (footnote 26). The proofs obtained with enumeration by PSP level tend to have small compacted size, also for problems where exhaustive enumeration by compacted size to find a proof with minimal compacted size appears not feasible. The CCS system [75], for example, succeeds in finding solutions with minimal compacted size for only 86 problems (footnote 27).
Lemmas obtained from SGCD with enumeration by PSP level can substantially increase the performance of first-order provers, including the leading system Vampire [29], on CD problems [54]. Moreover, LCL073-1, a problem known as really hard for automated provers, can be solved by SGCD in a setting based on enumeration by PSP level [54]. SGCD is invoked there twice, for lemma generation by PSP level and for proving with a combination of enumeration by PSP level and by height. Both phases use different heuristic restrictions. The problem is rated 1.00, continuously since ratings were introduced in the TPTP in 1997. Mechanically, it was so far proven only once, in 2000 by Wos [83] with transferring outputs and insights between several invocations of OTTER.
For deriving Syll from Łukasiewicz, problem LCL038-1, SGCD with enumeration by PSP level finds in a few seconds a proof that is substantially smaller than the proof by Łukasiewicz and its variation by Meredith: The proof has compacted size 22, tree size 64 and height 22. Figure 14 shows it as a DAG.
This proof of Syll was supplemented with enumeration techniques to derive also Peirce and Simp [74]. We call the overall proof of the three goal theorems, whose compacted size is 29, \(D_{\textsf {29}}\). Figure 15 shows it in Meredith’s notation, where labeled intermediate steps are only introduced for nodes with multiple incoming edges. Figure 16 shows the corresponding label dependency ordering.
The criteria on combinator terms as formula names from footnote 19 (p. 58) lead for \(D_{\textsf {29}}\) to three additional “named” formulas, which are shown in Table 10. Properties of all subproofs of \(D_{\textsf {29}}\) are shown in Table 9, in analogy to Tables 2 and 4.
The small D-term size apparently comes at the price of a slight extension of the maximal size of MGTs: For \(D_{\textsf {29}}\) the maximal value of FT is 17 and the maximal value of FH is 7, compared to 15 and 6, respectively, for both \(D_{\textsf {MER}}\) and the D-term of Łukasiewicz’s proof. Subproof 26 is the only one where the DS column has an empty value, indicating that it cannot be obtained by a PSP induction step from some subproof appearing at a row further above. Subproof 26 does not belong to the subproof of Syll, subproof 27, which was obtained purely by PSP level enumeration, but to the supplements to prove Peirce and Simp. Figure 14 shows just the proof of Syll as a DAG.
A further size reduction of our proof of LCL038-1 from Fig. 14 can be achieved with combinatory compression [75]: The tree grammar obtained from a grammar-based tree compression tool [35] for the proof can be converted to a generalized form of D-term that permits leaves labeled by combinators expressing proof structure transformations of the original D-term. It has compacted size 19, height 15, but tree size 119 [74].
7 Conclusion
Our leading motivation has been improving proof search in ATP by the incorporation of operations that are more global than extending a set of formulas by an inferred formula. A comparative analysis of proof systems seemed necessary to this end. Our focus here was on Meredith’s system known as condensed detachment (CD). For it we have elaborated a new formal reconstruction as a special case of the connection method (CM).
Our reconstruction preserves an important aspect of CD, the reification of proofs as terms, more specifically D-terms, which may be regarded as full binary trees. The underlying ATP model is the CM, where structures formed by connections attached to the formula provide the key concept. D-terms then are one way to represent such structures for problems of a certain restricted class.
The incorporation of lemmas belongs to the key global operations for reducing the amount of search and the size of proofs. We specifically considered a form of lemmas that corresponds to the repeated use of a substructure—subtree or subterm—in a proof. Or, in other words, the interplay of trees as proof structures and their representation as DAGs. Lemmas are then characterized by way of D-terms, along with various measures and properties concerning the proof structure as well as the proven formulas.
The resulting formalism has opened the door towards enhancement of ATP systems by taking into account global features within the proof search, suggesting various techniques that are immediately applicable in practice. First experiments on the restricted kind of problems considered in the paper, which, among others, include the 196 CD problems in the TPTP problem collection, are promising and encourage future work.
On the basis of the formalism we analyzed and compared the remarkable historic proofs by Łukasiewicz and Meredith of a problem stated by the former. The problem played a historic role also in ATP, which is surveyed in the paper. However, an in-depth analysis of its human-made proofs has not been undertaken before. In a particular experiment we “learned” from the human-made proofs by converting an observed structural feature into a novel method for proof search. It finds short proofs for many problems for which a systematic search for shortest proofs appears unfeasible. In particular, for Łukasiewicz’s problem, it quickly yields a particularly short proof, shorter than the human-made model proofs, and drastically shorter than all known proofs by ATP systems.
In the longer run, our approach lends itself towards supporting ATP by machine learning (see, e.g., [17, 26, 54]). This is because the reification of proof structures provides information that can be exploited in the learning process and is not available within other ATP approaches.
Notes
The CD Tools system is available from http://cs.christophwernhard.com/cdtools. Supplementary material specific for the paper and to reproduce the experimental results is provided at http://cs.christophwernhard.com/cdtools/exp-investigations/.
The table http://cs.christophwernhard.com/cdtools/exp-lemmas/lemmas.html indicates the state of the art in ATP with respect to the CD problems in the TPTP—taking into account results that already emerged from this foundational work.
See Sect. 6.1.
As noted by Prior [52], the Dublin logic school with Prior and Meredith contracted from Łukasiewicz the habit of referring to various key formulas by proper names, in some cases by names used in the Principia Mathematica, for example Simp for \( CpCqp \) and Syll for \( CCpqCCqrCpr \), and in other cases by the names of logicians associated with the formulas. Thus \( CCCpqpp \) is called Peirce and \( CCCpqrCCrpCsp \) is called Łukasiewicz. Tables of such formula nicknames are provided by Prior [53, p. 319] and Dolph Ulrich [67]. See also Sect. 5.3.2.
Łukasiewicz ’s paper is also reproduced in his Selected Works [39, pp. 295–205], however with a typo in the proof: The substitution of thesis 18 reads r/CCrCsp instead of the correct r/CCrpCsp.
This theorem has been chosen as proof goal just because it has a proof that is suitable to illustrate the interplay of the considered proof representations.
Note that the separate display of these instances of Det is only for a better understanding of the reader but not a feature of the CM, which instead involves indexed connections for the formula given in Fig. 4.
This has, e.g., been implemented in the CCS CD reasoner [75].
First-order logic permits encoding a non-Horn problem as a Horn problem.
There are, however, relationships to equality. It is well-known that equality can be axiomatized by Horn clauses expressing reflexivity, symmetry, transitivity and substitutivity. It is also possible to encode Horn problems as purely equational problems [11], where, e.g., LCL038-10 is an equational variation of LCL038-1 (Syll from Łukasiewicz). Some CD problems, e.g., LCL006-1, are about axiomatizations of an equivalential calculus.
We use subtree with the meaning common in computer science and matching the notion of subterm: A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T.
We took the name compacted size from Flajolet, Sipala and Steyaert [19].
Properties of such binary DAGs for the special case of a single root and a single leaf have been recently investigated by Genitrini et al. [22], where they are called compacted trees.
The inaccuracy observed by Hindley and David Meredith [25] in early formalizations of CD based on the notion of most general unifier can be attributed to disregarding the requirement \( \mathcal {D}\hspace{-0.08em}om (\sigma ) \cup \mathcal{V}\mathcal{R}\hspace{-0.02em}ng (\sigma ) \subseteq \mathcal {V}\hspace{-0.11em}ar (M)\) of the clean property.
The use of the symbol here is an adaptation of , which stands for s subsumes t (Sect. 3.1).
The precise restriction was to terms formed from up to five occurrences of the well-known combinators \(\textbf{\textsf{I}}, \textbf{\textsf{K}}, \textbf{\textsf{B}}, \textbf{\textsf{C}}, \textbf{\textsf{S}}, \textbf{\textsf{W}}\). Combinators \(\textbf{\textsf{S}}\) and \(\textbf{\textsf{W}}\) and terms with five occurrences were not among the characterizations of the considered MGTs.
Cases where the compaction ordering applied only non-strictly did not occur in the investigated proofs.
http://www.eprover.org/ and https://vprover.github.io/, accessed Jan 15, 2023.
Invoked with flags -s --print-statistics --proof-object=1.
Invoked with flags --time_limit 600 --mode casc.
Provers that are more sensitive to the ordering of literals in a clause typically determine this ordering on the basis of heuristics, independently of the ordering in the input, e.g., [33, Sect. 5.3].
These “basic” CD problems are all CD problems in TPTP 8.0.0 with the exception of two with status satisfiable, five with a form of detachment that is based on implication represented by disjunction and negation, and three with a non-atomic goal theorem.
For details, see http://cs.christophwernhard.com/cdtools/exp-tptpcd-2022-07/table_4.html.
Details are included in the table referenced in footnote 26.
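The notions of subtree and compacted size mentioned in the footnotes above can be illustrated with a minimal sketch. The encoding of binary trees as nested Python pairs, the function names, and the reading of compacted size as the number of distinct inner-node subtrees (the inner nodes of the DAG that shares identical subtrees) are assumptions made for illustration, not the implementation used in CD Tools.

```python
# Illustrative sketch only: the tuple encoding and all names are assumptions.
# A leaf is any non-tuple value (e.g. an axiom label);
# an inner node is a pair (left, right).

def subtrees(tree):
    """Yield every subtree of `tree`: each node together with all its descendants."""
    yield tree
    if isinstance(tree, tuple):
        left, right = tree
        yield from subtrees(left)
        yield from subtrees(right)

def tree_size(tree):
    """Count inner-node occurrences, i.e. the size of the tree written out in full."""
    return sum(1 for t in subtrees(tree) if isinstance(t, tuple))

def compacted_size(tree):
    """Count distinct inner-node subtrees, i.e. inner nodes after sharing equal subtrees."""
    return len({t for t in subtrees(tree) if isinstance(t, tuple)})

# The subtree (1, 1) occurs twice in the tree but only once in the shared DAG.
shared = (1, 1)
example = (shared, (shared, 1))
```

Here `tree_size(example)` counts both occurrences of the repeated subtree, whereas `compacted_size(example)` counts it once, which is the effect of representing the proof structure as a DAG rather than as a tree.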
References
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers—Principles, Techniques, and Tools. Addison-Wesley, Reading (1986)
Astrachan, O.L., Stickel, M.E.: Caching and lemmaizing in model elimination theorem provers. In: Kapur, D. (ed.) CADE-11, pp. 224–238. Springer, Berlin (1992). https://doi.org/10.1007/3-540-55602-8_168
Baumgartner, P., Furbach, U., Niemelä, I.: Hyper tableaux. In: Alferes, J.J., Pereira, L.M., Orlowska, E. (eds.) JELIA’96. LNCS (LNAI), vol. 1126, pp. 1–17. Springer, Berlin (1996). https://doi.org/10.1007/3-540-61630-6_1
Bibel, W.: Automated Theorem Proving. Vieweg, Braunschweig (1982). https://doi.org/10.1007/978-3-322-90102-6. Second edition 1987
Bibel, W.: Deduction: Automated Logic. Academic Press, London (1993)
Bibel, W.: Comparison of proof methods. In: Otten, J., Bibel, W. (eds.) AReCCa 2023. CEUR Workshop Proc., vol. 3613, pp. 119–132. CEUR-WS.org, Aachen (2024)
Bibel, W.: A conjecture for ATP research. CoRR abs/2403.10334 (2024). https://doi.org/10.48550/2403.10334
Bibel, W., Otten, J.: From Schütte’s formal systems to modern automated deduction. In: Kahle, R., Rathjen, M. (eds.) The Legacy of Kurt Schütte, pp. 215–249. Springer, Cham (2020). Chap. 13. https://doi.org/10.1007/978-3-030-49424-7_13
Bull, R., Cubrinovska, A.: Interview with Robert Bull. Popper and Prior in New Zealand. http://popper-prior.nz/items/show/255. Accessed 9 Jul 2024 (2018)
Bunder, M.W.: A simplified form of condensed detachment. J. Log. Lang. Inf. 4(2), 169–173 (1995). https://doi.org/10.1007/BF01048619
Claessen, K., Smallbone, N.: Efficient encodings of first-order Horn formulas in equational logic. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900, pp. 388–404. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94205-6_26
Dershowitz, N., Jouannaud, J.: Notations for rewriting. Bull. EATCS 43, 162–174 (1991)
Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. JACM 27(4), 758–771 (1980). https://doi.org/10.1145/322217.322228
Eder, E.: Properties of substitutions and unification. J. Symb. Comput. 1(1), 31–46 (1985). https://doi.org/10.1016/S0747-7171(85)80027-4
Eder, E.: A comparison of the resolution calculus and the connection method, and a new calculus generalizing both methods. In: Börger, E., Kleine Büning, H., Richter, M.M. (eds.) CSL ’88. LNCS, vol. 385, pp. 80–98. Springer, Berlin (1989). https://doi.org/10.1007/BFb0026296
Eder, E.: Relative Complexities of First Order Calculi. Vieweg, Braunschweig (1992). https://doi.org/10.1007/978-3-322-84222-0
Färber, M., Kaliszyk, C., Urban, J.: Machine learning guidance for connection tableaux. J. Autom. Reason. 65(2), 287–320 (2021). https://doi.org/10.1007/s10817-020-09576-7
Fitelson, B., Wos, L.: Missing proofs found. J. Autom. Reason. 27(2), 201–225 (2001). https://doi.org/10.1023/A:1010695827789
Flajolet, P., Sipala, P., Steyaert, J.: Analytic variations on the common subexpression problem. In: ICALP90. LNCS, vol. 443, pp. 220–234. Springer, Berlin (1990). https://doi.org/10.1007/BFb0032034
Fuchs, M.: Lemma generation for model elimination by combining top-down and bottom-up inference. In: Dean, T. (ed.) IJCAI 1999, pp. 4–9. Morgan Kaufmann, San Francisco, CA (1999). http://ijcai.org/Proceedings/99-1/Papers/001.pdf
Fuchs, D., Fuchs, M.: CODE: A powerful prover for problems of condensed detachment. In: McCune, W. (ed.) CADE-14, pp. 260–263. Springer, Berlin (1997). https://doi.org/10.1007/3-540-63104-6_25
Genitrini, A., Gittenberger, B., Kauers, M., Wallner, M.: Asymptotic enumeration of compacted binary trees of bounded right height. J. Comb. Theory Ser. A 172, 105177 (2020). https://doi.org/10.1016/j.jcta.2019.105177
Hähnle, R.: Tableaux and related methods. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, Chap. 3, vol. 1, pp. 101–178. Elsevier, Amsterdam (2001). https://doi.org/10.1016/b978-044450813-3/50005-9
Hindley, J.R.: Basic Simple Type Theory. Cambridge University Press, Cambridge (1997). https://doi.org/10.1017/CBO9780511608865
Hindley, J.R., Meredith, D.: Principal type-schemes and condensed detachment. J. Symb. Log. 55(1), 90–105 (1990). https://doi.org/10.2307/2274956
Jakubuv, J., Chvalovský, K., Olsák, M., Piotrowski, B., Suda, M., Urban, J.: ENIGMA anonymous: symbol-independent inference guiding machine (system description). In: Peltier, N., Sofronie-Stokkermans, V. (eds.) IJCAR 2020. LNCS (LNAI), vol. 12167, pp. 448–463. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51054-1_29
Kalman, J.A.: Condensed detachment as a rule of inference. Stud. Log. 42, 443–451 (1983). https://doi.org/10.1007/BF01371632
Knuth, D.E.: The Art of Computer Programming: Volume 1/Fundamental Algorithms. Addison-Wesley, Reading, MA (1968)
Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-39799-8_1
Lemmon, E.J., Meredith, C.A., Meredith, D., Prior, A.N., Thomas, I.: Calculi of pure strict implication. In: Davis, J.W., Hockney, D.J., Wilson, W.K. (eds.) Philosophical Logic, pp. 215–250. Springer, Dordrecht (1969). https://doi.org/10.1007/978-94-010-9614-0_17. Reprint of a technical report, Canterbury University College, Christchurch, 1957
Letz, R.: Tableau and connection calculi. Structure, complexity, implementation. Habilitationsschrift, TU München (1999). https://web.archive.org/web/20230604101128/https://www2.tcs.ifi.lmu.de/~letz/habil.ps. Accessed 9 Jul 2024
Letz, R., Mayr, K., Goller, C.: Controlled integration of the cut rule into connection tableaux calculi. J. Autom. Reason. 13(3), 297–337 (1994)
Letz, R., Schumann, J., Bayerl, S., Bibel, W.: SETHEO: a high-performance theorem prover. J. Autom. Reason. 8(2), 183–212 (1992). https://doi.org/10.1007/BF00244282
Lohrey, M.: Grammar-based tree compression. In: Potapov, I. (ed.) DLT 2015. LNCS, vol. 9168, pp. 46–57. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21500-6_3
Lohrey, M., Maneth, S., Mennicke, R.: XML tree structure compression using RePair. Inf. Syst. 38(8), 1150–1167 (2013). https://doi.org/10.1016/j.is.2013.06.006
Loveland, D.W.: Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam (1978)
Łukasiewicz, J.: The shortest axiom of the implicational calculus of propositions. In: Proc. of the Royal Irish Academy, vol. 52, Sect. A, No. 3, pp. 25–33 (1948). http://www.jstor.org/stable/20488489
Łukasiewicz, J.: Elements of Mathematical Logic. Pergamon Press, Oxford (1963). English translation of the second edition (1958) of Elementy logiki matematycznej, PWN, Warszawa
Łukasiewicz, J.: Selected Works. North-Holland, Amsterdam (1970)
Łukasiewicz, J., Tarski, A.: Untersuchungen über den Aussagenkalkül. Comptes rendus des séances de la Soc. d. Sciences et d. Lettres de Varsovie 23 (1930). English translation in [39], pp. 131–152
Lusk, E.L., McCune, W.W.: Experiments with ROO, a parallel automated deduction system. In: Fronhöfer, B., Wrightson, G. (eds.) Parallelization in Inference Systems. LNCS (LNAI), vol. 590, pp. 139–162. Springer, Berlin (1992). https://doi.org/10.1007/3-540-55425-4_6
McCune, W.: OTTER 3.3 Reference Manual. Technical Report ANL/MCS-TM-263, Argonne National Laboratory (2003). https://www.cs.unm.edu/~mccune/otter/Otter33.pdf. Accessed 9 Jul 2024
McCune, W.: Prover9 and Mace4. http://www.cs.unm.edu/~mccune/prover9. Accessed 9 Jul 2024 (2005–2010)
McCune, W., Wos, L.: Experiments in automated deduction with condensed detachment. In: Kapur, D. (ed.) CADE-11. LNCS (LNAI), vol. 607, pp. 209–223. Springer, Berlin (1992). https://doi.org/10.1007/3-540-55602-8_167
Megill, N.D.: A finitely axiomatized formalization of predicate calculus with equality. Notre Dame J. Formal Logic 36(3), 435–453 (1995). https://doi.org/10.1305/ndjfl/1040149359
Megill, N., Wheeler, D.A.: Metamath: A Computer Language for Mathematical Proofs, 2nd edn. lulu.com (2019). https://us.metamath.org/downloads/metamath.pdf
Meredith, D.: In memoriam: Carew Arthur Meredith (1904–1976). Notre Dame J. Formal Logic 18(4), 513–516 (1977). https://doi.org/10.1305/ndjfl/1093888116
Meredith, C.A., Prior, A.N.: Notes on the axiomatics of the propositional calculus. Notre Dame J. Formal Logic 4(3), 171–187 (1963). https://doi.org/10.1305/ndjfl/1093957574
OEIS Foundation Inc.: The on-line encyclopedia of integer sequences (2022). http://oeis.org
Otten, J.: Restricting backtracking in connection calculi. AI Commun. 23(2–3), 159–182 (2010). https://doi.org/10.3233/AIC-2010-0464
Pfenning, F.: Single axioms in the implicational propositional calculus. In: Lusk, E., Overbeek, R. (eds.) CADE-9. LNCS (LNAI), vol. 310, pp. 710–713. Springer, Berlin (1988). https://doi.org/10.1007/BFb0012869
Prior, A.N.: Logicians at play; or Syll, Simp and Hilbert. Australas. J. Philos. 34(3), 182–192 (1956). https://doi.org/10.1080/00048405685200181
Prior, A.N.: Formal Logic, 2nd edn. Clarendon Press, Oxford (1962). https://doi.org/10.1093/acprof:oso/9780198241560.001.0001
Rawson, M., Wernhard, C., Zombori, Z., Bibel, W.: Lemmas: Generation, selection, application. In: Ramanayake, R., Urban, J. (eds.) TABLEAUX 2023. LNCS (LNAI), vol. 14278, pp. 153–174. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-43513-3_9. Extended version: https://arxiv.org/abs/2303.05854
Rezus, A.: On a theorem of Tarski. Lib. Math. 2, 63–97 (1982)
Rezuş, A.: Tarski singleton bases: 1925-1932 (on an allegedly lost ‘method of proof’ of Alfred Tarski) (2019). In: Witness Theory—Notes on \(\lambda \)-calculus and Logic. Studies in Logic, vol. 84, pp. 227–243. College Publications, London (2020). Preprint (2019). https://doi.org/10.13140/RG.2.2.10955.34081
Rezuş, A.: Tarski’s Claim thirty years later (2010). In: Witness Theory—Notes on \(\lambda \)-Calculus and Logic. Studies in Logic, vol. 84, pp. 217–225. College Publications, London (2020). Preprint (2016). http://www.equivalences.org/editions/proof-theory/ar-tc-20160512.pdf
Rezuş, A.: Witness Theory—Notes on \(\lambda \)-Calculus and Logic. Studies in Logic, vol. 84. College Publications, London (2020)
Robinson, J.A.: A machine-oriented logic based on the resolution principle. JACM 12(1), 23–41 (1965)
Schulz, S., Cruanes, S., Vukmirović, P.: Faster, higher, stronger: E 2.3. In: Fontaine, P. (ed.) CADE 27. LNAI, pp. 495–507. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29436-6_29
Schumann, J.M.P.: DELTA—a bottom-up preprocessor for top-down theorem provers. In: CADE-12. LNCS (LNAI), vol. 814, pp. 774–777. Springer, Berlin (1994). https://doi.org/10.1007/3-540-58156-1_58
Sobociński, B.: Z badań nad teorią dedukcji. Przegląd Filozoficzny 35, 171–193 (1932). Excerpts translated into English and edited by A. Rezuş are published as [58, pp. 257–268 (Appendix: Bolesław Sobociński 1932, §1)]
Stickel, M.E.: A Prolog technology theorem prover: implementation by an extended Prolog compiler. J. Autom. Reason. 4(4), 353–380 (1988). https://doi.org/10.1007/BF00297245
Sutcliffe, G.: The TPTP problem library and associated infrastructure. From CNF to TH0, TPTP v6.4.0. J. Autom. Reason. 59(4), 483–502 (2017). https://doi.org/10.1007/s10817-017-9407-7
Sutcliffe, G., Suttner, C.: Evaluating general purpose automated theorem proving systems. AI 131(1–2), 39–54 (2001). https://doi.org/10.1016/S0004-3702(01)00113-8
Thomas, I.: Final word on a shortest implicational axiom. Notre Dame J. Formal Logic 11(1), 16 (1970)
Ulrich, D.: A legacy recalled and a tradition continued. J. Autom. Reason. 27(2), 97–122 (2001). https://doi.org/10.1023/A:1010683508225
Ulrich, D.: Sentential Calculi Pages. https://web.ics.purdue.edu/~dulrich/Home-page.htm. Accessed 9 Jul 2024 (2007)
Ulrich, D.: Single axioms and axiom-pairs for the implicational fragments of R, R-Mingle, and some related systems. In: Bimbó, K. (ed.) J. Michael Dunn on Information Based Logics. Outstanding Contributions to Logic, vol. 8, pp. 53–80. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29300-4_4
Veroff, R.: Using hints to increase the effectiveness of an automated reasoning program: case studies. J. Autom. Reason. 16(3), 223–239 (1996). https://doi.org/10.1007/BF00252178
Veroff, R.: Finding shortest proofs: an application of linked inference rules. J. Autom. Reason. 27(2), 123–139 (2001). https://doi.org/10.1023/A:1010635625063
Veroff, R.: Challenge problems with condensed detachment (2011). https://www.cs.unm.edu/~veroff/CD/. Accessed 9 Jul 2024
Walsh, M., Fitelson, B.: Answers to some open questions of Ulrich and Meredith (2021). Under review. http://fitelson.org/walsh.pdf. Accessed 9 Jul 2024
Wernhard, C.: CD Tools—condensed detachment and structure generating theorem proving (system description). CoRR abs/2207.08453 (2022). https://doi.org/10.48550/arXiv.2207.08453
Wernhard, C.: Generating compressed combinatory proof structures—an approach to automated first-order theorem proving. In: Konev, B., Schon, C., Steen, A. (eds.) PAAR 2022. CEUR Workshop Proc., vol. 3201. CEUR-WS.org, Aachen (2022). https://arxiv.org/abs/2209.12592
Wernhard, C.: Structure-generating first-order theorem proving. In: Otten, J., Bibel, W. (eds.) AReCCa 2023. CEUR Workshop Proc., vol. 3613, pp. 64–83. CEUR-WS.org, Aachen (2024). https://ceur-ws.org/Vol-3613/AReCCa2023_paper5.pdf
Wernhard, C., Bibel, W.: Learning from Łukasiewicz and Meredith: investigations into proof structures. In: Platzer, A., Sutcliffe, G. (eds.) CADE 28. LNCS (LNAI), vol. 12699, pp. 58–75. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79876-5_4
Wernhard, C., Bibel, W.: Learning from Łukasiewicz and Meredith: investigations into proof structures (extended version). CoRR abs/2104.13645 (2021). https://doi.org/10.48550/arXiv.2104.13645
Wielemaker, J., Schrijvers, T., Triska, M., Lager, T.: SWI-Prolog. Theory Pract. Logic Program. 12(1–2), 67–96 (2012). https://doi.org/10.1017/S1471068411000494
Wos, L.: Automated reasoning and Bledsoe’s dream for the field. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe. Automated Reasoning Series, pp. 297–345. Kluwer Academic Publishers, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3488-0_15
Wos, L.: The resonance strategy. Comput. Math. Appl. 29(2), 133–178 (1995). https://doi.org/10.1016/0898-1221(94)00220-F
Wos, L.: The power of combining resonance with heat. J. Autom. Reason. 17(1), 23–81 (1996). https://doi.org/10.1007/BF00247668
Wos, L.: Conquering the Meredith single axiom. J. Autom. Reason. 27(2), 175–199 (2001). https://doi.org/10.1023/A:1010691726881
Wos, L., Winker, S., McCune, W., Overbeek, R., Lusk, E., Stevens, R., Butler, R.: Automated reasoning contributes to mathematics and logic. In: Stickel, M.E. (ed.) CADE-10, pp. 485–499. Springer, Berlin (1990). https://doi.org/10.1007/3-540-52885-7_109
Acknowledgements
We thank Michael Rawson and Zsolt Zombori as well as anonymous reviewers of CADE 2021 and of JAR for helpful comments and suggestions that led to significant improvements of the presentation. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 457292495. The work was supported by the North-German Supercomputing Alliance (HLRN).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wernhard, C., Bibel, W. Investigations into Proof Structures. J Autom Reasoning 68, 24 (2024). https://doi.org/10.1007/s10817-024-09711-8
DOI: https://doi.org/10.1007/s10817-024-09711-8