
Proof Theory and Philosophy


proof theory

& philosophy
Greg Restall
Philosophy Department
University of Melbourne
restall@unimelb.edu.au
http://consequently.org/writing/ptp

version of september 18, 2006


© Greg Restall
Comments on this work in progress are most welcome.
Visit http://consequently.org/edit/page/Proof_Theory_and_Philosophy to give feedback.
A single paper copy may be made for study purposes.
Other rights reserved.
contents

1 Why Proof Theory?

Part I Propositional Logic

2 Propositional Logic: Tools & Techniques
2.1 Natural Deduction for Conditionals
2.2 Sequents and Derivations
2.3 From Proofs to Derivations and Back
2.4 Circuits
2.5 Counterexamples
2.6 Proof Identity

3 Propositional Logic: Applications
3.1 Assertion and Denial
3.2 Definition and Harmony
3.3 Tonk and Non-Conservative Extension
3.4 Meaning
3.5 Achilles and the Tortoise
3.6 Warrant
3.7 Gaps and Gluts
3.8 Realism

Part II Quantifiers, Identity and Existence

4 Quantifiers: Tools & Techniques
4.1 Predicate Logic
4.2 Identity and Existence
4.3 Models
4.4 Arithmetic
4.5 Second Order Quantification

5 Quantifiers: Applications
5.1 Objectivity
5.2 Explanation
5.3 Relativity
5.4 Existence
5.5 Consistency
5.6 Second Order Logic

Part III Modality and Truth

6 Modality and Truth: Tools and Techniques
6.1 Simple Modal Logic
6.2 Modal Models
6.3 Quantified Modal Logic
6.4 Truth as a Predicate

7 Modality and Truth: Applications
7.1 Possible Worlds
7.2 Counterparts
7.3 Synthetic Necessities
7.4 Two Dimensional Semantics
7.5 Truth and Paradox

References


introduction

This manuscript is a draft of a guided introduction to logic and its applications in philosophy. The focus will be a detailed examination of the different ways to understand proof. Along the way, we will also take a few glances around to the other side of inference, the kinds of counterexamples to be found when an inference fails to be valid.

[Marginal note: “I should like to outline an image which is connected with the most profound intuitions which I always experience in the face of logistic. That image will perhaps shed more light on the true background of that discipline, at least in my case, than all discursive description could. Now, whenever I work even on the least significant logistic problem, for instance, when I search for the shortest axiom of the implicational propositional calculus, I always have the impression that I am facing a powerful, most coherent and most resistant structure. I sense that structure as if it were a concrete, tangible object, made of the hardest metal, a hundred times stronger than steel and concrete. I cannot change anything in it; I do not create anything of my own will, but by strenuous work I discover in it ever new details and arrive at unshakable and eternal truths. Where is and what is that ideal structure? A believer would say that it is in God and is His thought.” — Jan Łukasiewicz]

The book is designed to serve a number of different purposes, and it can be used in a number of different ways. In writing the book I have at least these four aims in mind.

a gentle introduction to key ideas in the theory of proof: The literature on proof theory contains some very good introductions to the topic. Bostock’s Intermediate Logic [9], Tennant’s Natural Logic [87], Troelstra and Schwichtenberg’s Basic Proof Theory [91], and von Plato and Negri’s Structural Proof Theory [56] are all excellent books, with their own virtues. However, they all introduce the core ideas of proof theory in what can only be described as a rather complicated fashion. The core technical results of proof theory (normalisation for natural deduction and cut elimination for sequent systems) are relatively simple ideas at their heart, but the expositions of these ideas in the available literature are quite difficult and detailed. This is through no fault of the existing literature. It is due to a choice. In each book, a proof system for the whole of classical or intuitionistic logic is introduced, and then formal properties are demonstrated about such a system. Each proof system has different rules for each of the connectives, and this makes the proof-theoretical results such as normalisation and cut elimination case-ridden and lengthy. (The standard techniques are complicated inductions with different cases for each connective: the more connectives and rules, the more cases.)
In this book, the exposition will be somewhat different. Instead
of taking a proof system as given and proving results about it, we will
first look at the core ideas (normalisation for natural deduction, and cut
elimination for sequent systems) and work with them in their simplest
and purest manifestation. In Section 2.1.2 we will see a two-page normalisation proof. In Section 2.2.3 we will see a two-page cut-elimination
proof. In each case, the aim is to understand the key concepts behind
the central results.

an introduction to logic from a non-partisan, pluralist, proof-theoretic perspective: We are able to take this liberal approach to
introducing proof theory because we take a pluralist attitude to the
choice of logical system. This book is designed to be an introduction
to logic that does not have a distinctive axe to grind in favour of a particular logical system. Instead of attempting to justify this or that
formal system, we will give an overview of the panoply of different
accounts of consequence for which a theory of proof has something
interesting and important to say. As a result, in Chapter 2 we will
examine the behaviour of conditionals from intuitionistic, relevant and
linear logic. The system of natural deduction we will start off with is
well suited to them. In Chapter 2, we also look at a sequent system for
the non-distributive logic of conjunction and disjunction, because this
results in a very simple cut elimination proof. From there, we go on to richer and more complicated settings, once we have the groundwork in
place.

an introduction to the applications of proof theory: We will always have our eye on the kinds of concerns others have concerning proof theory. What are the connections between proof theories and theories of meaning? What does an account of proof tell us about how we might apply the formal work of logical theorising? All accounts of meaning have something to say about the role of inference. For some, it is what things mean that tells you what inferences are appropriate. For others, it is what inferences are appropriate that constitutes what things mean. For everyone, there is an intimate connection between inference and semantics.

[Marginal note: I have in mind the distinction between representationalist and inferentialist theories of meaning. For a polemical and provocative account of the distinction, see Robert Brandom’s Articulating Reasons [11].]

a presentation of new results: Recent work in proofnets and other techniques in non-classical logics like linear logic can usefully illuminate the theory of much more traditional logical systems, like classical logic itself. I aim to present these results in an accessible form, and extend them to show how you can give a coherent picture of classical and non-classical propositional logics, quantifiers and modal operators.

[Marginal note: An accessible example of this work is Robinson’s “Proof nets for Classical Logic” [80].]

The book is filled with marginal notes which expand on and comment on the central text. Feel free to read or ignore them as you wish, and to add your own comments. Each chapter (other than this one) contains definitions, examples, theorems, lemmas, and proofs. Each of these (other than the proofs) is numbered consecutively, first with the chapter number, and then with the number of the item within the chapter. Proofs end with a little box at the right margin, like this: □
The manuscript is divided into three parts and each part divides into
two chapters. The parts cover different aspects of logical vocabulary.
First, propositional logic; second, quantifiers, identity and existence;
third, modality and truth. In each part, the first chapter covers lo-
gical tools and techniques suited to the topic under examination. The
second chapter both discusses the issues that are raised in the tools
& techniques chapter, and applies these tools and techniques to differ-
ent issues in philosophy of language, metaphysics, epistemology, philo-
sophy of mathematics and elsewhere.
Each ‘Tools & Techniques’ chapter contains many exercises to complete.
Logic is never learned without hard work, so if you want to learn the material, work through the exercises: especially the basic, intermediate and advanced ones. The project questions are examples of current
research topics.
The book has an accompanying website: http://consequently.org/writing/ptp. From here you can look for an updated version of the
book, leave comments, read the comments others have left, check for
solutions to exercises and supply your own. Please visit the website
and give your feedback. Visitors to the website have already helped me
make this volume much better than it would have been were it written
in isolation. It is a delight to work on logic within such a community,
spread near and far.

Greg Restall

Melbourne
september 18, 2006


1 | why proof theory?
Why? My first and overriding reason to be interested in proof the-
ory is the beauty and simplicity of the subject. It is one of the central
strands of the discipline of logic, along with its partner, model theory.
Since the flowering of the field with the work of Gentzen, many beau-
tiful definitions, techniques and results are to be found in this field,
and they deserve a wider audience. In this book I aim to provide an in-
troduction to proof theory that allows the reader with only a minimal
background in logic to start with the flavour of the central results, and
then understand techniques in their own right.
It is one thing to be interested in proof theory in its own right, or as a
part of a broader interest in logic. It’s another thing entirely to think
that proof theory has a role in philosophy. Why would a philosopher
be interested in the theory of proofs? Here are just three examples of
concerns in philosophy where proof theory finds a place.

example 1: meaning. Suppose you want to know when someone is using “or” in the same way that you do. When does “or” in their
vocabulary have the same significance as “or” in yours? One answer
could be given in terms of truth-conditions. The significance of “or”
can be given as follows:

⌜p or q⌝ is true if and only if ⌜p⌝ is true or ⌜q⌝ is true.

Perhaps you have seen this information presented in a truth-table.

p   q   p or q
0   0     0
0   1     1
1   0     1
1   1     1

Clearly, this table can be used to distinguish some uses of disjunctive vocabulary from others. We can use it to rule out exclusive disjunction. If we take ⌜p or q⌝ to be false when we take ⌜p⌝ and ⌜q⌝ to be both true, then we are using “or” in a manner that is at odds with the truth table.
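The difference can be checked mechanically. Here is a small illustration (in Python, mine rather than the book's) that tabulates inclusive and exclusive “or” and finds where they disagree:

```python
# Tabulate inclusive and exclusive "or" over the four truth-value rows.
rows = [(p, q) for p in (0, 1) for q in (0, 1)]
inclusive = {(p, q): int(p or q) for p, q in rows}   # the table above
exclusive = {(p, q): p ^ q for p, q in rows}         # exclusive disjunction
# The two tables disagree only on the row where both disjuncts are true:
diff = [pq for pq in rows if inclusive[pq] != exclusive[pq]]
print(diff)  # [(1, 1)]
```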
However, what can we say of someone who is ignorant of the truth or falsity of ⌜p⌝ and of ⌜q⌝? What does the truth table tell us about ⌜p or q⌝ in that case? It seems that the application of the truth table to our practice is less than straightforward.
It is for reasons like this that people have considered an alternate ex-
planation of a logical connective such as “or.” Perhaps we can say that someone is using “or” in the way that you do if you are disposed to make the following deductions. To reason to a disjunction:

      p               q
   --------        --------
   p or q          p or q

and to reason from a disjunction:

                [p]    [q]
                 ·      ·
                 ·      ·
   p or q        r      r
   ------------------------
              r
That is, you are prepared to infer to a disjunction on the basis of either
disjunct; and you are prepared to reason by cases from a disjunction.
Is there any more you need to do to fix the use of “or”? That is, if you
and I both use “or” in a manner consonant with these rules, then is
there any way that our usages can differ with respect to meaning?
Clearly, this is not the end of the story. Any proponent of a proof-
first explanation of the meaning of a word such as “or” will need to
say something about what it is to accept an inference rule, and what
sorts of inference rules suffice to define a concept such as disjunction
(or negation, or universal quantification, and so on). When does a
definition work? What are the sorts of things that can be defined using
inference rules? What are the sorts of rules that may be used to define
these concepts? We will consider these issues in Chapter 3.

example 2: generality. It is a commonplace that it is impossible or very difficult to prove a nonexistence claim. After all, if there is no
object with property F, then every object fails to have property F. How
can we demonstrate that every object in the entire universe has some
property? Surely we cannot survey each object in the universe one-by-
one. Furthermore, even if we come to believe that object a has prop-
erty F for each object a that happens to exist, it does not follow that
we ought to believe that every object has that property. The univer-
sal judgement tells us more than the truth of each particular instance
of that judgement, for given all of the objects a1 , a2 , . . ., it certainly
seems possible that a1 has property F, that a2 has property F and so
on, without everything having property F since it seems possible that
there might be some new object which does not actually exist. If you
care to talk of ‘facts’ then we can express the matter by saying that the
fact that everything is F cannot amount to just the fact that a1 is F and
the fact that a2 is F, etc., it must also include the fact that a1 , a2 , . . .
are all of the objects. There seems to be some irreducible universality
in universal judgements.
If this was all that we could say about universality, then it would
seem to be very difficult to come to universal conclusions. However, we
manage to derive universal conclusions regularly. Consider mathematics: it is not difficult to prove that every whole number is either even or odd. We can do this without examining every number individually.
Just how do we do this?


example 3: modality. A third example is similar. Philosophical discussion is full of talk of possibility and necessity. What is the significance of this talk? What is its logical structure? One way to give
an account of the logical structure of possibility and necessity talk is
to analyse it in terms of possible worlds. To say that it is possible
that Australia win the World Cup is to say that there is some possible
world in which Australia wins the World Cup. Talk of possible worlds
helps clarify the logical structure of possibility and necessity. It is pos-
sible that either Australia or New Zealand win the World Cup only
if there’s a possible world in which either Australia or New Zealand
win the World Cup. In other words, either there’s a possible world in
which Australia wins, or a possible world in which New Zealand wins,
and hence, it is either possible that Australia wins the World Cup or
that New Zealand wins. We have reasoned from the possibility of a dis-
junction to the disjunction of the corresponding possibilities. Such an
inference seems correct. Is talk of possible worlds required to explain
this kind of step, or is there some other account of the logical structure
of possibility and necessity?

These are three examples of the kinds of issues that we will consider in
the light of proof theory. Before we can broach these topics, we need
to learn some proof theory. We will start with proofs for conditional
judgements.

part i
propositional logic


2 | propositional logic: tools & techniques
2.1 | natural deduction for conditionals
We start with modest ambitions. In this section we focus on one way of understanding inference and proof—natural deduction, in the style of Gentzen [33]—and we will consider just one kind of judgement: conditionals. Conditional judgements are judgements of the form

If . . . then . . .

[Marginal note: Gerhard Gentzen, German logician: born 1909, student of David Hilbert at Göttingen, died in 1945 in World War II. http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Gentzen.html]
To make things precise, we will use a formal language in which we can
express conditional judgements. Our language will have an unending
supply of atomic formulas
p, q, r, p0 , p1 , p2 , . . . q0 , q1 , q2 , . . . r0 , r1 , r2 , . . .

When we need to refer to the collection of all atomic formulas, we will


call it ‘atom.’ Whenever we have two formulas A and B, whether A
and B are in atom or not, we will say that (A → B) is also a formula.
Succinctly, this grammar can be represented as follows:

formula ::= atom | (formula → formula)

[Marginal note: This is bnf, or “Backus Naur Form,” first used in the specification of formal computer programming languages such as algol. http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html]

That is, a formula is either an atom, or is found by placing an arrow (‘→’) between two formulas, and surrounding the result with parentheses. So, these are formulas:

p3    (q → r)    ((p1 → (q1 → r1 )) → (q1 → (p1 → r1 )))    (p → (q → (r → (p1 → (q1 → r1 )))))
but these are not:

t    p → q → r    p → p

The first, t, fails to be a formula since it is not in our set atom of atomic formulas (so it doesn’t enter the collection of formulas by way of being an atom) and it does not contain an arrow (so it doesn’t enter the collection through the clause for complex formulas). The second, p → q → r, does not enter the collection because it is short of a few parentheses. The only expressions that enter our language are those that bring a pair of parentheses along with every arrow: “p → q → r” has two arrows but no parentheses, so it does not qualify. You can see why it should be excluded: the expression is ambiguous. Does it express the conditional judgement to the effect that if p then if q then r, or is it the judgement that if it’s true that if p then q, then it’s also true that r? In other words, it is ambiguous between these two formulas:

(p → (q → r))    ((p → q) → r)

[Marginal note: You can do without parentheses if you use ‘prefix’ notation for the conditional: ‘Cpq’ instead of ‘p → q’. The conditionals are then CpCqr and CCpqr. This is Polish notation.]

Our last example of an offending formula—p → p—does not offend nearly so much. It is not ambiguous. It merely offends against the letter of the law laid down, and not its spirit. I will feel free to use expressions such as “p → p” or “(p → q) → (q → r)” which are missing their outer parentheses, even though they are, strictly speaking, not in formula.

[Marginal note: If you like, you can think of them as including their outer parentheses very faintly, like this: ((p → q) → (q → r)).]

Given a formula containing at least one arrow, such as (p → q) → (q → r), it is important to be able to isolate its main connective (the last arrow introduced as it was constructed). In this case, it is the middle arrow. The formula to the left of the arrow (in this case p → q) is said to be the antecedent of the conditional, and the formula to the right is the consequent (here, q → r).
We can think of these formulas in at least two different ways. We can
think of them as the sentences in a toy language. This language is
either something completely separate from our natural languages, or
it is a fragment of a natural language, consisting only of atomic ex-
pressions and the expressions you can construct using a conditional
construction like “if . . . then . . . ” On the other hand, you can think
of formulas as not constituting a language themselves, but as construc-
tions used to display the form of expressions in a language. Nothing
here will stand on which way you understand formulas. In either case,
we use the conditional p → q to represent the conditional proposition
with antecedent p and consequent q.
Sometimes, we will want to talk quite generally about all formulas of
a particular form. We will want to do this very often, when it comes
to logic, because we are interested in the structures or forms of valid
arguments. The structural or formal features of arguments apply gen-
erally, to more than just a particular argument. (If we know that an
argument is valid in virtue of its possessing some particular form, then
other arguments with that form are valid as well.) So, these formal or
structural principles must apply generally. Our formal language goes
some way to help us express this, but it will turn out that we will not
want to talk merely about specific formulas in our language, such as
(p3 → q7 ) → r26 . We will, instead, want to say things like

Given a conditional formula, and its antecedent, its consequent follows.

This can get very complicated very quickly. It is not at all convenient
to say

Given a conditional formula whose consequent is also a conditional, the conditional formula whose antecedent is the antecedent of the consequent of the original conditional, and whose consequent is a conditional whose antecedent is the antecedent of the original conditional and whose consequent is the consequent of the conditional inside the first conditional follows from the original conditional.


Instead of that mouthful, we will use variables to talk generally about formulas in much the same way that mathematicians use variables to talk generally about numbers and other such things. We will use capital letters like

A, B, C, D, . . .

as variables ranging over the class formula. So, instead of the long paragraph above, it suffices to say

From A → (B → C) you can infer B → (A → C).

which seems much more perspicuous and memorable. Now we have the raw formal materials to address the question of deduction using conditional judgements. How may we characterise valid reasoning using conditional constructions? We will look at one way of addressing this topic in this section.

[Marginal note: Number theory books don’t often include lots of numerals. Instead, they’re filled with variables like ‘x’ and ‘y.’ This isn’t because these books aren’t about numbers. They are, but they don’t merely list particular facts about numbers. They talk about general features of numbers, and hence the variables.]

2.1.1 | proofs for conditionals


Start with a piece of reasoning using conditional judgements. One
example might be this:
Suppose A → (B → C). Suppose A. It follows that B → C.
Suppose B. It follows that C.
This kind of reasoning has two important features. We make suppos-
itions or assumptions. We also infer from these assumptions. From
A → (B → C) and A we inferred B → C. From this new information,
together with the supposition that B, we inferred a new conclusion, C.
One way to represent the structure of this piece of reasoning is in the tree diagram shown here:

   A → (B → C)    A
   -----------------
         B → C           B
         -----------------
                 C
The leaves of the tree are the formulas A → (B → C), A and B. They
are the assumptions upon which the deduction rests. The other formu-
las in the tree are deduced from formulas occurring above them in the
tree. The formula B → C is written immediately below a line, above
which are the formulas from which we deduced it. So, B → C follows
from the leaves A → (B → C) and A. Then the root of the tree (the for-
mula at the bottom), C, follows from that formula B → C and the other
leaf B. The ordering of the formulas bears witness to the relationships
of inference between those formulas in our process of reasoning.
The two steps in our example proof use the same kind of reasoning: the inference from a conditional, together with its antecedent, to its consequent. This step is called modus ponens.1 It’s easy to see that
1 “Modus ponens” is short for “modus ponendo ponens,” which means “the
mode of affirming by affirming.” You get to the affirmation of B by way of the
affirmation of A (and the other premise, A → B). It may be contrasted with Modus
tollendo tollens, the mode of denying by denying: from A → B and not B to not A.



using modus ponens we always move from more complicated formu-
las to less complicated formulas. However, sometimes we wish to infer
the conditional A → B on the basis of our information about A and
about B. And it seems that sometimes this is legitimate. Suppose we
want to know about the connection between A and C in a context in
which we are happy to assume both A → (B → C) and B. What kind
of connection is there (if any) between A and C? It would seem that it
would be appropriate to infer A → C, since we have a valid argument
to the conclusion that C if we make the assumption that A.

   A → (B → C)    [A](1)
   ----------------------
          B → C              B
          --------------------
                   C
              ----------- [1]
               A → C
So, it seems we can reason like this. At the step marked with [1], we
make the inference to the conditional conclusion, on the basis of the
reasoning up until that point. Since we can infer to C using A as an
assumption, we can conclude A → C. At this stage of the reasoning, A
is no longer active as an assumption: we discharge it. It is still a leaf
of the tree (there is no node of the tree above it), but it is no longer
an active assumption in our reasoning. So, we bracket it, and annotate
the brackets with a label, indicating the point in the demonstration at
which the assumption is discharged. Our proof now has two assump-
tions, A → (B → C) and B, and one conclusion, A → C.

   A → B    A             [A](i)
   ---------- →E            ·
        B                   ·
                            ·
                            B
                        --------- →I,i
                         A → B

Figure 2.1: natural deduction rules for conditionals

We have motivated two rules of inference. These rules are displayed in Figure 2.1. The first rule, modus ponens, or conditional elimination →E, allows us to step from a conditional and its antecedent to the consequent of the conditional. We call the conditional premise A → B the major premise of the →E inference, and the antecedent A the minor premise of that inference. When we apply the inference →E, we combine two proofs: the proof of A → B and the proof of A. The new proof has as assumptions any assumptions made in the proof of A → B and also any assumptions made in the proof of A. The conclusion is B.

[Marginal note: The major premise in a connective rule features that connective.]
The second rule, conditional introduction →I, allows us to use a proof from A to B as a proof of A → B. The assumption of A is discharged in this step. The proof of A → B has as its assumptions all of the assumptions used in the proof of B except for the instances of A that we discharged in this step. Its conclusion is A → B.

definition 2.1.1 [proofs for conditionals] A proof is a tree, whose nodes are either formulas, or bracketed formulas. The formula at the root of the tree is said to be the conclusion of the proof. The unbracketed formulas at the leaves of the tree are the premises of the proof.

» Any formula A is a proof, with premise A and conclusion A.

» If πl is a proof, with conclusion A → B, and πr is a proof, with conclusion A, then the following tree

      ·          ·
      · πl       · πr
      ·          ·
    A → B        A
    -------------- →E
          B

is a proof with conclusion B, and having the premises consisting of the premises of πl together with the premises of πr.

» If π is a proof, for which A is one of the premises and B is the conclusion, then the following tree

    [A](i)
      ·
      · π
      ·
      B
    -------- →I,i
    A → B

is a proof of A → B.

» Nothing else is a proof.

[Marginal note: How do you choose the number for the label (i) on the discharged formula? Find the largest number labelling a discharge in the proof π, and then choose the next one.]

This is a recursive definition, in just the same manner as the recursive definition of the class formula.

suffixing (inference):

                  A → B    [A](1)
                  --------------- →E
   [B → C](2)           B
   ---------------------- →E
              C
          --------- →I,1
           A → C
   -------------------- →I,2
   (B → C) → (A → C)

assertion (formula):

   [A → B](1)    [A](2)
   -------------------- →E
             B
   -------------------- →I,1
       (A → B) → B
   -------------------- →I,2
   A → ((A → B) → B)

prefixing (formula):

                  [C → A](2)    [C](1)
                  -------------------- →E
   [A → B](3)             A
   ------------------------- →E
                B
            --------- →I,1
             C → B
   -------------------- →I,2
   (C → A) → (C → B)
   ------------------------------ →I,3
   (A → B) → ((C → A) → (C → B))

Figure 2.2: three implicational proofs

Figure 2.2 gives three implicational proofs constructed using our rules. The first is a proof from A → B to (B → C) → (A → C).
This is the inference of suffixing. (We “suffix” both A and B with → C.) The other proofs conclude in formulas justified on the basis of
no undischarged assumptions. It is worth your time to read through
these proofs to make sure that you understand the way each proof is
constructed.
You can try a number of different strategies when making proofs for
yourself. For example, you might like to try your hand at constructing
a proof to the conclusion that B → (A → C) from the assumption
A → (B → C). Here are two ways to piece the proof together.

constructing proofs top-down: You start with the assumptions and see what you can do with them. In this case, with A → (B → C)
you can, clearly, get B → C, if you are prepared to assume A. And
then, with the assumption of B we can deduce C. Now it is clear that
we can get B → (A → C) if we discharge our assumptions, A first, and
then B.

constructing proofs bottom-up: Start with the conclusion, and find what you could use to prove it. Notice that to prove B → (A → C)
you could prove A → C using B as an assumption. Then to prove
A → C you could prove C using A as an assumption. So, our goal is
now to prove C using A, B and A → (B → C) as assumptions. But this
is an easy pair of applications of →E.
I have been intentionally unspecific when it comes to discharging for-
mulas in proofs. In the examples in Figure 2.2 you will notice that at
each step when a discharge occurs, one and only one formula is dis-
charged. By this I do not mean that at each →I step a formula A is
discharged and a different formula B is not. I mean that in the proofs
we have seen so far, at each →I step, a single instance of the formula
is discharged. Not all proofs are like this. Consider this proof from the
assumption A → (A → B) to the conclusion A → B. At the final step
of this proof, two instances of the assumption A are discharged in one
go.
   A → (A → B)    [A](1)
   ---------------------- →E
          A → B               [A](1)
          --------------------------- →E
                      B
                  --------- →I,1
                   A → B
For this to count as a proof, we must read the rule →I as licensing the discharge of one or more instances of a formula in the inference to the conditional. Once we think of the rule in this way, one further generalisation comes to mind: if we think of an →I move as discharging a collection of instances of our assumption, someone of a generalising spirit will ask if that collection can be empty. Can we discharge an assumption that isn’t there? If we can, then this counts as a proof:

      A
   -------- →I,1
    B → A

[Marginal note: “Yesterday upon the stair, I met a man who wasn’t there. He wasn’t there again today. I wish that man would go away.” — Hughes Mearns]


Here, we assume A, and then, we infer B → A discharging all of the


active assumptions of B in the proof at this point. The collection of
active assumptions of B is, of course, empty. No matter, they are all
discharged, and we have our conclusion: B → A.
You might think that this is silly—how, after all, can you discharge
a nonexistent assumption? Nonetheless, discharging assumptions that
are not there plays a role in what will follow. To give you a foretaste
of why, notice that the inference, from A to B → A, is valid if we
read “→” as the material conditional of standard two-valued classical
propositional logic. In a pluralist spirit we will investigate different
policies for discharging formulas. (For more work on a “pluralist
spirit,” see my work with JC Beall [4, 5, 77].)

definition 2.1.2 [discharge policy] A discharge policy may either


allow or disallow duplicate discharges (discharging more than one in-
stance of a formula at once) or vacuous discharges (discharging zero
instances of a formula in a discharge step). We can present names for
the different discharge policies possible in a table.
                    vacuous ok   vacuous not ok
 duplicates ok       Standard      Relevant
 duplicates not ok   “Affine”      Linear

(I am not happy with the label “affine,” but that's what the literature
has given us. Does anyone have any better ideas for this? “Standard” is
not “classical” because it suffices for intuitionistic logic in this
context, not classical logic. It's not “intuitionistic” because
“intuitionistic” is difficult to pronounce, and it is not distinctively
intuitionistic. As we shall see later, it's the shape of proof and not
the discharge policy that gives us intuitionistic implicational logic.)

The “standard” discharge policy is to allow both vacuous and duplicate
discharge. There are reasons to explore each of the different combin-
ations. As I indicated above, you might think that vacuous discharge
is a bit silly. It is not merely silly: it seems downright wrong if you
think that a judgement of the form A → B records the claim that B
may be inferred from A.2 If A is not used in the inference to B, then
we hardly have reason to think that B follows from A in this sense. So,
if you are after a conditional which is relevant in this way, you would
be interested in discharge policies that ban vacuous discharge [1, 2, 71].
There are also reasons to ban duplicate discharge: Victor Pambuc-
cian has found an interesting example of doing without duplicate dis-
charge in early 20th Century geometry [58]. He traces cases where
geometers took care to keep track of the number of times a postulate
was used in a proof. So, they draw a distinction between A → (A → B)
and A → B. We shall see more of the distinctive properties of different
discharge policies as the book progresses.
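The four policies of Definition 2.1.2 can be sketched in code. The book itself contains no programs, so the following Python representation (the names, flags and the function `discharge_ok` are my illustration, not the text's) simply records, for each policy, which discharge counts a single →I step may perform.

```python
# A hypothetical sketch of Definition 2.1.2: each policy is a pair of
# flags saying whether vacuous and duplicate discharge are permitted.

POLICIES = {
    "standard": {"vacuous": True,  "duplicates": True},
    "relevant": {"vacuous": False, "duplicates": True},
    "affine":   {"vacuous": True,  "duplicates": False},
    "linear":   {"vacuous": False, "duplicates": False},
}

def discharge_ok(policy: str, k: int) -> bool:
    """May a single ->I step discharge exactly k instances of a formula?"""
    p = POLICIES[policy]
    if k == 0:
        return p["vacuous"]      # discharging an assumption that isn't there
    if k > 1:
        return p["duplicates"]   # discharging several instances in one go
    return True                  # one instance is fine under every policy

# Linear discharge is exactly "one instance, no more, no fewer":
print([k for k in range(3) if discharge_ok("linear", k)])    # [1]
print([k for k in range(3) if discharge_ok("standard", k)])  # [0, 1, 2]
```

The proof from A → (A → B) to A → B above discharges two instances at once, so it is relevant and standard but neither affine nor linear; the proof of B → A from A makes a vacuous discharge, so it is affine and standard but neither relevant nor linear.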

definition 2.1.3 [discharge in proofs] A proof in which every dis-
charge is linear is a linear proof. Similarly, a proof in which every
discharge is relevant is a relevant proof, and a proof in which every
discharge is affine is an affine proof.
2 You must be careful if you think that more than one discharge policy is ok.
Consider Exercise 19 at the end of this section, in which it is shown that if you
have two conditionals →1 and →2 with different discharge policies, the conditionals
collapse into one (in effect having the most lax discharge policy of either →1 or →2 ).
Consider Exercise 23 to explore how you might have the one logic with more than
one conditional connective.

§2.1 · natural deduction for conditionals 21


Proofs underwrite arguments. If we have a proof from a collection
X of assumptions to a conclusion A, then the argument X ∴ A is valid
by the light of the rules we have used. So, in this section, we will think
of arguments as structures involving a collection of assumptions and
a single conclusion. (We will generalise the notion of an argument
later, in a number of directions. But this notion of argument is suited
to the kind of proof we are considering here.) But what kind of thing
is that collection X? It isn't a set, because the number of premises
makes a difference. (The example here involves linear discharge
policies. We will see later that even when we allow for duplicate
discharge, there is a sense in which the number of occurrences of a
formula in the premises might still matter.) There is a linear proof
from A → (A → B), A, A to B:

A → (A → B) A
→E
A→B A
→E
B

We shall see later that there is no linear proof from A → (A → B), A


to B. The collection appropriate for our analysis at this stage is what
is called a multiset, because we want to pay attention to the number of
times we make an assumption in an argument.

definition 2.1.4 [multiset] Given a class X of objects (such as the class
formula), a multiset M of objects from X is a special kind of collection
of elements of X. For each x in X, there is a natural number fM(x), the
number of occurrences of the object x in the multiset M. The number
fM(x) is sometimes said to be the degree to which x is a member of M.
The multiset M is finite if fM(x) > 0 for only finitely many objects
x. The multiset M is identical to the multiset M′ if and only if fM(x) =
fM′(x) for every x in X.

(If you like, you could define a multiset of formulas to be a function
f : formula → ω from formulas to counting numbers. Then f = g when
f(A) = g(A) for each formula A. f(A) is the number of times A is in the
multiset f. A finite multiset is a multiset f such that f(A) > 0 for only
finitely many objects A.)

Multisets may be presented in lists, in much the same way that sets
can. For example, [1, 2, 2] is the finite multiset containing 1 only once
and 2 twice. [1, 2, 2] = [2, 1, 2], but [1, 2, 2] ≠ [1, 1, 2]. We shall only
consider finite multisets of formulas, and not multisets that contain
other multisets as members. This means that we can do without the
brackets and write our multisets as lists. We will write “A, B, B, C” for
the finite multiset containing B twice and A and C once. The empty
multiset, to which everything is a member to degree zero, is [ ].

definition 2.1.5 [comparing multisets] When M and M′ are multisets
and fM(x) ≤ fM′(x) for each x in X, we say that M is a sub-multiset
of M′, and M′ is a super-multiset of M.
The ground of the multiset M is the set of all objects that are members
of M to a non-zero degree. So, for example, the ground of the multiset
A, B, B, C is the set {A, B, C}.
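Definitions 2.1.4 and 2.1.5 can be sketched directly in code. The book contains no programs; the following Python sketch (my illustration) uses `collections.Counter` as the function fM, so that identity, sub-multiset and ground come out as in the definitions.

```python
# A minimal sketch of multisets (Definitions 2.1.4 and 2.1.5): a Counter
# plays the role of f_M, sending each object to its degree of membership.
from collections import Counter

M  = Counter(["A", "B", "B", "C"])   # the multiset A, B, B, C
M2 = Counter(["B", "A", "C", "B"])   # the order of listing is irrelevant

def sub_multiset(m1: Counter, m2: Counter) -> bool:
    """f_m1(x) <= f_m2(x) for every x."""
    return all(m1[x] <= m2[x] for x in m1)

def ground(m: Counter) -> set:
    """The set of objects that are members to a non-zero degree."""
    return {x for x in m if m[x] > 0}

print(M == M2)                          # True: the counts agree everywhere
print(sub_multiset(Counter("ABB"), M))  # True: A, B, B is a sub-multiset
print(sorted(ground(M)))                # ['A', 'B', 'C']
```

Note that `Counter(["A","B","B","C"]) != Counter(["A","A","B","C"])`, matching [1, 2, 2] ≠ [1, 1, 2] above.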

We use finite multisets as a part of a discriminating analysis of proofs


and arguments. (An even more discriminating analysis will consider
premises to be structured in lists, according to which A, B differs from
B, A. You can examine this in Exercise 24 on page 52.) We have no


need to consider infinite multisets in this section, as multisets repres-


ent the premise collections in arguments, and it is quite natural to con-
sider only arguments with finitely many premises. So, we will consider
arguments in the following way.

definition 2.1.6 [argument] An argument X ∴ A is a structure con-
sisting of a finite multiset X of formulas as its premises, and a single
formula A as its conclusion. The premise multiset X may be empty.
(John Slaney has joked that the empty multiset [ ] should be distin-
guished from the empty set ∅, since nothing is a member of ∅, but
everything is a member of [ ] zero times.) An argument X ∴ A is
standardly valid if and only if there is some proof with undischarged
assumptions forming the multiset X, and with the conclusion A. It is
relevantly valid if and only if there is a relevant proof from the multiset
X of premises to A, and so on.

Here are some features of validity.

lemma 2.1.7 [validity facts] Let v-validity be any of linear, relevant,


affine or standard validity.

1. A ∴ A is valid.
2. X, A ∴ B is v-valid if and only if X ∴ A → B is v-valid.
3. If X, A ∴ B and Y ∴ A are both v-valid, so is X, Y ∴ B.
4. If X ∴ B is affine or standardly valid, so is X, A ∴ B.
5. If X, A, A ∴ B is relevantly or standardly valid, so is X, A ∴ B.

Proof: It is not difficult to verify these claims. The first is given by the
proof consisting of A as premise and conclusion. For the second, take
a proof π from X, A to B, and in a single step →I, discharge the (single
instance of) A to construct the proof of A → B from X. Conversely,
if you have a proof from X to A → B, add a (single) premise A and
apply →E to derive B. In both cases here, if the original proofs satisfy
a constraint (vacuous or multiple discharge) so do the new proofs.
For the third fact, take a proof from X, A to B, find the instance of
the assumption A indicated in the premises, and replace it with the
proof from Y to A. The result is a proof from X, Y to B, as
desired. This proof satisfies the constraints satisfied by both of the
original proofs.
For the fourth fact, if we have a proof π from X to B, we can extend
this as follows
X
·
·π
·
B
→I
A→B A
→E
B
to construct a proof to B involving the new premise A, as well as
the original premises X. The →I step requires a vacuous discharge.
Finally, if we have a proof π from X, A, A to B (that is, a proof with
X and two instances of A as premises to derive the conclusion B) we



discharge the two instances of A to derive A → B, and then reinstate a
single instance of A as a premise to derive B again.

X, [A, A](i)
·
·π
·
B
→I,i
A→B A
→E
B
Now, we might focus our attention on the distinction between those
arguments that are valid and those that are not—to focus on facts about
validity such as those we have just proved. That would be to ignore the
distinctive features of proof theory. We care not only that an argument
is proved, but how it is proved. For each of these facts about validity,
we showed not only the existential fact (for example, if there is a proof
from X, A to B, then there is a proof from X to A → B) but the stronger
and more specific fact (if there is a proof from X, A to B then from this
proof we construct the proof from X to A → B in this uniform way).
» «
It is often a straightforward matter to show that an argument is valid.
Find a proof from the premises to the conclusion, and you are done.
Showing that an argument is not valid seems more difficult. According
to the literal reading of this definition, if an argument is not valid there
is no proof from the premises to the conclusion. So, the direct way to
show that an argument is invalid is to show that it has no proof from
the premises to the conclusion. But there are infinitely many proofs!
You cannot simply go through all of the proofs and check that none
of them are proofs from X to A in order to convince yourself that the
argument is not valid. To accomplish this task, subtlety is called for.
We will end this section by looking at how we might summon up the
required skill.
One subtlety would be to change the terms of discussion entirely,
and introduce a totally new concept. If you could show that all valid
arguments have some special property – and one that is easy to detect
when present and when absent – then you could show that an argu-
ment is invalid by showing it lacks that special property. How this
might manage to work depends on the special property. We shall look
at one of these properties in Section 2.5 when we show that all valid ar-
guments preserve truth in models. Then to show that an argument is
invalid, you could provide a model in which truth is not preserved from
the premises to the conclusion. If all valid arguments are truth-in-a-
model-preserving, then such a model would count as a counterexample
to the validity of your argument.

In this section, on the other hand, we will not go beyond the conceptual


bounds of the study of proof. We will find instead a way to show that
an argument is invalid, using an analysis of proofs themselves. The
collection of all proofs is too large to survey. From premises X and


conclusion A, the collection of direct proofs – those that go straight


from X to A without any detours down byways or highways – might be
more tractable. If we could show that there are not many direct proofs
from a given collection of premises to a conclusion, then we might be
able to exploit this fact to show that for a given set of premises and a
conclusion there are no direct proofs from X to A. If, in addition, you
were to show that any proof from a premise set to a conclusion could
somehow be converted into a direct proof from the same premises to
that conclusion, then you would have success in showing that there is
no proof from X to A.
Happily, this technique works. But to make this work we need to
understand what it is for a proof to be “direct” in some salient sense.
Direct proofs have a name—they are ‘normal’. (I think that the
terminology ‘normal’ comes from Prawitz [63], though the idea comes
from Gentzen.)

2.1.2 | normal proofs


It is best to introduce normal proofs by contrasting them with non-
normal proofs. And non-normal proofs are not difficult to find. Many
proofs are quite strange. Take a proof that concludes with an implic-
ation introduction: it infers from A to B by way of the sub-proof π1 .
Then we discharge the A to conclude A → B. Imagine that at the very
next step, it uses a different proof – call it π2 – with conclusion A to
deduce B by means of an implication elimination. This proof contains
a redundant step. Instead of taking the detour through the formula
A → B, we could use the proof π1 of B, but instead of taking A as an
assumption, we could use the proof of A we have at hand, namely π2 .
The before-and-after comparison is this:

before:

   [A](i)
    · π1
    B
   →I,i
   A → B     A  (proved by π2)
   →E
   B

after:

    · π2
    A
    · π1
    B

The result is a proof of B from the same premises as our original proof.
The premises are the premises of π1 (other than the instances of A that
were discharged in the other proof) together with the premises of π2 .
This proof does not go through the formula A → B, so it is, in a sense,
simpler.
Well . . . there are some subtleties with counting, as usual with our
proofs. If the discharge of A was vacuous, then we have nowhere to
plug in the new proof π2 , so the premises of π2 don’t appear in the
final proof. On the other hand, if a number of duplicates of A were
discharged, then the new proof will contain that many copies of π2 ,
and hence, that many copies of the premises of π2 . Let’s make this
discussion more explicit, by considering an example where π1 has two
instances of A in the premise list. The original proof containing the



introduction and then elimination of A → B is

A → (A → B) [A](1)
→E
A→B [A](1) [A](2)
→E →I,2
B (A → A) → A A → A
→I,1 →E
A→B A
→E
B
We can cut out the →I/→E pair (we call such pairs indirect pairs)
using the technique described above: we place a copy of the inference
to A at both places where the A is discharged (with label 1). The result is
this proof, which does not make that detour.

[A](2)
→I,2
(A → A) → A A → A [A](2)
→E →I,2
A → (A → B) A (A → A) → A A → A
→E →E
A→B A
→E
B
which is a proof from the same premises (A → (A → B) and (A →
A) → A) to the same conclusion B, except for multiplicity. In this
proof the premise (A → A) → A is used twice instead of once. (Notice
too that the label ‘2’ is used twice. We could relabel one subproof to
A → A to use a different label, but there is no ambiguity here because
the two proofs to A → A do not overlap. Our convention for labelling
is merely that at the time we get to an →I label, the numerical tag is
unique in the proof above that step.)
We have motivated the concept of normality. Here is the formal defin-
ition:

definition 2.1.8 [normal proof] A proof is normal if and only if the


concluding formula A → B of an →I step is not at the same time the
major premise of an →E step.

definition 2.1.9 [indirect pair; detour formula] If a formula A →


B introduced in an →I step is at the same time the major premise of
an →E step, then we shall call this pair of inferences an indirect pair
and we will call the instance A → B in the middle of this indirect pair
a detour formula in the proof.

So, a normal proof is one without any indirect pairs. It has no detour
formulas.
Normality is not only important for proving that an argument is in-
valid by showing that it has no normal proofs. The claim that every
valid argument has a normal proof could well be vital. If we think of
the rules for conditionals as somehow defining the connective, then
proving something by means of a roundabout →I/→E step that you


cannot prove without it would seem to be quite illicit. If the condi-


tional is defined by way of its rules then it seems that the things one
can prove from a conditional ought to be merely the things one can
prove from whatever it was you used to introduce the conditional. If
we could prove more from a conditional A → B than one could prove
on the basis of the information used to introduce the conditional, then
we are conjuring new arguments out of thin air.
For this reason, many have thought that being able to convert non-
normal proofs to normal proofs is not only desirable, it is critical if the
proof system is to be properly logical. We will not continue in this
philosophical vein here. We will take up this topic in a later section,
after we understand the behaviour of normal proofs a little better. Let
us return to the study of normal proofs.
Normal proofs are, intuitively at least, proofs without a kind of redund-
ancy. It turns out that avoiding this kind of redundancy in a proof
means that you must avoid another kind of redundancy too. A nor-
mal proof from X to A may use only a very restricted repertoire of
formulas. It will contain only the subformulas of X and A.

definition 2.1.10 [subformulas and parse trees] The parse tree for
an atom is that atom itself. The parse tree for a conditional A → B
is the tree containing A → B at the root, connected to the parse tree
for A and the parse tree for B. The subformulas of a formula A are
those formulas found in A’s parse tree. We let sf(A) be the set of all
subformulas of A. sf(p) = {p}, and sf(A → B) = {A → B}∪ sf(A)∪ sf(B).
To generalise, when X is a multiset of formulas, we will write sf(X) for
the set of subformulas of each formula in X.

Here is the parse tree for (p → q) → ((q → r) → p):


q r
p q q→r p
p→q (q → r) → p
(p → q) → ((q → r) → p)

So, sf((p → q) → ((q → r) → p)) = {(p → q) → ((q → r) → p), p →


q, p, q, (q → r) → p, q → r, r}.
We may prove the following theorem.

theorem 2.1.11 [the subformula theorem] Each normal proof from


the premises X to the conclusion A contains only formulas in sf(X, A).

Notice that this is not the case for non-normal proofs. Consider the
following circuitous proof from A to A.

[A](1)
→I,1
A→A A
→E
A



Here A → A is in the proof, but it is not a subformula of the premise
(A) or the conclusion (also A).
The subformula property for normal proofs goes some way to re-
assure us that a normal proof is direct. A normal proof from X to A
cannot stray so far away from the premises and the conclusion so as to
incorporate material outside X and A.

Proof: To prove the subformula theorem, we need to look carefully


at how proofs are constructed. If π is a normal proof, then it is con-
structed in exactly the same way as all proofs are, but the fact that the
proof is normal gives us some useful information. By the definition of
proofs, π either is a lone assumption, or π ends in an application of →I,
or it ends in an application of →E. Assumptions are the basic building
blocks of proofs. We will show that assumption-only proofs have the
subformula property, and then, also show on the assumption that the
proofs we have on had have the subformula property, then the normal
Notice that the subproofs of nor- proofs we construct from them also have the property. Then it will fol-
mal proofs are normal. If a sub- low that all normal proofs have the subformula property, because all
proof of a proof contains an indirect
pair, then so does the larger proof. of the normal proofs can be generated in this way.

assumption A sole assumption, considered as a proof, satisfies the


subformula property. The assumption A is the only constituent of the
proof and it is both a premise and the conclusion.

introduction In the case of →I, π is constructed from another nor-
mal proof π′ from X to B, with the new step added on (and with the
discharge of a number – possibly zero – of assumptions). π is a proof
from X′ to A → B, where X′ is X with the deletion of some number of
instances of A. Since π′ is normal, we may assume that every formula
in π′ is in sf(X, B). Notice that sf(X′, A → B) contains every element of
sf(X, B), since X differs only from X′ by the deletion of some instances
of A. So, every formula in π (namely, those formulas in π′, together
with A → B) is in sf(X′, A → B) as desired.

elimination In the case of →E, π is constructed out of two normal


proofs: one (call it π1 ) to the conclusion of a conditional A → B from
premises X, and the other (call it π2 ) to the conclusion of the antecedent
of that conditional A from premises Y . Both π1 and π2 are normal, so
we may assume that each formula in π1 is in sf(X, A → B) and each
formula in π2 is in sf(Y, A). We wish to show that every formula in π
is in sf(X, Y, B). This seems difficult (A → B is in the proof—where can
it be found inside X, Y or B?), but we also have some more information:
π1 cannot end in the introduction of the conditional A → B. So, π1 is
either the assumption A → B itself (in which case Y = A → B, and
clearly in this case each formula in π is in sf(X, A → B, B)) or π1 ends
in a →E step. But if π1 ends in an →E step, the major premise of that
inference is a formula of the form C → (A → B). So π1 contains the
formula C → (A → B), so whatever list Y is, C → (A → B) ∈ sf(Y, A),


and so, A → B ∈ sf(Y). In this case too, every formula in π is in


sf(X, Y, B), as desired.
This completes the proof of our theorem. Every normal proof is con-
structed from assumptions by introduction and elimination steps in
this way. The subformula property is preserved through each step of
the construction.

Normal proofs are useful to work with. Even though an argument


might have very many proofs, it will have many fewer normal proofs,
and we can exploit this fact.

example 2.1.12 [no normal proofs] There is no normal proof from p


to q. There is no normal relevant proof from p → r to p → (q → r).

Proof: Normal proofs from p to q (if there are any) contain only for-
mulas in sf(p, q): that is, they contain only p and q. That means they
contain no →I or →E steps, since they contain no conditionals at all.
It follows that any such proof must consist solely of an assumption.
As a result, the proof cannot have a premise p that differs from the
conclusion q. There is no normal proof from p to q.
Consider the second example: If there is a normal proof of p → (q →
r), from p → r, it must end in an →I step, from a normal (relevant)
proof from p → r and p to q → r. Similarly, this proof must also end
in an →I step, from a normal (relevant) proof from p → r, p and q to r.
Now, what normal relevant proofs can be found from p → r, p and q to
r? There are none! Any such proof would have to use q as a premise
somewhere, but since it is normal, it contains only subformulas of p →
r, p, q and r—namely those formulas themselves. There is no formula
involving q other than q itself on that list, so there is nowhere for q
to go. It cannot be used, so it will not be a premise in the proof. There
is no normal relevant proof from the premises p → r, p and q to the
conclusion r.

These facts are interesting enough. It would be more productive, how-


ever, to show that there is no proof at all from p to q, and no relevant
proof from p → r to p → (q → r). We can do this if we have some
way of showing that if we have a proof for some argument, we have a
normal proof for that argument.
So, we now work our way towards the following theorem:

theorem 2.1.13 [normalisation theorem] A proof π from X to A re-
duces in some number of steps to a proof π′ from X′ to A.
If π is linear, so is π′, and X = X′. If π is affine, so is π′, and X′ is
a sub-multiset of X. If π is relevant, then so is π′, and X′ covers the
same ground as X, and is a super-multiset of X. If π is standard, then
so is π′, and X′ covers no more ground than X.

([1, 2, 2, 3] covers the same ground as – and is a super-multiset of –
[1, 2, 3]. And [2, 2, 3, 3] covers no more ground than [1, 2, 3].)
Notice how the premise multiset of the normal proof is related to the
premise multiset of the original proof. If we allow duplicate discharge,
then the premise multiset may contain formulas to a greater degree
than in the original proof, but the normal proof will not contain any
new premises. If we allow vacuous discharge, then the normal proof
might contain fewer premises than the original proof.
The normalisation theorem mentions the notion of reduction, so
let us first define it.
definition 2.1.14 [reduction] A proof π reduces to π′ (π ⇝ π′) if
some indirect pair in π is eliminated in the usual way:

before:

   [A](i)
    · π1
    B
   →I,i
   A → B     A  (proved by π2)
   →E
   B
    ·
    C

after:

    · π2
    A
    · π1
    B
    ·
    C

If there is no π′ such that π ⇝ π′, then π is normal. If π0 ⇝ π1 ⇝
··· ⇝ πn we write “π0 ⇝∗ πn” and we say that π0 reduces to πn in a
number of steps. (We allow that π ⇝∗ π. A proof reduces to itself in
zero steps.) We aim to show that for any proof π, there is some
normal π∗ such that π ⇝∗ π∗.
The only difficult part in proving the normalisation theorem is show-
ing that the process of reduction can terminate in a normal proof. In the
case where we do not allow duplicate discharge, there is no difficulty at
all.
Proof [Theorem 2.1.13: linear and affine cases]: If π is a linear proof,
or is an affine proof, then whenever you pick an indirect pair and nor-
malise it, the result is a shorter proof. At most one copy of the proof
π2 for A is inserted into the proof π1 . (Perhaps no substitution is made
in the case of an affine proof, if a vacuous discharge was made.) Proofs
have some finite size, so this process cannot go on indefinitely. Keep de-
leting indirect pairs until there are no pairs left to delete. The result is a
normal proof to the conclusion A. The premises X remain undisturbed,
except in the affine case, where we may have lost premises along the
way. (An assumption from π2 might disappear if we did not need to
make the substitution.) In this case, the premise multiset X′ from the
normal proof is a sub-multiset of X, as desired.
If we allow duplicate discharge, however, we cannot be sure that in
normalising we go from a larger to a smaller proof. The example on
page 26 goes from a proof with 11 formulas to another proof with 11
formulas. The result is no smaller, so size is no guarantee that the
process terminates.
To gain some understanding of the general process of transforming a
non-normal proof into a normal one, we must find some other measure


that decreases as normalisation progresses. If this measure has a least
value then we can be sure that the process will stop. (Well, the process
stops if the measures are ordered appropriately—so that there's no
infinitely descending chain.) The appropriate measure in this case will
not be too difficult to find. Let's look at a part of the process of
normalisation: the complexity of the formula that is normalised.

definition 2.1.15 [complexity] A formula’s complexity is the number


of connectives in that formula. In this case, it is the number of in-
stances of ‘→’ in the formula.

The crucial features of complexity are that each formula has a finite
complexity, and that the proper subformulas of a formula each have a
lower complexity than the original formula. This means that complex-
ity is a good measure for an induction, like the size of a proof.
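Under the same tuple representation of formulas assumed earlier (my illustration, not the book's), Definition 2.1.15 is a one-line recursion, and it is immediate that proper subformulas are strictly less complex.

```python
# A sketch of Definition 2.1.15: a formula's complexity is the number
# of occurrences of '->' it contains. Atoms are strings; a conditional
# A -> B is the tuple ("->", A, B).

def complexity(formula) -> int:
    if isinstance(formula, str):    # an atom has no connectives
        return 0
    _, a, b = formula
    return 1 + complexity(a) + complexity(b)

f = ("->", ("->", "p", "q"), "r")     # (p -> q) -> r
print(complexity(f))                  # 2
print(complexity(("->", "p", "q")))   # 1: a proper subformula, less complex
```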
Now, suppose we have a proof containing just one indirect pair, intro-
ducing and eliminating A → B, and suppose that otherwise, π1 (the
proof of B from A) and π2 (the proof of A) are normal.

before:

   [A](i)
    · π1
    B
   →I,i
   A → B     A  (proved by π2)
   →E
   B

after:

    · π2
    A
    · π1
    B
Unfortunately, the new proof is not necessarily normal. The new proof
is non-normal if π2 ends in the introduction of A, while π1 starts off
with the elimination of A. Notice, however, that the non-normality of
the new proof is, somehow, smaller. There is no non-normality with
respect to A → B, or any other formula that complex. The potential
non-normality is with respect to a subformula A. This result would
still hold if the proofs π1 and π2 weren't normal themselves, but merely
had →I/→E pairs only for formulas less complex than A → B.
If A → B is the most complex detour formula in the original proof,
then the new proof has a smaller most complex detour formula.

definition 2.1.16 [non-normality] The non-normality measure of a
proof is a sequence ⟨c1, c2, . . . , cn⟩ of numbers such that ci is the num-
ber of indirect pairs of formulas of complexity i. The sequence for a
proof stops at the last non-zero value. Sequences are ordered with their
last number as most significant. That is, ⟨c1, . . . , cn⟩ > ⟨d1, . . . , dm⟩ if
and only if n > m, or if n = m, when cn > dn, or if cn = dn, when
⟨c1, . . . , cn−1⟩ > ⟨d1, . . . , dn−1⟩.

Non-normality measures satisfy the finite descending chain condition.
Starting at any particular measure, you cannot find any infinite des-
cending chain of measures below it. There are infinitely many meas-
ures smaller than ⟨0, 1⟩ (in this case, ⟨0⟩, ⟨1⟩, ⟨2⟩, . . .). However, to form
a descending sequence from ⟨0, 1⟩ you must choose one of these as
your next measure. Say you choose ⟨500⟩. From that, you have only
finitely many (500, in this case) steps until ⟨⟩. This generalises. From
the sequence ⟨c1, . . . , cn⟩, you lower cn until it gets to zero. Then
you look at the index for n − 1, which might have grown enormously.
Nonetheless, it is some finite number, and now you must reduce this
value. And so on, until you reach the last quantity, and from there, the
empty sequence ⟨⟩. Here is an example sequence using this ordering:
⟨3, 2, 30⟩ > ⟨2, 8, 23⟩ > ⟨1, 47, 15⟩ > ⟨138, 478⟩ > · · · > ⟨3088, 1⟩ >
⟨314159⟩ > · · · > ⟨1⟩ > ⟨⟩.
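This ordering can be sketched as a comparison function (`measure_gt` is my name; the book contains no code): a longer sequence always exceeds a shorter one, and sequences of equal length are compared from their last entry backwards.

```python
# A sketch of the ordering on non-normality measures in
# Definition 2.1.16: the LAST entry is the most significant.

def measure_gt(c, d):
    """Is non-normality measure c strictly greater than measure d?"""
    if len(c) != len(d):
        return len(c) > len(d)
    for ci, di in zip(reversed(c), reversed(d)):
        if ci != di:
            return ci > di
    return False  # the sequences are equal

print(measure_gt((3, 2, 30), (2, 8, 23)))   # True: 30 > 23 in the last place
print(measure_gt((1, 47, 15), (138, 478)))  # True: length 3 beats length 2
print(measure_gt((0, 1), (500,)))           # True: the step from <0,1> to <500>
print(measure_gt((314159,), ()))            # True: anything beats the empty sequence
```

Because the last entry is most significant, normalising a most-complex detour lowers the measure even when the counts at lower complexities grow enormously.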
lemma 2.1.17 [non-normality reduction] Any proof with an in-
direct pair reduces in one step to some proof with a lower measure of
non-normality.
Proof: Choose a detour formula in π of greatest complexity (say n),
such that its proof contains no other detour formulas of complexity
n. Normalise that proof. The result is a proof π′ with fewer detour
formulas of complexity n (and perhaps many more of n − 1, etc.). So,
it has a lower non-normality measure.
Now we have a proof of our normalisation theorem.
Proof [of Theorem 2.1.13: relevant and standard case]: Start with π, a
proof that isn’t normal, and use Lemma 2.1.17 to choose a proof π 0
with a lower measure of non-normality. If π 0 is normal, we’re done. If
it isn’t, continue the process. There is no infinite descending chain of
non-normality measures, so this process will stop at some point, and
the result is a normal proof.
Every proof may be transformed into a normal proof. If there is a lin-
ear proof from X to A then there is a normal linear proof from X to
A. Linear proofs are satisfying and strict in this manner. If we allow
vacuous discharge or duplicate discharge, matters are not so straight-
forward. For example, there is a non-normal standard proof from p, q
to p:
p
→I,1
q→p q
→E
p
but there is no normal standard proof from exactly these premises to
the same conclusion, since any normal proof from atomic premises to
an atomic conclusion must be an assumption alone. We have a normal
proof from p to p (it is very short!), but there is no normal proof from
p to p that involves q as an extra premise.
Similarly, there is a relevant proof from p → (p → q), p to q, but
it is non-normal.
p → (p → q)   [p](1)
--------------------- →E
      p → q       [p](1)
      ------------------- →E
              q
          --------- →I,1
           p → q         p
          ------------------ →E
                  q

32 propositional logic: tools & techniques · chapter 2


[september 18, 2006]

There is no normal relevant proof from p → (p → q), p to q. Any


normal relevant proof from p → (p → q) and p to q must use →E to
deduce p → q, and then the only other possible move is either →I (in
which case we return to p → (p → q) none the wiser) or we perform
another →E with another assumption p to deduce q, and we are done.
Alas, we have claimed two undischarged assumptions of p. In the non-
linear cases, the transformation from a non-normal to a normal proof
does damage to the number of times a premise is used.
» «
It is very tempting to view normalisation as a way of eliminating redundancies and making explicit the structure of a proof. However, if that is the case, then it should be the case that the process of normalisation cannot give us two distinct “answers” for the structure of the one proof. Can two different reduction sequences for a single proof result in different normal proofs? To investigate this, we need one more notion of reduction.

[This passage is the hardest part of Section 2.1. Feel free to skip over the proofs of theorems in this section, until page 38, on a first reading.]
definition 2.1.18 [parallel reduction] A proof π parallel reduces to π′ if some number of indirect pairs in π are eliminated in parallel. We write “π ⇛ π′.”

For example, consider the proof with the following two detour formu-
las marked:
A → (A → B)   [A](1)
--------------------- →E
      A → B       [A](1)                  [A](2)
      ------------------- →E             -------- →I,2
              B                          A → A        A
          --------- →I,1                 --------------- →E
            A → B                              A
            ----------------------------------- →E
                           B
To process them we can take them in any order. Eliminating the A → B,
we have
               [A](2)
              -------- →I,2
              A → A        A
              --------------- →E
A → (A → B)         A                    [A](2)
----------------------- →E              -------- →I,2
        A → B                           A → A        A
                                        --------------- →E
                                              A
        -------------------------------------- →E
                         B
which now has two copies of the A → A to be reduced. However, these
copies do not overlap in scope (they cannot, as they are duplicated in
the place of assumptions discharged in an eliminated →I rule) so they
can be processed together. The result is the proof
A → (A → B)      A
-------------------- →E
       A → B             A
       -------------------- →E
               B



You can check that if you had processed the formulas to be eliminated
in the other order, the result would have been the same.
lemma 2.1.19 [diamond property for ⇛] If π ⇛ π1 and π ⇛ π2 then there is some π′ where π1 ⇛ π′ and π2 ⇛ π′.

Proof [sketch]: Take the detour formulas in π that are eliminated in the move to π1 or to π2. For those not eliminated in the move to π1, mark their corresponding occurrences in π1. Similarly, mark the occurrences of formulas in π2 that are detour formulas in π that are eliminated in the move to π1. Now eliminate the marked formulas in π1 and those in π2 to produce the proof π′.

[To do: This proof sketch should be made more precise.]

theorem 2.1.20 [only one normal form] Any sequence of reduction


steps from a proof π that terminates, terminates in a unique normal
proof π∗ .

Proof: Suppose that π ⇒∗ π′, and π ⇒∗ π″. It follows that we have two reduction sequences

  π ⇛ π′1 ⇛ π′2 ⇛ · · · ⇛ π′n = π′
  π ⇛ π″1 ⇛ π″2 ⇛ · · · ⇛ π″m = π″

By the diamond property, we have a π1,1 where π′1 ⇛ π1,1 and π″1 ⇛ π1,1. Then π″1 ⇛ π1,1 and π″1 ⇛ π″2, so by the diamond property there is some π2,1 where π″2 ⇛ π2,1 and π1,1 ⇛ π2,1. Continue in this vein, guided by the picture below (the vertical steps are parallel reductions too):

  π     ⇛   π′1    ⇛   π′2    ⇛   · · ·   ⇛   π′n
  ⇛         ⇛          ⇛                      ⇛
  π″1   ⇛   π1,1   ⇛   π1,2   ⇛   · · ·   ⇛   π1,n
  ⇛         ⇛          ⇛                      ⇛
  π″2   ⇛   π2,1   ⇛   π2,2   ⇛   · · ·   ⇛   π2,n
  ⇛         ⇛          ⇛                      ⇛
  ..        ..         ..                     ..
  ⇛         ⇛          ⇛                      ⇛
  π″m   ⇛   πm,1   ⇛   πm,2   ⇛   · · ·   ⇛   π∗

to find the desired proof π∗. So, if π′n and π″m are normal they must be identical.

So, sequences of reductions from π cannot terminate in two different


proofs. However, does every reduction process terminate?

definition 2.1.21 [strongly normalising] A proof π is strongly normalising (under a reduction relation ⇒) if and only if there is no infinite reduction sequence starting from π.




We will prove that every proof is strongly normalising under the rela-
tion of deleting detour formulas. To assist in talking about this, we
need to make a few more definitions. First, the reduction tree.
definition 2.1.22 [reduction tree] The reduction tree (under ⇒) of a proof π is the tree whose branches are the reduction sequences on the relation ⇒. So, from the root π we reach any proof accessible in one step from π. From each π′ where π ⇒ π′, we branch similarly. Each node has only finitely many successors, as there are only finitely many detour formulas in a proof. For each proof π, ν(π) is the size of its reduction tree.
lemma 2.1.23 [the size of reduction trees] The reduction tree of a
strongly normalising proof is finite. It follows that not only is every
reduction path finite, but there is a longest reduction path.

Proof: This is a corollary of König's Lemma, which states that every tree in which the number of immediate descendants of a node is finite (it is finitely branching), and in which every branch is finitely long, is itself finite. It follows that any strongly normalising proof not only has only finite reduction paths, it also has a longest reduction path.

[It's true that every finitely branching tree with finite branches is finite. But is it obvious that it's true?]

Now to prove that every proof is strongly normalising. To do this, we define a new property that proofs can have: that of being red. It will turn out that all red proofs are strongly normalising. It will also turn out that all proofs are red.
definition 2.1.24 [red proofs] We define a new predicate ‘red’ applying to proofs in the following way.

[The term ‘red’ should bring to mind ‘reducible.’ This formulation of strong normalisation is originally due to William Tait [86]. I am following the presentation of Jean-Yves Girard [36, 37].]

» A proof of an atomic formula is red if and only if it is strongly normalising.

» A proof π of an implication formula A → B is red if and only if whenever π′ is a red proof of A, then the proof

        ⋮ π          ⋮ π′
      A → B          A
      ------------------- →E
              B

is a red proof of type B.

We will have cause to talk often of the proof found by extending a proof π of A → B and a proof π′ of A to form the proof of B by adding an →E step. We will write ‘(π π′)’ to denote this proof. If you like, you can think of it as the application of the proof π to the proof π′.
Now, our aim will be twofold: to show that every red proof is strongly
normalising, and to show that every proof is red. We start by proving
the following crucial lemma:
lemma 2.1.25 [properties of red proofs] For any proof π, the follow-
ing three conditions hold:



c1 If π is red then π is strongly normalisable.
c2 If π is red and π reduces to π 0 in one step, then π 0 is red too.
c3 If π is a proof not ending in →I, and whenever we eliminate one
indirect pair in π we have a red proof, then π is red too.
Proof: We prove this result by induction on the formula proved by π.
We start with proofs of atomic formulas.
c1 Any red proof of an atomic formula is strongly normalising, by
the definition of ‘red’.

c2 If π is strongly normalising, then so is any proof to which π


reduces.

c3 π does not end in →I, as it is a proof of an atomic formula. Suppose that whenever π ⇒1 π′, the proof π′ is red. Each such π′ is a proof of an atomic formula, so it is strongly normalising. Since any reduction path through π must travel through one such proof π′, each such path through π terminates. So, π is red.
Now we prove the results for a proof π of A → B, under the assumption that c1, c2 and c3 hold for proofs of A and proofs of B. We can then conclude that they hold of all proofs, by induction on the complexity of the formula proved.
c1 If π is a red proof of A → B, consider the proof
          ⋮ π
σ :     A → B        A
        ------------------ →E
               B
The assumption A is a normal proof of its conclusion A not end-
ing in →I, so c3 applies and it is red. So, by the definition of red
proofs of implication formulas, σ is a red proof of B. Condition
c1 tells us that red proofs of B are strongly normalising, so any
reduction sequence for σ must terminate. It follows that any re-
duction sequence for π must terminate too, since if we had a non-
terminating reduction sequence for π, we could apply the same
reductions to the proof σ. But since σ is strongly normalising,
this cannot happen. It follows that π is strongly normalising too.

c2 Suppose that π reduces in one step to a proof π′. Given that π is red, we wish to show that π′ is red too. Since π′ is a proof of A → B, we want to show that for any red proof π″ of A, the proof (π′ π″) is red. But this proof is red since the red proof (π π″) reduces to (π′ π″) in one step (by reducing π to π′), and c2 applies to proofs of B.

c3 Suppose that π does not end in →I, and suppose that all of the proofs reached from π in one step are red. Let σ be a red proof of A. We wish to show that the proof (π σ) is red. By c1 for the formula A, we know that σ is strongly normalising. So, we may reason by induction on the length of the longest reduction path for σ. If σ is normal (with path of length 0), then (π σ) reduces in one step only to (π′ σ), with π′ one step from π. But π′ is red, so (π′ σ) is too.

On the other hand, suppose σ is not yet normal, but the result holds for all σ′ with shorter reduction paths than σ. So, suppose (π σ) reduces to (π σ′), with σ′ one step from σ. σ′ is red by the hypothesis c2 for A, and σ′ has a shorter reduction path, so the induction hypothesis for σ′ tells us that (π σ′) is red.

There is no other possibility for reduction, as π does not end in →I, so reductions must occur wholly in π or wholly in σ, and not in the last step of (π σ).

This completes the proof by induction. The conditions c1, c2 and c3


hold of every proof.

Now we prove one more crucial lemma.

lemma 2.1.26 [red proofs ending in →I] If for each red proof σ of A, the proof

             ⋮ σ
             A
  π(σ) :     ⋮ π
             B

is red, then so is the proof

           [A]
            ⋮ π
  τ :       B
         --------- →I
          A → B
Proof: We show that (τ σ) is red whenever σ is red. This will suffice to show that the proof τ is red, by the definition of the predicate ‘red’ for proofs of A → B. We will show that every proof resulting from (τ σ) in one step is red, and we will reason by induction on the sum of the sizes of the reduction trees of π and σ. There are three cases:

» (τ σ) ⇒ π(σ). In this case, π(σ) is red by the hypothesis of the proof.

» (τ σ) ⇒ (τ′ σ). In this case the sum of the sizes of the reduction trees of τ′ and σ is smaller, and we may appeal to the induction hypothesis.

» (τ σ) ⇒ (τ σ′). In this case the sum of the sizes of the reduction trees of τ and σ′ is smaller, and we may appeal to the induction hypothesis.



theorem 2.1.27 [all proofs are red] Every proof π is red.

lemma 2.1.28 [red proofs by induction] Given proof π with assump-


tions A1 , . . . , An , and any red proofs σ1 , . . . , σn of the respective for-
mulas A1 , . . . , An , it follows that the proof π(σ1 , . . . , σn ) in which each
assumption Ai is replaced by the proof σi is red.

Proof: We prove this by induction on the construction of the proof.


» If π is an assumption A1 , the claim is a tautology (if σ1 is red,
then σ1 is red).
» If π ends in →E, and is (π1 π2), then by the induction hypothesis π1(σ1, . . . , σn) and π2(σ1, . . . , σn) are red. Since π1(σ1, . . . , σn) has type A → B, the definition of redness tells us that whenever it is applied to a red proof the result is also red. So, the proof (π1(σ1, . . . , σn) π2(σ1, . . . , σn)) is red, but this is simply π(σ1, . . . , σn).

» If π ends in an application of →I, the case is dealt with by Lemma 2.1.26: if π is a proof of A → B ending in →I, then we may assume that π′, the proof of B from A inside π, is red, so by Lemma 2.1.26, the result π is red too.

It follows that every proof is red.

It follows also that every proof is strongly normalising, since all red
proofs are strongly normalising.

2.1.3 | proofs and λ-terms


It is very tempting to think of proofs as processes or functions that
convert the information presented in the premises into the informa-
tion in the conclusion. This is doubly tempting when you look at the
notation for implication. In →E we apply something which converts
A to B (a function from A to B?) to something which delivers you
A (from premises) into something which delivers you B. In →I if we
can produce B (when supplied with A, at least in the presence of other
resources—the other premises) then we can (in the context of the other
resources at least) convert As into Bs at will.
Let’s make this talk a little more precise, by making explicit this
kind of function-talk. It will give us a new vocabulary to talk of proofs.
We start with simple notation to talk about functions. The idea is
straightforward. Consider numbers, and addition. If you have a num-
ber, you can add 2 to it, and the result is another number. If you like,
if x is a number then
x+2
is another number. Now, suppose we don't want to talk about a particular number, like 5 + 2 or 7 + 2 or x + 2 for any choice of x, but we want to talk about the operation of adding two. There is a sense in which just writing “x + 2” should be enough to tell someone what we mean. It is relatively clear that we are treating the “x” as a marker for
the input of the function, and “x + 2” is the output. The function is the
output as it varies for different values of the input. Sometimes leaving
the variables there is not so useful. Consider the subtraction

x−y

You can think of this as the function that takes the input value x and takes away y. Or you can think of it as the function that takes the input value y and subtracts it from x. Or you can think of it as the function that takes two input values x and y, and takes the second away from the first. Which do we mean? When we apply this function to the
input value 5, what is the result? For this reason, we have a way of
making explicit the different distinctions: it is the λ-notation, due to
Alonzo Church [17]. The function that takes the input value x and
returns x + 2 is denoted
λx.(x + 2)
The function that takes the input value y and subtracts it from x is

λy.(x − y)

The function that takes two inputs and subtracts the second from the
first is
λx.λy.(x − y)
Notice how this function works. If you feed it the input 5, you get the
output λy.(5 − y). We can write application of a function to its input
by way of juxtaposition. The result is that

(λx.λy.(x − y) 5)

evaluates to the result λy.(5 − y). This is the function that subtracts
y from 5. When you feed this function the input 2 (i.e., you evaluate
(λy.(5 − y) 2)) the result is 5 − 2 — in other words, 3. So, functions can
have other functions as outputs.
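The same moves can be made in any language with first-class functions. A quick sketch in Python, writing λ as lambda (the name subtract is mine):

```python
# λx.λy.(x − y) as nested one-argument functions:
subtract = lambda x: lambda y: x - y

sub_from_5 = subtract(5)   # λy.(5 − y): the output is itself a function
assert sub_from_5(2) == 3  # (λy.(5 − y) 2) evaluates to 5 − 2, i.e. 3
```

Applying the function to its first input returns another function, just as the λ-notation suggests.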
Now, suppose you have a function f that takes two inputs y and z, and
we wish to consider what happens when you apply f to a pair where
the first value is repeated as the second value. (If f is λx.λy.(x − y)
and the input value is a number, then the result should be 0.) We can
do this by applying f to the value x twice, to get ((f x) x). But this is
not a function, it is the result of applying f to x and x. If you consider
this as a function of x you get

λx.((f x) x)

This is the function that takes x and feeds it twice into f. But just as
functions can create other functions as outputs, there is no reason not
to make functions take other functions as inputs. The process here
was competely general — we knew nothing specific about f — so the
function
λy.λx.((y x) x)



takes an input y, and returns the function λx.((y x) x). This function
takes an input x, and then applies y to x and then applies the result to
x again. When you feed it a function, it returns the diagonal of that function.

[Draw the function as a table of values for each pair of inputs, and you will see why this is called the ‘diagonal.’]

Now, sometimes this construction does not work. What if we feed our diagonal function λy.λx.((y x) x) an input that is not a function, or that is a function that does not expect two inputs? (That is, it is not a function that returns another function.) In that case, we may not get a sensible output. One response is to bite the bullet and say that everything is a function, and that we can apply anything to anything else. [This is the untyped λ-calculus.] We won't take that approach here, as things become very interesting if we consider variables (the x and y in the expression λy.λx.((y x) x)) to be typed. We could consider
y to only take inputs which are functions of the right kind. That is, y
is a function that expects values of some kind (let’s say, of type A), and
when given a value, returns a function. In fact, the function it returns
has to be a function that expects values of the very same kind (also
type A). The result is an object (perhaps a function) of some kind or
other (say, type B). In other words, we can say that the variable y takes
values of type A → (A → B). Then we expect the variable x to take
values of type A. We’ll write these facts as follows:
y : A → (A → B) x:A
Now, we may put these two things together, to derive the type of the result of applying the function y to the input value x.
y : A → (A → B)    x : A
-------------------------
     (y x) : A → B

Applying the result to x again, we get

y : A → (A → B)    x : A
-------------------------
     (y x) : A → B           x : A
     ------------------------------
            ((y x) x) : B

Then when we abstract away the particular choice of the input value x, we have this

y : A → (A → B)    [x : A]
---------------------------
     (y x) : A → B           [x : A]
     --------------------------------
            ((y x) x) : B
     --------------------------------
      λx.((y x) x) : A → B

and abstracting away the choice of y, we have

[y : A → (A → B)]    [x : A]
-----------------------------
     (y x) : A → B           [x : A]
     ----------------------------------
            ((y x) x) : B
     ----------------------------------
      λx.((y x) x) : A → B
     ----------------------------------
 λy.λx.((y x) x) : (A → (A → B)) → (A → B)




so the diagonal function λy.λx.((y x) x) has type (A → (A → B)) →


(A → B). It takes functions of type A → (A → B) as input and returns
an output of type A → B.
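The typing just derived can be transcribed into a language with higher-order function types. A sketch using Python's type hints (the name diag is mine):

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# λy.λx.((y x) x), with its type (A → (A → B)) → (A → B) spelled out:
def diag(y: Callable[[A], Callable[[A], B]]) -> Callable[[A], B]:
    return lambda x: y(x)(x)

# Feeding it λx.λy.(x − y) yields the function λx.(x − x):
assert diag(lambda x: lambda y: x - y)(7) == 0
```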
Does that process look like something you have already seen?
We may use these λ-terms to represent proofs. Here are the definitions.
We will first think of formulas as types.

type ::= atom | (type → type)

Then, given the class of types, we can construct terms for each type.

definition 2.1.29 [typed simple λ-terms] The class of typed simple λ-


terms is defined as follows:

» For each type A, there is an infinite supply of variables xA, yA, zA, wA, xA1, xA2, etc.

» If M is a term of type A → B and N is a term of type A, then


(M N) is a term of type B.
» If M is a term of type B then λxA .M is a term of type A → B.

These formation rules for types may be represented in ways familiar


to those of us who care for proofs. See Figure 2.3.

                                 [x : A](i)
                                     ⋮
 M : A → B     N : A               M : B
 -------------------- →E        --------------- →I,i
     (M N) : B                  λx.M : A → B

Figure 2.3: rules for λ-terms

Sometimes we write variables without superscripts, and leave the typing of the variable understood from the context. It is simpler to write λy.λx.((y x) x) instead of λyA→(A→B).λxA.((yA→(A→B) xA) xA).
Not everything that looks like a typed λ-term actually is. Consider
the term
λx.(x x)

There is no such simple typed λ-term. Were there such a term, then x
would have to both have type A → B and type A. But as things stand
now, a variable can have only one type. Not every λ-term is a typed
λ-term.
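One way to see why λx.(x x) cannot be typed is to run a type checker over candidate terms. Here is a minimal sketch (the representation and the helper type_of are mine, not anything from the text): a type is an atom (a string) or a pair ("->", A, B), and every variable carries its single type.

```python
def type_of(term):
    """Return the type of a simple typed λ-term, or raise TypeError.

    Terms: ("var", name, type)
           ("app", M, N)            -- the application (M N)
           ("lam", name, type, M)   -- λx:type. M
    Types: an atom (a string), or ("->", A, B) for A → B.
    """
    tag = term[0]
    if tag == "var":
        return term[2]
    if tag == "app":
        f, a = type_of(term[1]), type_of(term[2])
        if isinstance(f, tuple) and f[0] == "->" and f[1] == a:
            return f[2]
        raise TypeError("ill-typed application")
    if tag == "lam":
        return ("->", term[2], type_of(term[3]))
    raise TypeError("unknown term")

# λy.λx.((y x) x) with y : A → (A → B) and x : A
atob = ("->", "A", ("->", "A", "B"))
diag = ("lam", "y", atob,
        ("lam", "x", "A",
         ("app", ("app", ("var", "y", atob), ("var", "x", "A")),
          ("var", "x", "A"))))
assert type_of(diag) == ("->", atob, ("->", "A", "B"))

# λx.(x x): no single type for x makes the application well typed.
try:
    type_of(("lam", "x", "A",
             ("app", ("var", "x", "A"), ("var", "x", "A"))))
    assert False, "should not be typable"
except TypeError:
    pass
```

Whatever single type we annotate x with, the application (x x) fails the check, matching the observation in the text.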
Now, it is clear that typed λ-terms stand in some interesting rela-
tionship to proofs. From any typed λ-term we can reconstruct a unique



proof. Take λx.λy.(y x), where y has type p → q and x has type p. We can rewrite the unique formation pedigree of the term as a tree:

 [y : p → q]    [x : p]
 -----------------------
       (y x) : q
 -----------------------
 λy.(y x) : (p → q) → q
 --------------------------------
 λx.λy.(y x) : p → ((p → q) → q)

and once we erase the terms, we have a proof of p → ((p → q) →


q). The term is a compact, linear representation of the proof which is
presented as a tree.
The mapping from terms to proofs is many-to-one. Each typed term
constructs a single proof, but there are many different terms for the
one proof. Consider the proofs
p → q     p            p → (q → r)     p
-----------            -----------------
    q                       q → r

we can label them as follows


x : p → q    y : p          z : p → (q → r)    y : p
-------------------         -------------------------
     (xy) : q                     (zy) : q → r

we could combine them into the proof


z : p → (q → r)    y : p         x : p → q    y : p
-------------------------        -------------------
      (zy) : q → r                    (xy) : q
      ------------------------------------------
                 (zy)(xy) : r

but if we wished to discharge just one of the instances of p, we would


have to have chosen a different term for one of the two subproofs. We
could have chosen the variable w for the first p, and used the following
term:
z : p → (q → r)    [w : p]        x : p → q    y : p
---------------------------       -------------------
      (zw) : q → r                     (xy) : q
      ---------------------------------------------
                 (zw)(xy) : r
      ---------------------------------------------
             λw.(zw)(xy) : p → r
So, the choice of variables allows us a great deal of choice in the con-
struction of a term for a proof. The choice of variables both does not
matter (who cares if we replace xA by yA ) and does matter (when it
comes to discharge an assumption, the formulas discharged are exactly
those labelled by the particular free variable bound by λ at that stage).

definition 2.1.30 [from terms to proofs and back] For every typed
term M (of type A), we find proof(M) (of the formula A) as follows:
» proof(xA ) is the identity proof A.




» If proof(MA→B) is the proof π1 of A → B and proof(NA) is the proof π2 of A, then extend them with one →E step into the proof proof((MN)B) of B.

» If proof(MB) is a proof π of B and xA is a variable of type A, then extend the proof π by discharging each premise in π of type A labelled with the variable xA. The result is the proof proof((λx.M)A→B) of type A → B.
Conversely, for any proof π, we find the set terms(π) as follows:
» terms(A) is the set of variables of type A. (Note that the term is
an unbound variable, whose type is the only assumption in the
proof.)
» If πl is a proof of A → B, and M (of type A → B) is a member
of terms(πl ), and N (of type A) is a member of terms(πr ), then
(MN) (which is of type B) is a member of terms(π), where π is
the proof found by extending πl and πr by the →E step. (Note
that if the unbound variables in M have types corresponding to
the assumptions in πl and those in N have types corresponding
to the assumptions in πr , then the unbound variables in (MN)
have types corresponding to the variables in π.)
» Suppose π is a proof of B, and we extend π into the proof π′ by discharging some set (possibly empty) of instances of the formula A, to derive A → B using →I. Then, if M is a member of terms(π) for which a variable x labels all and only those assumptions A that are discharged in this →I step, then λx.M is a member of terms(π′). (Notice that the free variables in λx.M correspond to the remaining active assumptions in π′.)
theorem 2.1.31 [relating proofs and terms] If M ∈ terms(π) then π = proof(M). Conversely, M′ ∈ terms(proof(M)) if and only if M′ is a relabelling of M.

Proof: A simple induction on the construction of π in the first case, and M in the second.

[Todo: write the proof out in full.]

The following theorem shows that the λ-terms of different kinds of


proofs have different features.
theorem 2.1.32 [discharge conditions and λ-terms] A λ-term is a term of a linear proof iff each λ expression binds exactly one variable. It is a term of a relevant proof iff each λ expression binds at least one variable. It is a term of an affine proof iff each λ expression binds at most one variable.
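This condition can be checked mechanically: walk a term and count, for each λ, how many free occurrences of its bound variable appear in its body. A sketch (the untyped term representation and the names are mine):

```python
def free_count(term, name):
    """Number of free occurrences of `name` in an untyped term."""
    tag = term[0]
    if tag == "var":
        return 1 if term[1] == name else 0
    if tag == "app":
        return free_count(term[1], name) + free_count(term[2], name)
    # λx.M binds x, so free occurrences of x inside M don't count here.
    return 0 if term[1] == name else free_count(term[2], name)

def binding_counts(term):
    """Yield, for each λ in the term, how many variables it binds."""
    tag = term[0]
    if tag == "app":
        yield from binding_counts(term[1])
        yield from binding_counts(term[2])
    elif tag == "lam":
        yield free_count(term[2], term[1])
        yield from binding_counts(term[2])

def is_linear(t):   return all(n == 1 for n in binding_counts(t))
def is_relevant(t): return all(n >= 1 for n in binding_counts(t))
def is_affine(t):   return all(n <= 1 for n in binding_counts(t))

# λy.λx.((y x) x): the inner λ binds x twice — relevant, not linear.
dbl = ("lam", "y", ("lam", "x",
       ("app", ("app", ("var", "y"), ("var", "x")), ("var", "x"))))
assert is_relevant(dbl) and not is_linear(dbl) and not is_affine(dbl)

# λx.λy.x: the λy binds no variable — affine, not relevant.
k = ("lam", "x", ("lam", "y", ("var", "x")))
assert is_affine(k) and not is_relevant(k)
```

The two test terms correspond to a duplicate discharge and a vacuous discharge respectively, matching the discharge policies of Section 2.1.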
The most interesting connection between proofs and λ-terms is not
simply this pair of mappings. It is the connection between normalisa-
tion and evaluation. We have seen how the application of a function,
like λx.((y x) x) to an input like M is found by removing the lambda
binder, and substituting the term M for each variable x that was bound
by the binder. In this case, we get ((y M) M).



definition 2.1.33 [β reduction] The term (λx.M N) is said to immediately β-reduce to the term M[x := N] found by substituting the term N for each free occurrence of x in M.

Furthermore, M β-reduces in one step to M′ if and only if some subterm N inside M immediately β-reduces to N′ and M′ = M[N := N′]. A term M is said to β-reduce to M∗ if there is some chain M = M1, · · · , Mn = M∗ where each Mi β-reduces in one step to Mi+1.
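Here is a small executable sketch of this definition for untyped terms (the representation and the helper names are mine; substitution is done naively, which is safe so long as bound variable names are kept distinct from the free ones, as they are below):

```python
def substitute(term, name, value):
    """M[name := value]; naive, assuming no variable capture can occur."""
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag == "app":
        return ("app", substitute(term[1], name, value),
                       substitute(term[2], name, value))
    if term[1] == name:      # λname.M: the binder shadows `name`
        return term
    return ("lam", term[1], substitute(term[2], name, value))

def beta_step(term):
    """One-step β-reduce the leftmost redex, or return None if normal."""
    tag = term[0]
    if tag == "app":
        if term[1][0] == "lam":
            f = term[1]                        # (λx.M N) ⇒ M[x := N]
            return substitute(f[2], f[1], term[2])
        for i in (1, 2):
            reduced = beta_step(term[i])
            if reduced is not None:
                parts = list(term)
                parts[i] = reduced
                return tuple(parts)
        return None
    if tag == "lam":
        reduced = beta_step(term[2])
        return None if reduced is None else ("lam", term[1], reduced)
    return None                                # a variable is normal

# (λx.λy.(x y) z) β-reduces to λy.(z y), which is normal.
t = ("app",
     ("lam", "x", ("lam", "y", ("app", ("var", "x"), ("var", "y")))),
     ("var", "z"))
t1 = beta_step(t)
assert t1 == ("lam", "y", ("app", ("var", "z"), ("var", "y")))
assert beta_step(t1) is None
```

Iterating beta_step until it returns None is the term-level analogue of the normalisation process for proofs described below.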

Consider what this means for proofs. The term (λx.M N) immediately β-reduces to M[x := N]. Representing this transformation as a proof, we have

        [x : A]
          ⋮ πl
         M : B                ⋮ πr                      ⋮ πr
    --------------- →I                                 N : A
    λx.M : A → B             N : A        =⇒β           ⋮ πl
    ------------------------------ →E             M[x := N] : B
          (λx.M N) : B

and β-reduction corresponds to normalisation. This fact leads immedi-


ately to the following theorem.
theorem 2.1.34 [normalisation and β-reduction] proof(N) is normal if and only if the term N does not β-reduce to any other term. If N β-reduces to N′ then a normalisation process sends proof(N) to proof(N′).
This natural reading of normalisation as function application, and the easy way that we think of (λx.M N) as being identical to M[x := N], leads some to make the following claim:

  If π and π′ normalise to the same proof, then π and π′ are really the same proof.

We will discuss proposals for the identity of proofs in a later section.

2.1.4 | history
Gentzen’s technique for natural deduction is not the only way to rep-
resent this kind of reasoning, with introduction and elimination rules
for connectives. Independently of Gentzen, the Polish logician, Stan-
isław Jaśkowski constructed a closely related, but different system for
presenting proofs in a natural deduction style. In Jaśkowski’s system, a
proof is a structured list of formulas. Each formula in the list is either
a supposition, or it follows from earlier formulas in the list by means
of the rule of modus ponens (conditional elimination), or it is proved
by conditionalisation. To prove something by conditionalisation you
first make a supposition of the antecedent: at this point you start a box.
The contents of a box constitute a proof, so if you want to use a for-
mula from outside the box, you may repeat a formula into the inside.
A conditionalisation step allows you to exit the box, discharging the
supposition you made upon entry. Boxes can be nested, as follows:




| 1. A → (A → B)                 Supposition
| | 2. A                         Supposition
| | 3. A → (A → B)               1, Repeat
| | 4. A → B                     2, 3, Modus Ponens
| | 5. B                         2, 4, Modus Ponens
| 6. A → B                       2–5, Conditionalisation
7. (A → (A → B)) → (A → B)       1–6, Conditionalisation
This nesting of boxes, and repeating or reiteration of formulas to enter
boxes, is the distinctive feature of Jaśkowski’s system. Notice that we
could prove the formula (A → (A → B)) → (A → B) without using
a duplicate discharge. The formula A is used twice as a minor premise
in a Modus Ponens inference (on line 4, and on line 5), and it is then
discharged at line 6. In a Gentzen proof of the same formula, the as-
sumption A would have to be made twice.
Jaśkowski proofs also straightforwardly incorporate the effects of a
vacuous discharge in a Gentzen proof. We can prove A → (B → A)
using the rules as they stand, without making any special plea for a
vacuous discharge:
| 1. A                  Supposition
| | 2. B                Supposition
| | 3. A                1, Repeat
| 4. B → A              2–3, Conditionalisation
5. A → (B → A)          1–4, Conditionalisation
The formula B is supposed, and it is not used in the proof that fol-
lows. The formula A on line 4 occurs after the formula B on line 3,
in the subproof, but it is harder to see that it is inferred from that B.
Conditionalisation, in Jaśkowski’s system, colludes with reiteration to
allow the effect of vacuous discharge. It appears that the “fine control”
over inferential connections between formulas in proofs in a Gentzen
proof is somewhat obscured in the linearisation of a Jaśkowski proof.
The fact that one formula occurs after another says nothing about how
that formula is inferentially connected to its forbear.
Jaśkowski’s account of proof was modified in presentation by Fre-
deric Fitch (boxes become assumption lines to the left, and hence be-
come somewhat simpler to draw and to typeset). Fitch's natural deduction system gained quite some popularity in undergraduate education in logic in the 1960s and following decades in the United States [31].
Edward Lemmon’s text Beginning Logic [49] served a similar purpose
in British logic education. Lemmon’s account of natural deduction is
similar to this, except that it does without the need to reiterate by
breaking the box.
1      (1)  A → (A → B)                 Assumption
2      (2)  A                           Assumption
1,2    (3)  A → B                       1, 2, Modus Ponens
1,2    (4)  B                           2, 3, Modus Ponens
1      (5)  A → B                       2, 4, Conditionalisation
       (6)  (A → (A → B)) → (A → B)     1, 5, Conditionalisation



Now, line numbers are joined by assumption numbers: each formula
is tagged with the line number of each assumption upon which that
formula depends. The rules for the conditional are straightforward: If
A → B depends on the assumptions X and A depends on the assump-
tions Y , then you can derive B, depending on the assumptions X, Y .
(You should ask yourself if X, Y is the set union of the sets X and Y , or
the multiset union of the multisets X and Y . For Lemmon, the assump-
tion collections are sets.) For conditionalisation, if B depends on X, A,
then you can derive A → B on the basis of X alone. As you can see,
vacuous discharge is harder to motivate, as the rules stand now. If we
attempt to use the strategy of the Jaśkowski proof, we are soon stuck:
1      (1)  A        Assumption
2      (2)  B        Assumption
       (3)  ...
There is no way to attach the assumption number “2” on to the for-
mula A. The linear presentation is now explicitly detached from the
inferential connections between formulas by way of the assumption
numbers. Now the assumption numbers tell you all you need to know
about the provenance of formulas. In Lemmon’s own system, you can
prove the formula A → (B → A) but only, as it happens, by taking a
detour through conjunction or some other connective.
1 (1) A Assumption
2 (2) B Assumption
1,2 (3) A∧B 1,2, Conjunction intro
1,2 (4) A 3, Conjunction elim
1 (5) B→A 2,4, Conditionalisation
(6) A → (B → A) 1,5, Conditionalisation
This seems quite unsatisfactory, as it breaks the normalisation property.
(The formula A → (B → A) is proved only by a non-normal proof—in
this case, a proof in which a conjunction is introduced and then immedi-
ately eliminated.) Normalisation can be restored to Lemmon’s system,
but at the cost of the introduction of a new rule, the rule of weakening,
which says that if A depends on assumptions X, then we can infer A
For more information on the his- depending on assumptions X together with another formula.
tory of natural deduction, con- Notice that the lines in a Lemmon proof don’t just contain formu-
sult Jeffrey Pelletier’s article [62].
las (or formulas tagged a line number and information about how the
formula was deduced). They are pairs, consisting of a formula, and the
formulas upon which the formula depends. In a Gentzen proof this
information is implicit in the structure of the proof. (The formulas
upon which a formula depends in a Gentzen proof are the leaves in the
tree above that formula that are undischarged at the moment that this
formula is derived.) This feature of Lemmon’s system was not original
to him. The idea of making completely explicit the assumptions upon
which a formula depends had also occurred to Gentzen, and this insight
is our topic for the next section.
» «




Linear, relevant and affine implication have a long history. Relevant


implication burst on the scene through the work of Alan Anderson and
Nuel Belnap in the 1960s and 1970s [1, 2], though it had precursors
in the work of the Russian logician, I. E. Orlov in the 1920s [23, 57].
The idea of a proof in which conditionals could only be introduced
if the assumption for discharge was genuinely used is indeed one of
the motivations for relevant implication in the Anderson–Belnap tra-
dition. However, other motivating concerns played a role in the devel-
opment of relevant logics. For other work on relevant logic, the work
of Dunn [26, 27], Routley and Meyer [81], Read [71] and Mares [51]
are all useful. Linear logic arose much more centrally out of proof-
theoretical concerns in the work of the proof-theorist Jean-Yves Girard
in the 1980s [35, 37]. A helpful introduction to linear logic is the text
of Troelstra [90]. Affine logic is introduced in the tradition of linear lo-
gic as a variant on linear implication. Affine implication is quite close,
however, to the implication in Łukasiewicz's infinitely valued logic—
which is slightly stronger, but shares the property of rejecting all con-
traction-related principles [75]. These logics are all substructural lo-
gics [24, 59, 76].
The definition of normality is due to Prawitz [63], though glimpses of
the idea are present in Gentzen’s original work [33].
The λ-calculus is due to Alonzo Church [17], and the study of λ-calculi
has found many different applications in logic, computer science, type
theory and related fields [3, 39, 83]. The correspondence between for-
mulas/proofs and types/terms is known as the Curry–Howard corres-
pondence [43]. Todo: find the Curry reference.

2.1.5 | exercises
Working through these exercises will help you understand the material.
As with all logic exercises, if you want to deepen your understanding
of these techniques, you should attempt the exercises until they are
no longer difficult. So, attempt each of the different kinds of basic
exercises, until you know you can do them. Then move on to the
intermediate exercises, and so on. (The project exercises are not the
kind of thing that can be completed in one sitting.)
(I am not altogether confident about the division of the exercises into
“basic,” “intermediate,” and “advanced.” I'd appreciate your feedback on
whether some exercises are too easy or too difficult for their categories.)
basic exercises
q1 Which of the following formulas have proofs with no premises?
1 : p → (p → p)
2 : p → (q → q)
3 : ((p → p) → p) → p
4 : ((p → q) → p) → p
5 : ((q → q) → p) → p
6 : ((p → q) → q) → p
7 : p → (q → (q → p))
8 : (p → q) → (p → (p → q))
9 : ((q → p) → p) → ((p → q) → q)

§2.1 · natural deduction for conditionals 47


10 : (p → q) → ((q → p) → (p → p))
11 : (p → q) → ((q → p) → (p → q))
12 : (p → q) → ((p → (q → r)) → (p → r))
13 : (q → p) → ((p → q) → ((q → p) → (p → q)))
14 : ((p → p) → p) → ((p → p) → ((p → p) → p))
15 : (p1 → p2 ) → ((q → (p2 → r)) → (q → (p1 → r)))

For each formula that can be proved, find a proof that complies with
the strictest discharge policy possible.
q2 Annotate your proofs from Exercise 1 with λ-terms. Find a most gen-
eral λ-term for each provable formula.
q3 Construct a proof from q → r to (q → (p → p)) → (q → r) using
vacuous discharge. Then construct a proof of B → (A → A) (also using
vacuous discharge). Combine the two proofs, using →E to deduce B →
C. Normalise the proof you find. Then annotate each proof with λ-
terms, and explain the β reductions of the terms corresponding to the
normalisation.
Then construct a proof from (p → r) → ((p → r) → q)) to (p → r) →
q using duplicate discharge. Then construct a proof from p → (q → r)
and p → q to p → r (also using duplicate discharge). Combine the two
proofs, using →E to deduce q. Normalise the proof you find. Then
annotate each proof with λ-terms, and explain the β reductions of the
terms corresponding to the normalisation.
q4 Find types and proofs for each of the following terms.
1 : λx.λy.x
2 : λx.λy.λz.((xz)(yz))
3 : λx.λy.λz.(x(yz))
4 : λx.λy.(yx)
5 : λx.λy.((yx)x)

Which of the proofs are linear, which are relevant and which are affine?
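If you want to check type assignments like these mechanically, here is a small Python sketch of principal-type inference for pure λ-terms. (The tuple encodings of terms and types, and the omission of an occurs check, are simplifications of my own, not part of the text's apparatus.)

```python
# A sketch of principal-type inference for pure lambda terms.
# Terms: ("var", x), ("lam", x, body), ("app", f, a).
# Types: type variables are strings; ("->", s, t) is an arrow type.
# Simplification: no occurs check, so self-application will misbehave.

from itertools import count

def infer(term):
    fresh = count()
    subst = {}                      # bindings for type variables

    def find(t):                    # chase bindings, resolve arrows
        while isinstance(t, str) and t in subst:
            t = subst[t]
        if isinstance(t, tuple):
            return ("->", find(t[1]), find(t[2]))
        return t

    def unify(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        if isinstance(a, str):
            subst[a] = b
        elif isinstance(b, str):
            subst[b] = a
        else:                       # both arrows: unify componentwise
            unify(a[1], b[1])
            unify(a[2], b[2])

    def go(t, env):
        if t[0] == "var":
            return env[t[1]]
        if t[0] == "lam":
            v = f"t{next(fresh)}"
            return ("->", v, go(t[2], {**env, t[1]: v}))
        f, a = go(t[1], env), go(t[2], env)
        r = f"t{next(fresh)}"
        unify(f, ("->", a, r))
        return r

    return find(go(term, {}))

K = ("lam", "x", ("lam", "y", ("var", "x")))
print(infer(K))    # an arrow type of the shape A -> (B -> A)
```

Applied to λx.λy.x it returns a type of the shape p → (q → p), matching the most general λ-term annotation asked for in Exercise 2.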
q5 Show that there is no normal relevant proof of these formulas.
1 : p → (q → p)
2 : (p → q) → (p → (r → q))
3 : p → (p → p)
q6 Show that there is no normal affine proof of these formulas.
1 : (p → q) → ((p → (q → r)) → (p → r))
2 : (p → (p → q)) → (p → q)
q7 Show that there is no normal proof of these formulas.
1 : ((p → q) → p) → p
2 : ((p → q) → q) → ((q → p) → p)
q8 Find a formula that has both a relevant proof and an affine proof,
but no linear proof.


intermediate exercises
q9 Consider the following “truth tables.”
          gd3                 ł3                 rm3
     → | t  n  f         → | t  n  f         → | t  n  f
     t | t  n  f         t | t  n  n         t | t  f  f
     n | t  t  f         n | t  t  f         n | t  n  f
     f | t  t  t         f | t  t  t         f | t  t  t
A gd3 tautology is a formula that receives the value t in every gd3
valuation. An ł3 tautology is a formula that receives the value t in
every ł3 valuation. Show that every formula with a standard proof is
a gd3 tautology. Show that every formula with an affine proof is an ł3
tautology.
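These tables are easy to experiment with mechanically. Here is a small Python sketch; the tuple encoding of formulas is my own, the table entries are transcribed as printed above, and "l3" is simply a label for the middle table.

```python
# A sketch: brute-force tautology checking for the three tables above.
# Formulas: atoms are strings; ("->", A, B) is a conditional.

from itertools import product

TABLES = {  # table[(x, y)] is the value of x -> y, as printed above
    "gd3": {("t","t"):"t", ("t","n"):"n", ("t","f"):"f",
            ("n","t"):"t", ("n","n"):"t", ("n","f"):"f",
            ("f","t"):"t", ("f","n"):"t", ("f","f"):"t"},
    "l3":  {("t","t"):"t", ("t","n"):"n", ("t","f"):"n",
            ("n","t"):"t", ("n","n"):"t", ("n","f"):"f",
            ("f","t"):"t", ("f","n"):"t", ("f","f"):"t"},
    "rm3": {("t","t"):"t", ("t","n"):"f", ("t","f"):"f",
            ("n","t"):"t", ("n","n"):"n", ("n","f"):"f",
            ("f","t"):"t", ("f","n"):"t", ("f","f"):"t"},
}

def atoms(f):
    return {f} if isinstance(f, str) else atoms(f[1]) | atoms(f[2])

def value(table, f, v):
    if isinstance(f, str):
        return v[f]
    return table[(value(table, f[1], v), value(table, f[2], v))]

def tautology(name, f):
    props = sorted(atoms(f))
    return all(value(TABLES[name], f, dict(zip(props, vs))) == "t"
               for vs in product("tnf", repeat=len(props)))

K = ("->", "p", ("->", "q", "p"))                    # p -> (q -> p)
peirce = ("->", ("->", ("->", "p", "q"), "p"), "p")  # Peirce's law
print(tautology("gd3", K), tautology("l3", K), tautology("gd3", peirce))
# prints: True True False
```

For instance, p → (q → p) comes out as a tautology on both of the first two tables, while Peirce's law ((p → q) → p) → p fails on the gd3 table at p = n, q = f.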
q10 Consider proofs that have paired steps of the form →E/→I. That is, a
conditional is eliminated only to be introduced again. The proof has a
sub-proof of the form of this proof fragment:
A→B [A](i)
→E
B
→I,i
A→B
These proofs contain redundancies too, but they may well be normal.
Call a proof with a pair like this circuitous. Show that all circuitous
proofs may be transformed into non-circuitous proofs with the same
premises and conclusion.
q11 In Exercise 5 you showed that there is no normal relevant proof of
p → (p → p). By normalisation, it follows that there is no relevant
proof (normal or not) of p → (p → p). Use this fact to explain why
it is more natural to consider relevant arguments with multisets of
premises and not just sets of premises. (hint: is the argument from
p, p to p relevantly valid?)
q12 You might think that “if . . . then . . . ” is a slender foundation upon
which to build an account of logical consequence. Remarkably, there
is rather a lot that you can do with implication alone, as these next
questions ask you to explore.
First, define A ∨̂ B as follows: A ∨̂ B ::= (A → B) → B. In what way is
“∨̂” like disjunction? What usual features of disjunction are not had
by ∨̂? (Pay attention to the behaviour of ∨̂ with respect to different
discharge policies for implication.)
q13 Provide introduction and elimination rules for ∨̂ that do not involve
the conditional connective →.
q14 Now consider negation. Given an atom p, define the p-negation ¬p A
to be A → p. In what way is “¬p ” like negation? What usual features
of negation are not had by ¬p defined in this way? (Pay attention
to the behaviour of ¬ with respect to different discharge policies for
implication.)

§2.1 · natural deduction for conditionals 49


q15 Provide introduction and elimination rules for ¬p that do not involve
the conditional connective →.
q16 You have probably noticed that the inference from ¬p ¬p A to A is not,
in general, valid. Define a new language cformula inside formula as
follows:
cformula ::= ¬p ¬p atom | (cformula → cformula)
Show that ¬p ¬p A ∴ A and A ∴ ¬p ¬p A are valid when A is a cfor-
mula.
q17 Now define A ∧̇ B to be ¬p (A → ¬p B), and A ∨̇ B to be ¬p A → B. In
what way are A ∧̇ B and A ∨̇ B like conjunction and disjunction of A and
B respectively? (Consider the difference between when A and B are
formulas and when they are cformulas.)
q18 Show that if there is a normal relevant proof of A → B then there is
an atom occurring in both A and B.
q19 Show that if we have two conditional connectives →1 and →2 defined
using different discharge policies, then the conditionals collapse, in the
sense that we can construct proofs from A →1 B to A →2 B and vice
versa.
q20 Explain the significance of the result of Exercise 19.
q21 Add the obvious introduction rule for a conjunction connective
⊗ as follows:
        A     B
        ------- ⊗I
        A ⊗ B
Show that if we have the following two ⊗E rules:
        A ⊗ B           A ⊗ B
        ------- ⊗E1     ------- ⊗E2
          A               B
we may simulate the behaviour of vacuous discharge. Show, then, that
we may normalise proofs involving these rules (by showing how to
eliminate all indirect pairs, including ⊗I/⊗E pairs).
advanced exercises
q22 Another demonstration of the subformula property for normal proofs
uses the notion of a track in a proof.
definition 2.1.35 [track] A sequence A0 , . . . , An of formula instances
in the proof π is a track of length n + 1 in the proof π if and only if
• A0 is a leaf in the proof tree.
• Each Ai+1 is immediately below Ai .
• For each i < n, Ai is not a minor premise of an application of →E.
A track whose terminus An is the conclusion of the proof π is said to
be a track of order 0. If we have a track t whose terminus An is the
minor premise of an application of →E whose conclusion is in a track
of order n, we say that t is a track of order n + 1.


The following annotated proof gives an example of tracks.


♠ A → ((D → D) → B) ♦[A](2) ♣[D](1)
→E →I,1
♠ (D → D) → B ♣D→D
→E
♥[B → C](2) ♠B
→E
♥C
→I,2
♥A→C
→I,3
♥ (B → C) → (A → C)
(Don’t let the fact that this proof has one track of each order 0, 1, 2 and
3 make you think that proofs can’t have more than one track of the
same order. Look at this example —
A → (B → C) A
B→C B
C
— it has two tracks of order 1.) The formulas labelled with ♥ form one
track, starting with B → C and ending at the conclusion of the proof.
Since this track ends at the conclusion of the proof, it is a track of order
0. The track consisting of ♠ formulas starts at A → ((D → D) → B)
and ends at B. It is a track of order 1, since its final formula is the
minor premise in the →E whose conclusion is C, in the ♥ track of
order 0. Similarly, the ♦ track is order 2 and the ♣ track has order 3.
For this exercise, prove the following lemma by induction on the
construction of a proof.
lemma 2.1.36 In every proof, every formula is in one and only one
track, and each track has one and only one order.
Then prove this lemma.
lemma 2.1.37 Let t : A0 , . . . , An be a track in a normal proof. Then
a) The rules applied within the track consist of a sequence (possibly
empty) of [→E] steps and then a sequence (possibly empty) of
[→I] steps.
b) Every formula Ai in t is a subformula of A0 or of An .
Now prove the subformula theorem, using these lemmas.
q23 Consider the result of Exercise 19. Show how you might define a nat-
ural deduction system containing (say) both a linear and a standard
conditional, in which there is no collapse. That is, construct a system
of natural deduction proofs in which there are two conditional con-
nectives: →l for linear conditionals, and →s for standard conditionals,
such that whenever an argument is valid for a linear conditional, it is
(in some appropriate sense) valid in the system you design (when →
is translated as →l ) and whenever an argument is valid for a standard
conditional, it is (in some appropriate sense) valid in the system you
design (when → is translated as →s ). What mixed inferences (those
using both →l and →s ) are valid in your system?



q24 Suppose we have a new discharge policy that is “stricter than linear.”
The ordered discharge policy allows you to discharge only the right-
most assumption at any one time. It is best paired with a strict version
of →E according to which the major premise (A → B) is on the left,
and the minor premise (A) is on the right. What is the resulting logic
like? Does it have the normalisation property?
q25 Take the logic of Exercise 24, and extend it with another connective ←,
with the rule ←E in which the major premise (B ← A) is on the right,
and the minor premise (A) is on the left, and ←I, in which the leftmost
assumption is discharged. Examine the connections between → and
←. Does normalisation work for these proofs? [This is Lambek’s logic.
Add references.]
q26 Show that there is a way to be even stricter than the discharge policy
of Exercise 24. What is the strictest discharge policy for →I, that will
result in a system which normalises, provided that →E (in which the
major premise is leftmost) is the only other rule for implication.
q27 Consider the introduction rule for ⊗ given in Exercise 21. Construct
an appropriate elimination rule for fusion which does not allow the
simulation of vacuous (or duplicate) discharge, and for which proofs
normalise.
q28 Identify two proofs where one can be reduced to the other by way of
the elimination of circuitous steps (see Exercise 10). Characterise the
identities this provides among λ-terms. Can this kind of identification
be maintained along with β-reduction?
project
q29 Thoroughly and systematically explain and evaluate the considerations
for choosing one discharge policy over another. This will involve look-
ing at the different uses to which one might put a system of natural
deduction, and then, relative to a use, what one might say in favour of
a different policy.


2.2 | sequents and derivations


In this section we will look at a different way of thinking about infer-
ence: Gentzen’s sequent calculus. The core idea is straightforward. We
want to know what follows from what, so we will keep a track of facts
of consequence: facts we will record in the following form:

A`B

One can read “A ` B” in a number of ways. You can say that B follows
from A, or that A entails B, or that the argument from A to B is valid.
The symbol used here is sometimes called the turnstile. (“Scorning a
turnstile wheel at her reverend helm, she sported there a tiller; and that
tiller was in one mass, curiously carved from the long narrow lower jaw
of her hereditary foe. The helmsman who steered by that tiller in a
tempest, felt like the Tartar, when he holds back his fiery steed by
clutching its jaw. A noble craft, but somehow a most melancholy! All
noble things are touched with that.” — Herman Melville, Moby Dick.)
Once we have the notion of consequence, we can ask ourselves
what properties consequence has. There are many different ways you
could answer this question. The focus of this section will be a
particular technique, originally due to Gerhard Gentzen. We can think of
consequence—relative to a particular language—like this: when we
want to know about the relation of consequence, we first consider each
different kind of formula in the language. To make the discussion
concrete, let's consider a very simple language: the language of
propositional logic with only two connectives, conjunction ∧ and
disjunction ∨. That is, we will now look at formulas expressed in the
following grammar:
formula ::= atom | (formula ∧ formula) | (formula ∨ formula)

To characterise consequence relations, we need to figure out how


consequence works on the atoms of the language, and then how the
addition of ∧ and ∨ expands the repertoire of facts about consequence.
To do this, we need to know when we can say A ` B when A is a
conjunction, or when A is a disjunction, and when B is a conjunction,
or when B is a disjunction. In other words, for each connective, we need
to know when it is appropriate to infer from a formula featuring that
connective, and when it is appropriate to infer to a formula featuring
that connective. Another way of putting it is that we wish to know
how a connective works on the left of the turnstile, and how it works
on the right.
The answers for our language seem straightforward. For atomic
formulas, p and q, we have p ` q only if p and q are the same atom:
so we have p ` p for each atom p. For conjunction, we can say that This is a formal account of con-
if A ` B and A ` C, then A ` B ∧ C. That’s how we can infer to a sequence. We look only at the form
of propositions and not their con-
conjunction. Inferring from a conjunction is also straightforward. We tent. For atomic propositions (those
can say that A ∧ B ` C when A ` C, or when B ` C. For disjunction, with no internal form) there is noth-
we can reason similarly. We can say A ∨ B ` C when A ` C and B ` C. ing upon which we could pin a
claim to consequence. Thus, p ` q
We can say A ` B ∨ C when A ` B, or when A ` C. This is inclusive is never true, unless p and q are the
disjunction, not exclusive disjunction. same atom.
You can think of these definitions as adding new material (in this
case, conjunction and disjunction) to a pre-existing language. Think of
the inferential repertoire of the basic language as settled (in our dis-
cussion this is very basic, just the atoms), and the connective rules

§2.2 · sequents and derivations 53


are “definitional” extensions of the basic language. These thoughts
are the raw materials for the development of an account of logical con-
sequence.

2.2.1 | derivations for “and” and “or”


Like natural deduction proofs, derivations involving sequents are trees.
The structure is as before:
• •
• • •
• •

where each position on the tree follows from those above it. In a tree,
the order of the branches does not matter. These are two different ways
to present the same tree:
A B B A
C C
In this case, the tree structure is at one and the same time sim-
pler and more complicated than the tree structure of natural deduction
proofs. They are simpler, in that there is no discharge. They are more
complicated, in that trees are not trees of formulas. They are trees
consisting of sequents. As a result, we will call these structures deriv-
ations instead of proofs. The distinction is simple. For us, a proof is a
structure in which the formulas are connected by inferential relations
in a tree-like structure (I say “tree-like” since we will see different
structures in later sections). A proof will go from some formulas to other
formulas, via yet other formulas. Our structures involving sequents
are quite different. The last sequent in a tree (the endsequent) is itself
a statement of consequence, with its own antecedent and consequent
(or premise and conclusion, if you prefer.) The tree derivation shows
you why (or perhaps how) you can infer from the antecedent to the
consequent. The rules for constructing sequent derivations are found
in Figure 2.4.

definition 2.2.1 [simple sequent derivation] If the leaves of a tree


are instances of the [Id] rule, and if its transitions from node to node
are instances of the other rules in Figure 2.4, then the tree is said to be
a simple sequent derivation.

We must read these rules completely literally. Do not presume any


properties of conjunction or disjunction other than those that can be
demonstrated on the basis of the rules. We will take these rules as
constituting the behaviour of the connectives ∧ and ∨.

example 2.2.2 [example sequent derivations] In this section, we will


look at a few sequent derivations, demonstrating some simple proper-
ties of conjunction, disjunction, and the consequence relation.


                     p ` p  [Id]

                 L ` C     C ` R
                 ---------------- Cut
                      L ` R

    A ` R            A ` R            L ` A     L ` B
  --------- ∧L1    --------- ∧L2     --------------- ∧R
  A ∧ B ` R        B ∧ A ` R            L ` A ∧ B

  A ` R     B ` R        L ` A           L ` A
  --------------- ∨L   --------- ∨R1   --------- ∨R2
     A ∨ B ` R         L ` A ∨ B       L ` B ∨ A

        Figure 2.4: a simple sequent system
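Because each left and right rule decomposes a formula, the cut-free part of this system supports a simple backtracking proof search. Here is a Python sketch; the representation of formulas as nested tuples, with "&" and "v" for ∧ and ∨, is my own.

```python
# A sketch: naive cut-free proof search for the rules of Figure 2.4.
# Formulas: atoms are strings; ("&", A, B) and ("v", A, B) are compound.

def derivable(l, r):
    """Does the sequent l |- r have a cut-free derivation?"""
    if isinstance(l, str) and isinstance(r, str):
        return l == r                                    # [Id]
    options = []
    if isinstance(l, tuple):
        c, a, b = l
        if c == "&":                                     # [&L1] or [&L2]
            options.append(derivable(a, r) or derivable(b, r))
        else:                                            # [vL]
            options.append(derivable(a, r) and derivable(b, r))
    if isinstance(r, tuple):
        c, a, b = r
        if c == "&":                                     # [&R]
            options.append(derivable(l, a) and derivable(l, b))
        else:                                            # [vR1] or [vR2]
            options.append(derivable(l, a) or derivable(l, b))
    return any(options)

print(derivable(("&", "p", "q"), ("&", "q", "p")))       # True
print(derivable("p", "q"))                               # False
```

Since every rule strictly reduces complexity, the search terminates; and given the redundancy of Cut (to be shown in §2.2.3), searching without Cut loses no derivable sequents.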

The first derivations show some commutative and associative proper-


ties of conjunction and disjunction. Here is the conjunction case, with
derivations to the effect that p ∧ q ` q ∧ p, and that p ∧ (q ∧ r) `
(p ∧ q) ∧ r.

q`q
∧L1
q`q p`p p`p q∧r`q r`r
∧L2 ∧L1 ∧L1 ∧L2 ∧L2
p∧q`q p∧q`p p ∧ (q ∧ r) ` p p ∧ (q ∧ r) ` q q∧r`r
∧R ∧R ∧L2
p∧q`q∧p p ∧ (q ∧ r) ` p ∧ q p ∧ (q ∧ r) ` r
∧R
p ∧ (q ∧ r) ` (p ∧ q) ∧ r

Here are the cases for disjunction. The first derivation is for the com-
mutativity of disjunction, and the second is for associativity. (It is im-
portant to notice that these are not derivations of the commutativity or
associativity of conjunction or disjunction in general. They only show
the commutativity and associativity of conjunction and disjunction of
atomic formulas. These are not derivations of A ∧ B ` B ∧ A (for ex-
ample) since A ` A is not an axiom if A is a complex formula. We will
see more on this in the next section.)
q`q
∨R1
p`p q`q p`p q`q∨r r`r
∨R1 ∨R2 ∨R1 ∨R2 ∨R2
p`q∨p q`p∨q p ` p ∨ (q ∨ r) q ` p ∨ (q ∨ r) r`q∨r
∨L ∨L ∨R2
p∨q`q∨p p ∨ q ` p ∨ (q ∨ r) r ` p ∨ (q ∨ r)
∨L
(p ∨ q) ∨ r ` p ∨ (q ∨ r)

You can see that the disjunction derivations have the same structure
as those for conjunction. You can convert any derivation into another
(its dual) by swapping conjunction and disjunction, and swapping the
left-hand side of the sequent with the right-hand side. (Exercise 14 on
page 67 asks you to make this duality precise.) Here are some



more examples of duality between derivations. The first is the dual of
the second, and the third is the dual of the fourth.
p`p p`p
p`p p`p p`p p`p ∧L1 ∨R
∨L ∧R p`p p∧q`p p`p p`p∨q
p∨p`p p`p∧p ∨L ∧R
p ∨ (p ∧ q) ` p p ` p ∧ (p ∨ q)
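The duality just illustrated can itself be rendered as an operation on derivations. Here is a Python sketch; the encoding of derivations as triples of a rule name, an endsequent and a list of subderivations is my own convention.

```python
# A sketch: dualising a derivation by swapping & with v, left rules
# with right rules, and the two sides of each sequent.
# A derivation is (rule, (lhs, rhs), subderivations).

DUAL_RULE = {"Id": "Id", "Cut": "Cut",
             "&L1": "vR1", "&L2": "vR2", "&R": "vL",
             "vR1": "&L1", "vR2": "&L2", "vL": "&R"}

def dual_formula(f):
    if isinstance(f, str):
        return f
    c, a, b = f
    return ("v" if c == "&" else "&", dual_formula(a), dual_formula(b))

def dual(d):
    rule, (l, r), subs = d
    subs = [dual(s) for s in subs]
    if rule == "Cut":
        subs.reverse()   # the cut formula changes sides, so the
                         # two premises swap places
    return (DUAL_RULE[rule], (dual_formula(r), dual_formula(l)), subs)

d = ("&L1", (("&", "p", "q"), "p"), [("Id", ("p", "p"), [])])
print(dual(d))           # the [vR1] derivation of p |- p v q
```

Note that dualising twice returns the derivation you started with.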
You can use derivations you have at hand, like these, as components of
other derivations. One way to do this is to use the Cut rule.
p`p p`p
∧L1 ∨R
p`p p∧q`p p`p p`p∨q
∨L ∧R
p ∨ (p ∧ q) ` p p ` p ∧ (p ∨ q)
Cut
p ∨ (p ∧ q) ` p ∧ (p ∨ q)
Notice, too, that each of the derivations we've seen so far moves
from less complex formulas at the top to more complex formulas at the
bottom. Reading from bottom to top, you can see the formulas decom-
posing into their constituent parts. This isn’t the case for all sequent
derivations. Derivations that use the Cut rule can include new (more
complex) material in the process of deduction. Here is an example:
p`p q`q q`q p`p
∨R1 ∨R2 ∨R1 ∨R2
p`q∨p q`q∨p q`p∨q p`p∨q
∨L ∨L
p∨q`q∨p q∨p`p∨q
Cut
p∨q`p∨q
This derivation is a complicated way to deduce p ∨ q ` p ∨ q, and it
includes q ∨ p, which is not a subformula of any formula in the final
sequent of the derivation. (We call the concluding sequent of a
derivation the “endsequent.”) Reading from bottom to top, the Cut step
can introduce new formulas into the derivation.

2.2.2 | identity derivations


This derivation of p ∨ q ` p ∨ q is a derivation of an identity (a sequent
of the form A ` A). There is a more systematic way to show that
p ∨ q ` p ∨ q, and any identity sequent. Here is a derivation of the
sequent without Cut, and its dual, for conjunction.
p`p q`q p`p q`q
∨R1 ∨R2 ∧L1 ∧L2
p`p∨q q`p∨q p∧q`p p∧q`q
∨L ∧R
p∨q`p∨q p∧q`p∧q
We can piece together these little derivations in order to derive any
sequent of the form A ` A. For example, here is the start of derivation
of p ∧ (q ∨ (r1 ∧ r2 )) ` p ∧ (q ∨ (r1 ∧ r2 )).
p`p q ∨ (r1 ∧ r2 ) ` q ∨ (r1 ∧ r2 )
∧L1 ∧L2
p ∧ (q ∨ (r1 ∧ r2 )) ` p p ∧ (q ∨ (r1 ∧ r2 )) ` q ∨ (r1 ∧ r2 )
∧R
p ∧ (q ∨ (r1 ∧ r2 )) ` p ∧ (q ∨ (r1 ∧ r2 ))


It’s not a complete derivation yet, as one leaf q∨(r1 ∧r2 ) ` q∨(r1 ∧r2 )
is not an axiom. However, we can add the derivation for it.

r1 ` r1 r2 ` r2
∧L1 ∧L2
r1 ∧ r2 ` r1 r1 ∧ r2 ` r2
∧R
q`q r1 ∧ r2 ` r1 ∧ r2
∨R1 ∨R2
q ` q ∨ (r1 ∧ r2 ) r1 ∧ r2 ` q ∨ (r1 ∧ r2 )
∨L
p`p q ∨ (r1 ∧ r2 ) ` q ∨ (r1 ∧ r2 )
p ∧ (q ∨ (r1 ∧ r2 )) ` p p ∧ (q ∨ (r1 ∧ r2 )) ` q ∨ (r1 ∧ r2 )
p ∧ (q ∨ (r1 ∧ r2 )) ` p ∧ (q ∨ (r1 ∧ r2 ))

The derivation of q ∨ (r1 ∧ r2 ) ` q ∨ (r1 ∧ r2 ) itself contains a smaller


identity derivation, for r1 ∧ r2 ` r1 ∧ r2 . The derivation displayed here
uses shading to indicate the way the derivations are nested together.
This result is general, and it is worth a theorem of its own.
theorem 2.2.3 [identity derivations] For each formula A, A ` A has
a derivation. A derivation for A ` A may be systematically construc-
ted from the identity derivations for the subformulas of A.
Proof: We define Id(A), the identity derivation for A by induction
on the construction of A, as follows. Id(p) is the axiom p ` p. For
complex formulas, we have
Id(A) Id(B) Id(A) Id(B)
∨R1 ∨R2 ∧L1 ∧L2
Id(A ∨ B) : A ` A ∨ B B`A∨B Id(A ∧ B) : A ∧ B ` A A∧B`B
∨L ∧R
A∨B`A∨B A∧B`A∧B
We say that A ` A is derivable in the sequent system. If we think of
[Id] as a degenerate rule (a rule with no premise), then its generalisa-
tion, [IdA ], is a derivable rule.
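The construction in the proof of Theorem 2.2.3 is directly implementable. Here is a Python sketch; the encoding of formulas as nested tuples and of derivations as triples of a rule name, an endsequent and a list of subderivations is my own convention.

```python
# A sketch of Id(A) from Theorem 2.2.3: building the identity
# derivation of A |- A by recursion on the formula A.
# Formulas: atoms are strings; ("&", A, B) and ("v", A, B) are compound.
# Derivations: (rule, (lhs, rhs), subderivations).

def identity(a):
    if isinstance(a, str):
        return ("Id", (a, a), [])                  # the axiom p |- p
    c, l, r = a
    if c == "&":
        left  = ("&L1", (a, l), [identity(l)])     # A & B |- A
        right = ("&L2", (a, r), [identity(r)])     # A & B |- B
        return ("&R", (a, a), [left, right])
    else:
        left  = ("vR1", (l, a), [identity(l)])     # A |- A v B
        right = ("vR2", (r, a), [identity(r)])     # B |- A v B
        return ("vL", (a, a), [left, right])

A = ("&", "p", ("v", "q", "r"))
print(identity(A)[1])    # the endsequent (A, A)
```

Running it on p ∧ (q ∨ (r1 ∧ r2)) reproduces the nested derivation displayed above, with the identity derivation for each subformula appearing as a subtree.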
It might seem crazy to have a proof of identity, like A ` A where
A is a complex formula. Why don’t we take [IdA ] as an axiom? There
are a few different reasons we might like to consider for taking [IdA ] as
derivable instead of one of the primitive axioms of the system.

the system is simple: In an axiomatic theory, it is always preferable
to minimise the number of primitive assumptions. Here, it's clear that
[IdA ] is derivable, so there is no need for it to be an axiom. A system
with fewer axioms is preferable to one with more, for the reason that
we have reduced derivations to a smaller set of primitive notions.
(These are part of a general story, to be explored throughout this book,
of what it is to be a logical constant. These sorts of considerations
have a long history [38].)

the system is systematic: In the system without [IdA ] as an axiom,


when we consider a sequent like L ` R in order to know whether
it is derivable (in the absence of Cut, at least), we can ask two separate
questions. We can consider L. If it is complex perhaps L ` R is derivable
by means of a left rule like [∧L] or [∨L]. On the other hand, if R is



complex, then perhaps the sequent is derivable by means of a right
rule, like [∧R] or [∨R]. If both are primitive, then L ` R is derivable by
identity only. And that is it! You check the left, check the right, and
there’s no other possibility. There is no other condition under which
the sequent is derivable. In the presence of [IdA ], one would have to
check if L = R as well as the other conditions.

the system provides a constraint: In the absence of a general iden-


tity axiom, the burden on deriving identity is passed over to the con-
nective rules. Allowing derivations of identity statements is a hurdle
over which a connective rule might be able to jump, or over which it
might fail. As we shall see later, this provides a constraint we can use
to sort out “good” definitions from “bad” ones. Given that the left and
right rules for conjunction and disjunction tell you how the connect-
ives are to be introduced, it would seem that the rules are defective (or
at the very least, incomplete) if they don’t allow the derivation of each
instance of [Id]. We will make much more of this when we consider
other connectives. However, before we make more of the philosophical
motivations and implications of this constraint, we will add another
possible constraint on connective rules, this time to do with the other
rule in our system, Cut.

2.2.3 | cut is redundant


Some of the nice properties of a sequent system are, as a matter of
fact, the nice features of derivations that are constructed without the
Cut rule. Derivations constructed without Cut satisfy the subformula
property.

theorem 2.2.4 [subformula property] If δ is a sequent derivation not


containing Cut, then the formulas in δ are all subformulas of the for-
mulas in the endsequent of δ.

Proof: You can see this merely by looking at the rules. Each rule except
for Cut has the subformula property. (Notice how much simpler this
proof is than the proof of Theorem 2.1.11.)

A derivation is said to be cut-free if it does not contain an instance of


the Cut rule. Doing without Cut is good for some things, and bad for
others. In the system of proof we’re studying in this section, sequents
have very many more proofs with Cut than without it.

example 2.2.5 [derivations with or without cut] p ` p∨q has only


one cut-free derivation, but it has infinitely many derivations using Cut.
You can see that there is only one cut-free derivation with p ` p ∨ q as
the endsequent. The only possible last inference in such a derivation
is [∨R], and the only possible premise for that inference is p ` p. This
completes that proof.


On the other hand, there are very many different last inferences in a
derivation featuring Cut. The most trivial example is the derivation:
p`p
∨R1
p`p p`p∨q
Cut
p`p∨q

which contains the cut-free derivation of p ` p ∨ q inside it. We can


nest the cuts with the identity sequent p ` p as deeply as we like.
p`p
p`p ∨R1
∨R1 p`p p`p∨q
p`p p`p∨q Cut
Cut p`p p`p∨q ···
p`p p`p∨q Cut
Cut p`p p`p∨q
p`p∨q Cut
p`p∨q
However, we can construct quite different derivations of our sequent,
and we involve different material in the derivation. For any formula
A you wish to choose, we could implicate A (an “innocent bystander”)
in the derivation as follows:
q`q
∧L1
p`p q∧A`q
∨R1 ∨R2
p`p p`p∨q q∧A`p∨q
∨R1 ∨L
p ` p ∨ (q ∧ A) p ∨ (q ∧ A) ` p ∨ q
Cut
p`p∨q

In this derivation the cut formula p ∨ (q ∧ A) is not doing genuine
work. It is just repeating either the left formula p or the right formula q.
(Well, it's doing work, in that p ∨ (q ∧ A) is, for many choices for A,
genuinely intermediate between p and p ∨ q. However, A is doing the
kind of work that could be done by any formula. Choosing different
values for A makes no difference to the shape of the derivation. A is
doing the kind of work that doesn't require special qualifications.)
So, using Cut makes the search for derivations rather difficult. There
are very many more possible derivations of a sequent, and many more
actual derivations. The search space is much more constrained if we are
looking for cut-free derivations instead. Constructing derivations, on
the other hand, is easier if we are permitted to use Cut. We have very
many more options for constructing a derivation, since we are able to
pass through formulas “intermediate” between the desired antecedent
and consequent.
Do we need to use Cut? Is there anything derivable with Cut that
cannot be derived without it? Take a derivation involving Cut, such as
this one:
q`q
∧L1
p`p q∧r`q q`q
∧L1 ∧L2 ∧L1
p ∧ (q ∧ r) ` p p ∧ (q ∧ r) ` q p∧q`q
∧R ∨R1
p ∧ (q ∧ r) ` p ∧ q p∧q`q∨r
Cut
p ∧ (q ∧ r) ` q ∨ r



This sequent p ∧ (q ∧ r) ` q ∨ r did not have to be derived using
Cut. We can eliminate the Cut-step from the derivation in a systematic
way by showing that whenever we use a cut in a derivation we could
have either done without it, or used it earlier. (The systematic technique
I am using will be revealed in detail very soon.) For example, in the last
inference here, we did not need to leave the cut until the last step. We
could have cut on the sequent p ∧ q ` q and left the inference to q ∨ r
until later:
q`q
∧L1
p`p q∧r`q
∧L1 ∧L2
p ∧ (q ∧ r) ` p p ∧ (q ∧ r) ` q q`q
∧R ∧L1
p ∧ (q ∧ r) ` p ∧ q p∧q`q
Cut
p ∧ (q ∧ r) ` q
∨R1
p ∧ (q ∧ r) ` q ∨ r

Now the cut takes place on the conjunction p ∧ q, which is introduced


immediately before the application of the Cut. (The similarity with
non-normal proofs as discussed in the previous section is not an
accident.) Notice that in this case we use the cut to get us to
p ∧ (q ∧ r) ` q, which is one of the sequents
already seen in the derivation! This derivation repeats itself. (Do not
be deceived, however. It is not a general phenomenon among proofs
involving Cut that they repeat themselves. The original proof did not
repeat any sequents except for the axiom q ` q.)
No, the interesting feature of this new proof is that before the Cut,
the cutformula is introduced on the right in the derivation of left se-
quent p∧(q∧r) ` p∧q, and it is introduced on the left in the derivation
of the right sequent p ∧ q ` q.
Notice that in general, if we have a cut applied to a conjunction which
is introduced on both sides of the step, we have a shorter route to L ` R.
We can sidestep the move through A ∧ B to cut on the formula A, since
we have L ` A and A ` R.
    L ` A     L ` B            A ` R
    ---------------- ∧R      ----------- ∧L1
       L ` A ∧ B              A ∧ B ` R
    ------------------------------------ Cut
                   L ` R
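This reduction step can be phrased as an operation on derivation trees. Here is a Python sketch; the encoding of derivations as triples of a rule name, an endsequent and a list of subderivations is my own convention.

```python
# A sketch of the reduction just displayed: a Cut whose cut formula
# A & B is introduced by [&R] on the left and by [&L1] (or [&L2]) on
# the right is replaced by a Cut on the conjunct that is actually used.
# Derivations: (rule, (lhs, rhs), subderivations).

def reduce_conj_cut(cut):
    rule, seq, (left, right) = cut
    assert rule == "Cut" and left[0] == "&R" and right[0] in ("&L1", "&L2")
    d1, d2 = left[2]          # d1 : L |- A,  d2 : L |- B
    d3 = right[2][0]          # d3 : A |- R  (or B |- R, for [&L2])
    premise = d1 if right[0] == "&L1" else d2
    return ("Cut", seq, [premise, d3])

d1 = ("D1", ("L", "A"), [])
d2 = ("D2", ("L", "B"), [])
d3 = ("D3", ("A", "R"), [])
ab = ("&", "A", "B")
cut = ("Cut", ("L", "R"),
       [("&R", ("L", ab), [d1, d2]), ("&L1", (ab, "R"), [d3])])
print(reduce_conj_cut(cut))    # a Cut on A alone, with premises d1, d3
```

The new cut is on a smaller formula, which is what drives the elimination argument.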

In our example we do the same: We cut with p ∧ (q ∧ r) ` q on the


left and q ` q on the right, to get the first proof below in which the
cut moves further up the derivation. Clearly, however, this cut is
redundant, as cutting on an identity sequent does nothing. We could
eliminate that step, without cost.

    (1) q ` q                   [Id]
    (2) q ∧ r ` q               from (1), by ∧L1
    (3) p ∧ (q ∧ r) ` q         from (2), by ∧L2
    (4) q ` q                   [Id]
    (5) p ∧ (q ∧ r) ` q         from (3) and (4), by Cut
    (6) p ∧ (q ∧ r) ` q ∨ r     from (5), by ∨R1

and, with the redundant cut removed:

    (1) q ` q                   [Id]
    (2) q ∧ r ` q               from (1), by ∧L1
    (3) p ∧ (q ∧ r) ` q         from (2), by ∧L2
    (4) p ∧ (q ∧ r) ` q ∨ r     from (3), by ∨R1


We have a cut-free derivation of our concluding sequent.


As I hinted before, this technique is a general one. We may use
exactly the same method to convert any derivation using Cut into a
derivation without it. To do this, we will make explicit a number of the
concepts we saw in the example.

definition 2.2.6 [active and passive formulas] The formulas L and


R in each inference in Figure 2.4 are said to be passive in the infer-
ence (they “do nothing” in the step from top to bottom), while the
other formulas are active.

definition 2.2.7 [depth of an inference] The depth of an inference


in a derivation δ is the number of nodes in the sub-derivation of δ in
which that inference is the last step, minus one. In other words, it is
the number of sequents above the conclusion of that inference.

Now we can proceed to present the technique for eliminating cuts from
a derivation. First we show that cuts may be moved upward in a deriv-
ation. Then we show that this process will terminate in a Cut-free
derivation.

lemma 2.2.8 [cut-depth reduction] Given a derivation δ of A ` C,


whose final inference is Cut, which is otherwise cut-free, and in which
that inference has a depth of n, we may construct another derivation
of A ` C which is cut-free, or in which each Cut step has a depth less
than n.

Proof: Our derivation δ contains two subderivations: δl ending in A ` B
and δr ending in B ` C. These subderivations are cut-free.

    δl          δr
    ⋮           ⋮
    A ` B     B ` C
    ──────────────── Cut
        A ` C

To find our new derivation, we look at the formula B and its roles in
the final inference in δl and δr .

case 1: the cut-formula is passive in either inference Suppose
that the formula B is passive in the last inference in δl or passive in
the last inference in δr. For example, if δl ends in [∧L1], then we may
push the cut above it like this (the [∧L2] case is the same, except for
the choice of A2 instead of A1):

before:
    (1) A1 ` B          [conclusion of δl′]
    (2) A1 ∧ A2 ` B     from (1), by ∧L1
    (3) B ` C           [conclusion of δr]
    (4) A1 ∧ A2 ` C     from (2) and (3), by Cut

after:
    (1) A1 ` B          [conclusion of δl′]
    (2) B ` C           [conclusion of δr]
    (3) A1 ` C          from (1) and (2), by Cut
    (4) A1 ∧ A2 ` C     from (3), by ∧L1



The resulting derivation has a cut-depth lower by one. If, on the
other hand, δl ends in [∨L], we may push the Cut above that [∨L] step.
The result is a derivation in which we have duplicated the Cut, but we
have reduced the cut-depth more significantly, as the effect of δl is split
between the two cuts.
before:
    (1) A1 ` B            [conclusion of δl1]
    (2) A2 ` B            [conclusion of δl2]
    (3) A1 ∨ A2 ` B       from (1) and (2), by ∨L
    (4) B ` C             [conclusion of δr]
    (5) A1 ∨ A2 ` C       from (3) and (4), by Cut

after:
    (1) A1 ` B            [conclusion of δl1]
    (2) B ` C             [conclusion of δr]
    (3) A1 ` C            from (1) and (2), by Cut
    (4) A2 ` B            [conclusion of δl2]
    (5) B ` C             [conclusion of δr]
    (6) A2 ` C            from (4) and (5), by Cut
    (7) A1 ∨ A2 ` C       from (3) and (6), by ∨L

The other two ways in which the cut formula could be passive are when
δr ends in [∨R] or [∧R]. The technique for these is identical to the
examples we have seen. The cut passes over [∨R] trivially, and it passes
over [∧R] by splitting into two cuts. In every instance, the depth is
reduced.

case 2: the cut-formula is active In the remaining case, the cut-formula
B may be assumed to be active in the last inference in both δl and in δr,
because we have dealt with the case in which it is passive in either
inference. What we do now depends on the form of the formula B. In
each case, the structure of the formula B determines the final rule in
both δl and δr.

case 2a: the cut-formula is atomic If the cut-formula is an atom,


then the only inference in which an atomic formula is active in the
conclusion is [Id]. In this case, the cut is redundant.
before:
    (1) p ` p    [Id]
    (2) p ` p    [Id]
    (3) p ` p    from (1) and (2), by Cut

after:
    (1) p ` p    [Id]

case 2b: the cut-formula is a conjunction If the cut-formula is a
conjunction B1 ∧ B2, then the only inferences in which a conjunction
is active in the conclusion are ∧R and ∧L. Let us suppose that in the
inference ∧L, we have inferred the sequent B1 ∧ B2 ` C from the
premise sequent B1 ` C. (The choice of [∧L2] instead of [∧L1] involves
choosing B2 instead of B1.) In this case, it is clear that we could have
cut on B1 instead of the conjunction B1 ∧ B2, and the cut is shallower.

before:
    (1) A ` B1           [conclusion of δl1]
    (2) A ` B2           [conclusion of δl2]
    (3) A ` B1 ∧ B2      from (1) and (2), by ∧R
    (4) B1 ` C           [conclusion of δr′]
    (5) B1 ∧ B2 ` C      from (4), by ∧L1
    (6) A ` C            from (3) and (5), by Cut

after:
    (1) A ` B1           [conclusion of δl1]
    (2) B1 ` C           [conclusion of δr′]
    (3) A ` C            from (1) and (2), by Cut

case 2c: the cut-formula is a disjunction The case for disjunction
is similar. If the cut-formula is a disjunction B1 ∨ B2, then the only
inferences in which a disjunction is active in the conclusion are ∨R
and ∨L. Let's suppose that in ∨R the disjunction B1 ∨ B2 is introduced


in an inference from B1 . In this case, it is clear that we could have cut


on B1 instead of the disjunction B1 ∨ B2 , with a shallower cut.
before:
    (1) A ` B1            [conclusion of δl′]
    (2) A ` B1 ∨ B2       from (1), by ∨R1
    (3) B1 ` C            [conclusion of δr1]
    (4) B2 ` C            [conclusion of δr2]
    (5) B1 ∨ B2 ` C       from (3) and (4), by ∨L
    (6) A ` C             from (2) and (5), by Cut

after:
    (1) A ` B1            [conclusion of δl′]
    (2) B1 ` C            [conclusion of δr1]
    (3) A ` C             from (1) and (2), by Cut
In every case, then, we have traded in our derivation for one either
without Cut or with a shallower cut.

The process of reducing cut-depth cannot continue indefinitely, since


the starting cut-depth of any derivation is finite. At some point we find
a derivation of our sequent A ` C with a cut-depth of zero: We find a
derivation of A ` C without a cut. That is,

theorem 2.2.9 [cut elimination] If a sequent is derivable with Cut, it


is derivable without Cut.

Proof: Given a derivation of a sequent A ` C, take a Cut with no Cuts


above it. This cut has some depth, say n. Use the lemma to find a deriv-
ation with lower cut-depth. Continue until there is no Cut remaining
in this part of the derivation. (The depth of each Cut decreases, so this
process cannot continue indefinitely.) Keep selecting cuts in the ori-
ginal derivation and eliminate them one-by-one. Since there are only
finitely many cuts, this process terminates. The result is a cut-free
derivation.
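Here is one way to make the procedure concrete: a hedged, executable sketch of my own (the tuple encoding and the function names are illustrative assumptions, not the text's notation). `reduce` combines two cut-free derivations by pushing the cut upward through the cases of Lemma 2.2.8, and `eliminate` applies this to every Cut in a derivation.

```python
# A hedged sketch of cut elimination (Theorem 2.2.9). A derivation is a
# nested tuple:
#   ('id', p)          derives p ` p
#   ('andL1', B, d)    from A ` R (derived by d), derives A ∧ B ` R
#   ('andL2', B, d)    from A ` R, derives B ∧ A ` R
#   ('andR', d1, d2)   from L ` A and L ` B, derives L ` A ∧ B
#   ('orL', d1, d2)    from A ` R and B ` R, derives A ∨ B ` R
#   ('orR1', B, d)     from L ` A, derives L ` A ∨ B
#   ('orR2', B, d)     from L ` A, derives L ` B ∨ A
#   ('cut', d1, d2)    from L ` C and C ` R, derives L ` R

def reduce(l, r):
    """Combine cut-free derivations of L ` C and C ` R into a cut-free
    derivation of L ` R, pushing the cut upward as in Lemma 2.2.8."""
    # Cuts on identity sequents are redundant.
    if l[0] == 'id':
        return r
    if r[0] == 'id':
        return l
    # The cut-formula is passive in the last step of l ...
    if l[0] in ('andL1', 'andL2'):
        return (l[0], l[1], reduce(l[2], r))
    if l[0] == 'orL':
        return ('orL', reduce(l[1], r), reduce(l[2], r))  # the cut splits in two
    # ... or passive in the last step of r ...
    if r[0] in ('orR1', 'orR2'):
        return (r[0], r[1], reduce(l, r[2]))
    if r[0] == 'andR':
        return ('andR', reduce(l, r[1]), reduce(l, r[2]))
    # ... or active on both sides: cut on the subformula instead.
    if l[0] == 'andR' and r[0] == 'andL1':
        return reduce(l[1], r[2])
    if l[0] == 'andR' and r[0] == 'andL2':
        return reduce(l[2], r[2])
    if l[0] == 'orR1' and r[0] == 'orL':
        return reduce(l[2], r[1])
    if l[0] == 'orR2' and r[0] == 'orL':
        return reduce(l[2], r[2])
    raise ValueError('ill-formed cut')

def eliminate(d):
    """Eliminate every cut in a derivation, working bottom-up."""
    if d[0] == 'cut':
        return reduce(eliminate(d[1]), eliminate(d[2]))
    if d[0] in ('andL1', 'andL2', 'orR1', 'orR2'):
        return (d[0], d[1], eliminate(d[2]))
    if d[0] in ('andR', 'orL'):
        return (d[0], eliminate(d[1]), eliminate(d[2]))
    return d  # an identity sequent

# The running example: cut a derivation of p ∧ (q ∧ r) ` p ∧ q against
# one of p ∧ q ` q ∨ r.
left = ('andR', ('andL1', 'q∧r', ('id', 'p')),
                ('andL2', 'p', ('andL1', 'r', ('id', 'q'))))
right = ('orR1', 'r', ('andL2', 'p', ('id', 'q')))
print(eliminate(('cut', left, right)))
# ('orR1', 'r', ('andL2', 'p', ('andL1', 'r', ('id', 'q'))))
```

Note the ∨L clause, in which the single cut is duplicated, just as in the depth-reduction argument of the lemma; the result on the example is exactly the cut-free derivation found by hand above.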

This result has a number of fruitful consequences.

corollary 2.2.10 [decidability for lattice sequents] There is an al-


gorithm for determining whether or not a sequent A ` B is valid in
lattice logic.

Proof: To determine whether or not A ` B has a derivation, look for


the finitely many different sequents from which this sequent may be
derived. Repeat the process until you find atomic sequents. Atomic
sequents of the form p ` p are derivable, and those of the form p ` q
are not.
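The proof just given describes an algorithm, and it is short enough to write down. The following is an illustrative sketch of my own (the encoding of formulas as strings and tuples is an assumption, not the text's): for each sequent it tries the finitely many inferences that could have derived it.

```python
# A hedged sketch of the decision procedure of Corollary 2.2.10.
# Formulas: a string is an atom; ('and', A, B) and ('or', A, B) are complex.

def derivable(a, b):
    """Is the lattice sequent a ` b derivable? Search upward through the
    finitely many sequents from which a ` b could have been inferred."""
    if isinstance(a, str) and isinstance(b, str):
        return a == b                      # p ` p is an axiom; p ` q is not
    if isinstance(a, tuple):
        con, a1, a2 = a
        if con == 'and' and (derivable(a1, b) or derivable(a2, b)):
            return True                    # last step ∧L1 or ∧L2
        if con == 'or' and derivable(a1, b) and derivable(a2, b):
            return True                    # last step ∨L
    if isinstance(b, tuple):
        con, b1, b2 = b
        if con == 'or' and (derivable(a, b1) or derivable(a, b2)):
            return True                    # last step ∨R1 or ∨R2
        if con == 'and' and derivable(a, b1) and derivable(a, b2):
            return True                    # last step ∧R
    return False

# The distribution sequent p ∧ (q ∨ r) ` (p ∧ q) ∨ r of Example 2.2.11:
lhs = ('and', 'p', ('or', 'q', 'r'))
print(derivable(lhs, ('or', ('and', 'p', 'q'), 'r')))   # False
print(derivable(lhs, ('or', 'q', 'p')))                 # True
```

The search blindly tries every rule; the invertibility facts proved below (Theorem 2.2.13) would let it commit to ∨L and ∧R steps without backtracking.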

Here is an example:

example 2.2.11 [distribution is not derivable] The sequent p ∧ (q ∨


r) ` (p ∧ q) ∨ r is not derivable.

Proof: Any cut-free derivation of p ∧ (q ∨ r) ` (p ∧ q) ∨ r must end


in either a ∧L step or a ∨R step. If we use ∧L, we infer our sequent
from either p ` (p ∧ q) ∨ r, or from q ∨ r ` (p ∧ q) ∨ r. Neither of
these is derivable. As you can see, p ` (p ∧ q) ∨ r is derivable, using ∨R,
only from either p ` p ∧ q or from p ` r. The latter is not
derivable (it is not an axiom, and it cannot be inferred from anywhere)



and the former is derivable only when p ` q is — and it isn’t. Similarly,
q ∨ r ` (p ∧ q) ∨ r is derivable only when q ` (p ∧ q) ∨ r is derivable,
and this is only derivable when either q ` p ∧ q or when q ` r are
derivable, and as before, neither of these are derivable either.
Similarly, if we use ∨R, we infer our sequent from either p ∧ (q ∨
r) ` p ∧ q or from p ∧ (q ∨ r) ` r. By analogous reasoning (more
precisely, by dual reasoning) neither of these sequents is derivable. So,
p∧(q∨r) ` (p∧q)∨r has no cut-free derivation, and by Theorem 2.2.9
it has no derivation at all.

Searching for derivations in this naïve manner is not as efficient as it
could be: we don't need to search through all possible derivations of a sequent
if we know about some of the special properties of the rules of the sys-
tem. For example, consider the sequent A ∨ B ` C ∧ D (where A, B, C
and D are possibly complex statements). This is derivable in two ways
(a) from A ` C ∧ D and B ` C ∧ D by ∨L or (b) from A ∨ B ` C and
A ∨ B ` D by ∧R. Instead of searching both of these possibilities, we
may notice that either choice would be enough to search for a deriva-
tion, since the rules ∨L and ∧R ‘lose no information’ in an important
sense.
definition 2.2.12 [invertibility] A sequent rule of the form
S1 · · · Sn
S
is invertible if and only if whenever the sequent S is derivable, so are
the sequents S1 , . . . Sn .
theorem 2.2.13 [invertible sequent rules] The rules ∨L and ∧R are
invertible, but the rules ∨R and ∧L are not.

Proof: Consider ∨L. If A ∨ B ` C is derivable, then since we have a


derivation of A ` A ∨ B (by ∨R), a use of Cut shows us that A ` C
is derivable. Similarly, since we have a derivation of B ` A ∨ B, the
sequent B ` C is derivable too. So, from the conclusion A ∨ B ` C
of a ∨L inference, we may derive the premises. The case for ∧R is
completely analogous.
For ∧L, on the other hand, we have a derivation of p∧q ` p, but no
derivation of the premise q ` p, so this rule is not invertible. Similarly,
p ` q ∨ p is derivable, but p ` q is not.

It follows that when searching for a derivation of a sequent, instead of


searching for all of the ways that a sequent may be derived, if it may be
derived from an invertible rule we can look to the premises of that rule
immediately, and consider those, without pausing to check the other
sequents from which our target sequent is constructed.
example 2.2.14 [derivation search using invertibility] The sequent
(p ∧ q) ∨ (q ∧ r) ` (p ∨ r) ∧ p is not derivable. By the invertibility of
∨L, it is derivable only if (a) p∧q ` (p∨r)∧p and (b) q∧r ` (p∨r)∧p


are both derivable. Using the invertibility of ∧R, the sequent (b) is
derivable only if (b1 ) q ∧ r ` p ∨ r and (b2 ) q ∧ r ` p are both derivable.
But (b2 ) is not derivable because q ` p and r ` p are underivable.
The elimination of cut is useful for more than just limiting the search
for derivations. The fact that any derivable sequent has a cut-free de-
rivation has other consequences. One consequence is the fact of inter-
polation.
corollary 2.2.15 [interpolation for lattice sequents] If a sequent
A ` B is derivable, then there is a formula C containing only atoms
present in both A and B such that A ` C and C ` B are derivable.
This result tells us that if the sequent A ` B is derivable then that
consequence “factors through” a statement in the vocabulary shared
between A and B. The consequence A ` B not only relies on the
material in A and B and nothing else (that much is due to the availability
of a cut-free derivation); in some sense the derivation also ‘factors
through’ the material in common between A and B. The result is a
straightforward consequence of the cut-elimination theorem: a cut-free
derivation of A ` B provides us with an interpolant.

Proof: We prove this by induction on the construction of the deriva-


tion of A ` B. We keep track of the interpolant with these rules:

    p `p p   [Id]

    A `C R
    ──────────── ∧L1
    A ∧ B `C R

    A `C R
    ──────────── ∧L2
    B ∧ A `C R

    L `C1 A    L `C2 B
    ────────────────── ∧R
    L `C1∧C2 A ∧ B

    A `C1 R    B `C2 R
    ────────────────── ∨L
    A ∨ B `C1∨C2 R

    L `C A
    ──────────── ∨R1
    L `C A ∨ B

    L `C A
    ──────────── ∨R2
    L `C B ∨ A

We show by induction on the length of the derivation that if we have a
derivation of L `C R, then L ` C and C ` R are derivable, and the atoms
in C are present in both L and in R. These properties are satisfied by
the atomic sequent
p `p p, and it is straightforward to verify them for each of the rules.
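The induction in this proof is effective, and can be rendered as a small program. The sketch below is my own illustration (the encoding of formulas as strings and tuples is an assumption, not the text's): it searches for a cut-free derivation and returns the interpolant that the annotated rules would attach to its conclusion, or None when the sequent is underivable.

```python
# A hedged sketch of interpolant extraction (Corollary 2.2.15). Formulas
# are atoms (strings) or tuples ('and', A, B) / ('or', A, B).

def interpolate(a, b):
    """Return an interpolant C with a ` C and C ` b derivable, or None."""
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None       # the annotated axiom p `p p
    if isinstance(a, tuple):
        con, a1, a2 = a
        if con == 'and':                   # ∧L1 / ∧L2: interpolant unchanged
            for ai in (a1, a2):
                c = interpolate(ai, b)
                if c is not None:
                    return c
        if con == 'or':                    # ∨L: interpolant C1 ∨ C2
            c1, c2 = interpolate(a1, b), interpolate(a2, b)
            if c1 is not None and c2 is not None:
                return ('or', c1, c2)
    if isinstance(b, tuple):
        con, b1, b2 = b
        if con == 'or':                    # ∨R1 / ∨R2: interpolant unchanged
            for bi in (b1, b2):
                c = interpolate(a, bi)
                if c is not None:
                    return c
        if con == 'and':                   # ∧R: interpolant C1 ∧ C2
            c1, c2 = interpolate(a, b1), interpolate(a, b2)
            if c1 is not None and c2 is not None:
                return ('and', c1, c2)
    return None

# The sequent of Example 2.2.16: p ∧ (q ∨ (r1 ∧ r2)) ` (q ∨ r1) ∧ (p ∨ r2)
a = ('and', 'p', ('or', 'q', ('and', 'r1', 'r2')))
b = ('and', ('or', 'q', 'r1'), ('or', 'p', 'r2'))
print(interpolate(a, b))   # ('and', ('or', 'q', 'r1'), 'p'), i.e. (q ∨ r1) ∧ p
```

Which interpolant comes back depends on the order in which the rules are tried; on this example the sketch happens to find the same interpolant as the annotated derivation below.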

example 2.2.16 [a derivation with an interpolant] Consider the sequent
p ∧ (q ∨ (r1 ∧ r2)) ` (q ∨ r1) ∧ (p ∨ r2). We may annotate a
cut-free derivation of it as follows:
    (1) q `q q                                        [Id]
    (2) q `q q ∨ r1                                   from (1), by ∨R1
    (3) r1 `r1 r1                                     [Id]
    (4) r1 ∧ r2 `r1 r1                                from (3), by ∧L1
    (5) r1 ∧ r2 `r1 q ∨ r1                            from (4), by ∨R2
    (6) q ∨ (r1 ∧ r2) `q∨r1 q ∨ r1                    from (2) and (5), by ∨L
    (7) p ∧ (q ∨ (r1 ∧ r2)) `q∨r1 q ∨ r1              from (6), by ∧L2
    (8) p `p p                                        [Id]
    (9) p `p p ∨ r2                                   from (8), by ∨R1
    (10) p ∧ (q ∨ (r1 ∧ r2)) `p p ∨ r2                from (9), by ∧L1
    (11) p ∧ (q ∨ (r1 ∧ r2)) `(q∨r1)∧p (q ∨ r1) ∧ (p ∨ r2)   from (7) and (10), by ∧R



Notice that the interpolant (q∨r1 )∧p does not contain r2 , even though
r2 is present in both the antecedent and the consequent of the sequent.
This tells us that r2 is doing no ‘work’ in this derivation. Since we have

p ∧ (q ∨ (r1 ∧ r2 )) ` (q ∨ r1 ) ∧ p, (q ∨ r1 ) ∧ p ` (q ∨ r1 ) ∧ (p ∨ r2 )

We can replace the r2 in either derivation with another statement
– say r3 – preserving the structure of each derivation. We get the
more general fact:

p ∧ (q ∨ (r1 ∧ r2 )) ` (q ∨ r1 ) ∧ (p ∨ r3 )

2.2.4 | history
[To be written.]

2.2.5 | exercises
basic exercises
q1 Show that there is no cut-free derivation of the following sequents
1 : p ∨ (q ∧ r) ` p ∧ (q ∨ r)
2 : p ∧ (q ∨ r) ` (p ∧ q) ∨ r
3 : p ∧ (q ∨ (p ∧ r)) ` (p ∧ q) ∨ (p ∧ r)
q2 Suppose that there is a derivation of A ` B. Let C(A) be a formula
containing A as a subformula, and let C(B) be that formula with the
subformula A replaced by B. Show that there is a derivation of C(A) `
C(B). Furthermore, show that a derivation of C(A) ` C(B) may be
systematically constructed from the derivation of A ` B together with
the context C(−) (the shape of the formula C(A) with a ‘hole’ in the
place of the subformula A).
q3 Find a derivation of p ∧ (q ∧ r) ` (p ∧ q) ∧ r. Find a derivation of
(p ∧ q) ∧ r ` p ∧ (q ∧ r). Put these two derivations together, with a
Cut, to show that p ∧ (q ∧ r) ` p ∧ (q ∧ r). Then eliminate the cuts
from this derivation. What do you get?
q4 Do the same thing with derivations of p ` (p∧q)∨p and (p∧q)∨p ` p.
What is the result when you eliminate this cut?
q5 Show that (1) A ` B ∧ C is derivable if and only if A ` B and A ` C is
derivable, and that (2) A ∨ B ` C is derivable if and only if A ` C and
B ` C are derivable. Finally, (3) when is A ∨ B ` C ∧ D derivable, in
terms of the derivability relations between A, B, C and D?
q6 Under what conditions do we have a derivation of A ` B when A con-
tains only propositional atoms and disjunctions and B contains only
propositional atoms and conjunctions?
q7 Expand the system with the following rules for the propositional con-
stants ⊥ and ⊤.

    A ` ⊤   [⊤R]        ⊥ ` A   [⊥L]


Show that Cut is eliminable from the new system. (You can think of
⊥ and ⊤ as zero-place connectives. In fact, there is a sense in which ⊤
is a zero-place conjunction and ⊥ is a zero-place disjunction. Can you
see why?)
q8 Show that lattice sequents including ⊤ and ⊥ are decidable, follow-
ing Corollary 2.2.10 and the results of the previous question.
q9 Show that every formula composed of just ⊤, ⊥, ∧ and ∨ is equivalent
to either ⊤ or ⊥. (What does this result remind you of?)
q10 Prove the interpolation theorem (Corollary 2.2.15) for derivations in-
volving ∧, ∨, ⊤ and ⊥.
q11 Expand the system with rules for a propositional connective tonk with
the following rules:

    A ` R
    ───────────── tonk L
    A tonk B ` R

    L ` B
    ───────────── tonk R
    L ` A tonk B

What new things can you derive using tonk? Can you derive A tonk
B ` A tonk B? Is Cut eliminable for formulas involving tonk? (See
Arthur Prior's “The Runabout Inference-Ticket” [70] for tonk's first
appearance in print.)
q12 Expand the system with rules for a propositional connective honk with
the following rules:

    A ` R
    ───────────── honk L
    A honk B ` R

    L ` A    L ` B
    ───────────── honk R
    L ` A honk B

What new things can you derive using honk? Can you derive A honk
B ` A honk B? Is Cut eliminable for formulas involving honk?
q13 Expand the system with rules for a propositional connective plonk with
the following rules:

    A ` R    B ` R
    ───────────── plonk L
    A plonk B ` R

    L ` B
    ───────────── plonk R
    L ` A plonk B

What new things can you derive using plonk? Can you derive A plonk
B ` A plonk B? Is Cut eliminable for formulas involving plonk?

intermediate exercises
q14 Give a formal, recursive definition of the dual of a sequent, and the
dual of a derivation, in such a way that the dual of the sequent p1 ∧
(q1 ∨ r1 ) ` (p2 ∨ q2 ) ∧ r2 is the sequent (p2 ∧ q2 ) ∨ r2 ` p1 ∨ (q1 ∧ r1 ).
And then use this definition to prove the following theorem.
theorem 2.2.17 [duality for derivations] A sequent A ` B is deriv-
able if and only if its dual (A ` B)d is derivable. Furthermore, the dual
of the derivation of A ` B is a derivation of the dual of A ` B.
q15 Even though the distribution sequent p ∧ (q ∨ r) ` (p ∧ q) ∨ r is not
derivable (Example 2.2.11), some sequents of the form A ∧ (B ∨ C) `
(A ∧ B) ∨ C are derivable. Give an independent characterisation of the
triples ⟨A, B, C⟩ such that A ∧ (B ∨ C) ` (A ∧ B) ∨ C is derivable.



q16 Prove the invertibility result of Theorem 2.2.13 without appealing to
the Cut rule or to Cut-elimination. (hint: if a sequent A ∨ B ` C
has a derivation δ, consider the instances of A ∨ B ‘leading to’ the
instance of A ∨ B in the conclusion. How does A ∨ B appear first in the
derivation? Can you change the derivation in such a way as to make
it derive A ` C? Or to derive B ` C instead? Prove this, and a similar
result for ∧L.)
advanced exercises
q17 Define a notion of reduction for lattice derivations. Show that it is
strongly normalising and that each derivation reduces to a unique cut-
free derivation.
projects
q18 Provide sequent formulations for logics intermediate between lattice
logic and the logic of distributive lattices (those in which p ∧ (q ∨ r) `
(p∧q)∨r). Characterise which logics intermediate between lattice logic
and distributive lattice logic have sequent presentations, and which do
not. (This requires making explicit what counts as a logic and what
counts as a sequent presentation of a logic.)


2.3 | from proofs to derivations and back


The goal in this section is to collect together what we have learned so
far into a more coherent picture. We will begin to see how natural
deduction and sequent systems can be related. It seems clear that there
must be connections, as the normalisation theorem and the proof of
the redundancy of Cut have a similar “flavour”: they both result in
a subformula property, and they can be proved in similar ways. In this
section we will show how close this connection can be.
So, to connect sequent systems and natural deduction, start to think
of a derivation of A ` B as declaring the existence of a proof from A to
B. A proof from A to B cannot itself be a derivation, as derivations contain
sequents, not just formulas. A proof from A to B is a natural deduction
proof. Thinking of the rules in a sequent system, then, perhaps we
can understand them as telling us about the existence (and perhaps the
construction) of natural deduction proofs. For example, the step from
L ` A and L ` B to L ` A ∧ B might be seen as saying that if we have
a proof from L to A and another proof from L to B then these may
(somehow) be combined into a proof from L to A ∧ B.
The story is not completely straightforward, for we have different
vocabularies for our Gentzen system and for natural deduction. By
the end of this section we will put them together and look at the logic
of conjunction, disjunction and implication. But for now, let us focus
on implication alone. Natural deduction proofs in this vocabulary can
have many assumptions but always only one conclusion. This means
that a natural way of connecting these arguments with sequents is to
use sequents of the form X ` A where X is a multiset and A a single
formula.

2.3.1 | sequents for linear conditionals


In this section we will examine linear natural deduction, and sequent
rules appropriate for it. We need rules for conditionals in a sequent
context. That is, we want rules that say when it is appropriate to intro-
duce a conditional on the left of a sequent, and when it is appropriate
to introduce one on the right. The rule for conditionals on the right
seems straightforward:
    X, A ` B
    ───────── →R
    X ` A → B
The rule makes sense when talking about proofs. If π1 is a proof from
X, A to B, then we can extend it into a proof from X to A → B by dis-
charging A. We use only linear discharge, so we read this rule quite
literally. X, A is the multiset containing one more instance of A than
X does. We delete that instance of A from X, A, and we have the ante-
cedent multiset X, from which we can deduce A → B, discharging just
that instance of A.
The rule for conditionals on the left, on the other hand, is not as
straightforward as the right rule. Just as with our Gentzen system



for ∧ and ∨, we want a rule that introduces our connective in the ante-
cedent of the sequent. This means we are after a rule that indicates
when it is appropriate to infer something from a conditional formula.
The canonical case of inferring something from a conditional formula
is by modus ponens. This motivates the sequent

A → B, A ` B

and this should be derivable as a sequent in our system. However,


this is surely not the only context in which we may introduce A → B
into the left of a sequent. We may want to infer from A → B when
the minor premise A is not an assumption of our proof, but is itself
deduced from some other premise set. That is, we at least want to
endorse this step:
    X ` A
    ──────────────
    A → B, X ` B
If we have a proof of A on the basis of X, then adding to this proof
a new assumption of A → B will lead us to B, when we add the extra
step of →E. This is straightforward enough. However, we may not
only think that the A has been derived from other material — we may
also think that the conclusion B has already been used as a premise in
another proof. It would be a shame to have to use a Cut to deduce what
follows from B (or perhaps, what follows from B together with other
premises Y). In other words, we should endorse this inference:
    X ` A    B, Y ` C
    ───────────────── →L
    A → B, X, Y ` C
which tells us how we can infer from A → B. If we can infer to A
and from B, then adding the assumption of A → B lets us connect
the proofs. This is clearly very closely related to the Cut rule, but it
satisfies the subformula property, as A and B remain present in the con-
clusion sequent. The Cut rule is as before, except with the modification
for our new sequents. The cut formula C is one of the antecedents in
the sequent C, Y ` R, and it is cut out and replaced by whatever as-
sumptions are required in the proof. This motivates the following four
rules for derivations in this sequent calculus in Figure 2.5.
Sequent derivations using these rules can be constructed as follows:
q`q r`r
→L
p`p q, q → r ` r
→L
p → q, q → r, p ` r
→R
p → q, q → r ` p → r
→R
p → q ` (q → r) → (p → r)
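Derivations like this can also be found mechanically, though the →L rule forces a choice of how to split the antecedent multiset between its two premises. Here is a hedged sketch of my own (the encoding and the brute-force context splitting are illustrative assumptions, one naive strategy among many):

```python
# A hedged sketch of cut-free proof search for the sequents of Figure 2.5.
# ctx is a list of formulas (a multiset); a formula is an atom (a string)
# or ('imp', A, B) for A → B.
from itertools import combinations

def provable(ctx, goal):
    """Decide whether the sequent ctx ` goal has a cut-free derivation."""
    # →R is invertible: to derive X ` A → B, derive X, A ` B.
    if isinstance(goal, tuple):
        return provable(ctx + [goal[1]], goal[2])
    # An atomic goal: either an identity sequent, or the last rule was →L.
    if ctx == [goal]:
        return True
    for i, f in enumerate(ctx):
        if not isinstance(f, tuple):
            continue
        rest = ctx[:i] + ctx[i + 1:]
        # →L on f = A → B: split the rest of the context into X (which
        # must derive A) and Y (used, with B, to derive the goal).
        for k in range(len(rest) + 1):
            for xs in combinations(range(len(rest)), k):
                x = [rest[j] for j in xs]
                y = [rest[j] for j in range(len(rest)) if j not in xs]
                if provable(x, f[1]) and provable([f[2]] + y, goal):
                    return True
    return False

imp = lambda a, b: ('imp', a, b)
print(provable([imp('p', 'q'), imp('q', 'r'), 'p'], 'r'))            # True
print(provable([imp('p', 'q')], imp(imp('q', 'r'), imp('p', 'r'))))  # True
print(provable(['p', 'p'], 'p'))                                     # False
```

The last example fails because the discharge policy is linear: every assumption in the context must be used exactly once.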
They have the same structure as the sequent derivations for conjunc-
tion and disjunction seen in the previous section: if we do not use the
Cut rule, then derivations have the subformula property. Now, how-
ever, there are more positions for formulas to appear in a sequent, as


    p ` p   [Id]

    X ` A    B, Y ` R
    ───────────────── →L
    A → B, X, Y ` R

    X, A ` B
    ───────── →R
    X ` A → B

    X ` C    C, Y ` R
    ───────────────── Cut
    X, Y ` R

    Figure 2.5: sequents for conditionals

many formulas may appear on the left hand side of the sequent. In
the rules in Figure 2.5, the formulas appearing in the spots filled by p, A,
B, A → B, or C are active, and the formulas in the other positions —
filled by X, Y and R — are passive.
As we’ve seen, these rules can be understood as “talking about” the
natural deduction system. We can think of a derivation of the sequent
X ` A as a recipe for constructing a proof from X to A. We may define
a mapping, giving us for each derivation δ of X ` A a proof nd(δ) from
X to A.
definition 2.3.1 [nd : derivations → proofs] For any sequent deriv-
ation δ of X ` A, there is a natural deduction proof nd(δ) from the
premises X to the conclusion A. It is defined recursively by first choos-
ing nd of an identity derivation, and then, given nd of simpler deriva-
tions, we define nd of a derivation extending those derivations by →L,
→R, or Cut:
» If δ is an identity sequent p ` p, then nd(δ) is the proof with the
sole assumption p. This is a proof from p to p.
» If δ is a derivation

       ⋮ δ′
   X, A ` B
   ────────── →R
   X ` A → B

  then we already have the proof nd(δ′) from X, A to B. The proof
  nd(δ), from X to A → B, is the following:

   X, [A](i)
       ⋮ nd(δ′)
       B
   ────── →I,i
   A → B
» If δ is a derivation

       ⋮ δ1        ⋮ δ2
     X ` A     B, Y ` R
   ───────────────────── →L
     A → B, X, Y ` R

  then we already have the proofs nd(δ1) from X to A and nd(δ2)
  from B, Y to R. The proof nd(δ), from A → B, X, Y to R, is the
  following:

           X
           ⋮ nd(δ1)
   A → B   A
   ──────────── →E
       B   Y
       ⋮ nd(δ2)
       R
» If δ is a derivation

       ⋮ δ3        ⋮ δ4
     X ` C     C, Y ` R
   ───────────────────── Cut
        X, Y ` R

  then we already have the proofs nd(δ3) from X to C and nd(δ4)
  from C, Y to R. The proof nd(δ), from X, Y to R, is the following:

   X
   ⋮ nd(δ3)
   C   Y
   ⋮ nd(δ4)
   R
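The clauses of this definition can be read as a recursive program. In the sketch below (my own encoding, not the text's), derivations are nested tuples carrying names for assumptions, and the natural deduction proofs that nd constructs are represented as linear lambda terms: →I is abstraction, →E is application, and the →L and Cut clauses plug one proof into another by substitution.

```python
# A speculative sketch of nd : derivations → proofs. Derivations:
#   ('id', x)               an identity sequent, assumption named x
#   ('impR', x, d)          →R, discharging the assumption named x
#   ('impL', f, d1, b, d2)  →L: f names the new assumption A → B; d1
#                           derives the minor premise; d2 uses b : B
#   ('cut', d1, c, d2)      Cut, plugging d1 in for the assumption c
# Proofs are linear lambda terms: ('var', x), ('lam', x, M), ('app', M, N).
# For simplicity we assume all assumption names are distinct.

def subst(term, name, repl):
    """Replace the (unique, linear) occurrence of variable `name`."""
    kind = term[0]
    if kind == 'var':
        return repl if term[1] == name else term
    if kind == 'lam':
        return ('lam', term[1], subst(term[2], name, repl))
    return ('app', subst(term[1], name, repl), subst(term[2], name, repl))

def nd(d):
    kind = d[0]
    if kind == 'id':
        return ('var', d[1])
    if kind == 'impR':                       # →R becomes →I: abstraction
        return ('lam', d[1], nd(d[2]))
    if kind == 'impL':                       # →L: apply f to the minor proof
        _, f, d1, b, d2 = d
        return subst(nd(d2), b, ('app', ('var', f), nd(d1)))
    if kind == 'cut':                        # Cut: compose the two proofs
        _, d1, c, d2 = d
        return subst(nd(d2), c, nd(d1))

# The derivation of p → q ` (q → r) → (p → r) constructed earlier in
# this section, with f : p → q, g : q → r, x : p, y : q, z : r.
d = ('impR', 'g',
      ('impR', 'x',
        ('impL', 'f', ('id', 'x'), 'y',
          ('impL', 'g', ('id', 'y'), 'z', ('id', 'z')))))
print(nd(d))
# ('lam', 'g', ('lam', 'x',
#   ('app', ('var', 'g'), ('app', ('var', 'f'), ('var', 'x')))))
```

The resulting term reads as the expected proof: assume g : q → r and x : p, apply f to x to get q, then apply g to get r, and discharge in turn.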

Using these rules, we may read derivations as sets of instructions for
constructing proofs. If you examine the instructions closely, you will see
that we have in fact proved a stronger result, connecting normal proofs
and cut-free derivations.

theorem 2.3.2 [normality and cut-freedom] For any cut-free deriv-


ation δ, the proof nd(δ) is normal.

Proof: This can be seen in a close examination of the steps of construc-


tion. Prove it by induction on the recursive construction of δ. If δ is
an identity step, nd(δ) is normal, so the induction hypothesis is satis-
fied. Notice that whenever an →E step is added to the proof, the major
premise is a new assumption in a proof with a different conclusion.
Whenever an →I step is added to the proof, the conclusion is added at
the bottom of the proof, and hence, it cannot be a major premise of an
→E step, which is an assumption in that proof and not a conclusion.
The only way we could introduce an indirect pair into nd(δ) would
be by the use of the Cut rule, so if δ is cut-free, then nd(δ) is normal.

Another way to understand this result is as follows: the connective


rules of a sequent system introduce formulas involving that connective
either on the left or the right. Looking at it from the point of view of
a proof, that means that the new formula is either introduced as an
assumption or as a conclusion. In this way, the new material in the
proof is always built on top of the old material, and we never compose


an introduction with an elimination in such a way as to have an indirect


pair in a proof. The only way to do this is by way of a Cut step.
This mapping from sequent derivations to proofs brings to light one
difference between the systems as we have set them up. As we have defined
them, there is no derivation δ such that nd(δ) delivers the simple proof
consisting of the sole assumption p → q. It would have to be a deriv-
ation of the sequent p → q ` p → q, but the proof corresponding to
this derivation is more complicated than the simple proof consisting of
the assumption alone:

δ:
    p ` p    q ` q
    ────────────── →L
    p → q, p ` q
    ────────────── →R
    p → q ` p → q

nd(δ):
    p → q    [p](1)
    ─────────────── →E
          q
    ─────── →I,1
    p → q

What are we to make of this? If we want there to be a derivation
constructing the simple proof for the argument from p → q to itself,
an option is to extend the class of derivations somewhat:

    δ⋆ : p → q ` p → q        nd(δ⋆) : p → q

If δ⋆ is to be a derivation, we can expand the scope of the identity rule,


to allow arbitrary formulas, instead of just atoms.

A ` A [Id+ ]

This motivates the following distinction:


definition 2.3.3 [liberal and strict derivations] A strict derivation
of a sequent is one in which [Id] is the only identity rule used. A liberal
derivation is one in which [Id+ ] is permitted to be used.
lemma 2.3.4 If δ is a liberal derivation, it may be extended into δst, a
strict derivation of the same sequent. Conversely, a strict derivation
is already a liberal derivation. A strict derivation featuring a sequent
A ` A, where A is complex, may be truncated into a liberal derivation
by replacing the derivation of A ` A by an appeal to [Id+].

Proof: The results here are a matter of straightforward surgery on de-


rivations. To transform δ into δst , replace each appeal to [Id+ ] to justify
A ` A with the identity derivation Id(A) to derive A ` A.
Conversely, in a strict derivation δ, replace each derivation of an
identity sequent A ` A, below which there are no more identity se-
quents, with an appeal to [Id+ ] to find the smallest liberal derivation
corresponding to δ.

From now on, we will focus on liberal derivations, with the understand-
ing that we may “strictify” our derivations if the need or desire arises.
So, we have nd : derivations → proofs. This transformation also
sends cut-free derivations to normal proofs. This lends some support
to the view that derivations without cut and normal proofs are closely



related, and that cut elimination and normalisation are in some sense
the same kind of process. Can we make this connection tighter? What
about the reverse direction? Is there a map that takes proofs to deriva-
tions? There is, but the situation is somewhat more complicated. In the
rest of this section we will see how to transform proofs into derivations,
and we will examine the way that normal proofs can be transformed
into cut-free derivations.
Firstly, note that the map nd is many-to-one. There are derivations
δ ≠ δ′ such that nd(δ) = nd(δ′). Here is a simple example:

δ:
    (1) p ` p                   [Id]
    (2) q ` q                   [Id]
    (3) p → q, p ` q            from (1) and (2), by →L
    (4) r ` r                   [Id]
    (5) q → r, p → q, p ` r     from (3) and (4), by →L

δ′:
    (1) q ` q                   [Id]
    (2) r ` r                   [Id]
    (3) q → r, q ` r            from (1) and (2), by →L
    (4) p ` p                   [Id]
    (5) q → r, p → q, p ` r     from (4) and (3), by →L

Applying nd to δ and δ′, you generate the one proof π in two different
ways:
π:
    p → q    p
    ─────────── →E
    q → r    q
    ─────────── →E
         r
This means that there are at least two different ways to make the re-
verse trip, from π to a derivation. The matter is more complicated than
this. There is another derivation δ′′, using a cut, such that nd(δ′′) = π.

    (1) p ` p                   [Id]
    (2) q ` q                   [Id]
    (3) p → q, p ` q            from (1) and (2), by →L
    (4) q ` q                   [Id]
    (5) r ` r                   [Id]
    (6) q → r, q ` r            from (4) and (5), by →L
    (7) q → r, p → q, p ` r     from (3) and (6), by Cut

So, even though nd sends cut-free derivations to normal proofs, it also


sends some derivations with cut to normal proofs.
To understand what is going on, we will consider two different
ways to reverse the trip, to go from a proof π to a (possibly liberal)
derivation δ.

bottom-up construction of derivations: If π is a proof from X to
A, then sqb(π) is a derivation of the sequent X ` A, defined as follows:

» If π is an assumption A, then sqb(π) is the identity derivation
  Id(A).
» If π is a proof from X to A → B, composed from a proof π′ from
  X, A to B by a conditional introduction, then take the derivation
  sqb(π′) of the sequent X, A ` B, and extend it with a [→R] step
  to conclude X ` A → B.

       ⋮ sqb(π′)
   X, A ` B
   ────────── →R
   X ` A → B


» If π is composed from a proof π′ from X to A → B and another
  proof π′′ from Y to A, then take the derivations sqb(π′) of X `
  A → B and sqb(π′′) of Y ` A, and combine them with an [→L]
  step and a Cut:

    (1) Y ` A            [conclusion of sqb(π′′)]
    (2) B ` B            [Id+]
    (3) A → B, Y ` B     from (1) and (2), by →L
    (4) X ` A → B        [conclusion of sqb(π′)]
    (5) X, Y ` B         from (4) and (3), by Cut
This definition constructs a derivation for each natural deduction proof,
from the bottom to the top.
The first thing to notice about sqb is that it does not always generate a
cut-free derivation, even if the proof you start off with is normal. We
always use a Cut in the translation of a →E step, whether or not the
proof π is normal. Let’s look at how this works in an example: we can
construct sqb of the following normal proof:
    p → q    [p](1)
    ─────────────── →E
    [q → r](2)    q
    ─────────────── →E
         r
    ────── →I,1
    p → r
    ────────────────── →I,2
    (q → r) → (p → r)
We are going to construct a derivation of the sequent p → q ` (q →
r) → (p → r). We start by working back from the last step, using the
definition. We have the following part of the derivation:
    (1) p → q, q → r ` p → r             [conclusion of δ′]
    (2) p → q ` (q → r) → (p → r)        from (1), by →R

where δ′ is sqb of the subproof

    p → q    [p](1)
    ───────────── →E
    q → r    q
    ───────────── →E
         r
    ────── →I,1
    p → r
Going back, we have

    (1) p → q, q → r, p ` r              [conclusion of δ′′]
    (2) p → q, q → r ` p → r             from (1), by →R
    (3) p → q ` (q → r) → (p → r)        from (2), by →R

where δ′′ is sqb of

    p → q    p
    ─────────── →E
    q → r    q
    ─────────── →E
         r

And going back further we get:

    (1) p → q, p ` q                     [conclusion of δ′′′]
    (2) r ` r                            [Id]
    (3) q → r, p → q, p ` r              from (1) and (2), by →L
    (4) q → r ` q → r                    [identity derivation]
    (5) q → r, p → q, p ` r              from (4) and (3), by Cut
    (6) p → q, q → r ` p → r             from (5), by →R
    (7) p → q ` (q → r) → (p → r)        from (6), by →R

where δ′′′ is sqb of

    p → q    p
    ─────────── →E
         q

§2.3 · from proofs to derivations and back 75


Finally, we get

                 p ` p   q ` q
                 ───────────── →L
p → q ` p → q    p → q, p ` q
───────────────────────────── Cut
p → q, p ` q               r ` r
──────────────────────────────── →L
q → r ` q → r    q → r, p → q, p ` r
──────────────────────────────────── Cut
p → q, q → r, p ` r
─────────────────── →R
p → q, q → r ` p → r
──────────────────────── →R
p → q ` (q → r) → (p → r)

This contains redundant Cut steps (we applied Cuts to identity se-
quents, and these can be done away with). We can eliminate these,
to get a much simpler cutfree derivation:

p ` p   q ` q
───────────── →L
p → q, p ` q              r ` r
─────────────────────────────── →L
q → r, p → q, p ` r
─────────────────── →R
p → q, q → r ` p → r
──────────────────────── →R
p → q ` (q → r) → (p → r)

You can check for yourself that when you apply nd to this derivation,
you construct the original proof.
So, we have transformed the proof π into a derivation δ, which
contained Cuts, and in this case, we eliminated them. Is there a way
to construct a cut-free derivation in the first place? It turns out that
there is. We need to construct the proof in a more subtle way than
unravelling it from the bottom.

perimeter construction of derivations: If we wish to generate a


cut-free derivation from a normal proof, the subtlety is in how we use
→L to encode an →E step. We want the major premise to appear as an
assumption in the natural deduction proof. That means we must defer
the decoding of an →E step until the major premise is an undischarged
assumption in the proof. Thankfully, we can always do this if the proof
is normal.

lemma 2.3.5 [normal proof structure] Any normal proof, using the
rules →I and →E alone, is either an assumption, or ends in an →I step,
or contains an undischarged assumption that is the major premise of
an →E step.

Proof: We show this by induction on the construction of the proof π.


We want to show that the proof π has the property of being either (a)
an assumption, (b) ends in an →I step, or (c) contains an undischarged
assumption that is the major premise of an →E step. Consider how π
is constructed.


» If π is an assumption, it qualifies under condition (a).


» So, suppose that π ends in →I. Then it qualifies under condition
(b).
» Suppose that π ends in →E. Then π combines two proofs, π1
ending in A → B and π2 ending in A, and we compose these
with an →E step to deduce B. Since π1 and π2 are normal, we
may presume the induction hypothesis, and that either (a), (b)
or (c) apply to each proof. Since the whole proof π is normal,
we know that the proof π1 cannot end in an →E step. So, it
must satisfy property (a) or property (c). If it is (c), then one
of the undischarged assumptions in π1 is the major premise of
an →E step, and it is undischarged in π, and hence π satisfies
property (c). If, on the other hand, π1 satisfies property (a), then
the formula A → B, the major premise of the →E step concluding
π, is undischarged, and π also satisfies property (c).
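The trichotomy in the lemma is easy to check mechanically. In the sketch below (my own encoding, not the book's: assumptions carry numeric labels, and each →I node records the label it discharges), `classify` returns which of the three cases applies, and falls through to a fourth answer only when the proof is not normal.

```python
from dataclasses import dataclass

@dataclass
class Assume:
    label: int          # which ->I step (if any) discharges this assumption
    formula: object
@dataclass
class ImpI:
    label: int          # discharges assumptions carrying this label
    hyp: object
    body: object
@dataclass
class ImpE:
    major: object       # proof of A -> B
    minor: object       # proof of A

def has_open_major(pi, bound):
    """Is some undischarged assumption the major premise of an ->E step?"""
    if isinstance(pi, Assume):
        return False
    if isinstance(pi, ImpI):
        return has_open_major(pi.body, bound | {pi.label})
    open_major = isinstance(pi.major, Assume) and pi.major.label not in bound
    return (open_major or has_open_major(pi.major, bound)
            or has_open_major(pi.minor, bound))

def classify(pi, bound=frozenset()):
    """The trichotomy of Lemma 2.3.5 for normal proofs."""
    if isinstance(pi, Assume):
        return 'assumption'                 # case (a)
    if isinstance(pi, ImpI):
        return 'ends-in-I'                  # case (b)
    if has_open_major(pi, bound):
        return 'open-major'                 # case (c)
    return 'not-normal'                     # unreachable when pi is normal
```

On the example proof of this section, the whole proof ends in →I; stripping the two →I steps leaves a proof whose leftmost undischarged assumption p → q is the major premise of an →E step.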

Now we may define the different map sqp (“p” for “perimeter”) ac-
cording to which we strip each →I off the bottom of the proof π, until
we have no more to take, and then, instead of dealing with the →E at
the bottom of the proof, we deal with the leftmost undischarged
major premise of an →E step, unless there is none.

definition 2.3.6 [sqp ] If π is a proof from X to A then sqp (π) is a de-


rivation of the sequent X ` A, defined as follows:

» If π is an assumption A, then sqp (π) is the identity derivation


Id(A).
» If π is a proof from X to A → B, composed from a proof π′ from
X, A to B by a conditional introduction, then take the derivation
sqp(π′) of the sequent X, A ` B, and extend it with a [→R] step
to conclude X ` A → B.

      ⋮ sqp(π′)
    X, A ` B
    ────────── →R
    X ` A → B

» If π is a proof ending in a conditional elimination, then if π con-


tains an undischarged assumption that is the major premise of
an →E step, choose the leftmost one in the proof. The proof π
will have the following form:

        Z
        ⋮ π2
C → D   C
───────── →E
    D     Y
     ⋮ π3
      A



Take the two proofs π2 and π3 , and apply sqp to them to find
derivations sqp (π2 ) of Z ` C and sqp (π3 ) of Y, D ` A. Compose
these with an →L step as follows:

  ⋮ sqp(π2)     ⋮ sqp(π3)
Z ` C          Y, D ` A
─────────────────────── →L
C → D, Z, Y ` A

to complete the derivation for C → D, Z, Y ` A.


» If, on the other hand, there is no major premise of an →E step
that is an undischarged assumption in π (in which case, π is not
normal), use a Cut as in the last part of the definition of sqb (the
bottom-up translation) to split the proof at the final →E step.

This transformation will send a normal proof into a cut-free derivation,


since a Cut is only used in the mapping when the source proof is not
normal. We have proved the following result.

theorem 2.3.7 For each natural deduction proof π from X to A, sqp (π)
is a derivation of the sequent X ` A. Furthermore, if π is normal,
sqp (π) is cut-free.

This has an important corollary.

corollary 2.3.8 [cut is redundant] If δ is a derivation of X ` A, then


there is a cut-free derivation δ 0 of X ` A.

Proof: Given a proof π, let norm(π) be the normalisation of π. Then,
given δ, consider sqp(norm(nd(δ))). This is a cut-free derivation of
X ` A.

This is a different way to prove the redundancy of Cut. Of course,


we can prove the redundancy of Cut directly, by eliminating it from a
derivation.
The crucial steps in the proof of the elimination of Cut are just as
they were before. We have a derivation in which the last step is Cut,
and we push this Cut upwards towards the leaves, or trade it in for a
cut on a simpler formula. The crucial distinction is whether the cut
formula is active in both premise sequents of the Cut step, or passive in one of them.
Consider the case in which the cut formula is active.

cut formula active on both sides: In this case the derivation is as


follows:
  ⋮ δ1            ⋮ δ2        ⋮ δ2′
X, A ` B        Y ` A       B, Z ` C
───────── →R    ──────────────────── →L
X ` A → B       A → B, Y, Z ` C
─────────────────────────────── Cut
X, Y, Z ` C


Now, the Cut on A → B may be traded in for two simpler cuts: one on
A and the other on B.
 ⋮ δ2       ⋮ δ1
Y ` A     X, A ` B
────────────────── Cut        ⋮ δ2′
X, Y ` B                    B, Z ` C
──────────────────────────────────── Cut
X, Y, Z ` C
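This principal-case reduction is a purely local rewrite on derivation trees, and can be stated in a few lines of code. In this sketch (my own encoding, not the book's: each derivation carries its endsequent, with the antecedent as a multiset), `reduce_principal` performs exactly the trade displayed above and preserves the endsequent.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Deriv:
    rule: str           # 'Id', '->R', '->L' or 'Cut'
    premises: tuple
    left: Counter       # antecedent multiset
    right: object       # succedent formula

def reduce_principal(d):
    """Trade Cut(->R(d1), ->L(d2, d2')) on A -> B for a cut on A
    followed by a cut on B."""
    assert d.rule == 'Cut'
    l, r = d.premises                   # l : X |- A -> B,  r : A -> B, Y, Z |- C
    assert l.rule == '->R' and r.rule == '->L'
    d1 = l.premises[0]                  # d1 : X, A |- B
    d2, d2p = r.premises                # d2 : Y |- A,  d2p : B, Z |- C
    a, b = d2.right, d1.right
    cut1 = Deriv('Cut', (d2, d1),
                 d2.left + (d1.left - Counter([a])), b)             # X, Y |- B
    return Deriv('Cut', (cut1, d2p),
                 cut1.left + (d2p.left - Counter([b])), d.right)    # X, Y, Z |- C
```

A quick check on a concrete instance confirms that the rewrite replaces the one Cut by two and leaves the endsequent untouched.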

cut formula passive on one side: There are more cases to consider
here, as there are more ways the cut formula can be passive in a
derivation. The cut formula can be passive by occurring in X, Y, or R in either
[→L] or [→R]:
X ` A    B, Y ` R               X, A ` B
───────────────── →L            ───────── →R
A → B, X, Y ` R                 X ` A → B
So, let’s mark all of the different places that a cut formula could occur
passively in each of these inferences. The inferences in Figure 2.6 mark
the four different locations of a cut formula with C.

X′, C ` A    B, Y ` R           X ` A    B, Y′, C ` R
───────────────────── →L        ───────────────────── →L
A → B, X′, C, Y ` R             A → B, X, Y′, C ` R

X ` A    B, Y ` C               X′, C, A ` B
───────────────── →L            ───────────── →R
A → B, X, Y ` C                 X′, C ` A → B

Figure 2.6: four positions for passive cut formulas

In each case we want to show that a cut on the presented formula C


occurring in the lower sequent could be pushed up to occur on the upper
sequent instead. (That is, that we could permute the Cut step and this
inference.)
Start with the first example. We want to swap the Cut and the [→L]
step in this fragment of the derivation:
           ⋮ δ2            ⋮ δ2′
  ⋮ δ1    X′, C ` A       B, Y ` R
          ──────────────────────── →L
Z ` C     A → B, X′, C, Y ` R
───────────────────────────── Cut
A → B, X′, Z, Y ` R
But the swap is easy to achieve. We do this:
 ⋮ δ1       ⋮ δ2
Z ` C     X′, C ` A
─────────────────── Cut        ⋮ δ2′
X′, Z ` A                     B, Y ` R
────────────────────────────────────── →L
A → B, X′, Z, Y ` R
The crucial feature of the rule [→L] that allows this swap is that it is
closed under the substitution of formulas in passive position. We could



replace the formula C by Z in the inference without disturbing it. The
result is still an instance of [→L]. The case of the second position in
[→L] is similar. The Cut replaces the C in A → B, X, Y 0 , C ` R by Z,
and we could have just as easily deduced this sequent by cutting on the
C in the premise sequent B, Y 0 , C ` R, and then inferring with [→L].
The final case where the passive cut formula is on the left of the
sequent is in the [→R] inference. We have
           ⋮ δ2
  ⋮ δ1    X′, C, A ` B
          ───────────── →R
Z ` C     X′, C ` A → B
─────────────────────── Cut
X′, Z ` A → B
and again, we could replace the C in the [→R] step by Z and still have
an instance of the same rule. We permute the Cut and the [→R] step
to get
 ⋮ δ1       ⋮ δ2
Z ` C     X′, C, A ` B
────────────────────── Cut
X′, Z, A ` B
───────────── →R
X′, Z ` A → B
a proof of the same endsequent, in which the Cut is higher. The only
other case is for an [→L] step in which the cut formula C is on the right
of the turnstile. This is slightly more complicated. We have
 ⋮ δ1        ⋮ δ1′
X ` A       B, Y ` C
──────────────────── →L         ⋮ δ2
A → B, X, Y ` C               Z, C ` D
────────────────────────────────────── Cut
Z, A → B, X, Y ` D
In this case we can permute the cut and the [→L] step:
           ⋮ δ1′        ⋮ δ2
          B, Y ` C     Z, C ` D
 ⋮ δ1     ───────────────────── Cut
X ` A     Z, B, Y ` D
───────────────────── →L
Z, A → B, X, Y ` D
Here, we have taken the C in the step
X`A B, Y ` C
→L
A → B, X, Y ` C
and cut on it. In this case, it does not simply mean replacing the C
by another formula, or even by a multiset of formulas. Instead, when
you cut with the sequent Z, C ` D, it means that you replace the C
by D and you add Z to the left side of the sequent. So, we make the
following transformation in the [→L] step:
X ` A    B, Y ` C               X ` A    B, Y, Z ` D
───────────────── →L    =⇒     ──────────────────── →L
A → B, X, Y ` C                 A → B, X, Y, Z ` D


The result is also a [→L] step.


We can see the features of the [→L] and [→R] rules that allow the per-
mutation with Cut. They are preserved under cuts on formulas in
passive position. If you cut on a formula in passive position in the
endsequent of the rule, then find the corresponding formula in the
topsequent of the rule, and cut on it. The resulting inference is also an
instance of the same rule. (“Corresponding formula”? This requires an
analysis of the formulas in the rules, according to which you can match
formulas in the top and bottom of rules to say which formula instance
corresponds to which other instance. Nuel Belnap calls this an analysis [7]
of the rules. It is often left implicit in discussions of sequent systems.
The analysis we use here is as follows: formula occurrences in an instance
of a rule are matched if and only if they are either represented by the
same schematic formula letter (A, B, etc.), or they occur in the
corresponding place in a schematic multiset position (X, Y, etc.).)
We have proved the following lemma:

lemma 2.3.9 [cut-depth reduction] Given a derivation δ of X ` A
whose final inference is a Cut on the formula C, which is otherwise
cut-free, and in which that inference has a depth of n, we may construct
another derivation of X ` A in which each cut on C has a depth less than n.

The only step for which the depth reduction might be in doubt is the
case where the cut formula is active on both sides. Before, we have

  ⋮ δ1            ⋮ δ2        ⋮ δ2′
X, A ` B        Y ` A       B, Z ` C
───────── →R    ──────────────────── →L
X ` A → B       A → B, Y, Z ` C
─────────────────────────────── Cut
X, Y, Z ` C

and the depth of the cut is |δ1| + 1 + |δ2| + |δ2′| + 1. (We think of
the derivation δ1 as containing its endsequent X, A ` B, and so on. So,
|δ1| is the number of sequents in that derivation, including its
endsequent.) After pushing the Cut up we have:

 ⋮ δ2       ⋮ δ1
Y ` A     X, A ` B
────────────────── Cut        ⋮ δ2′
X, Y ` B                    B, Z ` C
──────────────────────────────────── Cut
X, Y, Z ` C

The depth of the first cut is |δ2| + |δ1| (which is significantly shallower
than the depth of the previous cut), and the depth of the second is |δ2| +
|δ1| + 1 + |δ2′| (which is shallower by one). So, we have another proof
of the cut elimination theorem, directly eliminating cuts in proofs by
pushing them up until they disappear. (And this gives a distinct proof
of normalisation too. Go from a proof to a derivation, eliminate cuts,
and then pull back with nd.)

2.3.2 | structural rules


What about non-linear proofs? If we allow vacuous discharge, or du-
plicate discharge, we must modify the rules of the sequent system in
some manner. The most straightforward possibility is to change the
rules for →R, as it is the rule →I that varies in application when we
use different policies for discharge. The most direct modification would
be this:
X`B
→R−
X−A`A→B
where X − A is a multiset found by deleting instances of A from X. Its
treatment depends on the discharge policy in place:
» In linear discharge, X − A is the multiset X with one instance of
A deleted. (If X does not contain A, there is no multiset X − A.)



» In relevant discharge, X − A is a multiset X with one or more
instances of A deleted. (If X contains more than one instance of
A, then there are different multisets which can count as X − A:
it is not a function of X and A.)
» In affine discharge, X − A is a multiset X with at most one in-
stance of A deleted. (Now, there is always a multiset X − A for
any choice of X and A. There are two choices, if X actually con-
tains A, delete it or not.)
» In standard discharge, X − A is a multiset X with any number
(including zero) of instances of A deleted.
The following derivations give examples of the new rule.
        p ` p   q ` q
        ───────────── →L
p ` p   p → q, p ` q
──────────────────── →L
p → (p → q), p, p ` q
───────────────────── →R−
p → (p → q) ` p → q

        q ` q
        ────────── →R−
p ` p   q ` r → q
───────────────── →L
p → q, p ` r → q
──────────────── →R−
p → q ` p → (r → q)
In the first derivation, we discharge two instances of p, so this is a
relevant sequent derivation, but not a linear (or affine) derivation. In
the second derivation, the last →R step is linear, but the first is not: it
discharges a nonexistent instance of r.
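The clause defining X − A is easy to state as a program. In the sketch below (my own helper, not the book's; multisets are Python `Counter`s), `discharge_results` lists every multiset that counts as X − A under each policy, which makes it visible exactly where X − A fails to be a function of X and A.

```python
from collections import Counter

def discharge_results(X, A, policy):
    """All multisets 'X - A' admitted by a discharge policy.
    X is a Counter; the result is a list of Counters."""
    n = X[A]                              # how many copies of A are in X
    if policy == 'linear':
        ks = [1] if n >= 1 else []        # exactly one, which must exist
    elif policy == 'relevant':
        ks = list(range(1, n + 1))        # one or more
    elif policy == 'affine':
        ks = [0, 1] if n >= 1 else [0]    # at most one
    elif policy == 'standard':
        ks = list(range(0, n + 1))        # any number, including zero
    else:
        raise ValueError(policy)
    return [X - Counter({A: k}) for k in ks]
```

For X = [p, p, q], the linear policy yields exactly one result, the relevant and affine policies each yield two, and the standard policy yields three; for a formula absent from X, the linear and relevant policies yield none at all.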
These rules match our natural deduction system very well. However,
they have undesirable properties. The rules for implication vary from
system to system. However, the features of the system do not actu-
ally involve implication alone: they dictate the structural properties of
deduction. Here are two examples. Allowing for vacuous discharge, if
the argument from X to B is valid, so is the argument from X, A to B.
X
⋮
B
────── →I
A → B     A
─────────── →E
B
In other words, if we have a derivation for X ` B, then we also should
have a derivation for X, A ` B. We do, if we go through A → B and a
Cut.
  ⋮ δ
X ` B             A ` A   B ` B
───────── →R−     ───────────── →L
X ` A → B         A → B, A ` B
─────────────────────────────── Cut
X, A ` B
So clearly, if we allow vacuous discharge, the step from X ` B to X, A `
B is justified. Instead of requiring the dodgy move through A → B, we
may allow the addition of an antecedent as a primitive rule. (‘K’ for
weakening. We weaken the sequent by trading in a stronger fact (we can
get B from X) for a weaker fact (we can get B from X with A).)

X ` B
──────── K
X, A ` B


In just the same way, we can motivate the structural rule of contraction

X, A, A ` B
─────────── W
X, A ` B

by going through A → B in just the same way. (Why “W” for contraction
and “K” for weakening? It is due to Schönfinkel’s original notation for
combinators [82].)

  ⋮ δ
X, A, A ` B            A ` A   B ` B
─────────── →R−        ───────────── →L
X ` A → B              A → B, A ` B
──────────────────────────────────── Cut
X, A ` B
With K and W we may use the old →R rule and ‘factor out’ the differ-
ent behaviour of discharge:
        p ` p   q ` q
        ───────────── →L
p ` p   p → q, p ` q
──────────────────── →L
p → (p → q), p, p ` q
───────────────────── W
p → (p → q), p ` q
────────────────── →R
p → (p → q) ` p → q

        q ` q
        ──────── K
        q, r ` q
        ──────── →R
p ` p   q ` r → q
───────────────── →L
p → q, p ` r → q
──────────────── →R
p → q ` p → (r → q)

The structural rules do not interfere with the elimination of cut,


though contraction does make the elimination of cut more difficult.
The first thing to note is that every formula occurring in a structural
rule is passive. We may commute cuts above structural rules. In the
case of the weakening rule, the weakened-in formula appears only in
the endsequent. If the cut is made on the weakened-in formula, it
disappears, and is replaced by further instances of weakening, like this:
          ⋮ δ2
         Y ` B                        ⋮ δ2
 ⋮ δ1    ──────── K                  Y ` B
X ` A    Y, A ` B                    ──────── K, repeated
───────────────── Cut                X, Y ` B
X, Y ` B
In this case, the effect of the cut step is achieved without any cuts at
all. The new derivation is clearly simpler, in that the derivation δ1
is rendered unnecessary, and the number of cuts decreases. If the cut
formula is not the weakened in formula, then the cut permutes trivially
with the weakening step:
          ⋮ δ2
         Y, A ` B                ⋮ δ1     ⋮ δ2
 ⋮ δ1    ─────────── K          X ` A    Y, A ` B
X ` A    Y, A, C ` B            ───────────────── Cut
──────────────────── Cut        X, Y ` B
X, Y, C ` B                     ─────────── K
                                X, Y, C ` B
In the case of contraction, matters are not so simple. If the contracted
formula is the cut formula, it occurs once in the endsequent



but twice in the topsequent. This means that if this formula is the cut
formula, when the cut is pushed upwards it duplicates.

          ⋮ δ2                             ⋮ δ1     ⋮ δ2
         Y, A, A ` B                      X ` A    Y, A, A ` B
 ⋮ δ1    ─────────── W           ⋮ δ1     ──────────────────── Cut
X ` A    Y, A ` B               X ` A     X, Y, A ` B
───────────────── Cut           ───────────────────── Cut
X, Y ` B                        X, X, Y ` B
                                ─────────── W, repeated
                                X, Y ` B

In this case, the new proof is not less complex than the old one. The
depth of the second cut in the new proof (2|δ1 | + |δ2 | + 1) is greater than
in the old one (|δ1 | + |δ2 |). The old proof of cut elimination no longer
works in the presence of contraction. There are a number of options one
might take here. Gentzen’s own approach is to prove the elimination
of multiple applications of cut.

 ⋮ δ1     ⋮ δ2
X ` A    Y, A, A ` B
──────────────────── Multicut
X, X, Y ` B
─────────── W, repeated
X, Y ` B

Another option is to eliminate contraction as one of the rules of the


system (retaining its effect by rewriting the connective rules) [28]. In
our treatment of the elimination of Cut we will not take either of these
approaches. We will be more subtle in the formulation of the inductive
argument, following a proof due to Haskell Curry [19, page 250], in
which we trace the occurrences of the Cut formula in the derivation
back to the points (if any) at which the formulas become active. We
show how Cuts at these points derive the endsequent without further
Cuts. But the details of the proof we will leave for later. Now we shall
look to the behaviour of the other connectives.

2.3.3 | conjunction and disjunction


Let’s add conjunction and disjunction to our vocabulary. We have a
number of options for the rules. One straightforward option would be
to use natural deduction, and use the traditional rules. For example,
the rules for conjunction in Gentzen’s natural deduction are

A    B             A ∧ B           A ∧ B
────── ∧I          ───── ∧E        ───── ∧E
A ∧ B                A                 B

Notice that the structural rule of weakening is implicit in these rules:

A B
∧I
A∧B
∧E
A


So, if we wish to add conjunction to a logic in which we don’t have


weakening, we must modify the rules. If we view these rules as se-
quents, it is easy to see what has happened:
X ` A    Y ` B           X, A ` R            X, B ` R
────────────── ∧R?       ──────────── ∧L?    ──────────── ∧L?
X, Y ` A ∧ B             X, A ∧ B ` R        X, A ∧ B ` R
The effect of weakening is then found using a Cut.
A ` A    B ` B           A ` A
────────────── ∧R?       ────────── ∧L?
A, B ` A ∧ B             A ∧ B ` A
────────────────────────────────── Cut
A, B ` A
The ∧R? rule combines two contexts (X and Y ) whereas the ∧L? does
not combine two contexts—it merely infers from A ∧ B to A within
the one context. The context ‘combination’ structure (the ‘comma’ in
the sequents, or the structure of premises in a proof) is modelled using
conjunction using the ∧R? rule but the structure is ignored by the
∧L? rule. It turns out that there are two kinds of conjunction (and
disjunction).
The rules in Figure 2.7 are additive. These rules do not exploit
premise combination in the definition of the connectives. (The “X,”
in the conjunction left and disjunction left rules is merely a passive
‘bystander’ indicating that the rules for conjunction may apply regard-
less of the context.) These rules define conjunction and disjunction,
regardless of the presence or absence of structural rules.

X, A ` R             X, A ` R             X ` A    X ` B
──────────── ∧L1     ──────────── ∧L2     ────────────── ∧R
X, A ∧ B ` R         X, B ∧ A ` R         X ` A ∧ B

X, A ` R    X, B ` R           X ` A             X ` A
──────────────────── ∨L        ───────── ∨R1     ───────── ∨R2
X, A ∨ B ` R                   X ` A ∨ B         X ` B ∨ A

Figure 2.7: additive conjunction and disjunction rules

These rules are the generalisation of the lattice rules for conjunc-
tion seen in the previous section. Every sequent derivation in the old
system is a proof here, in which there is only one formula in the ante-
cedent multiset. We may prove many new things, given the interaction
of implication and the lattice connectives:
p ` p   q ` q                       p ` p   r ` r
───────────── →L                    ───────────── →L
p → q, p ` q                        p → r, p ` r
──────────────────────── ∧L         ──────────────────────── ∧L
(p → q) ∧ (p → r), p ` q            (p → q) ∧ (p → r), p ` r
──────────────────────────────────────────────────────────── ∧R
(p → q) ∧ (p → r), p ` q ∧ r
──────────────────────────── →R
(p → q) ∧ (p → r) ` p → (q ∧ r)



Just as with the sequents with pairs of formulas, we cannot derive the
sequent p ∧ (q ∨ r) ` (p ∧ q) ∨ (p ∧ r)—at least, we cannot without any
structural rules. It is easy to see that there is no cut-free derivation
of the sequent. There is no cut-free derivation using only sequents
with single formulas in the antecedent (we saw this in the previous
section) and a cut-free derivation of a sequent in ∧ and ∨ containing
no commas will itself contain no commas (the additive conjunction
and disjunction rules do not introduce commas into a derivation of a
conclusion if the conclusion does not already contain them). So, there
is no cut-free derivation of distribution. As we will see later, this means
that there is no derivation at all.
But the situation changes in the presence of the structural rules.
(See Figure 2.8.) This sequent is not derivable without the use of both
weakening and contraction.

A ` A           B ` B           A ` A           C ` C
──────── K      ──────── K      ──────── K      ──────── K
A, B ` A        A, B ` B        A, C ` A        A, C ` C
────────────────────── ∧R       ────────────────────── ∧R
A, B ` A ∧ B                    A, C ` A ∧ C
──────────────────────── ∨R1    ──────────────────────── ∨R2
A, B ` (A ∧ B) ∨ (A ∧ C)        A, C ` (A ∧ B) ∨ (A ∧ C)
──────────────────────────────────────────────────────── ∨L
A, B ∨ C ` (A ∧ B) ∨ (A ∧ C)
──────────────────────────────────── ∧L2
A, A ∧ (B ∨ C) ` (A ∧ B) ∨ (A ∧ C)
────────────────────────────────────────────── ∧L1
A ∧ (B ∨ C), A ∧ (B ∨ C) ` (A ∧ B) ∨ (A ∧ C)
───────────────────────────────────── W
A ∧ (B ∨ C) ` (A ∧ B) ∨ (A ∧ C)

Figure 2.8: distribution of ∧ over ∨, using k and w.

Without using weakening, there is no way to derive A, B ` A ∧ B us-


ing the additive rules for conjunction. If we think of the conjunction
of A and B as the thing derivable from both A and B, then this seems
to define a different connective. This motivates a different pair of con-
junction rules, the so-called multiplicative rules. We use a different
symbol (the tensor: ⊗) for conjunction defined with the multiplicat-
ive rules, because in certain proof-theoretic contexts (in the absence of
either contraction or weakening), they differ.
X, A, B ` C            X ` A    Y ` B
──────────── ⊗L        ────────────── ⊗R
X, A ⊗ B ` C           X, Y ` A ⊗ B
For example, you can derive the distribution of ⊗ over ∨ (see Fig-
ure 2.9) in the absence of any structural rules. Notice that the deriva-
tion is much simpler than the case for additive conjunction.

2.3.4 | negation
You can get some of the features of negation by defining it in terms
of conditionals. If we pick a particular atomic proposition (call it f for


A ` A    B ` B                  A ` A    C ` C
────────────── ⊗R               ────────────── ⊗R
A, B ` A ⊗ B                    A, C ` A ⊗ C
──────────────────────── ∨R1    ──────────────────────── ∨R2
A, B ` (A ⊗ B) ∨ (A ⊗ C)        A, C ` (A ⊗ B) ∨ (A ⊗ C)
──────────────────────────────────────────────────────── ∨L
A, B ∨ C ` (A ⊗ B) ∨ (A ⊗ C)
──────────────────────────── ⊗L
A ⊗ (B ∨ C) ` (A ⊗ B) ∨ (A ⊗ C)

Figure 2.9: distribution of ⊗ over ∨.

the moment) then A → f behaves somewhat like the negation of A.


For example, we can derive A ` (A → f) → f, (A ∨ B) → f ` (A →
f) ∧ (B → f), and vice versa, (A → f) ∧ (B → f) ` (A ∨ B) → f.
A ` A    f ` f                      B ` B    f ` f
────────────── →L                   ────────────── →L
A → f, A ` f                        B → f, B ` f
──────────────────────── ∧L1        ──────────────────────── ∧L2
(A → f) ∧ (B → f), A ` f            (A → f) ∧ (B → f), B ` f
──────────────────────────────────────────────────────────── ∨L
(A → f) ∧ (B → f), A ∨ B ` f
──────────────────────────── →R
(A → f) ∧ (B → f) ` (A ∨ B) → f

Notice that no special properties of f are required for this derivation to


work. The proposition f is completely arbitrary. Now look at the rules
for implication in this special case of implying f:
X ` A    f, Y ` R              X, A ` f
───────────────── →L           ───────── →R
A → f, X, Y ` R                X ` A → f
If we want to do this without appealing to the proposition f, we could
consider what happens if f goes away. Write A → f as ¬A, and con-
sider first [¬R]. If we erase f, we get
X, A `
¬R
X ` ¬A
Now the topsequent has an empty right-hand side. What might this
mean? One possible interpretation is that X, A ` is a refutation of A
in the context of X. This could be similar to a proof from X and A to a
contradiction, except that we have no particular contradiction in mind.
To derive X, A ` is to refute A (if we are prepared to keep X). In other
words, from X we can derive ¬A, the negation of A. If we keep with
this line of investigation, consider the rule [→L]. First, notice the right
premise sequent f, Y ` R. A special case of this is f `, and we can take
this sequent as a given: if a refutation of a statement is a deduction
from it to f, then f is self-refuting. So, if we take Y and R to be empty,
A → f to be ¬A and f ` to be given, then what is left of the [→L] rule
is this:
X`A
¬L
X, ¬A `



If we can deduce A from X, then ¬A is refuted (given X). This seems an
eminently reasonable thing to mean by ‘not’. And, these are Gentzen’s
rules for negation. With them, we can prove many of the usual prop-
erties of negation, even in the absence of other structural rules. The
proof of distribution of negation over conjunction (one of the de Mor-
gan laws) simplifies in the following way:
A ` A               B ` B
───────── ¬L        ───────── ¬L
¬A, A `             ¬B, B `
───────────── ∧L1   ───────────── ∧L2
¬A ∧ ¬B, A `        ¬A ∧ ¬B, B `
──────────────────────────────── ∨L
¬A ∧ ¬B, A ∨ B `
────────────────── ¬R
¬A ∧ ¬B ` ¬(A ∨ B)

We may show that A entails its double negation ¬¬A


A`A
¬L
A, ¬A `
¬R
A ` ¬¬A
but we cannot prove the converse. There is no proof of p from ¬¬p.
Similarly, there is no derivation of ¬(p ∧ q) ` ¬p ∨ ¬q, using all of
the structural rules considered so far. What of the other property of
negation, that contradictions imply everything? We can get this far:
A`A
¬L
A, ¬A `
⊗L
A ⊗ ¬A `

(using contraction, we can derive A∧¬A ` too) but we must stop there
in the absence of more rules. To get from here to A ⊗ ¬A ` B, we must
somehow add B into the conclusion. But the B is not there! How can
we do this? We can come close by adding B to the left by means of a
weakening move:
A`A
¬L
A, ¬A `
K
A, ¬A, B `
¬R
A, ¬A ` ¬B
This shows us that a contradiction entails any negation. But to show
that a contradiction entails anything we need a little more. We can do
this by means of a structural rule operating on the right-hand side of
a sequent. Now that we have sequents with empty right-hand sides,
we may perhaps add things in that position, just as we can add things
on the left by means of a weakening on the right. The rule of right
weakening is just what is required to derive A, ¬A ` B.
X`
KR
X`B


The result is a sequent system for intuitionistic logic. Intuitionistic


logic arises out of the program of intuitionism in mathematics due
to L. E. J. Brouwer [12, 50]. The entire family of rules is listed in
Figure 2.10.
The sequents take the form X ` R where X is a multiset of formulas
and R is either a single formula or is empty. We take a derivation of
X ` A to record a proof of A from X. Furthermore, a derivation of X `
records a refutation of X. The system of intuitionistic logic is a stable,
natural and useful account of logical consequence [20, 42, 79]. (We have
not presented the entire system of natural deduction in which these
proofs may be found — yet.)

Identity and Cut:
                        X ` C    C, X′ ` R
    p ` p [Id]          ────────────────── Cut
                        X, X′ ` R

Conditional Rules:
    X ` A    B, X′ ` R            X, A ` B
    ────────────────── →L         ───────── →R
    A → B, X, X′ ` R              X ` A → B

Negation Rules:
    X ` A            X, A `
    ──────── ¬L      ─────── ¬R
    X, ¬A `          X ` ¬A

Conjunction Rules:
    X, A ` R            X, A ` R            X ` A    X ` B
    ──────────── ∧L1    ──────────── ∧L2    ────────────── ∧R
    X, A ∧ B ` R        X, B ∧ A ` R        X ` A ∧ B

Disjunction Rules:
    X, A ` R    X, B ` R          X ` A             X ` A
    ──────────────────── ∨L       ───────── ∨R1     ───────── ∨R2
    X, A ∨ B ` R                  X ` A ∨ B         X ` B ∨ A

Structural Rules:
    X, A, A ` R         X ` R            X `
    ─────────── WL      ──────── KL      ───── KR
    X, A ` R            X, A ` R         X ` C

Figure 2.10: sequent rules for intuitionistic propositional logic

A case could be made for the claim that intuitionistic logic is the
strongest and most natural logic you can motivate using inference rules
on sequents of the form X ` R. It is possible to go further and to add
rules to ensure that the connectives behave as one would expect given
the rules of classical logic: we can add the rule of double negation
elimination

X ` ¬¬A
─────── DNE
X ` A

(This is equivalent to the natural deduction rule admitting the inference
from ¬¬A to A, used in many systems of natural deduction for classical
logic.)

which strengthens the system far enough to be able to derive all clas-
sical tautologies and to derive all classically valid sequents. However,
the results are not particularly attractive on proof-theoretical consider-
ations. For example, the rule DNE does not satisfy the subformula
property: the concluding sequent X ` A is derived by way of the



premise sequent involving negation, even when negation does not fea-
ture in X or in A. This feature of the rule is exploited in the derivation
of the sequent ` ((p → q) → p) → p, of Peirce’s Law, which is clas-
sically derivable but not derivable intuitionistically. This derivation
uses negation liberally, despite the fact that the concluding sequent is
negation-free.

p`p
KL
p, (p → q) → p ` p
→R
p ` ((p → q) → p) → p
¬L
¬(((p → q) → p) → p), p `
KR
¬(((p → q) → p) → p), p ` q
→R
¬(((p → q) → p) → p) ` p → q p`p
→L
¬(((p → q) → p) → p), (p → q) → p ` p
→R
¬(((p → q) → p) → p) ` ((p → q) → p) → p
¬L
¬(((p → q) → p) → p), ¬(((p → q) → p) → p) `
WL
¬(((p → q) → p) → p) `
¬R
` ¬¬(((p → q) → p) → p)
DNE
` ((p → q) → p) → p

It seems clear that this is not a particularly simple proof of Peirce’s


law. It violates the subformula property, by way of the detour through
negation. Looking at the structure of the proof, it seems clear that the
contraction step (marked WL) is crucial. We needed to duplicate the
conditional for Peirce’s law so that the inference of →L would work.
Using →L on an unduplicated Peirce conditional does not result in a
derivable sequent. The options for deriving (p → q) → p ` p are
grim:
`p→q p`p
→L
(p → q) → p ` p
No such proof will work. We must go through negation to derive the
sequent, unless we can find a way to mimic the behaviour of the WL
step without passing the formulas over to the left side of the turnstile,
using negation.
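The underivability of Peirce's law in the intuitionistic system can also be checked mechanically. Naive backward search over the rules of Figure 2.10 need not terminate (→L can keep re-using its principal formula via contraction), so the sketch below instead searches in Dyckhoff's contraction-free calculus G4ip, which is known to decide the same sequents. The encoding of formulas as tuples and the function names are mine, not the book's.

```python
import functools

BOT = 'bot'                               # falsum

def imp(a, b):  return ('->', a, b)
def neg(a):     return imp(a, BOT)        # not-A as A -> falsum
def conj(a, b): return ('and', a, b)
def disj(a, b): return ('or', a, b)
def atomic(f):  return isinstance(f, str)

@functools.lru_cache(maxsize=None)
def provable(gamma, goal):
    """Backward search for X |- A in Dyckhoff's contraction-free
    calculus G4ip. gamma is a frozenset of formulas."""
    if BOT in gamma or goal in gamma:
        return True
    # invertible left rules
    for c in gamma:
        if atomic(c):
            continue
        rest = gamma - {c}
        if c[0] == 'and':
            return provable(rest | {c[1], c[2]}, goal)
        if c[0] == 'or':
            return (provable(rest | {c[1]}, goal) and
                    provable(rest | {c[2]}, goal))
        a, b = c[1], c[2]                 # c is an implication a -> b
        if atomic(a) and a in gamma:
            return provable(rest | {b}, goal)
        if not atomic(a) and a[0] == 'and':
            return provable(rest | {imp(a[1], imp(a[2], b))}, goal)
        if not atomic(a) and a[0] == 'or':
            return provable(rest | {imp(a[1], b), imp(a[2], b)}, goal)
    # invertible right rules
    if not atomic(goal) and goal[0] == 'and':
        return provable(gamma, goal[1]) and provable(gamma, goal[2])
    if not atomic(goal) and goal[0] == '->':
        return provable(gamma | {goal[1]}, goal[2])
    # choice points: disjunction on the right, nested implication on the left
    if not atomic(goal) and goal[0] == 'or':
        if provable(gamma, goal[1]) or provable(gamma, goal[2]):
            return True
    for c in gamma:
        if not atomic(c) and c[0] == '->' and not atomic(c[1]) and c[1][0] == '->':
            (_, (_, e, f), b) = c         # c is (e -> f) -> b
            rest = gamma - {c}
            if (provable(rest | {imp(f, b)}, imp(e, f)) and
                    provable(rest | {b}, goal)):
                return True
    return False
```

The search confirms the claims of this section: A ` ¬¬A succeeds, while ¬¬A ` A, Peirce's law and excluded middle all fail.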

2.3.5 | classical logic


Gentzen’s great insight in the sequent calculus was that we could get
the full power of classical logic by way of a small but profound change
to the structure of sequents. We allow multisets on both sides of the
turnstile. For intuitionistic logic we already allow a single formula or
none. We now allow for more. The rules are trivial modifications of
the standard intuitionistic rule, except for this one change. The rules
are listed in Figure 2.11. In each sequent in the rules, ‘p’ is an atomic


proposition, ‘A’, ‘B’ and ‘C’ are formulas, and X, X′, Y and Y′ are
multisets (possibly empty) of formulas.

Identity and Cut:
                        X ` Y, C    C, X′ ` Y′
    p ` p [Id]          ────────────────────── Cut
                        X, X′ ` Y, Y′

Conditional Rules:
    X ` Y, A    B, X′ ` Y′            X, A ` B, Y
    ────────────────────── →L         ──────────── →R
    X, X′, A → B ` Y, Y′              X ` A → B, Y

Negation Rules:
    X ` A, Y            X, A ` Y
    ───────── ¬L        ───────── ¬R
    X, ¬A ` Y           X ` ¬A, Y

Conjunction Rules:
    X, A ` Y            X, A ` Y            X ` A, Y    X′ ` B, Y′
    ──────────── ∧L1    ──────────── ∧L2    ────────────────────── ∧R
    X, A ∧ B ` Y        X, B ∧ A ` Y        X, X′ ` A ∧ B, Y, Y′

Disjunction Rules:
    X, A ` Y    X, B ` Y          X ` A, Y             X ` A, Y
    ──────────────────── ∨L       ───────────── ∨R1    ───────────── ∨R2
    X, A ∨ B ` Y                  X ` A ∨ B, Y         X ` B ∨ A, Y

Structural Rules:
    X, A, A ` Y        X ` A, A, Y        X ` Y           X ` Y
    ─────────── WL     ─────────── WR     ───────── KL    ───────── KR
    X, A ` Y           X ` A, Y           X, A ` Y        X ` A, Y

Figure 2.11: sequent rules for classical propositional logic

Using these rules, we can derive Peirce’s Law, keeping the structure
of the old derivation intact, other than the deletion of all of the steps
involving negation. Instead of having to swing the formula for Peirce’s
Law onto the left to duplicate it in a contraction step, we may keep it
on the right of the turnstile to perform the duplication. The negation
laws are eliminated, the WL step changes into a WR step, but the other
rules are unchanged. (You might think that this is what we were ‘trying’
to do in the other derivation, and we had to be sneaky with negation to
do what we wished.)

p ` p
────────────────── KL
p, (p → q) → p ` p
────────────────────── KR
p, (p → q) → p ` q, p
──────────────────────── →R
p ` q, ((p → q) → p) → p
─────────────────────────────── →R
` p → q, ((p → q) → p) → p       p ` p
────────────────────────────────────── →L
(p → q) → p ` p, ((p → q) → p) → p
────────────────────────────────────── →R
` ((p → q) → p) → p, ((p → q) → p) → p
────────────────────────────────────── WR
` ((p → q) → p) → p
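With two-sided sequents, backward proof search becomes straightforwardly mechanical: once the structural rules are absorbed by treating each side as a set and keeping the side formulas in every premise, each rule can be applied invertibly and search always terminates. The sketch below is a G3-style reformulation of my own, not Gentzen's presentation above; formulas are encoded as tuples.

```python
def imp(a, b):  return ('->', a, b)
def neg(a):     return ('not', a)
def conj(a, b): return ('and', a, b)
def disj(a, b): return ('or', a, b)
def atomic(f):  return isinstance(f, str)

def provable(gamma, delta):
    """Backward search for X |- Y over sets of formulas. Weakening and
    contraction are absorbed, so every rule is invertible and each
    step strictly reduces the total size of the sequent."""
    if gamma & delta:                     # initial sequents
        return True
    for c in gamma:                       # reduce a compound on the left
        if atomic(c):
            continue
        rest = gamma - {c}
        if c[0] == 'not':
            return provable(rest, delta | {c[1]})
        if c[0] == 'and':
            return provable(rest | {c[1], c[2]}, delta)
        if c[0] == 'or':
            return (provable(rest | {c[1]}, delta) and
                    provable(rest | {c[2]}, delta))
        return (provable(rest, delta | {c[1]}) and      # c is an implication
                provable(rest | {c[2]}, delta))
    for c in delta:                       # reduce a compound on the right
        if atomic(c):
            continue
        rest = delta - {c}
        if c[0] == 'not':
            return provable(gamma | {c[1]}, rest)
        if c[0] == 'and':
            return (provable(gamma, rest | {c[1]}) and
                    provable(gamma, rest | {c[2]}))
        if c[0] == 'or':
            return provable(gamma, rest | {c[1], c[2]})
        return provable(gamma | {c[1]}, rest | {c[2]})  # implication
    return False
```

On ` ((p → q) → p) → p it answers yes, matching the derivation above; Peirce's law, double negation elimination and the de Morgan laws are all classically derivable, while p ` q of course is not.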

The system of rules is elegant and completely symmetric between left


and right. The old derivations have mirror images in every respect.



Here are two derivations of the double negation sequents. For A ` ¬¬A:
from the axiom A ` A, [¬L] gives A, ¬A ` , and then [¬R] gives A ` ¬¬A.
For ¬¬A ` A: from A ` A, [¬R] gives ` A, ¬A, and then [¬L] gives
¬¬A ` A.
Here are two derivations of de Morgan laws. Again, they are com-
pletely left–right symmetric pairs (with conjunction and disjunction
exchanged, but negation fixed).

In the first, from the axioms A ` A and B ` B, [¬L] gives ¬A, A ` and
¬B, B ` ; [∧L1] and [∧L2] give ¬A ∧ ¬B, A ` and ¬A ∧ ¬B, B ` ; [∨L]
combines these into ¬A ∧ ¬B, A ∨ B ` ; and [¬R] concludes
¬A ∧ ¬B ` ¬(A ∨ B). In the second, from A ` A and B ` B, [¬R] gives
` ¬A, A and ` ¬B, B; [∨R1] and [∨R2] give ` ¬A ∨ ¬B, A and
` ¬A ∨ ¬B, B; [∧R] combines these into ` ¬A ∨ ¬B, A ∧ B; and [¬L]
concludes ¬(A ∧ B) ` ¬A ∨ ¬B.

The sequent rules for classical logic share the ‘true–false’ duality im-
plicit in the truth-table account of classical validity. But this leads on
to an important question. Intuitionistic sequents, of the form X ` A,
record a proof from X to A. What do classical sequents mean? Do they
mean anything at all about proofs? A sequent of the form A, B ` C, D
does not tell us that C and D both follow from A and B. (Then it could
be replaced by the two sequents A, B ` C and A, B ` D.) No, the se-
quent A, B ` C, D may be valid even when A, B ` C and A, B ` D are
not valid. The combination of the conclusions is disjunctive and not
conjunctive when read ‘positively’. We can think of a sequent X ` Y as
proclaiming that if each member of X is true then some member of Y
is true. Or to put it ‘negatively’, it tells us that it would be a mistake
to assert each member of X and to deny each member of Y .
This leaves open the important question: is there any notion of
proof appropriate for structures like these, in which premises and con-
clusions are collected in exactly the same way? Whatever is suitable
will have to be quite different from the tree-structured proofs we have
already seen.

2.3.6 | cut elimination and corollaries


[To be written: A simple proof of the cut-elimination theorem of the
systems we have seen (it works, though the presence of contraction on
both sides of the turnstile makes things trickier). And a discussion of
interpolation and other niceties.]
We will end this section with a demonstration of yet another way that
we can show that cut is eliminable from a sequent system. We will
show that the system of sequents—without the cut rule—suffices to
derive every truth-table-valid sequent.
definition 2.3.10 [truth table validity] A sequent X ` Y is truth-
table valid if and only if there is no truth-table evaluation v such that


v(A) is true for each A in X and v(B) is false for each B in Y. If X ` Y
is not truth-table valid, then we call an evaluation v that makes each
member of X true and each member of Y false a counterexample to the
sequent.

In other words, a sequent X ` Y is truth-table valid if and only if there


is no way to make each element of X true while making each element
of Y false. Or, if you like, if we make each member of X true, we must
also make some member of Y true. Or, to keep the story balanced, if
we make each member of Y false, we make some member of X false
too. To understand the detail of this, we need another definition. It is
what you expect.

definition 2.3.11 [truth-table evaluations] A function v assigning


each atom a truth value (either true or false) is said to be a truth-
table evaluation. A truth-table evaluation assigns a truth value to each
formula as follows:

» v(¬A) = true if and only if v(A) = false. (Otherwise, if v(A) = true,
  then v(¬A) = false.)
» v(A ∧ B) = true if and only if v(A) = true and v(B) = true.
» v(A ∨ B) = true if and only if v(A) = true or v(B) = true (or both).
» v(A → B) = true if and only if either v(A) = false or v(B) = true.
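These two definitions are mechanical enough to transcribe directly. Here is a minimal sketch in Python; the tuple representation of formulas and the function names are ours, chosen for illustration:

```python
from itertools import product

# Formulas as nested tuples: an atom is a string; compound formulas are
# ('not', A), ('and', A, B), ('or', A, B) and ('->', A, B).

def atoms(formula):
    """The set of atoms occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(part) for part in formula[1:]))

def value(v, formula):
    """The truth value the evaluation v (a dict from atoms to booleans)
    gives a formula, following the clauses of Definition 2.3.11."""
    if isinstance(formula, str):
        return v[formula]
    tag = formula[0]
    if tag == 'not':
        return not value(v, formula[1])
    if tag == 'and':
        return value(v, formula[1]) and value(v, formula[2])
    if tag == 'or':
        return value(v, formula[1]) or value(v, formula[2])
    return (not value(v, formula[1])) or value(v, formula[2])  # '->'

def counterexamples(X, Y):
    """All evaluations making every member of X true and every member
    of Y false: X |- Y is truth-table valid iff there are none."""
    names = sorted(set().union(set(), *(atoms(A) for A in X + Y)))
    for bits in product([True, False], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(value(v, A) for A in X) and not any(value(v, B) for B in Y):
            yield v

peirce = ('->', ('->', ('->', 'p', 'q'), 'p'), 'p')
print(list(counterexamples([], [peirce])))  # [] : Peirce's law is valid
print(list(counterexamples(['p'], ['q'])))  # [{'p': True, 'q': False}]
```

The generator yields every counterexample, so a sequent is truth-table valid just when the list it produces is empty.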

Now, we will show two facts. Firstly, that if the sequent X ` Y is


derivable (with or without Cut) then it is truth-table valid. Second,
we will show that if the sequent X ` Y is not derivable (again, with
or without Cut), then it is not truth-table valid. Can you see why this
proves that anything derivable with Cut may be derived without it?

theorem 2.3.12 [truth-table soundness of sequent rules] If X ` Y


is derivable (using Cut if you wish), then it is truth-table valid.

Proof: Axiom sequents (identities) are clearly truth-table valid. Take


an instance of a rule: if the premises of that rule are truth-table valid,
then so is the conclusion. We will consider two examples, and leave
the rest as an exercise. Consider the rule →L:

  from X ` Y, A and B, X′ ` Y′, infer X, X′, A → B ` Y, Y′

Suppose that X ` Y, A and B, X′ ` Y′ are both truth-table valid, and
that we have an evaluation v for which each member of X, X′, and
A → B is true. We wish to show that some member of Y, Y′ is true
according to v. Now, since A → B
is true according to v, it follows that either A is false or B is true
(again, according to v). If A is false, then by the truth-table validity of
X ` Y, A, it follows that one member (at least) of Y is true according
to v. On the other hand, if B is true, then the truth-table validity of



B, X 0 ` Y 0 tells us that one member (at least) of Y 0 is true. In either
case, at least one member of Y, Y 0 is true according to v, as we desired.
Now consider the rule KR:
  from X ` Y, infer X ` A, Y

Suppose that X ` Y is truth-table valid. Is X ` A, Y truth-table valid
too? Suppose we have an evaluation v making each element of X true.
By the truth-table validity of X ` Y, some member of Y is true according
to v. It follows
that some member of Y, A is true according to v too. The other rules
are no more difficult to verify, so (after you verify the rest of the rules
to your own satisfaction) you may declare this theorem proved.

theorem 2.3.13 [truth-table completeness for sequents] If X ` Y


is truth-table valid, then it is derivable. In fact, it has a derivation that
does not use Cut.

Identity and Cut:
  [Id]:  X, A ` A, Y
  [Cut]: from X ` Y, C and C, X ` Y, infer X ` Y

Conditional rules:
  [→L]: from X ` Y, A and B, X ` Y, infer A → B, X ` Y
  [→R]: from X, A ` B, Y, infer X ` A → B, Y

Negation rules:
  [¬L]: from X ` A, Y, infer X, ¬A ` Y
  [¬R]: from X, A ` Y, infer X ` ¬A, Y

Conjunction rules:
  [∧L]: from X, A, B ` Y, infer X, A ∧ B ` Y
  [∧R]: from X ` A, Y and X ` B, Y, infer X ` A ∧ B, Y

Disjunction rules:
  [∨L]: from X, A ` Y and X, B ` Y, infer X, A ∨ B ` Y
  [∨R]: from X ` A, B, Y, infer X ` A ∨ B, Y

Structural rules:
  [WL]: from X, A, A ` Y, infer X, A ` Y
  [WR]: from X ` A, A, Y, infer X ` A, Y

Figure 2.12: alternative sequent rules for classical logic

Proof: We prove the contrapositive: that if the sequent X ` Y has no
cut-free derivation, then it is not truth-table valid. To do this, we appeal to
the result of Exercise 8 to show that if we have a derivation (without
Cut) using the sequent system given in Figure 2.12, then we have a
derivation (also without Cut) in our original system. This result is not


too difficult to prove: simply show that the new identity axioms of the
system in Figure 2.12 may be derived using our old identity together with
instances of weakening; and that if the premises of any of the new rules
are derivable, so are the conclusions, using the corresponding rule from
the old system, and perhaps using judicious applications of contraction
to manipulate the parameters.
The new sequent system has some very interesting properties. Suppose
we have a sequent X ` Y that has no derivation (not using Cut) in this
system. Then we may reason in the following way:

» Suppose X ` Y contains no complex formulas, only atoms. Since it
  is underivable, it is not an instance of the new Id rule. That is, it
  contains no formula common to both X and Y. Therefore, there is a
  counterexample evaluation v: simply take each member of X to be
  true and each member of Y to be false.

That deals with what we might call atomic sequents. We now proceed
by induction, with the hypothesis for a sequent X ` Y being that if
it has no derivation, it is truth-table invalid. And we will show that
if the hypothesis holds for simpler sequents than X ` Y then it holds
for X ` Y too. What is a simpler sequent than X ` Y ? Let’s say that
the complexity of a sequent is the number of connectives (∧, ∨, →, ¬)
occurring in that sequent. So, we have shown that the hypothesis holds
for sequents of complexity zero.
Now to deal with sequents of greater complexity: that is, those
containing formulas with connectives.

» Suppose that the sequent contains a negation formula. If this


formula occurs on the left, the sequent has the form X, ¬A ` Y . It
follows that if this is underivable in our sequent system, then so
is X ` A, Y . If this were derivable, then we could derive our target
sequent by ¬L. But look! This is a simpler sequent. So we may
appeal to the induction hypothesis to give us a counterexample
evaluation v, making each member of X true and making A false
and each member of Y false. Now since this evaluation makes
A false, then it makes ¬A true. So, it is a counterexample for
X, ¬A ` Y too.
If the negation formula occurs on the right instead, then the se-
quent has the form X ` ¬A, Y . It follows that X, A ` Y is underiv-
able (for otherwise, we could derive our sequent by ¬R). This is a
simpler sequent, so it has a counterexample v, making each mem-
ber of X, and A true and Y false. This is also a counterexample
to X ` ¬A, Y , since it makes ¬A false.
» Suppose that our sequent contains a conditional formula A → B.
If it occurs on the left, the sequent has the form A → B, X `
Y . If it is not derivable then, using the rules in Figure 2.12, we
may conclude that either X ` Y, A is underivable, or B, X ` Y
is underivable. (If they were both derivable, then we could use
→L to derive our target sequent A → B, X ` Y .) Both of these



sequents are simpler than our original sequent, so we may apply
the induction hypothesis. If X ` Y, A is underivable, we have an
evaluation v making each member of X true and each member
of Y false, together with A false. But look! This makes A → B
true, so v is a counterexample for A → B, X ` Y . Similarly, if
B, X ` Y is underivable, we have a counterexample v, making
each member of X true and each member of Y false, together
with making B true. So we are in luck in this case too! The
evaluation v makes A → B true, so it is a counterexample to our
target sequent A → B, X ` Y . This sequent is truth-table invalid.
Suppose, on the other hand, that an implication formula is on the
right hand side of the sequent. If X ` A → B, Y is not derivable,
then neither is X, A ` B, Y, a simpler sequent. The induction hypothesis
applies, and we have an evaluation v making the formulas in X true,
the formulas in Y false, and A true and B false. So, it makes A → B
false, and our evaluation is a counterexample to our target sequent
X ` A → B, Y. This sequent is truth-table invalid.
» The cases for conjunction and disjunction are left as exercises.
They pose no more complications than the cases we have seen.

So, the sequent rules, read backwards from bottom-to-top, can be
understood as giving instructions for making a counterexample to a
sequent. In the case of sequent rules with more than one premise, these
instructions provide alternatives which can both be explored. If a sequent
is underivable, these instructions may be followed to the end, and we
finish with a counterexample to the sequent. If following the instructions
does not meet with success, this means that all searches have terminated
with derivable sequents. So we may play this attempt backwards, and we
have a derivation of the sequent. (The similarity to rules for tableaux is
not an accident [85]. See Exercise 10 on page 97.)
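The search just described can be sketched in code. The following is a rough transcription, reading the invertible rules of Figure 2.12 backwards; formulas are nested tuples (atoms are strings; compound formulas are ('not', A), ('and', A, B), ('or', A, B) and ('->', A, B)), and the representation and the name refute are ours:

```python
def refute(X, Y):
    """Read the invertible rules of Figure 2.12 backwards.  Returns a
    counterexample evaluation (a dict) if the sequent X |- Y has no
    cut-free derivation, and None if every branch of the search closes,
    in which case the sequent is derivable."""
    X, Y = list(X), list(Y)
    for i, A in enumerate(X):                      # decompose on the left
        if not isinstance(A, str):
            rest, tag = X[:i] + X[i+1:], A[0]
            if tag == 'not':
                return refute(rest, Y + [A[1]])
            if tag == 'and':
                return refute(rest + [A[1], A[2]], Y)
            if tag == 'or':   # underivable iff some premise is underivable
                return refute(rest + [A[1]], Y) or refute(rest + [A[2]], Y)
            return refute(rest, Y + [A[1]]) or refute(rest + [A[2]], Y)  # '->'
    for i, B in enumerate(Y):                      # decompose on the right
        if not isinstance(B, str):
            rest, tag = Y[:i] + Y[i+1:], B[0]
            if tag == 'not':
                return refute(X + [B[1]], rest)
            if tag == 'and':
                return refute(X, rest + [B[1]]) or refute(X, rest + [B[2]])
            if tag == 'or':
                return refute(X, rest + [B[1], B[2]])
            return refute(X + [B[1]], rest + [B[2]])                     # '->'
    if set(X) & set(Y):                # atomic: an instance of the new Id rule
        return None
    return dict([(p, True) for p in X] + [(q, False) for q in Y])

peirce = ('->', ('->', ('->', 'p', 'q'), 'p'), 'p')
print(refute([], [peirce]))   # None : derivable
print(refute([], ['p']))      # {'p': False} : a counterexample
```

Since each recursive call strictly reduces the number of connectives in the sequent, the search always terminates.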

2.3.7 | history
[To be written.]

2.3.8 | exercises
basic exercises
q1 Which of the following sequents can be proved in intuitionistic logic?
For those that can, find a derivation. For those that cannot, find a
derivation in classical sequent calculus:
1 : p → (q → p ∧ q)
2 : ¬(p ∧ ¬p)
3 : p ∨ ¬p
4 : (p → q) → ((p ∧ r) → (q ∧ r))
5 : ¬¬¬p → ¬p
6 : ¬(p ∨ q) → (¬p ∧ ¬q)
7 : (p ∧ (q → r)) → (q → (p ∧ r))


8 : p ∨ (p → q)
9 : (¬p ∨ q) → (p → q)
10 : ((p ∧ q) → r) → ((p → r) ∨ (q → r))
q2 Consider all of the formulas unprovable in q1 on page 47. Find deriva-
tions for these formulas, using classical logic if necessary.
q3 Define the dual of a classical sequent in a way generalising the result
of Exercise 14 on page 67, and show that the dual of a derivation of a
sequent is a derivation of the dual of a sequent. What is the dual of a
formula involving implication?
q4 Define A →∗ B as ¬(A ∧ ¬B). Show that any classical derivation of
X ` Y may be transformed into a classical derivation of X∗ ` Y ∗ , where
X∗ and Y ∗ are the multisets X and Y respectively, with all instances
of the connective → replaced by →∗ . Take care to explain what the
transformation does with the rules for implication. Does this work for
intuitionistic derivations?
q5 Consider the rules for classical propositional logic in Figure 2.11. De-
lete the rules for negation. What is the resulting logic like? How does
it differ from intuitionistic logic, if at all?
q6 Define the Double Negation Translation d(A) of formula A as follows:

d(p) = ¬¬p
d(¬A) = ¬d(A)
d(A ∧ B) = d(A) ∧ d(B)
d(A ∨ B) = ¬(¬d(A) ∧ ¬d(B))
d(A → B) = d(A) → d(B)

What formulas are d((p → q) ∨ (q → p)) and d(¬¬p → p)? Show
that these formulas have intuitionistic proofs by giving a sequent
derivation for each.

intermediate exercises
q7 Using the double negation translation d of the previous question, show
how a classical derivation of X ` Y may be transformed (with a num-
ber of intermediate steps) into an intuitionistic derivation of Xd ` Y d ,
where Xd and Y d are the multisets of the d-translations of each ele-
ment of X, and of Y respectively.
q8 Consider the alternative rules for classical logic, given in Figure 2.12.
Show that X ` Y is derivable using these rules iff it is derivable using
the old rules. Which of these new rules are invertible? What are some
distinctive properties of these rules?
q9 Construct a system of rules for intuitionistic logic as similar as
you can to the classical system in Figure 2.12. Is it quite as nice? Why,
or why not?
q10 Relate cut-free sequent derivations of X ` Y with tableaux refutations
of X, ¬Y [44, 78, 85]. Show how to transform any cut-free sequent



derivation of X ` Y into a corresponding closed tableau, and vice
versa. What are the differences and similarities between tableaux and
derivations?
q11 Relate natural deduction proofs for intuitionistic logic in Gentzen–
Prawitz style with natural deduction proofs in other systems (such as
Lemmon [49], or Fitch [31]). Show how to transform proofs in one
system into proofs in the other. How do the systems differ, and how
are they similar?
advanced exercises
q12 Consider what sort of rules make sense in a sequent system with
sequents of the form A ` X, where A is a formula and X a multiset.
What connectives make sense? (One way to think of this is to define
the dual of an intuitionistic sequent, in the sense of Exercise 3 in this
section and Exercise 14 on page 67.) This defines what Igor Urbas has
called ‘Dual-intuitionistic Logic’ [93].


2.4 | circuits
In this section we will look at the kinds of proofs motivated by the two-
sided classical sequent calculus. Our aim is to “complete the square.”
Derivations of X ` A ↔ Proofs from X to A

Derivations of X ` Y ↔ ???
Just what goes in that corner? If the parallel is to work, the structure
is not a straightforward tree with premises at the top and conclusion at
the bottom, as we have in proofs for a single conclusion A. What other
structure could it be?
For our first example of proofs with multiple conclusions as well
as multiple premises, we will not look at the case of classical logic,
for the presence of the structural rules of weakening and contraction
complicates the picture somewhat. Instead, we will start with a logic
without these structural rules—linear logic.

2.4.1 | derivations describing circuits


We will start with a simple sequent system for the multiplicative
fragment of linear logic. So, we will do without the structural rules
of contraction or weakening. However, sequents have multisets on the
left and on the right. In this section we will work with the connectives
⊕ and ⊗ (multiplicative disjunction and conjunction respectively) and
¬ (negation). (Girard’s preferred notation for multiplicative disjunction
in linear logic is ‘⅋’, to emphasise the connection between additive
conjunction & and multiplicative disjunction ⅋, and similarly, between
his multiplicative disjunction ⊕ and additive conjunction &. We prefer
to utilise familiar notation for the additive connectives, and use tensor
notation for the multiplicatives where their behaviour differs markedly
from the expected classical or intuitionistic behaviour.) The sequent
rules are as follows. First, negation flips conclusion to premise, and
vice versa.

  [¬L]: from X ` A, Y, infer X, ¬A ` Y
  [¬R]: from X, A ` Y, infer X ` ¬A, Y

Multiplicative conjunction mirrors the behaviour of premise combination.
We may trade in the two premises A, B for the single premise
A ⊗ B. On the other hand, if we have a derivation of A (from X, and
with Y as alternate conclusions) and a derivation of B (from X′ and with
Y′ as alternate conclusions) then we may combine these derivations to
form a derivation of A ⊗ B from both collections of premises, and with
both collections of alternative conclusions.

  [⊗L]: from X, A, B ` Y, infer X, A ⊗ B ` Y
  [⊗R]: from X ` A, Y and X′ ` B, Y′, infer X, X′ ` A ⊗ B, Y, Y′

The case for multiplicative disjunction is dual to the case for conjunction.
We swap premise and conclusion, and replace ⊗ with ⊕.

  [⊕L]: from X, A ` Y and X′, B ` Y′, infer X, X′, A ⊕ B ` Y, Y′
  [⊕R]: from X ` A, B, Y, infer X ` A ⊕ B, Y

The cut rule is simple:

  [Cut]: from X ` A, Y and X′, A ` Y′, infer X, X′ ` Y, Y′

The cut formula (here it is A) is left out, and all of the other material
remains behind. Any use of the cut rule is eliminable, in the usual
manner. Notice that this proof system has no conditional connective.
Its loss is no great thing, as we could define A → B to be ¬(A ⊗ ¬B),
or equivalently, as ¬A ⊕ B. (It is a useful exercise to verify that these
definitions are equivalent, and that they both “do the right thing” by
inducing appropriate rules [→E] and [→I].) So that is our sequent sys-
tem for the moment.
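Though nothing below depends on it, cut-free derivability in this small calculus can be decided by naive backward proof search: the branching rules [⊗R] and [⊕L] must try every way of dividing the remaining premises and conclusions between their two premise sequents. Here is a sketch (the tuple representation of formulas and the names are ours):

```python
from itertools import combinations

# Formulas as tuples: atoms are strings; ('not', A) for negation,
# ('times', A, B) for A ⊗ B, and ('plus', A, B) for A ⊕ B.

def splits(xs):
    """All ways of dividing a multiset (here, a list) into two parts."""
    idx = list(range(len(xs)))
    for k in range(len(xs) + 1):
        for left in combinations(idx, k):
            yield ([xs[i] for i in idx if i in left],
                   [xs[i] for i in idx if i not in left])

def derivable(X, Y):
    """Cut-free backward proof search for the multiplicative sequents."""
    if len(X) == 1 and len(Y) == 1 and isinstance(X[0], str) and X[0] == Y[0]:
        return True                                # an atomic axiom A |- A
    for i, A in enumerate(X):
        if isinstance(A, str):
            continue
        rest = X[:i] + X[i+1:]
        if A[0] == 'not' and derivable(rest, Y + [A[1]]):
            return True
        if A[0] == 'times' and derivable(rest + [A[1], A[2]], Y):
            return True
        if A[0] == 'plus':                         # [⊕L] splits the context
            if any(derivable(X1 + [A[1]], Y1) and derivable(X2 + [A[2]], Y2)
                   for X1, X2 in splits(rest) for Y1, Y2 in splits(Y)):
                return True
    for i, B in enumerate(Y):
        if isinstance(B, str):
            continue
        rest = Y[:i] + Y[i+1:]
        if B[0] == 'not' and derivable(X + [B[1]], rest):
            return True
        if B[0] == 'plus' and derivable(X, rest + [B[1], B[2]]):
            return True
        if B[0] == 'times':                        # [⊗R] splits the context
            if any(derivable(X1, Y1 + [B[1]]) and derivable(X2, Y2 + [B[2]])
                   for X1, X2 in splits(X) for Y1, Y2 in splits(rest)):
                return True
    return False

nnA = ('not', ('not', 'A'))
print(derivable(['A'], [nnA]), derivable([nnA], ['A']))        # True True
print(derivable([('times', 'A', 'B')], [('plus', 'A', 'B')]))  # False
```

The search terminates because every premise sequent of a rule contains fewer connectives than its conclusion.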
Let’s try to find a notion of proof appropriate for the derivations
in this sequent system. It is clear that the traditional many-premise
single-conclusion structure does not fit neatly. The cut free derivation
of ¬¬A ` A is no simpler and no more complex than the cut free
derivation of A ` ¬¬A.
For ¬¬A ` A: from the axiom A ` A, [¬R] gives ` ¬A, A, and [¬L]
gives ¬¬A ` A. For A ` ¬¬A: from A ` A, [¬L] gives ¬A, A ` , and
[¬R] gives A ` ¬¬A.
The natural deduction proof from A to ¬¬A goes through a stage
where we have two premises A and ¬A and has no active conclusion
(or equivalently, it has the conclusion ⊥): from the premise A and the
assumption [¬A](1), the step [¬E] yields the empty conclusion, and then
[¬I,1] discharges the assumption to conclude ¬¬A.
In this proof, the premise ¬A is then discharged or somehow otherwise
converted to the conclusion ¬¬A. The usual natural deduction proofs
from ¬¬A to A are either simpler (we have a primitive inference from
¬¬A to A) or more complicated. A proof that stands to the derivation
of ¬¬A ` A would require a stage at which there is no premise but
two conclusions. We can get a hint of the desired “proof” by turning
the proof for double negation introduction on its head, and then making
it easier to read by turning the formulas and labels the right way around
and swapping I labels with E labels: the result begins with the single
premise ¬¬A, passes through a [¬E,1] step, and then a [¬I] step branches
downward into the two conclusions [¬A](1) and A.
We are after a proof of double negation elimination at least as simple
as this. However, constructing this will require hard work. Notice that
not only does a proof have a different structure to the natural deduc-
tion proofs we have seen—there is downward branching, not upward—
there is also the kind of “reverse discharge” at the bottom of the tree


which seems difficult to interpret. Can we make out a story like this?
Can we define proofs appropriate to linear logic?
To see what is involved in answering this question in the affirmat-
ive, we will think more broadly to see what might be appropriate in
designing our proof system. Our starting point is the behaviour of
each rule in the sequent system. Think of a derivation ending in X ` Y
as having constructed a proof π with the formulas in X as premises
or inputs and the formulas in Y as conclusions, or outputs. We could
think of a proof as having a shape reminiscent of the traditional proofs
from many premises to a single conclusion:
A1 A2 ··· An
B1 B2 ··· Bm
However, chaining proofs together like this is notationally very
difficult to depict. Consider the way in which the sequent rule [Cut]
corresponds to the composition of proofs. In the single-formula-right
sequent system, a Cut step like this:

  from X ` C and A, C, B ` D, infer A, X, B ` D

corresponds to composing the proofs π1 (from X to C) and π2 (from
A, C, B to D): place π1 above the premise position C of π2, to form a
proof from A, X, B to D. In the case of proofs with multiple premises
and multiple conclusions, this notation becomes difficult if not
impossible. The cut rule has an instance like this:

  from X ` D, C, E and A, C, B ` Y, infer A, X, B ` D, Y, E

This should correspond to the composition of the proofs π1 (from X to
D, C, E) and π2 (from A, C, B to Y). If we are free to rearrange the
order of the conclusions and premises, we could manage to represent
the cut: stack π1 above π2, sliding the conclusion C of π1 across to
meet the premise C of π2.



But we cannot always rearrange the cut formula to be on the left of one
proof and the right of the other. Say we want to cut with the conclusion
E in the next step? What do we do?

It turns out that it is much more flexible to change our notation com-
pletely. Instead of representing proofs as consisting of characters on a
page, ordered in tree diagrams, think of proofs as taking inputs and
outputs, where we represent the inputs and outputs as wires. Wires can
be rearranged willy-nilly—we are all familiar with the tangle of cables
behind the stereo or under the computer desk—so we can exploit this
to represent cut straightforwardly. In our pictures, then, formulas will
label wires. This change of representation will afford another insight:
instead of thinking of the rules as labelling transitions between formu-
las in a proof, we will think of inference steps (instances of our rules)
as nodes with wires coming in and wires going out. Proofs are then
circuits composed of wirings of nodes. Figure 2.13 should give you the
idea.
[Figure: on the left, a single circuit π with input wires X and output
wires Y; on the right, two circuits π1 and π2 chained together by a
wire A, with the remaining inputs X, X′ and outputs Y, Y′ left free.]

Figure 2.13: a circuit, and chaining together two circuits
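The chaining pictured in Figure 2.13 is easy to make concrete if we record a circuit only by its interface. In this sketch (the class and function names are ours, chosen for illustration) a circuit keeps multisets of input and output wires, and cut plugs an output wire of one circuit into a matching input wire of another, leaving every other wire in place:

```python
from collections import Counter

class Circuit:
    """A circuit recorded only by its interface: the multiset of premise
    (input) wires and the multiset of conclusion (output) wires."""
    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = Counter(inputs), Counter(outputs)

    def sequent(self):
        return "{} |- {}".format(", ".join(sorted(self.inputs.elements())),
                                 ", ".join(sorted(self.outputs.elements())))

def cut(pi1, pi2, A):
    """Plug an A output wire of pi1 into an A input wire of pi2; every
    other wire stays put, whatever order the wires happen to come in."""
    assert pi1.outputs[A] >= 1 and pi2.inputs[A] >= 1
    return Circuit(
        list(pi1.inputs.elements()) + list((pi2.inputs - Counter([A])).elements()),
        list((pi1.outputs - Counter([A])).elements()) + list(pi2.outputs.elements()))

pi1 = Circuit(['X'], ['D', 'C', 'E'])  # a proof of X |- D, C, E
pi2 = Circuit(['A', 'C', 'B'], ['Y'])  # a proof of A, C, B |- Y
print(cut(pi1, pi2, 'C').sequent())    # A, B, X |- D, E, Y

# Two cuts, one after another: the interface of the result is the same
# whichever cut is made first.
a = Circuit(['X1'], ['A', 'Y1'])
b = Circuit(['X2', 'A'], ['B', 'Y2'])
c = Circuit(['X3', 'B'], ['Y3'])
print(cut(cut(a, b, 'A'), c, 'B').sequent())  # X1, X2, X3 |- Y1, Y2, Y3
```

Because the wires form multisets rather than sequences, no bookkeeping about the order of premises and conclusions is needed: the tangling of wires costs nothing.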

A proof π for the sequent X ` Y has premise or input wires for
each formula in X, and conclusion or output wires for each formula in
Y. Now think of the contribution of each rule to the development of
inferences. The cut rule is the simplest. Given two proofs, π1 from X
to A, Y, and π2 from X′, A to Y′, we get a new proof by chaining them
together. You can depict this by “plugging in” the A output of π1 into
the A input of π2. The remaining material stays fixed. In fact, this
picture still makes sense if the cut wire A occurs in the middle of the
output wires of π1 and in the middle of the input wires of π2. (Draw
for yourself the result of making two cuts, one after another, inferring
from the sequents X1 ` A, Y1 and X2, A ` B, Y2 and X3, B ` Y3 to
the sequent X1, X2, X3 ` Y1, Y2, Y3. You get two different possible
derivations with different intermediate steps depending on whether
you cut on A first or on B first. Does the order of the cuts matter when
these different derivations are represented as circuits?)

So, we are free to tangle up our wires as much as we like. It is clear
from this picture that the conclusion wire A from the proof π1 is used
as a premise in the proof π2. It is just as clear that any output wire
in one proof may be used as an input wire in another proof, and we
can always represent this fact diagrammatically. The situation is much
improved compared with upward-and-downward branching tree notation.
(As a matter of fact, I will try to make proofs as tangle free as possible,
for ease of reading.)
tion.

Now consider the behaviour of the connective rules. For negation, the
behaviour is simple. An application of a negation rule turns an output
A into an input ¬A (this is ¬L), or an input A into an output ¬A (this
is ¬R). So, we can think of these steps as plugging in new nodes in
the circuit. A [¬E] node takes an input A and an input ¬A (and has no
outputs), while a [¬I] node has an output A and an output ¬A (and
has no inputs). (Ignore, for the moment, the little green squares on the
surface of the nodes, and the shade of the nodes. These features have a
significance which will be revealed in good time.) These nodes can be
added to existing proofs to provide the behaviour of the sequent rules
¬L and ¬R.

With the addition of a [¬E] node plugged into its output wire A, a
circuit π for X ` A, Y becomes a circuit for X, ¬A ` Y, just as the
sequent rule ¬L requires. Similarly, with the addition of a [¬I] node
plugged into its input wire A, a circuit π for the sequent X, A ` Y
becomes a circuit for X ` ¬A, Y. Notice how these rules (or nodes) are
quite simple and local. They do not involve the discharge of assumptions
(unlike the natural deduction rule ¬I we have already seen). Instead,
these rules look like straightforward transcriptions of the law of
non-contradiction (A and ¬A form a dead-end—don’t assert both) and
the law of the excluded middle (either A or ¬A is acceptable—don’t
deny both).



For conjunction, the right rule indicates that if we have a proof π with
A as one conclusion, and a proof π′ with B as another conclusion, we
can construct a proof by plugging the A and the B conclusion wires
into a new node with a single conclusion wire A ⊗ B. This motivates a
node [⊗I] with two inputs A and B and a single output A ⊗ B, and it
can be used to combine circuits in the manner of the [⊗R] sequent rule:
the circuits π (from X to A, Y) and π′ (from X′ to B, Y′) are joined
by a [⊗I] node to form a single circuit from X, X′ to A ⊗ B, Y, Y′.
This is no different from the natural deduction rule inferring A ⊗ B
from the premises A and B, except for the notational variance and the
possibility that it might be employed in a context in which there are
conclusions alongside A ⊗ B.

The rule [⊗E], on the other hand, is novel. This rule takes a single
proof π with the two premises A and B and modifies it by wiring
together the inputs A and B into a node which has a single input A ⊗ B.
It follows that we have a node [⊗E] with a single input A ⊗ B and two
outputs A and B. (Ignore, for the moment, the different colour of this
node, and the two small circles on the surface of the node where the A
and B wires join. All will be explained in good time.)


This is not a mere variant of the rules [∧E] in traditional natural
deduction. It is novel. It corresponds to the other kind of natural
deduction rule (from A ⊗ B, together with a proof of C from the
premises A and B, infer C), in which two premises A and B are
discharged, and the new premise A ⊗ B is used in its place.

The extent of the novelty of this rule becomes apparent when you
see that the circuit for [⊕E] also has one input and two outputs, and
the two outputs are A and B, if the input is A ⊕ B. The step for [⊕L]
takes two proofs: π1 with a premise A and π2 with a premise B, and
combines them into a proof with the single premise A ⊕ B. So the node
for [⊕E] looks identical to the node for [⊗E]: it has a single input wire
(in this case, A ⊕ B), and two output wires, A and B. The circuits π
(from X, A to Y) and π′ (from X′, B to Y′) are joined by an [⊕E] node
to form a single circuit from X, X′, A ⊕ B to Y, Y′.

The same happens with the rule to introduce a disjunction. The
sequent step [⊕R] converts the two conclusions A, B into the one
conclusion A ⊕ B. So, if we have a proof π with two conclusion wires
A and B, we can plug these into a [⊕I] node, which has two input wires
A and B and a single output wire A ⊕ B. Notice that this node looks
just like the node for [⊗I]. Yet ⊗ and ⊕ are very different connectives.
The difference between the two nodes is due to the different ways that
they are added to a circuit.

2.4.2 | circuits from derivations


definition 2.4.1 [inductively generated circuit] A derivation of X `
Y constructs a circuit with input wires labelled with the formulas in X
and output wires labelled with the formulas in Y in the manner we



have seen in the previous section. We will call these circuits induct-
ively generated.

Here is an example. This derivation:

  A ` A [Id]                    B ` B [Id]
  A, ¬A ` [¬L]                  B, ¬B ` [¬L]
  A, B, ¬A ⊕ ¬B ` [⊕L]
  A, B ` ¬(¬A ⊕ ¬B) [¬R]
  A ⊗ B ` ¬(¬A ⊕ ¬B) [⊗L]

can be seen as constructing the following circuit: the input wire A ⊗ B
enters a [⊗E] node, whose output wires A and B each run into a [¬E]
node; the ¬A and ¬B wires those [¬E] nodes also take as inputs are
the two outputs of a [⊕E] node with input wire ¬A ⊕ ¬B, and that
wire is in turn one of the two outputs of a [¬I] node, whose other
output is the conclusion wire ¬(¬A ⊕ ¬B).

Just as with other natural deduction systems, this representation of
derivations is efficient, in that different derivations can represent one
and the same circuit. This derivation

  A ` A [Id]                    B ` B [Id]
  A, ¬A ` [¬L]                  B, ¬B ` [¬L]
  A, B, ¬A ⊕ ¬B ` [⊕L]
  A ⊗ B, ¬A ⊕ ¬B ` [⊗L]
  A ⊗ B ` ¬(¬A ⊕ ¬B) [¬R]

defines exactly the same circuit. The map from derivations to circuits
is many-to-one.
Notice that the inductive construction of proof circuits treats the ⊕
and ⊗ rules differently. The nodes [⊗I] and [⊕E] combine different
proof circuits, while [⊗E] and [⊕I] attach to a single proof circuit.
This means that [⊗E] and [⊕I] are parasitic. They do not constitute
a proof by themselves. (There is no linear derivation that consists
merely of the step [⊕R], or solely of [⊗L], since all axioms are of the
form A ` A.) This is unlike [⊕L] and [⊗R] which can make fine proofs
on their own.


Not everything that you can make out of the basic nodes is a circuit
corresponding to a derivation. Not every “circuit” (in the broad sense)
is inductively generated. For example: plug the two outputs A and B
of a [⊕E] node (with input A ⊕ B) straight into a [⊗I] node (with
output A ⊗ B); or plug the two outputs of a [⊗E] node straight into a
[⊕I] node; or plug the two outputs A and ¬A of a [¬I] node straight
into a [¬E] node. In this way you can define ‘circuits’ for
A ⊕ B ` A ⊗ B or A ⊗ B ` A ⊕ B, or even worse, for the empty
sequent ` , but there are no derivations for these sequents. What makes
an assemblage of nodes a proof?

2.4.3 | correct circuits


When is a circuit inductively generated? There are different correct-
ness criteria for circuits. Here are two:

The notion of a switching is due to Vincent Danos and Laurent Reg-


nier [21], who applied it to give an elegant account of correctness for
proofnets.

definition 2.4.2 [switched nodes and switchings] The nodes [⊗E]


and [⊕I] are said to be switched nodes: the two output wires of [⊗E]
are its switched wires and the two input wires of [⊕I] are its switched
wires. A switching of a switched node is found by breaking one (and
one only) of its switched wires. A switching of a circuit is found by
switching each of its switched nodes.

theorem 2.4.3 [switching criterion] A circuit is inductively gener-


ated if and only if each of its switchings is a tree.

We call the criterion of “every-switching-being-a-tree” the switching


criterion. There are two ways to fail it. First, by having a switching
that contains a loop. Second, by having a switching that contains two
disconnected pieces.
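For small circuits the switching criterion can be tested by brute force. In this sketch (the graph representation and the names are ours, and each dangling premise or conclusion wire is tied off at a pseudo-node), a circuit is a list of wires between named nodes together with, for each switched node, the pair of indices of its two switched wires; we enumerate every switching and test each for treehood:

```python
from itertools import product

def is_tree(vertices, edges):
    """A finite graph is a tree iff it is connected and |E| = |V| - 1."""
    if len(edges) != len(vertices) - 1:
        return False
    reached, frontier = set(), {next(iter(vertices))}
    while frontier:
        v = frontier.pop()
        reached.add(v)
        for a, b in edges:
            if a == v and b not in reached:
                frontier.add(b)
            if b == v and a not in reached:
                frontier.add(a)
    return reached == set(vertices)

def satisfies_switching(wires, switched):
    """wires: a list of (node, node) pairs; switched: for each switched
    node, the pair of indices of its two switched wires.  Break one
    switched wire per switched node, every way, and test for treehood."""
    vertices = {v for wire in wires for v in wire}
    for broken in product(*switched):
        kept = [w for i, w in enumerate(wires) if i not in broken]
        if not is_tree(vertices, kept):
            return False
    return True

# The circuit for ¬¬A |- A: a [¬I] node wired to a [¬E] node, with the
# dangling premise and conclusion wires tied off at pseudo-nodes in/out.
good = [('in', 'not_e'), ('not_i', 'not_e'), ('not_i', 'out')]
# The pseudo-circuit for A ⊕ B |- A ⊗ B: [⊕E] wired twice into [⊗I].
# Neither node is switched, and the doubled wire makes a loop.
bad = [('in', 'plus_e'), ('plus_e', 'times_i'),
       ('plus_e', 'times_i'), ('times_i', 'out')]
# The pseudo-circuit for A ⊗ B |- A ⊕ B: both nodes are switched, and
# wires 1 and 2 are the switched wires of each.
linked = [('in', 'times_e'), ('times_e', 'plus_i'),
          ('times_e', 'plus_i'), ('plus_i', 'out')]

print(satisfies_switching(good, []))                  # True
print(satisfies_switching(bad, []))                   # False: a loop
print(satisfies_switching(linked, [(1, 2), (1, 2)]))  # False: disconnection
```

The last example fails because the switching that breaks the A wire at one node and the B wire at the other leaves the graph disconnected; so both failure modes from the paragraph above are on display.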

Proof: The left-to-right direction is a straightforward check of each


inductively generated circuit. The basic inductively generated circuits
(the single wires) satisfy the switching criterion. Then show that for
each of the rules, if the starting circuits satisfy the switching criterion,
so do the result of applying the rules. This is a straightforward check.
The right-to-left direction is another thing entirely. We prove it by
introducing a new criterion.

The new notion is the concept of retractability [21].



definition 2.4.4 [retractions] A single-step retraction of a circuit contracts [a node with a single link to another node, both unswitched] into a single unswitched node, or contracts [a switched node together with its two links] into a single unswitched node. . . A circuit π′ is a retraction of another circuit π if there is a sequence π = π0, π1, . . . , πn−1, πn = π′ of circuits such that πi+1 is a single-step retraction of πi.
theorem 2.4.5 [retraction theorem] If a circuit satisfies the switch-
ing criterion then it retracts to a single node.
Proof: This is a difficult proof. Here is its structure. Look at the number of switched nodes in your circuit. If you have none, it is straightforward to show that the circuit is retractable. If you have at least one, choose a switched node, and look at the subcircuit of the circuit which is strongly attached to the switched ports of that node (this is called the empire of the node). This satisfies the switching criterion, as you can check. So it must either be retractable (in which case we can retract it away to a single point, and then absorb this switched node and continue) or not. If it is not retractable, then look at this subcircuit. It must contain a switched node (it would be retractable if it didn’t), which must also have an empire, which must not be retractable, and hence must contain a switched node . . . which is impossible.
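The retraction procedure itself can also be sketched in code. The encoding below is my own, and the exact statement of the two contraction rules follows the bracketed definition above, so treat it as an assumption rather than as the text's procedure: wires carry ids, and `switched` maps each switched node to its pair of switched wires.

```python
def retracts_to_point(edges, switched):
    """edges: dict wire id -> (u, v), the wire's two endpoint nodes.
    switched: dict node -> (wire id, wire id), its two switched wires.
    Returns True iff the circuit retracts to a single node."""
    edges = {e: list(uv) for e, uv in edges.items()}
    switched = dict(switched)
    nodes = {x for uv in edges.values() for x in uv}
    changed = True
    while changed:
        changed = False
        # Rule 1: an unswitched node with a single link to another
        # unswitched node is absorbed into its neighbour.
        for n in list(nodes):
            if n in switched:
                continue
            incident = [e for e, uv in edges.items() if n in uv]
            if len(incident) == 1:
                u, v = edges[incident[0]]
                other = v if u == n else u
                if other != n and other not in switched:
                    del edges[incident[0]]
                    nodes.discard(n)
                    changed = True
                    break
        if changed:
            continue
        # Rule 2: a switched node whose two switched wires both reach the
        # same unswitched node collapses, with that node, into one node.
        for n, (e1, e2) in list(switched.items()):
            if e1 not in edges or e2 not in edges:
                continue
            o1 = [x for x in edges[e1] if x != n]
            o2 = [x for x in edges[e2] if x != n]
            if o1 and o2 and o1[0] == o2[0] and o1[0] not in switched:
                target = o1[0]
                del edges[e1], edges[e2]
                del switched[n]
                for uv in edges.values():      # reattach n's other wires
                    uv[:] = [target if x == n else x for x in uv]
                nodes.discard(n)
                changed = True
                break
    return len(nodes) <= 1 and not edges

# ⊗E (switched) followed by ⊗I: retracts to a point.
good = retracts_to_point(
    {1: ("in", "E"), 2: ("E", "I"), 3: ("E", "I"), 4: ("I", "out")},
    {"E": (2, 3)})

# ⊕E followed by ⊗I (both unswitched): gets stuck on the loop.
bad = retracts_to_point(
    {1: ("in", "pE"), 2: ("pE", "tI"), 3: ("pE", "tI"), 4: ("tI", "out")},
    {})
```

On the two small examples, `good` retracts to a single node while `bad` does not, as the retraction theorem predicts.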

theorem 2.4.6 [conversion theorem] If a circuit is retractable, it can be inductively generated.

You can use the retraction process to convert a circuit into a derivation. Take a circuit, and replace each unswitched node by a derivation. The circuit on the left becomes the structure on the right:

[two-panel display. Left: the circuit from A ⊗ B to ¬(¬A ⊕ ¬B), built from a ⊗E node (outputs A and B), a ¬I node (outputs ¬A ⊕ ¬B and ¬(¬A ⊕ ¬B)), a ⊕E node (outputs ¬A and ¬B), and two ¬E nodes pairing A with ¬A and B with ¬B. Right: the same structure with each unswitched node replaced by a derivation of its sequent: A ` A with ¬A, A ` ; B ` B with ¬B, B ` ; ¬A ` ¬A and ¬B ` ¬B with ¬A ⊕ ¬B ` ¬A, ¬B; and ¬A ⊕ ¬B ` ¬A ⊕ ¬B with ` ¬(¬A ⊕ ¬B), ¬A ⊕ ¬B]


In this diagram, the inhabitants of a (green) rectangle are derivations, whose concluding sequent mirrors exactly the arrows in and out of the box. The in-arrows are on the left, and the out-arrows are on the right. If we have two boxes joined by an arrow, we can merge the two boxes. The effect on the derivation is to cut on the formula in the arrow. The result is in Figure 2.14. After absorbing the two remaining derivations, we get a structure with only one node remaining, the switched ⊗E node. This is in Figure 2.15.

[figure: the partially merged structure. The boxes for ¬A ` ¬A, ¬B ` ¬B, ¬A ⊕ ¬B ` ¬A, ¬B and ` ¬(¬A ⊕ ¬B), ¬A ⊕ ¬B have been merged into one derivation concluding ` ¬(¬A ⊕ ¬B), ¬A, ¬B, still wired to the ⊗E node (outputs A and B) and to the boxes for ¬A, A ` and ¬B, B ` ]

Figure 2.14: a retraction in progress: part 1

Now at last, the switched node ⊗E has both output arrows linked to
the one derivation. This means that we have a derivation of a sequent
with both A and B on the left. We can complete the derivation with a
[⊗L] step. The result is in Figure 2.16.

In general, this process gives us a weird derivation, in which every connective rule except for ⊗L and ⊕R occurs at the top of the derivation, and the only other steps are cut steps and the inferences ⊗L and ⊕R, which correspond to switched nodes.
[Notice that there is no explicit conception of discharge in these circuits. Nonetheless, conditionals may be defined using the vocabulary we have at hand: A → B is ¬A ⊕ B. If we consider what it would be to eliminate ¬A ⊕ B, we see that the combination of a ⊕E and a ¬E node allows us to chain together a proof concluding in ¬A ⊕ B with a proof concluding in A to construct a proof concluding in B (together with the other conclusions remaining from the original two proofs).

[diagram to go here]



[figure: the ⊗E node with outputs A and B, now wired to a single merged derivation concluding A, B ` ¬(¬A ⊕ ¬B)]
Figure 2.15: a retraction in progress: part 2

For an introduction rule, we can see that if we have a proof with a premise A and a conclusion B (possibly among other premises and conclusions), we may plug the A input wire into a ¬I node to give us a new ¬A concluding wire, and the two conclusions ¬A and B may be wired up with a ⊕I node to give us the new conclusion ¬A ⊕ B, or if you prefer, A → B.

[diagram to go here]

Expand this, with a discussion of the locality of circuits as opposed to the ‘action at a distance’ of the traditional discharge rules.]

[derivation: from ¬A ` ¬A and ¬B ` ¬B, ⊕L gives ¬A ⊕ ¬B ` ¬A, ¬B; from ¬A ⊕ ¬B ` ¬A ⊕ ¬B, moving ¬A ⊕ ¬B to the right under a negation gives ` ¬(¬A ⊕ ¬B), ¬A ⊕ ¬B; a Cut gives ` ¬(¬A ⊕ ¬B), ¬A, ¬B. From A ` A, ¬L gives ¬A, A ` , and a Cut gives A ` ¬(¬A ⊕ ¬B), ¬B. From B ` B, ¬L gives ¬B, B ` , and a Cut gives A, B ` ¬(¬A ⊕ ¬B). Finally, ⊗L gives A ⊗ B ` ¬(¬A ⊕ ¬B)]

Figure 2.16: a derivation for A ⊗ B ` ¬(¬A ⊕ ¬B)


2.4.4 | normal circuits


Not every circuit is normal. In a natural deduction proof in the system
for implication, we said that a proof was normal if there is no step
introducing a conditional A → B which then immediately serves as a
major premise in a conditional elimination move. The definition for
normality for circuits is completely parallel to this definition. A circuit
is normal if and only if no wire for A ⊗ B (or A ⊕ B or ¬A) is both
the output of a [⊗I] node (or a [⊕I] or [¬I] node) and an input of a
[⊗E] node (or [⊕E] or [¬E]). This is the role of the small dots on the
boundary of the nodes: these mark the ‘active wire’ of a node, and a
non-normal circuit has a wire that has this dot at both ends.
It is straightforward to show that if we have a cut-free derivation
of a sequent X ` Y , then the circuit constructed by this derivation is
normal. The new nodes at each stage of construction always have their
dots facing outwards, so a dot is never added to an already existing
wire. So, cut-free derivations construct normal circuits.
The process of normalising a circuit is simplicity itself: pairs of introduction and elimination nodes can be swapped out for node-free wires in any circuit in which they occur. The dot indicates the “active” port of the node, and if we have a circuit in which two active ports are joined, they can “react” to simplify the circuit. The rules of reaction are presented in Figure 2.17.

[figure: the three reaction rules. A ⊗I node whose output A ⊗ B feeds a ⊗E node reduces to plain wires A and B; a ⊕I node whose output A ⊕ B feeds a ⊕E node reduces to plain wires A and B; a ¬I node whose output ¬A feeds a ¬E node reduces to a plain wire A]
Figure 2.17: normalisation steps

The process of normalisation is completely local. We replace a region of the circuit by another region with the same periphery. At no stage do any global transformations have to take place in a circuit, and so normalisation can occur in parallel. It is clearly terminating, as we delete nodes and do not add them. Furthermore, the process of normalisation is confluent. No matter what order we decide to process the nodes, we will always end with the same normal circuit.
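To see just how local these steps are, here is a sketch of my own (an assumption, not the text's algorithm; it covers only the ⊗ reaction, and matches wires by their formula labels): a reaction deletes one ⊗I/⊗E pair and splices the surrounding wires, without inspecting the rest of the circuit.

```python
def react_tensor(nodes, wires):
    """Apply the ⊗I/⊗E reaction until no redex remains.
    nodes: id -> kind ("⊗I", "⊗E", ...).
    wires: id -> dict(src=?, dst=?, label=str); src/dst are node ids,
    or None for a wire into or out of the whole circuit."""
    changed = True
    while changed:
        changed = False
        for w, d in list(wires.items()):
            a, b = d["src"], d["dst"]
            if nodes.get(a) == "⊗I" and nodes.get(b) == "⊗E":
                # premises of the ⊗I node and conclusions of the ⊗E
                # node, matched up by label (A with A, B with B)
                ins = {wires[x]["label"]: x for x in wires if wires[x]["dst"] == a}
                outs = {wires[x]["label"]: x for x in wires if wires[x]["src"] == b}
                for lab, win in ins.items():
                    wout = outs[lab]
                    wires[win]["dst"] = wires[wout]["dst"]   # splice
                    del wires[wout]
                del wires[w]
                del nodes[a], nodes[b]
                changed = True
                break
    return nodes, wires

# The circuit that introduces A ⊗ B and immediately eliminates it:
nodes = {1: "⊗I", 2: "⊗E"}
wires = {
    10: {"src": None, "dst": 1, "label": "A"},
    11: {"src": None, "dst": 1, "label": "B"},
    12: {"src": 1, "dst": 2, "label": "A⊗B"},
    13: {"src": 2, "dst": None, "label": "A"},
    14: {"src": 2, "dst": None, "label": "B"},
}
nodes, wires = react_tensor(nodes, wires)
```

Running this on the circuit above leaves no nodes and just the two plain wires A and B, as the first reaction rule of Figure 2.17 dictates.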

[add an example]

theorem 2.4.7 [normalisation for linear circuits]

2.4.5 | classical circuits


To design circuits for classical logic, you must incorporate the effect
of the structural rules in some way. The most straightforward way to
do this is to introduce (switched) contraction nodes and (unswitched)
weakening nodes. In this way the parallel with the sequent system is
completely explicit.
The first account of multiple-conclusion proofs is Kneale’s “Tables of Development” [47]. Shoesmith and Smiley’s Multiple Conclusion Logic [84] is an extensive treatment of the topic. The authors explain why Kneale’s formulation is not satisfactory, due to problems with the substitution of one proof into another — the admissibility of cut. Shoesmith and Smiley introduce a notation similar to the node and wire diagrams used here. The problem of substitution is further discussed in Ungar’s Normalization, cut-elimination, and the theory of proofs [92], which proposes a general account of what it is to substitute one proof into another. One account of classical logic that is close to the account given here is Edmund Robinson’s, in “Proof Nets for Classical Logic” [80].

definition 2.4.8 [classical circuits] These are the inductively generated circuits:

• An identity wire [a single wire labelled A] for any formula A is an inductively generated circuit. The sole input type for this circuit is A, and its output type is also (the very same instance) A. As there is only one wire in this circuit, it is near to itself.

• Each boolean connective node presented below is an inductively generated circuit. The negation nodes:

[node diagrams: the ¬E node, with input wires ¬A and A; the ¬I node, with output wires ¬A and A]

The conjunction nodes:

[node diagrams: the ∧I node, with inputs A and B and output A ∧ B; the ∧E1 node, with input A ∧ B and output A; the ∧E2 node, with input A ∧ B and output B]


And disjunction nodes:

[node diagrams: the ∨I1 node, with input A and output A ∨ B; the ∨I2 node, with input B and output A ∨ B; the ∨E node, with input A ∨ B and outputs A and B]

The inputs of a node are those wires pointing into the node, and
the outputs of a node are those wires pointing out.

• Given an inductively generated circuit π with an output wire labelled A, and an inductively generated circuit π′ with an input wire labelled A, we obtain a new inductively generated circuit in which the output wire of π is plugged in to the input wire of π′. The output wires of the new circuit are the output wires of π (except for the indicated A wire) and the output wires of π′, and the input wires of the new circuit are the input wires of π together with the input wires of π′ (except for the indicated A wire).

• Given an inductively generated circuit π with two input wires A, a new inductively generated circuit is formed by plugging both of those input wires into the input contraction node WE. Similarly, two output wires with the same label may be extended with a contraction node WI.

• Given an inductively generated circuit π, we may form a new circuit with the addition of a new input or output wire (with an arbitrary label) using a weakening node KI or KE.3

[diagram: a circuit π from X to Y extended with a new output wire B by a KI node, and extended with a new input wire B by a KE node]

3 Using an unlinked weakening node like this makes some circuits disconnected.
It also forces a great number of different sequent derivations to be represented by
the same circuit. Any derivation of a sequent of the form X ` Y, B in which B is
weakened in at the last step will construct the same circuit as a derivation in which
B is weakened in at an earlier step. If this identification is not desired, then a more
complicated presentation of weakening, using the ‘supporting wire’ of Blute, Cockett,
Seely and Trimble [8] is possible. Here, I opt for a simple presentation of circuits
rather than a comprehensive account of “proof identity.”



Here is an example circuit:

[example circuit: four ¬I nodes, each with output wires A ∧ ¬A and ¬(A ∧ ¬A); in each of two symmetric halves, one A ∧ ¬A wire feeds a ∧E1 node and the other a ∧E2 node, and the resulting A and ¬A wires meet in a ¬E node; the two ¬(A ∧ ¬A) wires in each half are merged by a WI contraction node; finally a ∧I node combines the two ¬(A ∧ ¬A) wires into the output ¬(A ∧ ¬A) ∧ ¬(A ∧ ¬A)]

[Correctness theorem. (Retraction)]


[Translation between circuits and sequents.]
[Normalisation (including strong normalisation). Failure of Church–
Rosser with the usual rules. Church-Rosser property for a particular
choice of the W/W and K/K rules. Is this desirable?]

2.4.6 | history and other matters


We can rely on the duality of ⊗ and ⊕ to do away with half of our rules, if we are prepared to do a little bit of work. Translate the sequent X ` Y into ` ¬X, Y, and then trade in ¬(A ⊗ B) for ¬A ⊕ ¬B; ¬(A ⊕ B) for ¬A ⊗ ¬B, and ¬¬A for A. The result will be a sequent where the only negations are on atoms. Then we can have rules of the following form:

` p, ¬p [Id]

From ` X, A, B infer ` X, A ⊕ B [⊕R]; from ` X, A and ` X′, B infer ` X, X′, A ⊗ B [⊗R].
The circuits are also much simpler. They only have outputs and no
inputs. These are Girard’s proofnets [35].
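The translation into negation normal form that this trade relies on can be sketched recursively (the tuple encoding of formulas below is my own; compare exercise q2 below):

```python
def nnf(f):
    # formulas: an atom string, ("¬", A), ("⊗", A, B) or ("⊕", A, B)
    if isinstance(f, str):
        return f
    if f[0] in ("⊗", "⊕"):
        return (f[0], nnf(f[1]), nnf(f[2]))
    g = f[1]                       # f = ("¬", g)
    if isinstance(g, str):
        return f                   # a negated atom is already in n(A)
    if g[0] == "¬":
        return nnf(g[1])           # ¬¬A becomes A
    dual = "⊕" if g[0] == "⊗" else "⊗"
    return (dual, nnf(("¬", g[1])), nnf(("¬", g[2])))
```

For instance, `nnf` sends ¬(p ⊗ ¬q) to ¬p ⊕ q, pushing the negations inward to the atoms by the dualities above.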

2.4.7 | exercises
basic exercises
q1 Construct circuits for the following sequents:
1 : ` p ⊕ ¬p
2 : p ⊗ ¬p `
3 : ¬¬p ` p
4 : p ` ¬¬p
5 : ¬(p ⊗ q) ` ¬p ⊕ ¬q
6 : ¬p ⊕ ¬q ` ¬(p ⊗ q)
7 : ¬(p ⊕ q) ` ¬p ⊗ ¬q
8 : ¬p ⊗ ¬q ` ¬(p ⊕ q)


9 : p⊗q`q⊗p
10 : p ⊕ (q ⊕ r) ` p ⊕ (q ⊕ r)
q2 Show that every formula A in the language ⊕, ⊗, ¬ is equivalent to a
formula n(A) in which the only negations are on atomic formulas.
q3 For every formula A, construct a circuit encodeA from A to n(A), and decodeA from n(A) to A. Show that encodeA composed with decodeA normalises to the identity wire on A, and that decodeA composed with encodeA normalises to the identity wire on n(A). (If this doesn’t work for the encode and decode circuits you chose, then try again.)
q4 Given a circuit π1 for A1 ` B1 and a circuit π2 for A2 ` B2, show how to construct a circuit for A1 ⊗ A2 ` B1 ⊗ B2 by adding two more nodes. Call this new circuit π1 ⊗ π2. Now, suppose that τ1 is a proof from B1 to C1, and τ2 is a proof from B2 to C2. What is the relationship between the proof (π1 ⊗ π2) · (τ1 ⊗ τ2) (composing the two proofs π1 ⊗ π2 and τ1 ⊗ τ2 with a cut on B1 ⊗ B2) from A1 ⊗ A2 to C1 ⊗ C2 and the proof (π1 · τ1) ⊗ (π2 · τ2), also from A1 ⊗ A2 to C1 ⊗ C2?
Prove the same result for ⊕ in place of ⊗. Is there a corresponding fact for negation?
q5 Re-prove the results of all of the previous questions, replacing ⊗ by ∧
and ⊕ by ∨, using the rules for classical circuits. What difference does
this make?
q6 Construct classical circuits for the following sequents
1 : q ` p ∨ ¬p
2 : p ∧ ¬p ` q
3 : p ` (p ∧ q) ∨ (p ∧ ¬q)
4 : (p ∧ q) ∨ (p ∧ ¬q) ` p
5 : (p ∧ q) ∨ r ` p ∧ (q ∨ r)
6 : p ∧ (q ∨ r) ` (p ∧ q) ∨ r

intermediate exercises
q7 The following statement is a tautology:

¬[(p1,1 ∨ p1,2) ∧ (p2,1 ∨ p2,2) ∧ (p3,1 ∨ p3,2) ∧ ¬(p1,1 ∧ p2,1) ∧ ¬(p1,1 ∧ p3,1) ∧ ¬(p2,1 ∧ p3,1) ∧ ¬(p1,2 ∧ p2,2) ∧ ¬(p1,2 ∧ p3,2) ∧ ¬(p2,2 ∧ p3,2)]

It is the pigeonhole principle for n = 2. The general pigeonhole principle is the formula Pn:

Pn : ¬[ ⋀(i=1..n+1) ⋁(j=1..n) pi,j ∧ ⋀(i=1..n+1) ⋀(i′=i+1..n+1) ⋀(j=1..n) ¬(pi,j ∧ pi′,j) ]

Pn says that you cannot fit n + 1 pigeons in n pigeonholes, no two pigeons in the one hole. (Read ‘pi,j’ as ‘pigeon number i is in pigeonhole number j.’) Find a proof of the pigeonhole principle for n = 2. How large is your proof? Describe a proof of Pn for each value of n. How does the proof increase in size as n gets larger? Are there non-normal proofs of Pn that are significantly smaller than any normal proofs of Pn?
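As a sanity check (not part of the exercise, and the encoding is mine), a brute-force search over all assignments confirms that Pn is a tautology for small n:

```python
from itertools import product, combinations

def pn_true(n, a):
    # a: dict (pigeon, hole) -> bool. Pn negates the conjunction
    # "every pigeon is in some hole, and no two pigeons share a hole".
    pigeons, holes = range(1, n + 2), range(1, n + 1)
    housed = all(any(a[i, j] for j in holes) for i in pigeons)
    alone = all(not (a[i, j] and a[k, j])
                for i, k in combinations(pigeons, 2) for j in holes)
    return not (housed and alone)

def pn_tautology(n):
    cells = [(i, j) for i in range(1, n + 2) for j in range(1, n + 1)]
    return all(pn_true(n, dict(zip(cells, vals)))
               for vals in product((False, True), repeat=len(cells)))
```

The search space grows as 2^(n(n+1)), so this only scales to tiny n; the point of the exercise is exactly how proof size grows where brute force does not help.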



2.5 | counterexamples
Models (truth tables, algebraic models, Kripke frame models for the logics we have seen so far) are introduced as counterexamples to invalid arguments. Search for derivations is construed as a counterexample search, sequent systems suited to derivation search are presented, and the finite model property and decidability are proved for a range of logical systems.
Natural deduction proofs are structures generated by a recursive definition: we define atomic proofs (in this case, assumptions on their own), and given a proof, we provide ways to construct new proofs (conditional introduction and elimination). Nothing is a proof that is not constructed in this way from the atoms. This construction means that we can prove results about them using the standard technique of induction. For example, we can show that any standard proof is valid according to the familiar truth-table test. That is, if we assign the values of true and false to each atom in the language, and then extend the valuation so that v(A → B) is false if and only if v(A) is true and v(B) is false, and v(A → B) is true otherwise, then we have the following result:

theorem 2.5.1 [truth table validity] Given a standardly valid argument X ∴ A, there is no assignment of truth values to the atoms such that each formula in X is true and the conclusion A is false.

This can be proved systematically from the way that proofs are constructed.

Proof: We first show that the simplest proofs have this property. That
is, given a proof that is just an assumption, we show that there is no
counterexample in truth tables. But this is obvious. A counterexample
for an assumption A would be a valuation such that v(A) was true and
was at the same time false. Truth tables do not allow this. So, mere
assumptions have the property of being truth table valid. Now, let’s
suppose that we have a proof whose last move is an elimination, from
A → B and A to B and let’s suppose that its constituent proofs, π1
from X to A → B and π2 from Y to A, are truth table valid. It remains
to show that our proof from X and Y to B is truth table valid. If it is
not, then we have a valuation v that makes each formula in X true, and
each formula in Y true, and that makes B false. This cannot be the case,
since v must make A either true or false. If it makes A false, then the
valuation is a counterexample to the argument from Y to A. (But we
have supposed that this argument has no truth table counterexamples.)
On the other hand, if it makes A true, then it makes A → B false (since
B is false) and so, it is a counterexample to the argument from X to
A → B. (But again, we have supposed that this argument has no truth
table counterexamples.) So, we have shown that if our proofs π1 and π2 are valid, then the result of extending them with an →E move is also valid.


Let’s do the same thing with →I. Suppose that π′, from X to B, is a valid argument, and let’s suppose that we are interested in discharging the assumption A from π′ to deduce A → B from the premise list X′, which is X with a number (possibly zero) of instances of A deleted. Is this new argument truth table valid? Well, let’s suppose that it is not. It follows that we have a valuation that makes the list X′ of formulas each true, and A → B false. That is, it makes A true and B false. So, the valuation makes the formulas in X all true (since X is X′ together with some number of instances of A) and B false. So, if our longer proof is truth-table invalid, so is π′.
It follows, then, that any proof constructs an argument that is valid
according to truth tables. There is no counterexample to any one-line
proof (simply an assumption), and →I and →E steps also do not permit
counterexamples. So, no proof has a counterexample in truth tables.
All standardly valid arguments are truth table valid.

However, the converse is not the case. Some arguments that are valid from the perspective of truth tables cannot be supplied with proofs. Truth tables are good for sifting out some of the invalid arguments, and for those arguments for which this technique works, a simple truth table counterexample is significantly more straightforward to work with than a direct demonstration that there is no proof to be found. Regardless, truth tables are a dull instrument. Many arguments with no standard proofs are truth table valid. Here are two examples: (A → B) → B ∴ (B → A) → A and (A → B) → A ∴ A. Now we will look at ways to refute these arguments.
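The truth-table test itself is easy to mechanise. In the sketch below (my own encoding: a formula is an atom string or a nested ('→', A, B) tuple), both example arguments come out truth-table valid, even though neither has a standard proof:

```python
from itertools import product

def ev(f, v):
    # evaluate formula f under valuation v (dict atom -> bool)
    if isinstance(f, str):
        return v[f]
    _, a, b = f                      # f = ('→', a, b)
    return (not ev(a, v)) or ev(b, v)

def atoms(f):
    return {f} if isinstance(f, str) else atoms(f[1]) | atoms(f[2])

def tt_valid(premises, conclusion):
    vocab = sorted(set().union(atoms(conclusion), *map(atoms, premises)))
    for vals in product((True, False), repeat=len(vocab)):
        v = dict(zip(vocab, vals))
        if all(ev(p, v) for p in premises) and not ev(conclusion, v):
            return False             # a truth-table counterexample
    return True

imp = lambda a, b: ("→", a, b)
peirce = tt_valid([imp(imp("A", "B"), "A")], "A")
other = tt_valid([imp(imp("A", "B"), "B")], imp(imp("B", "A"), "A"))
```

Both `peirce` and `other` come out true, while an argument such as p ∴ q is correctly rejected. The models of the next section are the finer instrument that refutes these two.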

2.5.1 | counterexamples for conditionals


In this section we will consider a more discriminating instrument for finding counterexamples to invalid arguments involving conditionals. The core idea is that we will have simple models in which we can interpret formulas. Models will consist of points at which formulas can be true or false. Points will be able to be combined, and this is the heart of the behaviour of implication. If A → B is true at x and A is true at y, then B will be true at x ∗ y.

definition 2.5.2 [conditional structure] A triple ⟨P, ∗, 0⟩ consisting of a nonempty set P of points, an operation ∗ on P and a specific element 0 of P forms a conditional structure if and only if ∗ is commutative (a ∗ b = b ∗ a for each a, b ∈ P) and associative (a ∗ (b ∗ c) = (a ∗ b) ∗ c for each a, b, c ∈ P) and 0 is an identity for ∗ (0 ∗ a = a = a ∗ 0 for each a ∈ P). A conditional structure is said to be contracting if a ∗ a = a for each a ∈ P.

example 2.5.3 [conditional structures] Here are some simple conditional structures.

[the trivial structure] The trivial conditional structure has P = {0}, and 0 ∗ 0 = 0. It is the only structure in which P has one element.



[two-element structures] There are only two distinct two-element conditional structures. Suppose P = {0, 1}. We have the table for ∗ settled except for the value of 1 ∗ 1. So, there are two two-element structures: one in which 1 ∗ 1 = 0 and one in which 1 ∗ 1 = 1. The second choice provides us with a contracting structure, and the first does not.

[linear structures] Consider a conditional structure in which P = {0, 1, 2, . . .} consists of all counting numbers. There are at least two straightforward ways to evaluate ∗ on such a structure. If we wish to have a contracting structure, we can set a ∗ b = max(a, b). If we wish to have a different structure, we can set a ∗ b = a + b.
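For finite structures, the defining conditions can be checked exhaustively. A small sketch (the function names are mine):

```python
def is_conditional_structure(P, star, zero):
    # check commutativity, associativity, and that zero is an identity
    comm = all(star(a, b) == star(b, a) for a in P for b in P)
    assoc = all(star(a, star(b, c)) == star(star(a, b), c)
                for a in P for b in P for c in P)
    ident = all(star(zero, a) == a == star(a, zero) for a in P)
    return comm and assoc and ident

def is_contracting(P, star):
    return all(star(a, a) == a for a in P)

xor = lambda a, b: (a + b) % 2       # the structure with 1 ∗ 1 = 0
max_ok = is_conditional_structure([0, 1], max, 0) and is_contracting([0, 1], max)
xor_ok = is_conditional_structure([0, 1], xor, 0) and not is_contracting([0, 1], xor)
```

The checks confirm the claim above: both two-element tables give conditional structures, with ∗ as max contracting and ∗ as addition mod 2 not.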

Once we have our structures, we can turn to the task of interpreting


formulas in our language.
These models are a slight generalisation of the models introduced by Alasdair Urquhart in 1972 [94], for the relevant logics R and E.

definition 2.5.4 [conditional model] Given a conditional structure ⟨P, ∗, 0⟩, a model on that structure is determined by a valuation of the atoms of the language at each point in P. We will write “a ⊩ p” to say that p is true at a, and “a ⊮ p” to say that p is not true at a. Given a model, we may interpret every formula at each point as follows:

» c ⊩ A → B iff for each a ∈ P where a ⊩ A, we have c ∗ a ⊩ B.
example 2.5.5 [a simple valuation] Consider the two-element structure in which P = {0, 1} and 1 ∗ 1 = 1. Let’s suppose that 0 ⊮ p, 1 ⊩ p, 0 ⊮ q and 1 ⊮ q. It follows that 0 ⊮ p → q, since 1 ⊩ p and 0 ∗ 1 = 1 ⊮ q. Similarly, 1 ⊮ p → q, since 1 ⊩ p and 1 ∗ 1 = 1 ⊮ q. However, 0 ⊩ q → p, since there is no point at which q is true. Similarly, 0 ⊩ (p → q) → p, since there is no point at which p → q is true. It follows that 0 ⊮ ((p → q) → p) → p, since 0 ⊩ (p → q) → p and 0 ∗ 0 = 0 ⊮ p.

In this example, we have a refutation of a classically valid formula. ((p → q) → p) → p is valid according to truth tables, but it can be refuted at a point in one of our models.
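Example 2.5.5 can be replayed mechanically. In this sketch (the encoding is mine), `holds` implements the clause for → directly, and Peirce’s formula fails at 0:

```python
P = [0, 1]
star = max                 # the two-element structure with 1 ∗ 1 = 1
val = {("p", 1)}           # p holds only at 1; q holds nowhere

def holds(c, f):
    if isinstance(f, str):
        return (f, c) in val
    _, a, b = f            # f = ("→", a, b)
    # c ⊩ a → b iff for each x where x ⊩ a, c ∗ x ⊩ b
    return all(holds(star(c, x), b) for x in P if holds(x, a))

imp = lambda a, b: ("→", a, b)
peirce = imp(imp(imp("p", "q"), "p"), "p")
```

Here `holds(0, peirce)` is false, while `holds(0, imp("q", "p"))` is true, just as computed by hand in the example.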

definition 2.5.6 [atom preservation] A model is said to be atom preserving if whenever a ⊩ p for an atom p and a point a, then for any point b, we have a ∗ b ⊩ p too.

lemma 2.5.7 [conditional preservation] For any points a and b in an atom preserving model, and for any formula A, if a ⊩ A then a ∗ b ⊩ A too. (That is, A is preserved across points.)

Proof: By induction on the construction of formulas. The case for atoms holds by our assumption that the model is atom preserving. Now suppose we have a conditional formula A → B, and suppose that A and B are preserved in the model. We will show that A → B is preserved too. Suppose that a ⊩ A → B. We wish to show that a ∗ b ⊩ A → B too. So, we take any c ∈ P where c ⊩ A, and we attempt to show that (a ∗ b) ∗ c ⊩ B. By the associativity of ∗ we have (a ∗ b) ∗ c = a ∗ (b ∗ c). We also have that a ⊩ A → B and that c ⊩ A. By the commutativity of ∗, b ∗ c = c ∗ b, and by the preservation of A, c ∗ b ⊩ A. So, b ∗ c ⊩ A, and since a ⊩ A → B, (a ∗ b) ∗ c = a ∗ (b ∗ c) ⊩ B, as desired. So, A → B is preserved.

To use these models to evaluate arguments, we need to think of how it is appropriate to evaluate multisets of formulas in a model. Clearly a singleton multiset A is evaluated just as a single formula is. The multiset A, B is to be interpreted as true at a point x when x = a ∗ b and a ⊩ A and b ⊩ B. We can generalise this.

definition 2.5.8 [evaluating multisets] For each point x ∈ P, we say that

» x ⊩ ∅ iff x = 0 (where ∅ is the empty multiset).
» x ⊩ Y, A iff there are y, a ∈ P where x = y ∗ a, y ⊩ Y and a ⊩ A.

Notice that this definition uses the fact that 0 is an identity for ∗ (this makes x ⊩ X work when X is a singleton) and that ∗ is commutative and associative (this makes it not matter what order you “unwrap” your multiset into elements).

definition 2.5.9 [validity in models] A formula is valid in a model iff it is true at 0 in that model. An argument X ∴ A is valid in a model iff for each x ∈ P where x ⊩ X, we also have x ⊩ A. (So, A is valid iff ∴ A is valid.)

theorem 2.5.10 [soundness for conditional models] If we have a linear proof from X to A, then the argument X ∴ A is valid in every model. If the proof utilises duplicate discharge, it is valid in every contracting model. If the proof utilises vacuous discharge, it is valid in every preserving model. So, every standard proof is valid in every contracting preserving model.

Proof: By induction on the construction of the proof from X to A. Assumptions are valid in every model. If we have a proof π from X to A → B and a proof π′ from Y to A, we want the argument from X, Y to B to be valid. We may assume that X ∴ A → B and Y ∴ A are both valid in every model. Is X, Y ∴ B? Suppose we have z ⊩ X, Y. It follows that z = x ∗ y where x ⊩ X and y ⊩ Y. Then by the validity of X ∴ A → B we have x ⊩ A → B. Similarly, we have y ⊩ A. By the valuation clause for → then, z = x ∗ y ⊩ B, as we desired.

Suppose now that π is a proof from X to B, where X contains some number (possibly zero) of copies of A. We may assume that π is valid (according to some restriction or other), and hence that if x ⊩ X then x ⊩ B in each model (perhaps these are contracting models, perhaps preserving, depending on whether we are observing or failing to observe the restrictions on duplicating or vacuous discharge). Since x ⊩ X, it follows that x = x1 ∗ x2 ∗ · · · ∗ xn where x1 ⊩ P1, x2 ⊩ P2, . . . , xn ⊩ Pn, where X = P1, P2, . . . , Pn. Without loss of generality, we may present the list in such a way that the discharged instances of A come last, so X = Y, A, . . . , A where i instances of A are discharged. (We may well have i = n, in which case X = A, . . . , A and then Y is empty. On the other hand, we may well have i = 0, in which case X = Y.) We want to show that Y ∴ A → B is valid. Suppose y ⊩ Y. We wish to show that y ⊩ A → B. To do this, we need to show that if a ⊩ A, y ∗ a ⊩ B. In the simplest case, the proof used a linear discharge at this point, i = 1 (the number of instances of A discharged), and hence X = Y, A. In this case, y ∗ a ⊩ X = Y, A, and by our assumption of the validity of π, y ∗ a ⊩ B, as desired.

Suppose the discharge was vacuous, and X = Y. In this case, we may assume that our model is a preserving one. Now since y ⊩ Y, by preservation y ∗ a ⊩ Y too, but Y = X, so y ∗ a ⊩ X, and since X ∴ B is valid (in preserving models) we have y ∗ a ⊩ B as desired.

Suppose the discharge was a duplicate, and i > 1. In this case, we may assume that our model is a contracting one. Since we have a ⊩ A, we also have a ∗ a ⊩ A (since a ∗ a = a) and more generally, a ∗ · · · ∗ a ⊩ A where we choose an i-fold repeated ∗-list. So, y ∗ a = y ∗ a ∗ · · · ∗ a ⊩ X = Y, A, . . . , A and hence y ∗ a ⊩ B as desired.

So, we have shown that our argument is valid in the appropriate class of models.

Straight away we may use this to provide counterexamples to invalid arguments.
example 2.5.11 The argument A → (A → B) ∴ A → B is linearly invalid (and invalid in “affine” logic too), for example. Take the conditional model on the set of counting numbers, in which we have n ∗ m = n + m, and in which we have an evaluation that preserves atoms. Take, for example, p true at 1, 2 and every larger number (but not 0) and q true at 2, 3 and every larger number (but not 0 or 1). In this model we have 0 ⊩ p → (p → q), since for each n where n ⊩ p (that is, for each n > 0) we have n ⊩ p → q. This is true, since for each m where m ⊩ p (that is, for each m > 0) we have n + m ⊩ q (that is, we have m + n > 1). However, we do not have 0 ⊩ p → q, since 1 ⊩ p but 0 + 1 = 1 ⊮ q.
It follows from our soundness theorem and this model that we cannot find a proof satisfying the no-duplicate discharge constraint for the argument A → (A → B) ∴ A → B. There is none.
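This countermodel can be verified by a small calculation. In this particular model the points where any formula holds always form an up-set {n : n ≥ k} (an observation of mine, easily checked by induction: if A and B hold exactly from thresholds a and b, then A → B holds at c just when c + a ≥ b). So it is enough to compute thresholds:

```python
def threshold(f, th):
    # th: atom -> least point at which the atom is true
    if isinstance(f, str):
        return th[f]
    _, a, b = f                        # f = ("→", a, b)
    return max(threshold(b, th) - threshold(a, th), 0)

imp = lambda a, b: ("→", a, b)
th = {"p": 1, "q": 2}                  # the valuation of Example 2.5.11
premise = threshold(imp("p", imp("p", "q")), th)   # 0, so true at 0
conclusion = threshold(imp("p", "q"), th)          # 1, so not true at 0
```

The premise has threshold 0 and the conclusion threshold 1, so the argument fails at the point 0, exactly as argued above.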
theorem 2.5.12 [completeness for conditional models] If X ∴ A
has no linear proof, then it has a counterexample in some model. If it
has no proof with duplicate discharge, it has a counterexample in some
contracting model. If it has no proof with vacuous discharge, it has a
counterexample in a preserving model. If it has no standard proof,
it has a counterexample in a standard (contracting and preserving)
model.

Proof: Consider the linear discharge policy first. We will construct a model which will contain a counterexample to every linearly invalid argument. The points in this model are the multisets of formulas in the language. X ∗ Y is X, Y, the multiset union of X and Y. 0 is the empty multiset. This clearly satisfies the definition of a conditional structure.
Now we will define truth-at-a-point. We will say that X ⊩ p if and only if there is some proof for X ∴ p, for any atom p. We will show that in general, X ⊩ A if and only if there is a proof for X ∴ A. This proceeds by induction on the formation of the formula. We already have the case for atoms. Now suppose we have A → B and the result holds for A and B. We want to show that there is a proof for X ∴ A → B if and only if for each Y where Y ⊩ A, we have X, Y ⊩ B. That is, we wish to show that there is a proof for X ∴ A → B if and only if for each Y where there’s a proof of Y ∴ A, there is also a proof of X, Y ∴ B. From left to right it is straightforward. If there is a proof from X to A → B and a proof from Y to A, then extend it by →E to form a proof from X, Y to B. From right to left we may assume that for any Y, if there’s a proof for Y ∴ A, then there is a proof X, Y ∴ B. Well, there is a proof of A ∴ A, so it follows that there’s a proof of X, A ∴ B. Use that proof and apply →I to construct a proof of X ∴ A → B.

So, our structure is a model in which X ⊩ A in general if and only if there is a proof for X ∴ A. It is an easy induction on the structure of X to show that X ⊩ X. It follows, then, that if there is no proof for X ∴ A, then X itself is a point at which X ⊩ X but X ⊮ A. We have a counterexample to any invalid argument.
counterexample to any invalid argument.
Consider the other discharge policies. If we allow vacuous discharge, then it is straightforward to show that our model satisfies the preservation condition. If X ∴ A is valid, so is X, B ∴ A. If X ∴ A is valid, we may discharge a non-appearing B to find X ∴ B → A. We may then use an assumption of B to deduce X, B ∴ A.

[proof tree: from the proof of A from X, vacuously discharge B by →I,1 to conclude B → A; then from B → A and an assumption B, conclude A by →E]

So, in this model, if vacuous discharge is allowed, the preservation condition is satisfied. So, we have an appropriate model for affine deductions.
If we allow duplicate discharge, we must do a little work. The model we have constructed so far does not satisfy the contraction condition, since the multiset A, A is not the same multiset as the singleton A. Instead, we work simply with sets of formulas, and proceed as before. We must do a little more work when it comes to →E. We know that if we have a proof for X ∴ A → B and one for Y ∴ A then we have a proof from the multiset union X, Y to B. Do we have one for the set union too? We do, because for any proof from a list of premises to a conclusion, if we allow duplicate discharges we can construct a proof in which each premise is used only once.

[proof tree: from the proof of A from X, B, B, discharge both instances of B by →I,1 to conclude B → A; then from B → A and a single assumption B, conclude A by →E]
In this example, we trade in two uses of B in a proof from X, B, B to A for one. The rest of the argument goes through just as before. Our model with sets of formulas will do for relevant arguments. It satisfies the preservation condition, if we allow vacuous discharge as required for standard arguments.

2.5.2 | counterexamples through algebra


We have seen nothing yet of counterexamples to invalid sequents involving ∧ and ∨. A sequent may be underivable, but we have, as yet, no “take-away” representation of that invalidity of the kind we saw in Section 2.5.1 for arguments involving conditionals. Counterexamples for invalid sequents in our logic of conjunction and disjunction are not going to be straightforward, either. It is not just simple sequents such as p ` q that are unprovable. Some sequents that are valid in the standard sense (using truth tables for conjunction and disjunction) are also unprovable. For example, we have no derivation of p ∧ (q ∨ r) ` (p ∧ q) ∨ r (see Example 2.2.11 on page 63), a sequent for the distribution of conjunction over disjunction. We can show that this has no cut-free derivation, and then appeal to Theorem 2.2.9.
This kind of reasoning is effective enough, but tedious if you need
to repeat it every time you want to refute an invalid argument. We
want an alternative: a way of representing invalidity. Truth-tables, just
as in the previous section, are not discriminating enough. They would
judge distribution to be valid, not invalid. One alternative, though, is
to look for structures that are like truth tables, but more discriminating.
Consider truth-tables for conjunction and disjunction:

   ∧ | t  f        ∨ | t  f
   --+-----        --+-----
   t | t  f        t | t  t
   f | f  f        f | t  f

(“Even the sentences of Frege’s mature logical system are complex
terms; they are terms that denote truth-values. Frege distinguished two
truth-values, the true and the false, which he took to be objects.”
Edward N. Zalta, “Gottlob Frege,” http://plato.stanford.edu/entries/frege/, [98].)

We construe conjunction and disjunction as operations on the little set
{t, f} of truth values. You can encode the information in these tables
more compactly, if you are prepared to use a little imagination, and if
you are prepared to accept a metaphor. Think of the true (t) as higher
than the false (f):

   t
   |
   f

Think of the “under-and-over” relationship as “less than” or “<” and
you get f < t, but not vice versa. We also get t ≤ t, f ≤ t and f ≤ f,
but t ≰ f. We can then understand the behaviour of conjunction and
disjunction on the set of truth values as minimum and maximum
respectively. A sequent A ⊢ B is valid (according to truth tables) if and
only if the value of the premise A is always less than or equal to the
value of the conclusion B, no matter how we evaluate the atoms in A
and B as truth values. You can think of ≤ then as a kind of rendering
of entailment in the small universe of truth values. A conjunction
entails both conjuncts, since its value will be the minimum of the
values of the two conjuncts. A disjunction is entailed by either of the
disjuncts, because its value is the maximum of the values of the two
disjuncts.
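The min/max reading can be checked mechanically. Here is a small sketch (mine, not the text's) that encodes the two truth values as the numbers 0 and 1, with f < t, and verifies a sample sequent under every valuation:

```python
# Truth values as numbers, so that conjunction is minimum and
# disjunction is maximum, and f < t falls out of 0 < 1.
f, t = 0, 1

def conj(x, y): return min(x, y)
def disj(x, y): return max(x, y)

# A sequent A |- B is valid iff v(A) <= v(B) under every valuation.
# Check that a conjunction entails its first conjunct: p ∧ q |- p.
print(all(conj(p, q) <= p for p in (f, t) for q in (f, t)))  # True
```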
122 propositional logic: tools & techniques · chapter 2

[september 18, 2006]

Does the treatment work in the more general context of the logic of
conjunction and disjunction generated by our sequent system? The
answer is negative. Minimum and maximum behave quite a lot like
conjunction and disjunction, but they do slightly more than we can
prove with these connectives here. You can show that the distribution
law A ∧ (B ∨ C) ⊢ (A ∧ B) ∨ C is valid under the interpretation of
conjunction and disjunction as minimum and maximum. (A ∧ (B ∨ C)
is the smaller of A and (the larger of B and C); (A ∧ B) ∨ C is the
larger of (the smaller of A and B) and C. The first is always smaller
than (or equal to) the second.) Treating disjunction and conjunction as
maximum and minimum is too strong for our purposes. Regardless, it
points in a helpful direction.
Sometimes orderings do not have maxima and minima. Consider
the following ordering, in which a < c, b < c and c < d.

      d
      |
      c
     / \
    a   b

(This picture is a Hasse diagram. This diagram is a way of presenting
an ordering relation <: a relation that is irreflexive (x ≮ x for each x),
transitive (if x < y and y < z then x < z) and asymmetric (if x < y
then y ≮ x). In the diagram, we represent the objects by dots, and we
draw a single line from x up to y just when x < y and there is no z
where x < z < y. Then, in general, x < y if and only if you can follow
a path from x up to y.)

There is a disjunction of a and b in one sense: it is c. a ≤ c and b ≤ c,
so c is an upper bound for both a and b. Not only that, but c is a least
upper bound for a and b. That is, among all of the upper bounds of a
and b (that is, among c and d), c is the smallest. A least upper bound
is a good candidate for disjunction, since if z is a least upper bound for
x and y then we have

   x ≤ z and y ≤ z

(it's an upper bound) and

   if x ≤ z′ and y ≤ z′ then z ≤ z′

(it's the least of the upper bounds). If we write the z here as x ∨ y, and
if we utilise the transitivity of ≤, we could write x ≤ x ∨ y as "if v ≤ x
then v ≤ x ∨ y." Our rules then take the form

    v ≤ x             v ≤ y            x ≤ u    y ≤ u
  -----------       -----------       -----------------
  v ≤ x ∨ y         v ≤ x ∨ y            x ∨ y ≤ u

which should look rather familiar. If we think of entailment as an
ordering among pieces of information (or propositions, or what-have-
you), then disjunction forms a least upper bound on that ordering.
Clearly the same sort of thing could be said for conjunction. Conjunction
is a greatest lower bound:

    x ≤ v             y ≤ v            u ≤ x    u ≤ y
  -----------       -----------       -----------------
  x ∧ y ≤ v         x ∧ y ≤ v            u ≤ x ∧ y
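These bound conditions can be computed directly from an ordering. The sketch below (function names are my own, not the text's) recovers the least upper bound of two elements in the four-element ordering pictured above, where a < c, b < c and c < d:

```python
# The strict order, closed under transitivity (a < d and b < d follow
# from a < c, b < c, c < d).
elements = {"a", "b", "c", "d"}
strictly_below = {("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d")}

def leq(x, y):
    """The reflexive order: x <= y iff x == y or x < y."""
    return x == y or (x, y) in strictly_below

def least_upper_bound(x, y):
    """The least upper bound of x and y, or None if there isn't one."""
    ubs = [z for z in elements if leq(x, z) and leq(y, z)]
    least = [z for z in ubs if all(leq(z, w) for w in ubs)]
    return least[0] if least else None

print(least_upper_bound("a", "b"))  # c
```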
Ordered structures in which every pair of elements has a greatest lower
bound (or meet) and least upper bound (or join) are called lattices.

definition 2.5.13 [lattice] An ordered set ⟨P, ≤, ∧, ∨⟩ with operators
∧ and ∨ is said to be a lattice iff for each x, y ∈ P, x ∧ y is the greatest
lower bound of x and y (with respect to the ordering ≤) and x ∨ y is
the least upper bound of x and y (with respect to the ordering ≤).

Consider the two structures below. The one on the left is not a
lattice, but the one on the right is a lattice.
      t                  t
     / \              /  |  \
    a   c            a   b   c
    |\ /|             \  |  /
    |/ \|                f
    b   d

On the left, b and d have no lower bound at all (nothing is below both
of them), and while they have an upper bound (both a and c are upper
bounds of b and d) they do not have a least upper bound. On the other
hand, every pair of objects in the structure on the right has a meet and
a join. They are listed in the tables below:

   ∧ | f a b c t        ∨ | f a b c t
   --+----------        --+----------
   f | f f f f f        f | f a b c t
   a | f a f f a        a | a a t t t
   b | f f b f b        b | b t b t t
   c | f f f c c        c | c t t c t
   t | f a b c t        t | t t t t t

Notice that in this lattice, the distribution law fails in the following
way:

   a ∧ (b ∨ c) = a ∧ t = a ≰ c = f ∨ c = (a ∧ b) ∨ c
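The failure of distribution in this five-element lattice can be verified by brute force. In the sketch below (a hypothetical encoding of mine, not from the text), meet and join are computed from the order in which f is the bottom, t is the top, and a, b, c are pairwise incomparable:

```python
# The five-element lattice: f below everything, t above everything,
# a, b, c incomparable with one another.
elements = {"f", "a", "b", "c", "t"}

def leq(x, y):
    return x == y or x == "f" or y == "t"

def meet(x, y):
    """Greatest lower bound: the lower bound above all lower bounds."""
    lbs = [z for z in elements if leq(z, x) and leq(z, y)]
    return next(z for z in lbs if all(leq(w, z) for w in lbs))

def join(x, y):
    """Least upper bound: the upper bound below all upper bounds."""
    ubs = [z for z in elements if leq(x, z) and leq(y, z)]
    return next(z for z in ubs if all(leq(z, w) for w in ubs))

lhs = meet("a", join("b", "c"))   # a ∧ (b ∨ c) = a ∧ t = a
rhs = join(meet("a", "b"), "c")   # (a ∧ b) ∨ c = f ∨ c = c
print(lhs, rhs, leq(lhs, rhs))    # a c False
```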
Lattices stand to our logic of conjunction and disjunction in the same
sort of way that truth tables stand to traditional classical propositional
logic. Given a lattice ⟨P, ≤, ∧, ∨⟩ we can define a valuation v on
formulas in the standard way.

definition 2.5.14 [valuation] We assign, for every atom p, a value
v(p) from the lattice P. Then we extend v to map each formula into P
by setting

   v(A ∧ B) = v(A) ∧ v(B)      v(A ∨ B) = v(A) ∨ v(B)

Using valuations, we can evaluate sequents.

theorem 2.5.15 [suitability of lattices] A sequent A ⊢ B has a proof
if and only if for every lattice and for every valuation v on that lattice,
v(A) ≤ v(B).
Proof: The proof takes two parts, “if” and “only if.” For “only if” we
need to ensure that if A ⊢ B has a proof, then for any valuation on any
lattice, v(A) ≤ v(B). For this, we proceed by induction on the
construction of the derivation of A ⊢ B. If the proof is simply the axiom
of identity, then A ⊢ B is p ⊢ p, and v(p) ≤ v(p). Now suppose that the
proof is more complicated, and that the hypothesis holds for the prior
steps in the proof. We inspect the rules one-by-one. Consider ∧L: from
A ⊢ R to A ∧ B ⊢ R. If we have a proof of A ⊢ R, we know that
v(A) ≤ v(R). We also know that v(A ∧ B) = v(A) ∧ v(B) ≤ v(A) (since
∧ is a lower bound), so v(A ∧ B) ≤ v(R) as desired. Similarly for ∨R:
from L ⊢ A to L ⊢ A ∨ B. If we have a proof of L ⊢ A we know that
v(L) ≤ v(A).

We also know that v(A) ≤ v(A ∨ B) (since ∨ is an upper bound), so
v(L) ≤ v(A ∨ B) as desired.

For ∧R, we may assume that L ⊢ A and that L ⊢ B have proofs,
so for every v, we have v(L) ≤ v(A) and v(L) ≤ v(B). Now, let w
be any valuation. Since w(L) is a lower bound of w(A) and w(B), it
must be less than or equal to the greatest lower bound w(A) ∧ w(B).
That is, we have w(L) ≤ w(A) ∧ w(B) = w(A ∧ B), which is what we
wanted. For ∨L we reason similarly. Assuming that A ⊢ R and B ⊢ R
have proofs, we have w(A) ≤ w(R) and w(B) ≤ w(R), and hence w(R)
is an upper bound for w(A) and w(B). So w(A ∨ B) = w(A) ∨ w(B) ≤
w(R), since w(A) ∨ w(B) is the least upper bound for w(A) and w(B).

We can even show that if the proof uses Cut it preserves validity
in lattices: if v(A) ≤ v(B) and v(B) ≤ v(C) then we have v(A) ≤ v(C),
since ≤ is a transitive relation.

Now for the “if” part. We need to show that if A ⊢ B has no proof,
then there is some lattice in which v(A) ≰ v(B). Just as in the proof of
Theorem 2.5.12, we will construct a model (in this case it is a lattice)
from the formulas and proofs themselves. (The core construction is
really, as a mathematician will tell you, a construction of the free lattice
on the set of atoms as generators.) The core idea is that we will assume
as little as possible about the relationships between objects in our
lattice. We will choose a value v(A) for A such that v(A) ≤ v(B)
if and only if A ⊢ B. We can do this if we let the objects in our lattice
be equivalence classes of formulas, like this:
   B ∈ [A] if and only if there are proofs A ⊢ B and B ⊢ A

So, [A] contains A ∧ A, A ∨ A, A ∧ (A ∨ B), and many other formulas
besides. It is the collection of all of the formulas exactly as strong
as A (and no stronger). These classes of formulas form the objects
in our lattice. To define the ordering, we set [A] ≤ [B] if and only
if A ⊢ B. This definition makes sense, because if we chose different
representatives for the collections [A] and [B] (say, A′ ∈ [A], so [A′] =
[A], and similarly, [B] = [B′]) we would have A′ ⊢ A and B ⊢ B′, and
so (cutting with A ⊢ B) A′ ⊢ B′ too. (The relation A ∼ B, defined by
setting A ∼ B iff A ⊢ B and B ⊢ A, is a congruence.)
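The quotient construction can be sketched in miniature. In this toy version (the stock of formulas, the spelling of the connectives, and the hand-closed derivability relation are all my own illustration, not the text's), equivalence classes are frozensets and the induced ordering is checked via representatives:

```python
# A toy Lindenbaum-style quotient: formulas as strings, derivability
# as a fixed relation (closed under the cuts we need), classes [A] as
# frozensets, and the ordering [A] <= [B] iff A |- B.
formulas = ["A", "A&A", "AvA", "B"]
derivable = {
    ("A", "A&A"), ("A&A", "A"), ("A", "AvA"), ("AvA", "A"),
    ("A&A", "AvA"), ("AvA", "A&A"),
    ("A", "B"), ("A&A", "B"), ("AvA", "B"),
}

def proves(x, y):
    """x |- y, with identity thrown in."""
    return x == y or (x, y) in derivable

def eq_class(a):
    """[a]: all formulas interderivable with a."""
    return frozenset(b for b in formulas if proves(a, b) and proves(b, a))

def class_leq(c, d):
    """[a] <= [b] iff a |- b; any representatives will do."""
    return proves(next(iter(c)), next(iter(d)))

print(eq_class("A") == eq_class("A&A"))          # True
print(class_leq(eq_class("A"), eq_class("B")))   # True
```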
It remains to be seen that this ordering has a lattice structure. We
want to show that [A] ∧ [B], defined as [A ∧ B], is the meet of [A] and
[B], and that [A] ∨ [B], defined as [A ∨ B], is the join of [A] and [B]. This
requires first showing that the definitions make sense. (Would we have
got a different result for the meet or for the join of two collections had
we chosen different representatives?) For this we need to show that if
A′ ∈ [A] and B′ ∈ [B] then A′ ∧ B′ ∈ [A ∧ B]. This is straightforward,
using the two proofs below:

    A′ ⊢ A          B′ ⊢ B              A ⊢ A′          B ⊢ B′
  ------------ ∧L  ------------ ∧L    ------------ ∧L  ------------ ∧L
  A′ ∧ B′ ⊢ A      A′ ∧ B′ ⊢ B        A ∧ B ⊢ A′       A ∧ B ⊢ B′
      ----------------------- ∧R          ----------------------- ∧R
         A′ ∧ B′ ⊢ A ∧ B                     A ∧ B ⊢ A′ ∧ B′

If A and A′ are equivalent, and B and B′ are also equivalent, then so
are their respective conjunctions. The proof is similar for disjunction,
so the operations on equivalence classes are well defined. It remains
to show that they are, respectively, meet and join. For disjunction, we
need to show that [A ∨ B] is the join of [A] and [B]. First, [A ∨ B] is an
upper bound of [A] and of [B], since A ⊢ A ∨ B and B ⊢ A ∨ B. It is the
least such upper bound, since if [A] ≤ [R] and [B] ≤ [R] for any formula
R, we have a proof of A ⊢ R and a proof of B ⊢ R, which ensures that
we have a proof of A ∨ B ⊢ R, and hence [A ∨ B] ≤ [R]. The proof that
∧ on the set of equivalence classes is a greatest lower bound has the
same form, and we consider it done. We have constructed a lattice out
of the set of formulas.

Now, if A ⊬ B, we can construct a refuting valuation v for this
sequent. Define v into our lattice by setting v(p) to be [p]. For any
complex formula C, v(C) = [C], by tracing through the construction
of C: v(A ∧ B) = v(A) ∧ v(B) = [A] ∧ [B] = [A ∧ B], and similarly
v(A ∨ B) = v(A) ∨ v(B) = [A] ∨ [B] = [A ∨ B]. So, if A ⊬ B, then
since [A] ≰ [B], we have v(A) ≰ v(B). We have a counterexample in
our lattice.
This is one example of the way that we can think of our logic in an
algebraic manner. We will see many more later. Before going on to
the next section, let us use a little of what we have seen in order to
reflect more deeply on cut and identity.

2.5.3 | consequences of cut and identity
Doing without Cut as well as [IdA] puts quite a strong constraint on the
rules for a connective. To ensure that identities are provable for
conjunctions, we needed to have a proof of p ∧ q ⊢ p ∧ q. When you think
about it, this means that the left and right rules for conjunction must
be connected. The rules for a conjunction on the left must be strong
enough to entail whatever is required to give you the conjunction on
the right. Using the concepts of algebras, if we think of the left and
right rules for conjunction as defining two different connectives, ∧l
and ∧r respectively, the provability of identity ensures that we have
x ∧l y ≤ x ∧r y.

The same holds for the elimination of Cut. To show that you can
eliminate a Cut in which a conjunction is the cut-formula, we need to
show that the rules for conjunction on the right are strong enough
to ensure that whatever it is that is entailed by a conjunction on the
right is “true enough” to be entailed by whatever entails the conjunction
on the left. Thinking algebraically again, the eliminability of
Cut ensures that whenever z ≤ x ∧r y and x ∧l y ≤ z′ we have z ≤ z′.
That is, it ensures that x ∧r y ≤ x ∧l y.

In other words, you can think of cut-elimination and the provability
of identity as ensuring that the left and right rules for a connective
are appropriately coordinated. You can prove an identity A ] B ⊢ A ] B
if the right rules for ] make it “more” true than the left rules, but it
will not do if ]-on-the-right is “less” true than ]-on-the-left. A Cut
step from L ⊢ A ] B and A ] B ⊢ R to L ⊢ R will work when ]-on-
the-right is “less” true than ]-on-the-left, but it will not work if the
mismatch is in the other direction.
2.5.4 | exercises
2.6 | proof identity
[When is proof π1 the same proof as π2 ? There are different things
one could mean by this.]
[normalisation analysis: Proof π1 is identical to proof π2 when π1
and π2 have the same normal forms. Examine this analysis of proof
identity in the context of intuitionistic logic.]
[generality analysis: Proof π1 differs from proof π2 when they have
different generality. This can be analysed in terms of flow graphs [13,
14, 15].]
[This has an application to category theory, and categories are a natural
home for models of proofs, if we think of them as structures with pro-
positions/statements as objects and proofs as arrows. (Provided that
we think of proofs as having a single premise and a single conclusion
of course. Generalisations of categories are required if we have a richer
view of what a proof might be.)]
3 | propositional logic: applications
As promised, we can move from technical details to applications of
these results. With rather a lot of proof theory under our collective
belts, we can turn our attention to philosophical issues. In this chapter,
we will look at questions such as these: How are we to understand
the distinctive necessity of logical deduction? What is the distinctively
logical ‘must’? In what way are logical rules to be thought of as defini-
tions? What can we say about the epistemology of logical truths? Can
there be genuine disagreement between rival logical theories, or are all
such discussions a dialogue in which the participants talk different
languages and talk past one another?
In later chapters we will examine other topics, such as generality,
predication, objectivity, modality, and truth. For those, we require a
little more logical sophistication than we have covered to this point.
What we have done so far suffices to equip us to tackle the topics at
hand.
The techniques and systems of logic can be used for many different
things — we can design electronic circuits using simple boolean lo-
gic [97]. We can control washing machines with fuzzy logic [45]. We
can use substructural logics to understand syntax [16, 54, 55] — but
beyond any of those interesting applications, we can use the techniques
of logic to construct arguments, to evaluate them, and to tell us some-
thing about how beliefs, conjectures, theories and statements fit to-
gether. It is this role of logic that is our topic.
3.1 | assertion and denial
One distinguishing feature of this application of logical techniques is
its bearing on our practices of assertion. We may express beliefs by
making assertions. When considering a conjecture, we may tentat-
ively make a hypothetical assertion — to “try it on for size” and to
see where it leads. When we use the techniques of logic to evaluate
a piece of mathematical reasoning, the components of that reasoning
are, at least in part, assertions. If we have an argument leading from
a premise A to a conclusion B, then this tells us something significant
about an assertion of A (it tells us, in part, where it leads). This also
tells us something significant about an assertion of B (it tells us, in part,
what leads to it). Can we say more than this? What is the connection
between proof (and logical consequence) and assertion? It would seem
that there is an intimate connection between proof and assertion, but
what is it?
Suppose that A entails B, that there is a proof of B from A. What
can we say about assertions of A and of B? If an agent accepts A, then
it is tempting to say that the agent also ought to accept B, because
B follows from A. But this is far too strong a requirement to take
seriously. Let’s consider why not:
(1) The requirement, as I have expressed it, has many counterexamples.
The requirement has the following form:
   If A entails B, and I accept A, then I ought to accept B.
Notice that I have a proof from A to A. (It is a very small proof: the
identity proof.) It would follow, if I took this requirement seriously,
that if I accept A, then I ought to accept A. But there are many things
– presumably – that I accept that I ought not accept. My beliefs extend
beyond my entitled beliefs. The mere fact that I believe A does not, in
and of itself, give me an entitlement, let alone an obligation, to believe
A. So, the requirement that you ought to accept the consequences of
your beliefs is altogether too strong.
This error in the requirement is corrected with a straightforward
scope distinction. Instead of saying that if A entails B and if you accept
A then you ought to accept B, we should perhaps weaken the condition
as follows:
   If A entails B, then it ought to be that if I accept A then I accept B.

We fix the mistaken scope by holding that what we accept should be
closed under entailment. But this, too, is altogether too strong, as the
following considerations show.
(2) There are many consequences of which we are unaware. It seems
that logical consequence on its own provides no obligation to believe.
Here is an example: I accept all of the axioms of Peano arithmetic (pa).
However, there are consequences of that theory that I do not accept.
I do not accept all of the consequences of those axioms. Goldbach’s
conjecture (gc) could well be a consequence of those axioms, but I am
not aware of this if it is the case, and I do not accept gc. If gc is a
consequence of pa, then there is a sense in which I have not lived up to
some kind of standard if I fail to accept it. My beliefs are not as com-
prehensive as they could be. If I believed gc, then in some important
sense I would not make any more mistakes than I have already made,
because gc is a consequence of my prior beliefs. However, it is by no
means clear that comprehensiveness of this kind is desirable.
(3) In fact, comprehensiveness is undesirable for limited agents like us.
The inference from A to A ∨ B is valid, and if our beliefs are always to
be closed under logical consequence, then for any belief we must have
infinitely many more. But consider a very long disjunction, in which
one of the disjuncts we already accept. In what sense is it desirable that
we accept this long disjunction? The belief may be too complex to even
consider, let alone to believe, accept, or assert.
130 propositional logic: applications · chapter 3
Notice that it is not a sufficient repair to demand that we merely


accept the immediate logical consequences of our beliefs. It may well
be true that logical consequence in general may be analysed in terms
of chains of immediate inferences we all accept when they are presen-
ted to us. The problems we have seen hold for immediate consequence.
The inference from the axioms of pa to Goldbach’s conjecture might
be decomposable into steps of immediate inferences. This would not
make Goldbach’s conjecture any more rationally obligatory, if we are
unaware of that proof. If the inference from A to A ∨ B is an imme-
diate inference, then logical closure licenses an infinite collection of
(irrelevant) beliefs. (This point is not new. Gilbert Harman, for
example, argues for it in Change in View [40].)
(4) Furthermore, logical consequence is sometimes impossible to check.
If I must accept the consequences of my beliefs, then I must accept
all tautologies. If logical consequence is as complex as consequence in
classical first-order logic, then the demand for closure under logical
consequence can easily be uncomputable. For very many sets of state-
ments, there is no algorithm to determine whether or not a given state-
ment is a logical consequence of that set. Closure under logical con-
sequence cannot be underwritten by algorithm, so demanding it goes
beyond what we could rightly expect for an agent whose capacities are
computationally bounded.
So, these arguments show that logical closure is too strict a standard
to demand, and failure to live up to it is no failure at all. Logical con-
sequence must have some other grip on agents like us. But what could
this grip be? Consider again the case of the valid argument from A to
B, and suppose, as we did before, that an agent accepts A. What can we
say about the agent’s attitude to B? One thing that we could clearly
say is that the agent is, in some sense, committed to B. There is good
sense in saying that we are (at least implicitly) committed to the lo-
gical consequences of those things we accept. Deduction is one way of
drawing out the implicit commitments that we have, and making them
explicit. We could hold to the following two norms:

If A entails B, and I accept A, then I am committed to B.

In fact, we could accept the closure of commitment under logical consequence
   If A entails B, and I am committed to A, then I am committed to B.
by holding that those claims to which I am committed are all and only
the consequences of those things I accept. There are good reasons to
think of commitment in this way. (I will slightly revise this notion
later, but it will do for now.) However, it remains that for the reasons
we have already seen, we need not think of what we accept as
conforming to the same kinds of conditions as those things to which
we are committed.
§3.1 · assertion and denial 131

Do any norms govern what we accept? One plausible constraint
is that if some set of propositions is inconsistent, then we should not
accept all of them. If X ⊢, then accepting every member of X is some
kind of mistake. (It's not just some kind of mistake, it's a bad mistake.)
This is one plausible constraint on what we accept. Similarly, if we
were to assert every member of an inconsistent set X, then that act of
assertion is, in some sense to be made out, defective. But what about
collections of propositions other than inconsistent ones? If we have an
argument from A to B, does this argument constrain what we accept
in a similar way to the inconsistent collection?
To look at this closely, suppose for a moment that we have an argu-
ment from A to B. If A ⊢ B, then if we have negation in the vocabulary,
we can conclude A, ¬B ⊢. This tells us that accepting both A and ¬B
is a mistake. This points to a general account for valid argument. If
an argument is valid then it is a mistake to accept the premise and to
reject the conclusion.
If an agent’s cognitive state, in part, is measured in terms of those
things she accepts and those she rejects, then valid arguments constrain
those combinations of acceptance and rejection. As we have seen, a
one-premise, one-conclusion argument from A to B constrains accept-
ance/rejection by ruling out accepting A and rejecting B. This explan-
ation of the grip of valid argument has the advantage of symmetry. A
valid argument from A to B does not, except by force of habit, have to
be read as establishing the conclusion. If the conclusion is unbelievable,
then it could just as well be read as undermining the premise. Reading
the argument as constraining a pattern of accepting and rejecting gives
this symmetry its rightful place.
Now consider the connection between negation and rejection. I
have introduced rejection by means of negation, but now it is time to
take away the ladder. Or to change the metaphor, let's turn the picture
upside down. We have picked out rejection by using negation, but we
do not need to define it in that way. Instead, we will in fact be using
rejection to give an account of negation. If there are reasoning and
representing agents who do not have the concept of negation, and if it
is still appropriate for us to analyse their reasoning using a notion of
logical consequence, then we ought to take those agents as possessing
the ability to deny without having the ability to negate. This seems
plausible. As an agent accepts and rejects, she filters out information
and rules out possibilities. For any possible judgement the agent might
consider, there are three possible responses: she might accept it, she
might reject it, or she may be undecided. (This has nothing to do with
a ‘three valued logic’, as we do not take the attitudes of some agent to
a statement to be anything like ‘semantic values’, let alone ‘truth
values’, of that statement.) To assert that A is the case is to express
your accepting that A is the case. If you prefer, to accept that A is the
case is to internalise the assertion that A. To reject A is not merely to
fail to accept A, for we may be undecided. To deny A is not merely to
fail to assert A, for we may be silent. To accept A is to (in
part) close off the possibility of rejecting A. To accept A and then to go
on to reject A will result in a revision of your commitments, and not a
mere addition to them. Similarly, to reject A is to (in part) close off the
possibility of accepting A. To reject A and then to go on to accept A will
result in a revision of your commitments, and not a mere addition to
them. To assert A is to place the denial of A out of bounds—until you
revise your commitments by withdrawing the assertion. Similarly, to


deny A is to place the assertion of A out of bounds—again, until the
denial of A is withdrawn. An agent’s ability to do this does not require
that she has a concept of negation, giving for each statement another
statement—its negation—which can be asserted or denied, used as a
premise in reasoning, as a component of another proposition, and so
on.
Before going on to utilise the notions of assertion and denial, I should
defend the claim that we ought to treat assertion and denial on a par,
and that it is not necessary to treat the denial of a claim as the assertion
of the negation of that claim. This matters because in the sections that
follow, I will use denial in giving an account of negation. To require
of an agent a grasp of negation in order to have the capacity to deny
would threaten this account.
I will argue that the speech-act of denial is best not analysed in terms
of assertion and negation but rather, that denial is, in some sense, prior
to negation. I will provide three different considerations in favour of
this position. The first involves the case of an agent with a limited
logical vocabulary. The second argument, closely related to the first,
involves the case of the proponent of a non-classical logic. The third
will rely on general principles about the way logical consequence ra-
tionally constrains assertion and denial.

consideration one: Parents of small children are aware that the abil-
ity to refuse, deny and reject arrives very early in life. Considering
whether or not something is the case – whether to accept that some-
thing is the case or to reject it – at least appears to be an ability children
acquire quite readily. At face value, it seems that the ability to assert
and to deny, to say yes or no to simple questions, arrives earlier than
any ability the child has to form sentences featuring negation as an op-
erator. It is one thing to consider whether or not A is the case, and it is
another to take the negation ¬A as a further item for consideration and
reflection, to be combined with others, or to be supposed, questioned,
addressed or refuted in its own right. The case of early development
lends credence to the claim that the ability to deny can occur prior to
the ability to form negations. If this is the case, the denial of A, in the
mouth of a child, is perhaps best not analysed as the assertion of ¬A.
So, we might say that denial may be acquisitionally prior to neg-
ation. One can acquire the ability to deny before the ability to form
negations.
consideration two: Consider a related case. Sometimes we are confronted
with theories which propose non-standard accounts of nega-
tion, and sometimes we are confronted with people who endorse such
theories. These will give us cases of people who appear to reject A
without accepting ¬A, or who appear to accept ¬A without rejecting A.
If things are as they appear in these cases, then we have further reason
to reject the analysis of rejection as the acceptance of a negation. I will
consider just two cases.
supervaluationism: The supervaluationist account of truth-value gaps
enjoins us to allow for claims which are not determinately true, and
not determinately false [30, 52, 95]. These claims are those which are
true on some valuations and false on others. In the case of the super-
valuational account of vagueness, borderline cases of vague terms are
a good example. If Fred is a borderline case of baldness, then on some
valuations “Fred is bald” is true, and on others, “Fred is bald” is false.
So, “Fred is bald” is not true under the supervaluation, and it is to be
rejected. However, “Fred is not bald” is similarly true on some valu-
ations and false on others. So, “Fred is not bald” is not true under the
supervaluation, and it, too, is to be rejected. Truth value gaps provide
examples where denial and the assertion of a negation come apart. The
supervaluationist rejects A without accepting ¬A. When questioned,
she will deny A, and she will also deny ¬A. She will not accept ¬A.
The supervaluationist seems to be a counterexample to the analysis of
denial as the assertion of a negation.
dialetheism: The dialetheist provides the dual case [60, 61, 66, 67,
69]. A dialetheist allows for truth-value gluts instead of truth-value
gaps. Dialetheists, on occasion, take it to be appropriate to assert both
A and ¬A. A popular example is provided by the semantic paradoxes.
Graham Priest’s analysis of the liar paradox, for example, enjoins us
to accept both the liar sentence and its negation, and to reject neither.
In this case, it seems, the dialetheist accepts a negation ¬A without re-
jecting A, the proposition negated. When questioned, he will assert A,
and he will also assert ¬A. He will not reject ¬A. The dialetheist, too,
seems to be a counterexample to the analysis of denial as the assertion
of a negation.
In each case, we seem to have reason to take denial to be something
other than the assertion of a negation, at least in the mouths of the su-
pervaluationist and the dialetheist. These considerations are not con-
clusive: the proponent of the analysis of rejection in terms of nega-
tion may well say that the supervaluationist and the dialetheist are
confused about negation, and that their denials really do have the con-
tent of a negation (by their own lights), despite their protestations to
the contrary. Although this is a possible response, there is no doubt
that it does violence to the positions of both the supervaluationist and
the dialetheist. We would do better to see if there is an understand-
ing of the connections between assertion, denial, acceptance, rejection
and negation which allows us to take these positions at something ap-
proaching face value. This example shows that denial may be concep-
tually separated from the assertion of a negation.

consideration three: The third consideration favouring denial over negation is the fertility of the analysis. Once our agent is able to assert and to deny, the full force of logical consequence will have its grip

on the behaviour of the agent. Asserting A has its consequences for
denials (do not deny A, and do not deny any other consequence of A).
Denying B has its consequences for assertions (do not assert B, and do
not assert any other premise leading to B). This analysis generalises to
arguments with multiple premises in the way that you would expect.
More interestingly, it also generalises to arguments with multiple
conclusions. (There is another consideration in favour of taking denial
as prior to negation. See page 141 for the details.)
So, let us pass on to the elaboration of the story. We can think
of assertions and denials en masse, as the normative force of logical
consequence will be explained in terms of the proprieties of different
combinations of assertions and denials. To fix terminology, we will
call collections of assertions and denials positions. A position [X : Y]
is a pair of (finite) sets, X of things asserted and Y of things denied.
(The ‘game’ terminology is intentional: a position is a part of a
‘scorecard’ in Brandom’s normative scorekeeping [10]. A scorecard
keeps track of commitments and entitlements. Here, a position keeps
track merely of explicit commitments made by way of assertion and
denial. You find yourself in a position on the basis of moves you have
made. More on commitment later.)
Positions are the kinds of things that it is appropriate to evaluate. It
is rare that a single assertion or denial deserves a tick of approval or
a black mark of opprobrium on the basis of logic alone. It is much
more often that collections of assertions and denials — our positions
— stand before the tribunal of logical evaluation. The most elementary
judgement we may make concerning a position is the following:

identity: [A : A] is incoherent.
A position consisting of the solitary assertion of A (whatever claim A
might be) together with its denial, is incoherent.
To grasp the import of calling a position incoherent, it is vital to neither
understate it, nor to overstate it. First, we should not overstate the
claim by taking incoherent positions to be impossible. While it might
be very difficult for someone to sincerely assert and deny the same
statement in the same breath, it is by no means impossible. For ex-
ample, if we wish to refute a claim, we may proceed by means of a
reductio ad absurdum by asserting (under an assumption) that claim,
deriving others from it, and perhaps leading on to the denial of some-
thing we have already asserted. Once we find ourselves in this posi-
tion (including the assertion and the denial of the one and the same
claim) we withdraw the supposition. We may have good reasons to put
ourselves in incoherent positions, in order to manage the assertions
and denials we wish to make. To call a position incoherent is not to say
that the combination of assertions and denials cannot be made.
Conversely, it is important to not understate the claim of incoher-
ence. To call the position [X : Y] incoherent is not merely to say that it
is irrational to assert X and deny Y , or that it is some kind of bad idea.
It is much more than that. Consider the case of the position [A : A].
This position is seriously self-defeating in that to take you to assert
A is to take you to rule out denials of A (pending a retraction of that
assertion), and to take you to deny A is to take you to rule out assertions of
A (pending a retraction of that denial). The incoherence in the position
is due to the connection between assertion and denial, in that to make
the one is to preclude the other. The incoherence is not simply due to
any external feature of the content of that assertion. As a matter of

fact, it seems that whenever we take someone to have denied A (and
to have in the past asserted A) we take this to be a retraction of the
assertion.
Another claim can be made about the properties of coherence. The inco-
herence of a position cannot be fixed by merely adding more assertions
or denials.

weakening: If [X, A : Y] or [X : A, Y] is coherent, then [X : Y] is too.


This tells us that incoherence is a property preserved when assertions
and denials are added to a position, and that coherence is a property
preserved when assertions and denials are removed from a position.
Contraposing the statement of weakening, we see that if the position
[X : Y] is incoherent, then merely adding the assertion of A, or adding
the denial of A does not fix things. To take a position to be coherent is
to endorse weaker positions (with fewer commitments) as coherent as
well. To take a position to be incoherent is to judge stronger positions
(with more commitments) as similarly incoherent. It follows that I
can judge a position incoherent if it contains the assertion of A and
the denial of A, without having to check all of the other assertions and
denials in that position.
Our final requirement on the relation of coherence is the converse of
the weakening condition. So, we call it

strengthening: If [X : Y] is coherent, then either [X, A : Y] or [X : A, Y] is coherent too.
This condition can be viewed as follows: if we have a coherent position
[X : Y], then if we cannot add A as an assertion (if [X, A : Y] is incoher-
ent) then the claim A is implicitly denied in that position. An implicit
denial is as good as an explicit denial, so since the original position was
coherent, the new position in which A is explicitly denied is coherent
as well. Taking the other horn of the disjunction, if we cannot add A
as a denial (if [X : A, Y] is incoherent) then the claim A is implicitly
asserted in that position. An implicit assertion is as good as an explicit
assertion, so since the original position was coherent, the new position
in which A is explicitly asserted is coherent as well.
These three constraints on coherence, assertion and denial are enough
to motivate a definition:

definition 3.1.1 [coherence] Given a language, we will call a relation on the set of positions in that language a coherence relation if and only
if it satisfies the conditions of identity, weakening and strengthening.
Given a coherence relation, we will indicate the incoherence of a posi-
tion [X : Y] as follows: ‘X ` Y ’.

The turnstile notation X ` Y makes sense, as a coherence relation is really a consequence relation. The identity condition is A ` A, the
identity sequent. The strengthening condition tells us that if X `

A, Y and X, A ` Y then X ` Y . This is the cut rule. The weakening condition is, well, the weakening structural rule.
The rules are not exactly the kinds of rules we have seen before,
since the collections on either side of the turnstile here are not multis-
ets of statements, but sets of statements. This means that the contrac-
tion rule is implicit and does not need to be added as a separate rule: if
X, A, A ` Y then X, A ` Y , since the set X, A, A is the same set as X, A.
The premise and conclusion of a contraction rule are exactly the same
sequents.
Viewed one way, the cut rule is a transitivity condition: If A ` B
and B ` C then A ` C. The form that we have here does not look so
much like transitivity, but a strange cousin of the law of the excluded
middle: it says that it is either ok to assert A or to deny A. But it is the
same old cut rule that we have already seen. If A ` B and B ` C, then it
follows that A ` B, C and A, B ` C by weakening. The strengthening
rule tells us that A ` C. In other words, since A ` B, C (we cannot
deny B in the context [A : C]) and A, B ` C (we cannot assert B in
the context [A : C]), there is a problem with the context [A : C]. It is
incoherent: we have A ` C.
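The three conditions on coherence can be checked mechanically in the simplest case: the relation on which a position is incoherent just when the very same statement is both asserted and denied. A sketch in Python, over an arbitrary three-statement vocabulary:

```python
from itertools import chain, combinations

# The smallest coherence relation on a vocabulary of bare statements:
# a position [X : Y] is incoherent exactly when X and Y share a statement.
def incoherent(X, Y):
    return bool(set(X) & set(Y))

atoms = ["a", "b", "c"]  # an arbitrary three-statement vocabulary

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# identity: [A : A] is incoherent for every statement A
assert all(incoherent([A], [A]) for A in atoms)

for X in subsets(atoms):
    for Y in subsets(atoms):
        for A in atoms:
            if incoherent(X, Y):
                # weakening (contraposed): adding an assertion or a denial
                # to an incoherent position leaves it incoherent
                assert incoherent(set(X) | {A}, Y)
                assert incoherent(X, set(Y) | {A})
            else:
                # strengthening: a coherent position stays coherent when A
                # is added on one side or the other
                assert (not incoherent(set(X) | {A}, Y)
                        or not incoherent(X, set(Y) | {A}))

print("identity, weakening and strengthening all hold")
```

The strengthening check works because if A could not be coherently added on either side of a coherent [X : Y], A would already have to appear on both sides, contradicting coherence.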

Let us consider, for a moment, the nature of the languages related by a coherence relation, and the kinds of conditions we might find in a
coherence relation. In most applications we are concerned with, the co-
herence relation on our target vocabulary will be rich and interesting,
as assertions and denials are made in a vocabulary with robust connec-
tions of meaning. It seems to me that all of the kinds of relationships
expressible in the vocabulary of coherence make sense. We have A ` B
if asserting A and denying B is out of bounds. An example might be
‘this is a square’ and ‘this is a parallelogram’. The only way to be a
square is by being a parallelogram, and it is a mistake to deny that
something is a parallelogram while claiming it to be a square. We have
A, B ` if asserting both A and B is out of bounds. An example might be
‘it is alive’ and ‘it is dead’. To claim of something that it is both alive
and that it is dead is to not grasp what these terms mean. Examples
with multiple premises are not difficult to find. We have A, B, C ` D
when the assertion of both A, B and C together with the denial of D
is out of bounds: consider ‘it is a chess piece’, ‘it moves forward’, ‘it
captures diagonally’, and ‘it is a pawn.’ Examples with multiple con-
clusions are more interesting. We have A ` B, C if asserting A and
denying both B and C is out of bounds. Perhaps the trio ‘x is a parent’,
‘x is a mother’ and ‘x is a father’ is an example, or for a mathematical
case, we have ‘x is a number’, ‘x is zero’ and ‘x is a successor’.
The limiting cases of coherence are the situations in which we have
an incoherent position with just one assertion or denial. If we have A `
then the assertion of A by itself is incoherent. If we have ` B, then the
denial of B is incoherent. It is plausible that there are such claims, but
it is less clear that there are primitive statements in a vocabulary that
deserve to be thought of as inconsistent or undeniable on their own.
Notice, too, that it makes sense to treat any kind of vocabulary

in which assertions and denials are made as one in which assertions
and denials may together be coherent or incoherent. We may judge
discourse on any subject matter for coherence, even when we are not
clear on what the subject is about, or what it would be for the claims
in that vocabulary to be justified or warranted. Even though we may
be completely unsettled on the matter of what it might be for moral
claims to be true or justified, it can make sense for us to evaluate moral
discourse with respect to coherence. The claim that it is wrong to ac-
cuse someone of a crime without evidence (for example) may be taken
to entail that it is sometimes wrong to accuse someone of a crime. It
makes sense to take claims of wrongness and rightness to be coherent
or incoherent with other claims, even if we are unsure as to the mat-
ter of what makes moral claims true (or even if we are skeptical that
they are the kinds of claims that are ‘made true’ by anything at all). It
suffices that they may be asserted and denied.
However, as far as logic is concerned, what we will have to say will
apply as well to a circumstance in which we have the smallest relation
satisfying these conditions: in which X ` Y only when X and Y share
a proposition. In this case, the statements in the vocabulary can be
thought of as entirely logically independent. The only way a position
in this vocabulary will count as incoherent is if it has managed to
assert one thing and to deny it as well — not by means of asserting or
denying something else, but by explicitly asserting A and denying A
at the very same time.

3.2 | definition and harmony


Now suppose that we have a vocabulary and the relation of coherence
upon it. Suppose that someone wants to understand the behaviour
of negation. We could incorporate the rules for negation. First, we
treat the syntax by closing the vocabulary under negation. In other
words, we add the condition that if A is a statement, then ¬A is a
statement too. This means that not only do we have the original state-
ments, and the new negations in our vocabulary, but we also have neg-
ations of negations, and so on. If the language is formed by means
of other grammatical rules, then these may also compound statements
involving negation. (So if we had conjunction, we could now form
conjunctions of negations, as well as negations of conjunctions).
To interpret the significance of claims involving negation, we must
go beyond the syntax, to semantics. In this case, we will tell you how
claims involving negation are to be used. In this case, this will amount
to evlauating positions in which negations are asserted, and positions
in which negations are denied. We may consider our traditional se-
quent rules for negation:

X ` A, Y X, A ` Y
¬L ¬R
X, ¬A ` Y X ` ¬A, Y

With our interpretation of sequents in mind, we can see that these rules tell us something important about the coherence of claims involving negation. The ¬L rule tells us that [X, ¬A : Y] is coherent only
if [X : A, Y] is coherent. The ¬R rule tells us that [X : ¬A, Y] is coher-
ent only if [X, A : Y] is coherent. In fact, we may show that if these
two conditions are the only rules governing the behaviour of the operator ‘¬’, then we have, in some sense, defined the behaviour of ‘¬’
clearly and distinctly.
The first fact to notice is that if we extend a coherence relation on
the old language with these two rules, the relation on the new language
still satisfies the rules of identity, weakening and strengthening. We
have identity, for the new formulas: ¬A ` ¬A, since
A`A
¬R
` ¬A, A
¬L
¬A ` ¬A
This shows us that if we have identity for A, we have identity for ¬A
too. This gives us identity for each formula consisting of one negation.
We can do the same for two, three, or more negations by an inductive
argument. For weakening, we may argue that if X ` Y , then since
X, A ` Y (by weakening for A) then X ` ¬A, Y . Similarly, X ` A, Y
gives us X, ¬A ` Y , and we have weakening for statements with one
negation operator. An inductive argument gives us more statements,
as usual. (To show that the other connectives satisfy identity under
the new regime requires a separate argument. Hopefully the other
connectives in the old language survive this extension. No general
argument can be made for this claim here.)
For strengthening, we use the usual argument for the elimination
of cut. If the old language does not contain any other connectives, the
argument is straightforward: if we have X ` ¬A, Y and X, ¬A ` Y , we
reason in the usual manner: we take a derivation δ of X ` ¬A, Y and a derivation δ′ of X, ¬A ` Y to consist of the steps to show each sequent
from incoherent sequents in the original vocabulary, using ¬L and ¬R.
These rules satisfy the conditions for a successful cut elimination argu-
ment, and we may derive X ` Y in the usual manner in the way that
we have seen.
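For a connective-free base vocabulary, the effect of the two negation rules can be simulated directly: each ‘¬’ on a formula flips the side on which its atom finally lands, so a position is incoherent just when some atom ends up both asserted and denied. A rough sketch (the string encoding with ‘~’ for ¬ is my own):

```python
# Extending the smallest coherence relation with negation, for a base
# language of bare atoms. Formulas are strings: an atom such as "a",
# prefixed by zero or more "~"s. The rules say that asserting ~A comes
# to the same thing as denying A (and vice versa), so each "~" flips
# the side on which the underlying atom lands.

def normalise(X, Y):
    asserted, denied = set(), set()
    for side, formulas in ((0, X), (1, Y)):
        for f in formulas:
            flips = len(f) - len(f.lstrip("~"))
            atom = f.lstrip("~")
            ((asserted, denied)[(side + flips) % 2]).add(atom)
    return asserted, denied

def incoherent(X, Y):
    asserted, denied = normalise(X, Y)
    return bool(asserted & denied)

print(incoherent(["~a"], ["~a"]))  # True: identity holds for ~a
print(incoherent(["~~a"], ["a"]))  # True: ~~a and a never come apart
print(incoherent(["a"], ["b"]))    # False: the old vocabulary is untouched
```

The last line illustrates conservativity: positions in the old vocabulary are exactly as coherent as they were before negation was added.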
If the old language contains other connectives, then we may need to
be a little more subtle, and as a matter of fact, in some cases (where the
connectives in the old vocabulary behave in an idiosyncratic manner)
the cut rule will not be eliminable. (For a discussion of this
phenomenon, see Section 3.3.) In the case of a primitive vocabulary
(or in the case of a vocabulary in which the other connectives are
well-behaved, in a sense to be elaborated later), the addition of the
operator ¬ is completely straightforward: the new coherence relation
satisfies the constraints of identity, weakening and strengthening.
In addition, the new connective is uniquely defined, in the sense
that if we add two negation connectives ¬1 and ¬2 satisfying these
rules, then we can reason as follows:
A`A A`A
¬2 R ¬1 R
` ¬2 A, A ` ¬1 A, A
¬1 L ¬2 L
¬1 A ` ¬2 A ¬2 A ` ¬1 A

and it is never coherent to assert ¬1 A and deny ¬2 A or vice versa.
There is no way that they can come apart when it comes to the status
of assertion and denial. (Of course, one could assert ¬1 A without at
the very same time asserting ¬2 A, but the assertion of ¬2 A would have
been just as good as far as coherence or incoherence is concerned.)
The result is still a coherence relation satisfying the requirements.
This means that we have a definition. We have not merely stipulated
some rules that the new connective is to satisfy, but we have fixed its
behaviour as far as coherence is concerned. If there are two connectives
satisfying these rules, they are indistinguishable as far as the norms
of coherence of assertion and denial are concerned. It is impossible
for them to differ, in that asserting one and denying the other is never
permissible.
Is this definition available to us? It seems that again, the answer
is ‘yes,’ or that it is ‘yes’ if the original vocabulary is well enough
behaved. If we can show that the cut rule is admissible in the new
coherence relation (without assuming it), then it follows that the only
incoherence facts in the old vocabulary are those that are given by the
original coherence relation. The new vocabulary brings along with
itself new facts concerning coherence, but these merely constrain the
new vocabulary. They do not influence the old vocabulary. It follows
that if you held that some position [X : Y] (expressed in the original
vocabulary) was coherent, then this is still available to you once you
add the definition of ‘¬’. In other words, the presence of negation
does not force any new coherence facts on the old vocabulary. The
extension of the language to include negation is conservative over the
old vocabulary [6], so it may be thought of as not only a definition of
this concept of negation (as it fixes its behaviour, at least concerning
coherence) but it also may be thought of as a pure definition, rather
than a creative one.
A collection of rules for a connective that allows for a unique and
pure, conservative definition is said to be in harmony [64, 65, 87].
(There is a huge literature on harmony: there are defences of
weaker-than-classical logics on grounds of harmony [25, 88, 89],
defences of classical logic on similar grounds [53, 72, 96], and
discussions of conditions of harmony for the treatment of identity
and modal operators [22, 73].) What we have done for negation, we
may do for the other connectives of the language of classical logic.
The rules for ∧, ∨ and → also uniquely define the connectives, and are
conservative: they are harmonious in just the same way as the rules
for negation. We may think of these rules as purely and uniquely
defining the connectives over a base
these rules as purely and uniquely defining the connectives over a base
vocabulary.
Once we have these connectives in our vocabulary, we may use
them in characterising coherence. As you can see, the rules for nega-
tion convert a denial of A to an assertion of ¬A, and an assertion of A
to a denial of ¬A. So, a sequent X ` Y may be converted to an equival-
ent sequent in which all of the material is on one side of the turnstile.
We have X ` Y if and only if X, ¬Y ` (where ¬Y is the set of all of the
statements in Y , with a negation prefixed). In other words, if asserting
each member of X and denying each member of Y is out of bounds,
then so is asserting each member of X and asserting each member of
¬Y . But there is no need to privilege assertion. X ` Y is also equivalent
to ` ¬X, Y , according to which it is out of bounds to deny each member
of ¬X and to deny each member of Y .
But with the other connectives, we can go even further than this.

With conjunction, asserting each member of X is equivalent to asserting
⋀X, the conjunction of each member of X. Denying each member
of Y is equivalent to denying ⋁Y. So, we have X ` Y if and only if
⋀X ` ⋁Y, if and only if ⋀X ∧ ¬⋁Y `, if and only if ` ¬⋀X ∨ ⋁Y.
(Choose your own favourite way of finding a conjunction of each
member of a finite set. For an n-membered set there are at least n!
ways of doing this.)
For each position [X : Y] we have a complex statement ⋀X ∧ ¬⋁Y
which is coherent to assert if and only if the position is coherent. We
also have a complex statement ¬⋀X ∨ ⋁Y that is coherent to deny if
and only if the position is coherent.
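Reading coherence classically over a small atomic vocabulary, the equivalence between a position [X : Y] and its single-formula surrogate can be checked by brute force. A sketch (the tuple encoding of formulas is invented for the example):

```python
from itertools import chain, combinations, product

# Formulas as nested tuples: an atom "a", ("not", f), ("and", f, g),
# ("or", f, g). Coherence is read classically over three atoms: [X : Y]
# is coherent when some valuation makes all of X true and all of Y false.

atoms = ["a", "b", "c"]
valuations = [dict(zip(atoms, vals))
              for vals in product([True, False], repeat=len(atoms))]

def value(f, v):
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    if f[0] == "and":
        return value(f[1], v) and value(f[2], v)
    return value(f[1], v) or value(f[2], v)

def coherent(X, Y):
    return any(all(value(f, v) for f in X) and
               not any(value(f, v) for f in Y)
               for v in valuations)

def big(op, fs, unit):
    out = unit
    for f in fs:
        out = (op, out, f)
    return out

TOP = ("or", "a", ("not", "a"))   # a harmless true conjunct for empty X
BOT = ("and", "a", ("not", "a"))  # a harmless false disjunct for empty Y

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

for X in map(list, subsets(atoms)):
    for Y in map(list, subsets(atoms)):
        # the single formula /\X ∧ ¬\/Y, assertable iff [X : Y] is coherent
        encoded = ("and", big("and", X, TOP), ("not", big("or", Y, BOT)))
        assert coherent(X, Y) == coherent([encoded], [])

print("asserting the single formula matches the coherence of [X : Y]")
```

Every position over the three atoms agrees with its single-formula translation, just as the text claims.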
Now we can see another reason why it is difficult, but nonetheless
important, to distinguish denial and negation. Given a genuine
negation connective ¬, the denial and the assertion of a negation are
interchangeable: we can replace the denial of A with the assertion of
¬A at no change to coherence. (This is the fourth consideration in
favour of taking denial as prior to negation, mentioned on page 135.)
The case is completely parallel with conjunction. In the presence of
the conjunction connective ‘∧’, we may replace
the assertion of both A and B with the assertion of A ∧ B. However,
there are reasons why we may want the structural combination A, B of
separate statements alongside the connective combination A ∧ B. How-
ever, we have good reason to prefer the understanding of an argument
as having more than one premise, rather than having to make do with a
conjunction of statements as a premise. We have good reason to prefer
to understand the behaviour of conjunction in terms of the validity of
the rule A, B ` A ∧ B, rather than thinking of this rule as a disguised
or obfuscated version of the identity sequent A ∧ B ` A ∧ B. It is genu-
inely informative to consider the behaviour of conjunction as related
to the behaviour of taking assertions together.
If you like the jargon, you can think of conjunction as a way to
‘make explicit’ what is implicit in making more than one assertion.
There is a benefit in allowing an explicit conjunction, in that we can
then reject a conjunction without committing ourselves to rejecting
either conjunct of that conjunction. Without the conjunction connect-
ive or something like it, we cannot do express a rejection of a pair of
assertions without either rejecting one or both.
The situation with negation is parallel: we have just the same sorts
of reasons to prefer to understand the behaviour of negation in terms
of the validity of A, ¬A ` and ` A, ¬A rather than thinking of these
rules as obfuscated versions of the rules of identity and cut. Just as
conjunction makes explicit the combination of assertions (and
disjunction makes explicit the combination of denials), negation makes
explicit the relationship between assertions and denials. Negation
allows denials to be taken up into assertoric contexts, and assertions to


be taken up in the contexts of denials.

3.3 | tonk and non-conservative extension


I have hinted at a way that this nice story can fail: the extension of
a coherence relation with a definition of a connective can fail to be
conservative. I will start with a discussion of a drastic case, and once we
have dealt with that case, we will consider cases that are less dramatic.
The drastic case involves Arthur Prior’s connective ‘tonk’ [70]. In the

context of a sequent system with sets of formulas on the left and the
right of the turnstile, we can think of tonk as ‘defined’ by the following
rules. (Prior’s original rules were the inferences from A to A tonk B,
and from A tonk B to B.)
and from A tonk B to B. X, B ` Y X ` A, Y
tonkL tonkR
X, A tonk B ` Y X ` A tonk B, Y
This pair of rules has none of the virtues of the rules for the pro-
positional connectives ∧, ∨, ¬ and →. If we add them to a coher-
ence relation on a basic vocabulary, they are not strong enough to
enable us to derive A tonk B ` A tonk B. The only way to derive
A tonk B ` A tonk B using the rules for tonk is to proceed from either
A tonk B ` A (which is derivable in turn only from B ` A, in general)
or from B ` A tonk B (which is derivable in turn only from B ` A, in
general). So, if every sequent A tonk B ` A tonk B is derivable, it is
only because every sequent B ` A is derivable. As a result, the exten-
ded relation is not a coherence relation, unless the original coherence
relation is trivial in the sense of taking every position to be incoherent!
The problem manifests with the strengthening (cut) condition as
well. In the new vocabulary we have sequents A ` A tonk B and
A tonk B ` B, by weakening we have A ` A tonk B, B and A, A tonk
B ` B, so if strengthening is available, we would have A ` B. So the
only way that strengthening is available for our new coherence relation
is if absolutely every sequent in the old vocabulary is incoherent.
This tells us that the only way that we can inherit a coherence re-
lation satisfying identity and strengthening, upon the addition of the
tonk rule is if it is trivial at the start. We should consider what happens
if we try to impose identity and weakening by fiat. Suppose that we
just mandate that the coherence relation satisfies identity and strength-
ening, in the presence of tonk. What happens?
This happens:
A`A B`B
tonkL tonkR
A ` A tonk B A tonk B ` B
Cut
A`B
we have triviality. This time it is a consequence, rather than a precondition, of the presence of tonk.
Given that we use coherence relations to draw distinctions between
positions, we are not going to wish to use tonk in our vocabulary.
Thinking of the rules as constraining assertion and denial, it is clear
why. The tonkL rule tells us that it is incoherent to assert A tonk B
whenever it is incoherent to assert B. The tonkR rule tells us that it is incoherent to deny A tonk B whenever it is incoherent to deny A.
Now, take a position in which it is incoherent to assert B and in which
it is incoherent to deny A. The position [A : B] will do nicely. What can
we do with A tonk B? It is incoherent to assert it and to deny it. The
only way this could happen is if the original position is incoherent. In
other words, [A : B] is incoherent. That is, A ` B. We have shown our
original result in a slightly different manner: tonk is a defective con-
nective, and the only way to use it is to collapse our coherence relation

into triviality. (This is not to deny that Roy Cook’s results about logics in which tonk rules define a connective are interesting [18]. How-
ever, they have little to do with consequence relations as defined here,
as they rely on a definition of logical consequence that is essentially
not transitive—that is, they do not satisfy strengthening.)
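The collapse can be watched happening. The sketch below closes a tiny space of sequents under identity, weakening, the two tonk rules and cut (strengthening), and the sequent a ` b duly becomes derivable:

```python
from itertools import chain, combinations

# Watching tonk collapse a coherence relation: close a small space of
# sequents under identity, weakening, the tonk rules and cut, and the
# sequent  a |- b  turns up derivable.

formulas = ["a", "b", "a tonk b"]

def subsets(s):
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

sides = subsets(formulas)
derivable = {(frozenset([f]), frozenset([f])) for f in formulas}  # identity

changed = True
while changed:
    changed = False
    for L in sides:
        for R in sides:
            if (L, R) in derivable:
                continue
            ok = any((L - {f}, R) in derivable for f in L)        # weakening L
            ok = ok or any((L, R - {f}) in derivable for f in R)  # weakening R
            if "a tonk b" in L:   # tonkL: from  X, B |- Y
                ok = ok or ((L - {"a tonk b"}) | {"b"}, R) in derivable
            if "a tonk b" in R:   # tonkR: from  X |- A, Y
                ok = ok or (L, (R - {"a tonk b"}) | {"a"}) in derivable
            # cut (strengthening): from X |- A, Y and X, A |- Y infer X |- Y
            ok = ok or any((L, R | {A}) in derivable and
                           (L | {A}, R) in derivable for A in formulas)
            if ok:
                derivable.add((L, R))
                changed = True

print((frozenset(["a"]), frozenset(["b"])) in derivable)  # True: triviality
```

The closure reproduces the derivation in the text: a ` a tonk b by tonkR, a tonk b ` b by tonkL, and then cut on the tonk formula yields a ` b.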
Now consider a much more interesting case of nonconservative exten-
sion. Suppose that our language contains a negationlike connective ∼
satisfying the following rules

X`A X, A `
∼L ∼R
X, ∼A ` X ` ∼A

and that otherwise, the vocabulary contains no connectives. This restriction is for simplicity’s sake—it eases the presentation, but is not
an essential restriction.
This connective satisfies the rules for an intuitionistic negation con-
nective. If the primitive vocabulary is governed by a coherence relation
satisfying our conditions, then adding this connective will preserve
those properties. We can derive ∼A ` ∼A as follows:

A`A
∼L
∼A, A `
∼R
∼A ` ∼A

The other derivation of ∼A ` ∼A, via ` ∼A, A, is not available to us because the position [ : ∼A, A] is coherent. We cannot show that ∼∼A `
A. The rules as we have fixed them are not quite enough to specify
the coherence relation, since there is no way to verify that the relation
satisfies the condition of weakening on the new vocabulary, given that
it does on the old vocabulary. The rules as they stand are not enough
to guarantee that A, ∼A ` Y for a collection Y of statements from the
new vocabulary. So, let us add this condition by fiat. We will say that
X ` Y holds in this coherence relation if and only if we have X ` A
for some A ∈ Y , or X `. It is not too difficult to verify that this is a
genuine coherence relation.
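The coherence relation just described can be explored with a small backward proof search over sequents with at most one formula (or none) on the right. A rough sketch, with ‘~’ standing in for ∼:

```python
# A rough backward search for the coherence relation just described.
# Formulas are strings: an atom under zero or more "~"s.
# derives(X, goal) decides X |- goal; derives(X, None) decides X |- .

def derives(X, goal, seen=frozenset()):
    X = frozenset(X)
    if (X, goal) in seen:        # already trying this sequent: no new proof
        return False
    seen = seen | {(X, goal)}
    if goal is not None and goal in X:             # identity (plus weakening)
        return True
    if goal is not None and goal.startswith("~"):  # ~R: X |- ~B from X, B |-
        if derives(X | {goal[1:]}, None, seen):
            return True
    # ~L: if some ~B in X has X |- B then X |- , and hence X |- goal too
    return any(f.startswith("~") and derives(X, f[1:], seen) for f in X)

print(derives({"a"}, "~~a"))   # True:  A |- ~~A
print(derives({"~a"}, "~a"))   # True:  identity for negated formulas
print(derives({"~~a"}, "a"))   # False: double negation elimination fails
```

The failure of ∼∼A ` A is exactly what the derivations below exploit: adding classical ¬ on top of this relation forces new sequents in the ∼ vocabulary.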
Starting with this relation, the addition of ¬ is no longer conservative.
If we add the usual rules for ¬, we have the following two derivations.

A`A
¬L
A, ¬A `
∼R
¬A ` ∼A
∼L
∼∼A, ¬A `
¬R
∼∼A ` ¬¬A

A`A
¬R
` A, ¬A
¬L
¬¬A ` A

It follows that the new collection of rules either no longer satisfies


strengthening, or the new system provides us with a derivation of
∼∼A ` A that we did not have before.

This example shows us that conservative extension, by itself, does
not give us a hard-and-fast criterion for choosing between different
rules. The classical vocabulary counts as a conservative extension over
other vocabulary if all of the connectives are well-behaved, and the
cut-elimination theorem holds. This requires not only that the new
connectives are well-behaved and harmonious, but also that the old
connectives behave properly. In this case, it is the rule for ∼ that messes
up the argument. To eliminate cuts, we must show that if our cut
formula is passive in some other rule, then we could have performed
the cut on the premises of the rule, and not the conclusion. In this case,
this is not possible. Suppose we have a cut-formula as passive in the
rule for ∼.
X, B, A `
∼R
` B, C X, B ` ∼A
Cut
X ` ∼A, C
There is no way to get to X ` ∼A, C by performing the cut before
introducing the ∼, since the premise sequent would have to be X, A ` C,
which is no longer appropriate for the application of the ∼R rule.

` B, C X, B, A `
Cut
X, A ` C
∼R??
X ` ∼A, C

From the perspective of natural deduction, the constraint on ∼R is


global and not merely local. If we happen to have a refutation of
X, B, A, then one step is composing (discharging) the premise A with a
∼I node to derive ∼A. Another step is composing the proof with a proof
with the two conclusions B, C to add a conclusion C but discharge the
premise B. From the point of view of circuits, there is no sense in which
one of these is done ‘before’ or ‘after’ the other. (todo: add a picture.)
There seems to be a real sense in which the rules for the classical
connectives naturally fit the choice of sequent structure. Once we have
settled on the ‘home’ for our connectives — in this case, sequents of
the form X ` Y — we need to work hard to get less than classical logic. (todo: expand this cryptic remark.)

3.4 | meaning
The idea that the rules of inference confer meaning on the logical con-
nectives is a compelling one. What can we say about this idea? We
have shown that we can introduce the logical constants into a practice
of asserting and denial in such a way that assertions and denials fea-
turing those connectives can be governed for coherence along with the
original assertions and denials. We do not have to say anything more
about the circumstances in which a negation or a conjunction or a dis-
junction is true or false, or to find any thing to which the connectives
‘correspond.’ What does this tell us about the meanings of the items
that we have introduced?

There are a number of things one could want in a theory of meanings of logical connectives. (1) You could want something that connec-
ted meanings to use. (2) You could want something that connected
meanings to truth conditions. (3) You could want something that ex-
plains meaning in terms of the proprieties of translation. Let us take
these considerations in turn.

rules and use: The account of the connectives given here, as operat-
ors introduced by means of well-behaved rules concerning coherence,
gives a clear connection to the way that these connectives are used. To
be sure, it is not a descriptive account of the use of the connectives.
Instead, it is a normative account of the use of the connectives, giving
us a particular normative category (coherence) with which to judge the
practice of assertions and denials in that vocabulary. The rules for the
connectives are intimately connected to use in this way. If giving an
account of the meaning of a fragment of our vocabulary involves giv-
ing a normative account of the proprieties of the use of that vocabulary,
the rules for the connectives can at the very least be viewed as a part
of the larger story of the meaning of that vocabulary.

rules and truth conditions: Another influential strand (no, let me


be honest—it is the overwhelmingly dominant tradition) in semantics
takes the controlling category in giving a semantic theory for some
part of the language to be given in the truth conditions for statements
expressed in that part of the vocabulary. To be completely concrete, the
truth conditional account of the meaning of conjunction goes some-
thing like this:
A conjunction A ∧ B is true iff A is true and B is true.
More complex accounts say more, by replacing talk of ‘true’ by ‘true
relative to a model’ or ‘true in a world’ or ‘true given an assignment of
values to variables’ or any number of embellishments on the general
picture.
This looks nothing like the account we have given in terms of rules
governing coherence. What is the connection between truth conditions,
meaning and the rules for the connectives?
First it must be made clear that what we are to make of the connection
between truth conditions and the inference rules will depend
crucially on what we make of the concept of truth. The concept of truth
is subtle and difficult. I am not thinking primarily of the philosophical
debates over the proper way to analyse the concept and to connect it to
concepts of correspondence, coherence, or whatever else. The subtlety
of the notion of truth comes from the paradoxes concerning truth. It
would be nice to think of truth as the simple expressive concept according
to which an assertion of the claim that A is true had no more nor
less significance than an assertion of the claim A. The liar paradox and
its like have put paid to that. Whatever we can say about the concept
of truth and the proper analysis of the liar paradox, it is not simple,
whatever option we take about the paradox.

(I am well aware of accounts that attempt to keep the simple expressive
correspondence between a claim A and the claim that A is true, at
the cost of giving a non-classical account of the logical connectives
[68, 74, 75]. This non-classical account has many virtues. Simplicity
of the concept of truth is one of them. However, the simplicity of the
overall picture is not so obvious [29].)



All of these considerations mean that an extended and judicious
discussion of truth and its properties must wait for some more logical
sophistication, once we have an account of predicates, names, identity
and quantification under our belts. For now, we must make do with a
few gestures in the general direction of a more comprehensive account.
So, consider simple instances of our rules governing conjunction:

A, B ` A ∧ B        A ∧ B ` A        A ∧ B ` B

These do not say that a conjunction is true if and only if the conjuncts
are both true, but they come very close to doing so, given that we are
not yet using the truth predicate. These rules tell us that it is never
permissible to assert both conjuncts, and to deny the conjunction. This
is not expressed in terms of truth conditions, but for many purposes it
will have the same consequences. Furthermore, using the completeness
proofs of Section ???, they tell us that there is no model in which A and
B are satisfied and A ∧ B is not; and that there is no model in which
A ∧ B is satisfied, and in which A or B is not. [Well, I need to write
that bit up.] If we think of satisfaction in a model as a model for truth,
then the truth-conditional account of the meaning of connectives is a
consequence of the rules. We may think of the reification or idealisation
of a coherent position (as given to us in the completeness proofs) as a
model for what is true. We do not need to reject truth-conditional
accounts of the meanings of the connectives. They are consequences of
the definitions that we have given. Whether we take this reading of the
truth-conditional account as satisfying will depend, of course, on what
we need the concept of truth to do. We will examine this in more detail later.
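As a sanity check on that claim, here is a minimal sketch (my illustration; the two-valued `holds` checker is an assumed helper, not anything from the text) that searches every Boolean valuation for a countermodel to the conjunction sequents:

```python
from itertools import product

def holds(premises, conclusions, atoms):
    """Check a sequent X |- Y against all two-valued valuations:
    it fails only if some valuation makes every member of X true
    and every member of Y false."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not any(c(v) for c in conclusions):
            return False  # found a countermodel
    return True

A = lambda v: v['A']
B = lambda v: v['B']
conj = lambda v: v['A'] and v['B']   # A ∧ B

# A, B |- A ∧ B and A ∧ B |- A and A ∧ B |- B have no countermodels:
print(holds([A, B], [conj], ['A', 'B']))  # True
print(holds([conj], [A], ['A', 'B']))     # True
print(holds([conj], [B], ['A', 'B']))     # True
# while A |- A ∧ B fails (take A true, B false):
print(holds([A], [conj], ['A', 'B']))     # False
```

The search over valuations plays the role of the completeness proofs here: a sequent governed by the rules admits no countermodel.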

rules and translation: What can we say about the connection between
the rules for connectives and how we interpret the assertions and deni-
als of others? Here are some elementary truisms: (a) people do not
have to endorse the rules I use for negation for me to take them to
mean negation by ‘not.’ It does not seem that we settle every question
of translation by looking at these rules. Nonetheless, (b) we can use
these rules as a way of making meaning more precise. We can clarify
meaning by proposing and adopting rules for connectives. Not every
use of ‘and’ fits the rules for ‘∧’. Adopting a precisely delineated co-
herence relation for an item aids communication, when it is achievable.
The rules we have seen are a very good way to be precise about the
behaviour of the connectives. (c) Most crucially, what we say about
translation depends on the status of claims of coherence. If I take what
someone says to be assertoric, I relate what they say and what I am
committed to in the one coherence relation. I keep score by keeping
track of what (by my lights) you are saying. And you do this for me.
I use a coherence relation to keep score of your judgements, and you
do the same for me. There can be disputes over the relation: you can
take some position to be coherent that I do not. This is made explicit
by our logical vocabulary. If you and I agree about the rules for
classical connectives, and we disagree over whether or not X, A ` B, Y
is coherent, then we disagree (in the context [X : Y]) over the
coherence of asserting A → B. Connectives are a way of articulating this
disagreement. Similarly, ¬ is a way of making explicit incompatibility
judgements of the form A, B ` or exhaustiveness judgements of the
form ` A, B.

3.5 | achilles and the tortoise


What has gone wrong in the case of the reasoner who is quite prepared to
assert A and to assert A → B, but who questions or hesitates over B?
By no means is this a problem. This is well discussed [Smiley’s paper
is a very good example, as is Harman.] But it’s worth explaining this
story in this vocabulary. Consider the person who endorses A, and B,
and even who endorses (A ∧ B) → Z. Is there a way to force her to
endorse Z? Of course not. By our lights she is already committed to
Z in that it is a consequence of what she is already committed to in
the position mentioned. However, there is no way to compel her to
endorse it.

3.6 | warrant
Discussion of warrant preservation in proof. In what way is classical
inference apt for preservation of warrant, and what is the sense in
which intuitionistic logic is appropriate? [Preservation of warrant in the
case of an argument from X to A. These are verificationally transparent
arguments. Preservation of diswarrant in the case of an argument
from B to Y. These are the falsificationally transparent arguments. Both
are nice. But neither is enough.]

3.7 | gaps and gluts


Now, I have asked you to consider adding to your vocabulary a
connective ¬ satisfying the rules for negation. This is actually a very
strong requirement. Gaps and gluts, supervaluationism and subvalu-
ationism: a discussion of how the setting here helps explain the signi-
ficance of these approaches.

3.8 | realism
Discussion of the status of models and truth in this account. Are we real-
ists? Are these merely useful tools, or something more? (Discussion
of Blackburn’s quasi-realism and its difficulties with logic here. The
Frege/Geach problem is discussed at this point, if not before.)



part ii
quantifiers, identity
and existence


4 | quantifiers: tools & techniques
4.1 | predicate logic
4.1.1 | rules
4.1.2 | what do the rules mean?
Mark Lance makes the point in his paper “Quantification, Substitution,
and Conceptual Content” [48] that an inference-first account of
quantifiers is sensible, but it isn’t the kind of “substitutional” account
oft mentioned. If we take the meaning of (∀x)A to be given by what
one might infer from it, and what one might use to infer to it, then one can infer from
it to each of its instances (and furthermore, to any other instances we
might get as we further add to the language). What one might
use to infer to (∀x)A is not just each of the instances A[x/n1 ], A[x/n2 ],
etc. (Though that might work in some restricted circumstances, clearly
it is unwieldy at best and wrong-headed at worst.) The idea behind
the rules is that to infer to (∀x)A you need not just an instance (or
all instances). You need something else. You need to have derived an
instance in a general way. That is, you need to have a derivation of
A[x/n] that applies independently of any information “about n.” (It
would be nice to make some comment about how this sidesteps all of
the talk about “general facts” in arguments about what you need to
know to know that (∀x)A apart from knowing that A[x/n] for each par-
ticular n, but to make that case I would need more space and time than
I have at hand at present.)
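The point can be illustrated (my toy example, in Lean, not anything from Lance or the text) by the way proof assistants treat ∀-introduction: the proof of a universal claim is a single derivation generic in its variable, not a list of instances.

```lean
-- To prove (∀n)A we give one derivation of the instance A[x/n],
-- uniform in n: no information "about n" is used beyond its type.
example : ∀ n : Nat, n + 0 = n :=
  fun n => rfl
```

The function abstraction is exactly a derivation of the instance "in a general way": it applies to any n whatsoever.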

4.2 | identity and existence


A section on the addition of identity, discussing options for adding an
existence predicate if we countenance “non-denoting terms.”

  φ(a)   a = b          [Xa]ⁱ
  ───────────── =E        ⋮
      φ(b)               Xb
                      ────────── =I,i
                        a = b

The crucial side-condition in =I is that the predicate variable X does
not appear elsewhere in the premises. In other words, if an arbitrary
property holds of a, it must hold of b too. The only way for this to
hold is, of course, for a to be identical to b. Here is a proof using these
rules:

  [Xa]¹   a = b
  ───────────── =E
       Xb          b = c
       ───────────────── =E
              Xc
           ──────── =I,1
            a = c
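As an aside (my illustration, not part of the text), both rules can be replayed in Lean, where =E is substitution and the =I pattern, an arbitrary predicate X transferring from a to b, suffices for identity.

```lean
-- =I: if an arbitrary predicate transfers from a to b, then a = b.
-- Instantiate X with the predicate "a = ·" and feed it rfl : a = a.
example {α : Type} (a b : α)
    (h : ∀ X : α → Prop, X a → X b) : a = b :=
  h (fun x => a = x) rfl

-- The transitivity proof from the text: two =E (substitution) steps
-- carry the discharged Xa across a = b and then b = c.
example {α : Type} (a b c : α)
    (hab : a = b) (hbc : b = c) : a = c :=
  hbc ▸ hab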
The sequent rules are these:
  Γ ` φ(a), ∆    Γ′, φ(b) ` ∆′              Γ, Xa ` Xb, ∆
  ────────────────────────────── =L        ─────────────── =R
     Γ, Γ′, a = b ` ∆, ∆′                   Γ ` a = b, ∆
The side condition in =R is that X does not appear in Γ or ∆.
Notice that we need to modify the subformula property further,
since the predicate variable does not appear in the conclusion of =R,
and more severely, the predicate φ does not appear in the conclusion
of =L. Here is an example derivation:
  Xa ` Xa              Xb ` Xb
  ──────────── ¬R      ──────────── ¬L
  ` ¬Xa, Xa            Xb, ¬Xb `
  ────────────────────────────────── =L
         a = b, Xb ` Xa
         ─────────────── =R
         a = b ` b = a
If we have a cut on a formula a = b which is active in both premises of
that rule:

      ⋮ δX                    ⋮ δ′                ⋮ δ″
  Γ, Xa ` Xb, ∆        Γ′ ` φ(a), ∆′     Γ″, φ(b) ` ∆″
  ─────────────── =R   ──────────────────────────────── =L
  Γ ` a = b, ∆             Γ′, Γ″, a = b ` ∆′, ∆″
  ───────────────────────────────────────────────── Cut
             Γ, Γ′, Γ″ ` ∆, ∆′, ∆″

we can eliminate it in favour of two cuts on the formulas φ(a) and φ(b).
To do this, we modify the derivation δX to conclude Γ, φ(a) ` φ(b), ∆,
which we can do by globally replacing Xx by φ(x). The result is still a
derivation. We call it δφ. Then we may reason as follows:

      ⋮ δφ                    ⋮ δ′
  Γ, φ(a) ` φ(b), ∆      Γ′ ` φ(a), ∆′
  ───────────────────────────────────── Cut       ⋮ δ″
       Γ, Γ′ ` φ(b), ∆, ∆′                Γ″, φ(b) ` ∆″
       ───────────────────────────────────────────────── Cut
                  Γ, Γ′, Γ″ ` ∆, ∆′, ∆″

4.3 | models
Traditional Tarski models for classical logic and Kripke models for
intuitionistic logic, motivated on the basis of the proof rules we have introduced. A
presentation of more “modest” finitist semantics in which the domain
is finite at each stage of evaluation, given by the sequent system. A
context of evaluation, in this kind of model, is a finite entity, including
information about “how to go on.”


4.4 | arithmetic
Peano and Heyting arithmetic are introduced as a simple example of
a rigorous system with enough complexity to be truly interesting. Dis-
cussion of the consistency proof for arithmetic. I will point to Gödel’s
incompleteness results, and show how pa + Con(pa) can be seen as
adding to the stock of arithmetic inference principles, in the same way
that adding stronger induction principles does. [32]

4.5 | second order quantification


Second order quantification introduced, and Girard/Tait normalisa-
tion for second order logic.


5 | quantifiers: applications
5.1 | objectivity
Substitutional and objectual quantification and objectivity. The ac-
count of quantification given here isn’t first-and-foremost objectual in
the usual sense, but it can be seen as a semantically anti-realist (that is,
not truth-first) reading of standard, objectual quantification. A defence
of this analysis, and a discussion of the sense in which this provides
properly universal quantification, independently of any consideration
of whether the class of “everything” is a set or can constitute a model.

5.2 | explanation
How do we prove a universal claim? By deriving it. Explanation of
the reasons why people like “universal facts” and why this is better
understood in terms prior to commitment to fact-like entities.

5.3 | relativity
A discussion of ontological relativity, Quine’s criterion for commit-
ment to objects. (We discuss the sense in which logic alone does
not force the existence of any number of things, and why the choice
of ontology depends on the behaviour of names and variables in your
theory.)

5.4 | existence
A discussion of a neo-Carnapian view that to adopt inference prin-
ciples concerning numbers, say, is free. Relating to current discussion
of structuralism, plenitudinous platonism and fictionalism in mathematics.

5.5 | consistency
The essential incompleteness and extendibility of our inference prin-
ciples.

5.6 | second order logic


A discussion of the “standard model” of second order logic, and its in-
terpretation as an “ideal” endpoint for language expansion. (Treating

the range of the quantifiers in second order logic as an ideal endpoint
of conceptual expansion.) A discussion of why standard Second Or-
der Logic, so construed, is essentially non-axiomatisable.



part iii
modality and truth


6 | modality and truth: tools & techniques
6.1 | simple modal logic
6.2 | modal models
6.3 | quantified modal logic
6.4 | truth as a predicate


7 | modality and truth: applications
7.1 | possible worlds
7.2 | counterparts
7.3 | synthetic necessities
7.4 | two dimensional semantics
7.5 | truth and paradox


references

(This bibliography is also available online at http://citeulike.org/
user/greg_restall/tag/ptp. citeulike.org is an interesting
collaborative site for sharing pointers to the academic literature.)

[1] alan ross anderson and nuel d. belnap. Entailment: The Logic
of Relevance and Necessity, volume 1. Princeton University Press,
Princeton, 1975.

[2] alan ross anderson, nuel d. belnap, and j. michael dunn. Entail-
ment: The Logic of Relevance and Necessity, volume 2. Princeton
University Press, Princeton, 1992.

[3] h. p. barendregt. “Lambda Calculi with Types”. In samson abramsky,


dov gabbay, and t. s. e. maibaum, editors, Handbook of Logic in
Computer Science, volume 2, chapter 2, pages 117–309. Oxford Univer-
sity Press, 1992.

[4] jc beall and greg restall. “Logical Pluralism”. Australasian Journal


of Philosophy, 78:475–493, 2000.

[5] jc beall and greg restall. Logical Pluralism. Oxford University


Press, Oxford, 2006.

[6] nuel d. belnap. “Tonk, Plonk and Plink”. Analysis, 22:130–134, 1962.

[7] nuel d. belnap. “Display Logic”. Journal of Philosophical Logic,


11:375–417, 1982.

[8] richard blute, j. r. b. cockett, r. a. g. seely, and t. h. trimble.


“Natural Deduction and Coherence for Weakly Distributive Categories”.
Journal of Pure and Applied Algebra, 13(3):229–296, 1996. Available
from ftp://triples.math.mcgill.ca/pub/rags/nets/nets.ps.
gz.

[9] david bostock. Intermediate Logic. Oxford University Press, 1997.

[10] robert b. brandom. Making It Explicit. Harvard University Press,


1994.

[11] robert b. brandom. Articulating Reasons: an introduction to infer-


entialism. Harvard University Press, 2000.

[12] l. e. j. brouwer. “On the Significance of the Principle of Excluded


Middle in Mathematics, Especially in Function Theory”. In From Frege
to Gödel: a source book in mathematical logic, 1879–1931, pages
334–345. Harvard University Press, Cambridge, Mass., 1967. Originally
published in 1923.

[13] samuel r. buss. “The Undecidability of k-provability”. Annals of Pure


and Applied Logic, 53:72–102, 1991.

[14] a. carbone. “Interpolants, Cut Elimination and Flow graphs for the
Propositional Calculus”. Annals of Pure and Applied Logic, 83:249–
299, 1997.

[15] a. carbone. “Duplication of directed graphs and exponential blow up of
proofs”. Annals of Pure and Applied Logic, 100:1–67, 1999.

[16] bob carpenter. The Logic of Typed Feature Structures. Cambridge


Tracts in Theoretical Computer Science 32. Cambridge University Press,
1992.

[17] alonzo church. The Calculi of Lambda-Conversion. Number 6 in


Annals of Mathematical Studies. Princeton University Press, 1941.

[18] roy t. cook. “What’s wrong with tonk(?)”. Journal of Philosophical


Logic, 34:217–226, 2005.

[19] haskell b. curry. Foundations of Mathematical Logic. Dover, 1977.


Originally published in 1963.

[20] dirk van dalen. “Intuitionistic Logic”. In dov m. gabbay and franz
günthner, editors, Handbook of Philosophical Logic, volume III.
Reidel, Dordrecht, 1986.

[21] vincent danos and laurent regnier. “The Structure of Multiplicat-


ives”. Archive of Mathematical Logic, 28:181–203, 1989.

[22] nicholas denyer. “The Principle of Harmony”. Analysis, 49:21–22,


1989.

[23] kosta došen. “The First Axiomatisation of Relevant Logic”. Journal of


Philosophical Logic, 21:339–356, 1992.

[24] kosta došen. “A Historical Introduction to Substructural Logics”. In


peter schroeder-heister and kosta došen, editors, Substructural
Logics. Oxford University Press, 1993.

[25] michael dummett. The Logical Basis of Metaphysics. Harvard


University Press, 1991.

[26] j. michael dunn. “Relevance Logic and Entailment”. In dov m.


gabbay and franz günthner, editors, Handbook of Philosophical
Logic, volume 3, pages 117–229. Reidel, Dordrecht, 1986.

[27] j. michael dunn and greg restall. “Relevance Logic”. In dov m.


gabbay, editor, Handbook of Philosophical Logic, volume 6, pages
1–136. Kluwer Academic Publishers, Second edition, 2002.

[28] roy dyckhoff. “Contraction-Free Sequent Calculi for Intuitionistic


Logic”. Journal of Symbolic Logic, 57:795–807, 1992.

[29] hartry field. “Saving the Truth Schema from Paradox”. Journal of


Philosophical Logic, 31:1–27, 2002.

[30] kit fine. “Vagueness, Truth and Logic”. Synthese, 30:265–300, 1975.
Reprinted in Vagueness: A Reader [46].

[31] f. b. fitch. Symbolic Logic. Roland Press, New York, 1952.

[32] torkel franzén. Inexhaustibility: A Non-Exhaustive Treatment,


volume 16 of Lecture Notes in Logic. Association for Symbolic Logic,
2004.


[33] gerhard gentzen. “Untersuchungen über das logische Schliessen”.


Math. Zeitschrift, 39:176–210 and 405–431, 1934. Translated in The
Collected Papers of Gerhard Gentzen [34].

[34] gerhard gentzen. The Collected Papers of Gerhard Gentzen. North


Holland, 1969. Edited by M. E. Szabo.

[35] jean-yves girard. “Linear Logic”. Theoretical Computer Science,


50:1–101, 1987.

[36] jean-yves girard. Proof Theory and Logical Complexity. Bibliopolis,


Naples, 1987.

[37] jean-yves girard, yves lafont, and paul taylor. Proofs and Types,
volume 7 of Cambridge Tracts in Theoretical Computer Science. Cam-
bridge University Press, 1989.

[38] ian hacking. “What is Logic?”. The Journal of Philosophy, 76:285–


319, 1979.

[39] chris hankin. Lambda Calculi: A Guide for Computer Scientists,


volume 3 of Graduate Texts in Computer Science. Oxford University
Press, 1994.

[40] gilbert harman. Change In View: Principles of Reasoning. Bradford


Books. MIT Press, 1986.

[41] jean van heijenoort. From Frege to Gödel: a source book in math-
ematical logic, 1879–1931. Harvard University Press, Cambridge,
Mass., 1967.

[42] arend heyting. Intuitionism: An Introduction. North Holland,


Amsterdam, 1956.

[43] w. a. howard. “The Formulae-as-types Notion of Construction”.


In j. p. seldin and j. r. hindley, editors, To H. B. Curry: Essays on
Combinatory Logic, Lambda Calculus and Formalism, pages 479–490.
Academic Press, London, 1980.

[44] colin howson. Logic with Trees: An introduction to symbolic logic.


Routledge, 1996.

[45] rolf isermann. “On Fuzzy Logic Applications for Automatic Control,
Supervision, and Fault Diagnosis”. IEEE Transactions on Systems,
Man, and Cybernetics—Part A: Systems and Humans, 28:221–235,
1998.

[46] rosanna keefe and peter smith. Vagueness: A Reader. Bradford


Books. MIT Press, 1997.

[47] william kneale and martha kneale. The Development of Logic.


Oxford University Press, 1962.

[48] mark lance. “Quantification, Substitution, and Conceptual Content”.


Noûs, 30(4):481–507, 1996.

[49] e. j. lemmon. Beginning Logic. Nelson, 1965.

[50] paola mancosu. From Brouwer to Hilbert. Oxford University Press,
1998.

[51] edwin d. mares. Relevant Logic: A Philosophical Interpretation.


Cambridge University Press, 2004.

[52] vann mcgee. Truth, Vagueness and Paradox. Hackett Publishing


Company, Indianapolis, 1991.

[53] peter milne. “Classical Harmony: rules of inference and the meaning
of the logical constants”. Synthese, 100:49–94, 1994.

[54] michiel moortgat. Categorial Investigations: Logical Aspects of the


Lambek Calculus. Foris, Dordrecht, 1988.

[55] glyn morrill. Type Logical Grammar: Categorial Logic of Signs.


Kluwer, Dordrecht, 1994.

[56] sara negri and jan von plato. Structural Proof Theory. Cambridge
University Press, Cambridge, 2001. With an appendix by Aarne Ranta.

[57] i. e. orlov. “The Calculus of Compatibility of Propositions (in
Russian)”. Matematicheskiĭ Sbornik, 35:263–286, 1928.

[58] victor pambuccian. “Early Examples of Resource-Consciousness”.


Studia Logica, 77:81–86, 2004.

[59] francesco paoli. Substructural Logics: A Primer. Trends in Logic.


Springer, May 2002.

[60] terence parsons. “Assertion, Denial, and the Liar Paradox”. Journal
of Philosophical Logic, 13:137–152, 1984.

[61] terence parsons. “True Contradictions”. Canadian Journal of Philo-


sophy, 20:335–354, 1990.

[62] francis j. pelletier. “A Brief History of Natural Deduction”. History


and Philosophy of Logic, 20:1–31, March 1999.

[63] dag prawitz. Natural Deduction: A Proof Theoretical Study. Almqv-


ist and Wiksell, Stockholm, 1965.

[64] dag prawitz. “Towards a Foundation of General Proof Theory”.


In patrick suppes, leon henkin, athanase joja, and gr. c. moisil,
editors, Logic, Methodology and Philosophy of Science IV. North
Holland, Amsterdam, 1973.

[65] dag prawitz. “Proofs and the Meaning and Completeness of the Logical
Constants”. In j. hintikka, i. niiniluoto, and e. saarinen, editors, Essays
on Mathematical and Philosophical Logic, pages 25–40. D. Reidel,
1979.

[66] graham priest. “The Logic of Paradox”. Journal of Philosophical


Logic, 8:219–241, 1979.

[67] graham priest. “Inconsistencies in Motion”. American Philosophical


Quarterly, 22:339–345, 1985.


[68] graham priest. In Contradiction: A Study of the Transconsistent.


Martinus Nijhoff, The Hague, 1987.

[69] graham priest, richard sylvan, and jean norman, editors. Paracon-
sistent Logic: Essays on the Inconsistent. Philosophia Verlag, 1989.

[70] arthur n. prior. “The Runabout Inference-Ticket”. Analysis, 21:38–


39, 1960.

[71] stephen read. Relevant Logic. Basil Blackwell, Oxford, 1988.

[72] stephen read. “Harmony and Autonomy in Classical Logic”. Journal


of Philosophical Logic, 29:123–154, 2000.

[73] stephen read. “Identity and Harmony”. Analysis, 64(2):113–115,


2004.

[74] greg restall. “Deviant Logic and the Paradoxes of Self Reference”.
Philosophical Studies, 70:279–303, 1993.

[75] greg restall. On Logics Without Contraction. PhD thesis,


The University of Queensland, January 1994. Available at
http://consequently.org/writing/onlogics.

[76] greg restall. An Introduction to Substructural Logics. Routledge,


2000.

[77] greg restall. “Carnap’s Tolerance, Meaning and Logical Pluralism”.


Journal of Philosophy, 99:426–443, 2002.

[78] greg restall. Logic. Fundamentals of Philosophy. Routledge, 2006.

[79] fred richman. “Intuitionism as Generalization”. Philosophia Math-


ematica, 5:124–128, 1990.

[80] edmund robinson. “Proof Nets for Classical Logic”. Journal of Logic
and Computation, 13(5):777–797, 2003.

[81] richard routley, val plumwood, robert k. meyer, and ross t.


brady. Relevant Logics and their Rivals. Ridgeview, 1982.

[82] moses schönfinkel. “Über die Bausteine der mathematischen Logik”.


Math. Annallen, 92:305–316, 1924. Translated and reprinted as “On the
Building Blocks of Mathematical Logic” in From Frege to Gödel [41].

[83] dana scott. “Lambda Calculus: Some Models, Some Philosophy”.


In j. barwise, h. j. keisler, and k. kunen, editors, The Kleene Sym-
posium, pages 223–265. North Holland, Amsterdam, 1980.

[84] d. j. shoesmith and t. j. smiley. Multiple Conclusion Logic. Cam-


bridge University Press, Cambridge, 1978.

[85] r. m. smullyan. First-Order Logic. Springer-Verlag, Berlin, 1968.


Reprinted by Dover Press, 1995.

[86] w. w. tait. “Intensional Interpretation of Functionals of Finite Type I”.


Journal of Symbolic Logic, 32:198–212, 1967.

[87] neil tennant. Natural Logic. Edinburgh University Press, Edinburgh,
1978.

[88] neil tennant. Anti-Realism and Logic: Truth as Eternal. Clarendon


Library of Logic and Philosophy. Oxford University Press, 1987.

[89] neil tennant. The Taming of the True. Clarendon Press, Oxford,
1997.

[90] a. s. troelstra. Lectures on Linear Logic. csli Publications, 1992.

[91] a. s. troelstra and h. schwichtenberg. Basic Proof Theory,


volume 43 of Cambridge Tracts in Theoretical Computer Science.
Cambridge University Press, Cambridge, second edition, 2000.

[92] a. m. ungar. Normalization, cut-elimination, and the theory of


proofs. Number 28 in csli Lecture Notes. csli Publications, Stanford,
1992.

[93] igor urbas. “Dual-Intuitionistic Logic”. Notre Dame Journal of


Formal Logic, 37:440–451, 1996.

[94] alasdair urquhart. “Semantics for Relevant Logics”. Journal of


Symbolic Logic, 37:159–169, 1972.

[95] achille varzi. An Essay in Universal Semantics, volume 1 of Topoi


Library. Kluwer Academic Publishers, Dordrecht, Boston and London,
1999.

[96] alan weir. “Classical Harmony”. Notre Dame Journal of Formal


Logic, 27(4):459–482, 1986.

[97] j. eldon whitesitt. Boolean algebra and its applications. Addison–


Wesley Pub. Co., Reading, Mass., 1961.

[98] edward n. zalta. “Gottlob Frege”. In edward n. zalta, editor,


Stanford Encyclopedia of Philosophy. Stanford University, 2005.

