Course Notes
Rebecca Weber
Spring 2011
Contents
1 Introduction 3
1.1 Mindset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Some History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Some References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 A Little Request . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background 7
2.1 First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Recursion and Induction . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Some Notes on Proofs and Abstraction . . . . . . . . . . . . . . . . 29
3 Defining Computability 33
3.1 Functions, Sets, and Sequences . . . . . . . . . . . . . . . . . . . . 33
3.2 Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Partial Recursive Functions . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Coding and Countability . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 The Church-Turing Thesis . . . . . . . . . . . . . . . . . . . . . . . 46
3.6 Other Definitions of Computability . . . . . . . . . . . . . . . . . . 47
4 Working with Computable Functions 57
4.1 A Universal Turing Machine . . . . . . . . . . . . . . . . . . . . . . 57
4.2 The Halting Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 The Recursion Theorem . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5 Unsolvability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5 Computable and Computably Enumerable Sets 71
5.1 Dovetailing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Computing and Enumerating . . . . . . . . . . . . . . . . . . . . . 72
5.3 Noncomputable Sets Part I . . . . . . . . . . . . . . . . . . . . . . . 76
5.4 Noncomputable Sets Part II: Simple Sets . . . . . . . . . . . . . . . 77
6 Turing Reduction and Post's Problem 79
6.1 Reducibility of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Finite Injury Priority Arguments . . . . . . . . . . . . . . . . . . . 82
7 Turing Degrees 91
7.1 Turing Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2 Relativization and the Turing Jump . . . . . . . . . . . . . . . . . . 92
8 More Advanced Results 97
8.1 The Limit Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2 The Arslanov Completeness Criterion . . . . . . . . . . . . . . . . . 99
8.3 ℰ Modulo Finite Difference . . . . . . . . . . . . . . . . . . . . . . 101
9 Areas of Research 105
9.1 Lattice-Theoretic Properties . . . . . . . . . . . . . . . . . . . . . . 105
9.2 Randomness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.3 Some Model Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 122
9.4 Computable Model Theory . . . . . . . . . . . . . . . . . . . . . . . 124
9.5 Reverse Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 127
A Mathematical Asides 137
A.1 The Greek Alphabet . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.2 Summations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.3 Cantor's Cardinality Proofs . . . . . . . . . . . . . . . . . . . . . . 138
Bibliography 141
Chapter 1
Introduction
This chapter is one I expect you will initially skim. The first section I hope you will
come back to halfway through the course, to get a high-level view of the subject; it
may not make total sense before the course begins.
1.1 Mindset
What does it mean for a function or set to be computable?
Computability is a dynamic field. I mean that in two ways. One, of course,
is that research into computability is ongoing and varied. However, I also mean
that the mindset when working in computability is dynamic rather than static. The
objects in computability are rarely accessible all at once or in their exact form.
Rather, they are approximated or enumerated. Sets will be presented element by
element; a function's output must be waited for, and indeed may never come.
The aspect of computability theory that tends to bother people the most is that
it is highly nonconstructive. By that I mean many proofs are existence proofs rather
than constructive proofs: when we say a set is computable, we mean an algorithm
exists to compute it, not that we necessarily have such an algorithm explicitly. One
common application of this is being unconcerned when a program requires some
"magic number" to operate correctly: for example, a value n such that on inputs at
least n two functions are equal, though they might differ on inputs below n. We
think of a fleet of programs, each guessing a different value; if we can show such
a value exists we know one of those programs (indeed, infinitely many) will operate
correctly, and that is all we care about. This is called nonuniformity and can inhibit
some further uses of the algorithm, so we always pay attention to whether or not
processes are uniform.
The other initially troublesome aspect, which perhaps only bothers computability
theorists because no one else sees it, is that computability uses self-reference in
very strong and perhaps illegal-looking ways. For now we will leave this to §4.4.
The primary tool in computability is called a priority argument (see §6.2). This
is a ramped-up version of a diagonal argument, such as Cantor's proof that the reals
are uncountable (Appendix A.3). Essentially, we break up what we want to accomplish
in a construction into an infinite collection of requirements to meet. The requirements
have a priority ordering, where lower-priority requirements are restricted from
doing anything that harms higher-priority requirements, but not vice-versa. As long
as each requirement will only cause harm finitely many times, and each can recover
from a finite number of injuries, the construction will succeed. This allows us to
work with information that is being approximated and will throughout the
construction be incomplete and possibly incorrect: the requirements must act based on
the approximation and thus might act wrongly or have their work later undone by
a higher-priority requirement. Proofs that priority constructions accomplish their
goals are done by induction, which also works step by step.
As a simple example of the sorts of conflicts that arise, consider building a set A
by putting numbers into it stage by stage during a construction. We may have some
requirements that want to put elements into A, say, to cause it to have nonempty
intersection with another set. Then we may have requirements that are trying to
maintain computations that are based on A, that say things like "if 5 is in A,
output 1, and if not, output 0." If one requirement wants to put 10 into A to cause
an intersection and another wants to keep 10 out to preserve a computation, priority
allows us to break the tie. Regardless of which one wins the other has to recover:
the one that wanted 10 out would need to get its computation back or be able to
use a different one, and the one that wanted 10 in would need to be able to use a
different element to create the intersection.
The point of a priority argument is that we can make assertions about the
computability of the object we are building. Computability theorists are concerned
not only with whether an object exists but with how complicated it must be.
1.2 Some History
We begin with the mathematical philosophy of formalism, which holds that all
mathematics is just symbol-pushing without deeper meaning (to be contrasted with
Platonism, which holds that mathematics represents something real, and in particular
that even statements we can't prove or disprove with our mathematical axioms
are true or false in reality). In 1910, Whitehead and Russell published the Principia
Mathematica, which was in part an effort to put all of mathematics into symbolic
form.
Would this take the creativity out of mathematics? If you can formalize
everything, you can generate all possible theorems by applying your set of logical
deduction rules to sets of axioms. Repeat, adding your conclusions at every step to
the axioms. It would take forever, but would be totally deterministic and complete.
In 1900, David Hilbert gave a famous talk in which he listed problems he thought
should direct mathematical effort. His tenth problem, paraphrased, was to find
an algorithm to determine whether any given polynomial equation with integer
coefficients (a Diophantine equation) has an integer solution. This was, again, part
of the program to make all mathematics computational. People were once famous
for being able to solve lots of quadratic equations, but the discovery of the quadratic
formula put an end to that. Could we find such a formula for any degree equation?
Gödel showed Whitehead and Russell's quest to formalize mathematics was
doomed to fail. His First Incompleteness Theorem shows any sufficiently strong
axiomatic system has true but unprovable theorems: theorems that would never
appear in the list being generated by automated deduction. Furthermore, what is
meant by "sufficiently strong" is well within the bounds of what mathematicians
would consider reasonable axioms for mathematics.
Church showed even Hilbert's more modest goal of mechanizing finding roots of
polynomials was impossible [8]. However, Hilbert's tenth problem didn't acknowledge
the possibility of such an algorithm simply not existing; at the time, there was
no mathematical basis to approach such questions.
Hilbert's phrasing was as follows:
Given a Diophantine equation with any number of unknown quantities
and with rational integral¹ numerical coefficients: To devise a process
according to which it can be determined in a finite number of operations
whether the equation is solvable in rational integers.
Church's response:
There is a class of problems of elementary number theory which can
be stated in the form that it is required to find an effectively calculable
function f of n positive integers, such that f(x₁, . . . , xₙ) = 2 is a necessary
and sufficient condition for the truth of a certain proposition of
elementary number theory involving x₁, . . . , xₙ as free variables. [footnote:
The selection of the particular positive integer 2 instead of some
other is, of course, accidental and non-essential.]
... The purpose of the present paper is to propose a definition of
effective calculability which is thought to correspond satisfactorily to
the somewhat vague intuitive notion in terms of which problems of this
class are often stated, and to show, by means of an example, that not
every problem of this class is solvable.
That is, it may not be possible to devise Hilbert's desired process, and in fact
it is not, though that was shown much later. Church's major contribution is the
point that we need some formal notion of "finite process" to answer Hilbert; this is
his "effective calculability."
¹rational integer = integer.
Church proposes two options in this paper: the lambda calculus, and what would
later be called primitive recursive functions. Shortly thereafter Kleene proposed
what we now call the partial recursive functions [32]. It was not widely accepted at
the time that any was a good characterization of "effectively computable," however.
It was not until Turing developed his Turing machine [59], which was accepted
as a good characterization, and it was proved that Turing-computable functions,
lambda-computable functions, and partial recursive functions are the same class,
that the functional definitions were accepted. All three of these formalizations of
computability are studied in Chapter 3. The idea that not all problems are solvable
comes up in Chapter 4, along with many of the tools needed in such proofs.
This area of study took on a life of its own beyond simply answering Hilbert's
challenge (often the way new fields of mathematics are introduced), becoming known
as computability theory or recursion theory. Chapters 5 through 8 explore some of the
additional topics and fundamental results of the area, and Chapter 9 contains a
survey of the sorts of questions of current interest to computability theorists.
1.3 Some References
These notes owe a great debt to a small library of logic books. For graduate- and
research-level work I regularly refer to Classical Recursion Theory by P.G. Odifreddi
[49], Theory of Recursive Functions and Effective Computability by H. Rogers [51],
and Recursively Enumerable Sets and Degrees by R.I. Soare [56]. The material in
here owes a great deal to those three texts. More recently, I have enjoyed A. Nies'
book Computability and Randomness [48].
In how to present such material to undergraduates, I was influenced by such
books as Computability and Logic by Boolos, Burgess, and Jeffrey [4], Computability
by Cutland [10], A Mathematical Introduction to Logic by Enderton [19], An
Introduction to Formal Languages and Automata by Linz [41], and A Transition to
Advanced Mathematics by Smith, Eggen, and St. Andre [55].
1.4 A Little Request
I am looking to turn these course notes into a textbook in the near future, so any
comments as to places you feel it could be improved are welcome (but do not feel
obligated). Typos, yes, but more importantly spots where I was too terse, too
verbose, or disjointed, sections or chapters you feel would benefit from reorganization,
or places where you were left wanting more and would like to at least see a reference
to outside sources.
Chapter 2
Background
This chapter covers a collection of topics that are not computability theory per se,
but are needed for it. They are set apart so the rest of the text reads more smoothly
as a reference, but we will cover them as needed when they become relevant.
2.1 First-Order Logic
In this section we learn a vocabulary for expressing formulas, logical sentences. This
is useful for brevity ("x < y" is much shorter than "x is less than y," and the savings
grows as the statement becomes more complicated) but also for clarity. Expressing
a mathematical statement symbolically can make it more obvious what needs to be
done with it, and however carefully words are used they may admit some ambiguity.
We use lowercase Greek letters (mostly φ and ψ, sometimes χ and θ) to represent
formulas. The simplest formula is a single symbol (or assertion) which can be either
true or false. There are several ways to modify formulas, which we'll step through
one at a time.
The conjunction of formulas φ and ψ is written "φ and ψ," φ ∧ ψ, or φ & ψ.
It is true when both φ and ψ are true, and false otherwise. Logically "φ and ψ" and
"φ but ψ" are equivalent, and so are φ & ψ and ψ & φ, though in natural language there
are some differences in connotation.
The disjunction of φ and ψ is written "φ or ψ," φ ∨ ψ. It is false when both φ
and ψ are false, and true otherwise. That is, φ ∨ ψ is true when at least one of φ
and ψ is true; it is inclusive or. Natural language tends to use exclusive or, where
only one of the clauses will be true, though there are exceptions. One such: "Would
you like sugar or cream in your coffee?" Again, φ ∨ ψ and ψ ∨ φ are equivalent.
The negation of φ is written not(φ), not-φ, ¬φ, or ∼φ. It is true when φ
is false and false when φ is true. The potential difference from natural language
negation is that ¬φ must cover all cases where φ fails to hold, and in natural
language the scope of a negation is sometimes more limited. Note that ¬¬φ = φ.
How does negation interact with conjunction and disjunction? φ & ψ is false
when φ, ψ, or both are false, and hence its negation is (¬φ) ∨ (¬ψ). φ ∨ ψ is false
only when both φ and ψ are false, and so its negation is (¬φ) & (¬ψ). We might
note in the latter case that this matches up with natural language's "neither...nor"
construction. These two negation rules are called De Morgan's Laws.
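Since De Morgan's Laws concern only truth values, they can be verified mechanically by checking all four truth assignments. Here is a small Python sketch (the function names are ours, purely for illustration):

```python
from itertools import product

# Check De Morgan's Laws by brute force over all four truth assignments.
def demorgan_and(phi, psi):
    # not(phi and psi)  vs  (not phi) or (not psi)
    return (not (phi and psi)) == ((not phi) or (not psi))

def demorgan_or(phi, psi):
    # not(phi or psi)  vs  (not phi) and (not psi)
    return (not (phi or psi)) == ((not phi) and (not psi))

rows = list(product([True, False], repeat=2))
print(all(demorgan_and(p, q) for p, q in rows))  # True
print(all(demorgan_or(p, q) for p, q in rows))   # True
```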
Exercise 2.1.1. Simplify the following formulas.
(i) φ & ((¬φ) ∨ ψ).
(ii) (φ & (¬ψ) & χ) ∨ (φ & (¬ψ) & (¬χ)).
(iii) ¬((φ & ψ) & (¬ψ)).
There are two classes of special formulas to highlight now. A tautology is always
true; the classic example is φ ∨ (¬φ) for any formula φ. A contradiction is always
false; here the example is φ & (¬φ). You will sometimes see the former expression
denoted T (or ⊤) and the latter ⊥.
To say φ implies ψ (φ → ψ or φ ⊃ ψ) means whenever φ is true, so is ψ. We
call φ the antecedent and ψ the consequent of the implication. We also say φ is
sufficient for ψ (since whenever we have φ we have ψ, though we may also have ψ
when φ is false), and ψ is necessary for φ (since it is impossible to have φ without
ψ). Clearly φ → ψ should be true when both formulas are true, and it should be
false if φ is true but ψ is false. It is maybe not so clear what to do when φ is false;
this is clarified by rephrasing implication as disjunction (which is often how it is
defined in the first place). φ → ψ means either ψ holds or φ fails; i.e., ψ ∨ (¬φ). The
truth of that statement lines up with our assertions earlier, and gives truth values
for when φ is false; namely, that the implication is true. Another way to look at
this is to say φ → ψ is only false when proven false, and that can only happen
when you see a true antecedent and a false consequent. From this it is clear that
¬(φ → ψ) is φ & (¬ψ).
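The disjunctive rephrasing of implication is easy to check exhaustively; the following Python fragment (our own illustration, not part of the text) confirms that the characterizations above agree on every truth assignment:

```python
from itertools import product

def implies(phi, psi):
    # phi -> psi, defined as "psi holds or phi fails"
    return psi or (not phi)

for phi, psi in product([True, False], repeat=2):
    # The implication is false exactly when the antecedent is true
    # and the consequent is false...
    assert implies(phi, psi) == (not (phi and not psi))
    # ...and its negation is phi & (not psi).
    assert (not implies(phi, psi)) == (phi and (not psi))
```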
There is an enormous difference between implication in natural language and
implication in logic. Implication in natural language tends to connote causation,
whereas the truth of φ → ψ need not give any connection at all between the meanings
of φ and ψ. It could be that φ is a contradiction, or that ψ is a tautology.
Also, in natural language we tend to dismiss implications as irrelevant or meaningless
when the antecedent is false, whereas to have a full and consistent logical theory
we cannot throw those cases out.
Example 2.1.2. The following are true implications:
If fish live in the water, then earthworms live in the soil.
If rabbits are aquamarine blue, then earthworms live in the soil.
If rabbits are aquamarine blue, then birds drive cars.
The negation of the final statement is "Rabbits are aquamarine blue but birds do
not drive cars."
The statement "If fish live in the water, then birds drive cars" is an example of
a false implication.
Equivalence is two-way implication and indicated by a double-headed arrow:
φ ↔ ψ or φ ⇔ ψ. It is an abbreviation for (φ → ψ) & (ψ → φ), and is true when
φ and ψ are either both true or both false. Verbally we might say "φ if and only if ψ,"
which is often abbreviated to "φ iff ψ." In terms of just conjunction, disjunction,
and negation, we may write equivalence as (φ & ψ) ∨ ((¬φ) & (¬ψ)). Its negation
is exclusive or, (φ ∨ ψ) & ¬(φ & ψ).
Exercise 2.1.3. Negate the following statements.
(i) 56894323 is a prime number.
(ii) If there is no coffee, I drink tea.
(iii) John watches but does not play.
(iv) I will buy the blue shirt or the green one.
Exercise 2.1.4. Write the following statements using standard logical symbols.
(i) φ if ψ.
(ii) φ only if ψ.
(iii) φ unless ψ.
As an aside, let us have a brief introduction to truth tables. These are nothing
more than a way to organize information about logical statements. The leftmost
columns are generally headed by the individual propositions, and under those headings
occur all possible combinations of truth and falsehood. The remaining columns
are headed by more complicated formulas that are built from the propositions, and
the lower rows have T or F depending on the truth or falsehood of the header
formula when the propositions have the true/false values in the beginning of that row.
Truth tables aren't particularly relevant to our use for this material, so I'll leave
you with an example and move on.
φ ψ ¬φ ¬ψ φ & ψ φ ∨ ψ φ → ψ φ ↔ ψ
T T F F T T T T
T F F T F T F F
F T T F F T T F
F F T T F F T T
If we stop here, we have propositional (or sentential) logic. These formulas
usually look something like [A → (B & C)] ∨ ¬C and their truth or falsehood
depends on the truth or falsehood of the assertions A, B, and C. We will continue
on to predicate logic, which replaces these assertions with statements such as
(x < 0) & (x + 100 > 0), which will be true or false depending on the value
substituted for the variable x. We will be able to turn those formulas into statements
which are true or false inherently via quantifiers. Note that writing φ(x) indicates
the variable x appears in the formula φ.
The existential quantification ∃x is read "there exists x." The formula ∃x φ(x) is
true if for some value n the unquantified formula φ(n) is true. Universal quantification,
on the other hand, is ∀x φ(x) ("for all x, φ(x) holds"), true when no matter
what n we fill in for x, φ(n) is true.
Quantifiers must have a specified set of values to range over, because the truth
value of a formula may be different depending on this domain of quantification. For
example, take the formula
(∀x)(x ≠ 0 → (∃y)(xy = 1)).
This asserts every nonzero x has a multiplicative inverse. If we are letting our
quantifiers range over the real numbers or the rational numbers, this statement is
true, because the reciprocal of x is available to play the role of y. However, in the
integers or natural numbers this is false, because 1/x is only in the domain when x
is ±1.
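Over a finite sample of a domain, quantifiers become the loops all and any, so claims like this can be explored concretely. The sketch below is a rough illustration only (a finite set can suggest but never prove a fact about an infinite domain); it checks the formula over a small set of rationals closed under reciprocals and over a small set of integers:

```python
from fractions import Fraction

# (forall x)(x != 0 -> (exists y)(x*y = 1)), brute-forced over a finite domain.
def every_nonzero_has_inverse(domain):
    return all(any(x * y == 1 for y in domain)
               for x in domain if x != 0)

# A fragment of the rationals that is closed under reciprocals:
rationals = [Fraction(n, d) for n in range(-3, 4) for d in range(1, 4) if n != 0]
rationals.append(Fraction(0))
integers = list(range(-3, 4))

print(every_nonzero_has_inverse(rationals))  # True
print(every_nonzero_has_inverse(integers))   # False: 2 has no integer inverse
```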
Introducing quantification opens us up to two kinds of logical formulas. If all
variables are quantified over (bound variables), then the formula is called a sentence.
If there are variables that are not in the scope of any quantifier (free variables), the
formula is called a predicate. The truth value of a predicate depends on what values
are plugged in for the free variables; a sentence has a truth value, period. For
example, (∀x)(∃y)(x < y) is a sentence, and it is true in all our usual domains
of quantification. The formula x < y is a predicate, and it will be true or false
depending on whether the specific values plugged in for x and y satisfy the inequality.
Exercise 2.1.5. Write the following statements as formulas, specifying the domain
of quantification.
(i) 5 is prime.
(ii) For any number x, the square of x is nonnegative.
(iii) There is a smallest positive integer.
Exercise 2.1.6. Consider the natural numbers, integers, rational numbers, and real
numbers. Over which domains of quantification are each of the following statements
true?
(i) (∀x)(x ≥ 0).
(ii) (∃x)(5 < x < 6).
(iii) (∀x)((x² = 2) → (x = 5)).
(iv) (∃x)(x² − 1 = 0).
(v) (∃x)(x³ + 8 = 0).
(vi) (∃x)(x² − 2 = 0).
When working with multiple quantifiers the order of quantification can matter
a great deal. For example, take the two formulas
φ = (∀x)(∃y)(x · x = y);
ψ = (∃y)(∀x)(x · x = y).
φ says "every number has a square" and is true in our typical domains. However,
ψ says "there is a number which is all other numbers' square" and is true only if you
are working over the domain containing only 0 or only 1.
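On a finite stand-in for the domain, the difference between φ and ψ is the difference between nesting any inside all and the reverse. A quick Python illustration (the bounds 10 and 100 simply keep the search finite and are our choice, not the text's):

```python
domain = range(10)          # a finite stand-in for our usual domains
squares = range(100)        # large enough to contain every x*x below

# phi: (forall x)(exists y)(x*x = y) -- every number has a square
phi = all(any(x * x == y for y in squares) for x in domain)

# psi: (exists y)(forall x)(x*x = y) -- one y is every number's square
psi = any(all(x * x == y for x in domain) for y in squares)

print(phi, psi)  # True False
```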
Exercise 2.1.7. Over the real numbers, which of the following statements are true?
Over the natural numbers?
(i) (∀x)(∃y)(x + y = 0).
(ii) (∃y)(∀x)(x + y = 0).
(iii) (∀x)(∃y)(x ≤ y).
(iv) (∃y)(∀x)(x ≤ y).
(v) (∀x)(∃y)(x < y²).
(vi) (∃y)(∀x)(x < y²).
(vii) (∀x)(∃y)(x ≠ y → x < y).
(viii) (∃y)(∀x)(x ≠ y → x < y).
The order of operations when combining quantification with conjunction or
disjunction can also make the difference between truth and falsehood.
Exercise 2.1.8. Over the real numbers, which of the following statements are true?
Over the natural numbers?
(i) (∀x)(x ≥ 0 ∨ x ≤ 0).
(ii) (∀x)(x ≥ 0) ∨ (∀x)(x ≤ 0).
(iii) (∃x)(x ≤ 0 & x ≥ 5).
(iv) (∃x)(x ≤ 0) & (∃x)(x ≥ 5).
How does negation work for quantifiers? If ∃x φ(x) fails, it means no matter what
value we fill in for x the formula obtained is false; i.e., ¬(∃x φ(x)) ↔ ∀x(¬φ(x)).
Likewise, ¬(∀x φ(x)) ↔ ∃x(¬φ(x)): if φ does not hold for all values of x, there must
be an example for which it fails. If we have multiple quantifiers, the negation walks
in one by one, flipping each quantifier and finally negating the predicate inside. For
example:
¬[(∀x)(∃y)(∀z)(∃w)φ(x, y, z, w)] ↔ (∃x)(∀y)(∃z)(∀w)(¬φ(x, y, z, w)).
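The quantifier negation rules also translate directly into the all/any vocabulary, where they can be checked on a finite domain. A small sketch (the predicate "x is even" is an arbitrary stand-in for φ):

```python
domain = range(10)

def phi(x):
    return x % 2 == 0   # "x is even"; any predicate would do

# not-forall equals exists-not:
not_forall = not all(phi(x) for x in domain)
exists_not = any(not phi(x) for x in domain)

# not-exists equals forall-not:
not_exists = not any(phi(x) for x in domain)
forall_not = all(not phi(x) for x in domain)

print(not_forall == exists_not, not_exists == forall_not)  # True True
```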
Exercise 2.1.9. Negate the following sentences.
(i) (∀x)(∃y)(∀z)((z < y) → (z < x)).
(ii) (∃x)(∀y)(∃z)(xz = y).
(iii) (∀x)(∀y)(∀z)(y = x ∨ z = x ∨ y = z).
(bonus: over what domains of quantification would this be true?)
A final notational comment: you will sometimes see the symbols ⋃ and ⋂, as in
⋃ᵢ Aᵢ = {x : (∃i)(x ∈ Aᵢ)}.
The i under the union or intersection symbol is also sometimes written i ∈ N.
Exercise 2.2.5. For i ∈ N, let Aᵢ = {0, 1, . . . , i} and let Bᵢ = {0, i}. What are
⋃ᵢ Aᵢ, ⋃ᵢ Bᵢ, ⋂ᵢ Aᵢ, and ⋂ᵢ Bᵢ?
When sets are constructed in computability theory, elements are typically put
in a few at a time, stagewise. For set A, we denote the (finite) set of elements
added to A at stage s or earlier as Aₛ, and when the writing is formal enough the
construction will say A is defined as ⋃ₛ Aₛ. When the writing is informal that is
left unsaid, but is still true.
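As a toy illustration of the stagewise picture, the following Python sketch (entirely hypothetical; real constructions decide membership by running other computations) builds the set of perfect squares by adding one element per stage, so each finite stage approximates the infinite union:

```python
# A_s: the finite set of elements added by the end of stage s.
# Here stage s simply contributes s*s; A is the union over all stages.
def A_up_to_stage(s):
    A = set()
    for stage in range(s + 1):
        A.add(stage * stage)   # the element added at this stage
    return A

print(sorted(A_up_to_stage(3)))                  # [0, 1, 4, 9]
print(A_up_to_stage(2) <= A_up_to_stage(3))      # True: stages only grow
```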
If two sets are given by descriptions instead of explicit lists, we must prove one
set is a subset of another by taking an arbitrary element of the first set and showing
it is also a member of the second set. For example, to show the set of people eligible
for President of the United States is a subset of the set of people over 30, we might
say: Consider a person in the first set. That person must meet the criteria listed in
the US Constitution, which includes being at least 35 years of age. Since 35 is more
than 30, the person we chose is a member of the second set.
We can further show that this containment is proper, by demonstrating a member
of the second set who is not a member of the first set. For example, a 40-year-old
Japanese citizen.
Exercise 2.2.6. Prove that the set of squares of even numbers, {x : (∃y)(x = (2y)²)},
is a proper subset of the set of multiples of 4, {x : (∃y)(x = 4y)}.
To prove two sets are equal, there are three options: show the criteria for membership
on each side are the same, manipulate set operations until the expressions
are the same, or show each side is a subset of the other side.
An extremely basic example of the first option is showing {x : x/2, x/4 ∈ N} =
{x : (∃y)(x = 4y)}. For the second, we have a bunch of set identities, things like de
Morgan's Laws,
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ,
and distribution laws,
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
To prove identities we have to turn to the first or third option.
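Before proving identities, it can be reassuring to spot-check them on concrete finite sets; Python's set operators make this immediate (the sample sets and the universe U are arbitrary choices of ours):

```python
# Spot-check of the distribution laws and de Morgan's Laws on sample sets;
# complements are taken relative to a fixed universe U.
U = set(range(10))
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

print(A | (B & C) == (A | B) & (A | C))    # True: union over intersection
print(A & (B | C) == (A & B) | (A & C))    # True: intersection over union
print(U - (A | B) == (U - A) & (U - B))    # True: de Morgan
print(U - (A & B) == (U - A) | (U - B))    # True: de Morgan
```

A check on one example proves nothing, of course, but a single failed check would refute a conjectured identity.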
Example 2.2.7. Prove that A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
We work by showing each set is a subset of the other. Suppose first that
x ∈ A ∪ (B ∩ C). By definition of union, x must be in A or in B ∩ C. If x ∈ A, then
x is in both A ∪ B and A ∪ C, and hence in their intersection. On the other hand,
if x ∈ B ∩ C, then x is in both B and C, and hence again in both A ∪ B and A ∪ C.
Now suppose x ∈ (A ∪ B) ∩ (A ∪ C). Then x is in both unions, A ∪ B and A ∪ C.
If x ∈ A, then x ∈ A ∪ (B ∩ C). If, however, x ∉ A, then x must be in both B and
C, and therefore in B ∩ C. Again, we obtain x ∈ A ∪ (B ∩ C).
Notice that in the ⊆ direction we used two cases that could overlap, and did
not worry whether we were in the overlap or not. In the ⊇ direction, we could only
assert x ∈ B and x ∈ C if we knew x ∉ A (although it is certainly possible for x to
be in all three sets), so forbidding the first case was part of the second case.
Exercise 2.2.8. Using any of the three options listed above, as long as it is applicable,
do the following.
(i) Prove intersection distributes over union (i.e., for all A, B, C, A ∩ (B ∪ C) =
(A ∩ B) ∪ (A ∩ C)).
(ii) Prove de Morgan's Laws.
(iii) Prove that A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B) for any sets A and B.
Our final topic in the realm of sets is cardinality. The cardinality of a finite set
is the number of elements in it. For example, the cardinality of the set of positive
integer divisors of 6 is 4: |{1, 2, 3, 6}| = 4. When we get to infinite sets, cardinality
separates them by how infinite they are. We'll get to its genuine definition in §2.3,
but it is fine now and later to think of cardinality as a synonym for size. The way
to tell whether set A is bigger than set B is to look for a one-to-one function from A
into B. If no such function exists, then A is bigger than B, and we write |B| < |A|.
The most important result is that |A| < |𝒫(A)| for any set A.
If we know there is a one-to-one function from A into B but we don't know
about the reverse direction, we write |A| ≤ |B|. If we have injections both ways,
|A| = |B|. It is a significant theorem of set theory that having injections from A to
B and from B to A is equivalent to having a bijection between A and B; the fact
that this requires work is a demonstration of the fact that things get weird when
you work in the infinite world. Another key fact (for set theorists; not so much for
us) is trichotomy: for any two sets A and B, exactly one of |A| < |B|, |A| > |B|, or
|A| = |B| is true.
For us, infinite cardinalities are divided into two categories. A set is countably
infinite if it has the same cardinality as the natural numbers. The integers and
the rational numbers are important examples of countably infinite sets. The term
countable is used by some authors to mean countably infinite, and by others to
mean finite or countably infinite, so you often have to rely on context. To prove
that a set is countable, you must demonstrate it is in bijection with the natural
numbers; that is, that you can count the objects of your set 1, 2, 3, 4, . . . , and not
miss any. We'll come back to this in §3.4; for now you can look in the appendices to
find Cantor's proofs that the rationals are countable and the reals are not (A.3).
The rest of the infinite cardinalities are called uncountable, and for our purposes
that's about as fine-grained as it gets. The fundamental notions of computability
theory live in the world of countable sets, and the only uncountable ones we get to
are those which can be approximated in the countable world.
2.3 Relations
The following definition is not the most general case, but we'll start with it.
Definition 2.3.1. A relation R(x, y) on a set A is a logical formula that is true or
false of each pair (x, y) ∈ A², never undefined.
We also think of relations as subsets of A² consisting of the pairs for which the
relation is true. For example, in the set A = {1, 2, 3}, the relation < consists of
{(1, 2), (1, 3), (2, 3)} and the relation ≤ is the union of < with {(1, 1), (2, 2), (3, 3)}.
Note that the order matters: although 1 < 2, 2 ≮ 1, so (2, 1) is not in <. The
first definition shows you why these are called relations; we think of R as being
true when the values filled in for x and y have some relationship to each other. The
set-theoretic definition is generally more useful, however.
More generally, we may define n-ary relations on a set A as logical formulas that
are true or false of any n-tuple (ordered set of n elements) of A, or alternatively
as subsets of Aⁿ. For n = 1, 2, 3 we refer to these relations as unary, binary, and
ternary, respectively.
Exercise 2.3.2. Prove the two definitions of relation are equivalent. That is, prove
that every logical predicate corresponds to a unique set, and vice-versa.
Exercise 2.3.3. Let A = {a, b, c, d, e}.
(i) What is the ternary relation R on A defined by (x, y, z) ∈ R ↔ (xyz is an
English word)?
(ii) What is the unary relation on A which is true of elements of A that are vowels?
(iii) What is the complement of the relation in (ii)? We may describe it in two
ways: as the negation of the relation in (ii), and how?
(iv) How many elements are in the 5-ary relation R defined by (v, w, x, y, z) ∈ R
↔ (v, w, x, y, z are all distinct elements of A)?
(v) How many unary relations are possible on A? What other collection associated
with A does the collection of all unary relations correspond to?
Exercise 2.3.4. How many n-ary relations are possible on an m-element set?
We tend to focus on binary relations, since most of our common, useful examples
are binary: <, , =, ,=, , . Binary relations may have certain properties:
Reexivity: (x)R(x, x)
Symmetry: (x, y)[R(x, y) R(y, x)]
i.e., (x, y)[(R(x, y) & R(y, x)) (R(x, y) & R(y, x))]
Antisymmetry: (x, y)[(R(x, y) & R(y, x)) x = y]
Transitivity: (x, y, z)[(R(x, y) & R(y, z)) R(x, z)]
I want to point out that reflexivity is a property of possession: R must have the
reflexive pairs (the pairs (x, x)). Antisymmetry is, loosely, a property of
nonpossession. Symmetry and transitivity, on the other hand, are closure properties: if
R has certain pairs, then it must also have other pairs. Those conditions may be
met either by adding in the pairs that are consequences of the pairs already present,
or omitting the pairs that would require such additions. In particular, the empty
relation is symmetric and transitive, though it is not reflexive.
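For finite relations these closure and possession properties can also be checked
mechanically. Here is a short sketch in Python (not part of the original notes); a
relation is given as a set of ordered pairs, and the underlying set A is supplied
separately where needed:

```python
def is_reflexive(R, A):
    # R must possess every pair (x, x) for x in A
    return all((x, x) in R for x in A)

def is_symmetric(R):
    # closure: whenever (x, y) is present, (y, x) must be too
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    # nonpossession: (x, y) and (y, x) both present forces x = y
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    # closure: (x, y) and (y, z) present forces (x, z) present
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

# The empty relation is symmetric and transitive, but not reflexive
# on a nonempty set, as noted above.
A = {1, 2}
assert is_symmetric(set()) and is_transitive(set())
assert not is_reflexive(set(), A)
```

The same functions can be used to check the relations in the exercises below by hand-free computation.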
Exercise 2.3.5. Is = reflexive? Symmetric? Antisymmetric? Transitive? How
about ≠?
Exercise 2.3.6. For finite relations we may check these properties by hand. Let
A = {1, 2, 3, 4}.
(a) What is the smallest binary relation on A that is reexive?
(b) Define the following binary relations on A:
R_1 = {(2, 3), (3, 4), (4, 2)}
R_2 = {(1, 1), (1, 2), (2, 1), (2, 2)}
R_3 = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4)}
For each of those relations, answer the following questions.
(i) Is the relation reflexive? Symmetric? Antisymmetric? Transitive?
(ii) If the relation is not reflexive, what is the smallest collection of pairs that
need to be added to make it reflexive?
(iii) If the relation is not symmetric, what is the smallest collection of pairs
that need to be added to make it symmetric?
(iv) If the relation is not transitive, what is the smallest collection of pairs that
need to be added to make it transitive?
(v) If the relation is not antisymmetric, what is the smallest collection of pairs
that could be removed to make it antisymmetric? Is this answer unique?
Exercise 2.3.7. Let A = {1, 2, 3}. Define binary relations on A with the following
combinations of properties or say why such a relation cannot exist. Can such a
relation be nonempty?
(i) Reflexive and antisymmetric but neither symmetric nor transitive.
(ii) Symmetric but neither reflexive nor transitive.
(iii) Transitive but neither reflexive nor symmetric.
(iv) Symmetric and transitive but not reflexive.
(v) Both symmetric and antisymmetric.
(vi) Neither symmetric nor antisymmetric.
(vii) Reflexive and transitive but not symmetric.
(viii) Reflexive and symmetric but not transitive.
(ix) Symmetric, antisymmetric, and transitive.
(x) Reflexive, symmetric, and transitive.
(xi) None of reflexive, symmetric, or transitive.
Exercise 2.3.8. Suppose R and S are binary relations on A. For each of the
following properties, if R and S possess the property, must R ∩ S possess it? R ∪ S?
(i) Reflexivity
(ii) Symmetry
(iii) Antisymmetry
(iv) Transitivity
Exercise 2.3.9. Each of the following relations has a simpler description than the
one given. Find such a description.
(i) R_∩ on P(N) where R_∩(A, B) ↔ A ∩ B = ∅.
(ii) R_() on R where R_()(x, y) ↔ (−∞, x) ∩ (y, ∞) = ∅.
(iii) R_[] on R where R_[](x, y) ↔ (−∞, x] ∩ [y, ∞) = ∅.
(iv) R_() on R where R_()(x, y) ↔ (−∞, x) ∪ (y, ∞) = R.
(v) R_[] on R where R_[](x, y) ↔ (−∞, x] ∪ [y, ∞) = R.
We may visualize a binary relation R on A as a directed graph. The elements
of A are the vertices, or nodes, of the graph, and there is an arrow (directed edge)
from vertex x to vertex y if and only if R(x, y) holds. The four properties we have
just been exploring may be stated as:
Reflexivity: every vertex has a loop.
Symmetry: for any pair of vertices, either there are edges in both directions
or there are no edges between them.
Antisymmetry: for two distinct vertices there is at most one edge connecting
them.
Transitivity: if there is a path of edges from one vertex to another (always
proceeding in the direction of the edge), there is an edge directly connecting
them, in the same direction as the path.
Exercise 2.3.10. Properly speaking, transitivity just gives the graphical
interpretation "for any vertices x, y, z, if there is an edge from x to y and an edge from y
to z, there is an edge from x to z." Prove that this statement is equivalent to the
one given for transitivity above.
We will consider two subsets of these properties that define classes of relations
which are of particular importance.
Definition 2.3.11. An equivalence relation is a binary relation that is reflexive,
symmetric, and transitive.
The quintessential equivalence relation is equality, which is the relation consisting
of only the reexive pairs. What is special about an equivalence relation? We can
take a quotient structure whose elements are equivalence classes.
Definition 2.3.12. Let R be an equivalence relation on A. The equivalence class
of some x ∈ A is the set [x] = {y ∈ A : R(x, y)}.
Exercise 2.3.13. Let R be an equivalence relation on A and let x, y be elements
of A. Prove that either [x] = [y] or [x] ∩ [y] = ∅.
In short, an equivalence relation puts all the elements of the set into boxes so
that each element is unambiguously assigned to a single box. Within each box all
possible pairings are in the relation, and no pairings that draw from different boxes
are in the relation. We can consider the boxes themselves as elements, getting a
quotient structure.
Definition 2.3.14. Given a set A and an equivalence relation R on A, the quotient
of A by R, A/R, is the set whose elements are the equivalence classes of A under R.
Now we can define cardinality more correctly. The cardinality of a set is the
equivalence class it belongs to under the equivalence relation of bijectivity, so
cardinalities are elements of the quotient of the collection of all sets under that relation.
Exercise 2.3.15. Let A be the set {1, 2, 3, 4, 5}, and let R be the binary relation on
A that consists of the reflexive pairs together with (1, 2), (2, 1), (3, 4), (3, 5), (4, 3),
(4, 5), (5, 3), (5, 4).
(i) Represent R as a graph.
(ii) How many elements does A/R have?
(iii) Write out the sets [1], [2], and [3].
Exercise 2.3.16. A partition of a set A is a collection of disjoint subsets of A with
union equal to A. Prove that any partition of A determines an equivalence relation
on A, and every equivalence relation on A determines a partition of A.
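The boxes metaphor can be made concrete for finite sets: given an equivalence
relation as a set of pairs, the classes and the quotient can be computed directly.
A small Python sketch (the helper names are ours, not the notes'):

```python
def equivalence_class(R, x):
    # [x] = {y : R(x, y)}
    return frozenset(y for (a, y) in R if a == x)

def quotient(R, A):
    # A/R: the set whose elements are the equivalence classes
    return {equivalence_class(R, x) for x in A}

# reflexive pairs on {1, 2, 3} together with (1, 2), (2, 1),
# so [1] = [2] = {1, 2} and [3] = {3}
A = {1, 2, 3}
R = {(x, x) for x in A} | {(1, 2), (2, 1)}
assert quotient(R, A) == {frozenset({1, 2}), frozenset({3})}
```

Note that each class appears only once in the quotient even though it is reached from each of its members, mirroring Exercise 2.3.13.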
Exercise 2.3.17. Let R(m, n) be the relation on Z that holds when m − n is a
multiple of 3.
(i) Prove that R is an equivalence relation.
(ii) What are the equivalence classes of 1, 2, and 3?
(iii) What are the equivalence classes of −1, −2, and −3?
(iv) Prove that Z/R has three elements.
Exercise 2.3.18. Let R(m, n) be the relation on N that holds when m − n is even.
(i) Prove that R is an equivalence relation.
(ii) What are the equivalence classes of R? Give a concise verbal description of
each.
The two exercises above are examples of modular arithmetic, also sometimes
called clock-face arithmetic because its most widespread use in day-to-day life is
telling what time it will be some hours from now. This is a notion that is used only
in N and Z. The idea of modular arithmetic is that it is only the number's remainder
upon division by a fixed value that matters. For clock-face arithmetic that value is
12; we say we are working modulo 12, or just mod 12, and the equivalence classes are
represented by the numbers 0 through 11 (in mathematics; 1 through 12 in usual
life). The fact that if it is currently 7:00 then in eight hours it will be 3:00 would
be written as the equation
7 + 8 = 3 (mod 12),
where ≡ is sometimes used in place of the equals sign.
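The clock-face computation corresponds exactly to the remainder operator, % in
Python; a quick sketch:

```python
# 7:00 plus eight hours is 3:00: work mod 12
assert (7 + 8) % 12 == 3

# representatives are 0 through 11; e.g. 12:00 is represented by 0
assert 12 % 12 == 0

# the relation of Exercise 2.3.17: m and n are equivalent
# exactly when m - n is a multiple of 3
assert (7 - 1) % 3 == 0   # 7 and 1 lie in the same class mod 3
assert (5 - 1) % 3 != 0   # 5 and 1 do not
```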
Exercise 2.3.19. (i) Exercises 2.3.17 and 2.3.18 consider equivalence relations
that give rise to arithmetic mod k for some k. For each, what is the correct
value of k?
(ii) Describe the equivalence relation on Z that gives rise to arithmetic mod 12.
(iii) Let m, n, and p be integers. Prove that
n = m (mod 12) ⇒ n + p = m + p (mod 12).
That is, it doesn't matter which representative of the equivalence class you
pick to do your addition.
The second important class of relations we will look at is partial orders.
Definition 2.3.20. A partial order ≤ on a set A is a binary relation that is reflexive,
antisymmetric, and transitive. A with ≤ is called a partially ordered set, or poset.
In a poset, given two nonequal elements of A, either one is strictly greater than
the other or they are incomparable. If all pairs of elements are comparable, the
relation is called a total order or linear order on A.
Example 2.3.21. Let A = {a, b, c, d, e} and define ≤ on A as follows:
(∀x ∈ A)(x ≤ x)
a ≤ c, a ≤ d
b ≤ d, b ≤ e
We could graph this as follows:
c d e
a b
Example 2.3.22. P(N) ordered by subset inclusion is a partially ordered set.
It is easy to check the relation is reflexive, transitive, and antisymmetric. Not
every pair of elements is comparable: for example, neither {1, 2, 3} nor {4, 5, 6} is
a subset of the other. This poset actually has some very nice properties that not
every poset has: it has a top element (N) and a bottom element (∅), and every pair
of elements has both a least upper bound (here, the union) and a greatest lower
bound (the intersection).
If we were to graph this, it would look like an infinitely-faceted diamond with
points at the top and bottom.
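For finite sets these bounds can be seen directly in Python, where <= on sets is
subset inclusion and | and & are union and intersection (a sketch, not part of the
notes):

```python
# two incomparable elements of the poset: neither is a subset of the other
X, Y = {1, 2, 3}, {4, 5, 6}
assert not X <= Y and not Y <= X

# their least upper bound is the union,
# their greatest lower bound the intersection
assert X | Y == {1, 2, 3, 4, 5, 6}
assert X & Y == set()

# the union is an upper bound of both in the subset order
assert X <= X | Y and Y <= X | Y
```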
Example 2.3.23. Along the same lines as Example 2.3.22, we can consider the
power set of a finite set, and then we can graph the poset that results.
Let A = {a, b, c}. Denote the set {a} by a and the set {b, c} by ā, and likewise
for the other three elements. The graph is then as follows:
A
ā   b̄   c̄
a   b   c
∅.
Proof. We work by induction. For n = 3, the polygon in question is a triangle, and
it has interior angles which sum to 180° = (3 − 2) · 180°.
Assume the theorem holds for some n ≥ 3 and consider a convex polygon with
n + 1 vertices. Let one of the vertices be named x, and pick a vertex y such that
along the perimeter from x in one direction there is a single vertex between x and
y, and in the opposite direction, (n + 1) − 3 = n − 2 vertices. Join x and y by a new
edge, dividing our original polygon into two polygons. The new polygons' interior
angles together sum to the sum of the original polygon's interior angles. One of the
new polygons has 3 vertices and the other n vertices (x, y, and the n − 2 vertices
between them). The triangle has interior angle sum 180°, and by the inductive
hypothesis the n-vertex polygon has interior angle sum (n − 2) · 180°, for a total of
180° + (n − 2) · 180° = ((n + 1) − 2) · 180°, as desired.
Notice also in this example that we used the base case as part of the inductive
step, since one of the two polygons was a triangle. This is not uncommon.
Exercise 2.4.3. Prove the following statements by induction.
(i) For every positive integer n,
1 + 4 + 7 + . . . + (3n − 2) = (1/2) n(3n − 1).
(ii) For every positive integer n,
2^1 + 2^2 + . . . + 2^n = 2^{n+1} − 2.
(iii) For every positive integer n,
n^3/3 + n^5/5 + 7n/15
is an integer.
(iv) For every positive integer n, 4^n − 1 is divisible by 3.
(v) The sequence a_0, a_1, a_2, . . . defined by a_0 = 0, a_{n+1} = (a_n + 1)/2 is bounded above
by 1.
(vi) Recall that for a binary operation ∗ on a set A associativity is defined as "for
any x, y, z, (x ∗ y) ∗ z = x ∗ (y ∗ z)." Use induction to prove that for any
collection of n elements from A put together with ∗, n ≥ 3, any grouping of
the elements which preserves order will give the same result.
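Before writing the inductive proofs, it can help to sanity-check the identities
numerically. A quick Python sketch for parts (i)-(iv):

```python
for n in range(1, 50):
    # (i) 1 + 4 + 7 + ... + (3n - 2) = n(3n - 1)/2
    assert sum(3*k - 2 for k in range(1, n + 1)) == n*(3*n - 1)//2
    # (ii) 2^1 + ... + 2^n = 2^(n+1) - 2
    assert sum(2**k for k in range(1, n + 1)) == 2**(n + 1) - 2
    # (iii) n^3/3 + n^5/5 + 7n/15 is an integer,
    # i.e. 15 divides 5n^3 + 3n^5 + 7n
    assert (5*n**3 + 3*n**5 + 7*n) % 15 == 0
    # (iv) 4^n - 1 is divisible by 3
    assert (4**n - 1) % 3 == 0
```

Of course, checking finitely many cases proves nothing about all n; that is what the induction is for.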
Exercise 2.4.4. A graph consists of vertices and edges. Each edge has a vertex
at each end (they may be the same vertex). Each vertex has a degree, which is
the number of edge endpoints at that vertex (so if an edge connects two distinct
vertices, it contributes 1 to each of their degrees, and if it is a loop on one vertex,
it contributes 2 to that vertex's degree). It is possible to prove without induction
that for a graph the sum of the degrees of the vertices is twice the number of edges.
Find a proof of that fact using
(a) induction on the number of vertices;
(b) induction on the number of edges.
Exercise 2.4.5. The Towers of Hanoi is a puzzle consisting of a board with three
pegs sticking up out of it and a collection of disks that fit on the pegs, each with
a different diameter. The disks are placed on a single peg in order of size (smallest
on top) and the goal is to move the entire stack to a different peg. A move consists
of removing the top disk from any peg and placing it on another peg; a disk may
never be placed on top of a smaller disk.
Determine how many moves it requires to solve the puzzle when there are n
disks, and prove your answer by induction.
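To gather data for a conjecture, you can count the moves used by the standard
recursive strategy (move n − 1 disks aside, move the largest, move the n − 1 back).
A sketch; note that computing a few values does not replace the inductive proof:

```python
def hanoi_moves(n):
    # move n-1 disks out of the way, move the largest disk,
    # then move the n-1 disks back on top of it
    if n == 0:
        return 0
    return 2 * hanoi_moves(n - 1) + 1

# first few values, from which to conjecture a closed form
data = [hanoi_moves(n) for n in range(1, 6)]
```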
Recursion
To define a class recursively means to define it via a set of basic objects and a set
of rules allowing you to extend the set of basic objects. We may give some simple
examples.
Example 2.4.6. The natural numbers may be defined recursively as follows:
0 ∈ N.
if n ∈ N, then n + 1 ∈ N.
Example 2.4.7. The well-formed formulas (wffs) in propositional logic are a
recursively defined class.
Any propositional symbol P, Q, R, etc., is a wff.
If φ and ψ are wffs, so are the following:
(i) (φ & ψ);
(ii) (φ ∨ ψ);
(iii) (φ → ψ);
(iv) (φ ↔ ψ);
(v) (¬φ).
The important fact, which gives the strength of this method of definition, is that
we may apply the building-up rules repeatedly to get more and more complicated
objects.
For example, ((A&B) ∨ ((P&Q) → (¬A))) is a wff, as we can prove by giving a
construction procedure for it. A, B, P, and Q are all basic wffs. We combine them
into (A&B) and (P&Q) by operation (i), obtain (¬A) from (v), ((P&Q) → (¬A))
from (iii), and finally our original formula by (ii).
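The construction procedure above can be mirrored in code, building wffs bottom-up
as strings. A sketch using ASCII stand-ins for the connectives ('&', 'v', '->', '~'):

```python
def conj(p, q):
    return "(" + p + "&" + q + ")"

def disj(p, q):
    return "(" + p + "v" + q + ")"

def implies(p, q):
    return "(" + p + "->" + q + ")"

def neg(p):
    return "(~" + p + ")"

# build ((A&B) v ((P&Q) -> (~A))) exactly as in the text
ab, pq = conj("A", "B"), conj("P", "Q")   # rule (i), twice
na = neg("A")                             # rule (v)
cond = implies(pq, na)                    # rule (iii)
wff = disj(ab, cond)                      # rule (ii)
assert wff == "((A&B)v((P&Q)->(~A)))"
```

Every string produced by these four functions from propositional symbols is a wff, because each function application is one legal building-up step.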
Exercise 2.4.8. (i) Prove that ((A ∨ (B&C)) → C) is a wff.
(ii) Prove that (P ∨ Q( is not a wff.
Exercise 2.4.9. (i) Add a building-up rule to the recursive definition of N to get
a recursive definition of Z.
(ii) Add a building-up rule to the recursive definition of Z to get a recursive
definition of Q.
Exercise 2.4.10. Write a recursive definition of the rational functions in x, those
functions which can be written as a fraction of two polynomials of x. Your basic
objects should be x and all real numbers. For this exercise, don't worry about the
problem of division by zero.
We may also define functions recursively. For that, we say what f(0) is (or
whatever our basic object is) and then define f(n + 1) in terms of f(n). For example,
(n + 1)! = (n + 1) · n!, with 0! = 1, is factorial, a recursively defined function you've
probably seen before. We could write a recursive definition for addition of natural
numbers:
a(0, 0) = 0;
a(m + 1, n) = a(m, n) + 1;
a(m, n + 1) = a(m, n) + 1.
This looks lumpy but is actually used in logic in order to minimize the number of
operations that we take as fundamental: this definition of addition is all in terms
of successor, the plus-one function.
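The definition translates directly into code, with every step reduced to successor
(a sketch, not part of the notes):

```python
def successor(n):
    return n + 1

def add(m, n):
    # a(0, 0) = 0; a(m+1, n) = a(m, n) + 1; a(m, n+1) = a(m, n) + 1
    if m == 0 and n == 0:
        return 0
    if m > 0:
        return successor(add(m - 1, n))
    return successor(add(m, n - 1))

assert add(3, 4) == 7
```

Each recursive call strips one off an argument and applies successor once, so add(m, n) applies successor exactly m + n times to 0.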
Exercise 2.4.11. Write a recursive definition of p(m, n) = m · n, on the natural
numbers, in terms of addition.
Induction Again
Beyond simply resembling each other, induction and recursion have a strong tie in
proofs. To prove something about a recursively-defined class requires induction.
This use of induction is less codified than the induction on N we saw above. In fact,
the limited version of induction we saw above is simply the induction that goes with
the recursively-defined set of natural numbers, as in Example 2.4.6. Let's explore
how this works in general.
The base case of the inductive argument will match the basic objects of the
recursive class. The inductive step will come from the operations that build up the
rest of the class. If they match exactly, you are showing the set of objects that have
a certain property contains the basic objects of the class and is closed under the
operations of the class, and hence must be the entire class.
Example 2.4.12. Consider the class of wffs, defined in Example 2.4.7. We may
prove by induction that for any wff φ, the number of positions where binary
connective symbols occur in φ (that is, &, ∨, →, and ↔) is one less than the number
of positions where propositional symbols occur in φ.
Proof. For any propositional symbol, the number of propositional symbols is 1
and the number of binary connectives is 0, one less than 1.
Suppose by induction that p_1 = c_1 + 1 and p_2 = c_2 + 1 for p_1, p_2 the number of
propositional symbols and c_1, c_2 the number of binary connectives in the wffs φ, ψ,
respectively. The number of propositional symbols in (φ Q ψ), for Q any of ∨, &, →,
and ↔, is p_1 + p_2, and the number of connective symbols is c_1 + c_2 + 1. By the
inductive hypothesis we see
p_1 + p_2 = c_1 + 1 + c_2 + 1 = (c_1 + c_2 + 1) + 1,
so the claim holds for (φ Q ψ).
Finally, consider (¬φ). Here the number of binary connectives and propositional
symbols have not changed, so the claim still holds.
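The counting argument can be checked mechanically on sample formulas. A sketch
representing wffs as nested tuples (a bare string is a propositional symbol, ("~", w)
is a negation, and (w1, op, w2) is a binary connective):

```python
def counts(w):
    # return (propositional-symbol positions, binary-connective positions)
    if isinstance(w, str):        # a propositional symbol
        return (1, 0)
    if w[0] == "~":               # negation: ("~", phi)
        return counts(w[1])
    p1, c1 = counts(w[0])         # binary: (phi, op, psi)
    p2, c2 = counts(w[2])
    return (p1 + p2, c1 + c2 + 1)

# ((A&B) v ((P&Q) -> (~A))): 5 propositional-symbol positions
# (A occurs twice), 4 binary connectives
wff = (("A", "&", "B"), "v", (("P", "&", "Q"), "->", ("~", "A")))
p, c = counts(wff)
assert p == c + 1
```

The recursion in counts follows the recursive definition of the class exactly, which is why the inductive proof and the code have the same shape.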
Exercise 2.4.13. Suppose φ is a wff which does not contain negation (that is, it
comes from the class defined as in Example 2.4.7 but without closure operation (v)).
Prove by induction that the length of φ is of the form 4k + 1 for some k ≥ 0, and
that the number of positions at which propositional symbols occur is k + 1 (for the
same k).
Note that we can perform induction on N to get results about other recursively-
defined classes if we are careful. For wffs, we might induct on the number of
propositional symbols or the number of binary connectives, for instance.
Exercise 2.4.14. Recall from calculus that a function f is continuous at a if f(a) is
defined and equals lim_{x→a} f(x). Recall also the limit laws, which may be summarized
for our purposes as
lim_{x→a} (f(x) ∗ g(x)) = (lim_{x→a} f(x)) ∗ (lim_{x→a} g(x)), ∗ ∈ {+, −, ·, /},
as long as both limits on the right are defined and if ∗ = / then lim_{x→a} g(x) ≠ 0.
Using those, the basic limits lim_{x→a} x = a and lim_{x→a} c = c for all constants c, and
your recursive definition from Exercise 2.4.10, prove that every rational function is
continuous on its entire domain.
Exercise 2.4.15. Using the recursive definition of addition from the previous
section (a(0, 0) = 0; a(m + 1, n) = a(m, n + 1) = a(m, n) + 1), prove that addition is
commutative (i.e., for all m and n, a(m, n) = a(n, m)).
2.5 Some Notes on Proofs and Abstraction
Definitions
Definitions in mathematics are somewhat different from definitions in English. In
natural language, the definition of a word is determined by the usage and may evolve.
For example, broadcasting was originally just a way of sowing seed. Someone used
it by analogy to mean spreading messages widely, and then it was adopted for radio
and TV. For speakers of present-day English I doubt the original planting meaning
is ever the first to come to mind.
In contrast, in mathematics we begin with the definition and assign a term to it
as a shorthand. That term then denotes exactly the objects which fulfill the terms of
the definition. To say something is "by definition impossible" has a rigorous meaning
in mathematics: if it contradicts one of the properties of the definition, it cannot
hold of an object to which we apply the term.
Mathematical definitions do not have the fluidity of natural language definitions.
Sometimes mathematical terms are used to mean more than one thing, but that is
a re-use of the term and not an evolution of the definition. Furthermore,
mathematicians dislike that because it leads to ambiguity (exactly what is being meant
by this term in this context?), which defeats the purpose of mathematical terms in
the first place: to serve as shorthand for specific lists of properties.
Proofs
There is no way to learn how to write proofs without actually writing them, but I
hope you will refer back to this section from time to time.
A proof is an object of convincing. It should be an explicit, specic, logically
sound argument that walks step by step from the hypotheses to the conclusions.
That is, avoid vagueness and leaps of deduction, and strip out irrelevant statements.
Make your proof self-contained except for explicit reference to definitions or previous
results (i.e., don't assume your reader is so familiar with the theorems that you may
use them without comment; instead say "by Theorem 2.5, . . .").
Our proofs will be very verbal; they will bear little to no resemblance to the
two-column proofs of high school geometry. A proof which is just strings of symbols with
only a few words is unlikely to be a good (or even understandable) proof. However,
avoiding symbols altogether can be clumsy and can expand proofs out of readability.
It is also important for specificity to assign symbolic names to (arbitrary) numbers
and other objects to which you will want to refer. Striking the symbol/word balance
is a big step on the way to learning to write good proofs.
Your audience is a person who is familiar with the underlying definitions used
in the statement being proved, but not the statement itself. For instance, it could
be yourself after you learned the definitions, but before you had begun work on the
proof. You do not have to put every tiny painful step in the write-up, but be careful
about what you assume of the reader's ability to fill in gaps. Your goal is to convince
the reader of the truth of the statement, and that requires the reader to understand
the proof. Along those lines, it is often helpful to insert small statements (I call it
"foreshadowing" or "telegraphing") that let the reader know why you are doing what
you are currently doing, and where you intend to go with it. In particular, when
working by contradiction or induction, it is important to let the reader know at the
beginning.
Cautionary notes:
* Be careful to state what you are trying to prove in such a way that it does not
appear you are asserting its truth prior to proving it.
* If you have a denition before you of a particular concept and are asked to prove
something about the concept, you must stick to the denition.
* Be wary of mentally adding words like only, for all, for every, or for some which
are not actually there; likewise if you are asked to prove an implication it is likely
the converse does not hold, so if you prove equivalence you will be in error.
* If you are asked to prove something holds of all objects of some type, you cannot
pick a specific example and show the property holds of that object; it is not a proof
that it works for all. Instead give a symbolic name to an arbitrary example and
prove the property holds using only facts that are true for all objects of the given
type.
* There is a place for words like would, could, should, might, and ought in proofs,
but they should be kept to a minimum. Most of the time the appropriate words are
has, will, does, and is. This is especially important in proofs by contradiction. Since
in such a proof you are assuming something which is not true, it may feel more
natural to use the subjunctive, but that comes across as tentative. You assume
some hypothesis; given that hypothesis other statements are or are not true. Be
bold and let the whole contraption go up in flames when it runs into the statement
it contradicts.
* And finally, though math class is indeed not English class, sentence fragments
and tortured grammar have no place in mathematical proofs. If a sentence seems
strained, try rearranging it, possibly involving the neighboring sentences. Do not
fear to edit: the goal is a readable proof that does not require too much back-and-
forth to understand.
Exercise 2.5.1. Here are some proofs you can try that don't involve induction:
(i) (∃m)(∃n)(3m + 5n = 12) (over N)
(ii) For any integer n, the number n^2 + n + 1 is odd.
(iii) If every even natural number greater than 2 is the sum of two primes, then
every odd natural number greater than 5 is the sum of three primes.
(iv) For nonempty sets A and B, A × B = B × A if and only if A = B.
Chapter 3
Defining Computability
There are many ways we could try to get a handle on the concept of computability.
We could think of all possible computer programs, or a class of functions defined in
a way that feels more algebraic. Many definitions which seem to come from widely
disparate viewpoints actually define the same collection of functions, which gives us
some claim to calling that collection the computable functions (see 3.5).
3.1 Functions, Sets, and Sequences
We mention three aspects of functions important to computability before beginning.
Limits
Our functions take only whole-number values. Therefore, for the limit lim_{n→∞} f(n)
to exist, f must eventually be constant. If it changes values infinitely many times,
the limit simply doesn't exist.
In computability we typically abbreviate our limit notation, as well. It would be
more common to see the limit above written as lim_n f(n).
Partial Functions
Let's go back to calculus, or possibly even algebra. A function definition is supposed
to include not only the rule that associates domain elements with range elements, but
also the domain. However, in calculus, we abuse this to give functions as algebraic
formulas that calculate a range element from a domain element, and don't specify
their domains; instead we say their domain is all elements of R on which they are
defined. However, we treat these functions as though their domain is actually all of
R, and talk about, for example, values at which the function is discontinuous.
Here we take that mentality and make it official. In computability we use partial
functions on N, functions which take elements of some subset of N as inputs,
and produce elements of N as outputs. When applied to a collection of functions,
"partial" means "partial or total," though "the partial function f" may generally be
read as saying f's domain is a proper subset of N.
The intuition here is that the function is a computational procedure which may
legally be given any natural number as input, but might go into an infinite loop on
certain inputs and never output a result. Because we want to allow all computational
procedures, we have to work with this possibility.
Most basically, we need notation. If x is in the domain of f, we write f(x)↓ and
say the computation halts, or converges. We might specify halting when saying what
the output of the function is, f(x)↓ = y, though there the ↓ is fairly superfluous.
When x is not in the domain of f we say the computation diverges and write f(x)↑.
We also still talk about f(x), and by extension the computation, being defined or
undefined.
For total functions f and g, we say f = g if (∀x)(f(x) = g(x)). When f and g
may be partial, we require a little more: f = g means
(∀x)[(f(x)↓ ↔ g(x)↓) & (f(x) = y → g(x) = y)].
Some authors write this as f ≃ g to distinguish it from equality for total functions
and to highlight the fact that f and g might be partial.
Finally, when the function meant is clear, f(x) = y may be written x ↦ y.
Ones and Zeros
In computability, as in many fields of mathematics, we use certain terms and
notation interchangeably even though technically they define different objects, because
in some deep sense those objects aren't different at all. We begin here with a
definition.
Definition 3.1.1. For a set A, the characteristic function of A is the following total
function:
χ_A(n) = { 1  if n ∈ A
           0  if n ∉ A
In the literature, χ_A is often represented simply by A, so, for instance, we can
say φ_e = A to mean φ_e = χ_A, as well as saying A(n) to mean χ_A(n) (so A(n) = 1
is another way to say n ∈ A). Additionally, we may conflate the function and set
with the binary sequence that is the outputs of the function in order of input size.
Example 3.1.2. The sequence 1010101010. . . can represent
(i) The set of even numbers, {0, 2, 4, . . .};
(ii) The function f(n) = (n + 1) mod 2.
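The three guises of a set — set, characteristic function, and binary sequence — can
be put side by side in a short sketch (names are ours):

```python
def chi_evens(n):
    # characteristic function of the set of even numbers
    return 1 if n % 2 == 0 else 0

# the corresponding binary sequence: outputs in order of input size
seq = "".join(str(chi_evens(n)) for n in range(10))
assert seq == "1010101010"

# A(n) = 1 is another way to say n is in A
assert chi_evens(4) == 1
assert chi_evens(7) == 0
```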
Exercise 3.1.3. Construct bijections between (i) and (ii), (ii) and (iii), and (i) and
(iii) below, and prove they are bijections.
(i) Infinite binary sequences, 2^N.
(ii) Total functions from N to {0, 1}.
(iii) Subsets of N.
Sometimes it is useful to limit ourselves to nite objects.
Exercise 3.1.4. Construct bijections between (i) and (ii), (ii) and (iii), and (i) and
(iii) below, and prove they are bijections.
(i) Finite binary sequences, 2^{<N}.
(ii) Finite subsets of N.
(iii) N.
3.2 Turing Machines
Our first rigorous definition of computation is due to Turing [59].
A Turing machine (TM) is an idealized computer which has a tape it can read
from and write on, a head which does that reading and writing and which moves
back and forth along the tape, and an internal state which may be changed based
on what's happening on the tape. Everything here is discrete: we think of the
tape as being divided into squares, each of which can hold one symbol, and the
read/write head as resting on an individual square and moving from square to
square. We specify Turing machines via quadruples ⟨a, b, c, d⟩, sets of instructions
that are decoded as follows:
a is the state the TM is currently in;
b is the symbol the TM's head is currently reading;
c is an instruction to the head to write or move;
d is the state the TM is in at the end of the instruction's execution.
For example, ⟨q_3, 0, R, q_3⟩ means "if I am in state q_3 and currently reading a 0, move
one square to the right and remain in state q_3." The instruction ⟨q_0, 1, 0, q_1⟩ means
"if I am in state q_0 and reading a 1, overwrite that 1 with a 0 and change to state q_1."
The symbol in position c may also be a blank, indicating the machine should erase
whatever symbol it is reading. For any fixed a, b, there is at most one quadruple.
It is not necessary that there be any instruction at all; the computation may halt
by hitting a dead end.
Since Turing machines represent idealized computers, we allow them unlimited
time and memory to perform their computations. Not infinite time or memory, but
we can't bound them from the beginning; what if our bound was just one step short
of completion or one square of tape too small? So the TM's tape is infinite, though
any given computation uses only a finite length of it.
The symbols and states come from a finite list and hence the collection of
instructions must be finite. It does not matter how long the lists are; generally we
stick to the symbols 0, 1, and blank (␣), or even just 1 and ␣, but allow arbitrarily
long lists of states, mostly because this is the mode that lends itself best to writing
descriptions of machines. Note that some authors distinguish legal halting states
from other states, and consider dead-ending in a non-halting state equivalent to
entering an infinite loop. They may also require the read/write head to end up on
a particular end of the tape contents. This is all to make proofs easier, and it does
not reduce the power of the machines. For us, however, all states are legal halting
states and the read/write head can end up anywhere.
Example 3.2.1. Let's begin by writing a Turing machine that outputs x + 1 in
tally notation given input x in tally notation (i.e., the tape begins by holding x
consecutive 1s and ends with x + 1 consecutive 1s). Here is a sample input tape:
1 1 1 1
The arrow indicates the starting position of the read/write head; we are allowed
to specify that the input must be in tally notation and the TM's head be positioned
at the leftmost 1. Our desired computation is
move right to first ␣
write 1
halt.
Therefore we write two instructions, letting halting happen because of an absence
of relevant instructions:
⟨q_0, 1, R, q_0⟩   move R as long as you see 1
⟨q_0, ␣, 1, q_1⟩   when you see ␣, write 1 and change state
Since we specified what tape content and head position we were writing a
machine for, these are sufficient: we know the only time the machine will read a ␣ from
state q_0 will be the first blank at the end of x.
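The quadruple conventions are easy to simulate. Here is a sketch of an interpreter
(ours, not part of the notes) run on the two-instruction successor machine, with
None standing for a blank square:

```python
def run(program, tape, state="q0", head=0, max_steps=1000):
    # program: {(state, symbol): (action, new_state)}, where action is
    # "R", "L", or a symbol to write; None represents a blank square
    tape = dict(enumerate(tape))
    for _ in range(max_steps):
        key = (state, tape.get(head))
        if key not in program:      # dead end: the machine halts
            break
        action, state = program[key]
        if action == "R":
            head += 1
        elif action == "L":
            head -= 1
        else:
            tape[head] = action
    return tape

# the successor machine: move right over 1s, write a 1 on the first blank
succ = {("q0", 1): ("R", "q0"),
        ("q0", None): (1, "q1")}

result = run(succ, [1, 1, 1, 1])    # input: 4 in tally notation
assert sum(1 for v in result.values() if v == 1) == 5   # output: 5 tallies
```

The max_steps cap stands in for the fact that a simulator, unlike the idealized machine, cannot be allowed to loop forever.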
What about binary notation instead? For example:
0 1 1 0 0 1
⟨q_0, 1, 0, q_2⟩
⟨q_2, 0, R, q_0⟩
⟨q_0, ␣, 1, q_1⟩
Exercise 3.2.2. Step through the program above with the following tapes, where
you may assume the read/write head begins at the leftmost non-blank square. Write
the contents of the tape, position of read/write head, and current state of the
machine for each step.
(i) 0 1 1
(ii) 1 0 1
(iii) 1 1 1
Exercise 3.2.3. Write a Turing machine to compute the function f(x) = 4x. Use
tally or binary notation as desired.
Exercise 3.2.4. This exercise will walk you through writing an inverter, a Turing
machine that, given a string of 1s and 0s, outputs the reversal of that string.
Here is what ought to happen, in outline:
0 1 1   (input)
Instruction Block A
Instruction Block B (iterated)
Instruction Block C
1 1 0   (output)
halt.
These three blocks of states, if written correctly, will allow the machine to deal
with arbitrarily long symbol strings. Longer strings will result in more iterations of
B, but A and C occur only once apiece.
Use state to remember which symbol to print and to figure out which block of
symbols you're currently walking through (switch state at blanks). A complication
is knowing when to stop: once the last symbol has been erased, how do you know
not to walk leftward forever? Step one extra space left to see if what you just read
was the last symbol (i.e., to see if the next spot is blank or not) and use state to
account for a yes or no answer.
Exercise 3.2.5. Write a Turing machine to compute the function f(x) = x mod 3.
Use tally or binary notation as desired.
3.3 Partial Recursive Functions

Turing's machine definition of computability was far from the only competitor on
the field. We will only explore one other in depth, but survey a few more in the
next section. The partial recursive functions, where "recursive" is used as in §2.4,
were Kleene's contribution.
Primitive Recursive Functions
We begin with a more restricted set of functions, the primitive recursive functions.
This definition can be a little opaque at first, so we will state it and then discuss it.

Definition 3.3.1. The class of primitive recursive functions is the smallest class 𝒞
of functions such that the following hold.

(i) The successor function S(x) = x + 1 is in 𝒞.

(ii) All constant functions M(x_1, x_2, . . . , x_n) = m for n, m ∈ ℕ are in 𝒞.

(iii) All projection functions P^n_i(x_1, x_2, . . . , x_n) = x_i for n ≥ 1, 1 ≤ i ≤ n, are in 𝒞.

(iv) (Composition.) If g_1, g_2, . . . , g_m, h are in 𝒞, then

f(x_1, . . . , x_n) = h(g_1(x_1, . . . , x_n), . . . , g_m(x_1, . . . , x_n))

is in 𝒞, where the g_i are functions of n variables and h is a function of m
variables.

(v) (Primitive recursion, or just recursion.) If g, h ∈ 𝒞 and n ≥ 0 then the function
f defined below is in 𝒞:

f(x_1, . . . , x_n, 0) = g(x_1, . . . , x_n)
f(x_1, . . . , x_n, y + 1) = h(x_1, . . . , x_n, y, f(x_1, . . . , x_n, y)),

where g is a function of n variables and h a function of n + 2 variables.
Demonstrating that functions are primitive recursive can be complicated, as one
must demonstrate how they are built from the ingredients above.
Example 3.3.2. The addition function, f(x, y) = x + y, is primitive recursive.

We can express addition recursively with f(x, 0) = x and f(x, y + 1) = f(x, y) + 1.
The former is almost in proper primitive recursive form; let f(x, 0) = P^1_1(x).
The latter needs to be in the form f(x, y + 1) = h(x, y, f(x, y)), so we want that
h to spit out the successor of its third input. With an application of composition,
we get h(x, y, z) = S(P^3_3(x, y, z)), and our derivation is complete.
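The derivation above can be replayed in code. The following sketch, mine rather than the text's, renders the closure schemes as Python higher-order functions: S, P, compose, and prim_rec mirror clauses (i), (iii), (iv), and (v) of Definition 3.3.1.

```python
def S(x):                 # clause (i): successor
    return x + 1

def P(n, i):              # clause (iii): projection P^n_i (1-indexed, as in the text)
    return lambda *args: args[i - 1]

def compose(h, *gs):      # clause (iv): f(xs) = h(g1(xs), ..., gm(xs))
    return lambda *xs: h(*(g(*xs) for g in gs))

def prim_rec(g, h):       # clause (v): f(xs, 0) = g(xs); f(xs, y+1) = h(xs, y, f(xs, y))
    def f(*args):
        *xs, y = args
        acc = g(*xs)
        for i in range(y):
            acc = h(*xs, i, acc)
        return acc
    return f

# Example 3.3.2: addition with g = P^1_1 and h = S composed with P^3_3.
add = prim_rec(P(1, 1), compose(S, P(3, 3)))
print(add(3, 4))
```

Note that prim_rec unrolls the recursion into a loop; that is faithful to the scheme, since the value at y + 1 depends only on y and the value at y.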
Exercise 3.3.3. Prove that the maximum function, m(x, y) = max{x, y}, is primitive
recursive.

Exercise 3.3.4. Prove that the multiplication function, g(x, y) = x · y, is primitive
recursive. You may use the addition function f(x, y) = x + y in your derivation.
Exercise 3.3.5. Consider a grid of streets, n east-west streets crossed by m north-
south streets to make a rectangular map with nm intersections; each street reaches
all the way across or up and down. If a pedestrian is to walk along streets from the
northwest corner of this rectangle to the southeast corner, walking only east and
south and changing direction only at corners, let r(n, m) be the number of possible
routes. Prove r is primitive recursive.
In fact, all the usual arithmetic functions on ℕ are primitive recursive, such as
exponentiation, factorial, and the modified subtraction

x ∸ y =  x − y   if x ≥ y
         0       if x < y.

It is a very large class, including nearly all functions encountered in usual mathe-
matical work, and perhaps has claim on the label "computable" by itself. We will
argue in the following sections that it is insufficient.
The Ackermann Function
The Ackermann function is the most common example of a (total) computable func-
tion that is not primitive recursive; in other words, evidence that something needs
to be added to the closure schema of primitive recursive functions in order to fully
capture the notion of computability, even if we require everything be total. In fact,
it was custom-built to meet that criterion, since the primitive recursive functions
cover so much ground it seemed they might actually constitute all computable func-
tions. The Ackermann function is defined recursively for non-negative integers m
and n as follows:

A(m, n) =  n + 1                    if m = 0
           A(m − 1, 1)              if m > 0 and n = 0
           A(m − 1, A(m, n − 1))    if m > 0 and n > 0.

The version above is a simplification of Wilhelm Ackermann's original function
due to Rózsa Péter and Raphael Robinson. It is not necessarily immediately clear
that this function is computable, that is, that the recursive definition always hits
bottom. The proof that it is came later than the definition of the function itself.
The proof this is not primitive recursive is technical, but the idea is simple. Here
is what we get when we plug small integer values in for m:

A(0, n) = n + 1
A(1, n) = n + 2
A(2, n) = 2n + 3
A(3, n) = 2^(n+3) − 3
A(4, n) = 2^(2^(···^2)) − 3

where the stack of 2s in the final equation is n + 3 entries tall. That value grows
incredibly fast: A(4, 2) is a 19729-digit number.
The key is the stack of 2s. Roughly, each iteration of exponentiation requires an
application of primitive recursion. We can have only a finite number of applications
of primitive recursion, fixed in the function definition, in any given primitive recur-
sive function. However, as n increases A(4, n) requires more and more iterations of
exponentiation, eventually surpassing any fixed number of applications of primitive
recursion, no matter how large.
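For experimentation, the definition transcribes directly into Python; this is a sketch only, since even A(4, 2) is hopeless to compute this way, but it suffices to check the closed forms above for small m.

```python
from functools import lru_cache
import sys

sys.setrecursionlimit(100_000)   # A(3, n) already forces deep recursion

@lru_cache(maxsize=None)
def A(m, n):
    """The Ackermann function, exactly as defined in the text."""
    if m == 0:
        return n + 1
    if n == 0:
        return A(m - 1, 1)
    return A(m - 1, A(m, n - 1))

# Spot-check the closed forms: A(1, n) = n + 2, A(2, n) = 2n + 3,
# A(3, n) = 2**(n + 3) - 3.
print(A(1, 5), A(2, 5), A(3, 5))
```

The memoization does not tame the growth in any essential way; it merely keeps the small cases from recomputing shared subcalls.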
Partial Recursive Functions: Unbounded Search

To increase the computational power of our class of functions we add an additional
closure scheme. This accommodates problems like the need for increasingly many
applications of primitive recursion in the Ackermann function.

Definition 3.3.6. The class of partial recursive functions is the smallest class of
functions such that the five conditions from Definition 3.3.1 of the primitive recursive
functions hold, and additionally

(vi) (Unbounded search, minimization, or μ-recursion.) If θ(x_1, . . . , x_n, y) is a par-
tial recursive function of n + 1 variables, and we define ψ(x_1, . . . , x_n) to be
the least y such that θ(x_1, . . . , x_n, y) = 0 and θ(x_1, . . . , x_n, z) is defined for all
z < y, then ψ is a partial recursive function of n variables.
One of the most important features of this closure scheme is that it introduces
partiality; the primitive recursive functions are all total. A function using un-
bounded search can be total, of course, and in fact the Ackermann function requires
unbounded search, despite being total. That is a sign that we perhaps need more
than just the primitive recursive functions to capture all of computability.

Why should partiality be allowed? The first reason is that allowing all operations
that seem like they ought to be allowed results in the possibility of partial functions.
That is, from the modern perspective, real computers sometimes get caught in
infinite loops. A more practical reason is that we can't get at just the total
functions from the collection of all partial recursive functions. There's no way to
single them out; this notion is made precise as Theorem 3.4.6, below.
The name μ-recursion comes from a common notation. The symbol μ, or μ-
operator, is read "the least" and is used (from a purely formula-writing standpoint)
in the same way that quantifiers are used. For example, μx(x > 5) is read "the least
x such that x is greater than five" and returns the value 6. In μ-notation, we could
define

ψ(x_1, . . . , x_n) = μy[θ(x_1, . . . , x_n, y) = 0 & (∀z < y) θ(x_1, . . . , x_n, z)↓].
Example 3.3.7. Using unbounded search we can easily write a square root function
to return √x if x is a square number and diverge otherwise.

We will use the primitive recursive functions +, ·, and integer subtraction ∸
(where x ∸ y = max{0, x − y}) without derivation. We would like the following:

ψ(x) = μy[(x ∸ (y · y)) + ((y · y) ∸ x) = 0].

To properly define the function in brackets requires some nested applications of
composition, even taking the three arithmetic operators as given.
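As a sketch (my own, not from the text), the μ-operator and Example 3.3.7 translate into Python as an unbounded search that, like the real thing, diverges when no witness exists, so the square root function below should only be called on square numbers.

```python
from itertools import count

def monus(x, y):
    """Integer subtraction: max(0, x - y)."""
    return max(0, x - y)

def mu(f):
    """Return the function x |-> least y with f(x, y) = 0 (loops forever if none)."""
    def search(x):
        for y in count():
            if f(x, y) == 0:
                return y
    return search

# psi(x) = mu y [ (x monus y*y) + (y*y monus x) = 0 ]
sqrt = mu(lambda x, y: monus(x, y * y) + monus(y * y, x))

print(sqrt(49))
```

The bracketed expression is zero exactly when y·y = x, so the search halts at the square root when one exists and runs forever otherwise, which is precisely the partiality the definition allows.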
3.4 Coding and Countability

So far we've computed only with natural numbers. How could we define computation
on domains outside of ℕ? If the desired domain is countable, we may be able to
encode its members as natural numbers. For example, we could code ℤ into ℕ by
using the even natural numbers to represent nonnegative integers, and the odd to
represent negative integers. Specifically, we can write the computable function

f(k) =  2k         if k ≥ 0
        −2k − 1    if k < 0.
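A quick sketch of this coding and its inverse in Python; the computable inverse is what makes the coding usable in both directions.

```python
def f(k):
    """Code the integer k as a natural number: evens for k >= 0, odds for k < 0."""
    return 2 * k if k >= 0 else -2 * k - 1

def f_inv(n):
    """Decode: even n came from n // 2, odd n came from -(n + 1) // 2."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

print([f(k) for k in (-3, -2, -1, 0, 1, 2, 3)])
```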
To move into ℕ², the set of ordered pairs of natural numbers, there is a standard
pairing function indicated by angle brackets:

⟨x, y⟩ := ½(x² + 2xy + y² + 3x + y).

For longer tuples we iterate, so for example ⟨x, y, z⟩ := ⟨⟨x, y⟩, z⟩. Note this gives
us a way to encode the rational numbers, ℚ. It also lets us treat multivariable
functions in the same way as single-input functions.
The pairing function is often given as a magic formula from on high, but it's quite
easy to derive. You may be familiar with Cantor's proof that the rational numbers
are the same size as the natural numbers, where he walks diagonally through the grid
of integer-coordinate points in the first quadrant and skips any that have common
factors (if not, see Appendix A.3). We can do essentially that now, though we won't
skip anything.
Starting with the origin, we take each diagonal and walk down it from the top.
The resulting order is

(0, 0); (0, 1), (1, 0); (0, 2), (1, 1), (2, 0); (0, 3), (1, 2), (2, 1), (3, 0); . . .
The number of pairs on a given diagonal is one more than the sum of the entries
of each pair. The number of pairs above a given one on its own diagonal is its first
entry, so if we want to number these from 0, we let (x, y) map to

1 + 2 + . . . + (x + y) + x,

where all terms except the last correspond to the diagonals below (x, y)'s diagonal.
This sums to

(x + y + 1)(x + y)/2 + x = ½(x² + 2xy + y² + 3x + y).

If you are unfamiliar with the formula for the summation of the integers 1 through
n, you can find it in Appendix A.2.
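The pairing function and its inverse can be sketched directly from this derivation; the inverse walks back down the diagonals. This code is mine, not the text's.

```python
def pair(x, y):
    """Cantor-style pairing: 1 + 2 + ... + (x + y), plus x steps down the diagonal."""
    return (x + y + 1) * (x + y) // 2 + x

def unpair(n):
    """Invert pair: find the diagonal, then read off the position along it."""
    d = 0                       # largest d with d(d+1)/2 <= n indexes the diagonal
    while (d + 1) * (d + 2) // 2 <= n:
        d += 1
    x = n - d * (d + 1) // 2    # remaining steps down the diagonal give x
    return x, d - x

print(pair(2, 1), unpair(5))
```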
The key elements of any coding function are that it be bijective and computable.
There are two ways to think about how computation is performed under coding.
1. The Turing machine can decode the input, perform the computation, and
encode the answer.
2. The Turing machine can compute on the encoded input directly, obtaining the
encoded output.
Exercise 3.4.1. Consider ℤ encoded into ℕ by f above. Write a function which
takes f(k) as input and outputs f(2k) using approach 2 above. By the Church-
Turing thesis you need not write a Turing machine or a formal partial recursive
function; an algebraic expression will suffice.
There are limitations on what kinds of objects can be encoded: they must come
from a set that is effectively countable. Here, in countable we include finite; all finite
sets are effectively countable. An infinite countable set is effectively countable if
there exists a computable bijection between the set and ℕ which also has computable
inverse. If we have such a bijection, we can use the image of an object, which will
be a natural number, as the code of the object. This is equivalent to the objects
being representable by finite sequences of symbols that come from a finite alphabet,
so that the symbols can be represented by numbers and the sequences of numbers
collapsed down via pairing or a similar function. In fact, pairing is a bijection with
ℕ that shows that ℕ², and in fact ℕ^k for all k, are effectively countable.
In fact, ⋃_k ℕ^k is effectively countable, where here I intend k to start at 0
(ℕ⁰ contains only the empty tuple). The function

τ : ⋃_{k≥0} ℕ^k → ℕ

given by τ( ) = 0 and

τ(a_1, . . . , a_k) = 2^(a_1) + 2^(a_1+a_2+1) + 2^(a_1+a_2+a_3+2) + . . . + 2^(a_1+a_2+...+a_k+k−1)

demonstrates the effective countability. A singleton, that is, an element of ℕ itself,
is mapped to a number with binary representation using a single 1. An n-tuple
maps to a number whose binary representation uses exactly n 1s.
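As a sketch, τ and its inverse are short to implement: the encoder places 1s in the binary expansion at the prescribed exponents, and the decoder reads the tuple back off from the gaps between consecutive 1s.

```python
def tau(tup):
    """Encode a tuple of naturals: 1s at exponents a1, a1+a2+1, a1+a2+a3+2, ..."""
    total, pos = 0, -1
    for a in tup:
        pos += a + 1          # builds a1 + ... + a_i + (i - 1) incrementally
        total += 2 ** pos
    return total

def tau_inv(n):
    """Decode: each gap between successive 1s in binary recovers one entry."""
    prev, i, tup = -1, 0, []
    while n:
        if n & 1:
            tup.append(i - prev - 1)
            prev = i
        n >>= 1
        i += 1
    return tuple(tup)

print(tau((3, 1)), tau_inv(40))
```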
Exercise 3.4.2. (i) Find the images under τ of the tuples (0, 0), (0, 0, 0), (0, 1, 2),
and (2, 1, 0).

(ii) What is the purpose of summing subsequences of the a_i in the exponents?

(iii) What is the purpose of adding 1, 2, . . . , k − 1 in the exponents?

(iv) Prove that τ is a bijection.
Exercise 3.4.3. (i) Given disjoint effectively countable sets A and B, prove that
A ∪ B is effectively countable.

(ii) Given effectively countable sets A and B that are not necessarily disjoint,
prove that A ∪ B is effectively countable.

Exercise 3.4.4. (i) Show that if a class A of objects is constructed recursively
using a finite set of basic objects and a finite collection of computable building-
up rules (see §2.4), A is effectively countable.

(ii) Show that even if the sets of basic objects and rules in part (i) are infinite, as
long as they are effectively countable, so is A. §5.1 may be helpful.
Coding is generally swept under the rug; in research papers one generally sees
at most a comment to the effect of "we assume a coding of [our objects] as natural
numbers is fixed." It is a vital component of computability theory, however, as it
removes a need for separate definitions of algorithm for different kinds of objects.

There is another particular coding we need to discuss. The set of Turing ma-
chines is, in fact, effectively countable; the TMs may be coded as natural numbers.
One way to code them is to first interpret an element of ℕ as a finite subset of ℕ, as
in Exercise 3.1.4, and then interpret the elements of that finite subset as quadruples.
A natural number n will be read as k + ℓ, where 0 ≤ ℓ ≤ 7 and k is a multiple of
8. We decode k/8 into ⟨i, j⟩ and interpret it to say the starting and ending states
of the quadruple are q_i and q_j. Then ℓ will give the symbol read and action taken:
0 ↦ □□, 1 ↦ □1, 2 ↦ □L, 3 ↦ □R, and likewise for the four pairs beginning with 1.
Note in this example we are using just the symbols □ and 1, but it is clear how this
generalizes to any finite symbol set.
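The decoding just described can be sketched in Python. Two assumptions beyond the text: the finite-set coding of Exercise 3.1.4 is taken to be "n represents the set of positions of 1s in n's binary expansion," and ⟨i, j⟩ is unwound with the inverse of the pairing function from this section.

```python
def unpair(n):
    """Inverse of the Cantor-style pairing function from this section."""
    d = 0
    while (d + 1) * (d + 2) // 2 <= n:
        d += 1
    x = n - d * (d + 1) // 2
    return x, d - x

def decode_machine(n):
    """Interpret the natural number n as a set of Turing machine quadruples."""
    elements = [i for i in range(n.bit_length()) if n >> i & 1]
    quads = []
    for m in elements:
        k, ell = divmod(m, 8)                 # m = 8k + ell with 0 <= ell <= 7
        i, j = unpair(k)                      # starting state q_i, ending state q_j
        read = "B" if ell < 4 else "1"        # ell 0-3 read the blank, 4-7 read 1
        action = ["B", "1", "L", "R"][ell % 4]
        quads.append((f"q{i}", read, action, f"q{j}"))
    return quads

print(decode_machine(0b1001))
```

Most inputs decode to "junk machines" in exactly the sense of the next paragraph: syntactically valid quadruple sets that compute nothing interesting.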
It is important to note that any method of coding will include "junk machines,"
codes that may be interpreted as TMs but which give machines that don't do any-
thing. There will also be codes that give machines that, while different, compute
the same function. In fact, we can prove the Padding Theorem, Exercise 3.4.5, after
a bit of vocabulary.
We call the code of a Turing machine its index, and say when we choose a partic-
ular coding that we fix an enumeration of the Turing machines (or, equivalently, the
partial recursive functions). It is common to use φ for partial recursive functions;
φ_e is the e-th function in the enumeration, the function with index e, and the function
that encodes to e. We often use the indices simply as tags, to put an ordering on
the functions, but it is often important to remember that the index is the function,
in a very literal way.

Exercise 3.4.5. Prove that given any index of a Turing machine M, there is a
larger index which codes a machine that computes the same function as M. This is
called the Padding Theorem.
Another collection of objects commonly indexed is the finite sets, as in Exercise
3.1.4. The n-th finite set, or set corresponding to n in the bijection, is typically
denoted D_n.
We are now in the position to demonstrate a very practical reason to allow
partial functions in our definition of computability. Recall that by total computable
function we mean a function from the class of partial computable functions which
happens to be total.

Theorem 3.4.6. The total computable functions are not effectively countable. That
is, there is no computable indexing of exactly the total computable functions.

Proof. Suppose the contrary and let f_e denote the e-th function in an enumeration
of all total computable functions. We define a new total computable function as
follows:

g(e) = f_e(e) + 1.

Since all f_e are total computable, it is clear that g is total computable. Hence g
must have an index; that is, there must be some e′ such that g = f_{e′}. However,
g(e′) = f_{e′}(e′) + 1 ≠ f_{e′}(e′), a contradiction.
. . . can output the same thing T does in this halting branch, since the output will
not depend on which halting branch T′ finds first.
Exercise 3.6.6. Turn the idea above into a proof of Claim 3.6.5.
The Lambda Calculus

This is an important function definition of computability. Those with an interest
in computer science may know that the lambda calculus is the basis of functional
programming languages such as Lisp and Scheme. This was Church's main con-
tender for the definition of computable. It is one of the many models of computation
which is equivalent to Turing machines and partial recursive functions; since it is
important we'll explore it in some depth, though still only getting a taste of it. I
learned about the lambda calculus in a programming languages class I took from
Peter Kogge at Notre Dame, and this section is drawn from his lecture notes and
book [28].
² In computer science, we might refer to the domain as the language the machine accepts.
The lambda calculus is based entirely on substitution; typical expressions look
like

(λx.M)A,

which then is written [A/x]M, and means "replace every instance of x in M by A."

Expressions are built recursively. We have a symbol set which consists of paren-
theses, λ, ., and an infinite collection of identifiers, generally represented by lower-
case letters. An expression can be an identifier, a function, or a pair of expressions
side-by-side, where a function is of the form (λ identifier . expression). We will use
capital letters to denote arbitrary lambda expressions. Formally everything should
be thoroughly parenthesized, but understanding that evaluation always happens
left to right (i.e., E_1 E_2 E_3 means (E_1 E_2)E_3, and so on) we may often drop a lot of
parentheses. In particular,

(λxy.M)AB = ((λx.(λy.M))A)B = [B/y]([A/x]M).
Identifiers are essentially variables, but are called identifiers instead because
their values don't change over time. We solve problems with lambda calculus by
manipulating the form the variables appear in, not their values. An identifier x
occurs free in expression E if (1) E = x, (2) E = (λy.A), y ≠ x, and x appears
free in A, or (3) E = AB and x appears free in either A or B. Otherwise x occurs
bound (or does not occur). In (λx.M), only free occurrences of x are candidates
for substitution, and no substitution is allowed which converts a free variable to a
bound one. If that would be the result of substitution, we rename the problematic
variable instead.

Here are the full substitution rules for (λx.E)A → [A/x]E = E′. They are
defined recursively, in cases matching those of the recursive definition of expression.

1. If E = y, an identifier, then if y = x, E′ = A. Otherwise E′ = E.

2. If E = BC for some expressions B, C, then E′ = (([A/x]B)([A/x]C)).

3. If E = (λy.C) for some expression C and

(i) y = x, then E′ = E.

(ii) y ≠ x where y does not occur free in A (i.e., substitution will not cause
a free variable to become bound), then E′ = (λy.[A/x]C).

(iii) y ≠ x where y does occur free in A, then E′ = (λz.[A/x]([z/y]C)), where
z is a symbol that does not occur free in A. This is the renaming rule.
Example 3.6.7. Evaluate

(λxy.yxx)(λz.yz)(λrs.rs).

Remember that formally this is

[(λx.(λy.yxx))(λz.yz)](λrs.rs).

The first instance of substitution should be for x, but this will bind what is currently
a free instance of y, so we apply rule 3.(iii) using identifier symbol a:

(λy.y(λz.az)(λz.az))(λrs.rs).

Next a straightforward substitution to get

(λrs.rs)(λz.az)(λz.az),

which becomes (λz.az)(λz.az) and finally a(λz.az).
You can see this can rapidly get quite unfriendly to do by hand, but it is very
congenial for computer programming. There are two great strengths to functional
programming languages: all objects are of the same type (functions) and hence are
handled the same way, and evaluation may often be done in parallel. In particular,
if we have (λx_1 . . . x_n.E)A_1 . . . A_m, where m ≤ n, the sequential evaluation

(λx_{m+1} . . . x_n.([A_m/x_m](. . . ([A_2/x_2]([A_1/x_1]E)) . . .)))

is equivalent to the simultaneous evaluation

(λx_{m+1} . . . x_n.[A_1/x_1, A_2/x_2, . . . , A_m/x_m]E)

provided there are no naming conflicts. That is, alongside the restriction of not
having any x_{i+1}, . . . , x_n free in A_i (which would then bind a free variable, never
allowed), we must know none of the x_{m+1}, . . . , x_n appear free in any A_i, i ≤ m.
To start doing arithmetic, we need to be able to represent zero and the rest of the
positive integers, at least implicitly (i.e., via a successor function). Lambda calculus
integers are functions which take two arguments, the first a successor function and
the second zero, and which (if given the correct inputs) return an expression which
equals an integer.

0 : (λsz.z)                          (λsz.z)SZ = [S/s][Z/z]z = Z
1 : (λsz.s(z))                       (λsz.s(z))SZ = S(Z)
⋮
K : (λsz.s(s . . . s(z) . . .))      KSZ = S(S . . . S(Z) . . .)

where the s's in the numeral for K and the S's in its output each occur K times.
Interpreting Z as zero and S(E) as the successor of whatever integer is represented
by E, these give the positive integers.
We can define successor as a lambda operator in general, as well as addition
and multiplication. Successor is a function that acts on an integer K (given as a
function) and returns a function that is designed to act on SZ and give K + 1.
Likewise, multiplication and addition are functions that act on a pair of integers K,
L, and return a function designed to act on SZ to give K · L or K + L, respectively.

Successor: S = (λxyz.y(xyz)).
Addition: (λwzyx.wy(zyx)).
Multiplication: (λwzy.w(zy)).

I don't know that there is any way to understand these without stepping through
an example.
Example 3.6.8. 2 + 3.

To avoid variable clashes, we'll use s and a for s and z in 2, and r and b in 3.

2 + 3 = (λwzyx.wy(zyx))(λsa.s(s(a)))(λrb.r(r(r(b))))
      = (λyx.(λsa.s(s(a)))y((λrb.r(r(r(b))))yx))
      = (λyx.(λa.y(y(a)))(y(y(y(x)))))
      = (λyx.y(y(y(y(y(x)))))) = 5.
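These definitions transliterate into Python lambdas, a sketch of my own for experimentation; to_int applies a Church numeral to the genuine successor and 0 so we can read off results.

```python
zero = lambda s: lambda z: z                                  # 0 : (λsz.z)
succ = lambda x: lambda y: lambda z: y(x(y)(z))               # successor (λxyz.y(xyz))
add  = lambda w: lambda z: lambda y: lambda x: w(y)(z(y)(x))  # addition (λwzyx.wy(zyx))
mult = lambda w: lambda z: lambda y: w(z(y))                  # multiplication (λwzy.w(zy))

def to_int(numeral):
    """Decode a Church numeral by feeding it the real successor and 0."""
    return numeral(lambda n: n + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)), to_int(mult(two)(three)))
```

Python's eager evaluation performs the substitutions for us, which is exactly the point made above about how congenial this is for programming.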
Exercise 3.6.9. Evaluate S(3).
Exercise 3.6.10. Evaluate 2 · 3.
Similarly we can define lambda expressions that execute if. . . then. . . else op-
erations. That is, we want expressions P such that PQR returns Q if P is true,
and R if P is false. Then, additional Boolean operations are useful. We won't step
through these, but I'll give you the definitions and you can work some examples out
yourself.

true: T = (λxy.x)        false: F = (λxy.y)
and: (λzw.zwF)           or: (λzw.zTw)
not: (λz.zFT)

Exercise 3.6.11. Work out the following operations:

not T, not F
and TT, and TF, and FT, and FF
or TT, or TF, or FT, or FF
or(and TF)(not F)
The missing piece to understand how this can be equivalent to Turing machines
is recursion, in the computer science sense: if A is a base case for R, then RA is
simply evaluated, and if not, then RA reduces to something like RB, where B is
somehow simpler than A. This is our looping procedure; it requires R calling itself
as a subfunction. To make expressions call themselves we first need to make them
duplicate themselves. We begin with the magic function

(λx.xx)(λx.xx).

Try doing the substitution called for. Next, given some expression R wherein x does
not occur free, try evaluating

(λx.R(xx))(λx.R(xx)).

This is not so general, however, and so we remove the hard-coding of R via an-
other lambda operator. This gives us our second magic function, the fixed point
combinator Y.

Y = (λy.(λx.y(xx))(λx.y(xx))).

When Y is applied to some other expression R, the result is to layer Rs onto the
front:

Y R = R(Y R) = R(R(Y R)) = R(R(R(Y R))) . . . .

Finally, consider (Y R)A, to get to our original goal. This evaluates to R(Y R)A; if
R is a function of two variables, it can test A and return the appropriate expression
if A passes the test, throwing away the (Y R) part, and if A fails the test it can use
the (Y R) to generate a new copy of R for the next step of the recursion. We'll omit
any examples.
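Translated literally into a strict language, the Y above loops forever, because arguments are evaluated eagerly; the standard workaround is the eta-expanded variant (often called Z), which delays the self-application. A sketch in Python, not from the text:

```python
# Z = (λy.(λx.y(λv.xxv))(λx.y(λv.xxv))), the strict-language fixed point combinator.
Z = lambda y: (lambda x: y(lambda v: x(x)(v)))(lambda x: y(lambda v: x(x)(v)))

# R tests its argument and either returns a base value or recurses via the fresh
# copy of itself that Z supplies, exactly the (Y R)A pattern described above.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))
```

The wrapper lambda v: x(x)(v) is what keeps Python from expanding x(x) before it is actually needed.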
Unlimited Register Machines

Unlimited register machines, or URMs, are (as you would guess) a machine definition
of computability. Nigel Cutland uses them as the main model of computation in his
book Computability [10]; they are easier to work with than Turing machines if you
want to get into the guts of the model, while still basic enough that the proofs remain
manageable. This should feel like a Turing machine made more human-friendly.

The URM has an unlimited memory in the form of registers R_i, each of which
can hold a natural number denoted r_i. The machine has a program which is a finite
list of instructions, and based on those instructions it may alter the contents of its
registers. Note that a given computation will only be able to use finitely-many of
the registers, just as a Turing machine uses only finitely-many spaces on its tape,
but we cannot cap how many it will need in advance.

There are four kinds of instructions.
(i) Zero instructions: Z(n) tells the URM to change the contents of R_n to 0.

(ii) Successor instructions: S(n) tells the URM to increment (that is, increase by
one) the contents of R_n.

(iii) Transfer instructions: T(m, n) tells the URM to replace the contents of R_n
with the contents of R_m. The contents of R_m are unchanged.

(iv) Jump instructions: J(m, n, i) tells the URM to compare the contents of R_n
and R_m. If r_n = r_m, it is to jump to the i-th instruction in its program and
proceed from there; if r_n ≠ r_m it continues to the instruction following the jump
instruction. This allows for looping. If there are fewer than i instructions in
the program the machine halts.
The machine will also halt if it has executed the final instruction of the program,
and that instruction did not jump it back into the program. You can see where
infinite loops might happen: r_n = r_m, the URM hits J(m, n, i) and is bounced
backward to the i-th instruction, and nothing between the i-th instruction and the
instruction J(m, n, i) either changes the contents of one of R_n or R_m or jumps the
machine out of the loop.
A computation using the URM consists of a program and an initial configuration;
that is, the initial contents of the registers.

Example 3.6.12. Using three registers we can compute sums. The initial contents
of the registers will be x, y, 0, 0, 0, . . ., where we would like to compute x + y. The
sum will ultimately be in the first register and the rest will be zero.

We have only successor to increase our values, so we'll apply it to x y-many
times. The third register will keep track of how many times we've done it; once its
contents equal y we want to stop incrementing x, zero the second and third registers,
and halt.

Since our jump instruction jumps when the two values checked are the same
rather than different, we have to be clever about how we use it. Here is a program
that will add x and y:

Instructions:      Explanation:
1. J(2, 4, 8)      if y = 0, nothing to do
2. S(1)            increment x
3. S(3)            increment counter
4. J(2, 3, 6)      jump out of loop if we're done
5. J(1, 1, 2)      otherwise continue incrementing
6. Z(2)            zero y register
7. Z(3)            zero counter
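A URM is simple enough to simulate in a few lines. The sketch below (mine, not the text's) encodes instructions as tuples and runs the addition program above; registers and instruction numbers are 1-indexed as in the text.

```python
from collections import defaultdict

def run_urm(program, inputs, max_steps=10_000):
    """Run a URM program; return the register contents when it halts."""
    r = defaultdict(int)                      # registers R1, R2, ... all start at 0
    for i, v in enumerate(inputs, start=1):
        r[i] = v
    pc = 1                                    # number of the next instruction
    steps = 0
    while 1 <= pc <= len(program) and steps < max_steps:
        op = program[pc - 1]
        steps += 1
        if op[0] == "Z":                      # zero: r_n := 0
            r[op[1]] = 0
        elif op[0] == "S":                    # successor: r_n := r_n + 1
            r[op[1]] += 1
        elif op[0] == "T":                    # transfer: r_n := r_m
            r[op[2]] = r[op[1]]
        elif op[0] == "J":                    # jump to instruction i if r_m = r_n
            m, n, i = op[1], op[2], op[3]
            if r[m] == r[n]:
                pc = i
                continue
        pc += 1
    return r

addition = [
    ("J", 2, 4, 8),   # 1. if y = 0, nothing to do
    ("S", 1),         # 2. increment x
    ("S", 3),         # 3. increment counter
    ("J", 2, 3, 6),   # 4. jump out of loop if we're done
    ("J", 1, 1, 2),   # 5. otherwise continue incrementing (unconditional jump)
    ("Z", 2),         # 6. zero y register
    ("Z", 3),         # 7. zero counter
]

regs = run_urm(addition, [3, 3])
print(regs[1], regs[2], regs[3])
```

Instruction 5 shows the cleverness the text mentions: J(1, 1, 2) always jumps, because a register is always equal to itself.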
Exercise 3.6.13. Write out all steps of the computation of 3 + 3 using the program
above, including the contents of the registers and the instruction number to be
executed next.

Exercise 3.6.14. Write a URM program to compute products. Note that x · y is
the sum of y copies of x, and iterate the addition instructions appropriately. Be
careful to keep your counters for the inside and outside loops separate, and zero
them whenever necessary.
Chapter 4
Working with Computable Functions
4.1 A Universal Turing Machine
From the enumeration of all Turing machines we can define a universal Turing
machine; that is, a machine which will emulate every other machine. Using σ to
denote an arbitrary input and 1^e to denote a string of e 1s, we can define the
universal machine U by

U(1^e 0 σ) = φ_e(σ).

U counts the 1s at the beginning of the input string, decodes that value into the
appropriate set of quadruples, throws out the 0 it sees next, and uses the rest of the
string as input, acting according to the quadruples it decoded. This procedure is
computable because the coding of Turing machines as indices is computable.

This is why it is clear in Theorem 3.4.6 that g is total computable: were
the total computable functions effectively enumerable, we wouldn't have a fleet of
disparate f_e(e) to evaluate for each e; we would have the one machine U(1^e 0 e)
for all e.

Note that of course there are infinitely-many universal Turing machines, as there
are for any program, via padding.

We can use the universal machine to construct a total recursive function that is
not primitive recursive, in a different way from the Ackermann function. We'll still
be pretty sketchy about it, though. Here's the outline:
1. Code all the primitive recursive functions as ψ_n.

2. Show there exists a computable p(n) such that for all n, φ_{p(n)} = ψ_n, where
φ_{p(n)} lives in the standard enumeration of partial recursive functions.

3. Use the universal machine to define a new function which is total recursive
but not any ψ_n.

Some more detail:
Some more detail:
1. Conceptually straightforward though technically annoying. We can code the
derivation of our function via composition and primitive recursion, from con-
stants, successor, and projection.

2. Start by arguing we have indices in the standard enumeration φ_n for the basic
primitive recursive functions, which is true essentially because we can code
them up on the fly in a uniform way (e.g., for constants, with a single function
that takes any pair c, n to an index for the n-ary constant function with output
c). Then we argue that we have explicit indices for composition and recursion
as functions of the constituent functions' indices (again exploiting the fact
that the index is the function), which is again true because we can explicitly
code them.

Then, given a ψ-index n, we can uniformly find p(n), a code for ψ_n in the
standard enumeration of partial recursive functions. We simply decode the
ψ-index and recode into a φ-index using the functions whose indices we just
argued we have.

3. Using n to denote not only the integer but also its representation in binary,
define the function

f(n) = U(1^{p(n)} 0 n) + 1.

That is, f(n) = φ_{p(n)}(n) + 1 = ψ_n(n) + 1. Since ψ_n is primitive recursive, it is
total, which means f is total. However, it is not equal to any ψ, as it differs
from each on at least one input.
4.2 The Halting Problem

Is it possible to define a specific function which is not computable? Yes and no. We
can't write down a procedure, because by the Church-Turing thesis that leads to a
computable function. However, via the indexing of all partial computable functions
we can define a noncomputable function.

First, a little notation recalled from §3.1. We use arrows to denote halting
behavior: for a function φ_e, the notation φ_e(n)↓ means n is in the domain of φ_e,
and φ_e(n)↑ means n is not in the domain of φ_e, so φ_e fails to halt on input n.
Define the halting function as follows:

f(e) =  1   if φ_e(e)↓
        0   if φ_e(e)↑.

To explore the computability of f, define g:

g(e) =  φ_e(e) + 1   if f(e) = 1
        0            if f(e) = 0.

Certainly if φ_e(e)↓, it is computable to find the output value, and computable to
add 1. The use of f avoids attempting to compute outputs for divergent computa-
tions, and hence if f is computable, so is g. However, it is straightforward to show
g is not computable, and so the halting function (or halting problem, the question of
determining for which values of e we have φ_e(e)↓) is not computable. This is a key
example, and we define the halting set as well:

K = {e : φ_e(e)↓}.

Exercise 4.2.1. Prove that g defined above is not computable. You may find the
contemplation of Theorem 3.4.6 helpful.
4.3 Parametrization

Parametrization means something different in computability theory than it does in
calculus. What we mean here is the ability to push input parameters into the index
of a function. Here is the first place that it is important that the indexing of Turing
machines be fixed, and where we take major advantage of the fact that the index (a
natural number) contains all the information we need to reconstruct the machine
itself.

The simplest form of the s-m-n Theorem, which is what we traditionally call the
parametrization theorem, is the following.

Theorem 4.3.1. There is a total computable function S^1_1 such that for all e, x, and
y,

    φ_e(x, y) = φ_{S^1_1(e,x)}(y).

If you accept a really loose description, this is very simple to prove: S^1_1 decodes
e, fills x into the appropriate spots, and recodes the resulting algorithm. The key
is that although the new algorithm depends on e and x, it does so uniformly: the
method is the same regardless of the numbers.
This is a good moment to pause and think about uniformity, a key idea in
computability. A process is uniform in its inputs if it is like a choose-your-own-adventure
book: all possible paths from start to finish are already there in the book,
and the particular inputs just tell you which path you'll take this time. Uniformity
allows for a single function or construction method or similar process to work for
every instance, rather than needing a new one for each instance.
Exercise 4.3.2. Prove there is a computable function f such that

    φ_{f(x)}(y) = 2φ_x(y)

for all y. Hint: think of an appropriate function φ_e(x, y).
The full version of the theorem allows more than one variable to be moved, and
more than one to remain as input. More uses of both versions appear in sections to
come.
Theorem 4.3.3 (Kleene 1938). Given m, n, there is a primitive recursive one-to-one
function S^m_n such that for all e, all n-tuples x̄, and all m-tuples ȳ,

    φ_{S^m_n(e, x̄)}(ȳ) = φ_e(x̄, ȳ).

That you can get this to be primitive recursive is interesting but not too
important. The fact that you can force it to be one-to-one follows from the Padding
Theorem (Exercise 3.4.5).
I'll note that while it looks at first like all this is doing is allowing you to
computably incorporate data into an algorithm, the fact that the data could itself be
a code of an algorithm means this is more than that; it is composition via indices.
In particular, parametrization and the universal machine give us a way to translate
operations on sets and functions to operations on indices.

For example, suppose we want to find an index for φ_x + φ_y uniformly in x and
y. We can let f(x, y, z) = φ_x(z) + φ_y(z) by letting it equal U(1^x 0 z) + U(1^y 0 z),
so everything that was either in input or index is now in input. Then the s-m-n
theorem gives us a computable function s(x, y) such that φ_{s(x,y)}(z) = f(x, y, z), so
that is the index for φ_x + φ_y as a (total computable) function of x and y.
In computer programming, this process of reducing the number of arguments of
a function is called currying, after logician Haskell Curry; when specific inputs are
given to the S^m_n function it is called partial evaluation or partial application.
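In that vocabulary, S^1_1 is a computable partial-application operator. A sketch in Python, with closures standing in for numerical indices (the particular function and values are invented for the example):

```python
from functools import partial

def phi(x, y):           # some two-argument computable function
    return 10 * x + y

def s_1_1(prog, x):
    # "Hard-code" the first argument into the program, uniformly in prog and x.
    return partial(prog, x)

g = s_1_1(phi, 4)        # an "index" for the one-argument function y -> phi(4, y)
assert g(2) == phi(4, 2) == 42
```

The uniformity of the theorem corresponds to `s_1_1` being one fixed procedure that works for every `prog` and `x`.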
4.4 The Recursion Theorem

Kleene's Recursion Theorem, though provable in only a few lines, is probably the
most conceptually challenging theorem in fundamental computability theory, at
least in the way it is usually presented. It is extremely useful (vital, in fact) for
a large number of proofs in the field. We will discuss this a bit after meeting the
theorem and some of its corollaries.

Recall that equality for partial functions is the assertion that when one diverges,
so does the other, and when they converge it is to the same output value.

Theorem 4.4.1 (Recursion or Fixed-Point Theorem, Kleene). Suppose that f is a
total computable function; then there is a number n such that φ_n = φ_{f(n)}. Moreover,
n is computable from an index for f.

Proof. This is the magical proof of the theorem. By the s-m-n theorem there is a
total computable function s(x) such that for all x and y

    φ_{f(φ_x(x))}(y) = φ_{s(x)}(y).

Let m be any index such that φ_m computes the function s; note that s and hence
m are computable from an index for f. Rewriting the statement above yields

    φ_{f(φ_x(x))}(y) = φ_{φ_m(x)}(y).
Then, putting x = m and letting n = φ_m(m) (which is defined because s is total),
we have

    φ_{f(n)}(y) = φ_n(y)

as required.
Corollary 4.4.2. There is some n such that φ_n = φ_{n+1}.

Corollary 4.4.3. If f is a total computable function then there are arbitrarily large
numbers n such that φ_{f(n)} = φ_n.

Corollary 4.4.4. If f(x, y) is any partial computable function there is an index e
such that φ_e(y) = f(e, y).
Exercise 4.4.5. (i) Prove Corollary 4.4.3. Note that we might obtain a fixed
point for f from a different function g defined to be suitably related to f.

(ii) Prove Corollary 4.4.4. It requires both the Recursion Theorem and the s-m-n
theorem.

Exercise 4.4.6. Prove the following applications of Corollary 4.4.4:

(i) There is a number n such that φ_n(x) = x^n.

(ii) There is a number n such that the domain of φ_n is {n}.
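Corollary 4.4.4 says a program may use its own index. Here is a sketch of that flavor in Python, with a table of closures as the indexing. Note the toy "allocates" the index before installing the program body, a shortcut the real theorem earns through the s-m-n machinery rather than by fiat.

```python
programs = []                 # toy indexing: position in the list = index

def with_own_index(f):
    # Build a program e satisfying programs[e](y) == f(e, y),
    # mimicking phi_e(y) = f(e, y) from Corollary 4.4.4.
    e = len(programs)
    programs.append(lambda y: f(e, y))
    return e

e = with_own_index(lambda idx, y: idx + y)   # a program outputting (own index) + y
assert programs[e](5) == e + 5

# The flavor of Exercise 4.4.6(i): a program computing x to the power of its index.
n = with_own_index(lambda idx, x: x ** idx)
assert programs[n](2) == 2 ** n
```

The point of the Recursion Theorem is that this kind of self-reference is achievable even when indices are fixed natural numbers one cannot choose in advance.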
We may prove index set results easily from the Recursion Theorem, where A ⊆ N
is an index set if it has the property that if x ∈ A and φ_x = φ_y, then y ∈ A.

Theorem 4.4.7 (Rice's Theorem). Suppose that A is an index set not equal to ∅
or N. Then A is not computable.

Proof. Begins as follows: work by contradiction, supposing A is computable. Fix
some a ∈ A and b ∉ A and consider the function

    f(x) = a  if x ∉ A,
           b  if x ∈ A.

Apply the Recursion Theorem.

Exercise 4.4.8. Complete the proof of Rice's Theorem.
We may also use the Recursion Theorem to prove results about enumeration of
Turing machines. In particular, there is no effective enumeration which takes the
first instance of each function and omits the rest.

Theorem 4.4.9. Suppose that f is a total increasing function such that
(i) if m ≠ n, then φ_{f(m)} ≠ φ_{f(n)},

(ii) f(n) is the least index of the function φ_{f(n)}.

Then f is not computable.

Proof. Suppose f satisfies the conditions of the theorem. By (i), f cannot be the
identity, so since it is increasing there is some k such that for all n ≥ k, f(n) > n.
Therefore by (ii), φ_{f(n)} ≠ φ_n for every n ≥ k. However, if f is computable, this
violates Corollary 4.4.3.
Now let's go back and discuss the theorem and its use in the wider world. The
Recursion Theorem is often described as "a diagonalization argument that fails";
partiality is, in some sense, a built-in defense against diagonalization. In particular,
if we wanted to define a function that differed from φ_e on input e, we would have
to know whether φ_e(e)↓, which bounces us out of the realm of the computable. The
Recursion Theorem is a strong statement of the failure of that attempt.
In more detail, define the diagonal function by δ(e) = φ_e(e). This is a partial
computable function; its domain is K, the Halting Set. For any total f we can
define f ∘ δ(x) as the result of the usual composition if δ(x) halts, and undefined
otherwise (confirm to yourself that composition defined in that way gives a partial
computable function). Hence f ∘ δ is φ_e for some e, and if f ∘ δ(e) is defined it
equals δ(e). In that case δ(e) is a fixed point for f, in the literal sense rather than
the machine index sense. Now, we can see f ∘ δ(e) can't always be defined, because
f(e) = e + 1 is partial computable, but has no literal fixed point.
What we get instead is Corollary 4.4.2, a fixed point at the machine index level.
The s-m-n theorem gives a total computable function d such that φ_{d(i)} = φ_{δ(i)} for
all i such that δ(i)↓, and then the function s such that φ_{s(i)} = φ_{f(d(i))}. The argument
from the previous paragraph gives us the rest, with adjusted functions: f ∘ d will
be φ_e for some e, so f(d(e)) = δ(e) (now we are able to assert this is defined). By
definition of d, d(e) and δ(e) index the same function, so φ_{d(e)} = φ_{f(d(e))} and d(e) is
the sought-after fixed point.
This is extraordinarily useful in constructions. Many of the uses can be summed
up as building a Turing machine using the index of the finished machine. The
construction will have early on a line something like "We construct a partial
computable function ψ and assume by the Recursion Theorem that we have an index
e for ψ." This looks insane, but it is completely valid. The construction, which
will be computable, is the function for which we seek a fixed point (at the index
level). Computability theorists think of a construction as a program. It might have
outside components (the statement of the theorem could say "For every function
f of this type, . . ." and then the construction's if/then statements would give
different results depending on which particular f was in play), but such variations
will be uniform, as described in §4.3. That is, the construction is like a choose-your-own-adventure
book, or a complicated flowchart. The particular function f selects
the option, but what happens for all possible sequences of options is already laid
out. Likewise, if we give the construction the input e to be interpreted as the index
of a partial computable function, it can use e to produce e′, which is an index of
the function it is trying to build. The Recursion Theorem says the construction
will have a fixed point, some i such that φ_i = φ_{i′}.
We extract and modify the contents of M's tape, treating it as a particular
word. We insert an additional symbol into the word beyond the tape contents to
indicate M's current state and the location of the read/write head (put it just left of
the current tape square), and the productions of Π_M follow naturally:
(i) Rewriting: if ⟨q_i, S_j, S_k, q_l⟩ is a quadruple of M, add the production
q_i S_j → q_l S_k to Π_M.

(ii) Moving: if ⟨q_i, S_j, R, q_l⟩ is a quadruple of M, add the productions
q_i S_j S_k → S_j q_l S_k to Π_M, one for each symbol S_k. Similarly for ⟨q_i, S_j, L, q_l⟩.
The axiom of Π_M is the initial state followed by the initial contents of the tape; i.e.,
the input m.
The mimicry of M we're aiming for is to have a particular word be a theorem
of Π_M if and only if M halts on the input m. To clean things up and take care
of special cases, we add a special unused symbol (h) to the beginning and end of
each word, and add productions that deal with that. We also add special state-like
symbols q, q′ that are switched into when we hit a dead end: for every state q_i and
symbol S_j that do not begin any quadruple of M, add the production q_i S_j → q S_j.
Once we're in q we delete symbols to the right: for every symbol S_i, Π_M contains
q S_i → q. When we hit the right end, switch into q′: qh → q′h. Finally, delete
symbols to the left: S_i q′ → q′, so the word that remains is hq′h.
We have proved the following theorem:

Theorem 4.5.8. It is not possible in general to decide whether or not a word is a
theorem of a semi-Thue system.

Exercise 4.5.9. How did the mimicry of Turing machines by semi-Thue systems
give us Theorem 4.5.8?

Exercise 4.5.10. Write a proof of Theorem 4.5.8. In particular, fill in the details
of the symbol h, formally verify that the construction works, and include the
explanation of Exercise 4.5.9.
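A semi-Thue derivation step is just string rewriting, which makes a bounded search for theorems easy to sketch. Only a bounded search is possible, of course: Theorem 4.5.8 says the unbounded question is undecidable. The example system below is invented for illustration.

```python
def step(word, productions):
    # All words reachable from `word` by one application of one production.
    out = set()
    for lhs, rhs in productions:
        i = word.find(lhs)
        while i != -1:
            out.add(word[:i] + rhs + word[i + len(lhs):])
            i = word.find(lhs, i + 1)
    return out

def theorems(axiom, productions, max_steps):
    # Words derivable from the axiom within max_steps derivation steps.
    seen = {axiom}
    frontier = {axiom}
    for _ in range(max_steps):
        frontier = set().union(*(step(w, productions) for w in frontier)) - seen
        seen |= frontier
    return seen

prods = [("ab", "b"), ("b", "bb")]
assert "b" in theorems("ab", prods, 3)       # ab -> b in one step
assert "abb" in theorems("ab", prods, 3)     # ab -> abb via b -> bb
```

The undecidability is not about any single step (each is trivially computable) but about the unbounded search for a derivation.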
Post Correspondence

I'm including this one mostly because it's cute. We use the term alphabet for the
set of all symbols used. If A is an alphabet and w a word all of whose symbols are
in A, we call w a word on A.
Definition 4.5.11 (Post, 1946 [50]). A Post correspondence system consists of an
alphabet A and a finite set of ordered pairs ⟨h_i, k_i⟩, 1 ≤ i ≤ m, of words on A. A word
u on A is called a solution of the system if for some sequence i_1, i_2, . . . , i_n with each
i_j ≤ m (the i_j need not be distinct) we have

    u = h_{i_1} h_{i_2} ⋯ h_{i_n} = k_{i_1} k_{i_2} ⋯ k_{i_n}.
That is, given two lists of m words, h_1, . . . , h_m and k_1, . . . , k_m, we want
to determine whether any concatenation of words from the h list is equal to the
concatenation of the corresponding words from the k list. A solution is such a
concatenation.
Example 4.5.12. The word aaabbabaaaba is a solution to the system

    ⟨a^2, a^3⟩, ⟨b, ab⟩, ⟨aba, ba⟩, ⟨ab^3, b^4⟩, ⟨ab^2a, b^2⟩,

as shown by the two decompositions

    aa · abba · b · aa · aba
    aaa · bb · ab · aaa · ba

In fact, the segments aaabbab and aaaba are individually solutions as well.
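Solutions can be hunted for by brute force over index sequences, bounded in length (a bounded search only, in light of Theorem 4.5.13 below). The sketch below checks the system of Example 4.5.12.

```python
from itertools import product

def pcp_solutions(pairs, max_len):
    # All solutions built from at most max_len pairs (indices may repeat).
    sols = []
    for n in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=n):
            h = "".join(pairs[i][0] for i in seq)
            k = "".join(pairs[i][1] for i in seq)
            if h == k:
                sols.append(h)
    return sols

# The system of Example 4.5.12, written out as (h_i, k_i) pairs.
system = [("aa", "aaa"), ("b", "ab"), ("aba", "ba"),
          ("abbb", "bbbb"), ("abba", "bb")]
found = pcp_solutions(system, 5)
assert "aaaba" in found and "aaabbab" in found and "aaabbabaaaba" in found
```

The search space grows exponentially in the sequence length, and no computable bound on the length of a shortest solution can exist in general; that is exactly what the undecidability result rules out.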
Given a semi-Thue process Π and a word v, we can construct a Post correspondence
system that has a solution if and only if v is a theorem of Π. Then we can
conclude the following.

Theorem 4.5.13. There is no algorithm for determining whether or not a given
arbitrary Post correspondence system has a solution.
Proof. Let Π be a semi-Thue process on alphabet A = {a_1, . . . , a_n} with axiom u,
and let v be a word on A. We construct a Post correspondence system P such that
P has a solution if and only if v is a theorem of Π. The alphabet of P is

    B = {a_1, . . . , a_n, ā_1, . . . , ā_n, [, ], ∗, ∗̄},

with 2n + 4 symbols. For any word w on A, write w̄ for the result of replacing each
symbol of w by its barred version.

Suppose the productions of Π are g_i → g′_i, 1 ≤ i ≤ k, and assume these include
the n identity productions a_i → a_i, 1 ≤ i ≤ n. Note this is without loss
of generality, as the identity productions do not change the set of theorems of Π.
However, we may now assert that v is a theorem of Π if and only if we can write
u = u_1 → u_2 → ⋯ → u_m = v for some odd m.
Let P consist of the following pairs:

    ⟨[u∗, [⟩, ⟨∗, ∗̄⟩, ⟨∗̄, ∗⟩, ⟨], ∗̄v]⟩,
    ⟨g′_j, ḡ_j⟩ and ⟨ḡ′_j, g_j⟩ for 1 ≤ j ≤ k.

Let u = u_1 → u_2 → ⋯ → u_m = v, where m is odd. Then the word

    w = [u_1 ∗ ū_2 ∗̄ u_3 ∗ ⋯ ū_{m−1} ∗̄ u_m]

is a solution of P, with the decompositions

    [u_1∗  ū_2∗̄  u_3∗  ⋯        ]
    [      u_1∗  ū_2∗̄  ⋯   ∗̄u_m],
where ū_2 corresponds to u_1 by the concatenation of three pairs: we can write
u_1 = r g_j s, u_2 = r g′_j s for some 1 ≤ j ≤ k. Then ū_2 = r̄ ḡ′_j s̄, matched against
u_1 piece by piece using the pair ⟨ḡ′_j, g_j⟩ together with identity-production pairs
for r and s.

Conversely, suppose w is a solution of P; it must begin with [ and end with v],
and our decompositions are forced at the ends to be the pairs ⟨[u∗, [⟩ and ⟨], ∗̄v]⟩.
This gives us the initial correspondences

    [u∗   ⋯      ]
    [     ⋯   ∗̄v]

We must have u corresponding to some r̄ and v to some s̄, where u → r and
s → v. Then the ∗ and ∗̄ must correspond to a matching pair:

    [u∗  r̄∗̄  ⋯   v ]
    [    u∗  ⋯  s̄ ∗̄v]

Iterating this procedure, we see that w witnesses a derivation u → ⋯ → v.
Furthermore, any solution w must begin with [ and end with ] (possibly with
additional brackets in the middle). We have forced this by adding bars to the symbols
in half of every pair. For w to be a solution, the symbol at the beginning of w
must also begin both elements of a pair of P, and the only symbol that does so is
[; likewise the only symbol that ends both elements of a pair of P is ].

Hence, P has a solution if and only if v is a theorem of Π; if we can always
decide whether a Post correspondence problem has a solution we have contradicted
Theorem 4.5.8.
As a final note, we point out that this undecidability result is for arbitrary Post
correspondence systems. We may get decidability results by restricting the size of
the alphabet or the number of pairs ⟨h_i, k_i⟩. If we restrict to alphabets with only one
symbol but any number of pairs, then the Post correspondence problem is decidable.
If we allow two symbols and any number of pairs, it is undecidable. If we restrict
to only one pair or two pairs of words, the problem is decidable regardless of the
number of symbols [18], and at 7 pairs it is undecidable [45]. Between three and six
pairs inclusive the question is still open.
Mathematical Logic

In §2.1 we met predicate logic as a way of writing formulas; it includes negation
(¬), the connectives &, ∨, and →, and the quantifiers ∀ and ∃. Here we must
broaden our perspective a bit to define a logic which is a whole unto itself rather
than just a notational system.

A particular (predicate) logic consists of the symbols above as well as variables,
constants, functions, and relations; all together they are called the language. We
used this idea without comment in §2.1 when writing formulas about N or other
number systems, using constants for elements of N, arithmetic functions, and the
relations = and <. We define the notion of formula recursively (see §2.4), beginning
with simpler notions.
A term is a constant, a variable, or a function of terms. For example, if ·, +,
and 2 are in our language, 2 · (x + y) is a term because 2, x, and y are individually
terms, x + y is a term because it is a function of x and y, and 2 · (x + y) is a term
because it is a function of the terms 2 and x + y.

An atomic formula is a relation of terms, such as 2 · (x + y) > 5. All atomic
formulas are formulas, and if φ, ψ are formulas, so are (φ & ψ), (φ ∨ ψ), (φ → ψ),
∀x(φ), and ∃x(φ).
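The recursive definition of "term" translates directly into a recursive membership test. A sketch for a toy language; the particular constants, variables, and function symbols here are invented for the example.

```python
# A toy language: constants, variables, and binary function symbols.
CONSTS = {"0", "1", "2"}
VARS = {"x", "y"}
FUNCS = {"+", "*"}

def is_term(t):
    # Base case: a constant or a variable is a term.
    if isinstance(t, str):
        return t in CONSTS or t in VARS
    # Recursive case: a function symbol applied to two terms.
    return (isinstance(t, tuple) and len(t) == 3 and t[0] in FUNCS
            and is_term(t[1]) and is_term(t[2]))

# 2 * (x + y), written as a nested tuple:
assert is_term(("*", "2", ("+", "x", "y")))
assert not is_term(("-", "x", "y"))   # "-" is not in this language
```

A checker for formulas is built the same way, one recursive case per connective and quantifier.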
To define a particular logic we explicitly state the collections of constants,
functions, and relations included. As an aside, this allows us to distinguish the
systems in which a particular property is definable by a predicate formula. Any given logic
may have multiple interpretations, structures A in which every constant, predicate,
and function has a specified meaning. For example, N with the usual arithmetic
and {0, 1, . . . , 11} with arithmetic modulo 12 are distinct interpretations of the logic
with constants 0 and 1, functions + and ·, and relation =. The structure A models
a formula φ, written A ⊨ φ, if φ is true under the interpretation. φ is valid
(tautological) if it is true in all interpretations, denoted ⊨ φ.
The problem is this: given a logic L, determine whether the L-formula φ is valid.
To prove that this is unsolvable, for each Turing machine M we construct a logic
L_M and a formula φ_M of L_M such that φ_M is valid if and only if M halts. We give
only a sketch of the proof, which is very similar to the construction for semi-Thue
systems, Theorem 4.5.8.
Given a Turing machine M on the alphabet {0, 1} with internal states q_0, . . . , q_n,
let the constants of the language L_M be 0, 1, q_0, . . . , q_n, q, q′, h. L_M has one
function f and one relation Q. The function f(x, y), also written (xy), concatenates
constants: the terms of L_M are words on the alphabet of constants and variables.
The binary relation Q(t_1, t_2) (for "quadruple"), which we will also write t_1 → t_2, holds
exactly when t_2 is obtained from t_1 by one of the semi-Thue productions in the
proof of Theorem 4.5.8. That is, we have the formula (∀x, y)[xAy → xBy] whenever
A → B is one of the productions of Π_M.
We now have a collection of finitely-many formulas we'll call axioms, one for
each semi-Thue production. We need two more. The first says f is associative:

    (∀x, y, z)[((xy)z) = (x(yz))].
The second says Q is transitive:

    (∀x, y, z)[(x → y & y → z) → (x → z)].

Let the conjunction of all the above axioms be called A. Then the Turing machine
M halts on input x with initial configuration q_0 if and only if the formula

    φ := (A → (hq_0xh → hq′h))

is valid.
It is straightforward to see that there are infinitely many sets that are not even
c.e., much less computable. It is traditional to denote the domain of φ_e by W_e (and
hence the stage-s approximation by W_{e,s}). The c.e. (including computable) sets are
all listed out in the enumeration W_0, W_1, W_2, . . ., which is a countable collection of
sets. However, the power set of N, which is the set of all sets of natural numbers, is
uncountable. Therefore in fact there are not only infinitely many but uncountably
many sets that are not computably enumerable.

Exercise 5.2.6. Prove that if A is c.e., A is computable if and only if its complement
Ā is c.e.
Exercise 5.2.7. Use Exercise 5.2.6 and the enumeration of c.e. sets, {W_e}_{e∈N}, to
give an alternate proof of the noncomputability of K.

Exercise 5.2.8. Prove that an infinite set is computable if and only if it can be
computably enumerated in increasing order (that is, it is the range of a monotone
total computable function).

Exercise 5.2.9. Prove that if A is computable, and B ⊆ A is c.e., then B is
computable if and only if A − B is c.e. Prove that if A is only c.e., B ⊆ A c.e., we
cannot conclude B is computable even if A − B is computable.
Exercise 5.2.10. Prove the reduction property: given any two c.e. sets A, B there
are c.e. sets Â ⊆ A, B̂ ⊆ B such that Â ∩ B̂ = ∅ and Â ∪ B̂ = A ∪ B.

Exercise 5.2.11. Prove that the c.e. sets are uniformly enumerable: there is a
single computable procedure that enumerates the pair ⟨e, x⟩ if and only if x ∈ W_e.

Exercise 5.2.12. Prove that the collection {(A_n, B_n)}_{n∈N} of all pairs of disjoint
c.e. sets is uniformly enumerable, with the definition in Exercise 5.2.11 modified to
involve triples ⟨n, i, x⟩, where i ∈ {0, 1} indicates A or B. Note that as with the c.e.
sets, the enumeration will contain repeats.

Exercise 5.2.13. Show that any infinite c.e. set contains an infinite computable
subset.

Exercise 5.2.14. Show that any infinite set contains a noncomputable subset.
Exercise 5.2.15. Prove that if A and B are both computable (respectively, c.e.),
then the following sets are also computable (respectively, c.e.).

(i) A ∪ B,

(ii) A ∩ B,

(iii) A ⊕ B := {2n : n ∈ A} ∪ {2n + 1 : n ∈ B}, the disjoint union or join.

Exercise 5.2.16. Show that if A ⊕ B, as defined above, is computable (respectively,
c.e.), then A and B are both computable (c.e.).

Exercise 5.2.17. Two c.e. sets A, B are computably separable if there is a computable
set C that contains A and is disjoint from B. They are computably inseparable
otherwise.

(i) Let A = {x : φ_x(x)↓ = 0} and B = {x : φ_x(x)↓ = 1}. Show A and B are
computably inseparable.
(ii) Let {(A_n, B_n)}_{n∈N} be the enumeration of all disjoint pairs of c.e. sets as in
Exercise 5.2.12. Let x ∈ A iff x ∈ A_x and x ∈ B iff x ∈ B_x, and show A and
B are computably inseparable. Hint: what if C were one of the B_n?
Exercise 5.2.18. Show that if A is computably enumerable, the union

    B = ⋃_{e∈A} W_e

is computably enumerable. If A is computable, is B computable? Can you make
any claims about C = ⋂_{e∈A} W_e, given the computability or enumerability of A?

Exercise 5.2.19. (i) Call a relation computable if, when coded into a subset of
N, that set is computable. Given a computable binary relation R, prove the
set A = {x : (∃y)((x, y) ∈ R)} is c.e.

(ii) With R and A as above, prove that if A is noncomputable, then for every total
computable function f there is some x ∈ A such that every y with
(x, y) ∈ R satisfies y > f(x).
5.3 Noncomputable Sets Part I

So how do we create a noncomputable set? One way is by making its characteristic
function nonequal to every total computable function. We can do this diagonally,
by making A such that χ_A(e) ≠ φ_e(e).

We want to say "Let χ_A(e) = 1 if φ_e(e) = 0, and otherwise let it be 0." It's not
so hard to prove A defined that way is c.e., but to generalize we have to be a little
more careful, enumerating A gradually as we learn about the results of the various
φ_e(e) computations.

Let's consider this diagonal example further. We can think of this definition as
an infinite collection of requirements

    R_e : χ_A(e) ≠ φ_e(e).
We win each individual requirement if either φ_e(e)↑, or φ_e(e)↓ but gives a value
different from χ_A(e). We must also make sure A is c.e., which is a single requirement
that permeates the construction.

To make sure A is c.e., we put elements into it but never take them out, and
we make sure every step of the construction is computable. The construction itself,
then, is the computable procedure that enumerates A.

Meeting each R_e will be local; none of the requirements will interact with any
others. We dovetail the computations in question as in §5.1, so we will eventually
see the end of any convergent computation. If φ_e(e)↓ = 0 at stage s we put e into A
at that stage. If we never see that, we keep e out; that's the whole construction.
Why does this give a noncomputable set? In other words, why does it satisfy
the requirements? Because if φ_e(e)↑, we win that requirement. Otherwise the
computation converges at some finite stage. If it converges to 0, at that stage
we put e into A, and χ_A(e) = 1 ≠ φ_e(e). Otherwise we keep e out of A, and
χ_A(e) = 0 ≠ φ_e(e).
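A finite snapshot of this construction can be simulated by fixing, by hand, how each φ_e(e) behaves; the table below is invented for illustration. The stage loop enumerates A exactly as described: an element goes in only when its computation is seen to converge to 0, and nothing is ever removed.

```python
# Each entry models the computation phi_e(e): either None (divergent) or a
# pair (s, v) meaning it converges at stage s with output v.
diag = [None, (3, 0), (2, 7), (5, 0)]

A = set()
for s in range(10):                    # dovetailed stages
    for e, res in enumerate(diag):
        if res is not None and res[0] == s and res[1] == 0:
            A.add(e)                   # now chi_A(e) = 1 != 0 = phi_e(e)

# Requirement check: chi_A(e) differs from every convergent phi_e(e).
for e, res in enumerate(diag):
    if res is not None:
        assert (1 if e in A else 0) != res[1]
assert A == {1, 3}
```

The divergent entry (index 0) is won "for free": we never act for it, and χ_A(0) = 0 while φ_0(0) has no value at all.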
5.4 Noncomputable Sets Part II: Simple Sets

Definition 5.4.1. A c.e. set A is simple if its complement is infinite but contains
no infinite c.e. subsets.

That is, if W_e is infinite, it must have nonempty intersection with A, but there
still has to be enough outside of A that Ā is infinite. Note that having an infinite
or finite complement is often called being coinfinite or cofinite, respectively.

Exercise 5.4.2. (i) Prove that if A is simple, it is not computable.

(ii) Prove that a coinfinite c.e. set is simple if and only if it is not contained in any
coinfinite computable set.

(iii) Prove that if A and B are simple, A ∩ B is simple and A ∪ B is either simple
or cofinite.

(iv) Prove that if A is simple and W_e is infinite, A ∩ W_e must be infinite (not just
nonempty).
We now discuss the construction of a simple set. This perhaps seems technical,
but is the most common way to force a set to be noncomputable in modern constructions
(we often want to construct sets with certain properties and use construction
modules to do so; the simplicity module is the most common for noncomputability,
because it turns out to be easier to work with than the module we met in §5.3).
Just as before, we have an infinite collection of requirements to meet:

    R_e : (|W_e| = ∞) → (A ∩ W_e ≠ ∅).

Additionally we have two overarching requirements,

    A is c.e.

and

    |Ā| = ∞.

As before, to make sure A is c.e., we will enumerate it during the construction
and make sure every step of the construction is computable.
To meet R_e while maintaining the size of Ā, we look for n > 2e such that n ∈ W_e.
When we find one, we enumerate n into A. Then we stop looking for elements of
W_e to put into A (the requirement R_e is satisfied).

Since W_e may be finite, we have to dovetail the search as in §5.1, so at stage s
we look at W_{e,s} for each e < s such that W_{e,s} ∩ A_s = ∅.
Why does this work?

- As discussed before, A is c.e. because the construction is computable, and
numbers are only put into A, never taken out.

- Ā is infinite because only k-many requirements R_e are allowed to put numbers
below 2k into A for any k, leaving at least k of those numbers in Ā.

- For each W_e that is infinite, there must be some element x > 2e in W_e.
Eventually s is big enough that (a) we are considering W_e, and (b) such an x
is in W_{e,s}. At that point we will put x into A and R_e will be satisfied forever
after.
One thing to note: we cannot tell during enumeration whether any given W_e
will be finite or infinite. There could be a long lag time between enumerations, and
we can't tell whether we need to wait longer to get more elements or whether we're
done. Because of this, we may act on behalf of some finite sets W_e unnecessarily.
That's okay, though, because we set up the "n > 2e" safeguard to make sure we never
put so much into A that Ā becomes finite, and that would be the only way extra
elements in A could hurt the construction.
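Here is a toy run of the construction, with hand-made stage approximations standing in for W_{e,s} (the particular sets are illustrative). Each requirement acts at most once, and only on numbers above 2e.

```python
# W[e](s) returns the finite approximation W_{e,s} as a set.
W = [
    lambda s: set(range(s)),                  # W_0 = N
    lambda s: {0, 2} if s > 3 else set(),     # W_1 finite
    lambda s: {3 * k for k in range(s)},      # W_2 = multiples of 3
]

A = set()
satisfied = set()
for s in range(1, 20):                        # dovetailed stages
    for e in range(min(s, len(W))):
        if e in satisfied:
            continue                          # R_e already acted; never again
        candidates = [n for n in W[e](s) if n > 2 * e]
        if candidates:
            A.add(min(candidates))            # meet R_e
            satisfied.add(e)

assert A == {1, 6}
assert satisfied == {0, 2}                    # both infinite W_e were met
```

The finite set W_1 never offers an element above 2, so R_1 never acts; in this toy that is harmless, just as in the real construction.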
Chapter 6

Turing Reduction and Post's Problem

6.1 Reducibility of Sets

We'd like a definition of relative computability that allows us to say (roughly) that
set A is "more" or "less" computable than set B. Certainly it should be the case that
the computable sets are "more computable" than all noncomputable sets; we can
get a finer-grade division than those two layers, however. Intuitively, A is reducible
to B if knowing B makes A computable.
Definition 6.1.1. An oracle Turing machine with oracle A is a Turing machine
that is allowed to ask a finite number of questions of the form "is n in A?" during
the course of a single computation.

The restriction to only finitely-many questions is so the computation remains
finite. We think of oracle machines as computers with CD drives. We pop the
CD of A into the drive, and the machine can look up finitely many bits from the
CD during its computation on input n. Another way to think of it would be the
machine having an additional internal tape that is read-only and pre-printed with
the characteristic function of A. That perhaps clarifies how we might code oracle
Turing machines, as well as making very concrete the fact that only finitely-many
questions may be asked of the oracle.

The number and kind of questions the machine asks may vary with not only the
input value, but also the answers it gets; i.e., with the oracle. However, once again
we must have uniformity; you can think of a pre-existing flowchart or tree diagram
of the procedure. For example, a simple, pointless function might have a process as
follows:

Given input n, find if n ∈ A.
  If so, test each k ∈ N. If any k^2 = n, halt and output k.
  If not, ask if n^2 ∈ A.
    If so, halt and output n.
    If not, go into an infinite loop.
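The flowchart above can be rendered with the oracle as a membership-test function passed in (a sketch; the oracle set used below is arbitrary). Note the search for k genuinely diverges when n is in A but is not a perfect square, matching the partiality of the procedure.

```python
def pointless(n, oracle):
    if oracle(n):
        k = 0
        while True:              # halts only if n is a perfect square
            if k * k == n:
                return k
            k += 1
    elif oracle(n * n):
        return n
    else:
        while True:              # diverge
            pass

A = lambda x: x == 4             # the oracle set {4}, chosen arbitrarily
assert pointless(4, A) == 2      # 4 in A, and 4 = 2 * 2
assert pointless(2, A) == 2      # 2 not in A, but 2 * 2 in A
```

The flowchart itself (`pointless`) is fixed; only the answers supplied by the oracle vary, which is the uniformity being described.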
We notate oracles by superscript: M^A for a machine, φ^A for a function. This is
where we start needing the brackets notation from §5.1, because we consider the
stage-s approximation of both the oracle and the computation: φ^{A_s}_{e,s}(n) abbreviates
to φ^A_e(n)[s].

Definition 6.1.2. A set A is Turing reducible to a set B, written A ≤_T B, if for
some e, φ^B_e = χ_A. A and B are Turing equivalent, A ≡_T B, if A ≤_T B and B ≤_T A.
This definition may also be made with functions. To match it to the above, we
conflate a function f with its (coded) graph {⟨x, y⟩ : f(x) = y}.

Exercise 6.1.3. Prove A ≡_T Ā.

Exercise 6.1.4. (i) Prove that ≤_T is a preorder on P(N); that is, it is a reflexive,
transitive relation.

(ii) In fact, ≤_T is uniformly transitive, which is easiest from the function point of
view: prove there is a function k such that for all i, e, f, g, h, if h = φ^g_e and
g = φ^f_i, then h = φ^f_{k(e,i)}.

(iii) Prove that ≡_T is an equivalence relation on P(N).

Exercise 6.1.5. (i) Prove that if A is computable, then A ≤_T B for all sets B.

(ii) Prove that if A is computable and B ≤_T A, then B is computable.
One could think of Turing-equivalent sets as being closely related, like A and Ā
are. The following is about as close as a relationship can get.

Definition 6.1.6. The symmetric difference of two sets A and B is

    A △ B = (A − B) ∪ (B − A).

We write A =* B when A △ B is finite (|A △ B| < ∞).

Exercise 6.1.7. (i) Prove that if A =* B, then A ≡_T B.
(ii) Prove that A ≡_T B does not imply A =* B.

On the opposite end of the c.e. sets from the computable sets are the complete
sets (or Turing complete): sets that are c.e. and that compute all other c.e. sets.
Recall that the halting set is

    K = {e : φ_e(e)↓}.

Theorem 6.1.9 (Post, 1944; see Davis [12]). K is c.e., and if A is computably
enumerable, then A ≤_T K.
Proof. Given A, we construct a computable function f such that x ∈ A ⟺ f(x) ∈ K.
Let e be such that A = W_e, and define the function ψ(x, y) to equal 0 if φ_e(x)↓, and
to diverge otherwise. Since φ_e is partial computable, so is ψ, so it is φ_i(x, y) for some
i. By the s-m-n Theorem 4.3.1, there is a total computable function S^1_1 such that
φ_{S^1_1(i,x)}(y) = φ_i(x, y) for all x and y. However, since i is fixed, we may view S^1_1(i, x)
as a (computable) function of one variable, f(x). Now,

    x ∈ A ⟹ φ_e(x)↓ ⟹ ∀y (φ_{f(x)}(y) = 0) ⟹ φ_{f(x)}(f(x))↓ ⟹ f(x) ∈ K;
    x ∉ A ⟹ φ_e(x)↑ ⟹ ∀y (φ_{f(x)}(y)↑) ⟹ φ_{f(x)}(f(x))↑ ⟹ f(x) ∉ K.
Exercise 6.1.10. Another way to show K is complete is via an augmented halting
set. Recall that ⟨x, y⟩ is the code of the pair x, y under the standard pairing function.
Define K_0 = {⟨x, y⟩ : φ_x(y)↓}. You show K_0 is c.e. via dovetailing, as with K, and
it is clear that every c.e. set is computable from K_0. To complete Theorem 6.1.9,
we need only show K_0 ≤_T K. Prove this directly, using the s-m-n Theorem 4.3.1.
When working with oracle computations we need to know how changes in the
oracle affect the computation, or really, when we can be sure changes won't affect
the computation. Since each computation asks only finitely many questions of the
oracle, we can associate it with a value called the use of the computation. There
are various notations for use; I'll stick to u(A, e, x), the maximum n ∈ N that φ^A_e
asks A about during its computation on input x, plus one.

Another piece of notation: A ↾ n ("A restricted to the first n elements")
means A ∩ {0, 1, . . . , n − 1} (the reason for the "plus one" above). Note that if
A ↾ u(A, e, x) = B ↾ u(A, e, x), then u(B, e, x) = u(A, e, x) and φ^A_e(x) = φ^B_e(x).
In words, if A and B agree up to the largest element φ^A_e(x) asks about when
computing relative to A, then in fact on input x there is no difference between
computing relative to A and relative to B, because φ_e is following the same path in
its "ask about 5: if yes, then ask about 10; if no, then ask about 8" flowchart for
computation. If B differs from A up to the use with oracle A, then both the use
and the output with oracle B could be different.
What this means for constructions is that if you want to preserve a computation
φ^A_e(x) while still allowing enumeration into A, you need only prevent enumeration
of numbers below u(A, e, x). Any others will leave the computation unharmed.
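The use and the preservation fact can be seen in a miniature model where the machine is a fixed query flowchart (the specific queries below are invented). Changing the oracle only at or above the use cannot change the run.

```python
def run(oracle, x):
    # A tiny oracle "flowchart": which question is asked next depends on the
    # answers so far. Returns (output, use), with use = max query + 1.
    queries = []
    def ask(n):
        queries.append(n)
        return oracle(n)
    if ask(x):
        out = 1 if ask(2 * x) else 0
    else:
        out = 2 if ask(x + 1) else 3
    return out, max(queries) + 1

A = lambda n: n in {3, 6}
out_A, use = run(A, 3)                 # asks about 3 (yes), then 6 (yes)
assert (out_A, use) == (1, 7)

B = lambda n: n in {3, 6, 100}         # agrees with A below the use
out_B, _ = run(B, 3)
assert out_B == out_A                  # the computation is preserved
```

An oracle that disagreed with A somewhere below 7 could send the run down a different branch, changing both the output and the use, which is exactly the caveat in the text.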
In 4.5 we saw a collection of problems that correspond to c.e. sets: the set of
theorems of a semi-Thue system or logic is enumerable; the set of concatenations
that might be solutions to a Post correspondence system is enumerable. In full
generality every one of those is equivalent to the halting problem (we noted that
embedding the halting problem into the decision problem was a standard method
to prove undecidability), and it is not clear how one would reduce the generality
in such a way as to become weaker than the halting problem without becoming
computable.
This prompted Emil Post to ask the following question.
Question 6.1.11 (Post's Problem). Is there a set A such that A is noncomputable and incomplete?
The answer is yes, though it took a while to get there. A refinement of the question asks whether there is a c.e. set that is noncomputable and incomplete. Why is that a refinement? Because any complete set will be Turing-above some non-c.e. sets as well as all the c.e. ones. So if we build an intermediate set without worrying about its enumerability, we might well end up with a non-c.e. one. However, we can also answer yes to the problem of whether there is a c.e. set between computable and complete, as proved in the next section.
Theorem 6.2.3. There is a c.e. set A such that A is noncomputable and incomplete.
6.2 Finite Injury Priority Arguments
Suppose we have an infinite collection {R_e}_{e∈N} of requirements to meet while constructing a set A. We've seen this in the noncomputable set constructions of 5.3 and 5.4. However, suppose further that these requirements may interact with each other, and to each other's detriment. As an extremely simplified example, suppose R_6 wants to put even numbers into A and R_8 wants there to be no even numbers in A. Then if R_6 puts 2 into A, R_8 will take it back out, and R_6 will try again with 2 or some other even number, and again R_8 will take it back out. We'll go round and round in a vicious circle and neither requirement will end up satisfied (in fact in this example, A may not even be well-defined).
In this example the requirements are actually set directly in opposition. At the other end of the spectrum, we can have requirements that are completely independent from each other and still have to worry about injury to a requirement. The reason is that information is parceled out slowly, stage by stage, since we're working
with enumerations rather than full, pre-known characteristic functions. Our information is at best not known to be correct and complete, and at worst is actually incomplete, misleading, or outright wrong. Therefore we will make mistakes acting on it. However, we can't wait to act, because what we're waiting for might never happen, and not acting is almost certainly not correct either. For example, in the simple set construction, there was no waiting until we determine whether a set is finite or not. We can't ever know if we've seen all the elements of the set, so we have to act as soon as we see a chance (a large-enough number). The "mistake" we make there is putting additional elements into the set that we didn't have to. We eliminate the damage from that mistake by putting a lower bound on the size of the elements we can enumerate. In this more complicated construction, we will make mistakes that actually cause damage, but set up the construction in such a way that the damage can be survived.
The key to getting the requirements to play nicely together is priority. We put the requirements into a list and only allow each to injure requirements further down the list. Then in our situation above, R_6 would be allowed to injure R_8, but not vice versa.
The kind of priority arguments we will look at in this section are finite-injury priority arguments. That means each requirement only breaks the ones below it a finite number of times. We show every requirement can recover from finitely much injury, and so after the finite collection of requirements earlier in the list than R_e have finished causing injury, R_e can act to satisfy itself and remain satisfied forever. [The proofs, therefore, are induction arguments.]
Let's work through a different version of the simple set construction. Recall the definition.

Definition 6.2.1. A c.e. set A is simple if A̅ is infinite but contains no infinite c.e. subset.
Theorem 6.2.2. There is a simple set.
Proof. We will construct A to be simple via meeting the following two sets of requirements:

R_e : (|W_e| = ∞) ⟹ (A ∩ W_e ≠ ∅).
N_e : (∃n > e)(n ∈ A̅).

The construction will guarantee that A is computably enumerable. It is clear, as discussed in 5.4, that meeting all R_e will guarantee A̅ contains no infinite c.e. subsets.
To see that meeting all N_e guarantees A̅ is infinite, consider a specific N_e. If it is met, there is some n > e in A̅. Now consider N_n. If it is met, there is some n′ > n
in A̅; we may continue in this way. Thus satisfying all N_e requires infinitely many elements in A̅.
We first discuss meeting each requirement in isolation, starting with R_e. If W_{e,s} ∩ A_s = ∅, but an element enters W_e at stage s + 1, R_e puts that element into A. It is then permanently satisfied, as we do not take elements out of A. Each N_e chooses a marker n_e > e and prevents its enumeration into A.
The negative (N) requirements will prohibit some positive (R) requirements from enumerating elements into A. Some R requirements will enumerate elements into A that N requirements want to keep out of A. The priority ordering on these requirements is as follows:

R_0, N_0, R_1, N_1, R_2, N_2, . . .

Recall that requirements earlier in the list have higher priority.
Now each R_e requirement may enumerate anything from W_e into A except for the elements prohibited by N_0, N_1, . . . , N_{e−1}. Thus R_e might injure N_{e′} for some e′ > e, by enumerating its chosen value into A. This will cause the negative requirements from that point on to move their chosen n_{e′} values past the number R_e enumerated into A. We therefore refer to n_{e,s}, the value of the marker n_e at stage s.
One further definition will streamline the construction. We say a requirement R_e requires attention at stage s + 1 if W_{e,s} ∩ A_s = ∅ (so R_e is unsatisfied) and there is some x ∈ W_{e,s+1} such that x ≠ n_{k,s} for all k < e (there is a suitable witness that we are able to use to satisfy R_e at stage s + 1).
Construction:

Stage 0: A_0 = ∅. Each N_e chooses value n_{e,0} = e + 1.

Stage s + 1: If any R_e, e ≤ s, requires attention, choose the least such e and the least witness x for that e and let A_{s+1} = A_s ∪ {x}; if x is n_{k,s} for any k ≥ e, let n_{k′,s+1} = n_{k′+1,s} for all k′ ≥ k.
Φ_e^A(n_e)↓ = 0 (meaning Φ_e^A and χ_B agree on n_e).¹ In that case N_e puts the witness into B. The difference between this and 5.3 is that with an oracle, the computation might not stay halted. That is, as A changes, the behavior of Φ_e^A may also change. Therefore N_e tries to preserve its computation by restricting enumeration into A: it wants to keep any new elements less than u(A, e, n_e) out of A (recall the use function from the end of 6.1).
The priority ordering is as in Theorem 6.2.2:

R_0, N_0, R_1, N_1, R_2, N_2, . . .

Each R_e must obey restraints set by N_k for k < e, but may injure later N requirements by enumerating into A below the restraint of that N, after N has enumerated its witness into B. In that case, N must choose a later witness and start over.
Again, we make a streamlining definition: R_e requires attention at stage s if W_{e,s} ∩ A_s = ∅ and there is some x ∈ W_{e,s} such that x > 2e and x is also greater than any restraints in place from N_k for k < e. N_e requires attention at stage s if it has a defined witness n_{e,s}, and Φ_e^A(n_e)[s]↓ = 0 but n_{e,s} ∉ B_s.
Construction:

A_0 = B_0 = ∅.

Stage s: Set n_{s,s} to be the least number not yet used in the construction (as there have been only finitely many stages and only finitely much happens in any given stage, there will always be an infinite tail of N to work with).

Ask if any R_e or N_e with e < s requires attention. If so, take the highest-priority such and act to satisfy it:

If N_e, put n_{e,s} into B_{s+1} and restrain A up to u(A_s, e, n_{e,s}).

If R_e, put the least applicable x into A_{s+1}. Cancel the witnesses (and restraints, if any were set) of requirements N_k for e ≤ k ≤ s and give them new witnesses n_{k,s+1}, distinct unused large numbers, increasing in k.

If no requirement needs attention, do nothing.

In either case, any n_{e,s+1} that was not specifically defined is equal to n_{e,s}; restraints hold until they are cancelled by injury.

End construction.
Now, the verification.

Lemma 1. Each R_e requirement acts at most once.

Proof. Clear.

Lemma 2. For every e, n_e = lim_s n_{e,s} is defined.

¹Why pick a specific n_e instead of just looking for some difference somewhere? Because it streamlines the construction and doesn't make it any more difficult to verify.
Proof. As before, this lemma follows entirely from Lemma 1: since every R_e requirement acts at most once, the finite collection of them preceding N_e in the priority list will all be finished acting at some finite stage. After that stage, whatever witness n_e is in place is the permanent value.
Lemma 3. Every N_e requirement is met. Moreover, each N_e either has no restraint from some stage s on, or it has a permanent finite restraint that is unchanged from some stage s on.
Proof. Consider some fixed N_e. By Lemma 2, n_e eventually reaches its final value, and by Lemma 1, all higher-priority positive requirements eventually stop acting. Thus after some stage, N_e will never be injured again and will have a single witness n_e to worry about. By induction, assume all higher-priority N requirements have stopped acting. There are two cases:
Case 1: Φ_e^A(n_e) never converges to 0. That is, it either never converges, or it converges to some value other than 0. In this case the correct action is to keep n_e out of B, which N_e does by default. N_e is the only requirement that could put n_e into B, since witnesses for different requirements are always distinct values, so n_e will remain out of B. In this case N_e never sets a restraint for the permanent value of n_e, and any restraints it set on behalf of earlier witnesses are cancelled. N_e will never act again.
Case 2: Φ_e^A(n_e)↓ = 0. Suppose the final value of n_e is assigned to N_e at stage s (so we know additionally that by stage s all higher-priority R requirements have stopped acting), and consider the first stage s′ ≥ s such that Φ_e^A(n_e)[s′]↓ = 0. At stage s′, N_e will be the highest-priority requirement needing attention and will set restraint on A up to u(A_{s′}, e, n_e) and enumerate n_e into B. As the only positive requirements that might still act are bound to obey that restraint, it will never be violated, and thus the computation Φ_e^A(n_e)↓ = 0 is permanent and unequal to χ_B(n_e). Likewise, the restraint set is at its final level, and N_e will never act again.
Lemma 4. Every R_e requirement is met.

Proof. Consider some fixed R_e. By Lemma 3, let s be a stage by which each requirement N_k, k < e, has set its permanent restraint r_k, or has had its restraint canceled and never thereafter sets a new one. By Lemma 1, let s also be large enough that all higher-priority R requirements that will ever act have already done so. Let r = max{2e, r_k : k < e}; note this is a finite value. If W_e is finite, then R_e is met automatically. If W_e is infinite, then it must contain a number x > r. If R_e is not already satisfied by stage s, then at the first stage thereafter at which such an x enters W_e, R_e will be the highest-priority requirement needing attention and will be able to enumerate x into A, making it permanently satisfied.
This completes the proof of the theorem.
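The flavor of this bookkeeping can be watched in miniature. The sketch below is a toy simulation of the marker-based construction from Theorem 6.2.2, run against made-up finite enumeration data (the dictionary `W` is a stand-in for the uniformly enumerated sets W_{e,s}); it is an illustration of the mechanics, not a proof of anything. At each stage the least requirement needing attention acts, and when its witness was some marker n_k, the later markers slide down, implementing n_{k′,s+1} = n_{k′+1,s}.

```python
# Toy finite-injury run of the Theorem 6.2.2 construction.

W = {
    0: [(1, 2)],             # W_0 receives 2 at stage 1  (made-up data)
    1: [(2, 1), (3, 7)],     # W_1 receives 1 at stage 2, 7 at stage 3
    2: [(4, 3)],
}

def W_at(e, s):
    """W_{e,s}: the elements of W_e enumerated by stage s."""
    return {x for (t, x) in W.get(e, []) if t <= s}

def construct(stages):
    A = set()
    markers = list(range(1, stages + 2))      # n_{e,0} = e + 1
    for s in range(stages):                   # this loop body is "stage s+1"
        for e in range(s + 1):                # R_e, highest priority first
            if W_at(e, s) & A:
                continue                      # R_e already satisfied
            banned = set(markers[:e])         # markers of N_0, ..., N_{e-1}
            witnesses = sorted(W_at(e, s + 1) - banned)
            if witnesses:
                x = witnesses[0]
                A.add(x)
                if x in markers[e:]:          # R_e injures some N_k, k >= e:
                    markers.remove(x)         # later markers slide down past x
                break                         # only the least such e acts
    return A, markers

A, markers = construct(6)
```

In this run R_0 enumerates 2 (injuring N_1, whose marker moves from 2 to 3) and R_1 enumerates 7; every surviving marker stays out of A, as Lemma-3-style reasoning predicts.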
Notes on Approximation

So far, our approximations have all been enumerative processes: sets gain elements one by one, or functions gradually give results for various input values. There are other ways to get information about noncomputable sets; being c.e. is actually quite strong. The weakest condition on a set computable from ∅′ is simply to be computable from ∅′, or Δ^0_2 (see 7.2.5). For A to be Δ^0_2 means there is a computable function g(x, s) such that for each x, lim_s g(x, s) = χ_A(x), and the number of times g changes its mind on each x is finite:

(∀x) (|{s : g(x, s) ≠ g(x, s + 1)}| < ∞).

This is not a definition but a theorem, of course, and you can see its proof in 8.1.
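A Δ^0_2 approximation can be sketched directly. Here g is a hand-built example approximating the even numbers; the specific pattern of mind changes is invented, but it shows the defining features: g is total computable, and for each x it settles after finitely many changes (so the limit exists), even though no single stage tells you it has settled.

```python
# A toy Delta^0_2 approximation to A = { x : x is even }.

def g(x, s):
    if s < x:                            # still vacillating: flip each stage
        return (x + s) % 2
    return 1 if x % 2 == 0 else 0        # settled: the true answer

def limit(x, stages=100):
    """Read off lim_s g(x, s). We can do this only because we built g and
    know it settles by stage x; in general the limit is not computable
    from g."""
    return g(x, stages)

mind_changes = {x: sum(1 for s in range(100) if g(x, s) != g(x, s + 1))
                for x in range(5)}
```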
In the context of a finite injury priority argument, we must be able to cope with injury caused by additional elements we hadn't counted on, as well as the removal of elements we thought were in the set. Restraint on the set being constructed serves to control both addition and removal of elements. We also no longer know that only one change will be made; we only know the changes to each element's status will be finite, so eventually the approximation will be correct.
In between there are various families of approximability. For a given computable function f(x), a set is f-c.e. if it has a Δ^0_2 approximation g such that the number of mind changes of g on x is bounded by f(x). If f is the identity, we call the set id-c.e.
An approximation useful in the study of randomness (see 9.2) is left computable enumerability. In a left-c.e. approximation, it is always okay to put elements in, but only okay to take x out if you have put in something less than x. This is more natural in the context of characteristic functions viewed as infinite binary sequences. If you think of the sequence given by χ_A as an infinite binary expansion of a number between 0 and 1, then A is left-c.e. if we can approximate it so that the numerical value of the approximation is always increasing.
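A sketch of that picture, reading each finite stage as a binary expansion via χ_A: the sequence of stages below is made up, but each step either adds an element or trades elements out while adding a strictly smaller new element, so the numerical value only increases.

```python
# Left-c.e. approximation sketch: chi_A read as a binary expansion.

from fractions import Fraction

def value(A, precision=10):
    """Numerical value of the binary sequence chi_A (first `precision` bits)."""
    return sum(Fraction(1, 2 ** (n + 1)) for n in range(precision) if n in A)

# A toy left-c.e. approximation: each step adds an element, or removes
# elements while adding something smaller than everything removed.
stages = [set(), {3}, {3, 4}, {2}, {2, 4}, {1}]
values = [value(A) for A in stages]
increasing = all(a < b for a, b in zip(values, values[1:]))
```

For instance, the step from {3, 4} to {2} removes 3 and 4 but adds the smaller element 2, so the value jumps from 3/32 up to 4/32.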
Exercise 6.2.4. Using a finite injury priority construction, build a computable linear ordering L isomorphic to the natural numbers N such that the successor relation of L is not computable. That is, take N and reorder it by ≤_L such that the ordering is total and has a least element, and so that there is a total computable function f such that f(x, y) = 1 if and only if x ≤_L y, but no total computable function g such that g(x, y) = 1 if and only if y is the successor of x. The computability of the ordering will be implicit in the construction: place s into the ordering at stage s. For the remainder, satisfy the following requirements:

P_e : φ_e total ⟹ (∃x, y)[φ_e(x, y) = 1 but (∃z)(x <_L z <_L y)]
N_x : there are only finitely many elements L-below x
Exercise 6.2.5. Using a finite injury priority argument, build a bi-immune Δ^0_2 set A. That is, A such that A and A̅ both intersect every infinite c.e. set. Meet the following requirements:

P_e : |W_e| = ∞ ⟹ (∃n)(n ∈ W_e ∩ A)
R_e : |W_e| = ∞ ⟹ (∃n)(n ∈ W_e ∩ A̅)

P_e puts things in, R_e takes them out. Remember since A is merely Δ^0_2 these things can be the same numbers, provided they only seesaw finitely many times apiece.
Chapter 7
Turing Degrees
7.1 Turing Degrees
Recall from Exercise 6.1.4 that Turing equivalence is an equivalence relation on 𝒫(N), the power set of the natural numbers. As in 2.3, we may define a quotient structure.

Definition 7.1.1. (i) The Turing degrees (or degrees of unsolvability) are the quotient of 𝒫(N) by Turing equivalence.
(ii) For a set A ⊆ N, the degree of A is deg(A) = [A], the equivalence class of A under Turing equivalence. This is often notated in boldface, a, or written with an underline, a̲, on the chalkboard.

The Turing degrees are partially ordered by Turing reducibility, meaning deg(A) ≤ deg(B) iff A ≤_T B. This is well-defined (i.e., not dependent on the choice of degree representatives A, B) by the definition of Turing equivalence and the fact that it is an equivalence relation.
Exercise 7.1.2. Prove the following.

(i) The least Turing degree is deg(∅) (also denoted 0 or 𝟎); it is the degree of all computable sets.
(ii) Every pair of degrees deg(A), deg(B) has a least upper bound; moreover, that l.u.b. is deg(A ⊕ B) (A ⊕ B is defined in Exercise 5.2.15).
(iii) For all sets A, deg(A) = deg(A̅).

Note that for part (ii) you must show not only that deg(A ⊕ B) ≥ deg(A), deg(B), but also that any degree ≥ deg(A), deg(B) is also ≥ deg(A ⊕ B).
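For part (ii), the join can be sketched concretely, assuming the usual definition of A ⊕ B from Exercise 5.2.15 as {2n : n ∈ A} ∪ {2n + 1 : n ∈ B}: each of A and B is recovered from A ⊕ B by a trivial computable translation, which is the heart of showing deg(A ⊕ B) ≥ deg(A), deg(B).

```python
# Sketch of the join A (+) B: A on the evens, B on the odds.

def join(A, B):
    return {2 * n for n in A} | {2 * n + 1 for n in B}

def left(C):       # recover A from A (+) B
    return {n // 2 for n in C if n % 2 == 0}

def right(C):      # recover B from A (+) B
    return {n // 2 for n in C if n % 2 == 1}

A, B = {0, 2, 5}, {1, 2}
C = join(A, B)
```

The translations run in both directions, so a membership question about A or B becomes a single membership question about A ⊕ B, and vice versa.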
It is not the case that every pair of degrees has a greatest lower bound. The
least upper bound of a pair of sets is often called their join and the greatest lower
bound, should it exist, their meet.
Part (iii) of Exercise 7.1.2 explains the wording of the following definition.

Definition 7.1.3. A degree is called c.e. if it contains a c.e. set.

The maximum c.e. degree is deg(K), the degree of the halting set, which follows from Theorem 6.1.9.
Exercise 7.1.4. Prove that each Turing degree contains only countably many sets.
Corollary 7.1.5. There are uncountably many Turing degrees.
7.2 Relativization and the Turing Jump
The notion of relativization is one of fixing some set A and always working with A as your oracle: working relative to A. Then computability becomes computability in A (being equal to Φ_e^A for some e, also called A-computability) and enumerability becomes enumerability in A (being equal to W_e^A := dom(Φ_e^A) for some e). Some examples follow.
Theorem 7.2.1 (Relativized S-m-n Theorem). For every m, n ≥ 1 there exists a one-to-one computable function S^m_n of m + 1 variables so that for all sets A ⊆ N and for all e, y_1, . . . , y_m ∈ N,

Φ^A_{S^m_n(e, y_1, ..., y_m)}(z_1, . . . , z_n) = Φ^A_e(y_1, . . . , y_m, z_1, . . . , z_n).
Two important points: 1) this is a poor example of relativization, though it is important for using relativization; this is because 2) S^m_n is not just computable in A, it is computable. The proof here is essentially the same as for the original version; the only difference is that with oracle machines the exact enumeration of the Φ_e is different. Here's a better example:
Exercise 7.2.2. Prove that B ≤_T A if and only if B and B̅ are both c.e. in A.

Here's a very important example of relativization.

Theorem 7.2.3. The set A′ = {e : Φ^A_e(e)↓} is c.e. in A but not A-computable.
The proof is essentially the same as for the original theorem. This is the halting set relativized to A, otherwise known as the jump (or Turing jump) of A and read A-prime or A-jump. The original halting set is often denoted ∅′ or 0′, and therefore the Turing degree of the complete sets is denoted deg(∅′) or 0′.

The jump of a set is always strictly Turing-above the original set, and computes it. The jump never inverts the order of Turing reducibility, though it may collapse it. That is, the jump operator is not one-to-one.
Proposition 7.2.4. (i) If B ≤_T A, then B′ ≤_T A′.
(ii) There exist sets A, B such that B <_T A but B′ ≡_T A′.

One example of part (ii) is noncomputable low sets, sets A such that A′ ≡_T ∅′.
The degrees 0, 0′, . . . , 0^(n), . . . are, in some sense, the spine of the Turing degrees, in part because starting with the least degree and moving upward by iterating the jump is the most natural and non-arbitrary way to find a sequence of strictly increasing Turing degrees.

Additionally, however, those degrees line up with the number of quantifiers we need to write a logical formula. The arithmetic hierarchy is a way of categorizing relations according to how complicated the logical predicate representing them has to be. Let's have some definitions.
Definition 7.2.5. (i) A set B is in Σ_0 (equivalently, Π_0) if it is computable.

(ii) A set B is in Σ_1 if there is a computable relation R(x, y) such that
x ∈ B ⟺ (∃y)(R(x, y)).

(iii) A set B is in Π_1 if there is a computable relation R(x, y) such that
x ∈ B ⟺ (∀y)(R(x, y)).

(iv) For n ≥ 1, a set B is in Σ_n if there is a computable relation R(x, y_1, . . . , y_n) such that x ∈ B ⟺ (∃y_1)(∀y_2)(∃y_3) . . . (Qy_n)(R(x, y_1, . . . , y_n)), where the quantifiers alternate and hence Q = ∃ if n is odd, and Q = ∀ if n is even.

(v) Likewise, B is in Π_n if there is a computable relation R(x, y_1, . . . , y_n) such that x ∈ B ⟺ (∀y_1)(∃y_2)(∀y_3) . . . (Qy_n)(R(x, y_1, . . . , y_n)), where the quantifiers alternate and hence Q = ∀ if n is odd, and Q = ∃ if n is even.

(vi) B is in Δ_n if it is in both Σ_n and Π_n.
(vii) B is arithmetical if for some n, B is in Σ_n ∪ Π_n.
We often say "B is Σ_2" instead of "B is in Σ_2." These definitions relativize to A by allowing the relation to be A-computable instead of just computable, and in that case we tack an A superscript onto the Greek letter: Σ^A_n, Π^A_n, Δ^A_n.

I should note here that these are more correctly written as Σ^0_n and the like, with oracles indicated as Σ^{0,A}_n. The superscript 0 indicates that all the quantifiers have domain N. If we put a 1 in the superscript, we would be allowing quantifiers that range over sets in addition to numbers, and would obtain the analytic hierarchy. That's outside the scope of this course.
Exercise 7.2.6. Prove the following basic results.

(i) If B is in Σ_n or Π_n, then B is in Σ_m and Π_m for all m > n.
(ii) B is in Σ_n if and only if B̅ is in Π_n.
(iii) B is computable if and only if it is in Δ_1 (i.e., Δ_0 = Δ_1).
(iv) B is c.e. if and only if it is in Σ_1.
(v) The union and intersection of two Σ_n sets (respectively, Π_n, Δ_n sets) are Σ_n (Π_n, Δ_n).
(vi) The complement of any Δ_n set is Δ_n.
That this actually is a hierarchy, and not a lot of names for the same collection of sets, needs to be proven. Note that the Σ_n formulas with one free variable (the formulas that define subsets of N) are effectively countable, as in Exercise 3.4.4. This gives us a universal Σ_n set S, analogous to the universal Turing machine and itself Σ_n, such that ⟨e, x⟩ ∈ S if and only if the e-th Σ_n set contains x.

From S define P := {x : ⟨x, x⟩ ∈ S}. P is also Σ_n, but it is not Π_n. If it were, by part (ii) of Exercise 7.2.6, P̅ would be Σ_n. However, then P̅ is the e-th Σ_n set for some e. We have e ∈ P̅ ⟺ ⟨e, e⟩ ∈ S on the one hand, but e ∈ P̅ ⟺ e ∉ P ⟺ ⟨e, e⟩ ∉ S, for a contradiction.

The complement P̅, then, is Π_n but not Σ_n.
Exercise 7.2.7. Prove that there is a Δ_{n+1} set that is neither Σ_n nor Π_n. Hint: use P and P̅ as above, merge them, and use parts (i) and (v) of Exercise 7.2.6.
The strong connection to Turing degree continues as we move up the scale of
complexity.
Definition 7.2.8. A set A is Σ_n-complete if it is in Σ_n and for every B ∈ Σ_n there is a total computable one-to-one function f such that x ∈ B ⟺ f(x) ∈ A (we say B is 1-reducible to A). Π_n-completeness is defined analogously.
Theorem 7.2.9. For all n > 0, ∅^(n) is Σ_n-complete and its complement is Π_n-complete.

The index sets we saw in 4.5 are all complete at some level of the hierarchy. Fin is Σ_2-complete, Inf and Tot are Π_2-complete, and Rec is Σ_3-complete.

In fact, the following also hold. Theorem 7.2.9 combined with the proposition below is known as Post's Theorem.
Proposition 7.2.10. (i) B ∈ Σ_{n+1} ⟺ B is c.e. in some Σ_n set ⟺ B is c.e. in some Π_n set.
(ii) B ∈ Σ_{n+1} ⟺ B is c.e. in ∅^(n).
(iii) B ∈ Δ_{n+1} ⟺ B ≤_T ∅^(n).
This strong tie between enumeration and existential quantifiers should make sense: after all, you're waiting for some input to do what you care about. If it happens, it will happen in finite time (the relation on the inside is computable), but you don't know how many inputs you'll have to check or how many steps you'll have to wait, just that if the relation holds at all, it holds for some value of the parameter.

Theorem 7.2.9 and Proposition 7.2.10 both relativize. For Theorem 7.2.9 the relativized version starts with "for every n > 0 and every set A, A^(n) is Σ^A_n-complete" (recall to relativize the arithmetical hierarchy we allow the central relation to be A-computable rather than requiring it to be computable). The others relativize similarly.
One more exercise, for practice.
Exercise 7.2.11. (i) Prove A is Σ_2 if and only if there is a total computable function g(x, s) with codomain {0, 1} such that

x ∈ A ⟺ lim_s g(x, s) = 1.

(ii) Prove A is Π_2 if and only if there is a total computable function g(x, s) with codomain {0, 1} such that

x ∈ A ⟺ lim_s g(x, s) ≠ 0.
Some general rules for working in the arithmetic hierarchy:

Like quantifiers can be collapsed to a single quantifier. For example, (∃x_1)(∃x_2)(∃x_3)(R(y, x_1, x_2, x_3)) is still Σ_1. This follows from codability of tuples.

It bears mentioning in particular that adding an existential quantifier to the beginning of a Σ_n formula keeps it Σ_n, and likewise for universal quantifiers and Π_n formulas.
Bound quantifiers, ones which only check parameter values on some initial segment of N, do not increase the complexity of a formula. For example, (∀x < 30)(∃y)(R(x, y)) is just Σ_1.

It is possible to turn a Π_n statement into an infinite collection of Σ_{n−1} statements by bounding the leading universal quantifier with larger and larger bounds. That is, turn (∀x)[Σ_{n−1}] into the set {(∀x < m)[Σ_{n−1}] : m ∈ N}. The original Π_n formula is true if and only if all the Σ_{n−1} formulas in the set are true. This is useful because in general we have a much better idea of how to cope with Σ formulas (since they are c.e. over some iteration of the halting problem) than Π formulas (which are co-c.e.; there is not as much machinery developed for them).
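The tuple-coding behind the first rule can be sketched in code, using the Cantor pairing function as a stand-in for whichever coding the notes fixed in 3.4: since pairing is a computable bijection N → N × N, a block of two existential quantifiers has a witness exactly when the collapsed single quantifier does.

```python
# Collapsing (exists x1)(exists x2) R(x1, x2) into (exists z) R'(z).

def pair(x1, x2):
    """Cantor pairing: a computable bijection N x N -> N."""
    return (x1 + x2) * (x1 + x2 + 1) // 2 + x2

def unpair(z):
    """Invert Cantor pairing by finding the diagonal z lies on."""
    w = 0
    while (w + 1) * (w + 2) // 2 <= z:
        w += 1
    x2 = z - w * (w + 1) // 2
    return w - x2, x2

def R(x1, x2):            # a sample computable relation
    return x1 + 2 * x2 == 10

def R_prime(z):           # the one-variable collapsed version
    return R(*unpair(z))

# witnesses of R correspond exactly to witnesses of R_prime
witnesses = [(x1, x2) for x1 in range(20) for x2 in range(20) if R(x1, x2)]
z_witnesses = [z for z in range(1000) if R_prime(z)]
```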
Chapter 8
More Advanced Results
This chapter contains some miscellaneous results and ideas in computability theory
that come up with some frequency.
8.1 The Limit Lemma
Recall that a set A is low if A′ ≤_T ∅′; because ∅ ≤_T A always holds, we always have ∅′ ≤_T A′, so for a low set in fact A′ ≡_T ∅′. In the construction of a low set, we work by making sure that if the computation Φ^A_e(e)[s] converges infinitely many times (i.e., for infinitely many stages s), then it converges. That is, it either eventually forever diverges or eventually forever converges. We argue that if we can accomplish such a feat for all e, A is low. The following general theorem proves rigorously why that works and is useful for more than just constructing low sets.

Theorem 8.1.1 (Limit Lemma). For any function f, f ≤_T B′ if and only if there is a B-computable function g(x, s) such that lim_s g(x, s) = f(x) for all x.
Since not all sets which are reducible to ∅′ (i.e., Δ^0_2 sets) are c.e., this is often the most useful way to work with them. There is a way to use the approximating function to distinguish between c.e. and general Δ^0_2 sets, and in fact we need part of it in order to prove the Limit Lemma, so let's consider it now.

Recall that the mu-operator returns the minimal example satisfying its formula: (μx)[x > 1 & (∃y)(y^2 = x)] would be 4.
Definition 8.1.2. Suppose g(x, s) converges to f(x). A modulus (of convergence) for g is a function m(x) such that for all s ≥ m(x), g(x, s) = f(x). The least modulus is the function m(x) = (μs)(∀t ≥ s)[g(x, t) = f(x)].
Exercise 8.1.3. All notation is as in Definition 8.1.2.

(i) Prove that the least modulus is computable in any modulus.
(ii) Prove that the least modulus is computable from B′ whenever g is B-computable.
(iii) Prove that f is computable from g and any modulus m for g.

In general we cannot turn the reducibility of (iii) around, but for functions of c.e. degree there will be some modulus computable from f (and hence the least modulus will also be computable from f).
Theorem 8.1.4 (Modulus Lemma). If B is c.e. and f ≤_T B, then there is a computable function g(x, s) such that lim_s g(x, s) = f(x) for all x and a modulus m for g which is computable from B.
Proof. Let B be c.e. and let f = Φ^B_e. Define the functions

g(x, s) = Φ^B_e(x)[s] if Φ^B_e(x)[s]↓, and g(x, s) = 0 otherwise;

m(x) = (μs)(∃z ≤ s)[Φ^{B_s↾z}_e(x)[s]↓ & B_s ↾ z = B ↾ z].

Clearly g is computable; m is B-computable because the quantifier on z is bounded and hence does not increase complexity, the first clause is computable, and the second clause is clearly B-computable. (It gives the desired property that m is a modulus because B is c.e., and hence once the approximation B_s matches B on an initial segment it will never change to differ from B there.)
Proof of Theorem 8.1.1. (⟹) Suppose f ≤_T B′. We know B′ is c.e. in B, so g(x, s) exists and is B-recursive by the Modulus Lemma relativized to B.

(⟸) Suppose the B-computable function g(x, s) limits to f(x). Define the following finite sets:

B_x = {s : (∃t)[s ≤ t & g(x, t) ≠ g(x, t + 1)]}.
If we let C = {⟨s, x⟩ : s ∈ B_x} (also notated ⊕_x B_x), then C is Σ^B_1 and hence c.e. in B; therefore C ≤_T B′. Since the least modulus for g is m(x) = (μs)[s ∉ B_x], it is computable from C, and then f ≤_T B′ by part (iii) of Exercise 8.1.3.
Properties of the modulus of the limiting function are what give us a characterization of the c.e. degrees.
Corollary 8.1.5. A function f has c.e. degree iff f is the limit of a computable function g(x, s) which has a modulus m ≤_T f.

Exercise 8.1.6. Prove Corollary 8.1.5; for (⟹) apply the Modulus Lemma; for (⟸) use C from the proof of the Limit Lemma.
8.2 The Arslanov Completeness Criterion
This is a result that can be viewed as the flip side of the Recursion Theorem, and is presented mostly as a companion to it. Recall that a complete c.e. set is one that has the same degree as the halting problem. We need an extension of the Recursion Theorem, due to Kleene.

Theorem 8.2.1 (Recursion Theorem with Parameters). If f(x, y) is a computable function, then there is a computable function n(y) such that φ_{n(y)} = φ_{f(n(y),y)} for all y.
Proof. Define a computable function d by

φ_{d(x,y)}(z) = φ_{φ_x(x,y)}(z) if φ_x(x, y)↓, and φ_{d(x,y)}(z)↑ otherwise.

Choose v such that φ_v(x, y) = f(d(x, y), y). Then n(y) = d(v, y) is a fixed point, since unpacking the definitions of n, d and v (and then repacking n) we see

φ_{n(y)} = φ_{d(v,y)} = φ_{φ_v(v,y)} = φ_{f(d(v,y),y)} = φ_{f(n(y),y)}.
In fact we may replace the total function f(x, y) with a partial function ψ(x, y) and have total computable n such that whenever ψ(n(y), y) is defined, n(y) is a fixed point. The proof is identical to the proof of the Recursion Theorem with Parameters. Note that the parametrized version implies the original version by considering functions which ignore their second input.

Theorem 8.2.2 (Arslanov Completeness Criterion, Arslanov 1977/1981). A c.e. set A is complete if and only if there is a function f ≤_T A such that W_{f(x)} ≠ W_x for all x.
Proof. (⟹) We note without proof that {x : W_x ≠ ∅} is of complete c.e. degree and hence Turing equivalent to whatever A we were given. Define f by

W_{f(x)} = ∅ if W_x ≠ ∅, and W_{f(x)} = {0} otherwise.

By the observation f ≤_T A, and f clearly satisfies the right-hand side of the theorem.
(⟸) Let A be c.e., and assume (∀x)[W_{f(x)} ≠ W_x] where f ≤_T A. By the Modulus Lemma there is a computable function g(x, s) that limits to f and such that g has a modulus m ≤_T f (and hence m ≤_T A). Let K denote the halting set, and let κ(x) = (μs)[x ∈ K_s] if x ∈ K, with κ(x)↑ otherwise. By the Recursion Theorem with Parameters define the computable function h by

W_{h(x)} = W_{g(h(x),κ(x))} if x ∈ K, and W_{h(x)} = ∅ otherwise.

Now if x ∈ K and κ(x) ≥ m(h(x)), then g(h(x), κ(x)) = f(h(x)) and W_{f(h(x))} = W_{h(x)}, contrary to assumption on f. Hence¹ for all x,

x ∈ K ⟺ x ∈ K_{m(h(x))},

so K ≤_T A.
Corollary 8.2.3. Given a c.e. degree a, a < 0′ if and only if every function f of degree at most a has a fixed point, an n such that W_n = W_{f(n)}.

The notion of fixed point can be weakened to n such that W_n =* W_{f(n)} (see 8.3). These are also called almost fixed points. Weaker still are Turing fixed points, n such that W_n ≡_T W_{f(n)}.
Just as a catalogue: any function f is Turing-equivalent to a set, so it suffices to define the relations ≡_ℓ on sets, for ℓ ∈ N, as follows:

(i) A ≡_0 B if A = B,

¹This "hence" hides a long string of consequences of κ(x) < m(h(x)).
(ii) A ≡_1 B if A =* B,

(iii) A ≡_2 B if A ≡_T B,

(iv) A ≡_{n+2} B if A^(n) ≡_T B^(n), for n ∈ N.
Now completeness at higher levels of complexity may be defined in terms of computing a function that has no ≡_ℓ-fixed points.

Theorem 8.2.4 (Generalized Completeness Criterion). Fix ℓ ∈ N. Suppose ∅^(ℓ) ≤_T A and A is c.e. in ∅^(ℓ). Then

A ≡_T ∅^(ℓ+1) ⟺ (∃f ≤_T A)(∀x)[W_{f(x)} ≢_ℓ W_x].
8.3 ℰ Modulo Finite Difference

Recall from 6.1 that A =* B means A and B differ by only finitely many elements. When A =* B, we often treat A and B as interchangeable, and say we are working modulo finite difference. The usefulness of working modulo finite difference is that it gives you wiggle room in constructions: as long as eventually you're putting in exactly the elements you want to be, it doesn't matter if you mess up a little at the beginning and end up with a set which is not equal to what you want, but is =*-equal. Momentarily I'll show you the main use (which is essentially the above but on a grander scale).

The structure of the (c.e.) sets modulo finite difference has been the object of much study. We usually use ℰ to denote the c.e. sets, and ℰ* for their quotient modulo finite difference. The letters 𝒜 and ℛ are used to denote the collection of all subsets of N and of the computable sets, respectively. Unlike when we work with degrees, the lattice-theoretic operations we'd like to perform are defined everywhere (well, at least more than with degrees).
Definition 8.3.1. ℰ, ℛ, and 𝒜 are all lattices; that is, they are partially ordered sets where every pair of elements has a least upper bound (join) and a greatest lower bound (meet). The ordering in each case is subset inclusion. The join of two sets A and B is A ∨ B := A ∪ B; their meet is A ∧ B := A ∩ B. In each case these operations distribute over each other, making all three distributive lattices. Moreover, all three lattices have a least and a greatest element (which is not required to be a lattice): the least element in each case is ∅ and the greatest is N. A set A is complemented if there is some B in the lattice such that A ∨ B is the greatest element and A ∧ B is the least element; the lattice is called complemented if all of its elements are. A complemented, distributive lattice with (distinct) least and greatest element is called a Boolean algebra.
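For a finite universe these axioms can be checked exhaustively. Below is a small sketch in Python (the helper names are my own) verifying that the powerset of a three-element set is a distributive, complemented lattice, and hence a Boolean algebra:

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    elems = list(universe)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

U = frozenset({0, 1, 2})
lattice = powerset(U)

join = lambda a, b: a | b   # join is union
meet = lambda a, b: a & b   # meet is intersection

# Distributivity: each operation distributes over the other.
for a in lattice:
    for b in lattice:
        for c in lattice:
            assert meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
            assert join(a, meet(b, c)) == meet(join(a, b), join(a, c))

# Complementation: every A has a B with join U and meet empty.
for a in lattice:
    b = U - a
    assert join(a, b) == U and meet(a, b) == frozenset()
```

No such exhaustive check is possible for the lattice of c.e. sets, of course: there are infinitely many c.e. sets, and the complement of a c.e. set need not be c.e.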
Exercise 8.3.2. (i) Show 𝒜 and ℛ are Boolean algebras but ℰ is not.
102 CHAPTER 8. MORE ADVANCED RESULTS
(ii) Characterize the complemented elements of ℰ.
(iii) How small may a Boolean algebra be?
Definition 8.3.3. A property P is definable in a language (i.e., a set of relation, function, and constant symbols) if, using only the symbols in the language and standard logical symbols, one may write a formula with one free variable such that an object has property P if and only if it makes the formula true when filled in for the free variable. Likewise we may define n-ary relations (properties of sequences of n objects) using formulas with n free variables.
For example, the least element of a lattice is definable in the language L = {≤} (where we interpret ≤ as whatever ordering relation we're actually using; here it would be ⊆) by the formula

(∀x)[y ≤ x].

The formula is true of y if and only if y is less than or equal to all elements of the lattice, which is exactly the definition of least element.
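To see definability in action, we can evaluate the formula pointwise in a finite lattice. A sketch in Python, using the divisors of 12 ordered by divisibility (my choice of example lattice):

```python
# The lattice of divisors of 12, with the symbol <= interpreted as "divides."
universe = [1, 2, 3, 4, 6, 12]
leq = lambda a, b: b % a == 0

def is_least(y):
    """Evaluate the defining formula (for all x)[y <= x] at the element y."""
    return all(leq(y, x) for x in universe)

least = [y for y in universe if is_least(y)]
print(least)  # → [1]: the formula picks out exactly the least element
```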
Exercise 8.3.4. Let the language L = {≤} be fixed.

(i) Show that "greatest element" is definable in L.

(ii) Show meet and join are definable (via formulas with three free variables) in L.
Definition 8.3.5. An automorphism of a lattice ℒ is a bijective function from ℒ to ℒ which preserves the partial order.
Exercise 8.3.6. (i) Show that automorphisms preserve meets and joins.

(ii) Show that a permutation of N induces an automorphism of 𝒜.

(iii) What restrictions could we set on permutations of N to ensure they induce automorphisms of ℛ? Of ℰ?
Definition 8.3.7. Given a lattice ℒ, a class X ⊆ ℒ is invariant (under automorphisms) if for any x ∈ ℒ and automorphism f of ℒ, f(x) ∈ X ⟺ x ∈ X. X is an orbit if it is invariant and transitive: that is, for any x, y ∈ X there is an automorphism f of ℒ such that f(x) = y.
Exercise 8.3.8. What sort of structure (relative to automorphisms) must an invariant class that is not an orbit have?
Definition 8.3.9. A property P of c.e. sets is lattice-theoretic (l.t.) in ℰ if it is invariant under all automorphisms of ℰ. P is elementary lattice-theoretic (e.l.t.) if there is a formula with one free variable in the language L = {≤, ∨, ∧, 0, 1} which defines the class of sets with property P in ℰ, where ≤, 0, 1 are interpreted as ⊆, ∅, N, respectively.
Exercise 8.3.10. Show that a definable property P is preserved by automorphisms; that is, that e.l.t. implies l.t.
The definition and exercise above still hold when we switch from ℰ to ℰ*. Here's where the additional usefulness of working modulo finite difference comes in. We almost always are worried about properties which are preserved if only finitely many elements of the set are changed; that is, properties which are closed under finite difference. One can show that the collection of all finite sets and the relation =* are both definable in ℰ, and from there it is straightforward to show that any property P closed under finite differences is e.l.t. in ℰ if and only if it is e.l.t. in ℰ*. To show something is not e.l.t., one would likely show it is not l.t. by constructing an automorphism under which it is not invariant. Automorphisms are easier to construct in ℰ*.
Chapter 9
Areas of Research
In this chapter I try to give you a taste of various areas of computability theory in which research is currently active, with some of the questions currently under investigation. Actually, only §9.1 discusses pure computability theory; the others are independent areas that intersect significantly with computability.
9.1 Lattice-Theoretic Properties
The Turing degrees are a partially ordered set under ≤_T, as we know, and we also know any pair of degrees has a least upper bound but not every pair of degrees has a greatest lower bound (a meet). What else can be said about the structure of this poset?
Definition 9.1.1. 𝒟 is the partial ordering of the Turing degrees, and 𝒟(≤ a) is the partial order of degrees less than or equal to a. ℛ is the partial order of the c.e. Turing degrees (so ℛ ⊆ 𝒟).

An automorphism of a poset is a bijection f from the poset to itself that preserves the order relation; that is, if x ≤ y, then f(x) ≤ f(y). It is nontrivial if it is not the identity. The big open question here is:

Question 9.1.2. Is there a nontrivial automorphism of 𝒟? Of ℛ?
I should note that it is still open whether there is a natural intermediate degree. That is, the decidability problems we stated all gave rise to sets which were complete, and to get something noncomputable and incomplete we resorted to a finite-injury priority argument. Is there an intermediate set that arises naturally from, say, a decision problem? Researchers in the area have varying opinions on how important that question is, given the many ways we have to construct intermediate degrees.

I think of classes of Turing degrees as being picked out by two kinds of definitions:
Computability-theoretic definitions are of the form "A degree d is (name) if it contains a set (with some computability-theoretic property)."

Lattice-theoretic definitions are properties defined by predicates that use basic logical symbols (&, ∨, ¬, →, ∀, ∃) plus the partial order relation. Their format is somewhat less uniform, but could be summed up as "A degree d is (name) if (it sits above a certain sublattice/it sits below a certain sublattice/there is another degree with a specified lattice relationship to it)."

An example of a lattice-theoretic definition would be least upper bound and greatest lower bound. Least upper bound (join) may be defined as follows.

z = x ∨ y ⟺ (z ≥ x & z ≥ y & (∀w)[(w ≥ x & w ≥ y) → w ≥ z]).

Greatest lower bound is defined similarly; the top and bottom element can also be defined.
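The defining formula for join can be evaluated mechanically in any finite poset. A sketch in Python, again using divisibility on the divisors of 12 as the ordering (my example; in that lattice the join is the least common multiple):

```python
universe = [1, 2, 3, 4, 6, 12]     # divisors of 12
geq = lambda a, b: a % b == 0      # a >= b means "b divides a"

def is_join(z, x, y):
    """Evaluate: z >= x & z >= y & (for all w)[(w >= x & w >= y) -> w >= z]."""
    upper = geq(z, x) and geq(z, y)
    least_such = all(geq(w, z) for w in universe
                     if geq(w, x) and geq(w, y))
    return upper and least_such

joins = {(x, y): [z for z in universe if is_join(z, x, y)]
         for x in universe for y in universe}
assert joins[(4, 6)] == [12]                       # lcm(4, 6) = 12
assert all(len(zs) == 1 for zs in joins.values())  # joins are unique
```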
Here's a more complicated but still purely lattice-theoretic definition.

Definition 9.1.3. A degree a ∈ ℛ is cuppable if there is some c.e. degree b < 0′ such that a ∨ b = 0′.
[Diagram: degrees a and x joining to a ∨ x, with b and c = (a ∨ x) ∨ b.]
Note that neither a nor x need be above c individually. The following question, as far as I can tell, is still open.

Question 9.1.9. Is there a center in ℛ?
Definition 9.1.8 is an example of a definition made purely in the language of lattice theory: we do not have to know where the poset ℛ comes from to understand it; it simply uses the partial order relation (meet and join are both definable in terms of the partial order relation: exercise). However, solving it will likely take the form of a priority argument, constructing a set A and at the same time constructing an infinite sequence of sets B_e, C_e so that for the c.e. set W_e (using the standard enumeration of c.e. sets to represent all possible c.e. degrees x), deg(B_e) and deg(C_e) allow deg(A) to satisfy the definition of center with deg(W_e).
Onward to embeddability. First let me introduce some simple lattices. All of these have a least and greatest element.

The diamond lattice is a four-element lattice with two incomparable intermediate elements. The pentagon, or N_5, has five elements; two of the intermediate elements are comparable and the other is not. The 1-3-1, or M_3, is a five-element lattice with three incomparable intermediate elements. Finally, S_8 is a diamond on top of a 1-3-1, for eight total elements.

[Diagrams: the pentagon N_5 and the 1-3-1 lattice M_3.]
An important distinction between these lattices is distributivity. A lattice is distributive if a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c); i.e., meet and join distribute over each other. The diamond is distributive, but neither the pentagon nor the 1-3-1, and hence S_8, is distributive. In fact, the non-distributive lattices are exactly those that contain the pentagon and/or the 1-3-1 as a sublattice.
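These claims about small lattices are finite and can be confirmed by brute force. A sketch in Python (`make_lattice` and the element names are my own):

```python
def make_lattice(elems, pairs):
    """Build leq, join, meet from the set of comparable pairs (x, y), x <= y."""
    leq = lambda x, y: x == y or (x, y) in pairs
    def join(x, y):
        ubs = [z for z in elems if leq(x, z) and leq(y, z)]
        return next(z for z in ubs if all(leq(z, w) for w in ubs))
    def meet(x, y):
        lbs = [z for z in elems if leq(z, x) and leq(z, y)]
        return next(z for z in lbs if all(leq(w, z) for w in lbs))
    return join, meet

def distributive(elems, join, meet):
    return all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
               for a in elems for b in elems for c in elems)

# Pentagon N5: bottom 0, top 1, a < c, and b incomparable to both.
n5 = ['0', 'a', 'b', 'c', '1']
n5_pairs = {('0', x) for x in n5} | {(x, '1') for x in n5} | {('a', 'c')}
n5_join, n5_meet = make_lattice(n5, n5_pairs)
assert not distributive(n5, n5_join, n5_meet)

# 1-3-1 (M3): three incomparable intermediate elements.
m3 = ['0', 'a', 'b', 'c', '1']
m3_pairs = {('0', x) for x in m3} | {(x, '1') for x in m3}
assert not distributive(m3, *make_lattice(m3, m3_pairs))

# The diamond is distributive.
d4 = ['0', 'a', 'b', '1']
d4_pairs = {('0', x) for x in d4} | {(x, '1') for x in d4}
assert distributive(d4, *make_lattice(d4, d4_pairs))
```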
Let's consider embeddings that preserve the least and/or greatest elements separately.

• All finite distributive lattices embed into ℛ preserving least element, as do the pentagon and 1-3-1, but not S_8.

• Open: where does the embeddable/nonembeddable cutoff lie?

• The same results as above hold preserving greatest element.

• Conjecture: a lattice embeds into ℛ preserving greatest element if and only if it embeds preserving least element.

• We lose out when we try to embed preserving both least and greatest element. The Lachlan non-diamond theorem says even the diamond does not embed into ℛ preserving both least and greatest element. This is what tells us a c.e. degree cannot cup and cap with the same partner degree, because such a pair would then form the center of a diamond with least element 0 and greatest element 0′.
We meet with success if the lattice L to be embedded can be decomposed into two sublattices L_1, L_2 such that all elements of L_1 are above all elements of L_2, L_1 can be embedded preserving greatest element, and L_2 can be embedded preserving least element. In that case we can stitch together the embeddings of the sublattices to get an embedding of L that preserves both least and greatest element.
We can also consider embedding questions for intervals of ℛ, where the interval [a, b] is {c : a ≤ c ≤ b}.

Now an open question about the Turing degrees as a whole. For a poset to be locally countable means that the set of predecessors of any one element x (that is, the set of elements of the poset less than or equal to x) is countable.

Question 9.1.10 (Sacks). Suppose P is a locally countable partially ordered set of cardinality less than or equal to that of 𝒫(N). Is P embeddable into 𝒟?

This question is essentially asking: if a partially ordered set has no obvious blockades to embeddability, does it embed? The Turing degrees are locally countable and of the same size as 𝒫(N), so anything that is not locally countable or is any bigger clearly cannot be embedded.
We can also discuss the lattice of c.e. sets, ordered by ⊆. We call this ℰ. The top element is N and the bottom is ∅. Every pair of sets has a defined meet and join, given by intersection and union. The complemented sets are those c.e. sets whose set-theoretic complement is also c.e.; in other words, the computable sets. Recall that the set-theoretic difference of two sets A and B is A − B = {x : x ∈ A & x ∉ B}.
Definition 9.1.11. Two sets A and B are equivalent modulo finite difference, denoted A =* B, if their symmetric difference (A − B) ∪ (B − A) is finite.

Definition 9.1.12. A c.e. set A with infinite complement is maximal if for every c.e. set Z ⊇ A, either A =* Z or Z =* N.

That is, anything between a maximal set and N is either essentially the maximal set or essentially all of N. The complement of a maximal set is called cohesive. Maximal sets do exist. They put an end to Post's program to find a set which had such a small complement, was so close to being all of N without being cofinite and hence computable, that it would have to be incomplete. A maximal set has the smallest possible complement from a c.e. set point of view, but not all maximal sets are incomplete.
One family of theorems particularly suited to ℰ are splitting theorems. A splitting of a c.e. set B is a pair of disjoint c.e. sets that union to B. Two useful splitting theorems follow.

Theorem 9.1.13 (Friedberg Splitting). If B is noncomputable and c.e., there is a splitting of B into c.e., noncomputable A_0, A_1 such that if W is c.e. and W − B is non-c.e., then W − A_i is also non-c.e. for i = 0, 1 (this implies the A_i are noncomputable by setting W = N).

Note that if W meets the hypothesis of the implication, it must have an infinite intersection not only with B, but with each of A_0 and A_1. If W ∩ A_0 were finite, say, then W − A_0 =*
for 2^{<N}. I will use λ for the empty string; it is also often called ε. 1^n is the string of n 1s and likewise for 0, and if σ and τ are strings, στ (or

and τ = 1^e 0 with

M will halt on a string which is comparable to one on which M previously halted. Through stage s, define P to diverge on all remaining strings (including the one which witnessed that M was not prefix-free). If M is prefix-free, P will mimic it exactly (in terms of halting behavior, input, and output).
9.2. RANDOMNESS 115
Here is an interesting theorem I stumbled across while researching finite injury priority arguments for a seminar talk. I include it because it's sort of magic and cool. Recall that a simple set is a c.e. set A such that the complement of A is infinite but contains no infinite c.e. subsets.

Theorem 9.2.4 (Kolmogorov [30, 31]). The set of nonrandom numbers is simple.

Proof. The set we need to prove simplicity for is A = {x : K(x) < |x|}. It is c.e. because x ∈ A if and only if (∃e < x)(U(e) = x),² or to make it clearer, (∃s)(∃e < x)(U_s(e) = x). This is a computable predicate preceded by a single existential quantifier, so it is Σ⁰₁ and hence corresponds to a c.e. set. We know the set of random numbers is infinite, so the complement of A is infinite.
We now show that every infinite c.e. set W_e contains a nonrandom element. There is a uniform description of the elements of W_e: x_{e,n} is the n-th element in the enumeration of W_e. Therefore by an application of the S-m-n Theorem, there is a one-to-one computable function h such that U(h(e, n)) = x_{e,n}. We use this to show every infinite c.e. set has an infinite subset W_e such that h(e, n) is a description of x_{e,n}, shorter than x_{e,n}, for some n. That element x_{e,n} will be nonrandom and in the original set, so the original set has a nonrandom element.

To that end, set t(n) = max_{e≤n} h(e, n). The important point is that for any index e, t(n) will take h(e, n) into account on all but finitely many values of n.

Given a c.e. set X, enumerate a subset Y so that the n-th element of Y, y_n, is greater than t(n). This is possible, and results in an infinite set Y, when X is infinite, because t(n) is some fixed value for each n, so X will contain infinitely many elements greater than it. Y will be c.e. because of that, and because t is computable.

Since Y is c.e., it is W_e for some e. However, by the choice of y_n > t(n) and the fact that for almost all n, t(n) ≥ h(e, n), we know there is some n such that x_{e,n} = y_n > t(n) ≥ h(e, n). For that n, h(e, n) gives a short description of x_{e,n}, so x_{e,n} is a nonrandom element of W_e = Y and hence of X.

This uses Berry's paradox (Russell 1906): given n, consider "the least number indescribable by < n characters." This gives a description of length c + |n| for fixed c, which is paradoxical whenever c + |n| ≤ n (i.e., for almost every n).
The size of K and Kraft's inequality

To decide what it means to be incompressible, we needed to know something about the size of K. What upper bound can we assert about it, in terms of the length of
²We're fudging a little here since the complexity will be determined by the length of such e, and the lengths of two nonequal numbers may be equal, but it's not too important and streamlines the argument substantially.
x? The following is part of a larger, more technical theorem, which I have trimmed in half.

Theorem 9.2.5 (Chaitin [5]). For every x of length n,

K(x) ≤ n + K(n) + O(1). (9.2.1)
Proof. To obtain (9.2.1), consider a prefix-free Turing machine T which computes T(qx) = x for any q such that the universal prefix-free machine U gives U(q) = |x|. Since T is prefix-free it has an index m in the enumeration of all prefix-free machines, and hence U(1^{|m|}0mqx) = x. That description has length 2|m| + |q| + |x|, or (if q is as short as possible) |x| + K(|x|) + 2|m|, where m does not depend on x, and its length certainly bounds the size of K(x).
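The self-delimiting header 1^{|m|}0m used in this proof is straightforward to implement. A sketch in Python (the function names are mine, and the "index" here is just an arbitrary bit string):

```python
def encode(m, payload):
    """Prefix a bit string m with 1^{|m|}0, then append the payload."""
    return '1' * len(m) + '0' + m + payload

def decode(s):
    """Recover (m, payload): the run of leading 1s gives |m|."""
    k = s.index('0')
    return s[k + 1 : 2 * k + 1], s[2 * k + 1 :]

m, qx = '1011', '00110'              # hypothetical index and input qx
s = encode(m, qx)
assert s == '111101011' + '00110'
assert decode(s) == (m, qx)
# The header costs 2|m| + 1 bits, absorbed into the O(1) of (9.2.1).
```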
This upper bound leads to recursive further bounds like

K(x) ≤ n + |n| + ||n|| + |||n||| + · · ·

Why do we not use this upper bound in our definition of randomness? Because there is no infinite string X such that for all n, K(X ↾ n) ≥ n + K(n) − O(1). We could kludge by saying "for infinitely many n" instead of "for all," but that's unsatisfying. And, of course, the definition we gave for randomness is the one that lines up with Martin-Löf tests and martingales.
A (very rough) lower bound comes from the Kraft Inequality, a very useful tool in randomness. In a prefix-free set there are a lot of binary strings missing. Thus we would expect the lengths of these strings to grow rapidly. They do, as the theorem below shows. The proof is included because it's not so difficult.

Theorem 9.2.6 (Kraft Inequality, [34]). Let ℓ_1, ℓ_2, . . . be a finite or infinite sequence of natural numbers. There is a prefix-free set of binary strings with this sequence as its elements' lengths if and only if Σ_n 2^{−ℓ_n} ≤ 1.
Proof. First suppose there is a prefix-free set of finite binary strings x_1, x_2, . . . with lengths ℓ_i. Consider

⋃_i [x_i]

as a subset of 2^N. Certainly its measure is bounded by 1, and since the set is prefix-free the intervals are disjoint. Hence the measure of their union is the sum of their measures, and the inequality holds.
Now suppose there is a set of values ℓ_1, ℓ_2, . . . such that the inequality holds. Because we are not working effectively, we may assume the sequence is nondecreasing. To find a prefix-free set of binary strings which have those values as their lengths, start carving up the complete binary tree, taking the leftmost string of length ℓ_i which is incomparable to the previously-chosen strings. [For example, if our sequence of values began 3, 4, 7, we would choose 000, 0010, 0011000.] Every binary string of length ℓ_i corresponds to an interval of size exactly 2^{−ℓ_i}.
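The leftmost-string procedure can be carried out directly. A sketch in Python reproducing the 3, 4, 7 example:

```python
def kraft_strings(lengths):
    """Greedily choose, for each length, the leftmost binary string of
    that length incomparable to all previously chosen strings."""
    comparable = lambda s, t: s.startswith(t) or t.startswith(s)
    chosen = []
    for n in lengths:
        # Walk the candidates of length n in lexicographic (leftmost) order.
        for i in range(2 ** n):
            cand = format(i, 'b').zfill(n)
            if all(not comparable(cand, c) for c in chosen):
                chosen.append(cand)
                break
        else:
            raise ValueError('Kraft inequality violated')
    return chosen

print(kraft_strings([3, 4, 7]))  # → ['000', '0010', '0011000']
```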
Theorem 9.2.7 (Kraft-Chaitin). Suppose ℓ_1, ℓ_2, . . . is a computable sequence of natural numbers with Σ_n 2^{−ℓ_n} ≤ 1. Then from the sequence ℓ_i we can effectively compute a prefix-free set A with members σ_i of length ℓ_i.
Proof. The organization of this proof is as written in Downey and Hirschfeldt [14], where they say it was suggested by Joe Miller.

Assume that we have selected strings σ_i, i ≤ n, such that |σ_i| = ℓ_i. Suppose also that we have a binary expansion x[n] = .x_1 x_2 . . . x_m = 1 − Σ_{j≤n} 2^{−ℓ_j}, and that for every k ≤ m such that x_k = 1, there is a string τ_k ∈ 2^{<N} of length k incomparable to all σ_j, j ≤ n, and to all τ_j with j < k and x_j = 1.
Notice that since x[n] is the measure of the unchosen portion of 2^{<N}, the fact that there are strings of lengths corresponding to the positions of 1s in x[n] means the remaining measure is concentrated into intervals of size at least as large as 2^{−ℓ_{n+1}} for any ℓ_{n+1} which would allow satisfaction of the Kraft Inequality. Note also that the τ_k are unique and among them they cover the unchosen measure of 2^{<N}.
Now we select a string to correspond to ℓ_{n+1}. If x_{ℓ_{n+1}} = 1, let σ_{n+1} = τ_{ℓ_{n+1}} and let x[n + 1] be x[n] but with x_{ℓ_{n+1}} = 0; all τ_k for k ≠ ℓ_{n+1} remain the same. If x_{ℓ_{n+1}} = 0, find the largest j < ℓ_{n+1} such that x_j = 1 and the leftmost string τ of length ℓ_{n+1} extending τ_j, and let σ_{n+1} = τ. Let x[n + 1] = x[n] − .0^{ℓ_{n+1}−1}1. As a result, in x[n + 1], x_j = 0, all of the x_k for j < k ≤ ℓ_{n+1} are 1, and the remaining places of x[n + 1] are the same as in x[n]. Since τ was chosen to be leftmost in the cone above τ_j, there will be strings of lengths j + 1, . . . , ℓ_{n+1} to be assigned as τ_{j+1}, . . . , τ_{ℓ_{n+1}} (namely, τ_{j+i} = τ_j 0^{i−1} 1), as required to continue the induction.
One way to think of this is as a way to build prefix-free machines by enumerating a list of pairs of lengths and strings, with the intention that the string is described by an input of the specified length.

Theorem 9.2.8 (Kraft-Chaitin, restated). Suppose we are effectively given a set of pairs {⟨n_k, τ_k⟩}_{k∈N} such that Σ_k 2^{−n_k} ≤ 1. Then we can recursively build a prefix-free machine M and a collection of strings (descriptions) σ_k such that |σ_k| = n_k and M(σ_k) = τ_k.
Kraft-Chaitin allows us to implicitly build machines by computably enumerating axioms ⟨n_k, τ_k⟩ and arguing that the set {n_k}_{k∈N} satisfies the Kraft Inequality. The machine houses the construction, from which the axioms are enumerated, and on input σ enumerates them while performing Kraft-Chaitin until such a time as σ is chosen to be an element of the prefix-free set, corresponding to some ⟨n_k, τ_k⟩. At that point (if it ever comes), the machine halts and outputs τ_k.
So why prefix-free?

We could define the complexity of x as the minimum length input that produces x when given to the standard universal Turing machine, rather than the universal prefix-free Turing machine. That is studied; we call it the plain Kolmogorov complexity of x and denote it C(x). However, as a standard for complexity it has some problems.

To say an infinite string is random if and only if all its initial segments are random sounds right, but without the restriction to prefix-free machines it is an empty definition: no such string exists. In fact, this is the reason there is no infinite string X such that for all n, K(X ↾ n) ≥ n + K(n) − O(1). The proof is from Martin-Löf [43] and may be found in Li and Vitányi [40], §2.5.
Even at the level of finite strings plain Kolmogorov complexity has some undesirable properties. The first is non-subadditivity. That is, for any constant c you like there are x and y such that the plain complexity of the coded pair ⟨x, y⟩ is more than C(x) + C(y) + c. K, on the other hand, is subadditive, because with K we can concatenate the descriptions p and q of x and y, respectively, and the machine will be able to tell them apart: the machine can read until it halts, assume that is the end of p, and then read again until it halts to obtain q. Add some constant-size code to specify that action and how to encode the x and y that result, and we have ⟨x, y⟩.

The second undesired property of C is nonmonotonicity on prefixes: the complexity of a substring may be greater than the complexity of the whole string. For example, a power of 2 has very low complexity, so that if n = 2^k then C(1^n) ≤ log log n + O(1) (i.e., a description of k, which is no more than log k in size, plus some machinery to print 1s). However, once k is big enough, there will be numbers smaller than n which have much higher complexity because they have no nice concise description in terms of, say, powers of smaller numbers. For such a number m, the plain complexity of 1^m would be higher than that of 1^n even though 1^m is a proper initial segment of 1^n.
What is the underlying problem? The C(x) measure contains information about the length of x (that is, n) as well as the pattern of bits. For most n, about log n of the bits of the shortest description of x will be used to determine n. What that means is that for simple strings of the same length n, where by "simple" I mean each having plain Kolmogorov complexity less than log n, any distinction between the information content of the two strings will be lost to the domination of the complexity of n.

Another way of looking at it is that C allows you to compress a binary sequence using a ternary alphabet: 0, 1, and "end of string." That's not a fair measure of compressibility, and as stated above, it leads to some technical problems as well as philosophical ones.

The main practical argument for K over C, though, is that K gives the definition that lines up with the characterizations of randomness in terms of Martin-Löf tests and martingales.
Relative Randomness

Next we would like to be able to compare sets to each other, in a finer-grained way than saying both, one, or neither is n-random for some n. For example, the bit-flip of a random sequence is random, but if we are given the original sequence as an oracle, its bit-flip can be produced by a constant-size program. Therefore no sequence's bit-flip is random relative to the original sequence.

The generalization is very simple: add an oracle to the prefix-free Turing machines.

Definition 9.2.9. (i) The prefix-free Kolmogorov complexity of x relative to A is K^A(x) = min{|p| : U^A(p) = x}.

(ii) A set or sequence B is A-random (or 1-A-random) if (∀n)[K^A(B ↾ n) ≥ n − O(1)].

It should be clear that if B is nonrandom, it is also non-A-random for every A. Adding an oracle can never increase the randomness of another string; it can only derandomize.
Two extremely useful theorems about relative randomness rely on the join of sequences. We've seen this definition before, but to remind you:

Definition 9.2.10. The join of two sets (or sequences) A and B is their disjoint union

A ⊕ B = {2n : n ∈ A} ∪ {2n + 1 : n ∈ B}.

Its Turing degree is the least upper bound of the degrees of A and B, so A ⊕ B corresponds to join in the lattice of Turing degrees.
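As a quick sketch in Python (with sets represented by characteristic functions; the example sets are my own), the join interleaves its arguments, and each half is recovered from A ⊕ B by a constant-size program, which is why its degree is an upper bound for both:

```python
def join(A, B):
    """Characteristic function of {2n : n in A} union {2n+1 : n in B}."""
    return lambda k: A(k // 2) if k % 2 == 0 else B(k // 2)

evens = lambda n: n % 2 == 0
squares = lambda n: round(n ** 0.5) ** 2 == n

C = join(evens, squares)

# Constant-size reductions recovering each set from the join:
def recover_evens(n): return C(2 * n)
def recover_squares(n): return C(2 * n + 1)

assert all(recover_evens(n) == evens(n) for n in range(100))
assert all(recover_squares(n) == squares(n) for n in range(100))
```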
Theorem 9.2.11 (van Lambalgen [60]). If A ⊕ B is 1-random, then B is 1-A-random (and hence 1-random).
Note that likewise A will be 1-B-random in the theorem above. There is a converse to this, though it is slightly stronger: instead of needing A to be 1-B-random and B to be 1-A-random, we only need one of those conditions plus 1-randomness for the other set.

Theorem 9.2.12 (van Lambalgen [61]). If A is 1-random and B is 1-A-random, then A ⊕ B is 1-random.

The two theorems together show that if A and B are 1-random and one is 1-random relative to the other, then the other is 1-random relative to the first. In fact, they are only special cases of a much more general pair of theorems that we don't have the vocabulary to state.
Lowness and K-Triviality

The idea of lowness for randomness comes from several perspectives. First, there is the observation that taking 1-randomness relative to A may only decrease the set of random reals. That is, if RAND is the set of all 1-random reals and RAND^A is the set of all A-random reals, then for any A, RAND^A ⊆ RAND. The question is then for which A equality holds; certainly for any computable A it does, but are there others? Hence we have the following definition, a priori perhaps an empty one.

Definition 9.2.13. A set A is low for random if it is noncomputable and RAND^A = RAND; that is, any real which is 1-random is still random relative to A.

The term "low" is by analogy with ordinary computability theory, where A is low if the halting problem relativized to A has the same Turing degree as the non-relativized halting problem.

We think of a low set as being nearly computable. It clearly cannot itself be random, else in derandomizing itself and its infinite subsequences it would change the set of randoms. Therefore, the existence of low for random sets gives a middle ground between computable and random.
Theorem 9.2.14 (Kučera and Terwijn [36]). There exists a noncomputable A such that RAND^A = RAND.

There is a different aspect of lowness we could consider; this approaches the middle ground between computability and randomness in a different way, directly tackling the question of initial segment complexity.

Definition 9.2.15. A real α is K-trivial if the prefix-free complexity of its length-n initial segments is bounded by the complexity of n; that is, for all n, K(α ↾ n) ≤ K(n) + O(1).

The question is, again, whether there are any noncomputable K-trivial reals. Certainly all computable reals α are such that K(α ↾ n) ≤ K(n) + O(1); the O(1) term holds the function which generates the initial segments of α, and then getting an initial segment is as simple as specifying the length you want.

Theorem 9.2.16 (Zambella [66], after Solovay [57]). There is a noncomputable c.e. set A such that K(A ↾ n) ≤ K(n) + O(1).
The truly remarkable thing is that these are the same class of reals: a real is low for random if and only if it is K-trivial. The proof is really, really hard, involving work by Gács [21], Hirschfeldt, Nies, and Stephan [26], Kučera [35], and Nies [47].

There are some theorems we won't prove here about the degree properties of K-trivials; for proofs see Downey and Hirschfeldt [14] or the paper cited. Recall that a set A ≤_T ∅′ is high if A′ ≡_T ∅″.

Theorem 9.2.17 (Chaitin [5]). If A is K-trivial then it is Δ⁰₂; that is, A ≤_T ∅′.

Theorem 9.2.18 (Downey, Hirschfeldt, Nies, Stephan [16]). If A is K-trivial, then A is Turing incomplete, and in fact not even high.

Theorem 9.2.19 ([16]). If reals α and β are K-trivial, then so is their sum α + β.

Note that in the above we mean simply the arithmetic sum, not the join. The next theorem shows that the K-trivials hang together in a strong sense. An ideal of the Turing degrees is a collection of degrees which is closed downward under Turing reducibility and upward under join. Therefore the proof must show both that if β ≤_T α and α is K-trivial, then β is K-trivial, and that if α and β are K-trivial, then α ⊕ β is K-trivial.

Theorem 9.2.20 (Nies). The K-trivial reals form a Σ⁰₃-definable ideal in the Turing degrees.

This is touted as the only natural example of a nontrivial ideal in the Turing degrees.
9.3 Some Model Theory

Both computable model theory (§9.4) and reverse mathematics (§9.5) use some model theory, which is an entire other area of mathematical logic. Propositional and predicate logic, which undergraduates are often exposed to, tend to include elements of what would be categorized as model theory.

Suppose we have a collection of relation symbols, such as (=, <); call it a language L. In this example, the relations are both binary. A structure for that language (or L-structure) is a collection of elements, called the universe, along with an interpretation for each relation. For example, N with the usual equality and ordering is a structure for the language (=, <); we would denote it (N, =^N, <^N). There are many possible structures for any given language, even after you take the quotient of the collection of structures by the equivalence relation of isomorphism.

To get to a model, we add axioms. Generally we call the collection of logical sentences (recall sentences are formulas with no free variables, so that they have a truth value) we're treating as axioms a theory. These sentences may use all the standard logical symbols (and, not, quantifiers, etc.) as well as variables and any symbol from the language. A structure for the language is a model of the theory if, interpreting the language as the structure specifies and letting the domain of quantification be the universe of the structure, all the sentences in the theory are true. This may greatly restrict the number of structures we can have; in fact there are theories for which there is only one model with a countable universe, up to isomorphism (such theories are called countably categorical).
Properly speaking, languages can have not only relation symbols, but also function symbols and symbols for distinguished constant values; the arity of the relations and functions must be specified. An isomorphism between L-structures 𝒜 = (A, c^𝒜, f^𝒜, R^𝒜) and ℬ = (B, c^ℬ, f^ℬ, R^ℬ) is a bijection F : A → B such that F(c^𝒜) = c^ℬ, and for tuples a̅ of the appropriate arity, if F(a_i) = b_i and F(k) = ℓ, then f^𝒜(a̅) = k ⟺ f^ℬ(b̅) = ℓ, and R^𝒜(a̅) ⟺ R^ℬ(b̅). For languages with more than one constant, function, or relation, this definition extends as expected. When the isomorphism is between a structure and itself, it is called an automorphism.
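For finite structures the isomorphism condition can be checked by brute force over all bijections. A sketch in Python for a language with a single binary relation symbol (the two structures, both directed 4-cycles, are toy examples of mine):

```python
from itertools import permutations

def is_isomorphism(F, A, RA, RB):
    """Does the bijection F (a dict) preserve the binary relation?"""
    return all(RA(x, y) == RB(F[x], F[y]) for x in A for y in A)

A = [0, 1, 2, 3]
RA = lambda x, y: (x + 1) % 4 == y            # successor on a 4-cycle

B = ['a', 'b', 'c', 'd']
succ = {'a': 'c', 'c': 'b', 'b': 'd', 'd': 'a'}
RB = lambda x, y: succ[x] == y                # another 4-cycle

isos = []
for p in permutations(B):
    F = dict(zip(A, p))
    if is_isomorphism(F, A, RA, RB):
        isos.append(F)

assert len(isos) == 4   # one isomorphism per rotation of the cycle
```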
Let us consider the example L = (0, 1, =, <, +, ·), where 0 and 1 are constant symbols, = and < are binary relations, and + and · are binary functions. We can create many structures for L; let's look at a few which have countable universes.

(i) N, with the usual meanings for all these symbols;

(ii) Q, with the usual meanings for all these symbols;

(iii) N, with the usual meanings for everything except = and <; = interpreted as equality modulo 12 (so 12 = 0), and n < m true if n (mod 12) < m (mod 12) in the usual ordering (so 12 < 1);

(iv) N, with 0 and 1 interpreted as usual, = interpreted as nonequality, < interpreted as >, and + and · interpreted as − and exponentiation, respectively.

The point of (iv) is to show we do not have to abide by the conventional uses of the symbols.³ In fact we could have gone further afield and decided that, say, = would be the relation that holds of (n, m) exactly when n is even, and < the relation that holds when n + m is a multiple of 42.
Let us consider some axioms on 𝓛.
(I) ¬(0 = 1)
(II) ∀x, y (x < y + 1 → (x < y ∨ x = y))
(III) ∀x ¬(x < 0)
(IV) ∀x, y ∃z ((¬(y = 0) & ¬(x = 0)) → x · z = y)
Axiom I is true in structures (i), (ii), and (iii), but not (iv): 0 and 1 are nonequal,
but in (iv) that is exactly the interpreted meaning of the symbol =. Axiom II is true
in structures (i) and (iii). It is false in (ii), as shown by x = 2 and y = 1.5. Axiom
II is also true in structure (iv), where in conventional terms it says if x > y · 1, then
x > y or x ≠ y.
Axiom III is clearly true in structures (i) and (iii) and false in (ii) and (iv) (where
in the latter it asserts no number is positive). Axiom IV asserts (in structures (i)
through (iii)) that any nonzero number is divisible by any other nonzero number. It is false
in structure (i) and true in (ii); it is false in (iii) but it takes maybe a bit more
thought to see it. An example of axiom IV's failure in structure (iii) is x = 2 and
y = 3: no multiple of x will be odd, but all members of y's equivalence class modulo
12 are odd. Axiom IV also fails in structure (iv), where it says in conventional
terms that if x and y are both zero, there is a power to which one can raise x to get
something not equal to y.
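Claims like these can be spot-checked mechanically. The sketch below is my own illustrative Python, not part of the original notes: quantifiers are truncated to small finite ranges, so a passing check only means no counterexample was found in the sample, while the failing checks reproduce the specific counterexamples discussed above.

```python
# Bounded spot-checks of axioms II and IV in structures (i)-(iii).
# Quantifiers range over finite samples, so these are sanity checks only.

def axiom_II(domain, lt, eq):
    # for all x, y: x < y + 1 -> (x < y or x = y)
    return all(not lt(x, y + 1) or lt(x, y) or eq(x, y)
               for x in domain for y in domain)

def axiom_IV_witness(domain, eq, x, y):
    # is there a z in the sample with x * z = y (in the structure's =)?
    return any(eq(x * z, y) for z in domain)

usual_lt, usual_eq = lambda a, b: a < b, lambda a, b: a == b
mod12_eq = lambda a, b: a % 12 == b % 12

# structure (i): N with the usual interpretations satisfies axiom II
assert axiom_II(range(20), usual_lt, usual_eq)

# structure (ii): Q -- the witness x = 2, y = 1.5 falsifies axiom II
assert not axiom_II([n / 2 for n in range(8)], usual_lt, usual_eq)

# structure (iii): axiom IV fails at x = 2, y = 3, since 2z is never
# congruent to 3 modulo 12; by contrast y = 4 does have a witness
assert not axiom_IV_witness(range(200), mod12_eq, 2, 3)
assert axiom_IV_witness(range(200), mod12_eq, 2, 4)
```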
We could say structure (i) is a model for the theory containing axioms I, II, and
III. If we call the model 𝒜 and the theory T, we notate this as 𝒜 ⊨ T. However,
structure (i) is not a model for the last axiom; call it φ: 𝒜 ⊭ φ. Note that for all
sentences φ and models 𝒜 over the same language, either 𝒜 ⊨ φ or 𝒜 ⊨ ¬φ.
However, φ may be independent of the theory T, where T ⊬ φ and T ⊬ ¬φ.
This discussion has implied we can go from structures to theories, and indeed we
can. Given a theory T we can talk about models of that theory, 𝒜 ⊨ T, but given a
structure 𝒜, we can speak of the theory of that structure, Th(𝒜) := {φ : 𝒜 ⊨ φ}.
Since a theory is just a collection of sentences, it has consequences under logical
deduction.⁴ If φ is a logical consequence of the sentences in T, we denote
that by T ⊢ φ. If for every 𝓛-sentence φ, either T ⊢ φ or T ⊢ ¬φ, T is called
complete. For any structure 𝒜, Th(𝒜) is complete. I should note here that we
assume our theories are consistent (i.e., they do not prove any contradictions); otherwise
the deductive closure of the theory is literally all sentences in the language.
Gödel's completeness theorem says that T ⊢ φ if and only if for all structures 𝒜,
𝒜 ⊨ T → 𝒜 ⊨ φ. What this means is that if you want to show φ follows from a
theory T, you must give a logical deduction, but if you want to show T ⊬ φ (which is not
the same as T ⊢ ¬φ), you need only construct a model of T in which φ is false.
³ However, it is common, perhaps even standard, to make = a special symbol that may only
be interpreted as genuine equality.
⁴ Some authors require theories to be closed under logical deduction.
Thus far we have only spoken of first-order theories, ones where only one kind of
object is quantified over. In practice we might want to quantify over both elements
and sets of elements, which puts us in the realm of second-order logic. The discussion
above carries over to second-order logic, but models now consist of a universe M, a
subset of 𝒫(M), and interpretations of all the language symbols. A prime example is
N together with 𝒫(N), but in computability theory we often can restrict the subsets
included and still get a model of our theory. More on this in forthcoming sections.
9.4 Computable Model Theory
The area of computable model theory applies our questions of computability or
levels of noncomputability to structures of model theory. The source for this section
is Ash and Knight's Computable Structures and the Hyperarithmetical Hierarchy [3];
Volume 1 of the Handbook of Recursive Mathematics [20] and a survey article by
Harizanov [24] are also good references. Note that in this section, all structures will
be countable; in fact one frequently assumes every structure has universe N, and we
will follow this convention.
The degree of a countable structure is the least upper bound of the degrees of
the functions and relations of the language as interpreted in that structure. We can
ask many questions:
• What degrees are possible for models of a theory?
• What degrees are possible for models of a theory within a particular isomorphism
type (equivalence class under isomorphism)?
• Given a degree d, can we construct a theory with no models of degree d?
• What happens if we restrict to computable models and isomorphisms? Do we
get more or fewer isomorphism types, for example?
One important theory in logic is Peano arithmetic (PA), a theory in the language
(S, +, ·, 0), where S is a unary function, + and · are binary functions, and 0 is a
constant. The axioms of PA abstract the essence of grade-school arithmetic. That
addition and multiplication act as we expect follows from the axioms, which more
explicitly fall into the following groups:
• 0 is zero: (∀x)(S(x) ≠ 0); (∀x)(x + 0 = x); (∀x)(x · 0 = 0).
• S means "plus one": (∀x)(∀y)(S(x) = S(y) → x = y); (∀x, y)(x + S(y) = S(x + y));
(∀x, y)(x · S(y) = x · y + x).
• Induction works: for every formula φ(u, v) in the language of PA, we have the
axiom
(∀u) ((φ(u, 0) & (∀y)[φ(u, y) → φ(u, S(y))]) → (∀x)φ(u, x)).
The standard model of PA is denoted 𝒩: the natural numbers with successor and
the usual addition, multiplication, and zero.
In fact a standard model of PA is any model where the universe is generated
by closing 0 under successor, and those elements are also called standard. Nothing
forbids having elements in the universe that are not obtained in that way. Those
elements, and the models that contain them, are called nonstandard; nonstandard
models also contain standard elements.
The induction axiom in PA leads directly to the following useful tool, since the
antecedent need refer only to the successors of zero, but the consequent has an
unrestricted x.
Proposition 9.4.1 (Overspill). If 𝒜 ⊨ PA is nonstandard and φ(x) is a formula
that holds for all finite elements of M, then φ(x) also holds of some infinite element.
Theorem 9.4.2 (Tennenbaum [58]). If 𝒜 ⊨ PA is nonstandard, it is not computable.
Proof. Let X and Y be computably inseparable c.e. sets (see Exercise 5.2.17). There
are natural formulas that mean x ∈ X_s and y ∈ Y_s, as well as p_n | u (the n-th prime
divides u). Let φ(x, u) say
∀y([(∃s ≤ x)(y ∈ X_s) → p_y | u] & [(∃s ≤ x)(y ∈ Y_s) → p_y ∤ u]).
For all finite c, 𝒜 ⊨ ∃u φ(c, u), because the product of all primes corresponding to
elements of X_c is such a u.
By Overspill, Proposition 9.4.1, there is an infinite c* such that 𝒜 ⊨ ∃u φ(c*, u).
For d such that 𝒜 ⊨ φ(c*, d), let Z = {m ∈ N : 𝒜 ⊨ p_m | d}. Z is a separator for X
and Y, and Z is computable from 𝒜. Since X and Y are computably inseparable,
Z and hence 𝒜 are noncomputable.
For the following theorem we need a definition.
Definition 9.4.3. A trivial structure is one such that there is a finite set of elements
a such that any permutation of the universe that fixes a pointwise is an
automorphism.
For example, {0, 1, 2, . . .} with finitely-many named elements and unary relations
that are all either empty or the entire universe. Any permutation of {0, 1, 2, . . .}
that preserves the named elements will also preserve the relations, and hence be an
automorphism.
Theorem 9.4.4 (Solovay, Marker, Knight [33]). Suppose 𝒜 is a nontrivial structure.
If 𝒜 ≤_T X, there exists a structure ℬ isomorphic to 𝒜 via F such that ℬ ≡_T X, and
in fact F ⊕ 𝒜 ≡_T X.
The proof uses F to code X into ℬ. In particular, if 𝒜 is a linear order, we may
enumerate the universes A and B as a_0, a_1, . . . and b_0, b_1, . . .. F maps from A
to B so that if a_{2n} <_𝒜 a_{2n+1}, then F(a_{2n}) = b_{2n} and F(a_{2n+1}) = b_{2n+1} if and only
if n ∈ X. Otherwise F swaps the order; if a_{2n+1} <_𝒜 a_{2n} the opposite happens, so
b_{2n+1} is on top iff n ∈ X. Then ℬ interprets < in the necessary way to make F an
isomorphism from 𝒜 to ℬ.
Corollary 9.4.5. Peano arithmetic has standard models in all Turing degrees.
This follows from the fact that the standard model is computable.
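The order-swapping trick in this proof sketch can be imitated on finite pieces. The Python below is my own illustration (the helper names are invented): membership of n in X is recorded by whether b_{2n} sits below b_{2n+1} in the new order, and reading the order back recovers X.

```python
def code_set_into_order(X, num_pairs):
    # Build the order B on {0, ..., 2*num_pairs - 1} as a list from
    # bottom to top; the pair (2n, 2n+1) keeps its order iff n is in X.
    levels = []
    for n in range(num_pairs):
        lo, hi = 2 * n, 2 * n + 1
        levels += [lo, hi] if n in X else [hi, lo]
    return levels

def decode(levels):
    # Recover X by asking which element of each pair sits lower.
    return {n for n in range(len(levels) // 2)
            if levels.index(2 * n) < levels.index(2 * n + 1)}

X = {0, 2, 3}
assert decode(code_set_into_order(X, 5)) == X
```

Of course, here both the set and the order are finite and computable; the theorem's content is that the same bookkeeping works along an infinite enumeration, where recovering X from the order is exactly as hard as computing the order.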
Linear orders are a useful example for a number of our questions.
Theorem 9.4.6 (Miller [46]). There is a linear order 𝒜 that has no computable
copy, but such that for all noncomputable X ≤_T 0′, 𝒜 has an X-computable copy.
some 0 < x < 1, f(x) = 0. It can prove paracompactness: given an open cover
{U_n : n ∈ N} of a set X, there is an open cover {V_n : n ∈ N} of X such that for
every x ∈ X there is an open set W containing x such that W ∩ V_n = ∅ for all but
finitely-many n. As a final example, RCA₀ can prove that every countable field has
an algebraic closure, but not that it has a unique algebraic closure. More on that
momentarily.
The canonical model of RCA₀ is called REC. Its universe is N, and its collection
of sets is exactly the computable sets, meaning a formula that begins ∀X is read
"for all computable sets X." In fact, REC is the smallest model of RCA₀ possible
with universe N (we call it the minimal ω-model).
An aside on models and parameters here: All systems of reverse math are relative,
in the sense that the induction and comprehension formulas are allowed to
use parameters from the model. It is tempting to think of every model of RCA₀ as
simply REC, but that would be a harmful restriction for proving results. We can
have noncomputable sets in a model of RCA₀; if we have such a set A, the axioms
provide comprehension for sets that are Δ⁰₁ in A (that is, Δ^{0,A}_1) and induction for
Σ^{0,A}_1 formulas. Moreover, the universe of the model need not be N. Every subsystem
of second-order arithmetic has infinitely-many nonstandard models. We will
not address them here, except to say the universes of such models start with N but
include elements that are larger than every element of N. Those infinite elements
can act in ways counter to the intuition we have developed from N.
Any theorem that might require a noncomputable set or function pops us out
of RCA₀. For example, in reality every field's algebraic closure is unique (up to
isomorphism), so the fact that RCA₀ can't prove the uniqueness tells us that even for
two computable algebraic closures the isomorphism between them might necessarily
be noncomputable.
Another example of something RCA₀ cannot prove, which follows directly from
comprehension and REC, is weak König's lemma. This says that if you have a
subtree of 2^{<N}, and there are infinitely-many nodes in the tree, then there must
be an infinite path through the tree (full König's lemma allows arbitrary finite
branching rather than restricting to the children 0 and 1).⁸ The proof is quite easy
if you allow yourself noncomputable techniques: start at the root. Since there are
infinitely-many nodes above the root, there must be infinitely-many nodes above at
least one of the children of the root. Choose the left child if it has infinitely-many
nodes above it, and otherwise choose the right. Repeat (walk upward until you hit
a branching node if you are at a node with only one child). Since each time you
have infinitely-many nodes above you, you never have to stop, so you trace out an
infinite path.
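The greedy argument can be sketched in code, with the caveat that the test "has infinitely many nodes above it" is exactly the noncomputable step. Below it is approximated by searching for an extension out to a fixed finite depth, which happens to be reliable for the sample tree. This is my own illustrative Python, not part of the original notes; `in_tree` is an invented membership predicate.

```python
def in_tree(sigma):
    # a computable infinite subtree of 2^{<N}: strings with at most one 1
    return sigma.count("1") <= 1

def has_extension_at_depth(sigma, depth, tree):
    # does sigma have an extension in the tree of the given length?
    # (a finite stand-in for "has infinitely many nodes above it")
    if not tree(sigma):
        return False
    if len(sigma) >= depth:
        return True
    return (has_extension_at_depth(sigma + "0", depth, tree)
            or has_extension_at_depth(sigma + "1", depth, tree))

def trace_path(tree, steps, depth):
    # the proof's procedure: take the left child whenever it still
    # (apparently) has plenty of nodes above it, else go right
    sigma = ""
    for _ in range(steps):
        if has_extension_at_depth(sigma + "0", depth, tree):
            sigma += "0"
        else:
            sigma += "1"
    return sigma

assert trace_path(in_tree, 8, 20) == "00000000"
```

For a genuinely pathological computable tree no finite search depth suffices, which is the content of the discussion that follows.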
Such a tree is computable if the set of its nodes is computable. There exist
computable infinite trees with no computable infinite paths, so these trees are in
REC but none of their infinite paths are. Hence RCA₀ cannot prove they have
infinite paths at all. I'll note in passing that although the failure of weak König's
lemma in the specific model REC is sufficient to show it does not follow from RCA₀,
the result that not every computable tree has a computable path relativizes to say
that for any set A, not every A-computable tree has an A-computable path.
There are, of course, many other theorems RCA₀ cannot prove, but we will
discuss those in the subsequent sections.
⁸ Jeff Hirst states this lemma as "big skinny trees are tall."
Weak König's Lemma
WKL₀, or weak König's lemma, does not fit the same mold as RCA₀, ACA₀, or
Π¹₁-CA₀ (described below). In this system comprehension has been restricted, but
not in a way that uniformly addresses the complexity of φ in the basic comprehension
axiom scheme. It is more easily stated in the form of the previous section, that
every infinite subtree of 2^{<N} has an infinite path. WKL₀ is RCA₀ together with
that comprehension axiom.⁹
I want to note here that it is important that the tree be a subset of 2^{<N}, rather
than any old tree where every node has at most two children. If we let the labels
of the nodes be unbounded, we get something equivalent to full König's lemma,
which says that any infinite, finitely-branching tree (subtree of N^{<N}) has a path, and
is equivalent to ACA₀. In fact, some reverse mathematicians refer to "0-1 trees"
rather than "binary-branching trees" to highlight the distinction. The difference is
one of computability versus enumerability of the children of a node in a computable
tree. If the labels have a bound, we have only to ask a finite number of membership
questions to determine how many children a node actually has. If not, even knowing
there are at most 2 children, if we have not yet found 2, we must continue to ask
about children with higher and higher labels, a process that only halts if the node
has the full complement of children.
Over RCA₀, WKL₀ is equivalent to the existence of a unique algebraic closure
for any countable field. It is also equivalent to the statement that every continuous
function on [0, 1] attains a maximum, to the statement that every countable commutative
ring has a prime ideal, and to the Heine-Borel theorem. Heine-Borel says that every
open cover of [0, 1] has a finite subcover.
There is no canonical model of WKL₀. In fact, any model of WKL₀ with universe
N contains a proper submodel which is also a model of WKL₀. The intersection of
all such models is REC, which as we saw is not a model of WKL₀. There is a
deep connection between models of WKL₀ and Peano arithmetic (PA; see §9.4).
Formally, a degree d is the degree of a nonstandard model of PA iff there is a model
of WKL₀ with universe N consisting entirely of sets computable from d. Informally,
"PA-degree" is to WKL₀ what "computable" is to RCA₀ and "arithmetic" to ACA₀.
⁹ This makes WKL₀ another system with more induction than is simply given by comprehension
plus set-based induction; the three stronger systems do not have this trait. Without the extra
induction we call the system WKL₀*.
The study of computable trees gives us a result called the low basis theorem,
which says any infinite computable binary tree has a path of low degree (where A is low
if A′ ≤_T 0′, and it is significant that noncomputable low sets exist). This and a little
extra computability theory shows WKL₀ has a model with only low sets.
Arithmetic Comprehension
ACA₀ stands for "arithmetic comprehension axiom." As mentioned, we obtain it from
Z₂ by restricting the formulas φ in the comprehension scheme to those which may
be written using number quantifiers only, no set quantifiers. Surprisingly, there is
no middle ground between RCA₀ and ACA₀ in terms of capping the complexity
of φ via the arithmetic hierarchy: if we allow φ to be even Σ⁰₁, we get the full
power of ACA₀. The proof is easy, as well: given the existence of a set X, we get
the existence of every set that is Σ^{0,X}_1, which includes X′. From X′ we get the
existence of every set that is Σ^{0,X′}_1, which by Post's Theorem 7.2.10 are the sets
that are Σ^{0,X}_2. Continuing this process we bootstrap our way all the way up the
arithmetic hierarchy.
RCA₀ can prove that the statement "for all X, the Turing jump X′ exists (suitably
coded)" is equivalent to ACA₀. Other equivalent statements include: every
sequence of points in a compact metric space has a convergent subsequence; every
countable vector space over a countable scalar field has a basis (we may also
restrict to the scalar field being Q and still get equivalence); and every countable
commutative ring has a maximal ideal.
ACA₀, like RCA₀, has a minimal model with universe N. It is ARITH, the
collection of all arithmetic sets. These sets are exactly those definable by formulas
with no set quantifiers but arbitrarily-many number quantifiers, or equivalently, sets
which are Turing reducible to 0^(n) for some n.
Arithmetic Transfinite Recursion
ATR₀, like WKL₀, is obtained via a restriction to comprehension that feels less
natural than the other systems. Arithmetic transfinite recursion roughly says that
starting at any set that exists, we may iterate the Turing jump on it as many times
as we like and those sets will all exist. This is a very imprecise version, clearly,
since it is not at all apparent this gives more than ACA₀; the real thing is quite
technical ("as many times as we like" is a lot), so we will skip it and discuss some
of the equivalent theorems.
The main one is the perfect set theorem. A set X is (topologically) perfect if it
has no isolated points; every point x ∈ X is the limit of some sequence of points
{y_i : y_i ∈ X, i ∈ N, y_i ≠ x}. A tree is perfect if every node of the tree has more than
one infinite path extending it, which is exactly the previous statement but specific
to the tree topology. The perfect set theorem states that every uncountable closed
set has a nonempty perfect subset, and the version for trees says every tree with
uncountably many paths has a nonempty perfect subtree. Both are equivalent to
ATR₀ over RCA₀; note that both are comprehension theorems.
Another comprehension theorem which is equivalent to ATR₀ is that for any
sequence of trees {T_i : i ∈ N} such that each T_i has at most one path, the set
{i : T_i has a path} exists.
From ATR₀ up, there are no minimal models. ATR₀ is similar to WKL₀ in that
it has no minimal model but the intersection of all its models of universe N is a
natural class of sets. In this case it is HYP, the hyperarithmetic sets, which we will
not define.
Π¹₁ Comprehension
Π¹₁-CA₀ stands for "Π¹₁ comprehension axiom." It is in the same family as RCA₀ and
ACA₀, where the comprehension scheme has been restricted by capping the allowed
complexity of the formula φ. In this case, φ is allowed to have one universal set
quantifier, and an unlimited (finite) list of number quantifiers.
We'll mention only a few results equivalent to Π¹₁-CA₀, most strengthenings or
generalizations of theorems equivalent to ATR₀. It is equivalent to Π¹₁-CA₀ that
every tree is the union of a perfect subtree and a countable set of paths.
Two comprehension theorems equivalent to Π¹₁-CA₀ are that (a) for any sequence
of trees {T_i : i ∈ N}, the set {i : T_i has a path} exists, and (b) for any uncountable
tree the perfect kernel of the tree exists (that is, the union of all its perfect subtrees).
A Spiderweb
It is remarkable that so many theorems of ordinary mathematics fall into five major
equivalence classes under relative provability. However, it would be misleading
to close this section without mentioning that not every theorem has such a clean
relationship to the Big Five. A lot of research has been done that establishes a
cobweb of implications for results that lie between RCA₀ and ACA₀, and for many
researchers this is the most interesting part of reverse mathematics. If I were willing
to drown you in acronym definitions I could draw a very large picture with one-way
arrows, two-way arrows, unknown implications, and non-implications, but we will
keep to a manageable list. For this section the primary references are papers by
Cholak, Jockusch, and Slaman [7] and Hirschfeldt and Shore [27].
One of the main focuses of this area of research is Ramsey's theorem. The general
statement is the following, where [N]^n is the set of all n-element subsets of N.
Theorem 9.5.1 (Ramsey's theorem). Given f : [N]^n → {0, . . . , m−1}, there is an
infinite set H ⊆ N such that the restricted map f↾H : [H]^n → {0, . . . , m−1} is
constant. H is called a homogeneous set for f.
Ramsey's theorem is a generalization of the pigeonhole principle, which is the
case n = 1. The pigeonhole principle says if we put infinitely-many objects in
finitely-many boxes, some individual box must contain infinitely-many objects.
We usually think of the range of f as consisting of colors and call f an m-coloring
of (unordered) n-tuples from N. For f a 2-coloring of pairs, we may picture [N]² as
a graph with vertices the natural numbers and an undirected edge between every
pair of distinct vertices. The map f colors each edge red or blue, and Ramsey's
theorem asserts the existence of a subset of vertices such that the induced subgraph
on those vertices (the one that contains all edges from the original graph which still
have both their endpoints in the subgraph) will have all edges the same color.
For our use it matters what the values of the parameters are, so we use RT^n_m
to denote Ramsey's theorem restricted to functions f : [N]^n → {0, . . . , m−1} for
specified n and m. It is the size of the subsets that matters; we can argue the
number of colors can be restricted to 2 without loss of generality.
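As a finite taste of the pairs case, the Ramsey number R(3, 3) = 6 guarantees that any red/blue coloring of the edges among six vertices already contains a monochromatic triangle. The brute-force search below is my own illustrative Python (`homogeneous_triple` is an invented name, and the sample coloring is arbitrary):

```python
from itertools import combinations

def homogeneous_triple(coloring, vertices):
    # search for three vertices whose induced edges all share one color
    for trio in combinations(vertices, 3):
        colors = {coloring(e) for e in combinations(trio, 2)}
        if len(colors) == 1:
            return trio
    return None

# a sample 2-coloring of pairs: color 0 iff the pair sums to an even number
f = lambda pair: sum(pair) % 2
assert homogeneous_triple(f, range(6)) == (0, 2, 4)
```

The infinite theorem is much stronger, of course: it produces an infinite homogeneous set, and the reverse-mathematical interest below is in how hard such a set is to find.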
Exercise 9.5.2. Fix n. Show that by repeated applications of RT^n_2 one can obtain
a homogeneous set for the coloring f : [N]^n → {0, . . . , m−1}. Since it is clear RT^n_m
implies RT^n_2 for each m ≥ 2, this shows they are actually equivalent.
RT^n_m for fixed n ≥ 3, m ≥ 2 is equivalent to ACA₀, and the universal quantification
(∀m) RT^n_m is equivalent to ACA₀ for n ≥ 3. If we quantify over both parameters
we get the principle RT, which is strictly between ACA₀ and ATR₀. On the other
side, RT²₂ is strictly weaker than ACA₀, and is not implied by WKL₀. It is an open
problem whether RT²₂ implies WKL₀ or whether they are independent.
A principle which is strictly below both WKL₀ and RT²₂ is DNR, which stands
for "diagonally non-recursive." DNR says there exists a function f such that
(∀e)(f(e) ≠ φ_e(e)). It is clear that DNR fails in REC, since such a function
is designed exactly to be unequal to every computable function.
CAC, or "chain-antichain," says that every infinite partial order (P, ≤_P) has an
infinite subset that is either a chain (all elements are comparable by ≤_P; i.e., a subset
that is a linear order) or an antichain (no elements are comparable by ≤_P). As an
example, the partial order (𝒫(N), ⊆) contains the infinite chain {{0, 1, . . . , n} : n ∈ N}
and the infinite antichain {{n} : n ∈ N}.
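Finite fragments of the chain and antichain just mentioned are easy to exhibit. The check below is my own illustrative Python, using `frozenset` for elements of 𝒫(N) and the subset operator `<=` for ⊆:

```python
# initial segments {0}, {0,1}, ..., {0,...,5}: a chain under inclusion
chain = [frozenset(range(n + 1)) for n in range(6)]
assert all(a <= b or b <= a for a in chain for b in chain)

# singletons {0}, {1}, ..., {5}: an antichain under inclusion
antichain = [frozenset({n}) for n in range(6)]
assert all(not (a <= b or b <= a)
           for a in antichain for b in antichain if a != b)
```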
ADS, "ascending or descending sequence," is implied by CAC; it is open whether
they are equivalent or ADS is strictly weaker. ADS says that every infinite
linear order (L, ≤_L) has an infinite subset S that is either an ascending sequence
((∀s, t ∈ S)(s < t → s <_L t)) or a descending sequence ((∀s, t ∈ S)(s < t → t <_L s)).
For example, if L is (Z, ≤_Z) coded by n ↦ 2n for n ≥ 0 and n ↦ −2n − 1 for
n < 0, the positive integers form an ascending sequence (their coded ordering <
matches their interpreted ordering ≤_Z) and the negative integers form a descending
sequence (their coded ordering is opposite from their interpreted ordering).
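The coding of Z in this example is concrete enough to check directly; the following is my own illustrative Python:

```python
def code(n):
    # code the integers into N: nonnegative n -> 2n, negative n -> -2n - 1
    return 2 * n if n >= 0 else -2 * n - 1

# positive integers: codes increase as the Z-order increases (ascending)
assert [code(n) for n in [1, 2, 3, 4, 5]] == [2, 4, 6, 8, 10]

# negative integers: codes increase as the Z-order decreases (descending)
assert [code(n) for n in [-1, -2, -3, -4, -5]] == [1, 3, 5, 7, 9]
```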
Neither CAC nor ADS implies DNR, and neither is implied by WKL₀. Both are
implied by RT²₂. For some justification as to why CAC and ADS should be stronger
than RCA₀, see Exercise 6.2.4. Even if the order relation must be computable (as
in REC), things we define from the order relation need not be.
Finally, WWKL₀, or weak weak König's lemma, is a system intermediate between
WKL₀ and DNR. The lemma says that if T ⊆ 2^{<N} is a tree with no infinite path,
then
lim_{n→∞} |{σ ∈ T : |σ| = n}| / 2^n = 0.
This is clearly implied by weak König's lemma, which says in contrapositive that if
T has no infinite path it must be finite (so this fraction does not just approach 0, it
is identically 0 from some n on). A decent amount of measure theory can be carried
out in WWKL₀, but I wanted to mention it in particular because it has connections
to randomness as laid out in §9.2. A model 𝒜 of RCA₀ is also a model of WWKL₀
if and only if for every X in 𝒜, there is some Y in 𝒜 such that Y is 1-random
relative to X [1].
Appendix A
Mathematical Asides
In this appendix I've stuck a few proofs and other tidbits that aren't really part of
computability theory, but have been referenced in the text.
A.1 The Greek Alphabet
As you progress through mathematics you'll learn much of the Greek alphabet by
osmosis, but here is a list for reference.
alpha A α		nu N ν
beta B β		xi Ξ ξ
gamma Γ γ		omicron O o
delta Δ δ		pi Π π
epsilon E ε or ϵ	rho P ρ
zeta Z ζ		sigma Σ σ
eta H η			tau T τ
theta Θ θ		upsilon Υ υ
iota I ι		phi Φ φ or ϕ
kappa K κ		chi X χ
lambda Λ λ		psi Ψ ψ
mu M μ			omega Ω ω
A.2 Summations
When defining the pairing function we needed to sum from 1 to x + y. There is a
very clever way to find a closed form for the sum 1 + 2 + . . . + n. Write out the
terms twice, in two directions:
1 + 2 + . . . + (n − 1) + n
n + (n − 1) + . . . + 2 + 1
Adding downward, we see n copies of n + 1 added together. As this is twice the
desired sum, we get
∑_{i=1}^{n} i = n(n + 1)/2.
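The closed form is easy to confirm against a direct sum; a quick check in Python (my own addition, not part of the notes):

```python
def triangular(n):
    # closed form for 1 + 2 + ... + n derived above
    return n * (n + 1) // 2

assert all(triangular(n) == sum(range(1, n + 1)) for n in range(200))
```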
Related, though not relevant to this material, is the way one proves the sum of
the geometric series with terms ar^i, i ≥ 0, is a/(1 − r) whenever |r| < 1. We take the
partial sum, stopping at some i = n, and we subtract from it its product with r:
a + ar + ar² + . . . + ar^n − (ar + ar² + . . . + ar^n + ar^{n+1}).
Letting s_n be the n-th partial sum of the series, we get s_n − r·s_n = a − ar^{n+1}, or
s_n = (a − ar^{n+1})/(1 − r). The sum of any series is the limit of its partial sums, so
we see
∑_{i=0}^{∞} ar^i = lim_{n→∞} (a − ar^{n+1})/(1 − r) = a/(1 − r) · lim_{n→∞} (1 − r^{n+1}),
and that limit is 1 whenever |r| < 1.
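The partial-sum formula and the limit can likewise be checked numerically; this is my own illustrative Python:

```python
def partial_sum(a, r, n):
    # s_n = (a - a r^(n+1)) / (1 - r), derived above
    return (a - a * r ** (n + 1)) / (1 - r)

a, r = 3.0, 0.5
# matches the term-by-term sum ...
assert abs(partial_sum(a, r, 20) - sum(a * r ** i for i in range(21))) < 1e-9
# ... and approaches a / (1 - r) for |r| < 1
assert abs(partial_sum(a, r, 200) - a / (1 - r)) < 1e-12
```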
A.3 Cantor's Cardinality Proofs
Cantor had two beautifully simple diagonal proofs to show the rational numbers
are no more numerous than the natural numbers, but the real numbers are strictly
more numerous. The ideas of these proofs are used for some of the most fundamental
results in computability theory, such as the proof that the halting problem is
noncomputable.
First we show that Q has the same cardinality as N. Take the grid of all pairs
of natural numbers; i.e., all integer-coordinate points in the first quadrant of the
Cartesian plane. The pair (n, m) represents the rational number n/m; all positive
rational numbers are representable as fractions of natural numbers. We may count
these with the natural numbers if we go along diagonals of slope −1. Note that it
does not work to try to go row by row or column by column, as you will never finish
the first one; you must dovetail the rows and columns, doing a bit from the first,
then a bit from the second and some more from the first, then a bit from the third,
more from the second, and yet more from the first, and so on. To count exactly
the rationals, start by labeling 0 with 0, then proceed along the diagonals, skipping
(n, m) if n/m reduces to a rational we've already counted, and otherwise counting
it twice to account for the negation of n/m.
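The diagonal walk (restricted here to the positive rationals, skipping repeats) is short to implement; the following is my own illustrative Python:

```python
from fractions import Fraction

def enumerate_positive_rationals(count):
    # walk the diagonals n + m = s of the (n, m) grid, emitting each
    # value n/m only the first time it appears
    seen, out, s = set(), [], 2
    while len(out) < count:
        for n in range(1, s):
            q = Fraction(n, s - n)
            if q not in seen:
                seen.add(q)
                out.append(q)
        s += 1
    return out[:count]

assert enumerate_positive_rationals(6) == [
    Fraction(1), Fraction(1, 2), Fraction(2),
    Fraction(1, 3), Fraction(3), Fraction(1, 4)]
```

Every positive rational has a finite diagonal index, so every one is eventually reached, which is exactly what row-by-row traversal fails to guarantee.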
Cantor's proof that R is strictly bigger than N is necessarily more subtle, as
demonstrating the existence of a bijection with N (which is exactly what counting
with the natural numbers accomplishes) is generally more straightforward than
demonstrating no such bijection exists.
In fact, we will show even just the interval from 0 to 1 is larger than N. Suppose
for a contradiction that we have a bijection between [0, 1] and N. List the
elements of [0, 1] out in the order given by the bijection, as infinite decimal
expansions (using all-0 tails if needed):
.65479362895 . . .
.00032797584 . . .
.35271900000 . . .
.00000000063 . . .
.98989898989 . . .
. . .
Now construct a new number d ∈ [0, 1] decimal by decimal using the numbers on
the list. If the n-th decimal place of the n-th number on the list is k, then the n-th
decimal place of d will be k + 1, or 0 if k = 9. In our example above, d would begin
.71310. While d is clearly a number between 0 and 1, it does not appear on the list,
because it differs from every number on the list in at least one decimal place: the
n-th.
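The construction of d is mechanical enough to run on the finite sample above; the following is my own illustrative Python:

```python
def diagonal(decimals):
    # build a number differing from the n-th listed decimal in place n:
    # bump digit k to k + 1, wrapping 9 around to 0
    d = ""
    for n, x in enumerate(decimals):
        k = int(x[n])
        d += "0" if k == 9 else str(k + 1)
    return "." + d

listing = ["65479362895", "00032797584", "35271900000",
           "00000000063", "98989898989"]
assert diagonal(listing) == ".71310"
```

The same diagonal idea, applied to an enumeration of partial computable functions rather than decimals, drives the noncomputability of the halting problem mentioned above.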
Bibliography
[1] Ambos-Spies K., B. Kjos-Hanssen, S. Lempp, and T.A. Slaman. Comparing DNR and
WWKL. Journal of Symbolic Logic 69:10891104, 2004.
[2] Ambos-Spies, K., and A. Kuera. Randomness in computability theory. In Computability
Theory and Its Applications: Current Trends and Open Problems (ed. Cholak, Lempp, Ler-
man, Shore), vol. 257 of Contemporary Mathematics, pages 114. American Mathematical
Society, 2000.
[3] Ash, C.J., and J.F. Knight. Computable Structures and the Hyperarithmetical Hierarchy.
Elsevier Science B.V., 2000.
[4] Boolos, G.S., J.P. Burgess, and R.C. Jerey. Computability and Logic, fourth edition. Cam-
bridge University Press, 2002.
[5] Chaitin, G.J. Information-theoretical characterizations of recursive innite strings. Theoret-
ical Computer Science 2:4548, 1976.
[6] Chaitin, G.J. Incompleteness theorems for random reals. Advances in Applied Mathematics
8:119146, 1987.
[7] Cholak, P.A., C.J. Jockusch, and T.A. Slaman. On the strength of Ramseys theorem for
pairs. Journal of Symbolic Logic 66: 155, 2001.
[8] Church, A. An unsolvable problem of elementary number theory. Journal of Symbolic Logic
1:7374 (1936).
[9] Church, A. On the concept of a random sequence. Bulletin of the American Mathematical
Society 46:130135, 1940.
[10] Cutland, N. Computability: An introduction to recursive function theory. Cambridge Univer-
sity Press, 1980.
[11] Davis, M. Computability and Unsolvability. McGraw-Hill Education, 1958. Reprinted by
Dover Publications, 1985.
[12] Davis, M. The Undecidable. Raven Press, 1965.
[13] Davis, M. Hilberts tenth problem is unsolvable. American Mathematical Monthly
80:233269, 1973.
[14] Downey, R., and D. Hirschfeldt, Algorithmic Randomness and Complexity, in preparation.
141
142 BIBLIOGRAPHY
[15] Downey, R., E. Griths, and G. LaForte. On Schnorr and computable randomness, martin-
gales, and machines. Mathematical Logic Quarterly 50(6):613627, 2004.
[16] Downey, R., D. Hirschfeldt, A. Nies, and F. Stephan. Trivial reals, extended abstract. In
Computability and Complexity in Analysis Malaga (Electronic Notes in Theoretical Computer
Science, and proceedings; edited by Brattka, Schrder, Weihrauch, Fern Universitt; 294-
6/2002, 37-55), July 2002.
[17] Dzgoev, V.D., and S.S. Goncharov. Autostable models (English translation). Algebra and
Logic 19:2837, 1980.
[18] Ehrenfeucht, A., J. Karhumaki, and G. Rozenberg. The (generalized) Post correspondence
problem with lists consisting of two words is decidable. Theoretical Computer Science 21(2),
1982.
[19] Enderton, H.B. A Mathematical Introduction to Logic, second edition. Harcourt/Academic
Press, 2001.
[20] Ershov, Yu.L., S.S. Goncharov, A. Nerode, J.B. Remmel, and V.W. Marek, eds. Handbook of
recursive mathematics. Vol. 1: Recursive model theory. Studies in Logic and the Foundations
of Mathematics 138, North-Holland, 1998.
[21] Gcs, P. Every set is reducible to a random one. Information and Control 70:186192, 1986.
[22] Goncharov, S.S. The quantity of non-autoequivalent constructivizations (English transla-
tion). Algebra and Logic 16:169185, 1977.
[23] Goncharov, S.S. The problem of the number of non-autoequivalent constructivizations (En-
glish translation). Algebra and Logic 19:401414, 1980.
[24] Harizanov, V. Computably-theoretic complexity of countable structures Bulletin of Symbolic
Logic 8:457477, 2002.
[25] Hilbert, D. Mathematical problems, Bulletin of the American Mathematical Society 8(1901
1902):437479.
[26] Hirschfeldt, D., A. Nies, and F. Stephan. Using random sets as oracles. Submitted.
[27] Hirschfeldt, D., and R.A. Shore. Combinatorial principles weaker than Ramseys theorem for
pairs. Journal of Symbolic Logic 72:171206, 2007.
[28] Kogge, P.M. The Architecture of Symbolic Computers. The McGraw-Hill Companies, Inc.,
1998.
[29] Kolmogorov, A.N. Grundbegrie der Wahrscheinlichkeitsrechnung. Springer, 1933.
[30] Kolmogorov, A.N. On tables of random numbers. Sankhya, Series A, 25:369376, 1963.
[31] Kolmogorov, A.N., Three approaches to the quantitative denition of information. Problems
of Information Transmission (Problemy Peredachi Informatsii) 1:17, 1965.
[32] Kleene, S.C. General recursive functions of natural numbers. Mathematische Annalen
112:727–742, 1936.
[33] Knight, J.F. Degrees coded in jumps of orderings. Journal of Symbolic Logic 51:1034–1042,
1986.
[34] Kraft, L.G. A Device for Quantizing, Grouping, and Coding Amplitude Modulated Pulses.
Electrical engineering M.S. thesis. MIT, Cambridge, MA, 1949.
[35] Kučera, A. Measure, Π^0_1 classes, and complete extensions of PA. In Springer Lecture Notes
in Mathematics Vol. 1141, pages 245–259. Springer-Verlag, 1985.
[36] Kučera, A., and S. Terwijn. Lowness for the class of random sets. Journal of Symbolic Logic
64(4):1396–1402, 1999.
[37] Levin, L.A. On the notion of a random sequence. Soviet Mathematics Doklady 14:1413–1416,
1973.
[38] Levin, L.A. Laws of information conservation (non-growth) and aspects of the foundation of
probability theory. Problems of Information Transmission 10:206–210, 1974.
[39] Lévy, P. Théorie de l'Addition des Variables Aléatoires. Gauthier-Villars, 1937 (second edition
1954).
[40] Li, M., and P. Vitányi. An Introduction to Kolmogorov Complexity and its Applications,
second edition. Springer Graduate Texts in Computer Science, Springer Science+Business
Media, New York, NY, 1997.
[41] Linz, P. An Introduction to Formal Languages and Automata, second edition. Jones and
Bartlett Publishers, 1997.
[42] Martin-Löf, P. The definition of random sequences. Information and Control 9:602–619, 1966.
[43] Martin-Löf, P. Complexity oscillations in infinite binary sequences. Z. Wahrscheinlichkeitstheorie
verw. Gebiete 19:225–230, 1971.
[44] Matijasevič, Yu.V. On recursive unsolvability of Hilbert's tenth problem. Logic, methodology
and philosophy of science, IV (Proc. Fourth Internat. Congr., Bucharest, 1971). Studies in
Logic and Foundations of Math. 74:89–110. North-Holland, 1973.
[45] Matijasevič, Y., and G. Sénizergues. Decision problems for semi-Thue systems with few rules.
Proceedings, 11th Annual IEEE Symposium on Logic in Computer Science, 1996.
[46] Miller, R. The Δ^0_2-spectrum of a linear order. Journal of Symbolic Logic 66:470–486, 2001.
[47] Nies, A. Lowness properties and randomness. Advances in Mathematics 197(1):274–305,
2005.
[48] Nies, A. Computability and Randomness. Oxford Logic Guides, 51. Oxford University Press,
Oxford, 2009.
[49] Odifreddi, P.G. Classical Recursion Theory. Studies in Logic and the Foundations of Mathe-
matics 125, Elsevier, 1989, 1992.
[50] Post, E.L. A variant of a recursively unsolvable problem. Bulletin of the American Mathe-
matical Society 52, 1946.
[51] Rogers, H. Theory of Recursive Functions and Effective Computability. The MIT Press, 1987.
[52] Schnorr, C.P. A unified approach to the definition of random sequences. Mathematical
Systems Theory 5:246–258, 1971.
[53] Shafer, G. A counterexample to Richard von Mises' theory of collectives. Translation with
introduction of an extract from Ville's Étude Critique de la Notion de Collectif [62], available
from http://www.probabilityandfinance.com.
[54] Simpson, S.G. Subsystems of Second-Order Arithmetic. Perspectives in Mathematical Logic,
Springer-Verlag, 1999.
[55] Smith, D., M. Eggen, and R. St. Andre. A Transition to Advanced Mathematics, third edition.
Brooks/Cole Publishing Company, Wadsworth, Inc., 1990.
[56] Soare, R.I. Recursively Enumerable Sets and Degrees. Perspectives in Mathematical Logic,
Springer-Verlag, 1987.
[57] Solovay, R. Draft of paper (or series of papers) on Chaitin's work. Unpublished notes, May
2004.
[58] Tennenbaum, S. Non-Archimedean models for arithmetic. Notices of the American Mathe-
matical Society 6:270, 1959.
[59] Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society, Series 2, 42:230–265 (1937). A correction.
Proceedings of the London Mathematical Society, Series 2, 43:544–546 (1937).
[60] van Lambalgen, M. Random Sequences. Ph.D. thesis. University of Amsterdam, The Nether-
lands, 1987.
[61] van Lambalgen, M. The axiomatization of randomness. Journal of Symbolic Logic
55(3):1143–1167, 1990.
[62] Ville, J. Étude Critique de la Notion de Collectif. Gauthier-Villars, Paris, 1939.
[63] Volchan, S.B. What is a random sequence? American Mathematical Monthly 109(1):46–63,
2002.
[64] von Mises, R. Probability, Statistics and Truth. Translation of the third German edition,
1951; originally published 1928, Springer. George Allen and Unwin Ltd., London, 1957.
[65] Wald, A. Die Widerspruchsfreiheit des Kollektivbegriffs der Wahrscheinlichkeitsrechnung.
Ergebnisse eines mathematischen Kolloquiums 8:38–72, 1936.
[66] Zambella, D. On sequences with simple initial segments. ILLC technical report ML-1990-05,
University of Amsterdam, 1990.
[67] Zvonkin, A.K., and L.A. Levin. The complexity of finite objects and the development of
concepts of information and randomness by the theory of algorithms. Russian Mathematical
Surveys 25(6):83–124, 1970.