Making Connections
Jim Hefferon
https://hefferon.net/computation
Notation          Description
P(S)              power set, collection of all subsets of S
Sᶜ                complement of the set S
1_S               characteristic function of the set S
⟨a₀, a₁, ...⟩      sequence
N, Z, Q, R        natural numbers { 0, 1, ... }, integers, rationals, reals
a, b, ...         characters
Σ                 alphabet, set of characters
B                 alphabet of bits, B = { 0, 1 }
σ, τ              strings (any lower-case Greek letter)
ε                 empty string
Σ∗                set of all strings over the alphabet
L                 language, subset of Σ∗
P                 Turing machine, either deterministic or nondeterministic
ϕ                 effective function, function computed by a Turing machine
ϕ(x)↓, ϕ(x)↑      function converges on that input, or diverges
UP                universal Turing machine
G                 graph
M                 Finite State machine, either deterministic or nondeterministic
P                 complexity class of deterministic polynomial time problems
NP                complexity class of nondeterministic polynomial time problems
V                 verifier for NP
SAT               language for the Satisfiability problem
License This book is Free. You can use it without cost. You can also redistribute
it — an instructor can make copies and give it away or sell it through their bookstore
or their school’s intranet. You can also get the LaTeX source and modify it to suit
your class; see https://hefferon.net/computation.
One reason that the book is Free is that it is written in LaTeX, which is Free, as
is our Scheme implementation, as is Asymptote, which drew the illustrations, as are
Emacs and all of the GNU software, and the entire Linux platform on which this book
was developed. And besides those, all of the research that this text presents was
made freely available by scholars.
I believe that the synthesis here adds value — I hope so, indeed — but the
masters have left a well-marked trail and following it seems only right.
Acknowledgments I owe a great debt to my wife, whose patience with this
project has gone beyond all reasonable bounds. Thank you, Lynne.
My students have made the book better in many ways. I greatly appreciate all
of the contributions.
And, I must honor my teachers. First among them is M Lerman. Thank you,
Manny.
My teachers also include H Abelson, G J Sussman, and J Sussman, who
had the courage with Structure and Interpretation of Computer Programs to show
students just how mind-blowing it all is. When I see a programming text where
the examples are about managing inventory in a used car dealership, I can only
say: Thank you, for believing in me.
Memory works far better when you learn networks of facts rather than
facts in isolation.
– T Gowers, WHAT MATHS A-LEVEL DOESN’T NECESSARILY GIVE
YOU
Lisp has jokingly been called “the most intelligent way to misuse a
computer.” I think that description is a great compliment because it
transmits the full flavor of liberation: it has assisted a number of our
most gifted fellow humans in thinking previously impossible thoughts.
– E Dijkstra, CACM, 15:10
Jim Hefferon
Saint Michael’s College
Colchester, VT USA
joshua.smcvt.edu/computing
Draft: version 0.99, 2020-Dec-27.
Contents
I Mechanical Computation 3
1 Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Effective functions . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 What it does not say . . . . . . . . . . . . . . . . . . . . . . . . 17
4 An empirical question? . . . . . . . . . . . . . . . . . . . . . . . 18
5 Using Church’s Thesis . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1 Primitive recursion . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 General recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1 Ackermann functions . . . . . . . . . . . . . . . . . . . . . . . . 30
2 µ recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A Turing machine simulator . . . . . . . . . . . . . . . . . . . . . . . 37
B Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
C Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
D Ackermann’s function is not primitive recursive . . . . . . . . . . . . 49
E LOOP programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
II Background 61
1 Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
1 Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2 Cantor’s correspondence . . . . . . . . . . . . . . . . . . . . . . . . 68
3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
1 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 Universality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
1 Universal Turing machine . . . . . . . . . . . . . . . . . . . . . . 84
2 Uniformity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 The Halting problem . . . . . . . . . . . . . . . . . . . . . . . . . . 92
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4 General unsolvability . . . . . . . . . . . . . . . . . . . . . . . . 96
6 Rice’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7 Computably enumerable sets . . . . . . . . . . . . . . . . . . . . . . 107
8 Oracles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9 Fixed point theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 117
1 When diagonalization fails . . . . . . . . . . . . . . . . . . . . . 117
2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
A Hilbert’s Hotel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
B The Halting problem in Wider Culture . . . . . . . . . . . . . . . . . 124
C Self Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
D Busy Beaver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
E Cantor in Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
IV Automata 178
1 Finite State Machines . . . . . . . . . . . . . . . . . . . . . . . . . 178
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
2 Nondeterminism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
3 ε transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4 Equivalence of the machine types . . . . . . . . . . . . . . . . . . 197
3 Regular expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 203
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
2 Kleene’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4 Regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
2 Closure properties . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5 Languages that are not regular . . . . . . . . . . . . . . . . . . . . 219
6 Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7 Pushdown machines . . . . . . . . . . . . . . . . . . . . . . . . . . 235
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
2 Nondeterministic Pushdown machines . . . . . . . . . . . . . . . 240
3 Context free languages . . . . . . . . . . . . . . . . . . . . . . . 243
A Regular expressions in the wild . . . . . . . . . . . . . . . . . . . . 244
B The Myhill-Nerode Theorem . . . . . . . . . . . . . . . . . . . . . . 252
V Computational Complexity 259
1 Big O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
3 Tractable and intractable . . . . . . . . . . . . . . . . . . . . . . 268
4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
2 A problem miscellany . . . . . . . . . . . . . . . . . . . . . . . . . 275
1 Problems with stories . . . . . . . . . . . . . . . . . . . . . . . . 275
2 More problems, omitting the stories . . . . . . . . . . . . . . . . 279
3 Problems, algorithms, and programs . . . . . . . . . . . . . . . . . . 289
1 Types of problems . . . . . . . . . . . . . . . . . . . . . . . . . . 290
2 Statements and representations . . . . . . . . . . . . . . . . . . 293
4 P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
2 Effect of the model of computation . . . . . . . . . . . . . . . . . 301
3 Naturalness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
5 NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
1 Nondeterministic Turing machines . . . . . . . . . . . . . . . . . 306
2 Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
3 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
6 Polytime reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
7 NP completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
1 P = NP? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
8 Other classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
1 EXP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
2 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 340
3 Space Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 341
4 The Zoo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
A RSA Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
B Tractability and good-enoughness . . . . . . . . . . . . . . . . . . . 350
Appendix 353
A Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
B Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Notes 363
Bibliography 396
Part One
Classical Computability
Chapter I Mechanical Computation
What can be computed? For instance, the function that doubles its input, that
takes in x and puts out 2x , is intuitively mechanically computable. We shall call
such functions effective.
The question asks for the things that can be computed, more than it asks for
how to compute them. In this Part we will be more interested in the function, in
the input-output behavior, than in the details of implementing that behavior.
Section I.1 Turing machines
Despite this desire to downplay implementation, we follow the approach of
A Turing that the first step toward defining the set of computable
functions is to reflect on the details of what mechanisms can do.
The context of Turing’s thinking was the Entscheidungsproblem,†
proposed in 1928 by D Hilbert and W Ackermann, which asks for an
algorithm that decides, after taking as input a mathematical state-
ment, whether that statement is true or false.‡ So he considered the
kind of symbol-manipulating computation familiar in mathematics,
as when we factor a polynomial or verify a step in a plane geometry
proof.
After reflecting on it for a while, one day after a run,§ Turing lay
down in the grass and imagined a clerk doing by-hand multiplication
with a sheet of paper that gradually becomes covered with columns of
numbers. With this image as a touchstone, Turing posited conditions
for the computing agent.
[Photograph: Alan Turing, 1912–1954]
First, it (or he, or she) has a memory facility, such as the clerk’s
paper, to store and retrieve information.
Second, the computing agent must follow a definite procedure, a
precise set of instructions, with no room for creative leaps. Part of what makes the
procedure definite is that the instructions don’t involve random methods, such as
counting clicks from radioactive decay, to determine which of two possibilities to
perform.
The other thing making the procedure definite is that the agent does not use
continuous methods or analog devices. So there is no question about the precision
Image: copyright Kevin Twomey, kevintwomey.com/lowtech.html
† German for “decision problem.”
‡ When it finished computing it might turn on a light for ‘true’, or print the symbol 1.
§ He was a serious candidate for the 1948 British Olympic marathon team.
of operations as there might be, say, when reading results off of a slide rule or an
instrument dial. Instead, the agent works in a discrete fashion, step-by-step. For
instance, if needed they could pause between steps, note where they are (“about
to carry a 1”), and later pick up again. We say that at each moment the clerk is in
one of a finite set of possible states, which we denote q₀, q₁, . . .
Turing’s third condition arose because he wanted to investigate what is com-
putable in principle. He therefore imposed no upper bound on the amount of
available memory. More precisely, he imposed no finite upper bound — should
a calculation threaten to run out of storage space then more is provided. This
includes imposing no upper bound on the amount of memory available for inputs
or for outputs, and no bound on the amount of extra storage, scratch memory,
needed in addition to that for inputs and outputs.† He similarly put no upper
bound on the number of instructions. And, he left unbounded the number of steps
that a computation performs before it finishes.‡
The final question Turing faced is: how smart is the computing agent? For
instance, can it multiply? We don’t need to include a special facility for multi-
plication because we can in principle multiply via repeated addition. We don’t
even need addition because we can iterate the successor operation, the add-one
operation. In this way he pared the computing agent down until it was quite basic,
quite easy to understand, until the operations are so elementary that we cannot
easily imagine them further divided, while still keeping the agent powerful enough
to do anything that can, in principle, be done.
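This chain, multiplication from repeated addition and addition from iterated successor, can be sketched as code. A minimal illustration (Python is my choice here, purely for exposition; the text itself works with Turing machines, not a programming language):

```python
def succ(n):
    # The single primitive operation: add one.
    return n + 1

def add(a, b):
    # Addition as b iterations of successor.
    for _ in range(b):
        a = succ(a)
    return a

def mul(a, b):
    # Multiplication as repeated addition.
    total = 0
    for _ in range(b):
        total = add(total, a)
    return total
```

The point is the direction of the reductions: nothing here needs a built-in multiply or even a built-in add, only the ability to repeat the add-one step.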
The tape is the memory, sometimes called the store. The box can read from
and write to it, one character at a time, as well as move a read/write head relative
to the tape in either direction. For instance, to multiply, the computing agent
can get the two input multiplicands from the tape (the drawing shows 74 and 72,
represented in binary and separated by a blank), can use the tape for scratch work,
† It is true that a physical computer such as your cell phone has memory space that is bounded (putting
aside storing things in the Cloud). However, that space is extremely large. In this Part, when working
with the model devices we find that imposing a bound on memory is irrelevant, or even a hindrance.
‡ Some authors describe the availability of resources such as the amount of memory as ‘infinite’.
Turing himself does this. A reader may object that this violates the goal of the definition, to model
physically-realizable computations, and so the development here instead says that the resources have
no finite upper bound. But really, it doesn’t matter. If we show that something cannot be computed
when there are no bounds then we have shown that it cannot be computed on any real-world device.
[Drawing: a tape holding 111, with the machine, in state q₀, reading a 1]
† Whether we move the tape or the head doesn’t matter; what matters is their relative motion. Thus
Tn = L means that one or the other moves such that the head now points to the location one place to
the left. In drawings we hold the tape steady and move the head because then comparing graphics step
by step is easier.
We take the convention that when we press Start the machine is in state q₀. The
picture shows it reading 1 so instruction q₀1Rq₀ applies. Thus the first step is
that the machine moves its tape head right and stays in state q₀. Below, the first
line shows this and later lines show the machine’s configuration after later steps.
Roughly, the computation slides to the right, blanks out the final 1, and slides back
to the start.
[Trace: tape snapshots for steps 2 through 9, each showing the tape contents with the machine’s current state marked under the head]
Next, because there is no instruction for state q₃, no instruction applies and the machine halts.
We can think of this machine as computing the predecessor function

    pred(x) = x − 1   if x > 0
    pred(x) = 0       otherwise

because if we initialize the tape so that it contains only a string of n-many 1’s and
the machine’s head points to the first, then at the end the tape will have (n − 1)-many
1’s (except for n = 0, where the tape will end with no 1’s).
1.2 Example This machine adds two natural numbers.
The input numbers are represented by strings of 1’s that are separated with a blank.
The read/write head starts under the first symbol in the first number. This shows
the machine ready to compute 2 + 3.
[Drawing: a tape holding 11, a blank, then 111, with the machine, in state q₀, reading the first 1]
The machine scans right, looking for the blank separator. It changes that to a 1,
then scans left until it finds the start. Finally, it trims off a 1 and halts with the
read/write head at the start of the string. Here are the steps.
[Trace: tape snapshots for steps 2 through 12, each showing the tape contents with the machine’s current state marked under the head]
Instead of giving a machine’s instructions as a list, we can use a table or a
diagram. Here is the transition table for Ppred.

    ∆    B     1
    q₀   Lq₁   Rq₀
    q₁   Lq₂   Bq₁
    q₂   Rq₃   Lq₂
    q₃   –     –

[Diagram: the transition graph for Ppred, with states q₀ through q₃ and edges labeled by pairs such as 1,R and B,L]

And here is the transition table for Padd.

    ∆    B     1
    q₀   Bq₁   Rq₀
    q₁   1q₁   1q₂
    q₂   Bq₃   Lq₂
    q₃   Rq₃   Bq₄
    q₄   Rq₅   1q₅
    q₅   –     –

[Diagram: the transition graph for Padd, with states q₀ through q₅]

The graph is how we will most often present machines that are small, but if
there are lots of states then it can be visually confusing.
Next, a crucial observation. Some Turing machines, for at least some starting
configurations, never halt.
1.3 Example The machine Pinf loop = { q₀BBq₀, q₀11q₀ } never halts, regardless of the
input.
[Diagram: the transition graph for Pinf loop, a single state q₀ with self-loops labeled B,B and 1,1]
The exercises ask for examples of Turing machines that halt on some inputs and
not on others.
It is high time for definitions. We take a symbol to be something that the device
can write and read, for storage and retrieval.†
1.4 Definition A Turing machine P is a finite set of four-tuple instructions qₚTₚTₙqₙ.
In an instruction, the present state qₚ and next state qₙ are elements of a set
of states Q. The input symbol or current symbol Tₚ is an element of the tape
alphabet set Σ, which contains at least two members, including one called blank
(and does not contain L or R). The action symbol or next symbol Tₙ is an element
of the action set Σ ∪ { L, R }.
The set P must be deterministic: different four-tuples cannot begin with the
same qₚTₚ. Thus, over the set of instructions qₚTₚTₙqₙ ∈ P, the association of
present pair qₚTₚ with next pair Tₙqₙ defines a function, the transition function
or next-state function ∆ : Q × Σ → (Σ ∪ { L, R }) × Q.
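The definition translates directly into data. Here is a small sketch (Python, for illustration only) that represents a machine as a set of four-tuples, using Pinf loop from Example 1.3, and checks the determinism condition:

```python
# A Turing machine as a finite set of four-tuple instructions (qp, Tp, Tn, qn).
# P_infloop is the machine from Example 1.3, {q0BBq0, q011q0}.
P_infloop = {('q0', 'B', 'B', 'q0'), ('q0', '1', '1', 'q0')}

def is_deterministic(P):
    # Determinism: no two four-tuples may share the same present pair (qp, Tp).
    seen = set()
    for (qp, Tp, Tn, qn) in P:
        if (qp, Tp) in seen:
            return False
        seen.add((qp, Tp))
    return True

def delta(P):
    # The transition function: present pair (qp, Tp) -> next pair (Tn, qn).
    return {(qp, Tp): (Tn, qn) for (qp, Tp, Tn, qn) in P}
```

For instance, the pair of instructions q₅1Rq₆ and q₅1Lq₄ would fail the determinism check, since both begin with q₅1.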
We denote a Turing machine with P because the thing from our everyday
experience that a Turing machine is most like is a program.
Of course, the point of a machine is what it does. A Turing machine is a blueprint
for a computation — it is like a program — and so to finish the formalization started
by the definition we give a complete description of how these machines act.
We saw in tracing through Example 1.1 and Example 1.2 that a machine acts
by transitioning from one configuration to the next. A configuration of a Turing
machine is a four-tuple C = ⟨q, s, τL , τR ⟩ , where q is a state, a member of Q , s is a
character from the tape alphabet Σ, and τL and τR are strings of elements from
the tape alphabet, including possibly the empty string ε . These signify the current
state, the character under the read/write head, and the tape contents to the left
and right of the head. For instance, line 2 of the trace table of Example 1.2, where
the state is q = q₀, the character under the head s is the blank, and to the left
of the head is τL = 11 while to the right is τR = 111, graphically represents the
configuration ⟨q, s, τL , τR ⟩ . That is, a configuration is a snapshot, an instant in a
computation.
We write C (t) for the machine’s configuration after the t -th transition, and say
that this is the configuration at step t . We extend that to step 0, and say that the
initial configuration C (0) is the machine’s configuration before we press Start.
Suppose that at step t a machine P is in configuration C(t) = ⟨q, s, τL, τR⟩. To
make the next transition, find an instruction qₚTₚTₙqₙ ∈ P with qₚ = q and Tₚ = s.
If there is no such instruction then at step t + 1 the machine P halts.
† How the device does this depends on its construction details. For instance, to have a machine with
two symbols, blank and 1, we can either read and write marks on a paper tape, or align magnetic
particles on a plastic tape, or bits on a chip, or we can push LEGO bricks to the left or right side of a
slot. Discreteness ensures that the machine can cleanly distinguish between the symbols, in contrast
with the trouble an instrument might have in distinguishing two values near its limit of resolution.
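The transition rule just stated can also be sketched in code. This illustration (Python; the layout of a configuration as ⟨q, s, τL, τR⟩ follows the text, with ordinary strings standing in for the tape portions and 'B' for blank) performs one transition, returning None when no instruction applies and the machine halts:

```python
# One transition on a configuration C = (q, s, tape_left, tape_right):
# find an instruction (qp, Tp, Tn, qn) with qp = q and Tp = s; if none, halt.
# Tn is either a symbol to write or a head move, 'L' or 'R' (the definition
# excludes L and R from the tape alphabet, so there is no ambiguity).
# Cells beyond the ends of tape_left and tape_right are blank, 'B'.
def step(P, config):
    q, s, left, right = config
    for (qp, Tp, Tn, qn) in P:
        if qp == q and Tp == s:
            if Tn == 'L':
                # Head moves left: the last character of left becomes current.
                new_s = left[-1] if left else 'B'
                return (qn, new_s, left[:-1], s + right)
            if Tn == 'R':
                # Head moves right: the first character of right becomes current.
                new_s = right[0] if right else 'B'
                return (qn, new_s, left + s, right[1:])
            # Otherwise Tn is a symbol: write it in place of s.
            return (qn, Tn, left, right)
    return None  # no instruction applies, so the machine halts
```

With the instruction q₀1Rq₀ from Example 1.1 and the starting configuration ⟨q₀, 1, ε, 11⟩, one step yields ⟨q₀, 1, 1, 1⟩, matching the first move of that trace.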
is only one machine under discussion then we may omit the subscript and just
write ϕ.)
[Drawing: tapes of 1’s showing the machine’s input and output, marked q₀ ↦ qh]
That definition has two fine points, both needed to make the input-output
association well-defined. One is that just specifying that the machine starts with σ
on the tape is not enough since the initial position of the head can change the
output.† And, the definition omits blanks from σ and τ since the machine would
not be able to distinguish blanks at the end of those strings from blanks that are
part of the unbounded tape.
The definition says “If P halts . . . ” What if it doesn’t?
1.7 Definition If for a Turing machine the value of a computation is not defined
on some input σ ∈ Σ₀ then we say that the function computed by the machine
diverges, written ϕ(σ)↑ (or ϕ(σ) = ⊥). Where the machine does have an
associated output value, we say that its function converges, written ϕ(σ)↓. If ϕ is
defined for each input in Σ₀ then it is a total function. If it diverges for at least
one member of Σ₀ then it is a partial function.
Very important: note the difference between a machine P and the function
computed by that machine, ϕP.‡ For example, the machine Ppred is a set of four-
tuples but the predecessor function is a set of input-output pairs, which we might
write as x ↦ pred(x). Another example of the difference is that machines halt or
fail to halt, while functions converge or diverge.
That definition appears to only allow functions with a single input and out-
put; what about functions with multiple inputs or outputs? For instance, the
function that takes in two natural numbers a and b and returns a^b is intuitively
mechanically computable but isn’t obviously covered.
The trick here is to consider the input string of 1’s to be an encoding, a repre-
sentation, of multiple inputs. For instance, we could set an exponentiation routine
up so that it inputs a string of x-many 1’s, then performs a prime factorization to
get x = 2^a · 3^b · k for some k ∈ N, and then returns a^b. In this way we can get a
two-input function from a single input string.
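The 2^a · 3^b trick can be sketched directly. This illustration (Python; the function names are mine, not the text’s) packs two numbers into one and unpacks them by counting prime factors:

```python
def encode(a, b):
    # Represent the pair (a, b) as the single number x = 2^a * 3^b.
    return (2 ** a) * (3 ** b)

def decode(x):
    # Recover a and b by counting the factors of 2 and of 3 in x.
    a = b = 0
    while x % 2 == 0:
        x //= 2
        a += 1
    while x % 3 == 0:
        x //= 3
        b += 1
    return a, b
```

An exponentiation routine along the text’s lines would decode its single input x into (a, b) and then return a^b.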
OK then, what about computing with non-numbers? For instance, we may want
to find the shortest path through a graph. In an extension of the prior paragraph,
to compute with a graph we find a way to represent it with a string. Programs
that work with graphs first decode the input string, then compute the answer, and
finish by encoding the answer as a string.
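As an illustration of representing a graph with a string (the particular scheme below is my own choice for the sketch; the text does not fix one), here is a serialization of an edge list and its decoding:

```python
def graph_to_string(edges):
    # Encode an edge list such as [(0, 1), (1, 2)] as the string "0-1;1-2".
    # The scheme is illustrative; any unambiguous encoding would do.
    return ';'.join(f'{u}-{v}' for u, v in edges)

def string_to_graph(s):
    # Decode the string back into the edge list.
    if not s:
        return []
    return [tuple(int(n) for n in pair.split('-')) for pair in s.split(';')]
```

A shortest-path program in this style would first apply string_to_graph to its input, compute on the resulting edge list, and encode its answer back into a string.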
These codings may seem awkward, and they are. (Of course, a programming
language does conversions from decimal to binary and back again that are somewhat
† Some authors don’t require that the first character in the output is under the head. But this way is
neater.
‡ The introduction to this chapter says that we are most interested in effective functions, ϕP,
and that we study machines P with an eye mostly to getting information about what they compute.
I.1 Exercises
Unless the exercise says otherwise, assume that Σ = { B, 1 }. Also assume that any
machine must start with its head under the leftmost input character and arrange for
it to end with the head under the leftmost output character.
1.10 How is a Turing machine like a program? How is it unlike a program? How
is it like the kind of computer we have on our desks? How is it unlike?
1.11 Why does the definition of a Turing machine, Definition 1.4, not include a
definition of the tape?
1.12 Your study partner asks you, “The opening paragraphs talk about the Entschei-
dungsproblem, to mechanically determine whether a mathematical statement is
true or false. I write programs with bits like if (x>3) all the time. What’s the
problem?” Help your friend out.
† The term ‘recursive’ used to be universal but is now old-fashioned.
✓ 1.13 Trace each computation, as in Example 1.5.
(a) The machine Ppred from Example 1.1 when starting on a tape with two 1’s.
(b) The machine Padd from Example 1.2 when the addends are 2 and 2.
(c) Give the two computations as configuration sequences, as in this section.
✓ 1.14 For each of these false statements about Turing machines, briefly explain the
fallacy.
(a) Turing machines are not a complete model of computation because they
can’t do negative numbers.
(b) The problem with Example 1.3 is that the instructions don’t have any extra
states where the machine goes to halt.
(c) For a machine to reach state q₅₀ it must run for at least fifty-one steps.
1.15 We often have some states that are halting states, where we send the machine
solely to make it halt. In this case the others are working states. For instance,
Example 1.1 uses q₃ as a halting state and its working states are q₀, q₁, and q₂.
Name Example 1.2’s halting and working states.
✓ 1.16 Trace the execution of Pinf loop for ten steps, from a blank tape. Show the
sequence of tapes.
1.17 Trace the execution on each input of this Turing machine with alphabet
Σ = { B, 0, 1 } for ten steps, or fewer if it halts.
Section I.2 Church’s Thesis
History Algorithms have always played a central role in mathematics. The simplest
example is a formula such as the one giving the height of a ball dropped from the
Leaning Tower of Pisa, h(t) = −4.9t² + 56. This is a kind of program: get the
height output by squaring the time input, multiplying by −4.9, and adding 56.
In the 1670’s Gottfried Wilhelm von Leibniz, the co-creator
of Calculus, constructed the first machine that could do addition,
subtraction, multiplication, division, and square roots as well. This
led him to speculate on the possibility of a machine that manipulates
not just numbers but symbols and could thereby determine the
truth of scientific statements. To settle any dispute, Leibniz wrote,
scholars could just say, “Calculemus!”† This is a version of the
Entscheidungsproblem.
[Image: Leibniz’s Stepped Reckoner]
The real push to understand computation arose in 1931 from
the Incompleteness theorem of K Gödel. This says that for any
(sufficiently powerful) axiom system there are statements that,
while true in any model of the axioms, are not provable from
those axioms. Gödel gave an algorithm that inputs the axioms and
outputs the statement. This made evident the need to define what
is ‘algorithmic’ or ‘intuitively mechanically computable’ or ‘effective’.
† Latin for “Let us calculate!”
Evidence We cannot prove Church’s Thesis. That is, we cannot give a mathematical
proof. The definition of a Turing machine, or of lambda calculus or other equivalent
schemes, formalizes the notion of ‘effective’ or ‘intuitively mechanically computable’.
When a researcher agrees that it correctly explicates ‘computable on a discrete
and deterministic mechanism’ and consents to work within that formalization,
they are then free to proceed with reasoning mathematically about these systems.
So in a sense, Church’s Thesis comes before the mathematics, or at any rate sits
outside the usual derivation and verification work of mathematics. Turing wrote,
“All arguments which can be given are bound to be, fundamentally, appeals to
intuition, and for this reason rather unsatisfactory mathematically.”
Despite not being the conclusion of a deductive system, Church’s Thesis
is very widely accepted. We will give four points in its favor that persuaded
Gödel, Church, and others at the time, and that still persuade researchers
today — coverage, convergence, consistency, and clarity.
First, coverage: everything that people have thought of as intuitively
computable has proven to be computable by a Turing machine. This
includes not just the number theoretic functions investigated by researchers
in the 1930’s but also everything ever computed by every program written
for every existing computer, because all of them can be compiled to run on
a Turing machine.
[Photograph: Kurt Gödel, 1906–1978]
† After producing his machine model, Turing became a PhD student of Church at Princeton.
‡ Some authors call this the Church-Turing Thesis. Here we figure that because Turing has the machine,
we can give Church sole possession of the thesis.
Despite this weight of evidence, the argument by coverage would collapse if
someone exhibited even one counterexample, one operation that can be done in
finite time on a physically-realizable discrete and deterministic device but that
cannot be done on a Turing machine. So this argument is strong but at least
conceivably not decisive.
The second argument is convergence: in addition to Turing and Church, many
other researchers then and since have proposed models of computation. For
instance, the next section on General Recursive Functions will give us a taste of
another influential model. However, despite this variation, our experience is that
every model yields the same set of computable functions. For instance, Turing
showed that the set of functions computable with his machine model is equal to
the set of functions computable with Church’s λ -calculus.
Now, everyone could be wrong. There could be some systematic error in
thinking around this point. For centuries geometers seemed unable to imagine
the possibility that Euclid’s Parallel Postulate does not hold and perhaps a similar
cultural blindness is happening here. Nonetheless, if a number of very smart
people go off and work independently on a question, and when they come back
you find that while they have taken a wide variety of approaches, they all got the
same answer, then you may well suppose that it is the right answer. At the least,
convergence says that there is something natural and compelling about this set of
functions.
An argument not available to Turing, Church, Gödel, and others in the 1930’s,
since it depends on work done since, is consistency: the details of the definition of
a Turing machine are not essential to what can be computed. For example, we can
show that a one-tape machine can compute all of the functions that can be done
by a machine with two or more tapes. Thus, the fact that Definition 1.4’s machines
have only one tape is not an essential point.
Similarly, machines whose tape is unbounded in only one direction can compute
all the functions computable with a tape unbounded in both directions. And
machines with more than one read/write head compute the same functions as
those with only one. As to symbols, we can compute any intuitively computable
function using just a single symbol beyond the blank that covers all but
finitely many cells of the starting tape, that is, with Σ = { 1, B }. Likewise,
restricting to write-once machines that cannot change marks once they are on the
tape suffices to compute this set of functions. Also, although restricting to machines
having only one state does not suffice, two-state machines are equipowerful with
the machines having unboundedly many states given in Definition 1.4.
There is one more condition that does not change the set of computable
functions, determinism. Recall that the definition of Turing machine given above
does not allow, say, both of the instructions q51Rq6 and q51Lq4 in the same machine,
because they both begin with q51. If we drop this restriction then the class of
Section 2. Church’s Thesis 17
machines that we get are called nondeterministic. We will have much more to say
on this later but the collection of nondeterministic Turing machines computes the
same set of functions as does the collection of deterministic machines.
Thus, for any way in which the Turing machine definition seems to make an
arbitrary choice, making a different choice still yields the same set of computable
functions. This is persuasive in that any proper definition of what is computable
should possess this property; for instance, if two-tape machines computed more
functions than one-tape machines and three-tape machines more than those, then
identifying the set of computable functions with those computable by single-tape
machines would be foolish. But as with the prior argument, while this means that
the class of Turing machine-computable functions is natural and wide-ranging, it
still leaves open a small crack of a possibility that the class does not exhaust the
list of functions that are mechanically computable.
The most persuasive single argument for Church’s Thesis — what caused Gödel
to change his mind and what convinces scholars still today — is clarity: Turing’s
analysis is compelling. Gödel noted this in the quote given above and Church felt
the same way, writing that Turing machines have, “the advantage of making the
identification with effectiveness . . . evident immediately.”
What it does not say Church’s Thesis does not say that in all circumstances
the best way to understand a discrete and deterministic computation is via the
Turing machine model. For example, a numerical analyst studying the in-practice
performance of a floating point algorithm should use a computer model that has
registers. Church’s Thesis says that the calculation could in principle be done by a
Turing machine but for this use registers are more felicitous.†
Church’s Thesis also does not say that Turing machines are all there is to any
computation in the sense that if, say, you are studying an automobile antilock
braking system then the Turing machine model accounts for the logical and
arithmetic computations but not the entire system, with sensor inputs and actuator
outputs. S. Aaronson has made this point: “Suppose I . . . [argued] that . . .
[Church’s] Thesis fails to capture all of computation, because Turing machines
can’t toast bread. . . . No one ever claimed that a Turing machine could handle
every possible interaction with the external world, without first hooking it up to
suitable peripherals. If you want a Turing machine to toast bread, you need to
connect it to a toaster; then the TM can easily handle the toaster’s internal logic.”
In the same vein, we can get physical devices that supply a stream of random
bits. These are not pseudorandom bits that are computed by a method that is
deterministic but which passes statistical tests. Instead, well-established physics
tells us these bits are truly random. The relevance here is that Church’s Thesis only
claims that Turing machines model the discrete and deterministic computations
†
Brain scientists also find Turing machines to be not the most suitable model. Note, though, that saying
that an interrupt-driven brain model is a better fit is not the same as saying that the brain operations
could not, in principle, be done using a Turing machine as the substrate.
that we can do after we are given input bits from such a device.
An empirical question? Church’s Thesis posits that Turing machines can do any
computation that is discrete and deterministic. That raises a big question: even
if we accept Church’s Thesis, can we do more by going beyond discrete and
deterministic? For instance, would analog methods — passing lasers through a gas,
say, or some kind of subatomic magic — allow us to compute things that no Turing
machine can compute? Or are Turing machines the ultimate among physically-possible machines?
Did Turing, on that day, lying on that grassy river bank, intuit everything that
experiments with reality would ever find to be possible?
For a taste of the conversation, we can prove that there is a case where the wave
equation† has initial conditions that are computable (for the initial real numbers x
there is a program that inputs i ∈ N and outputs the i -th decimal place of x ), but
the unique solution is not computable. So does the wave tank modeled by this
equation compute something that Turing machines cannot? Stated for rhetorical
effect: do the planets in their orbits compute a solution to the Three-Body Problem?
In this case we can object that an experimental apparatus is subject to noise
and measurement problems including a finite number of decimal places in the
instruments, etc. But even if careful analysis of the physics of a wave tank leads us
to discount it as reliably computing a function, we can still wonder whether there
are other apparatuses that would.
This big question remains open. As yet no analysis of a wider notion of physically-
possible mechanical computation in the non-discrete case has the support that
Turing’s analysis has garnered in its more narrow domain. In particular, no one
has yet produced a generally accepted example of a non-discrete mechanism that
computes a function that no Turing machine computes.
We will not pursue this any further, instead only observing that the community
of researchers has weighed in by taking Church’s Thesis as the basis for its work.
For us, ‘computation’ will refer to the kind of work that Turing analyzed. That’s
because we want to think about symbol-pushing, not numerical analysis and not
toast.
Using Church’s Thesis Church’s Thesis asserts that each of the models of com-
putation — for instance, Turing machines, the λ-calculus, and the general recursive
functions that we will see in the next section — is maximally capable. Here we
emphasize it because it imbues our results with a larger importance. When, for
instance, we later describe a function that we can prove no Turing machine
computes, then with the thesis in mind we will take the technical statement to
mean that this function cannot be computed by any discrete and deterministic
device.
Another aspect of Church’s Thesis is that because each is maximally
capable, these models, and others that we won’t describe, all compute
†
A partial differential equation that describes the propagation of waves.
the same things. So we can fix one of them as our preferred formalization and get
on with the mathematical analysis. For this, we choose Turing machines.
Finally, we will also leverage Church’s Thesis to make life easier. As the exercises
in the prior section illustrate, while writing a few Turing machines gives some
insight, after a short while you may well find that doing more machines does not
give any more illumination. Worse, focusing too much on Turing machine details
(or on the low-level details of any computing model) can obscure larger points. So
if we can be clear and rigorous without actually having to handle a mass of detail
then we will be delighted.
Church’s Thesis helps with this. Often when we want to show that something
is computable by a Turing machine, we will first argue that it is intuitively
computable and then cite Church’s Thesis to assert that it is therefore Turing
machine computable. With that, our argument can proceed, “Let P be that
machine . . . ” without us ever having exhibited a set of four-tuple instructions. Of
course, there is some danger that we will get ‘intuitively computable’ wrong but
we all have so much more experience with this than people in the 1930’s that the
danger is minimal. The upside is that we can make rapid progress through the
material; we can get things done.
In many cases, to claim that something is intuitively computable we will produce
a program, or sketch a program, doing that thing. For these we like to use a
modern programming language, and our choice is a Scheme, specifically Racket.
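For instance, to argue that the doubling function is effectively computable we can simply exhibit a program. This is an illustration of the pattern only; the function and names here are our own, not from the text.

```racket
;; double is intuitively computable because this program computes it.
;; Church's Thesis then gives a Turing machine that computes it too.
(define (double n)
  (* 2 n))
```

With that, a proof can say “let P be a Turing machine computing double” without ever writing out P’s four-tuple instructions.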
I.2 Exercises
2.2 Why is it Church’s Thesis instead of Church’s Theorem?
✓ 2.3 We’ve said that the thing from our everyday experience that Turing Machines
are most like is programs. What is the difference: (a) between a Turing
Machine and an algorithm? (b) between a Turing Machine and a computer?
(c) between a program and a computer? (d) between a Turing Machine and a
program?
✓ 2.4 Each of these is frequently voiced on the interwebs as a counterargument to
Church’s Thesis. Explain why each is bogus, said by clueless noobs. Plonk!
(a) Turing machines have an infinite tape so it is not a realistic model.
(b) The total size of the universe is finite, so there are in fact only finitely many
configurations possible for any computing device, whereas a Turing machine
can use more than that many configurations, so it is not a realistic model.
✓ 2.5 One of these is a correct statement of Church’s Thesis, and the others are
not. Which one is right? (a) Anything that can be computed by any mechanism
can be computed by a Turing machine. (b) No human computer, or machine
that mimics a human computer, can out-compute a Turing machine. (c) The
set of things that are computable by a discrete and deterministic mechanism
is the same as the set of things that are computable by a Turing machine.
(d) Every product of a person’s mind, or product of a mechanism that mimics the
activity of a person’s mind, can be produced by some Turing machine.
2.6 List two benefits from adopting Church’s Thesis.
✓ 2.7 Refute this objection to Church’s Thesis: “Some computations have unbounded
extent. That is, sometimes we look for our programs to halt but some computations,
such as an operating system, are designed to never halt. The Turing machine is
an inadequate model for these.”
2.8 The computers we use every day are binary. Use Church’s Thesis to argue that
if they were ternary, where instead of bits with two values they used trits with
three, then they would compute exactly the same set of functions.
2.9 Use Church’s thesis to argue that the indicated function exists and is com-
putable.
(a) Suppose that f 0 , f 1 : N → N are computable partial functions. Show that
h : N → N is a computable partial function where h(x) = 1 if x is in the
intersection of the domain of f 0 and the domain of f 1 , and h(x)↑ otherwise.
(b) Do the same as in the prior item, but take the union of the two domains.
(c) Suppose that f : N → N is a computable function that is total. Show that
h : N → N is a computable partial function, where h(x) = 1 if x is in the
range of f and h(x)↑ otherwise.
(d) Suppose f 0 , f 1 : N → N are computable total functions. Show that their
composition h = f 1 ◦ f 0 is a computable function h : N → N.
(e) Suppose f 0 , f 1 : N → N are computable partial functions. Show that their
composition is a computable partial function f 1 ◦ f 0 : N → N.
✓ 2.10 Suppose that f : N → N is a total computable function. Use Church’s Thesis
to argue that this function is computable.
    h(n) = 0 – if n is in the range of f
           ↑ – otherwise
✓ 2.12 If you allow processes to take infinitely many steps then you can have all
kinds of fun. Suppose that you have infinitely many dollars. Feeling flush you go
to a bar. The Devil is there. He proposes an infinite sequence of transactions, in
each of which he will hand you two dollars and take from you one dollar. (The
first will take 1/2 hour, the second 1/4 hour, etc.) You figure you can’t lose. But
he proves to be particular about the order in which you exchange bills. First he
Section
I.3 Recursion
In the 1930’s researchers other than Turing also saw the need to make precise
the notion of mechanical computability. Here we will outline an approach that is
different than Turing’s, both to give a sense of another approach and because we
will find it useful.†
This approach has a classical mathematics flavor. It lists initial functions that
are intuitively mechanically computable, along with intuitively computable ways
to combine existing functions, to make new functions from old. An example is that
one effective initial function is successor S : N → N described by S(x) = x + 1,
and an effective combiner is function composition. Then the composition S ◦ S,
the plus-two operation, is also intuitively mechanically computable.
We now introduce another combiner that is intuitively mechanically computable.
(define (successor x)
(+ x 1))
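The parenthetical remark below refers to a definition of plus that seems to have dropped out of this copy. Following the pattern of the product and power definitions that come next, it was presumably something like this sketch of ours:

```racket
;; plus computes x + y by applying successor y times to x
(define (plus x y)
  (let ((z (- y 1)))
    (if (= y 0)
        x
        (successor (plus x z)))))
```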
(The (let ..) creates the local variable z.) The same is true for product and
power.
(define (product x y)
(let ((z (- y 1)))
(if (= y 0)
0
(plus (product x z) x))))
(define (power x y)
(let ((z (- y 1)))
(if (= y 0)
1
(product (power x z) x))))
‡
A schema is an underlying organizational pattern or structure.
In the terms of Definition 3.2, g(x0) = x0 and h(w, x0, z) = pred(w); the bookkeep-
ing works since the arity of g is one less than the arity of f, and, because h has
dummy arguments, its arity is one more than the arity of f.
The computer code above makes clear that primitive recursion fits into the plan
of specifying combiners that preserve the property of effectiveness: if g and h are
effective then so is f.
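To make the combiner itself concrete, here is a sketch of our own, not from the text, of primitive recursion as a higher-order Racket function. It builds f from g and h so that f(x⃗, 0) = g(x⃗) and f(x⃗, S(z)) = h(f(x⃗, z), x⃗, z).

```racket
;; prim-rec: from g (arity n) and h (arity n+2), produce f (arity n+1) with
;;   f(xs..., 0)   = g(xs...)
;;   f(xs..., z+1) = h(f(xs..., z), xs..., z)
(define (prim-rec g h)
  (define (f . args)
    (let ([xs (drop-right args 1)]
          [y  (last args)])
      (if (= y 0)
          (apply g xs)
          (apply h (cons (apply f (append xs (list (- y 1))))
                         (append xs (list (- y 1))))))))
  f)

;; Example: addition, from g(x) = x and h(w, x, z) = w + 1
(define add (prim-rec (lambda (x) x)
                      (lambda (w x z) (+ w 1))))
;; (add 3 4) evaluates to 7
```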
3.6 Definition The set of primitive recursive functions consists of those that can be
derived from the initial operations of the zero function Z(x⃗) = Z(x0, ..., xn−1) = 0,
the successor function S(x) = x + 1, and the projection† functions Ii(x⃗) = xi, by a
finite number of applications of the combining operations of function composition
and primitive recursion.
Function composition covers not just the simple case of two functions f and g
whose composition is defined by f ◦ g(x⃗) = f(g(x⃗)). It also covers the case
of simultaneous substitution, where from f(x0, ..., xn) and h0(y0,0, ..., y0,m0), . . .,
hn(yn,0, ..., yn,mn), we get f(h0(y0,0, ..., y0,m0), ..., hn(yn,0, ..., yn,mn)), which is a
function with (m0 + 1) + · · · + (mn + 1)-many inputs.
Besides addition and proper subtraction, we commonly use many other primitive
recursive functions such as finding remainders and testing for less-than. See the
exercises for these. The list is so extensive that a person could wonder whether
every mechanically computed function is primitive recursive. The next section
shows that the answer is no, that there are intuitively mechanically computable
functions that are not primitive recursive.
I.3 Exercises
✓ 3.7 What is the difference between primitive recursion and primitive recursive?
3.8 What is the difference between total recursive and primitive recursive?
3.9 In defining 0^0 there is a conflict between the desire to have that every power
of 0 is 0 and the desire to have that every number to the 0 power is 1. What does
the definition of power given above do?
†
There are infinitely many projections, one for each pair of natural numbers n, i. Projection is a
generalization of the identity function, which is why we use the letter I.
✓ 3.10 As the section body describes, recursion doesn’t have to be logically problem-
atic. But some recursions are; consider this one.
    f(n) = 0          – if n = 0
           f(2n − 2)  – otherwise
where rem(a, b) is the remainder when a is divided by b . Note that this fits
m(x) = 7 if x = 1          n(x, y) = 7 if x = 1 and y = 2
       9 if x = 5                     9 if x = 5 and y = 5
       0 otherwise                    0 otherwise
✓ 3.22 We will show that the function rem(a, b) giving the remainder when a is
divided by b is primitive recursive.
(a) Fill in this table.
a 0 1 2 3 4 5 6 7
rem(a, 3)
(b) Observe that rem(a + 1, 3) = rem(a, 3) + 1 for many of the entries. When is
this relationship not true?
(c) Fill in the blanks.
    rem(a, 3) = ____ – if a = 0
                ____ – if a = S(z) and rem(z, 3) + 1 = 3
                ____ – if a = S(z) and rem(z, 3) + 1 ≠ 3
(d) Show that rem(a, 3) is primitive recursive. You can use the prior item, along
with any functions shown to be primitive recursive in the section body,
Exercise 3.20 and Exercise 3.21. (Compared with Definition 3.2, here the two
arguments are switched, which is only a typographic difference.)
(e) Extend the prior item to show that rem(a, b) is primitive recursive.
3.23 The function div : N2 → N gives the integer part of the division of the first
argument by the second. Thus, div(5, 3) = 1 and div(10, 3) = 3.
(a) Fill in this table.
a 0 1 2 3 4 5 6 7 8 9 10
div(a, 3)
(b) Much of the time div(a + 1, 3) = div(a, 3). Under what circumstance does it
not happen?
(c) Show that div(a, 3) is primitive recursive. You can use the prior exercise,
along with any functions shown to be primitive recursive in the section body,
Exercise 3.20 and Exercise 3.21. (Compared with Definition 3.2, here the two
arguments are switched, which is only a difference of appearance.)
(d) Show that div(a, b) is primitive recursive.
3.24 Show that each of these is primitive recursive. You may use any function
shown to be primitive recursive in the section body, in the prior exercise, or in a
prior item.
(a) Bounded sum function: the partial sums of a series whose terms g(i) are
given by a primitive recursive function, Sg(y) = Σ0≤i<y g(i) = g(0) + g(1) +
· · · + g(y − 1) (the sum of zero-many terms is Sg(0) = 0). Contrast this with
the final item of the prior question; here the number of summands is finite
but not fixed.
(b) Bounded product function: the partial products of a series whose terms
g(i) are given by a primitive recursive function, Pg(y) = Π0≤i<y g(i) =
g(0) · g(1) · · · g(y − 1) (the product of zero-many terms is Pg(0) = 1).
(c) Bounded minimization: let m ∈ N and let p(x⃗, i) be a predicate. Then the
minimization operator M(x⃗, m), typically written µ i≤m [p(x⃗, i)], returns the
smallest i ≤ m such that p(x⃗, i) = 0, or else returns m. Hint: Consider the
bounded sum of the bounded products of the predicates.
3.25 Show that each is a primitive recursive function. You can use functions from
this section or functions from the prior exercises.
(a) Bounded universal quantification: suppose that m ∈ N and that p(x⃗, i) is
a predicate. Then U(x⃗, m), typically written ∀i≤m p(x⃗, i), has value 1 if
p(x⃗, 0) = 1, ..., p(x⃗, m) = 1, and value 0 otherwise. (The point of writing
the functional expression U(x⃗, m) is to emphasize the required uniformity.
Stating one formula for the m = 1 case, p(x⃗, 0) · p(x⃗, 1), and another for the
m = 2 case, p(x⃗, 0) · p(x⃗, 1) · p(x⃗, 2), etc., is not the best we can do. We can get
a single derivation, that follows the rules in Definition 3.6, and that works
for all m.)
(b) Bounded existential quantification: let m ∈ N and let p(x⃗, i) be a predi-
cate. Then A(x⃗, m), typically written ∃i≤m p(x⃗, i), has value 1 if p(x⃗, 0) =
0, ..., p(x⃗, m) = 0 is not true, and has value 0 otherwise.
(c) Divides predicate: where x, y ∈ N we have divides(x, y) if there is some k ∈ N
with y = x · k .
(d) Primality predicate: prime(y) if y has no nontrivial divisor.
3.26 The floor function f(x, y) = ⌊x/y⌋ returns the largest natural number less
than or equal to x/y . Show that it is primitive recursive. Hint: you may use
any function defined in the section or stated in a prior exercise but bounded
minimization is the place to start.
3.27 In 1202 Fibonacci asked: A certain man put a pair of rabbits in a place
surrounded on all sides by a wall. How many pairs of rabbits can be produced from
that pair in a year if it is supposed that every month each pair begets a new pair
which from the second month on becomes productive? This leads to a recurrence.
    F(n) = 1                    – if n = 0 or n = 1
           F(n − 1) + F(n − 2)  – otherwise
(a) Compute F (0) through F (10). (Note: this is not now in a form that matches
the primitive recursion schema, although we could rewrite it that way using
Exercise 3.20 and Exercise 3.24.)
(b) Show that F is primitive recursive. You may use the results from earlier,
including Exercise 3.20, 3.21, 3.24, and 3.25.
3.28 Let C(x, y) = 0 + 1 + 2 + · · · + (x + y) + y .
(a) Make a table of the values of C(x, y) for 0 ≤ x ≤ 4 and 0 ≤ y ≤ 4.
(b) Show that C(x, y) is primitive recursive. You can use the functions shown
to be primitive recursive in the section body, along with Exercise 3.20,
Exercise 3.21, Exercise 3.24, and Exercise 3.25.
3.29 Pascal’s Triangle gives the coefficients of the powers of x in the expansion
of (x + 1)n . For example, (x + 1)2 = x 2 + 2x + 1 and row two of the triangle is
⟨1, 2, 1⟩ . This recurrence gives the value at row n, entry m , where m, n ∈ N.
    P(n, m) = 0                              – if m > n
              1                              – if m = 0 or m = n
              P(n − 1, m) + P(n − 1, m − 1)  – otherwise
(a) Compute P(3, 2).
(b) Compute the other entries from row three: P(3, 0), P(3, 1), and P(3, 3).
(c) Compute the entries in row four.
(d) Show that this is primitive recursive. You may use the results from Exer-
cise 3.20 and Exercise 3.24.
✓ 3.30 This is McCarthy’s 91 function.
    M(x) = M(M(x + 11))  – if x ≤ 100
           x − 10        – if x > 100
(a) What is the output for inputs x ∈ { 0, ... 101 }? For larger inputs? (You may
want to write a small script.)
(b) Use the prior item to show that this function is primitive recursive. You may
use the results from Exercise 3.20.
3.31 Show that every primitive recursive function is total.
3.32 Let g, h be natural number functions (that are total). Where f is defined by
primitive recursion from g and h, show that f is well-defined. That is, show that
if two functions both satisfy Definition 3.2 then they are equal, so that on the
same inputs they yield the same outputs.
Section
I.4 General recursion
Every primitive recursive function is intuitively mechanically computable. What
about the converse: is every intuitively mechanically computable function primitive
recursive? In this section we will answer ‘no’.†
Ackermann functions One reason to think that there are functions that are
intuitively mechanically computable but are not primitive recursive is that some
mechanically computable functions are partial, meaning that they do not have an
output for some inputs, but all primitive recursive functions are total.
We could try to patch this, perhaps with: for any f that is intuitively mechanically
computable consider the function fˆ whose output is 0 if f (x) is not defined, and
†
That’s why the diminutive ‘primitive’ is in the name — while the class is interesting and important, it
isn’t big enough to contain every effective function.
x + y = S(S(· · · S(x)))        x · y = x + x + · · · + x        x^y = x · x · · · x
        (y-many S’s)                   (y-many summands)               (y-many factors)
The pattern shows in the ‘otherwise’ lines. Each one satisfies that Hn (x, y) =
Hn−1 (x, Hn (x, y − 1)). Because of this pattern we call each Hn the level n function,
so that addition is the level 1 operation, multiplication is the level 2 operation, and
exponentiation is level 3. These ‘otherwise’ lines step the function up from level to
level. The definition below takes n as a parameter, writing H(n, x, y) in place of
Hn (x, y), to get all the levels into one formula.
    H(n, x, y) = y + 1                        – if n = 0
                 x                            – if n = 1 and y = 0
                 0                            – if n = 2 and y = 0
                 1                            – if n > 2 and y = 0
                 H(n − 1, x, H(n, x, y − 1))  – otherwise
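The definition transcribes directly into Racket; this sketch is ours, not the book’s.

```racket
;; H: hyperoperation; level 1 is addition, level 2 multiplication,
;; level 3 exponentiation
(define (H n x y)
  (cond [(= n 0)               (+ y 1)]
        [(and (= n 1) (= y 0)) x]
        [(and (= n 2) (= y 0)) 0]
        [(and (> n 2) (= y 0)) 1]
        [else (H (- n 1) x (H n x (- y 1)))]))

;; (H 1 2 3) evaluates to 5, (H 2 2 3) to 6, and (H 3 2 3) to 8
```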
4.3 Remark Level 4, the level above exponentiation, is tetration. The first few values
are H4(x, 0) = 1, and H4(x, 1) = H3(x, H4(x, 0)) = x^1 = x, and
H4(x, 2) = H3(x, H4(x, 1)) = x^x, as well as these two.
H4(x, 3) = H3(x, H4(x, 2)) = x^(x^x)        H4(x, 4) = x^(x^(x^x))
The problem is not that the arguments are in a different order; that is cosmetic. The
reason H does not work as h is that the definition of primitive recursive function,
Definition 3.2, requires that h be a function for which we already have a primitive
recursive derivation.
Of course, just because one definition has the wrong form doesn’t mean
that there is no definition with the right form. However, Ackermann†
proved that there isn’t, that H is not primitive recursive. The proof is a
detour for us so it is in an Extra Section but in summary: H grows faster
than any primitive recursive function. That is, for any primitive recursive
function f of three inputs, there is a sufficiently large N ∈ N such that
for all n, x, y ∈ N, if n, x, y > N then H(n, x, y) > f(n, x, y). This proof is
about uniformity. At every level, the function Hn is primitive recursive but
no primitive recursive function encompasses all levels at once — there is no
single, uniform, primitive recursive way to compute them all.
Wilhelm Ackermann, 1896–1962
4.4 Theorem The hyperoperation H is not primitive recursive.
This relates to a point from the discussion of Church’s Thesis. We have
observed that if a function is primitive recursive then it is intuitively mechanically
computable. We have built a pile of natural and interesting functions that are
intuitively mechanically computable, and demonstrated that they are primitive
recursive. So ‘primitive recursive’ may seem to have many of the same characteristics
as ‘Turing machine computable’. The difference is that we now have an intuitively
mechanically computable function that is not primitive recursive. That is, ‘primitive
recursive’ fails the test that in the Church’s Thesis discussion we called coverage.
To cover all mechanically computable functions under a recursive rubric we need
to expand from primitive recursive functions to a larger set.
µ recursion The right direction is hinted at in Exercise 3.24 and Exercise 3.25.
Primitive recursion does bounded operations. We can show that a programming
language having only bounded loops computes all of the primitive recursive
functions; see the Extra section. To include every function that is intuitively
mechanically computable we must add unbounded operations.
4.5 Definition Suppose that g : Nn+1 → N is total, so that for every input n-
tuple there is a defined output number. Then f : Nn → N is defined from g by
minimization or µ-recursion, written f(x⃗) = µy [g(x⃗, y) = 0],† if f(x⃗) is the
least number y such that g(x⃗, y) = 0.
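In Racket, minimization is an unbounded search upward from 0; this is a sketch of ours with made-up names. When no witness exists the search runs forever, which is exactly how partial functions arise.

```racket
;; (mu g) returns the function x... |-> least y with (g x... y) = 0;
;; it diverges when there is no such y
(define ((mu g) . xs)
  (define (search y)
    (if (= 0 (apply g (append xs (list y))))
        y
        (search (add1 y))))
  (search 0))

;; Example: the least y with max(x, y) - y = 0 is x itself, so
;; ((mu (lambda (x y) (- (max x y) y))) 3) evaluates to 3
```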
†
We have seen Ackermann already, as one of the people who stated the Entscheidungsproblem. Functions
having the same recursion as H are Ackermann functions. † Recall that x⃗ abbreviates x0, ..., xn−1.
y 0 1 2 3 4 5 6 7 8 9
p(y) 41 43 47 53 61 71 83 97 113 131
We could think to test this with a program that searches for non-primes by trying
p(0), p(1) . . . Start with a function that computes quadratic polynomials,
p(x⃗, y) = p(x0, x1, x2, y) = x0y² + x1y + x2, and consider a test for the primality
of the output.
    g(x⃗, y) = 1 – if p(x⃗, y) is prime
              0 – otherwise
(define (prime? n)   ; 1 if n is prime, else 0; prime-helper does the trial division
  (if (< n 2) 0 (prime-helper n 2)))
Now, this is g.
(define (g-sub-p x0 x1 x2 y)
(prime? (p x0 x1 x2 y)))
It is called g-sub-p because p is hard-coded into the source. Likewise the search
routine has g-sub-p baked in. That is the point the definition makes with “ f is
defined from д.”
(define (f-sub-g x0 x1 x2)
  (define (f-sub-g-helper y)
    (if (= 0 (g-sub-p x0 x1 x2 y))
        y
        (f-sub-g-helper (add1 y))))
  (f-sub-g-helper 0))
With that, the search function finds that the polynomial above returns some
non-primes.
> (f-sub-g 1 1 41)
40
I.4 Exercises
Some of these have answers that are tedious to compute. You should use a computer,
for instance by writing a script or using Sage.
✓ 4.9 Find the value of H4 (2, 0), H4 (2, 1), H4 (2, 2), H4 (2, 3), and H4 (2, 4).
4.10 Graph H1 (2, y) up to y = 9. Also graph H2 (2, y) and H3 (2, y) over the same
range. Put all three plots on the same axes.
✓ 4.11 How many years is H4 (3, 3) seconds?
4.12 What is the ratio H3 (3, 3)/H2 (2, 2)?
✓ 4.13 Finish the proof of Lemma 4.2 by verifying that H2(x, y) = x · y and
H3(x, y) = x^y.
4.14 This variant of H is often labeled “the” Ackermann function.
    A(k, y) = y + 1                  – if k = 0
              A(k − 1, 1)            – if y = 0 and k > 0
              A(k − 1, A(k, y − 1))  – otherwise
It has different boundary conditions but the same recursion, the same bottom line.
(In general, any function with that recursion is an Ackermann function. More
about this variant is on Extra D.) Compute A(k, y) for 0 ≤ k < 4 and 0 ≤ y < 6.
4.15 Prove that the computation of H(n, x, y) always terminates.
4.16 In defining general recursive functions, Definition 4.8, we get all computable
functions by starting with the primitive recursive functions and adding minimiza-
tion. What if instead of minimization we had added Ackermann’s function; would
we then have all computable functions?
✓ 4.17 Let g(x, y) = x + y and let f(x) = µy [g(x, y) = 100]. For each, find the
value or say that it is not defined. (a) f(0) (b) f(1) (c) f(50) (d) f(100)
(e) f(101). Give an expression for f that does not include µ-recursion.
4.18 Let g(x, y) = ⌈(x + 1)/(y + 1) − 1⌉ and let f(x) = µy [g(x, y) = 0].
(a) Find f (x) for 0 ≤ x < 6.
(b) Give a description of f that does not use µ -recursion.
4.19 (a) Prove that the function remtwo : N → { 0, 1 } giving the remainder on
division by two is primitive recursive.
(b) Use that to prove that this function is µ -recursive: f (n) = 1 if n is even, and
f (n)↑ if n is odd.
✓ 4.20 Consider the Turing machine P = {q0B1q1, q01Rq0, q1BRq2, q11Lq1}. De-
fine g(x, y) = 0 if the machine P, when started on a tape that is blank except for
x-many consecutive 1’s and with the head under the leftmost 1, has halted after
step y. Otherwise, g(x, y) = 1. Find f(x) = µy [g(x, y) = 0] for x < 6.
✓ 4.21 Define g(x, y) by: start P = {q0B1q2, q01Lq1, q1B1q2, q111q2} on a tape
that is blank except for x-many consecutive 1’s and with the head under the
leftmost 1. If P has halted after step y then g(x, y) = 0 and otherwise g(x, y) = 1.
Let f(x) = µy [g(x, y) = 0]. Find f(x) for x < 6. (This machine does the same
task as the one in the prior exercise, but faster.)
4.22 Consider this Turing machine.
Let д(x, y) = 0 if this machine, when started on a tape that is all blank except for
x -many consecutive 1’s and with the head under the leftmost 1, has halted after
y steps. Otherwise, g(x, y) = 1. Let f(x) = µy [g(x, y) = 0]. Find: (a) f(0)
(b) f (1) (c) f (2) (d) f (x).
✓ 4.23 Define
    h(n) = n/2     – if n is even
           3n + 1  – else
and let H(n, k) be the k-fold composition of h with itself, so H(n, 1) = h(n),
H(n, 2) = h ◦ h(n), H(n, 3) = h ◦ h ◦ h(n), etc. (We can take H(n, 0) = 0,
although its value isn’t interesting.) Let C(n) = µk [H(n, k) = 1].
(a) Compute H (4, 1), H (4, 2), and H (4, 3).
(b) Find C(4), if it is defined.
(c) Find C(5), if it is defined.
(d) Find C(11), if it is defined.
Extra
I.A Turing machine simulator
Writing code to simulate a Turing Machine is a reasonable programming project.
Here we exhibit an implementation. It has three design goals. The main one is
to track closely the description of the action of a Turing machine in section 1.
Secondary goals are to output a picture of the configuration after each step, and to
be easy to understand for a reader new to Racket.
We earlier saw this Turing machine that computes the predecessor function.
Thus the simulator for any particular Turing machine is really the pair consisting
of the code shown below along with this machine’s file description.
The data structure for a Turing machine is the simplest one, a list of instructions. For the instructions, the program converts each of the above six lines into a list with four members: a number, two characters, and a number. Thus, a Turing machine is stored as a list of lists. The above machine is this (the line break is there only to make it fit in the margins).
'((0 #\B #\L 1) (0 #\1 #\R 0) (1 #\B #\L 2) (1 #\1 #\B 1)
(2 #\B #\R 3) (2 #\1 #\L 2))
Next we define a configuration.
;; A configuration is a list of four things:
;; the current state, as a natural number
;; the symbol being read, a character
;; the contents of the tape to the left of the head, as a list of characters
;; the contents of the tape to the right of the head, as a list of characters
(define (make-config state char left-tape-list right-tape-list)
(list state char left-tape-list right-tape-list))
(The Racket function findf searches through tm for a member on which delta-test
returns a value of true.)
Turing machines work discretely, step by step. If there is no relevant instruction then the machine halts; otherwise it moves one cell left, moves one cell right, or writes one character.
;; step Do one step; from a config and the tm, yield the next config
(define (step config tm)
  (let* ([current-state (get-current-state config)]
         [left-tape-list (get-left-tape-list config)]
         [current-symbol (get-current-symbol config)]
         [right-tape-list (get-right-tape-list config)]
         [action-next-state (delta tm current-state current-symbol)]
         [action (first action-next-state)]
         [next-state (second action-next-state)])
    (cond
      [(char=? LEFT action) (move-left config next-state)]
      [(char=? RIGHT action) (move-right config next-state)]
      [else (make-config next-state
                         action ;; not L or R so it is in tape alphabet
                         left-tape-list
                         right-tape-list)])))
Because moving left and right are more complicated, they are in separate routines.
;; tape-right-char Return the element nearest the head on the right side
(define (tape-right-char right-tape-list)
(if (empty? right-tape-list)
BLANK
(car right-tape-list)))
;; tape-right-pop Return the right tape list without char nearest the head
(define (tape-right-pop right-tape-list)
(if (empty? right-tape-list)
'()
(cdr right-tape-list)))
;; tape-left-pop Return the left tape list without char nearest the head
(define (tape-left-pop left-tape-list)
(reverse (tape-right-pop (reverse left-tape-list))))
The execute routine calls the following one to give a simple picture of the
machine, showing the state number and the tape contents, with the current symbol
displayed between asterisks.
;; configuration->string Return a string showing the tape
(define (configuration->string config)
  (let* ([state-number (get-current-state config)]
         [state-string (string-append "q" (number->string state-number))]
         [left-tape (list->string (get-left-tape-list config))]
         [current (string #\* (get-current-symbol config) #\*)] ;; wrap *'s
         [right-tape (list->string (get-right-tape-list config))])
    (string-append state-string ": " left-tape current right-tape)))
Besides the prior routine, the implementation has other code to do dull things
such as reading the lines from the file and converting them to instruction lists.
(define (current-state-string->number s)
(if (eq? #\( (string-ref s 0)) ;; allow instr to start with (
(string->number (substring s 1))
(string->number s)))
(define (current-symbol-string->char s)
(string-ref s 0))
(define (action-symbol-string->char s)
(string-ref s 0))
(define (next-state-string->number s)
(if (eq? #\) (string-ref s (- (string-length s) 1))) ;; ends with )?
(string->number (substring s 0 (- (string-length s) 1)))
(string->number s)))
(define (string->instruction s)
(let* ([instruction (string-split (string-trim s))]
[current-state (current-state-string->number (first instruction))]
[current-symbol (current-symbol-string->char (second instruction))]
[action (action-symbol-string->char (third instruction))]
[next-state (next-state-string->number (fourth instruction))])
(list current-state
current-symbol
action
next-state)))
And, there is a bit more code for getting the file name from the command line, etc.,
that does not bear at all on simulating a Turing machine so we will leave it aside.
Below is a run of the simulator, with its command line invocation. It takes input from the file pred.tm shown earlier. When the machine starts the tape holds 111: the current symbol is 1 and the tape to the right holds 11 (the tape to the left is empty).
$ ./turing-machine.rkt -f machines/pred.tm -c "1" -r "11"
step 0: q0: *1*11
step 1: q0: 1*1*1
step 2: q0: 11*1*
step 3: q0: 111*B*
step 4: q1: 11*1*B
step 5: q1: 11*B*B
step 6: q2: 1*1*BB
step 7: q2: *1*1BB
step 8: q2: *B*11BB
step 9: q3: B*1*1BB
step 10: HALT
The output is crude but good enough for small scale experiments.
I.A Exercises
A.1 Run the simulator on the predecessor machine Ppred starting with five 1’s.
Also run it on an empty tape.
A.2 Run the simulator on Example 1.2’s Padd to add 1 + 2. Also simulate 0 + 2
and 0 + 0.
A.3 Write a Turing machine to perform the operation of adding 3, so that given
as input a tape containing only a string of n consecutive 1’s, it returns a tape with
a string of n + 3 consecutive 1’s. Follow our convention that when the program
starts and ends the head is under the first 1. Run it on the simulator, with an
input of 4 consecutive 1’s, and also with an empty tape.
A.4 Write a machine to decide if the input contains the substring 010. Fix
Σ = { 0, 1, B }. The machine starts with the tape blank except for a contiguous
string of 0’s and 1’s, and with the head under the first non-blank symbol. When
it finishes, the tape will have either just a 1 if the input contained the desired
substring, or otherwise just a 0. We will do this in stages, building what amounts to a few subroutines.
(a) Write instructions, starting in state q 10 , so that if initially the machine’s head
is under the first of a sequence of non-blank entries then at the end the head
will be to the right of the final such entry.
(b) Write a sequence of instructions, starting in state q 20 , so that if initially the
head is just to the right of a sequence of non-blank entries, then at the end
all entries are blank.
(c) Write the full machine, including linking in the prior items.
A.5 Modify the simulator so that it can run for a limited number of steps.
(a) Modify it so that, given k ∈ N, if the Turing machine hasn’t halted after k
steps then the simulator stops.
(b) Do the same, but replace k with a function (k n) where n is the input
number (assume the machine’s input is a string of 1’s).
Extra
I.B Hardware
In a computer, these would stand for different voltage levels. For both tables, inputs
are on the left while outputs are on the right.
    not P            P and Q, P or Q
    P  ¬P            P  Q  P ∧ Q  P ∨ Q
    0   1            0  0    0      0
    1   0            0  1    0      1
                     1  0    0      1
                     1  1    1      1
Thus, where ‘7 is odd’ is P and ‘8 is prime’ is Q, we get the value of the conjunction ‘7 is odd and 8 is prime’ from the third line of the right-hand table: 0. The value of the disjunction ‘7 is odd or 8 is prime’ is 1.
Truth tables help us work out the behavior of complex propositional logic state-
ments, by building them up from their components. This shows the input/output
behavior of the statement (P ∨ Q) ∧ ¬(P ∨ (R ∧ Q)).
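That table can also be generated mechanically. Here is a sketch in Python (an aside; the text’s own code is in Scheme) that tabulates the statement, using 0 and 1 for false and true:

```python
from itertools import product

def statement(p, q, r):
    # (P or Q) and not (P or (R and Q))
    return int((p or q) and not (p or (r and q)))

# One row per assignment of 0/1 to P, Q, R, in the usual order.
for p, q, r in product([0, 1], repeat=3):
    print(p, q, r, statement(p, q, r))
```

Only the assignment P = 0, Q = 1, R = 0 makes the statement true, which a reader can confirm against the table.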
There are operators other than ‘not’, ‘and’, and ‘or’, but an advantage of this set of three is that with them we can reverse the activity of the prior paragraph: we can go from a table to a statement with that table. That is, we can go from a specified input-output behavior to a statement with that behavior.
Below are two examples. To make a statement with the behavior shown in the
table on the left, focus on the row with output 1. It is the row where P is false and
Q is false. Therefore the statement ¬P ∧ ¬Q makes this row take on value 1 and
every other row take on value 0.
    P  Q             P  Q  R
    0  0  1          0  0  0  0
    0  1  0          0  0  1  1
    1  0  0          0  1  0  1
    1  1  0          0  1  1  0
                     1  0  0  1
                     1  0  1  0
                     1  1  0  0
                     1  1  1  0
Next consider the table on the right and again focus on the rows with 1’s. Target
the second row with ¬P ∧ ¬Q ∧ R . For the third row use ¬P ∧ Q ∧ ¬R and for the
fifth row use P ∧ ¬Q ∧ ¬R . To finish, put these parts together with ∨’s to get the
overall statement.
Thus, we can produce statements with any desired behavior. Statements of this
form, clauses connected by ∨’s, where inside each clause the statement is built
from ∧’s, are in disjunctive normal form. (Also commonly used is conjunctive
normal form, where statements consist of clauses connected by ∧’s and each clause
uses only ∨’s as binary connectives.)
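The construction just described is mechanical enough to automate. This Python sketch (an aside; the function and variable names are ours) builds a disjunctive normal form expression from a table given as a list of output bits:

```python
from itertools import product

def dnf(names, outputs):
    # One conjunctive clause per row whose output is 1.  Rows are
    # listed in the usual order 00..0, 00..1, ..., 11..1.
    clauses = []
    for row, out in zip(product([0, 1], repeat=len(names)), outputs):
        if out == 1:
            lits = [n if v else "~" + n for n, v in zip(names, row)]
            clauses.append("(" + " & ".join(lits) + ")")
    return " | ".join(clauses)

# The three-variable table from the text: rows two, three, and five
# have output 1.
print(dnf(["P", "Q", "R"], [0, 1, 1, 0, 1, 0, 0, 0]))
```

The output reproduces the statement assembled above: the three clauses for the second, third, and fifth rows, joined by ∨’s (written here as `|`, with `~` for ¬ and `&` for ∧).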
Now we translate those statements into physical devices. We can use electronic devices, called gates, that perform logical operations on signals (for this discussion we will take a signal to be the presence of 5 volts). The observation that you can use this form of a propositional logic statement to systematically design logic circuits was made by C Shannon in his master’s thesis. On the left below is the schematic symbol for an and gate with two input wires and one output wire, whose behavior is that a signal only appears on the output if there is a signal on both inputs. Symbolized in the middle is an or gate, where there is signal out if either input has a signal. On the right is a not gate.

Claude Shannon 1916–2001
One can wonder how these gates are constructed internally, and in particular can wonder how a not gate is possible;
isn’t having voltage out when there is no voltage in creating energy out of nothing?
The answer is that the descriptions above abstract out that issue. Here is the
internal construction of a kind of not gate.
(Circuit diagram: a 5 volt battery, a resistor, the output Vout, and a transistor with terminals G, D, and S driven by the input Vin.)
On the right is a battery, which we shall see provides the extra voltage. On the top
left, shown as a wiggle, is a resistor. When current is flowing around the circuit,
this resistor regulates the power output from the battery.
On the bottom left, shown with the circle, is a transistor. This is a semiconductor,
with the property that if there is enough voltage between G and S then this
component allows current from the battery to flow through the D-S line. (Because
it is sometimes open and sometimes closed it is depicted as a switch, although
internally it has no moving parts.) This transistor is manufactured such that an
input voltage Vin of 5 volts will trigger this event.
To verify that this circuit inverts the signal, assume first that Vin = 0. Then there is a gap between D and S so no current flows. With no current the resistor
provides no voltage drop. Consequently the output voltage Vout across the gap is
all of the voltage supplied by the battery, 5 volts. So Vin = 0 results in Vout = 5.
Conversely, now assume that Vin = 5. Then the gap disappears, the current
flows between D and S, the resistor drops the voltage, and the output is Vout = 0.
Thus, for this device the voltage out Vout is the opposite of the voltage in Vin .
And, when Vin = 0 the output voltage of 5 doesn’t come from nowhere; it is from
the battery.
I.B Exercises
B.1 Make a truth table for each of these propositions. (a) (P ∧Q)∧R (b) P ∧(Q∧R)
(c) P ∧ (Q ∨ R) (d) (P ∧ Q) ∨ (P ∧ R)
B.2 Make a truth table for these. (a) ¬(P ∨ Q) (b) ¬P ∧ ¬Q (c) ¬(P ∧ Q)
(d) ¬P ∨ ¬Q
B.3 (a) Make a three-input table for the behavior: the output is 1 if a majority of
the inputs are 1’s. (b) Draw the circuit.
B.4 For the table below, construct a disjunctive normal form propositional logic
statement and use that to make a circuit.
P Q
0 0 0
0 1 1
1 0 1
1 1 0
B.5 For the tables below, construct a disjunctive normal form propositional
logic statement and use that to make a circuit. (a) the table on the left,
(b) the one on the right.
P Q R P Q R
0 0 0 0 0 0 0 1
0 0 1 1 0 0 1 0
0 1 0 1 0 1 0 1
0 1 1 0 0 1 1 0
1 0 0 0 1 0 0 0
1 0 1 1 1 0 1 0
1 1 0 0 1 1 0 1
1 1 1 0 1 1 1 1
B.6 One propositional logic operator that was not covered in the description is Exclusive Or, XOR. It is defined by: P XOR Q is T if P ≠ Q, and is F otherwise. Make a truth table, from it construct a disjunctive normal form propositional logic statement, and use that to make a circuit.
B.7 Construct a circuit with the behavior specified in the tables below: (a) the
table on the left, (b) the one on the right.
P Q R P Q R
0 0 0 1 0 0 0 1
0 0 1 0 0 0 1 0
0 1 0 0 0 1 0 1
0 1 1 0 0 1 1 0
1 0 0 0 1 0 0 0
1 0 1 1 1 0 1 1
1 1 0 0 1 1 0 1
1 1 1 0 1 1 1 0
B.8 The most natural way to add two binary numbers works like the grade school
addition algorithm. Start at the right with the one’s column. Add those two and
possibly carry a 1 to the next column. Then add down the next column, including
any carry. Repeat this from right to left.
(a) Use this method to add the two binary numbers 1011 and 1101.
(b) Make a truth table giving the desired behavior in adding the numbers in a
column. It must have three inputs because of the possibility of a carry. It
must also have two output columns, one for the total and one for the carry.
(c) Draw the circuits.
Extra
I.C Game of Life
John von Neumann was one of the twentieth century’s most influ-
ential mathematicians. One of the many things he studied was the
problem of humans on Mars. He thought that to colonize Mars we
should first send robots. Mars is red because it is full of iron oxide,
rust. Robots could mine that rust, break it into iron and oxygen,
and release the oxygen into the atmosphere. With all the iron, the
robots could make more robots. So von Neumann was thinking
about making machines that could self-reproduce. (We will study
more about self-reproduction later.)
His thinking, along with a suggestion from S Ulam, who was studying crystal growth, led him to use a cell-based approach. So von Neumann laid out some computational devices in a grid of interconnections, making a cellular automaton.

John von Neumann 1903–1957
Interest in cellular automata greatly increased with the appearance of the Game of Life, by J Conway. It was featured in Martin Gardner’s October 1970 column in Scientific American. The rules of the game are simple enough that a person could
immediately take out a pencil and start experimenting. Lots of people did. When
personal computers came out, Life became one of the earliest computer crazes,
since it is easy for a beginner to program.
To start, draw a two-dimensional grid of square cells, like graph paper or a chess board. The game proceeds in stages, or generations. At each generation each cell is either alive or dead. Each cell has eight neighbors, the ones that are horizontally, vertically, or diagonally adjacent. The state of a cell in the next generation is determined by: (i) a live cell with two or three live neighbors will again be live at the next generation but any other live cell dies, (ii) a dead cell with exactly three live neighbors becomes alive at the next generation but other dead cells stay dead. (The backstory goes that live cells will die if they are either isolated or overcrowded, while if the environment is just right then life can spread.) To begin, we seed the board with some initial pattern.

John Conway 1937–2020
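The two rules translate directly into code. Below is a sketch in Python (an aside; the names are ours) that computes one generation, representing the live cells as a set of (row, column) pairs so that the board is effectively unbounded:

```python
def neighbors(cell):
    # The eight horizontally, vertically, or diagonally adjacent cells.
    r, c = cell
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)}

def next_generation(live):
    # Rule (i): a live cell with two or three live neighbors survives.
    # Rule (ii): a dead cell with exactly three live neighbors is born.
    candidates = live | {n for cell in live for n in neighbors(cell)}
    return {cell for cell in candidates
            if (cell in live and len(neighbors(cell) & live) in (2, 3))
            or (cell not in live and len(neighbors(cell) & live) == 3)}

blinker = {(0, 0), (0, 1), (0, 2)}  # a horizontal blinker
print(next_generation(blinker))     # the vertical phase
```

Only cells that are live or adjacent to a live cell can be alive next generation, so examining that candidate set suffices.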
As Gardner noted, the rules of the game balance tedious simplicity against
impenetrable complexity.
The basic idea is to start with a simple configuration of counters (organisms), one to a
cell, then observe how it changes as you apply Conway’s “genetic laws” for births, deaths,
and survivals. Conway chose his rules carefully, after a long period of experimentation,
to meet three desiderata:
1. There should be no initial pattern for which there is a simple proof that the
population can grow without limit.
2. There should be initial patterns that apparently do grow without limit.
3. There should be simple initial patterns that grow and change for a considerable
period of time before coming to end in three possible ways: fading away completely
Generation 0 Generation 1
The pictures show the part of the game board containing the cells that are alive.
Two generations suffice to show everything that happens, which isn’t much.
Some other patterns don’t die, but don’t do much of anything, either. This is a
block. It is stable from generation to generation.
Generation 0 Generation 1
Because it doesn’t change, Conway calls this a “still life.” Another still life is the
beehive.
Generation 0 Generation 1
But many patterns are not still. This three-cell pattern, the blinker, does a
simple oscillation.
Other patterns move. This is a glider, the most famous pattern in Life.
It moves one cell vertically and one horizontally every four generations, crawling
across the screen.
When Conway came up with the Life rules, he was not sure whether there is a
pattern where the total number of live cells keeps on growing. Bill Gosper showed
that there is, by building the glider gun which produces a new glider every thirty
generations.
The glider is an example of a spaceship, a pattern that reappears, displaced, after a number of generations. Here is another, the medium weight spaceship.
Another important pattern is the eater, which eats gliders and other spaceships.
I.C Exercises
C.4 A methuselah is a small pattern that stabilizes only after a long time. This
pattern is a rabbit. How long does it take to stabilize?
C.5 How many 3 × 3 blocks are there? 4 × 4? Write a program that inputs a
dimension n and returns the number of n × n blocks.
C.6 How many of the 3 × 3 patterns will result in any cells on the board that
survive into the next generation? That survive ten generations?
C.7 Write code that takes in a number of rows n , a number of columns m and a
number of generations i , and returns how many of the n × m patterns will result
in any surviving cells after i generations.
Extra
I.D Ackermann’s function is not primitive recursive
We have seen that the hyperoperation, whose definition is repeated below, is the natural generalization of successor, addition, multiplication, etc.
    H(n, x, y) = y + 1                        – if n = 0
                 x                            – if n = 1 and y = 0
                 0                            – if n = 2 and y = 0
                 1                            – if n > 2 and y = 0
                 H(n − 1, x, H(n, x, y − 1))  – otherwise
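As an aside, a direct transcription into Python (rather than the text’s Scheme) underlines that H is intuitively mechanically computable, though it grows so quickly that only tiny arguments are feasible:

```python
def H(n, x, y):
    # The hyperoperation, straight from the case definition.
    if n == 0:
        return y + 1
    if y == 0:
        return x if n == 1 else (0 if n == 2 else 1)
    return H(n - 1, x, H(n, x, y - 1))

# n = 1, 2, 3 recover addition, multiplication, and exponentiation.
print(H(1, 3, 4), H(2, 3, 4), H(3, 3, 4))  # 7 12 81
```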
We have quoted a result that H, while intuitively mechanically computable,
is not primitive recursive. The details of the proof are awkward. For technical
convenience we will instead show that a closely related function, which is also
intuitively mechanically computable, is not primitive recursive.
    A(k, y) = y + 1                  – if k = 0
              A(k − 1, 1)            – if y = 0 and k > 0
              A(k − 1, A(k, y − 1))  – otherwise

Rózsa Péter 1905–1977

Any function based on the recursion in the bottom line is called an Ackermann function.† We will prove that A is not primitive recursive.
Since the new function has only two variables we can show a table.
            y = 0   y = 1   y = 2   y = 3   y = 4   y = 5
    k = 0     1       2       3       4       5       6    ...
    k = 1     2       3       4       5       6       7    ...
    k = 2     3       5       7       9       11      13   ...
    k = 3     5       13      29      61      125     253  ...
    k = 4     13      65533   ...
The next two entries give a sense of the growth rate of this function.
    A(4, 2) = 2^65536 − 3        A(4, 3) = 2^(2^65536) − 3
Those are big numbers.
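As a quick aside, the small entries of the table can be recomputed with a Python transcription of the definition. Memoization keeps the small cases fast, though entries from A(4, 1) on are out of reach this way.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(k, y):
    # Peter's two-variable Ackermann function.
    if k == 0:
        return y + 1
    if y == 0:
        return A(k - 1, 1)
    return A(k - 1, A(k, y - 1))

print([A(2, y) for y in range(6)])  # [3, 5, 7, 9, 11, 13]
print(A(3, 3), A(4, 0))             # 61 13
```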
D.1 Lemma (a) A(k, y) > y
(b) A(k, y + 1) > A(k, y), and in general if ŷ > y then A(k, ŷ) > A(k, y)
(c) A(k + 1, y) ≥ A(k, y + 1)
(d) A(k, y) > k
(e) A(k + 1, y) > A(k, y) and in general if k̂ > k then A(k̂, y) > A(k, y)
(f) A(k + 2, y) > A(k, 2y)
Proof We will verify the first item here and leave the others as exercises. They all proceed the same way, with an induction inside of an induction.
The first item is this statement.

    ∀k ∀y A(k, y) > y    (∗)

We will prove it by induction on k. The k = 0 base step is A(0, y) = y + 1 > y. For the inductive step, assume that statement (∗) holds for k = 0, . . . , k = n and consider the k = n + 1 case. We must verify this statement.

    ∀y A(n + 1, y) > y    (∗∗)
† There are many different Ackermann functions in the literature. A common one is the function of one variable A(k, k).
then f is level k + 3.
I.D Exercises
D.7 If expressed in base 10, how many digits are in A(4, 2) = 265536 − 3?
Extra
I.E LOOP programs
Very important: in loop x ... end, changes in the contents of register x while
the inside code is run do not alter the number of times that the machine steps
through that loop. Thus, when this loop ends the value in r0 will be twice what it
was at the start.
loop r0
r0 = r0 + 1
end
That is, if we load x into r0 and y into r1, and run the above routine, then the
output x −. y will be in r0.
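The point that the iteration count is fixed on entry can be checked with a little Python model of the doubling routine (an aside; the register dictionary and the function name are ours):

```python
def run_doubling_loop(x):
    regs = {"r0": x}
    reps = regs["r0"]      # the loop count is read once, on entry
    for _ in range(reps):
        regs["r0"] += 1    # changing r0 inside does not change reps
    return regs["r0"]

print(run_doubling_loop(5))  # 10
```

If instead the loop re-read r0 each time through, incrementing r0 in the body would make it run forever; fixing the count on entry is what guarantees that every LOOP program halts.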
To show that for each primitive recursive function there is a LOOP program,
we can show how to compute each initial function, and how to do the combining
operations of function composition and primitive recursion.
The zero function Z(x) = 0 is computed by the LOOP program whose single line is r0 = 0. The successor function S(x) = x + 1 is computed by the one-line r0 = r0 + 1. Projection Ii(x0, ... xi, ... xn−1) = xi is computed by r0 = ri.
The composition of two functions is easy. Suppose that g(x0, ... xn) and f(y0, ... ym) are computed by LOOP programs Pg and Pf, and that g is an m-output function so that the composition f ◦ g is defined. Then concatenating, so that the
and produces f(h0(y0,0, ... y0,m0), ... hn(yn,0, ... yn,mn)). The issue is that were we to load the sequence of inputs y0,0, . . . into r0, . . . and start computing h0 then, for one thing, there is a danger that it could overwrite the inputs for h1. So we must do some machine language-like register manipulations to shuttle data in and out as needed.
Specifically, let Pf, Ph0, . . . , Phn compute the functions. Each uses a limited number of registers so there is an index j large enough that no program uses register j. By definition, the LOOP program P to compute the composition will be given the sequence of inputs starting in register 0. The first step is to copy these inputs to start in register j. Next, zero out the registers below register j, copy h0’s arguments down to begin at r0 and run Ph0. When it finishes, copy its output above the final register holding the inputs (that is, to the register numbered (m0 + 1) + · · · + (mn + 1)). Repeat for the rest of the hi’s. Finish by copying the outputs down to the initial registers, zeroing out the remaining registers, and running Pf.
The other combiner operation is primitive recursion.
    f(x0, ... xk−1, y) = g(x0, ... xk−1)                          – if y = 0
                         h(f(x0, ... xk−1, z), x0, ... xk−1, z)   – if y = S(z)
Suppose that we have LOOP programs Pg and Ph. The register swapping needed is similar to what happens for composition so we won’t discuss it. The program Pf starts by running Pg. Then it sets a fresh register to 0; call that register t. Now it enters a loop based on the register y (that is, successive times through the loop count down as y, y − 1, etc.). The body of the loop computes f(x0, ... xk−1, t + 1) = h(f(x0, ... xk−1, t), x0, ... xk−1, t) by running Ph and then it increments t.
Thus if a function is primitive recursive then it is computed by a LOOP program.
The converse holds also, but proving it is beyond our scope.
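The loop-based computation of f just described can be modeled in Python (an aside; the combinator name is ours, and the functions g and h stand in for the subprograms Pg and Ph):

```python
def primitive_recursion(g, h):
    # Build f from g and h: f(xs, 0) = g(xs) and
    # f(xs, y + 1) = h(f(xs, y), xs, y), computed bottom-up the way
    # the LOOP program does, with t counting up from 0.
    def f(*args):
        *xs, y = args
        acc = g(*xs)          # the run of Pg
        for t in range(y):    # the loop on register y
            acc = h(acc, *xs, t)
        return acc
    return f

# Example: addition arises from the primitive recursion for plus,
# where g is the identity and h increments the accumulated value.
plus = primitive_recursion(lambda x: x, lambda prev, x, t: prev + 1)
print(plus(3, 4))  # 7
```

Swapping in h(prev, x, t) = prev + x gives multiplication, mirroring how primitive recursion builds the arithmetic operations in the text.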
We have an interpreter for the LOOP language with two interesting aspects.
The first is that we change the syntax, replacing the C-looking syntax above with a
LISP-ish one. For instance, we swap the syntax on the left for that on the right.
The advantage of this switch is that the parentheses automatically match the
beginning of a loop with the matching end and so the interpreter that we write
will not need a stack to keep track of loop nesting.
This interpreter has registers r0, r1, . . . , that hold natural numbers. We keep
track of them in a list of pairs.
;; A register is a pair (name:symbol contents:natural number)
(define REGLIST '())
(define (show-regs) ; debugging
(write REGLIST) (newline))
(define (clear-regs!)
(set! REGLIST '()))
The last LOOP operation is loop itself. Such an instruction can have inside it
the body of an entire LOOP program.
(define (intr-loop pars)
  (letrec ((reps (get-reg-value (car pars)))
           (body (cdr pars))
           (iter (lambda (rep)
                   (cond
                     ((equal? rep 0) '())
                     (else (intr-body body)
                           (iter (- rep 1)))))))
    (iter reps)))
Finally, there is the code to interpret a program, including initializing the registers so we can view the input-output behavior as computing a function.
;; The data is a list of the values to put in registers r0 r1 r2 ..
;; Value of a program is the value remaining in r0 at end.
(define (interpret progr data)
(init-regs data)
(intr-body progr)
(get-reg-value (make-reg-name 0)))
;; init-regs Initialize the registers r0, r1, r2, .. to the values in data
(define (init-regs data)
(define (init-regs-helper i data)
(if (null? data)
'()
(begin
(set-reg-value! (make-reg-name i) (car data))
(init-regs-helper (+ i 1) (cdr data)))))
(clear-regs!)
(set-reg-value! (make-reg-name 0) 0)
(init-regs-helper 0 data))
As given, this prints only the value of r0, which is all we shall need here.
Here is a sample usage. The LOOP program, in LISP syntax, is pe.
#;1> (load "loop.scm")
#;2> (define pe '((incr r0) (incr r0) (loop r0 (incr r0))))
#;3> (interpret pe '(5))
14
With an initial value of 5, after being incremented twice then r0 will have a value
of 7. So the loop runs seven times, each time incrementing r0, resulting in an
output value of 14.
We can now make an interpreter for the C-like syntax shown earlier. We first do
some bookkeeping such as splitting the program into lines and dropping comments.
Then we convert the instructions as a purely string operation. Thus r0 = 0 becomes
(zero r0). Similarly, r0 = r0 + 1 becomes (incr r0) and r0 = r1 becomes
(copy r0 r1). Finally, loop r0 becomes (loop r0 (note the missing closing
paren), and end becomes ).
Here is the second interesting thing about the interpreter. Now that the C-like
syntax has been converted to a string in LISP-like syntax, we just need to interpret
the string as a list. We write it to a file and then load that file. That is, unlike in
many programming languages, in Scheme we can create code on the fly.
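Many languages with an eval facility can do something similar. As a quick aside, here is the idea in Python (the mechanics differ from the Scheme approach of writing to a file and loading it):

```python
# Build source text at run time, then execute it.
source = "\n".join([
    "def double(n):",
    "    return 2 * n",
])
namespace = {}
exec(source, namespace)         # turn the string into live code
print(namespace["double"](21))  # 42
```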
Here is an example of running the interpreter. The program in C-like syntax is
this.
r1 = r1 + 1
r1 = r1 + 1
loop r1
r0 = r0 + 1
end
I.E Exercises
E.2 Write a LOOP program that triples its input.
E.3 Write a LOOP program that adds two inputs.
E.4 Modify the interpreter to allow statements like r0 = r0 + 2.
E.5 Modify the interpreter to allow statements like r0 = 1.
E.6 Modify the definition of interpret so that it takes one more argument, a
natural number m , and returns the contents of the first m registers.
Chapter
II Background
The first chapter began by saying that we are more interested in the things
that can be computed than in the details of how they are computed. In particular,
we want to understand the set of functions that are effective, that are intuitively
mechanically computable, which we formally defined as computable by a Turing
machine. The major result of this chapter and the single most important result in
the book is that there are functions that are uncomputable — there is no Turing
machine to compute them. There are jobs that no machine can do.
Section
II.1 Infinity
We will show that there are more functions than Turing machines, and that
therefore there are some functions with no associated machine.
Cardinality The set of functions and the set of Turing machines are both
infinite. We will begin with two paradoxes that dramatize the challenge
to our intuition posed by comparing the sizes of infinite sets. We will then
produce the mathematics to resolve these puzzles and apply it to the sets
of functions and Turing machines.
The first is Galileo’s Paradox. It compares the size of the set of perfect squares with the size of the set of natural numbers. The first is a proper subset of the second. However, the figure below shows that the two sets can be made to correspond, to match element-to-element, so in this sense there are exactly as many squares as there are natural numbers.

Galileo Galilei 1564–1642
1.1 Animation: Correspondence n ↔ n² between the natural numbers and the squares.
The second paradox of infinity is Aristotle’s Paradox. On the left below are two
circles, one with a smaller radius. If we roll them through one revolution then the
trail left by the smaller one is shorter. However, if we put the smaller inside the
larger and roll them, as in a train wheel, then they leave equal-length trails.
Image: This is the Hubble Deep Field image. It came from pointing the Hubble telescope to the darkest
part of the sky, the very background, and soaking up light for eleven days. It covers an area of the sky
about the same width as that of a dime viewed seventy-five feet away. Every speck is a galaxy. There are thousands of them — there is a lot in the background. Robert Williams and the Hubble Deep Field
Team (STScI) and NASA. (Also see the Deep Field movie.)
As with Galileo’s Paradox, the puzzle is that we might think of the set of points on the circumference of the larger circle as being a bigger set. But the right idea is that the two sets have the same number of elements in that they correspond — point-for-point, the circumference of the smaller matches the circumference of the larger.
The animations below illustrate matching the points in two ways. The first
shows them as nested circles, with points on the inside matching points on the
outside. The second straightens that out so that the circumferences make segments
and then for every point on the top there is a matching point on the bottom.
1.5 Lemma For any function with a finite domain, the number of elements in that
domain is greater than or equal to the number of elements in the range. If such a
function is one-to-one then its domain has the same number of elements as its
range. If it is not one-to-one then its domain has more elements than its range.
Consequently, two finite sets have the same number of elements if and only if they
correspond, that is, if and only if there is a function from one to the other that is
a correspondence.
1.6 Lemma The relation between two sets S0 and S1 of ‘there is a correspondence f : S0 → S1’ is an equivalence relation.
Proof Reflexivity is clear since a set corresponds to itself via the identity function. For symmetry assume that there is a correspondence f : S0 → S1 and recall that its inverse f⁻¹ : S1 → S0 exists and is a correspondence in the other direction. For transitivity assume that there are correspondences f : S0 → S1 and g : S1 → S2 and recall also that the composition g ◦ f : S0 → S2 is a correspondence.
We now give that relation a name. This definition extends Lemma 1.5’s
observation about same-sized sets from the finite to the infinite.
1.7 Definition Two sets have the same cardinality or are equinumerous, written |S0| = |S1|, if there is a correspondence between them.
1.8 Example Stated in terms of the definition, Galileo’s Paradox is that the set of perfect squares S = { n² | n ∈ N } has the same cardinality as N because the function f : N → S given by f(n) = n² is a correspondence. It is one-to-one because if f(x0) = f(x1) then x0² = x1² and thus, since these are natural numbers, x0 = x1. It is onto because any element of the codomain y ∈ S is the square of some n from the domain N by the definition of S, and so y = f(n).
1.9 Example Aristotle’s Paradox is that for r0, r1 ∈ R+, the interval [0 .. 2πr0) has the same cardinality as the interval [0 .. 2πr1). The map g(x) = x · (2πr1/2πr0) is a correspondence; verification is Exercise 1.42.
1.10 Example The set of natural numbers greater than zero, N⁺ = { 1, 2, ... }, has the
same cardinality as N. A correspondence is f : N → N⁺ given by n ↦ n + 1.
Comparing the sizes of sets, even infinite sets, in this way was
proposed by G Cantor in the 1870’s. As the paradoxes above dramatize,
Definition 1.7 introduces a deep idea. We should convince ourselves that
it captures what we mean by sets having the ‘same number’ of elements.
One supporting argument is that it is the natural generalization of the
finite case, Lemma 1.5. A second is Lemma 1.6, that it partitions sets into
classes so that inside of a class all sets have the same cardinality. That
is, it gives the ‘equi’ in equinumerous. The most important supporting
argument is that, as with Turing’s definition of his machine, Cantor’s
Georg Cantor definition is persuasive in itself. Gödel noted this, writing “Whatever
1845–1918 ‘number’ as applied to infinite sets may mean, we certainly want it to
have the property that the number of objects belonging to some class
does not change if, leaving the objects the same, one changes in any way . . . e.g.,
their colors or their distribution in space . . . From this, however, it follows at once
that two sets will have the same [cardinality] if their elements can be brought into
one-to-one correspondence, which is Cantor’s definition.”
64 Chapter II. Background
f(n) =  n      – if n < 2
        n + 1  – if n ∈ { 2, 3 }
        n + 2  – if n ≥ 4

n     0  1  2  3  4  5  6  ...
f(n)  0  1  3  4  6  7  8  ...
This function is clearly one-to-one and onto. It is also computable; we could write
a program whose input/output behavior is f .
1.15 Example The set of prime numbers P is countable. There is a function p : N → P
where p(n) is the n -th prime, so that p(0) = 2, p(1) = 3, etc. We won’t produce a
formula for this function but obviously we can write a program whose input/output
behavior is p , so it is a correspondence that is effective.
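The example says that we can write such a program. Here is a minimal Python sketch (the function names are ours, and trial division is chosen for brevity, not speed); it only suggests why p is effective.

```python
def is_prime(m):
    """Trial division; enough for a demonstration."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def p(n):
    """Return the n-th prime, counting from p(0) = 2."""
    count, m = -1, 1
    while count < n:
        m += 1
        if is_prime(m):
            count += 1
    return m
```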
1.16 Example Fix the set of symbols Σ = { a, ... , z }. Consider the set of strings made
of those symbols, such as az, xyz, and abba. The set of all such strings, denoted
Σ∗ , is countable. This table illustrates the correspondence that we get by taking
the strings in ascending order of length, alphabetically within each length.

n ∈ N    0  1  2  ...  26  27  28  ...
σ ∈ Σ∗   ε  a  b  ...  z   aa  ab  ...

(The first entry is the empty string, ε = ‘ ’.) This correspondence is also effective.
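To suggest why this correspondence is effective, here is a hedged Python sketch (the function name is ours; the text fixes ascending length, while the alphabetical order within a length is our assumption).

```python
from string import ascii_lowercase

def string_with_number(n):
    """Return the n-th string over {a, ..., z} in ascending order of
    length, alphabetically within each length; string 0 is empty."""
    length = 0
    while True:
        block = 26 ** length          # number of strings of this length
        if n < block:
            s = ''
            for _ in range(length):   # read n as a base-26 numeral
                s = ascii_lowercase[n % 26] + s
                n //= 26
            return s
        n -= block
        length += 1
```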
1.17 Example The set of integers Z = { ... , −2, −1, 0, 1, 2, ... } is countable. The
natural correspondence, alternating between positive and negative numbers, is
also effective.

n ∈ N     0  1   2   3   4   5   6  ...
f(n) ∈ Z  0  +1  −1  +2  −2  +3  −3  ...
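The alternating correspondence has a simple closed form; a Python sketch matching the table (the formula is ours, though it is the obvious one):

```python
def f(n):
    """Correspondence N -> Z: 0, +1, -1, +2, -2, +3, -3, ..."""
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)
```

Odd inputs go to the positives and even inputs to the negatives (and zero).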
We have not given any non-computable functions because a goal of this chapter
is to show that such functions exist, and we are not there yet.
We close this section by circling back to the paradoxes of infinity that we began
with. In the prior example, the naive expectation is that the positives and the
negatives combined make Z twice as big as N. But this is the point of Galileo’s
Paradox; the right way to measure how many elements a set has is not through
superset and subset, the right way is through cardinality.
Finally, we will mention one more paradox, due to Zeno (circa 450 BC). He
imagines a tortoise challenging swift Achilles to a race, asking only for a head start.
Achilles laughs but the tortoise says that by the time Achilles reaches the spot x 0 of
the head start, the tortoise will have moved on to x 1 . On reaching x 1 , Achilles finds
that the tortoise has moved ahead to x 2 . At any x i , Achilles will always be behind
and so, the tortoise reasons, Achilles can never win. The heart of this argument is
that while the distances x i+1 − x i shrink toward zero, there is always further to go
because of the open-endedness at the left of the interval (0 .. ∞).
1.18 Figure: Zeno of Elea shows Youths the Doors to Truth and False, by covering half
the distance to the door, and then half of that, etc. (By either B Carducci (1560–
1608) or P Tibaldi (1527–1596).)
II.1 Exercises
✓ 1.19 Verify Example 1.13, that the function g : N → { 3k | k ∈ N } given by n ↦ 3n
is both one-to-one and onto.
1.20 A friend tells you, “The perfect squares and the perfect cubes have the same
number of elements because these sets are both one-to-one and onto.” Straighten
them out.
1.21 Let f , g : Z → Z be f(x) = 2x and g(x) = 2x − 1. Give a proof or a
counterexample for each. (a) Is f one-to-one? Is it onto? (b) Is g one-to-one?
Onto? (c) Are f and g inverse to each other?
✓ 1.22 Decide if each function is one-to-one, onto, both, or neither. You cannot
just answer ‘yes’ or ‘no’, you must justify the answer. (a) f : N → N given
by f (n) = n + 1 (b) f : Z → Z given by f (n) = n + 1 (c) f : N → N given
by f (n) = 2n (d) f : Z → Z given by f (n) = 2n (e) f : Z → N given by
f (n) = |n| .
1.23 Decide if each is a correspondence (you must also verify): (a) f : Q → Q
given by f (n) = n + 3 (b) f : Z → Q given by f (n) = n + 3 (c) f : Q → N
given by f (a/b) = |a · b| . Hint: this is a trick question.
1.24 Decide if each set is finite or infinite and justify your answer. (a) { 1, 2, 3 }
(b) { 0, 1, 4, 9, 16, ... } (c) the set of prime numbers (d) the set of real roots of
x 5 − 5x 4 + 3x 2 + 7
1.25 Show that each pair of sets has the same cardinality by producing a one-to-one
and onto function from one to the other. You must verify that the function is a
correspondence. (a) { 0, 1, 2 }, { 3, 4, 5 } (b) Z, { i³ | i ∈ Z }
✓ 1.26 Show that each pair of sets has the same cardinality by producing a correspon-
dence (you must verify that the function is a correspondence): (a) { 0, 1, 3, 7 } and
{ π , π + 1, π + 2, π + 3 } (b) the even natural numbers and the perfect squares
(c) the real intervals (1 .. 4) and (−1 .. 1).
✓ 1.27 Verify that the function f (x) = 1/x is a correspondence between the subsets
(0 .. 1) and (1 .. ∞) of R.
1.28 Give a formula for a correspondence between the sets { 1, 2, 3, 4, ... } and
{ 7, 10, 13, 16 ... }.
✓ 1.29 Consider the set of characters C = { 0, 1, ... 9 } and the set of integers
A = { 48, 49, ... 57 }.
(a) Produce a correspondence f : C → A.
(b) Verify that the inverse f −1 : A → C is also a correspondence.
✓ 1.30 Show that each pair of sets have the same cardinality. You must give a
suitable function and also verify that it is one-to-one and onto.
(a) N and the set of even numbers
(b) N and the odd numbers
(c) the even numbers and the odd numbers
✓ 1.31 Although sometimes there is a correspondence that is natural, correspon-
dences need not be unique. Produce the natural correspondence from (0 .. 1) to
(0 .. 2), and then produce a different one, and then another different one.
1.32 Example 1.8 gives one correspondence between the natural numbers and the
perfect squares. Give another.
1.33 Fix c ∈ R such that c > 1. Show that f : R → (0 .. ∞) given by x ↦ cˣ is a
correspondence.
1.34 Show that the set of powers of two { 2ᵏ | k ∈ N } and the set of powers of
1.35 For each give functions from N to itself. You must justify your claims. (a) Give
two examples of functions that are one-to-one but not onto. (b) Give two examples
of functions that are onto but not one-to-one. (c) Give two that are neither.
(d) Give two that are both.
1.36 Show that the intervals (3 .. 5) and (−1 .. 10) of real numbers have the same
cardinality by producing a correspondence. Then produce a second one.
1.37 Show that the sets have the same cardinality. (a) { 4k | k ∈ N }, { 5k | k ∈ N }
(b) { 0, 1, ... 99 }, { m ∈ N | m² < 10 000 } (c) { 0, 1, 3, 6, 10, 15, ... }, N
✓ 1.38 Produce a correspondence between each pair of open intervals of reals.
(a) (0 .. 1), (0 .. 2)
(b) (0 .. 1), (a .. b) for real numbers a < b
(c) (0 .. ∞), (a .. ∞) for the real number a
(d) This shows a correspondence x ↦ f(x) between a finite interval of reals
and an infinite one, f : (0 .. 1) → (0 .. ∞).
(Figure: the graph of f , with the point P and the line y = 1 marked.)
The point P is at (−1, 1). Give a formula for f .
✓ 1.39 Not every set involving irrational numbers is uncountable. The set
S = { 2^(1/n) | n ∈ N and n ≥ 2 } contains only irrational numbers. Show that it
is countable.
1.40 Let B be the set of characters from which bit strings are made, B = { 0, 1 }.
(a) Let S be the set of finite bit strings whose initial bit is 1. Show that S is
countable.
(b) Let B∗ be the set of finite bit strings, without the restriction on the initial bit.
Show that it also is countable. Hint: use the prior item.
1.41 Use the arctangent function to prove that the sets (0 .. 1) and R have the
same cardinality.
1.42 Example 1.9 restates Aristotle’s Paradox as: the intervals I 0 = [0 .. 2πr 0 ) and
I 1 = [0 .. 2πr 1 ) have the same cardinality, for r 0 , r 1 ∈ R+ .
(a) Verify it by checking that д : I 0 → I 1 given by д(x) = x · (r 1 /r 0 ) is a corre-
spondence.
(b) Show that where a < b , the cardinality of [0 .. 1) equals that of [a .. b).
(c) Generalize by showing that where a < b and c < d , the real intervals [a .. b)
and [c .. d) have the same cardinality.
1.43 Suppose that D ⊆ R. A function f : D → R is strictly increasing if x < x̂
implies that f (x) < f (x̂) for all x, x̂ ∈ D . Prove that any strictly increasing
function is one-to-one; it is therefore a correspondence between D and its range.
(The same applies if the function is strictly decreasing.) Does this hold for
D ⊆ N?
✓ 1.44 A paradoxical aspect of both Aristotle's and Galileo's examples is that they
gainsay Euclid's "the whole is greater than the part," because they name sets
where the set is equinumerous with a proper subset. Here, show that each pair
of a set and a proper subset has the same cardinality. (a) N, { 2n | n ∈ N }
(b) N, { n ∈ N | n > 4 }
1.45 Example 1.14 illustrates that we can take away a finite number of elements
from the set N without changing the cardinality. Prove this: show that if S is a
finite subset of N then N − S is countable.
1.46 (a) Let D = { 0, 1, 2, 3 } and C = { Spades, Hearts, Clubs, Diamonds }, and
let f : D → C be given by f (0) = Spades, f (1) = Hearts, f (2) = Clubs,
f (3) = Diamonds. Find the inverse function f −1 : C → D and verify that it
is a correspondence.
(b) Let f : D → C be a correspondence. Show that the inverse function exists.
That is, show that associating each y ∈ C with the x ∈ D such that f (x) = y
gives a well-defined function f −1 : C → D .
(c) Show that the inverse of a correspondence is also a correspondence, that is,
that the function defined in the prior item is a correspondence.
1.47 Prove that a set S is infinite if and only if it has the same cardinality as a
proper subset of itself.
1.48 Prove Lemma 1.5 by proving each.
(a) For any function with a finite domain, the number of elements in that domain
is greater than or equal to the number of elements in the range. Hint: use
induction on the number of elements in the domain.
(b) If such a function is one-to-one then its domain has the same number of
elements as its range. Hint: again use induction on the size of the domain.
(c) If it is not one-to-one then its domain has more elements than its range.
(d) Two finite sets have the same number of elements if and only if there is a
correspondence from one to the other.
Section II.2 Cantor's correspondence
Countability is a property of sets so we naturally ask how it interacts with set
operations. Here we are interested in the cross product operation — after all,
Turing machines are sets of four-tuples.
2.1 Example The set S = { 0, 1 } × N consists of ordered pairs ⟨i, j⟩ where i ∈ { 0, 1 }
and j ∈ N. The diagram below shows two columns, each of which looks like
the natural numbers in that it is discrete and unbounded in one direction. So
informally, S is twice the natural numbers. As in Galileo's Paradox this might lead
to a mistaken guess that it has more members than N. But S is countable.
To count it, the mistake to avoid is to go vertically up a column, which will
never get to the other column. Instead, alternate between the columns.
⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩
⟨0, 2⟩  ⟨1, 2⟩
⟨0, 1⟩  ⟨1, 1⟩
⟨0, 0⟩  ⟨1, 0⟩
n∈N 0 1 2 3 4 5 ...
⟨i, j⟩ ∈ { 0, 1 } × N ⟨0, 0⟩ ⟨1, 0⟩ ⟨0, 1⟩ ⟨1, 1⟩ ⟨0, 2⟩ ⟨1, 2⟩ ...
The map from the top row to the bottom row is a pairing function because it
outputs pairs. Its inverse, from bottom to top, is an unpairing function. This
method extends to counting three copies { 0, 1, 2 } × N, four copies, etc.
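The two-column enumeration can be sketched in Python (the names are ours): the map to pairs outputs pairs, in the book's terminology a pairing function, and its inverse undoes it.

```python
def pair2(n):
    """The n-th element of {0,1} x N, alternating between the columns."""
    return (n % 2, n // 2)

def unpair2(i, j):
    """Inverse: the position of <i, j> in the enumeration."""
    return 2 * j + i
```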
2.3 Lemma The cross product of two finite sets is finite, and therefore countable.
The cross product of a finite set and a countably infinite set, or of a countably
infinite set and a finite set, is countably infinite.
2.4 Example The natural next set has infinitely many copies: N × N.
⋮        ⋮        ⋮        ⋮
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···
Counting up the first column or out the first row won’t work; here also we need
to alternate. So instead do a breadth-first traversal: start in the lower left with
⟨0, 0⟩ , then take pairs that are one away, ⟨1, 0⟩ and ⟨0, 1⟩ , then those that are
two away, ⟨2, 0⟩ , ⟨1, 1⟩ and ⟨0, 2⟩ etc.
⋮       ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩  ···
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ···
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ···
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ···
Number 0 1 2 3 4 5 6 ...
Pair ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
That this procedure gives a correspondence is perfectly evident. But the
formula for going from the bottom line to the top is amusing so we will develop
it. Animation 2.5 numbers the diagonals.
⋮       ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ···
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ···
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ···
Diagonal 0 1 2 3
Consider for example the pair ⟨1, 2⟩ . It is on diagonal number 3 and, just as
3 = 1 + 2, in general the diagonal number of a pair is the sum of its entries.
Diagonal 0 has one entry, diagonal 1 has two entries, and diagonal 2 has three
entries, so before diagonal 3 come six pairs. Thus, on diagonal 3 the initial
pair ⟨0, 3⟩ gets enumerated as number 6. With that, the pair ⟨1, 2⟩ is number 7.
So to find the number corresponding to ⟨x, y⟩ , note first that it lies on diagonal
d = x + y . The number of entries prior to diagonal d is 1 + 2 + · · · + d . This
is an arithmetic series with total d(d + 1)/2. Thus on diagonal d the first pair,
⟨0, x + y⟩ , has number (x + y)(x + y + 1)/2. The next pair on that diagonal,
⟨1, x + y − 1⟩ , gets the number 1 + [(x + y)(x + y + 1)/2], etc.
2.6 Definition Cantor's correspondence cantor : N² → N, or unpairing function,
or diagonal enumeration,† is cantor(x, y) = x + [(x + y)(x + y + 1)/2]. Its inverse
is the pairing function, pair : N → N².
2.7 Example Two early examples are cantor(1, 2) = 7 and cantor(2, 0) = 5. A later
one is cantor(0, 36) = 666.
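A Python sketch of Definition 2.6 and its inverse; the integer-square-root formula for the inverse is ours, not developed in the text, but it just recovers the diagonal number d first.

```python
from math import isqrt

def cantor(x, y):
    """Cantor's correspondence N^2 -> N of Definition 2.6."""
    return x + (x + y) * (x + y + 1) // 2

def pair(n):
    """Inverse: recover <x, y> by first finding the diagonal d."""
    d = (isqrt(8 * n + 1) - 1) // 2   # largest d with d(d+1)/2 <= n
    x = n - d * (d + 1) // 2
    return (x, d - x)
```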
† Some authors use diamond brackets, writing ⟨x, y⟩ where we write cantor(x, y).
2.9 Corollary The cross product of finitely many countable sets is countable.
Proof Suppose that S₀, ... Sₙ₋₁ are countable and that each function fᵢ : N → Sᵢ
is a correspondence. By the prior result, the function cantorₙ₋₁ : N → Nⁿ is a
correspondence. Write cantorₙ₋₁(k) = ⟨k₀, k₁, ... kₙ₋₁⟩. Then the composition
k ↦ ⟨f₀(k₀), f₁(k₁), ... fₙ₋₁(kₙ₋₁)⟩ from N to S₀ × · · · × Sₙ₋₁ is a correspondence,
and so S₀ × S₁ × · · · × Sₙ₋₁ is countable.
2.10 Example The set of rational numbers Q is countable. We know how to alternate
between positives and negatives so we will be done showing this if we count
the nonnegative rationals, f : N → Q+ ∪ { 0 }. A nonnegative rational number
is a numerator-denominator pair ⟨n, d⟩ ∈ N × N+ , except for the complication
that pairs collapse, meaning for instance that when the numerator is 4 and the
denominator is 2 then we get the same rational as when n = 2 and d = 1.
We will count with a program instead of a formula. Given an input i , the
program finds f(i) by using prior values, f(0), f(1), . . . f(i − 1). It loops,
using the pairing function cantor⁻¹ to generate pairs: cantor⁻¹(0), cantor⁻¹(1),
cantor⁻¹(2), . . . For each generated pair ⟨a, b⟩ , if the second entry is 0 or if the
rational number a/b is in the list of prior values then the program rejects the pair,
going on to try the next one. The first pair that it does not reject is f (i).
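The counting program of this example can be sketched in Python. All the names are ours; Fraction handles the collapsing of pairs such as 4/2 and 2/1, and the list _prior is the memoization cache.

```python
from fractions import Fraction
from math import isqrt

def unpair(n):
    """The pair generator cantor^{-1}: 0 -> <0,0>, 1 -> <0,1>, ..."""
    d = (isqrt(8 * n + 1) - 1) // 2
    x = n - d * (d + 1) // 2
    return (x, d - x)

_prior = []   # cache of f(0), f(1), ... computed so far

def f(i):
    """The i-th nonnegative rational, rejecting collapsed pairs."""
    k = 0
    while len(_prior) <= i:
        a, b = unpair(k)
        k += 1
        if b == 0:
            continue                  # zero denominator: reject
        q = Fraction(a, b)
        if q not in _prior:           # collapsed pair: reject duplicates
            _prior.append(q)
    return _prior[i]
```

For instance the enumeration begins 0, 1, 1/2, 2, ... since the pairs ⟨0, 0⟩, ⟨1, 0⟩, and ⟨0, 2⟩ are rejected.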
The technique of that example is memoization or caching and it is widely used.
For example, when you visit a web site your browser saves any image to your disk.
If you visit the site again then your browser checks if the image has changed. If
not then it will use the prior copy, reducing download time.
The next result establishes that we can use memoization in general.
2.11 Lemma A set S is countable if and only if either S is empty or there is an onto
map f : N → S .
Proof Assume first that S is countable. If it is empty then we are done. If it is
finite but nonempty, S = { s₀, ... sₙ₋₁ }, then this f : N → S map is onto.

f(i) =  sᵢ  – if i < n
        s₀  – otherwise
Very important: Lemma 2.3 and Lemma 2.8 on the cross product of countable
sets are effectivizable. That is, if sets correspond to N via some effective numbering
then their cross product corresponds to N via an effective numbering. We finish
this section by applying that to Turing machines — we will give a way to effectively
number the Turing machines.
Turing machines are sets of instructions. Each instruction is a four-tuple,
a member of Q × Σ × (Σ ∪ { L, R }) × Q , where Q is the set of states and Σ is
the tape alphabet. So by the above numbering results, we can number the
instructions: there is an instruction whose number is 0, one with number 1, etc.
This is effective, meaning that there is a program that takes in a natural number
and outputs the corresponding instruction, as well as a program that takes in an
instruction and outputs the corresponding number (see Exercise 2.24).
With that, we can effectively number the Turing machines. One way is: starting
with a Turing machine P, use the prior paragraph to convert each of its instructions
to a number, giving a set { i₀, i₁, ... iₙ }, and then output the number associated
with that machine as e = g(P) = 2^i₀ + 2^i₁ + · · · + 2^iₙ.
The association in the other direction is much the same. Given a natural
number e , represent it in binary, e = 2^j₀ + · · · + 2^jₖ, form the set of instructions
corresponding to the numbers j₀, . . . jₖ, and that is the output Turing machine.
(Except that we must check that this set is deterministic, that no two of the
instructions begin with the same state-symbol pair, which we can do effectively,
and if it is not deterministic then let the output be the empty machine P = { }.)
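The two directions can be sketched in Python (the names are ours), identifying a machine with the set of its instructions' numbers via the binary coding above:

```python
def machine_number(instr_numbers):
    """e = 2^{i_0} + 2^{i_1} + ... for the set of instruction numbers."""
    return sum(2 ** i for i in set(instr_numbers))

def instr_numbers_of(e):
    """Inverse: read off the positions of the 1 bits of e in binary."""
    return {j for j in range(e.bit_length()) if (e >> j) & 1}
```

A machine whose instructions have numbers 0, 1, and 3 gets the index 2⁰ + 2¹ + 2³ = 11.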
The exact numbering that we use doesn't matter much as long as it has
certain properties, the ones in the following definition. For the rest of the book we
will just fix a numbering and cite its properties rather than mess with its details.
2.13 Definition A numbering is a function that assigns to each Turing machine
a natural number. For any Turing machine, the corresponding number is its
index number, or Gödel number, or description number. For the machine with
index e ∈ N we write Pe . For the function computed by Pe we write ϕ e .
A numbering is acceptable if it is effective: (1) there is a program that takes
as input the set of instructions and gives as output the associated number, (2) the
set of numbers for which there is an associated machine is computable, and
(3) there is an effective inverse that takes as input a natural number and gives as
output the associated machine.
Think of the machine’s index as its name. We will refer to it frequently, for
instance by saying “the e -th Turing machine.”
The takeaway point is that because the numbering is acceptable, the index is
source-equivalent — we can go effectively from the index to the machine source,
the set of four-tuple instructions, or from the source to the index.
2.14 Remark Here is an alternative scheme that is simple and is useful for thinking
about numbering, but that we won’t make precise. On a computer, the text of a
program is saved as a bit string, which we can interpret as a binary number, e .
In the other direction, given a binary e on the disk, we can disassemble it into
assembly language source code. So there is an association between binary
numbers and source code.
2.15 Lemma (Padding lemma) Every computable function has infinitely many in-
dices: if f is computable then there are infinitely many distinct ei ∈ N with
f = ϕ e0 = ϕ e1 = · · · . We can effectively produce a list of such indices.
Proof Let f = ϕₑ. Let qⱼ be the highest-numbered state in the set Pₑ. For each
k ∈ N⁺ consider the Turing machine obtained from Pₑ by adding the instruction
q_{j+k}BBq_{j+k}. This gives an effective sequence of Turing machines Pₑ₁, Pₑ₂, . . .
with distinct indices, all having the same behavior, ϕₑₖ = ϕₑ = f.
2.16 Remark Stated in terms of everyday programming, we can get infinitely many
different source codes that have the same compiled behavior. One way is to start
with one source code and add to the bottom a comment line containing the
number k .
Now that we have counted the Turing machines we are close to this book’s most
important result. The next section shows that there are so many natural number
functions that they cannot be counted, they cannot be put in correspondence
with N. This will prove that there are functions not computed by any Turing
machine.
II.2 Exercises
✓ 2.17 Extend the table of Example 2.1 through n = 12. Where f (n) = ⟨x, y⟩ , give
formulas for x and y .
✓ 2.18 For each pair ⟨a, b⟩ find the pair before it and the pair after it in Cantor’s
correspondence. That is, where cantor(a, b) = n , find the pair associated
with n + 1 and the pair with n − 1. (a) ⟨50, 50⟩ (b) ⟨100, 4⟩ (c) ⟨4, 100⟩
(d) ⟨0, 200⟩ (e) ⟨200, 0⟩
✓ 2.19 Corollary 2.12 says that the union of two countable sets is countable.
(a) For each of the two sets T = { 2k | k ∈ N } and F = { 5m | m ∈ N } produce
a correspondence fT : N → T and f F : N → F . Give a table listing the
values of fT (0), . . . fT (9) and give another table listing f F (0), . . . f F (9).
(b) Give a table listing the first ten values for a correspondence f : N → T ∪ F .
2.20 Give an enumeration of N × { 0, 1 }. Find the pair matching 0, 10, 100, and
101. Find the number corresponding to ⟨2, 1⟩ , ⟨20, 1⟩ , and ⟨200, 1⟩ .
✓ 2.21 Example 2.1 says that the method for two columns extends to three. Give
an enumeration of { 0, 1, 2 } × N. That is, where g(n) = ⟨x, y⟩ give a formula for
x and y . Find the pair corresponding to 0, 10, 100, and 1 000. Find the number
corresponding to ⟨1, 2⟩ , ⟨1, 20⟩ , and ⟨1, 200⟩ .
2.22 Give an enumeration f of { 0, 1, 2, 3 } × N. That is, where f (n) = ⟨x, y⟩ ,
give a formula for x and y . Also give an enumeration f of { 0, 1, 2, ... k } × N.
Section II.3 Diagonalization
Cantor’s definition of cardinality led us to produce correspondences. But it
can also happen that no correspondence exists. We now introduce a powerful
technique to show that. It is central to the entire Theory of Computation.
Diagonalization There is a set so large that it is not countable, that is, a set for
which no correspondence exists with N or any subset of it. It is the set of reals, R.
3.1 Theorem There is no onto map f : N → R. Hence, the set of reals is not
countable.
This result is important but so is the technique of proof that we will use. We
will pause to develop the intuition behind it. The table below illustrates a function
f : N → R, listing some inputs and outputs, with the outputs aligned on the
decimal point.
n   Decimal expansion of f(n)
0     42.3127704 ...
1      2.0100000 ...
2      1.4141592 ...
3    −20.9195919 ...
4      0.1010010 ...
5     −0.6255418 ...
⋮      ⋮
We will show that this function is not onto. We will do this by producing a number
z ∈ R that does not equal any of the outputs, any of the f (n)’s.
Ignore what is to the left of the decimal point. To its right go down the
diagonal, taking the digits 3, 1, 4, 5, 0, 1 . . . Construct the desired z by making
its first decimal place something other than 3, making its second decimal place
something other than 1, etc. Specifically: if the diagonal digit is a 1 then z gets
a 2 in that decimal place and otherwise z gets a 1 there. Thus, in this example
z = 0.121112 ...
By this construction, z differs from the number in the first row, z ≠ f(0),
because they differ in the first decimal place. Similarly, z ≠ f(1) because they
differ in the second place. In this way z does not equal any of the f(n). Thus f is
not onto. This technique is diagonalization.
(In this argument we have skirted a technicality, that some real numbers have
two different decimal representations. For instance, 1.000 ... = 0.999 ... because
the two differ by less than 0.1, less than 0.01, etc. This is a potential snag because
it means that even though we have constructed a representation that is different
than all the representations on the list, it still might not be that the number is
different than all the numbers on the list. However, dual representation only
happens for decimals when one of the representations ends in 0’s while the other
ends in 9’s. That’s why we build z using 1’s and 2’s.)
Proof We will show that no map f : N → R is onto.
Denote the i-th decimal digit of f(n) as f(n)[i] (if f(n) is a number with two
decimal representations then use the one ending in 0's). Let g be the map on the
decimal digits { 0, ... , 9 } given by: g(j) = 2 if j is 1, and g(j) = 1 otherwise.
Now let z be the real number that has 0 to the left of its decimal point, and
whose i-th decimal digit is g(f(i)[i]). Then for all i , z ≠ f(i) because z[i] ≠ f(i)[i].
So f is not onto.
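The construction is mechanical enough to run. A Python sketch (the names are ours) applied to the six expansions from the table, each represented by its string of digits to the right of the decimal point:

```python
def g(d):
    """The digit-changing map from the proof."""
    return '2' if d == '1' else '1'

def diagonal(rows):
    """Build one decimal place of z for each listed number."""
    return '0.' + ''.join(g(rows[i][i]) for i in range(len(rows)))

# Digits right of the decimal point, from the table's six numbers.
rows = ['3127704', '0100000', '4141592',
        '9195919', '1010010', '6255418']
```

Here diagonal(rows) gives the z = 0.121112 ... of the discussion, and by construction its i-th decimal place differs from the i-th place of the i-th number.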
There are too many real numbers for any map from the natural numbers to cover
them all. The best we can do is something like this, which is one-to-one but not
onto.
(Figure: a map pairing each natural number with a real, leaving some reals
unmatched.)
3.3 Definition The set S has cardinality less than or equal to that of the set T ,
denoted |S | ≤ |T | , if there is a one-to-one function from S to T .
3.4 Example There is a one-to-one function from N to R, namely the inclusion map
that sends n ∈ N to itself, n ∈ R. So | N | ≤ | R | . (By Theorem 3.1 above the
cardinality is actually strictly less.)
3.5 Remark We cannot emphasize too strongly that the work in this chapter,
including the prior example, is startling and profound. Some infinite sets have
more elements than others. And, in particular, the reals have more elements than
the naturals. As dramatized by Galileo's Paradox, this is not just that the naturals
are a subset of the reals. Instead it means that the set of naturals cannot be made
to correspond with the set of reals. This is like the children’s game Musical Chairs.
We have countably many chairs P 0 , P 1 , ..., chairs indexed by the natural numbers,
but there are so many children, so many real numbers, that some child is left
without a chair.
The wording of that definition implies that if both |S | ≤ |T | and |T | ≤ |S | then
|S | = |T | . That is true but the proof is beyond our scope; see Exercise 3.31.
For the next result recall that a set’s characteristic function 1S is the Boolean
function determining membership: 1S(s) = 1 if s ∈ S and 1S(s) = 0 if s ∉ S.
Thus for the set of two letters S = { a, c }, the characteristic function with domain
Σ = { a, ... , z } is 1S(a) = 1, 1S(b) = 0, 1S(c) = 1, 1S(d) = 0, ... 1S(z) = 0.
Recall also that the power set P(S) is the collection of subsets of S. For instance,
if S = { a, c } then P(S) = { ∅, { a }, { c }, { a, c } }.
3.6 Theorem (Cantor’s Theorem) A set’s cardinality is strictly less than that of its
power set.
Before stating the proof we first illustrate it. The easy half is starting with a
set S and producing a function to P (S) that is one-to-one: just map s ∈ S to {s }.
The harder half is showing that no map from S to P (S) is onto. As an example,
consider S = { a, b, c } and this function f : S → P (S).
a ↦ { b, c }      b ↦ { b }      c ↦ { a, b, c }      (∗)

In the table below, the first row lists the values of the characteristic function
1_{f(a)} : S → { 0, 1 } on the inputs a, b, and c. The second row lists the input/output
values for 1_{f(b)}. And, the third row lists 1_{f(c)}.

          a  b  c
1_{f(a)}  0  1  1
1_{f(b)}  0  1  0
1_{f(c)}  1  1  1
We will show that no member of the domain maps to the set R = { s ∈ S | s ∉ f(s) },
and thus f is not onto. Suppose that there exists ŝ ∈ S such that f(ŝ) = R. Consider
whether ŝ is an element of R. We have that ŝ ∈ R if and only if ŝ ∈ { s | s ∉ f(s) }.
By definition of membership that holds if and only if ŝ ∉ f(ŝ), which holds if and
only if ŝ ∉ R. The contradiction means that no such ŝ exists.
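For a finite set the proof can be checked exhaustively. A Python sketch, taking S = { a, b, c } as in the illustration: it enumerates all 8³ = 512 functions f : S → P(S) and confirms that R is never in the image, so no such f is onto.

```python
from itertools import combinations, product

S = ['a', 'b', 'c']
power_set = [set(c) for r in range(len(S) + 1)
             for c in combinations(S, r)]

def misses_R(f):
    """True when R = { s | s not in f(s) } is not an output of f."""
    R = {s for s in S if s not in f[s]}
    return all(f[s] != R for s in S)

checked = 0
for images in product(power_set, repeat=len(S)):
    f = dict(zip(S, images))        # one of the 8^3 functions S -> P(S)
    assert misses_R(f)
    checked += 1
```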
II.3 Exercises
3.9 Your friend is confused about the diagonal argument. “If you had an infinite
list of numbers, it would clearly contain every number, right? I mean, if you had
a list that was truly INFINITE, then you simply couldn’t find a number that is
not on the list!” Straighten them out.
3.10 Your classmate says, “Professor, you’ve made a mistake. The set of numbers
with one decimal place, such as 25.4 and 0.1, is clearly countable — just take
the integers and shift all the decimal places by one. The set with two decimal
places, such as 2.54 and 6.02 is likewise countable, etc. This is countably many
sets, each of which is countable, and so the union is countable. The union is the
whole reals, so the reals are countable.” Where is your friend’s mistake?
3.11 Verify Cantor’s Theorem, Theorem 3.6, for these finite sets. (a) { 0, 1, 2 }
(b) { 0, 1 } (c) { 0 } (d) { }
✓ 3.12 Use Definition 3.3 to prove that the first set has cardinality less than or
equal to the second set.
(a) S = { 1, 2, 3 }, Ŝ = { 11, 12, 13 }
(b) T = { 0, 1, 2 } , T̂ = { 11, 12, 13, 14 }
(c) U = { 0, 1, 2 } , the set of odd numbers
(d) the set of even numbers, the set of odds
3.13 One set is countable and the other is uncountable. Which is which?
(a) { n ∈ N | n + 3 < 5 }
(b) { x ∈ R | x + 3 < 5 }
✓ 3.14 Characterize each set as countable or uncountable. You need only give
a one-word answer. (a) [1 .. 4) ⊂ R (b) [1 .. 4) ⊂ N (c) [5 .. ∞) ⊂ R
(d) [5 .. ∞) ⊂ N
3.15 List all of the functions with domain A₂ = { 0, 1 } and codomain P(A₂).
How many functions are there for a set A₃ with three elements? n elements?
3.16 List all of the functions from S to T . How many are one-to-one?
(a) S = { 0, 1 } , T = { 10, 11 }
(b) S = { 0, 1 } , T = { 10, 11, 12 }
✓ 3.17 Short answer: fill each blank by choosing from (i) uncountable, (ii) countable
or uncountable, (iii) finite, (iv) countable, (v) finite, countably infinite, or
uncountable (you might use an answer more than once, or not at all). Give the
sharpest conclusion possible. You needn’t give a proof.
(a) If A and B are finite then A ∪ B is .
(b) If A is countable and B is finite then A ∪ B is .
(c) If A is countable and B is uncountable then A ∪ B is .
(d) if A is countable and B is uncountable then A ∩ B is .
3.18 Short answer: suppose that S is countable and consider f : S → T . List all
of these that are possible: (i) S is finite, (ii) T is finite, (iii) S is countably infinite,
(iv) T is countably infinite, (v) T is uncountable, provided that (a) the map is onto,
(b) the map is one-to-one.
✓ 3.19 Name a set with a larger cardinality than R.
In base two also, some numbers have more than one representation; an example
is 0.01000 ... and 0.00111 .... How could you make the argument work in base two?
3.29 The discussion after the statement of Theorem 3.1 includes that the real
number 1 has two different decimal representations, 1.000 ... = 0.999 ...
(a) Verify this equality using the formula for an infinite geometric series,
a + ar + ar 2 + ar 3 + · · · = a · 1/(1 − r ).
(b) Show that if a number has two different decimal representations then in
the leftmost decimal place where they differ, they differ by 1. Hint: that is
the biggest difference that the remaining decimal places can make up.
(c) In addition show that, for the one with the larger digit in that first differing
place, all of the digits to its right are 0, while the other representation has
that all of the remaining digits are 9’s. Hint: this is similar to the prior item.
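For part (a), the series instantiates with a = 9/10 and r = 1/10:

```latex
0.999\ldots
  = \frac{9}{10} + \frac{9}{10}\cdot\frac{1}{10}
    + \frac{9}{10}\cdot\Bigl(\frac{1}{10}\Bigr)^{2} + \cdots
  = \frac{9}{10}\cdot\frac{1}{1 - 1/10}
  = \frac{9}{10}\cdot\frac{10}{9}
  = 1
```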
3.30 Show that there is no set of all sets. Hint: use Theorem 3.6.
3.31 Definition 3.3 extends the definition of equal cardinality to say that |A| ≤ |B|
if there is a one-to-one function from A to B . The Schröder–Bernstein theorem
is that if both |S | ≤ |T | and |T | ≤ |S | then |S | = |T | . We will walk through
the proof. It depends on finding chains of images: for any s ∈ S we form the
associated chain by iterating application of the two functions, both to the right
and the left, as here.
... f −1 (g −1 (s)), g −1 (s), s, f (s), g(f (s)), f (g(f (s))) ...
(Starting with s the chain to the right is s, f (s), g(f (s)), f (g(f (s))), ... while the
chain to the left is ... f −1 (g −1 (s)), g −1 (s), s .) For any t ∈ T define the associated
chain similarly.
An example is to take a set of integers S = { 0, 1, 2 } and a set of characters T =
{ a, b, c }, and consider the two one-to-one functions f : S → T and g : T → S
shown here.

    s | f (s)        t | g(t)
    0 |  b           a |  0
    1 |  c           b |  1
    2 |  a           c |  2
Starting at 0 ∈ S gives a single chain that is cyclic, ... 0, b, 1, c, 2, a, 0 ...
(a) Consider S = { 0, 1, 2, 3 } and T = { a, b, c, d } . Let f associate 0 ↦ a,
1 ↦ b, 2 ↦ d and 3 ↦ c. Let g associate a ↦ 0, b ↦ 1, c ↦ 2 and
d ↦ 3. Check that these maps are one-to-one. List the chain associated
with each element of S and the chain associated with each element of T .
(b) For infinite sets a chain can have a first element, an element without any
preimage. Let S be the even numbers and let T be the odds. Let f : S → T
be f (x) = x + 1 and let g : T → S be g(x) = x + 1. Show each map is
one-to-one. Show there is a single chain and that it has a first element.
(c) Argue that we can assume without loss of generality that S and T are
disjoint sets.
(d) Assume that S and T are disjoint and that f : S → T and д : T → S are
one-to-one. Show that every element of either set is in a unique chain, and
that each chain is of one of four kinds: (i) those that repeat after some
number of terms (ii) those that continue infinitely in both directions without
repeating (iii) those that continue infinitely to the right but stop on the left
at some element of S , and (iv) those that continue infinitely to the right but
stop on the left at some element of T .
(e) Show that for any chain the function below is a correspondence between
the chain’s elements from S and its elements from T .
        h(s) = f (s)      – if s is in a sequence of type (i), (ii), or (iii)
               g −1 (s)   – if s is in a sequence of type (iv)
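A small computational check of the chain construction, using the three-element example above (the dictionaries below are that example's f and g; since the single chain there is cyclic, type (i), the correspondence h is just f):

```python
# f : S -> T and g : T -> S, both one-to-one, from the example above.
f = {0: "b", 1: "c", 2: "a"}
g = {"a": 0, "b": 1, "c": 2}

def chain(s, length):
    """Follow the chain to the right from s in S: s, f(s), g(f(s)), ..."""
    out, cur, in_S = [], s, True
    for _ in range(length):
        out.append(cur)
        cur = f[cur] if in_S else g[cur]
        in_S = not in_S
    return out

# Starting at 0 the chain cycles: 0, b, 1, c, 2, a, 0, ...
print(chain(0, 7))

# On a cyclic chain the theorem's correspondence is h(s) = f(s),
# which here is a bijection between S and T.
h = {s: f[s] for s in f}
print(sorted(h.values()))
```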
Section
II.4 Universality
We have seen a number of Turing machines: one whose behavior is that its output
is the successor of its input, one that interprets its input as two numbers and adds
them, etc. These are single-purpose devices, where to get different input-output
behavior we needed to get a new machine, that is, new hardware. This was what
we meant by saying that a good first take on Turing machines is that they are more
like a modern computer program than a modern computer.
The picture below shows programmers of an early electronic computer. They
are changing its behavior by changing its circuits, using the patch cords.
Weaving by hand, as the loom operator on the left is doing, is intricate and slow.
We can make a machine to reproduce her pattern. But what if we want a different
pattern; do we need another machine? In 1801 J Jacquard built a loom like the
one on the right, controlled by paper cards. Getting a different pattern does not
require a new loom, it only requires swapping cards.
Turing introduced the analog for computing devices. He produced a single
Turing machine that we can give the instructions: “Consider the following Turing
machine. Have the same output behavior as this machine would on receiving the
following input.” We don’t need infinitely many different machines, we just need
this one, and it can be made to have any desired computable behavior.
“Have the same output behavior” means that if the specified machine halts on
that input then the universal machine halts and gives the same output, and if the
specified machine does not halt on that input then the universal machine also does
not halt.
Before we state the theorem, we will first address a question. This
machine may seem to present a chicken and egg problem: how can
we give a Turing machine as input to a Turing machine? In particular,
since the universal machine is itself a Turing machine, the theorem
seems to allow the possibility of giving it to itself — won’t feeding a
machine to itself lead to infinite regress?
[Marginal figure: an ouroboros, a snake swallowing its own tail.]
We run Turing machines by loading symbols on the tape and pressing Start. So
we don’t feed a machine to itself — instead, it inputs symbols. True, we can feed
a universal machine a pair e, x where e is the index of the universal machine,
and thus is computationally equivalent to that machine’s source. But even so,
the universe won’t collapse — we can absolutely use a text editor to edit the bits
that are its own source, or give a compiler a source code listing for itself. Similarly, we
can feed a universal machine its own number. Certainly, lots of interesting things
happen as a result, but the point is that there is no inherent impossibility.
4.1 Theorem (Turing, 1936) There is a Turing machine that, when given the input
e, x , will have the same output behavior as does Pe on input x .
The most direct example of our everyday experience with computing systems
that act as universal machines is a programming language’s eval statement.

CHICKEN
(c) 2008-2013, The Chicken Team
(c) 2000-2007, Felix L. Winkelmann
Version 4.8.0.5 (stability/4.8.0) (rev 5bd53ac)
linux-unix-gnu-x86-64 [ 64bit manyargs dload ptables ]
compiled 2013-10-03 on aeryn.xorinia.dim (Darwin)

#;1> (define (utm s)
(eval s (scheme-report-environment 5)))
#;2> (define TEST '(lambda (i) (if (= i 0) 1 0)))
#;3> TEST
(lambda (i) (if (= i 0) 1 0))

Illustrating this even more directly, in line 3 the interpreter gets as a single
expression both the source of a routine (it is shown highlighted) and the input,
and it returns the result of applying the source to the input. That is, like the loom’s
punched cards, our mechanism allows us to swap behaviors in and out, at will.
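The same swap works in any language with an eval; here is a sketch of the session's idea in Python (an analogue for comparison, not part of the Chicken transcript):

```python
# A Python analogue of the Scheme utm: take source text, produce the routine.
def utm(source):
    return eval(source)  # eval turns the source string into a callable

# The analogue of TEST: source for a routine, held as plain data.
TEST = "lambda i: 1 if i == 0 else 0"

behave = utm(TEST)  # swap the behavior in, like loading a punched card
print(behave(0))    # 1
print(behave(5))    # 0
```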
† Another often-used way to define a Universal Turing machine is to have it take the single-number
input cantor(e, x ). ‡ The figure is a flow chart, which gives a high-level outline of a routine, here of an
operating system or of a Universal Turing machine. We use three types of boxes. Rectangles are for the
ordinary flow of control. Round-corner boxes are for Start and End. Diamond boxes, which appear in
later flow charts, are for decisions, if statements.
But that’s silly. We can have if .. elif .. branches for a few cases but because
programs have finite length, code must handle all but finitely many n ’s uniformly.
There must be a branch that handles infinitely many inputs (there may be a finite
number of such branches), and all except for finitely many inputs must be handled
on such a branch.
† Writing a program that allows general users to evaluate arbitrary code is powerful but not safe,
especially if these users just surf in from the Internet. Restricting which commands the user can
evaluate, known as sandboxing, forms part of being careful with that power. For us, however, the
software engineering issues are not relevant.
Thus, the fact that Turing machines have only finitely many instructions imposes
a requirement of uniformity. What this machine does on 1 is unconnected to what
it does on other inputs
read n
if n==1:
print 42
else:
print 2*n
but in any program there are only a finite number of different cases.
4.2 Example Associating in this way the idea that ‘something is computable’ with ‘it is
uniformly computable’ has some surprising consequences. Consider the problem of
producing a program that inputs a number n and decides whether somewhere in the
decimal expansion of π = 3.14159 ... there is a length n sequence of consecutive
nines.
The answer: there are two possibilities. Either for all n such a sequence exists
or else there is some number n 0 where a sequence of 9’s exists for lengths less
than n 0 and no such sequence exists when n ≥ n 0 . Therefore the problem is solved
by one of these two programs. However, we don’t know which one.
read n
print 1

read n
if n < n0:
    print 1
else:
    print 0
One aspect that is surprising is that neither of the two has anything to do with π .
Also surprising, and perhaps unsettling, is that we have shown that the problem is
solvable without showing how to solve it. That is, there is a difference between
showing that this function is computable
        f (n) = 1 – if there are n consecutive 9’s in π
                0 – otherwise
and possessing an algorithm to compute it. This observation shows that the idea
“something is computable if you can write a program for it” is naive or, at least,
doesn’t go into enough detail to make the subtleties clear.
In contrast, consider a subroutine that inputs i ∈ N and outputs π ’s i -th decimal
place. With it, we can write a program that takes in n and looks through π for n
consecutive 9’s by searching the digits. This approach is constructive in that we
are constructing the answer, not just saying that it exists. It is also uniform in the
sense that we could modify it to take other subroutines as input and thus look for
strings of 9’s in other numbers. However, this approach has the disadvantage that
if n 0 is such that for n ≥ n 0 never does π have n consecutive 9’s then this program
will just search forever, without printing 0.
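A sketch of that constructive search in Python. The digit subroutine here is a stand-in returning the decimal places of a made-up number (a routine producing π's digits would slot in the same way); the search logic is what the passage is about:

```python
# Stand-in digit oracle: the decimal places of 0.19919991, not of pi.
DIGITS = "19919991"

def digit(i):
    """Return the i-th decimal place.  A pi-digit routine would go here."""
    return int(DIGITS[i])

def find_nines(n, limit):
    """Position of the first run of n consecutive 9's among the first
    `limit` digits, or None.  Without the limit the search may never halt,
    which is exactly the disadvantage noted in the text."""
    run = 0
    for i in range(limit):
        run = run + 1 if digit(i) == 9 else 0
        if run == n:
            return i - n + 1
    return None

print(find_nines(3, len(DIGITS)))  # 4
print(find_nines(4, len(DIGITS)))  # None
```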
Freeze the first argument, that is, lock x = a for some a ∈ N. The result is a
one-input program. This shows what happens when we freeze x at a = 7.
(define (P_7 y)
(P 7 y))
This is partial application because we are not freezing all of the input variables.
Instead, we are parametrizing the variable x to get a family of functions P 0 , P 1 , . . .
Obviously the programs in the family are related to the starting one. Denoting
the function computed by the starting program P as ψ (x, y) = x + y , partial
application gives a family of programs and functions: ψ 0 (y) = y , ψ 1 (y) = 1 + y ,
ψ 2 (y) = 2 + y , . . . The next result is that from the index of the starting program
or function, and from the values that are frozen, we can effectively compute the
family members.
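In a language with first-class functions, this freezing is partial application in the everyday programming sense; a sketch in Python, where functools.partial plays the role of building the family members:

```python
from functools import partial

def psi(x, y):
    """The starting two-input program: psi(x, y) = x + y."""
    return x + y

# Freeze x at 7, the analogue of P_7 above.
psi_7 = partial(psi, 7)
print(psi_7(5))   # 12

# Parametrizing x gives the family psi_0, psi_1, psi_2, ...
family = [partial(psi, a) for a in range(3)]
print([p(10) for p in family])   # [10, 11, 12]
```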
4.3 Theorem (s-m-n theorem, or Parameter theorem) For every m, n ∈ N there is
a computable total function sm,n : N1+m → N such that for the m + n -ary function
ϕ e (x 0 , ... xm−1 , xm , ... xm+n−1 ), freezing the initial m variables at a 0 , ... am−1 ∈ N
gives an n -ary function equal to ϕ s(e,a0, ...am−1 ) (xm , ... xm+n−1 ).
The function ϕ e (x 0 , ... xm−1 , xm , ... xm+n−1 ) could be partial, that is, it could be
that the Turing machine Pe fails to halt on some inputs x 0 , ... xm−1 , xm , ... xm+n−1 .
Proof We will produce the function s to satisfy three requirements: it must be
effective, it must input an index e and an m -tuple a 0 , ... am−1 , and it must output
the index of a machine P̂ that, when given the input xm , ... xm+n−1 , will return the
value ϕ e (a 0 , ... am−1 , xm , ... xm+n−1 ), or diverge if that function diverges.
The idea is that the machine that computes s will construct the instructions
for P̂ . We can get from the instruction set to the index using Cantor’s encoding, so
with that we will be done.
Below on the left is the flowchart for the machine that computes s and on the
right is the flowchart for P̂ .
[Flowchart, left, the machine that computes s: Start; Read e, a0 , ... , am−1 ;
Create instructions for P̂ ; End.
Flowchart, right, P̂ : Start; Move left a0 + · · · + am−1 + m cells;
Put a0 , ... , am−1 on the tape, separated by blanks; then simulate Pe .]
Recall that we are being flexible about the convention for input and output
representations for Turing machines but to be precise in this argument we assume
that input is encoded in unary, that multiple inputs are separated with a single
blank, and that when the machine is started the head should be under the input’s
left-most 1.
With that, we construct the machine P̂ so that the first thing it does is not read
its inputs xm , ... xm+n−1 . Instead, P̂ first moves left and puts a 0 , ... am−1 on the
tape, in unary and separated by blanks, and with a blank between am−1 and xm .
Then, using universality, P̂ simulates Turing machine Pe , and lets it run on that
input list.
In the notation sm,n , the subscript m is the number of inputs being frozen
while n is the number of inputs left free. As the prior example suggests, they can
sometimes be a bother and we usually omit them.
4.4 Example Consider the two-input routine sketched by this flowchart.
Start
Read x , y
( ∗)
Print x · y
End
By Church’s Thesis there is a Turing machine that fills in the sketch, and computes
the function ψ (x, y) = x · y . Let that machine have index e . We can use the s-m-n
theorem to freeze the value of x to 0. On the left below is the flowchart sketching
the machine Ps1, 1 (e, 0) . It computes the function ϕ s1, 1 (e, 0) (y) = 0; for example,
ϕ s1, 1 (e, 0) (5) = 0.
Similarly the other two are flowcharts summarizing Ps1, 1 (e, 1) and Ps1, 1 (e, 2) , which
freeze the value of x at 1 and 2. The machine sketched in the center computes
ϕ s1, 1 (e, 1) (y) = y , so for instance ϕ s1, 1 (e, 1) (5) = 5. On the right the machine
computes ϕ s1, 1 (e, 2) (y) = 2y , and an example is ϕ s1, 1 (e, 2) (5) = 10.
In general, this is the flowchart for Ps1, 1 (e,x ) .
Start
Read y
(**)
Print x · y
End
Compare this to the flowchart in (∗) above. The difference is that this machine
does not read x . Rather, as in the three charts above, x is hard-coded into the
program body. That is, Ps1, 1 (e,x ) is a family of Turing machines, the first three of
which are in the prior paragraph. This family is parametrized by x , and the indices
are uniformly computable from e and x , using the function s .
The s-m-n Theorem says that we can hard code the values of parameters into
the machine’s source. But it says more. It also says that the resulting family of
functions is uniformly computable; there is a single computable function, s , going
from the index e and the parameter value x to the index of the result in (∗∗). So,
the s-m-n Theorem is about uniformity.
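To make it concrete that s is itself an ordinary computable function on program texts, here is a toy sketch in Python, with source strings standing in for indices (the names s and times are illustrative, not from the book):

```python
def s(source, a):
    """Toy s-m-n: from the source of a two-input function and a frozen
    value a, compute the source of the specialized one-input function.
    s is just a string-to-string function -- uniformly computable."""
    return f"lambda y: ({source})({a}, y)"

# Plays the role of the machine with index e in Example 4.4.
times = "lambda x, y: x * y"

specialized = eval(s(times, 2))  # the analogue of P_{s(e,2)}
print(specialized(5))            # 10
print(eval(s(times, 0))(5))      # 0
```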
II.4 Exercises
4.5 Your friend asks, “What can a Universal Turing machine do that a regular
Turing machine cannot?” Help them out.
✓ 4.6 Has anyone ever built a Universal Turing machine, or a machine equivalent to
one?
4.7 Can a Universal Turing machine simulate another Universal Turing machine,
or for that matter can it simulate itself?
✓ 4.8 Your class has a jerk who keeps throwing out pronouncements that the prof
must patiently correct. This time it’s, “Universal Turing machines make no sense.
How could a machine simulate another machine that has more states? Obviously
it can’t.” Clue this chucklehead in.
4.9 Is there more than one Universal Turing machine?
4.10 What happens if we feed a Universal Turing machine to itself? For instance,
where the index e 0 is such that ϕ e0 (e, x) = ϕ e (x) for all x , what is the value of
ϕ e0 (e 0 , 5)?
4.11 Consider the function f (x 0 , x 1 ) = 3x 0 + x 0 · x 1 .
(a) Freeze x 0 to have the value 4. What is the resulting one-variable function?
(b) Freeze x 0 at 5. What is the resulting one-variable function?
(c) Freeze x 1 to be 0. What is the resulting function?
4.12 Consider f (x 0 , x 1 , x 2 ) = x 0 + 2x 1 + 3x 2 .
(a) Freeze x 0 to have the value 1. What is the resulting two-variable function?
(b) What two-variable function results from fixing x 0 to be 2?
(c) Let a be a natural number. What two-variable function results from fixing x 0
to be a ?
(d) Freeze x 0 at 5 and x 1 at 3. What is the resulting one-variable function?
(e) What one-variable function results from fixing x 0 to be a and x 1 to be b , for
a, b ∈ N?
✓ 4.13 Suppose that the Turing machine sketched by this flowchart has index e .
Start
Read x 0 , x 1
Print x 0 + x 1
End
Start
Read x 0 , x 1 , x 2
Print x 0 + x 1 · x 2
End
[Flowchart: Start; Read x 0 , x 1 ; decision: x 0 > 1? If No, Print x 1 , then End;
if Yes, enter an infinite loop.]
(a) Describe ϕ s1, 1 (e, 0) . (b) What is ϕ s1, 1 (e, 0) (5)? (c) Describe ϕ s1, 1 (e, 1) . (d) What
is ϕ s1, 1 (e, 1) (5)? (e) Describe ϕ s1, 1 (e, 2) . (f) What is ϕ s1, 1 (e, 2) (5)?
✓ 4.16 Suppose that the Turing machine sketched by this flowchart has index e .
[Flowchart: Start; Read x 0 , x 1 , y ; decision: x 0 even? If Yes, Print x 1 · y ;
if No, Print x 1 + y ; then End.]
Section
II.5 The Halting problem
We’ve shown that there are functions that are not mechanically computable. We
gave a counting argument, that there are countably many Turing machines but
uncountably many functions and so there are functions with no associated machine.
While knowing what’s true is great, even better is to exhibit a specific function that
is not computable. We will now do that.
Definition The natural approach to producing such a function is to go through
Cantor’s Theorem and effectivize it, to turn the proof into a construction.
Here is an illustrative table adapted from the discussion of Cantor’s Theorem
on page 77. Imagine that this table’s rows are the computable functions and its
columns are the inputs. For instance, this table lists ϕ 2 (3) = 5.
                 Input x
          0   1   2   3   4   5   6   ...
    ϕ0    3   1   2   7   7   0   4   ...
    ϕ1    0   5   0   0   0   0   0   ...
    ϕ2    1   4   1   5   9   2   6   ...
    ϕ3    9   1   9   1   9   1   9   ...
    ϕ4    1   0   1   0   0   1   0   ...
    ϕ5    6   2   5   5   4   1   8   ...
    ...

[Flowchart, right: Start; Read e ; Compute table entry for index e , input e ;
Print result + 1; End.]
Diagonalizing means considering the machine on the right. It moves down the
array’s diagonal, changing the 3, changing the 5, etc., so that when the input is 0
then the output is 4, when the input is 1 then the output is 6, etc. It appears that
in the usual diagonalization way, this machine’s output does not equal any of the
table’s rows.
However, that’s a puzzle, an apparent contradiction. The flowchart outlines an
effective procedure — we can implement this by using a Universal Turing machine,
so its output should be one of the rows.
What’s the puzzle’s resolution? The program’s first, second, fourth, and fifth
boxes are trivial, so the issue must involve getting through the box in the middle.
The answer is that there must be an e ∈ N so that ϕ e (e)↑, and for that index the
Turing machine sketched in the flowchart never gets through the middle box and
never prints the apparently contradictory output. That is, to avoid a contradiction
the above table must contain ↑’s.
So we have an important insight: the fact that some computations fail to halt
on some inputs is central to the nature of computation.
5.1 Definition K = {e ∈ N | ϕ e (e)↓, that is, Turing machine Pe halts on input e }
5.2 Problem (Halting problem) Given e ∈ N, determine whether ϕ e (e)↓, that is,
whether Turing machine Pe halts on input e .
For any e ∈ N, obviously either ϕ e (e) ↓ or ϕ e (e) ↑. The Halting problem is
whether we can mechanically settle which numbers are members of the set K .
Then the function below is also mechanically computable. The flowchart illustrates
how f is constructed; it uses the above function in its decision box.
        f (e) = 42 – if ϕ e (e)↑
                ↑  – if ϕ e (e)↓

[Flowchart: Start; Read e ; decision: K(e) = 0? If Yes, Print 42, then End;
if No, enter an infinite loop.]
(In f ’s top case the output value doesn’t matter, all that matters is that f converges.)
Since this function is mechanically computable, it has an index. Let that index
be ê , so that f (x) = ϕ ê (x) for all inputs x .
Consider f (ê) = ϕ ê (ê), that is, feed the machine the input ê . If it diverges then
the first clause in the definition of f means that f (ê)↓, which contradicts divergence.
If it converges then f ’s second clause means that f (ê)↑, also impossible. Since
assuming that halt_decider is mechanically computable leads to a contradiction,
that function is not mechanically computable.
With Church’s Thesis in mind, we will say that a problem is unsolvable if it is
mechanically unsolvable, that is, if no Turing machine computes that task. If the
problem is to answer a ‘yes’ or ‘no’ question, so that it is the problem of determining
membership in a set, then we will say that the set is undecidable.
Discussion The fact that the Halting Problem is unsolvable does not mean that
we cannot tell if any program halts. This program obviously adds 1 to its input
and then halts for every input.
#;1> (define (prompt/read prompt)
---> (display prompt)
---> (read-line))
#;2>
#;2> (+ 1
---> (string->number (prompt/read "Enter n")))
Enter n---> 4
5
Nor does the unsolvability of the Halting problem mean that we cannot tell if a
program does not halt. Consider this one.
#;1> (define (f x)
---> (+ 1 (f x)))
This obviously does not halt; once started, it just keeps going.
#;2> (f 0)
^C
Call history:
Instead, the unsolvability of the Halting Problem says that there is no single
program that, for all e , correctly decides in a finite time whether Pe halts on
input e .
That has the qualifier ‘finite time’ because we could perfectly well write source
code to read an input e , simulate Pe on input e , and then print some nominal
output such as 42, but if Pe on input e fails to halt then we would not get the
output in a finite time.
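The ‘finite time’ point can be made concrete with a bounded simulator, which is computable but only semi-informative; a sketch in Python, with programs modeled as generators that yield once per simulated step (a modeling convenience, not the book's Turing machine encoding):

```python
def run_bounded(program, bound):
    """Simulate a computation for at most `bound` steps.  This is fully
    computable -- but it answers 'unknown' instead of deciding halting."""
    gen = program()
    for _ in range(bound):
        try:
            next(gen)                       # one simulated step
        except StopIteration as stop:
            return ("halted", stop.value)   # the program's output
    return ("unknown", None)                # ran out of budget

def halts_quickly():      # halts after 3 steps with output 42
    for _ in range(3):
        yield
    return 42

def loops_forever():      # never halts
    while True:
        yield

print(run_bounded(halts_quickly, 100))  # ('halted', 42)
print(run_bounded(loops_forever, 100))  # ('unknown', None)
```

Raising the bound never turns an ‘unknown’ for a genuinely non-halting program into an answer, which is the gap between this utility and a true halt decider.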
The ‘single program’ qualifier is there because for any index e , either Pe halts
on e or else it does not. That is, for any e one of these two programs gives the right
answer.
read e
print 0

read e
print 1
Of course, guessing which one applies is not what we had in mind. We had in mind
a program, an effective and uniform procedure, that inputs e and outputs the right
answer.
Thus, the unsolvability of the Halting Problem is about the non-existence of a
single program that works across all indices. It speaks to uniformity, or rather, the
impossibility of uniformity.
halt (and output 0), but the utility does not change any outputs where P does
halt. That would give rise to a list of total functions like the one on page 93, and
diagonalization gives a contradiction.
Thus, halting, or rather failure to halt, is inherent in the nature of computation.
In any general computational scheme there must be some computations that halt
on all inputs, some that halt on no inputs, and some that halt on some inputs but
not on others.
That alone is enough to justify study of the Halting problem but we will give a
second reason. If halt_decider were a computable function then we could solve
many problems that we currently don’t know how to solve.
For instance, a natural number is perfect if it is the sum of its proper positive
divisors. Thus 6 is perfect because 6 = 1 + 2 + 3. Similarly, 28 = 1 + 2 + 4 + 7 + 14
is perfect. These have been studied since Euclid and today we understand the
form of all even perfect numbers. But no one knows if there are any odd perfect
numbers.
With a solution to the Halting Problem we could settle this question. The program
sketched here searches for an odd perfect number.† If it finds one then it halts. If
not then it does not halt. So if we had a halt_decider and we gave it the index of
this program, then that would settle whether there exist any odd perfect numbers.
There are many open questions involving an unbounded search that would fall to
this approach. (Just to name one more: no one knows if there is any n > 4 such
that 2^(2^n) + 1 is prime. We could answer the question by writing P to search
for such an n , and give the index of P to halt_decider.)

[Flowchart: Start; Read x; i = 0; i = i + 1; decision: 2i + 1 perfect?
If No, return to i = i + 1; if Yes, Print 1, then End.]
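The searcher is easy to write out; a sketch in Python, with a cutoff added so that this demonstration halts (the argument above needs the cutoff-free version, whose halting is exactly what is unknown):

```python
def is_perfect(n):
    """True if n equals the sum of its proper positive divisors."""
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n

def search_odd_perfect(cutoff):
    """Search the odd numbers for a perfect one.  Dropping the cutoff
    gives a program that halts if and only if an odd perfect exists."""
    i = 0
    while 2 * i + 1 < cutoff:
        i += 1
        if is_perfect(2 * i + 1):
            return 2 * i + 1
    return None

print(is_perfect(6), is_perfect(28))   # True True
print(search_odd_perfect(1000))        # None
```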
Before moving on, note that unbounded search is a theme in our studies. We
have seen it earlier, in defining general recursion. And, it is at the heart of the
Halting problem since the natural way to test whether ϕ e (e)↓ is to run a brute force
computation, an unbounded search for a stage at which the computation halts.
General unsolvability We have named one job, the Halting problem, that no
mechanical computer can do. With that one in hand, we are able to show that a
wide class of jobs cannot be done. That is, the Halting problem is part of a larger
unsolvability phenomenon.
5.4 Example Consider the following problem: we want to know if a given Turing
machine halts on the input 3. That is, given x , does ϕ x (3)↓? Of course, the nature
of the material we are studying is that we want to answer this question with a
computation.
† This program takes an input x but ignores it; in this book we prefer to have the machines that we use
take an input and give an output.
        halts_on_three_decider(x) = 1 – if ϕ x (3)↓
                                    0 – otherwise
[Flowcharts: left: Start; Read x, y ; Run Px on x ; Print 42; End.
Right: Start; Read y ; Run Px on x ; Print 42; End.]
With that motivation we are ready for the argument. For contradiction,
assume that halts_on_three_decider is mechanically computable. Consider this
function.
        ψ (x, y) = 42 – if ϕ x (x)↓
                   ↑  – otherwise
The flowchart on the left below outlines how to compute ψ . Because it is intuitively
mechanically computable, Church’s Thesis says that there is a Turing machine
whose input-output behavior is ψ . That Turing machine has an index, e , so
that ψ = ϕ e .
[Flowcharts: left: Start; Read x, y ; Run Px on x ; Print 7; End.
Right: Start; Read y ; Run Px on x ; Print 7; End.]
[Flowcharts: left: Start; Read x, y ; Run Px on x ; Print 2y; End.
Right: Start; Read y ; Run Px on x ; Print 2y; End.]
Apply the s-m-n theorem to get a family of functions ϕ s(e,x ) parametrized by x . The
machine Ps(e,x ) is sketched by the flowchart on the right. Then ϕ x (x)↓ if and only if
doubler_decider(s(e, x)) = 1. So the supposition that doubler_decider
is computable gives that the Halting problem is computable, which is wrong.
These examples show the Halting problem serving as a touchstone for unsolv-
ability. Often we show something is unsolvable by showing that if we could solve it
then we could solve the Halting problem. We say that the Halting problem reduces
to the given problem.†
Before the next subsection, three comments. First, to reiterate, saying that a
† We use ‘reduces to’ in the same sense that we would in saying, “finding the roots of a polynomial
reduces to factoring that polynomial,” meaning that if we could factor then we could find the roots.
II.5 Exercises
5.8 Someone in your class says, “I don’t get the point of the Halting problem; it
just seems totally not relevant. If you want programs to halt then just watch them
and when they exceed a set number of cycles, send a kill signal.” Give them a
clue.
5.9 True or false: there is no function that solves the Halting Problem; there is no
f such that f (e) = 1 if ϕ e (e)↓ and f (e) = 0 if ϕ e (e)↑.
✓ 5.10 Your study partner asks you, “The Turing machine P = {q 0 BBq 0 , q 0 11q 0 }
fails to halt for all inputs, that’s obvious. But these unsolvability results say that I
cannot know that. Why not?” Explain what they are missing.
5.11 You have a person in class who a lot of the time talks before thinking. They say,
“Hey, I can solve the Halting problem. For any given Turing machine there are a
finite number of states, right? And the tape alphabet is finite, right? So there
are only finitely many state and character pairs that can happen. As the machine
runs, just monitor it for a repeat pair. If we see one then declare that the machine
is looping.” What’s missing?
5.12 This is the Hailstone function.
        h(n) = 42        – if n = 0 or n = 1
               h(n/2)    – if n is even
               h(3n + 1) – otherwise
The Collatz conjecture is that h halts on all n ∈ N. No one knows if it is true. Is it
an unsolvable problem to determine whether h halts on all inputs?
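The definition transcribes directly (the base value 42 is from the exercise; nothing here settles the conjecture, since running the function only confirms halting on the inputs we happen to try):

```python
def h(n):
    """The Hailstone function from the exercise."""
    if n == 0 or n == 1:
        return 42
    if n % 2 == 0:
        return h(n // 2)
    return h(3 * n + 1)

# Every input tried so far comes back 42, i.e. the recursion terminates.
print([h(n) for n in range(1, 7)])  # [42, 42, 42, 42, 42, 42]
```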
✓ 5.13 True or false?
(a) The problem of determining, given e , whether ϕ e (3)↓ is unsolvable because
no function halts_on_three_decider exists.
(b) The existence of unsolvable problems indicates weaknesses in the models of
computation, and we need stronger models.
Section
II.6 Rice’s Theorem
The intuition from the unsolvability examples is that we cannot mechanically
analyze the behavior of a mechanism. These two definitions make precise the word
‘behavior’.
6.1 Definition Two computable functions have the same behavior, denoted ϕ e ≃ ϕ ê ,
if they converge on the same inputs x ∈ N and, when they do converge, they have
the same outputs.†
† Strictly speaking we don’t need the symbol ≃. A function is a set of ordered pairs, so if ϕ e (0) ↓
while ϕ e (1)↑, then the set ϕ e contains a pair starting with 0 but no pair with first entry 1. Thus for
partial functions, if they converge on the same inputs and when they do converge they have the same
6.2 Definition A set I of natural numbers is an index set‡ when for all indices e, ê ∈
N, if e ∈ I and ϕ e ≃ ϕ ê then also ê ∈ I .
6.3 Example The set I = {e ∈ N | ϕ e (x) = 2x for all x } is an index set. Suppose that
e ∈ I and that ê ∈ N is such that ϕ e ≃ ϕ ê . Then the behavior of ϕ ê is also to double
its input: ϕ ê (x) = 2x for all x . Thus ê ∈ I also.
6.4 Example The set J = {e ∈ N | ϕ e (x) = 3x for all x , or ϕ e (x) = x³ for all x } is
[Flowcharts: left: Start; Read x, y ; Run Px on x ; End.
Right: Start; Read y ; Run Px on x ; End.]
We’ve constructed the machine sketched on the right so that if ϕ x (x) ↑ then
ϕ s(e,x ) ≃ ϕ e0 and thus s(e, x) ∉ I . Further, if ϕ x (x)↓ then ϕ s(e,x ) ≃ ϕ e1 and thus
s(e, x) ∈ I . Therefore if I were decidable, so that we could
effectively check whether s(e, x) ∈ I , then we could solve the Halting problem.
outputs, then we can simply say that the two are equal, ϕ = ϕ̂ , as sets. We use ≃ as a reminder that
the functions may be partial. ‡ It is called an index set because it is a set of indices.
6.7 Example We use Rice’s Theorem to show that this problem is unsolvable: given e ,
decide if ϕ e (3)↓.
Consider the set I = {e ∈ N | ϕ e (3)↓}. To apply Rice’s Theorem we must show
that this set is not empty, that it is not all of N, and that it is an index set. The
set I is not empty because we can write a Turing machine that acts as the identity
function ϕ(x) = x , and if e 0 is the index of that Turing Machine then e 0 ∈ I . The
set I is not equal to N because, where e 1 is the index of a Turing machine that
never halts, we have that e 1 < I .
To finish we will verify that I is an index set. Assume that e ∈ I and let ê ∈ N
be such that ϕ e ≃ ϕ ê . Then e ∈ I gives that ϕ e (3)↓ and ϕ e ≃ ϕ ê gives that ϕ ê (3)↓
also. Hence ê ∈ I , and I is an index set.
6.8 Example We can use Rice’s Theorem to show that this problem is unsolvable:
given e , decide if ϕ e (x) = 7 for some x .
We will show that I = {e ∈ N | ϕ e (x) = 7 for some x } is a nontrivial index set.
This set is not empty because, where e 0 is the index of a Turing machine that
acts as the identity function ϕ e0 (x) = x , we have that e 0 ∈ I . It is not all of N
because, where e 1 is the index of a Turing machine that never halts, e 1 ∉ I . So I
is nontrivial.
To show that I is an index set, assume that e ∈ I and let ê ∈ N be such that
ϕ e ≃ ϕ ê . By the first assumption, ϕ e (x 0 ) = 7 for some input x 0 . By the second, the
same input gives ϕ ê (x 0 ) = 7. Consequently, ê ∈ I .
6.9 Example This problem is unsolvable: determine, given an index e , whether ϕ e is
this function.
        f (x) = 4     – if x is prime
                x + 1 – otherwise
Let I = { j ∈ N | ϕ j = f }. The set I is not empty because we can write a program
with this behavior, and so by Church’s Thesis there is a Turing machine with this
behavior, and its index is a member of I . Also, I ≠ N because there is a Turing
machine that fails to halt on any input, and its index is not a member of I .
To finish, we argue that I is an index set. So suppose that e ∈ I and that
ϕ e ≃ ϕ ê . Because e ∈ I we have that ϕ e (x) = f (x) for all inputs x . Because ϕ e ≃ ϕ ê
we have that ϕ e (x) = ϕ ê (x) for all x , and so ê is also a member of I . Hence, I is
an index set.
We close by reflecting on the significance of Rice’s Theorem.
This result addresses the properties of computable functions. It does not speak
to properties of machines that aren’t about input-output behaviors. For example,
the set of functions computed by C programs whose first character is ‘k’ is not an
index set. This brings us back to the declaration in the first paragraph of the first
chapter that we are more interested in what the machines do than in the details of
their internal construction.
At this chapter’s start we saw that unsolvable problems exist, although we used
a counting argument that did not give us natural examples. With the Halting
problem we saw that there are interesting unsolvable problems. Here the definition
of index set gave us a natural way to encapsulate a behavior of interest, and Rice’s
Theorem says that every nontrivial index set is unsolvable. So we’ve gone from
taking unsolvable problems as exotic, to taking them as things that genuinely do
come up, to taking them as occurring everywhere.
Of course, that’s an overstatement; we’ve all seen and written real-world
programs with interesting behaviors. Nonetheless, Rice’s Theorem is especially
significant for understanding what can be done mechanically.
II.6 Exercises
6.10 Your friend says, “According to Rice’s Theorem, everything is impossible.
Every property of a computer program is non-computable. But I do this supposedly
impossible stuff all the time!” Set them straight.
6.11 Is I = {e | Pe runs for at least 100 steps on input 5 } an index set?
For each of the problems from Exercise 6.12 to Exercise 6.18, show that it is unsolv-
able by applying Rice’s theorem. (These repeat the problems from Exercise 5.17 to
Exercise 5.23.)
✓ 6.12 Given an index x , determine if ϕ x is total, that is, if it converges on every
input.
✓ 6.13 Given an index x , decide if the Turing machine Px squares its input. That is,
decide if ϕ x maps y ↦ y 2 .
6.14 Given x , determine if the function ϕ x returns the same value on two
consecutive inputs, so that ϕ x (y) = ϕ x (y + 1) for some y ∈ N.
6.15 Given an index x , determine whether ϕ x fails to converge on input 5.
6.16 Given an index, determine if the computable function with that index fails
to converge on all odd numbers.
6.17 Given an index e , decide if the function ϕ e computed by machine Pe is
x ↦ x + 1.
6.18 Given an index e , decide if the function ϕ e fails to converge on both inputs x
and 2x , for some x .
✓ 6.19 Show that each of these is an unsolvable problem by applying Rice’s Theorem.
(a) The problem of determining if a function is total, that is, converges on every
input.
(b) The problem of determining if a function is partial, that is, fails to converge
on some input.
✓ 6.20 For each problem, fill in the blanks to prove that it is unsolvable.
We will show that I = {e ∈ N | (1) } is a nontrivial index set. Then Rice’s theorem
will give that the problem of determining membership in I is algorithmically unsolvable.
First we argue that I ≠ ∅. The sketch (2) is intuitively computable, so by Church’s
Thesis there is such a Turing machine. That machine’s index is an element of I .
···
More formally stated, consider the relation ≃ between natural numbers given by
e ≃ ê if ϕ e ≃ ϕ ê . (a) Show that this is an equivalence relation. (b) Describe the
parts, the equivalence classes. (c) Show that each index set is the union of some
of the equivalence classes. Hint: show that if an index set contains one element
of a class then it contains them all.
6.28 Because being an index set is a property of a set, we naturally consider how
it interacts with set operations. (a) Show that the complement of an index set is
also an index set. (b) Show that the collection of index sets is closed under union.
(c) Is it closed under intersection? If so prove that and if not then give a
counterexample.
6.29 Do the e 0 ∈ I case in the proof of Rice’s Theorem, Theorem 6.6.
Section
II.7 Computably enumerable sets
To attack the Halting problem the natural thing is to start by simulating P0 on
input 0 for a single step. Then simulate P0 on input 0 for a second step and
also simulate P1 on input 1 for one step. After that, run P0 on 0 for a third step,
followed by P1 on 1 for a second step, and then P2 on 2 for one step. This process
cycles among the Pe on e simulations, running each for a step. Eventually you will
see some of these halt and the elements of K will fill in. On computer systems this
interleaving is called time-slicing but in theory discussions it is called dovetailing.
We are listing the elements of K : first f (0), then f (1), . . . (the computable
function f is such that, for instance, f (0) = e where it happens that Pe on input e
is the first of these to halt). Definition 1.12 gives the terminology that a function f
with domain N enumerates its range.
Why won’t this process of gradual enumeration solve the Halting problem? If
e ∈ K then it will tell us that eventually, but if e ∉ K then it will not.
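We can sketch this dovetailing in running code. Below, each simulation is modeled as a Python generator that yields once per simulated step; the machine function is a made-up stand-in (it “halts” after e steps exactly when e is even), not a real Turing machine simulator.

```python
def machine(e):
    # Made-up stand-in for the simulation of P_e on input e: it "halts"
    # after e steps when e is even, and runs forever when e is odd.
    def run():
        step = 0
        while True:
            if e % 2 == 0 and step >= e:
                return              # the simulated machine halts
            step += 1
            yield                   # one simulated step
    return run()

def dovetail(rounds):
    # Run P_0 on 0, P_1 on 1, ... in interleaved single steps, bringing one
    # new simulation into the rotation each round; collect the e seen to halt.
    sims = {}
    halted = []
    for n in range(rounds):
        sims[n] = machine(n)
        for e, sim in list(sims.items()):
            try:
                next(sim)           # advance this simulation one step
            except StopIteration:
                halted.append(e)    # e has entered the listing of K
                del sims[e]
    return halted

print(dovetail(10))
```

Note that an e whose simulation never halts simply stays in the rotation forever, which is exactly why this process enumerates the halting indices without ever deciding the non-halting ones.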
7.1 Definition A set of natural numbers is computable or decidable if its characteris-
tic function is computable, so that we can effectively determine both membership
and non-membership.
7.2 Definition A set of natural numbers is computably enumerable if either it is
empty or it is the range of a total computable function. (The enumeration can
contain repeats, and the numbers could appear in jumbled up order, that is, not
necessarily in ascending order.)
7.3 Lemma The following are equivalent for a set of natural numbers.
(a) It is computably enumerable, that is, either it is empty or it is the range of a
total computable function.
(b) It is the domain of a partial computable function.
(c) It is the range of a partial computable function.
Proof We will show that the first two are equivalent. That the second and third
are equivalent is Exercise 7.29.
Assume first that S is computably enumerable. If S is empty then it is the domain
of the partial computable function that diverges on all inputs. So instead assume
that S is the range of a total computable f , and we will describe a computable g
with domain S . Given the input x ∈ N, to compute g(x) enumerate f (0), f (1),
. . . and wait for x to appear as one of the values. If x does appear then halt
the computation (and return some nominal value). If x never appears then the
computation never halts.
For the other direction, assume that S is the domain of a partial computable
function g , to show that it is computably enumerable. If S is empty then it
is computably enumerable by definition. Otherwise we must produce a total
computable f whose range is S . If S is finite but not empty, S = {s 0 , ... sm }, then
such a function is given by 0 ↦ s 0 , . . . m ↦ sm , and n ↦ s 0 for n > m .
Finally assume that S is infinite. Fix some s 0 ∈ S . Given n ∈ N, run the
computations of each of g(0), g(1), . . . g(n) for n -many steps. Possibly some of
these computations halt. Define f (n) to be the least k where g(k) halts within n
steps, and so that k ∉ { f (0), f (1), ... f (n − 1) }. If no such k exists then define
f (n) = s 0 ; this makes f a total function.
If t ∉ S then g(t) never converges and so t is never enumerated by f . If s ∈ S
then eventually g(s) must converge, in some number of steps, ns . The number s
is then queued for output by f in the sense that it will be enumerated by f as, at
most, f (ns + s).
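The recipe for f in the infinite case can also be sketched in code. Here the computation of g(k) is modeled by a made-up steps_to_halt function that reports how many steps g(k) takes, or None when it diverges; a real implementation would obtain these counts by dovetailed simulation.

```python
def make_f(steps_to_halt, fallback):
    # Following the proof's recipe for the infinite case: f(n) is the least
    # new k <= n whose computation g(k) halts within n steps, or the fixed
    # fallback element of S when there is no such k.
    def f(n):
        seen = {f(m) for m in range(n)}     # the earlier values f(0), ..., f(n-1)
        for k in range(n + 1):
            s = steps_to_halt(k)
            if s is not None and s <= n and k not in seen:
                return k
        return fallback
    return f

# Made-up stand-in for g: g(k) halts, in k steps, exactly when k is a multiple of 3.
g_steps = lambda k: k if k % 3 == 0 else None
f = make_f(g_steps, fallback=0)             # 0 is in S, as the proof requires
values = [f(n) for n in range(12)]
```

The naive recomputation of the earlier values makes this slow, but it keeps the sketch close to the proof: f is total, and its range is exactly the domain of g.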
Many authors define computably enumerable sets using the second or third
items. Definition 7.2 is more natural but also more technically awkward.
7.4 Definition We = {y | ϕ e (y)↓}
because S is computable, and it must halt because S is infinite. Similarly, f (k) will
be the k -th smallest element in S .
As to the second item, first suppose that S is computable. The prior item shows
that it is computably enumerable. The complement of S is also computable because
its characteristic function is 1S c = 1 − 1S . So the prior item shows that S c is also
computably enumerable.
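The identity 1S c = 1 − 1S turns directly into code: from a decider for S we get a decider for the complement. The decider below is a toy stand-in.

```python
def complement_decider(chi):
    # The characteristic function of the complement: 1_{S^c}(x) = 1 - 1_S(x).
    return lambda x: 1 - chi(x)

chi_evens = lambda x: 1 if x % 2 == 0 else 0   # toy decider for a toy S
chi_odds = complement_decider(chi_evens)
```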
Finally, suppose that both S and S c are computably enumerable. Let S be
enumerated by f and let S c be enumerated by f¯. We must give an effective
procedure to determine whether a given x ∈ N is an element of S . We will
dovetail the two enumerations: first run the computation of f (0) for a step and
the computation of f¯(0) for a step, then run the computations of f (0) and f¯(0)
for a second step, etc. Eventually x will be enumerated into one or the other.
7.6 Corollary The Halting problem set K is computably enumerable. Its complement
K c is not.
Proof The set K is the domain of the function f (x) = ϕ x (x), which is mechanically
computable by Church’s Thesis. If the complement K c were computably enumerable
then Lemma 7.5 would imply that K is computable, but it isn’t.
That result gives one reason to be interested in computably enumerable sets,
namely that the Halting problem set K falls into the class of computably
enumerable sets, as do sets such as {e | ϕ e (3)↓} and {e | there is an x so that
ϕ e (x) = 7 }. So this collection of sets contains lots of interesting members.
Another reason that these sets are interesting is philosophical: with Church’s
Thesis we can think that, in a sense, computable sets are the only sets that we will
ever know, and semidecidable sets are ones that we at least half know.
II.7 Exercises
✓ 7.7 You got a quiz question to define computably enumerable. A friend of yours
says they answered, “A set that can be enumerated by a Turing machine but that
is not computable.” Is that right?
✓ 7.8 Produce a function that enumerates each set, that is, a function whose range is the given set.
(a) N (b) the even numbers (c) the perfect squares (d) the set { 5, 7, 11 }.
7.9 Produce a function that enumerates each set: (a) the prime numbers (b) the
natural numbers whose digits are in non-increasing order (e.g., 531 or 5331 but
not 513).
7.10 One of these two is computable and the other is computably enumerable but
not computable. Which is which?
(a) {e | Pe halts on input 4 in less than twenty steps }
(b) {e | Pe halts on input 4 in more than twenty steps }
7.11 Short answer: for each set state whether it is computable, computably
enumerable but not computable, or neither. (a) The set of indices e of Turing
machines that contain an instruction using state q 4 . (b) The set of indices of
Turing machines that halt on input 3. (c) The set of indices of Turing machines
that halt on input 3 in fewer than 100 steps.
✓ 7.12 You read someone online who says, “every countable set S is computably
enumerable because if f : N → N has range S then you have the enumeration of S
as f (0), f (1), . . .” Explain why this is wrong.
✓ 7.13 The set A5 = {e | ϕ e (5)↓} is clearly not computable. Show that it is
computably enumerable.
7.14 Show that the set {e | ϕ e (2) = 4 } is computably enumerable.
7.15 Name a set that has an enumeration but not a computable enumeration.
7.16 Name three sets that are computably enumerable but not computable.
✓ 7.17 Let K 0 = { ⟨e, x⟩ | Pe halts on input x }.
(a) Show that it is computably enumerable.
(b) Show that the columns of K 0 , the sets C e = {x | Pe halts on input x },
make up all the computably enumerable sets.
7.18 We know that there are subsets of N that are not computable. Are the
computably enumerable sets the rest of the subsets?
✓ 7.19 Show that the set Tot = {e | ϕ e (y)↓ for all y } is not computable and not
computably enumerable. Hint: if this collection is computably enumerable then
we can get a table like the one that starts ??, on Unsolvability.
7.20 Can there be a set such that the problem of determining membership in that
set is unsolvable, and also the set is computably enumerable?
7.21 (a) Prove that every finite set is computably enumerable. (b) Sketch a
program that takes as input a finite set and returns a function that enumerates
the set.
7.22 Prove that every infinite computably enumerable set has an infinite com-
putable subset.
7.23 Let f be a partial computable function that enumerates the infinite set R ⊆ N.
Produce a total computable function that enumerates R .
7.24 A set is enumerable in increasing order if there is a computable function f
that is increasing: n < m implies f (n) < f (m), and whose range is the set. Prove
that an infinite set S is computable if and only if it is computably enumerable in
increasing order.
7.25 A set is computably enumerable without repetition if it is the range of a
computable function that is one-to-one. Prove that a set is computably enumerable
and infinite if and only if it is computably enumerable without repetition.
7.26 A set is co-computably enumerable if its complement is computably enumer-
able. Produce a set that is neither computably enumerable nor co-computably
enumerable.
7.27 Computability is a property of sets so we can consider its interaction with set
operations. (a) Must a subset of a computable set be computable? (b) Must the
union of two computable sets be computable? (c) The intersection? (d) The
complement?
7.28 Computable enumerability is a property of sets so we can consider its
interaction with set operations. (a) Must the union of two computably enumerable
sets be computably enumerable? (b) The intersection? (c) The complement?
7.29 Finish the proof of Lemma 7.3 by showing that the second and third items
are equivalent.
Section
II.8 Oracles
†
We can instead think that the first problem is more general than the second. For instance, the problem
of inputting a natural number and outputting its prime factors is harder than the problem of inputting
a natural and determining if it is divisible by seven. Clearly if we could solve the first then we could
solve the second. ‡ Opening it would let out the magic smoke.
When x ∈ X then this takes one branch, while if x ∉ X then it takes the other.
Most of what we have already developed about machines carries over. For
instance, programs are strings, each program has an index, and the index is
source-equivalent, so that from an index we can compute the program source and from a
source we can find the index.
In the setup above, the program code does not change if we change the oracle —
if we unplug the X oracle box and replace it with a Y oracle box then the white box
is unchanged. Of course, the values returned by oracle(x) may change, resulting
in changes to the outcome of running the program with the oracle, the two-box
system. But the enhanced Turing machine stays the same. Thus, to specify a
relative computation, in addition to specifying which program we are using and
which inputs, we must also specify the oracle set. This explains the notations for
the oracle Turing machine, PeX , and for the outcome of the function computed
relative to an oracle, ϕ eX (x).
8.1 Definition If a function computed from X is the characteristic function of the
set S then we say that S is X -computable, or that S is Turing reducible to X or
that S reduces to X , denoted S ≤T X .
That is, S ≤T X if and only if ϕ eX = 1S for some e ∈ N. Think of the set S as
being an easier problem, or at least no harder, than X . For instance, where E is the
set of even numbers then E ≤T K .
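In code, a relative computation is just a procedure that takes the oracle as a parameter. This sketch witnesses E ≤T K in the most trivial way: the decider for the evens never needs to consult its oracle, so any stand-in for the K oracle serves.

```python
def chi_E(x, oracle):
    # Decider for the even numbers relative to an oracle. It never needs to
    # consult the oracle, which witnesses E <=_T X for every oracle set X.
    return 1 if x % 2 == 0 else 0

# Any function at all can stand in for the K oracle here, since it is ignored.
fake_K_oracle = lambda e: 1
assert [chi_E(x, fake_K_oracle) for x in range(5)] == [1, 0, 1, 0, 1]
```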
8.2 Remark The terminology ‘S reduces to X ’ can at first seem reversed. The idea is
that we can solve problem S by using a solution to X . This phrase also appears in
other areas of Mathematics. For instance, in Calculus we may say that finding the
area under a polynomial curve reduces to the problem of antidifferentiation.
8.3 Theorem (a) A set is computable if and only if it is computable relative to the
empty set, or relative to any computable set.
(b) (Reflexivity) Every set is computable from itself, A ≤T A.
8.8 Theorem K ≡T K 0 .
Proof For K ≤T K 0 suppose that we have access to a K 0 -oracle. Since it can say
whether Pe halts on x for any input x , it can clearly say whether Pe halts on e .
For the K 0 ≤T K half, consider the flowchart on the left; obviously this machine
halts for all input triples exactly if ⟨e, x⟩ ∈ K 0 . By Church’s Thesis there is a Turing
machine implementing it; let it be machine Pê .
[Flowchart, left: Start → Read e, x, y → Simulate Pe on input x → Print 42 → End.
Right: Start → Read y → Simulate Pe on input x → Print 42 → End.]
Get the flowchart on the right by applying the s-m-n theorem to parametrize e
and x (that is, this is a sketch of machine Ps(ê,e,x ) ). That flowchart represents a
family of machines, one for each pair ⟨e, x⟩ .
Now suppose that we are given a pair ⟨e, x⟩ and consider the right-side flowchart
for that pair. It either halts on all inputs y or fails to halt on all inputs, depending
on whether ϕ e (x)↓. In particular, taking the input to be the number of this machine
y = s(ê, e, x), we have that Ps(ê,e,x ) halts on input s(ê, e, x) if and only if ϕ e (x)↓.
So given a question about membership in K 0 , about whether ⟨e, x⟩ ∈ K 0 , we can
answer it by determining whether s(ê, e, x) ∈ K .
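In programming terms the s-m-n step in this proof is currying: freezing some arguments of a procedure to get a family of one-input procedures. Below is a sketch where toy_simulate is a made-up stand-in for simulating Pe on input x (it pretends that Pe halts on x exactly when e = x).

```python
def p_hat(e, x, y, simulate):
    # The left flowchart: read e, x, y; ignore y, simulate P_e on input x,
    # then print 42.
    simulate(e, x)          # diverges exactly when phi_e(x) diverges
    return 42

def s(e, x, simulate):
    # s-m-n as currying: freeze the parameters e and x, leaving a one-input
    # machine -- our stand-in for the right flowchart's P_{s(e-hat, e, x)}.
    return lambda y: p_hat(e, x, y, simulate)

def toy_simulate(e, x):
    # Made-up stand-in: pretend P_e halts on input x exactly when e == x.
    if e != x:
        raise RuntimeError("this branch stands in for divergence")

machine = s(3, 3, toy_simulate)
assert machine(0) == 42 and machine(17) == 42
```

The point of the sketch is only the shape of the argument: the curried machine halts on every input, or on none, according to the frozen pair ⟨e, x⟩.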
8.9 Corollary The Halting problem is at least as hard as any computably enumerable
problem: We ≤T K for all e ∈ N.
Proof By Lemma 7.3 the computably enumerable sets are the columns of K 0 .
We = {y | ϕ e (y)↓} = {y | ⟨e, y⟩ ∈ K 0 }
So We ≤T K 0 ≡T K .
Because the Halting problem is in this sense the hardest of the computably
enumerable problems, we say that it is complete among the c.e. sets.
8.10 Theorem There is no e ∈ N such that ϕ eK is the characteristic function of
K K = {x | ϕ xK (x)↓}. That is, where the Relativized Halting problem is the
problem of determining membership in K K , its solution is not computable from
a K oracle.
Proof This is an adaptation of the proof that the Halting problem is unsolvable.
Assume otherwise, that there is a mechanical computation relative to a K oracle
Then the function below is also computable relative to a K oracle. The flowchart
illustrates its construction; it uses the above function for the branch.
f K (x) = 42 if ϕ xK (x)↑, and f K (x)↑ if ϕ xK (x)↓
[Flowchart: Start → Read x → is ϕ eK (x) = 1? → if Y, loop forever; if N, Print 42 → End]
[Flowchart, left: Start → Read x, y → ask the oracle: x ∈ X ? → if Y, End; if N, loop
forever. Right: Start → Read y → ask the oracle: x ∈ X ? → if Y, End; if N, loop forever.]
On the right ϕ s(e,x)X (y)↓ if and only if x ∈ X . Taking the oracle to be S and the
input to be s(e, x) gives that x ∈ S if and only if ϕ s(e,x)S (s(e, x))↓, which holds if
and only if s(e, x) ∈ K S .
That answers the question posed at the start of this section. One problem
strictly harder than the Halting problem is to compute the characteristic function
of K K .
II.8 Exercises
Recall from page 11 that a Turing machine is a decider for a set if it computes the
characteristic function of that set.
✓ 8.13 Suppose that the set A is Turing-reducible to the set B . Which of these are
true?
(a) A decider for A can be used to decide B .
(b) If A is computable then B is computable also.
(c) If A is uncomputable then B is uncomputable too.
✓ 8.14 Both oracles and deciders take in a number and return 0 or 1, according to
whether that number is in the set. What’s the difference?
✓ 8.15 Your friend says, “Oracle machines are not real, so why talk about them?”
What do you say?
8.16 Your classmate says they answered a quiz question to define an oracle with,
“A set to solve unsolvable problems.” Give them a gentle critique.
8.17 Is there an oracle for every problem? For every problem is there an oracle?
8.18 There is this person in your class who keeps mouthing off and your professor
has to keep gently setting them straight. This time it’s, “Oracles can solve
unsolvable problems, right? And K K is unsolvable, right? So an oracle like the K
oracle should solve it.” Help your prof out here.
8.19 Is the number of oracles countable or uncountable?
✓ 8.20 Prove that A ≤T Ac for all A ⊆ N.
8.21 Your study partner confesses, “I don’t understand relative computation. Any
computation using an oracle must make only finitely many oracle calls if it halts.
But a finite oracle is computable, and so by Lemma 8.5 it is reducible to any set.”
Help them out.
8.22 Let A and B be sets. Show that if A(q) = B(q) for all q ∈ N used in the oracle
computation ϕ A (x) then ϕ A (x) = ϕ B (x).
✓ 8.23 Show that K ≰T .
✓ 8.24 Show that the Halting problem set K reduces to each.
(a) {x | Px outputs a 7 for some input }
(b) {x | ϕ x (y) = 2y for all inputs }
8.25 Let A and B be sets. Produce a set C so that A ≤T C and B ≤T C .
8.26 Fix an oracle. Prove that the collection of sets computable from that oracle
is countable.
8.27 The relation ≤T involves sets so we naturally ask how it interacts with set
operations.
(a) Does A ⊆ B imply A ≤T B ?
(b) Is A ≤T A ∪ B ?
(c) Is A ≤T A ∩ B ?
(d) Is A ≤T Ac ?
8.28 Let A ⊆ N. (a) Define when a set is computably enumerable in an oracle.
(b) Show that N is computably enumerable in A for all sets A. (c) Show that K A
is computably enumerable in A.
Section
II.9 Fixed point theorem
Recall our first example of diagonalization, the proof that the set of real numbers
is not countable, on page 76. We assume that there is an f : N → R and consider
its inputs and outputs, as illustrated in this table.
Let a decimal representation of the number on row n be dn = d̂.dn, 0 dn, 1 dn, 2 ... Go
down the diagonal to the right of the decimal point to get the sequence of digits
⟨d 0, 0 , d 1, 1 , d 2, 2 , ...⟩ . With that sequence, construct a number z = 0.z 0 z 1 z 2 ... by
making its n -th decimal place be something other than dn,n . In our example we
took a transformation t of digits given by t(dn,n ) = 2 if dn,n = 1, and t(dn,n ) = 1
otherwise, so that the table above gives z = 0.1211 ... Then the diagonalization
argument culminates in verifying that z is not any of the rows.
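The digit-flipping transformation t is easy to carry out on a finite piece of such a table. In this sketch the rows are made up, and each string lists a row’s digits after the decimal point.

```python
def diagonal(rows):
    # Build z from the diagonal digits, changing each one: the transformation
    # t gives 2 if the digit was 1, and 1 otherwise, so z differs from row n
    # in its n-th decimal place.
    t = lambda d: '2' if d == '1' else '1'
    return '0.' + ''.join(t(row[n]) for n, row in enumerate(rows))

# Made-up rows; each string lists a row's digits after the decimal point.
rows = ['1415', '0000', '9999', '2718']
z = diagonal(rows)
assert z == '0.2111'
assert all(z[2 + n] != row[n] for n, row in enumerate(rows))
```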
Below is a table with all such sequences, that is, all effective sequences of effective
functions, ϕ ϕe (n) .
Sequence term
n=0 n=1 n=2 n=3 ...
e =0 ϕ ϕ0 (0) ϕ ϕ0 (1) ϕ ϕ0 (2) ϕ ϕ0 (3) ...
e =1 ϕ ϕ1 (0) ϕ ϕ1 (1) ϕ ϕ1 (2) ϕ ϕ1 (3) ...
Sequence e =2 ϕ ϕ2 (0) ϕ ϕ2 (1) ϕ ϕ2 (2) ϕ ϕ2 (3) ...
e =3 ϕ ϕ3 (0) ϕ ϕ3 (1) ϕ ϕ3 (2) ϕ ϕ3 (3) ...
...
Each entry ϕ ϕe (n) is a computable function. If ϕ e (n) diverges then the function as
a whole diverges.
The natural transformation is to use a computable function f .
t f : ϕ x ↦ ϕ f (x )
The next result shows that under this transformation, diagonalization fails. Thus,
the transformation t f has a fixed point.
9.1 Theorem (Fixed Point Theorem, Kleene 1938)† For any total computable
function f there is a number k such that ϕ k = ϕ f (k ) .
Proof The array diagonal is ϕ ϕ0 (0) , ϕ ϕ1 (1) , ϕ ϕ2 (2) ... The flowchart on the left below
is a sketch of the function (n, x) ↦ ϕ ϕn (n) (x). Church’s Thesis says that some Turing
machine computes this function; let that machine have index e 0 . Apply the s-m-n
theorem to parametrize n , giving the right chart, which describes the family of
machines that compute ϕ s(e0,n) , the n -th function on the diagonal.
[Flowchart, left: Start → Read n, x → Run Pn on n → With the result w , run Pw on
input x → End. Right: Start → Read x → Run Pn on n → With the result w , run Pw on
input x → End.]
ϕ s(e0,e) (x) = ϕ ϕe (e) (x) if ϕ e (e)↓, and ↑ otherwise
The index e 0 is fixed, so s(e 0 , n) is a function of one variable. Let g(n) = s(e 0 , n),
so that the diagonal functions are ϕ g(n) . This function g is computable and total.
†
This is also known as the Recursion Theorem but there is another widely used result of that name.
This name is more descriptive so we’ll go with it.
Under t f those functions are transformed to ϕ f g(0) , ϕ f g(1) , ϕ f g(2) , ... The
composition f ◦ g is computable and total, since f is specified as total.
ϕ f g(n) (x) = ϕ f ϕn (n) (x) if ϕ n (n)↓, and ↑ otherwise
[Flowchart: Start → Read x → Run Pn on n → With the result w , run Pf (w ) on x → End]
[Flowchart, left: Start → Read x, m → x = m? → if Y, Print 42 then End; if N, Loop.
Right: Start → Read x → x = m? → if Y, Print 42 then End; if N, Loop.]
ϕ s(e0 ,m) (x) = 42 if x = m , and ↑ otherwise
Since e 0 is fixed (it is the index of the machine sketched on the left), s(e 0 , x) is a
total computable function of one variable, f (m) = s(e 0 , m), where the associated
Turing machine halts only on input m . The Fixed Point Theorem gives a fixed
point, ϕ f (e) = ϕ e , and the associated Turing machine Pe halts only on e .
This says that there is a Turing machine that halts only on one input, its index.
Rephrased for rhetorical effect, this machine’s name is its behavior.†
9.4 Corollary There is an m ∈ N such that ϕm (x) = m for all inputs x .
Proof Consider the function ψ (x, y) = x . As the flowchart on the left illustrates, it
is computable.‡ So by Church’s Thesis there is a Turing machine that computes it.
Let that machine have index e , so that ψ (x, y) = ϕ e (x, y) = x .
[Flowchart, left: Start → Read x , y → Print x → End. Right: Start → Read y →
Print x → End.]
9.5 Remark Every Turing machine has some index number but here the index is
related to its machine’s behavior. Imagine finding that in our numbering scheme,
machine P7 outputs 7 on all inputs. This may seem to be an accident of the choice
of scheme. But it isn’t an accident; the corollary says that something like this must
happen for any acceptable numbering.
The Fixed Point Theorem is deep, showing surprising and interesting behaviors
that occur in any sufficiently powerful computation system. For instance, since a
Turing machine’s index is source-equivalent, the prior result raises the question
of whether there is a program that prints its own source, that self-reproduces. In
addition to the discussion below, Extra C has more.
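The answer to that question is yes. Here is one classic Python quine (one form among many); run on its own, it prints exactly its own two-line source.

```python
# A classic Python quine: s is a format template that, applied to itself,
# reproduces the whole two-line program.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick mirrors the proof: the template is both used (it is formatted and run) and mentioned (its own text, via %r, is spliced into the output).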
Discussion The Fixed Point Theorem and its proof are often considered mysterious,
or at any rate obscure. Here we will expand on a few points.
One aspect that bears explication is how it employs the use-mention distinction.
Compare the sentence “Atlantis is a mythical city” to “There are two a’s in ‘Atlantis’ ”. In
the first, we say that ‘Atlantis’ is used because it has a value, it points to something.
In the second sentence ‘Atlantis’ is not referring to something — its value is itself —
so we say that it is mentioned.§
†
Here, ‘name’ is used as an equivalent of ‘index’ that is meant to be evocative. ‡ In this argument
perhaps the flowchart is overkill since the function is obviously computable. But when it is not obvious,
as in the prior result and in some of the exercises, we need an outline of how to compute the function.
§
A version of this comes up in programming books. If such a book has the sentence, “The number
of players is players” then the first ‘players’ refers to people while the second is a variable from the
program. There the computer code is in a typewriter font, as is a standard practice, because quoting
would be awkward and ugly.
The x and y variables are being considered at different levels of meaning than
ordinary variables. On one level, x refers to the contents of 123, while on another
level it is about the contents of those contents, what’s in address 901.
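We can sketch these levels of meaning with memory modeled as a dictionary from addresses to contents. The addresses 123 and 901 are the ones mentioned above; the stored values are invented for illustration.

```python
# Memory modeled as a dictionary from addresses to contents. The addresses
# 123 and 901 are the ones the passage mentions; the stored values are invented.
memory = {123: 901, 901: 7}

x = 123                        # x holds the address 123
level_one = memory[x]          # the contents of 123: the value 901, itself an address
level_two = memory[level_one]  # the contents of those contents: what's in 901

assert (level_one, level_two) == (901, 7)
```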
As to the role played by the use-mention distinction in the Fixed Point Theorem,
the proof starts by taking g(e) to be the name of this procedure.
ϕ g(e) (x) = ϕ s(e0,e) (x) = ϕ ϕe (e) (x) if ϕ e (e)↓, and ↑ otherwise
Don’t be fooled by the notation; it is not the case that g(e) equals ϕ e (e) but
instead g(e) is an index of the flowchart on the right in the proof, describing the
procedure that computes the function above. Regardless of whether ϕ e (e)↓, we can
nonetheless compute the index g(e) and from it the instructions for the function.
There is an analogy here with Atlantis — despite that the referred-to city doesn’t
exist we can still sensibly assert things about its name.
†
Using the * operator to access the value stored at a pointer is called dereferencing that pointer. There
is a matching referencing operator, &, that gives the address of an existing variable.
Informally, what g(e) names is, “Given input x , run Pe on input e and if it halts
with output w then run Pw on input x .” Shorter: “Produce ϕ e (e) and then do
ϕ e (e).”
Next, from f we consider the composition and give it a name f ◦ g = ϕv .
Substituting v into the prior paragraph gives that g(v) names, “Compute ϕv (v)
and then do ϕv (v).” That’s the same as “Compute f ◦ g (v) and then do f ◦ g (v).”
Note the self-reference; it may naively appear that to compute g(v) we need
to compute g(v), that the instructions for g(v) paradoxically contain themselves as a
subpart.
Then g(v) first computes the name of f ◦ g (v) and after that runs the machine
numbered f ◦ g (v). So g(v) and f ◦ g (v) are names for machines that compute
the same function. Thus g(v) does not contain itself; more precisely, the set of
instructions for computing g(v) does not contain itself. Instead, it contains a name
for the instructions for computing itself.
II.9 Exercises
✓ 9.7 Your friend asks you about the proof of the Fixed Point Theorem, Theorem 9.1.
“The last line says ϕ g(v) = ϕ ϕv (v) ; isn’t this just saying that g(v) = ϕv (v)? Why
the circumlocution?” Help them out.
✓ 9.8 Show each. (a) There is an index e such that ϕ e = ϕ e+7 . (b) There is an e
such that ϕ e = ϕ 2e .
9.9 What conclusion can you draw about acceptable enumerations of Turing
machines by applying the Fixed Point Theorem to each of these? (a) the tripling
function x ↦ 3x (b) the squaring function x ↦ x 2 (c) the function that gives
0 except for x = 5, when it gives 1 (d) the constant function x ↦ 42
9.10 We will prove that there is an m such that Wm = {x | ϕm (x)↓} = {m 2 }.
(a) You want to show that there is a uniformly computable family of functions
like this: ϕ s(e,x ) (y) = 42 if y = x 2 , and ↑ otherwise.
Extra
II.A Hilbert’s Hotel
Once upon a time there was an infinite hotel. The rooms were numbered 0, 1,
. . . , naturally. One day every room was occupied when someone new came to the
front desk; could the hotel accommodate? The clerk hit on the idea of moving
each guest up a room, that is, moving the guest in room n to room n + 1. With
that, room 0 was empty. So this hotel always has space for a new guest, or a finite
number of new guests.
Next a bus rolls in with infinitely many people p0 , p1 , ... The clerk has
the idea to move each guest to a room with twice the number, putting
the guest from room n into room 2n . Now the odd-numbered rooms are
empty, so pi can go in room 2i + 1, and everyone has a room.
Then in rolls a convoy of buses, infinitely many of them, each with
infinitely many people: B 0 = {p0, 0 , p0, 1 , ... }, and B 1 = {p1, 0 , p1, 1 , ... },
etc. By now the spirit is clear: move each current guest to a new room
with twice the number and the new people go into the odd-numbered
rooms, in the breadth-first order that we use to count N × N.
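The clerk’s schemes are just functions on room numbers, and we can spot-check a finite portion of them for clashes in code. The pairing formula below is the usual breadth-first (Cantor) count, one reasonable way to make the story’s scheme explicit.

```python
def room_for_current_guest(n):
    # Each current guest moves from room n to room 2n, freeing the odd rooms.
    return 2 * n

def room_for_convoy_passenger(i, j):
    # Passenger p_{i,j} from bus B_i: count the pairs (i, j) breadth-first
    # (the Cantor pairing), then take the corresponding odd-numbered room.
    k = (i + j) * (i + j + 1) // 2 + j
    return 2 * k + 1

# Spot-check a finite portion: guests get even rooms, passengers odd rooms,
# and no two people are assigned the same room.
guests = {room_for_current_guest(n) for n in range(100)}
passengers = {room_for_convoy_passenger(i, j) for i in range(50) for j in range(50)}
assert len(guests) == 100 and len(passengers) == 2500
assert guests.isdisjoint(passengers)
```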
After this experience the clerk may well suppose that there is always
room in the infinite hotel, that it can fit any set of guests at all, with a
sufficiently clever method. Restated, this story makes natural the guess
that all infinite sets have the same cardinality. That guess is wrong.
There are sets so large that their members could not all fit in the hotel.
One such set is R.†
[Marginal illustration: Plenty of empty rooms in this hotel.]
†
Alas, the infinite hotel does not now exist. The guest in room 0 said that the guest from room 1 would
cover both of their bills. The guest from room 1 said yes, but in addition the guest from room 2 had
agreed to pay for all three rooms. Room 2 said that room 3 would pay, etc. So Hilbert’s Hotel made no
money despite having infinitely many rooms, or perhaps because of it.
II.A Exercises
A.1 Imagine the hotel is empty. A hundred buses arrive, where bus Bi contains
passengers bi, 0 , bi, 1 , etc. Give a scheme for putting them in rooms.
A.2 Give a formula assigning a room to each person from the infinite bus convoy.
A.3 The hotel builds a parking lot. Each floor Fi has infinitely many spaces fi, 0 ,
fi, 1 , . . . And, no surprise, there are infinitely many floors F 0 , F 1 , . . . One day
the hotel is empty and buses arrive, one per parking space, each with infinitely
many people. Give a way to accommodate all these people.
A.4 The management is irked that this hotel cannot fit all of the real numbers. So
they announce plans for a new hotel, with a room for each r ∈ R. Can they now
cover every possible set of guests?
Extra
II.B The Halting problem in Wider Culture
The Halting problem and related results are about limits. In the light of
Church’s Thesis, they say that there are things that we can never do. To help
understand their impact on the intellectual world outside mathematics as well as
inside we can place them in a historical setting.
With Napoleon’s downfall in the early 1800’s, many people in Europe felt
a swing back to a sense of order and optimism, fueled by progress.† For
example, in the history of Turing’s native England, Queen Victoria’s reign
from 1837 to 1901 seemed to many English commentators to be an extended
period of prosperity and peace. Across wider Europe, people perceived that
the natural world was being tamed with science and engineering — witness
the introduction of steam railways in 1825, the opening of the Suez Canal
in 1869, and the invention of the electric light in 1879.‡
Queen Victoria opens the Great Exhibition of the Works of Industry of All Nations, 1851.
In science this optimism was expressed by A A Michel-
son, who wrote in 1899, “The more important fundamental laws and facts of
physical science have all been discovered, and these are now so firmly established
that the possibility of their ever being supplanted in consequence of new discoveries
is exceedingly remote.”
The twentieth century physicist R Feynman has likened science to working out
the rules of a game by watching it being played, “to try to understand nature is to
imagine that the gods are playing some great game like chess. . . . And you don’t
know the rules of the game, but you’re allowed to look at the board from time to
time, in a little corner, perhaps. And from these observations, you try to figure out
what the rules are of the game.” Around the year 1900 many observers thought
that we basically had got the rules and that although there might remain a couple
of obscure things like castling, those would be worked out soon enough.
† These statements are in the context of European intellectual culture, the context in which early Theory
of Computation results appeared. A broader view is outside our scope. ‡ This is not to say that this
perception is justified. Disease and poverty were rampant, colonialism and imperialism ruined the lives
of millions, for much of the time the horrors of industrial slavery in the US south went unchecked, and
Europe was hardly an oasis of calm, with for instance the revolutions of 1848. Nonetheless the zeitgeist
included a sense of progress, of winning.
In Mathematics, this view was most famously voiced in an address
given by Hilbert in 1930, “We must not believe those, who today, with
philosophical bearing and deliberative tone, prophesy the fall of culture and
accept the ignorabimus. For us there is no ignorabimus, and in my opinion
none whatever in natural science. In opposition to the foolish ignorabimus
our slogan shall be: We must know — we will know.” (‘Ignorabimus’ means
‘that which we must be forever ignorant of’ or ‘that thing we will never
fully penetrate’.)† There was of course a range of opinion but the zeitgeist
was that we could expect that any question would be settled, and perhaps
soon.
D Hilbert, 1862–1943
But starting in the early 1900’s, that changed. Exhibit A is the
picture to the right. That the modern mastery of mechanisms can
have terrible effect became apparent to everyone during World
War I, 1914–1918. Ten million military men died. Overall, seventeen
million people died. With universal conscription, probably the men
in this picture did not want to be here. They were killed by a man
who probably also did not want to be here, who never knew that
he killed them, and who simply entered coordinates into a firing
mechanism. If you were at those coordinates, it didn’t matter how
brave you were, or how strong, or how right was your cause — you
died. All that stuff about your people and honor and god, that was
all bullshit. The zeitgeist shifted: Pandora’s box had opened and
the world is not at all ordered, reasoned, or sensible.
World War I German dead in a trench.
At something like the same time in science, Michelson’s assertion that physics
was a solved problem was destroyed by the discovery of radiation. This brought in
quantum theory, which has at its heart that there are events that are completely
random, that included the uncertainty principle, and that led to the atom bomb.
With Einstein we see most directly the shift in wider intellectual culture away
from a sense of unlimited progress. After experiments during a solar eclipse in 1919
provided strong support for his theories, Einstein became an overnight celebrity
(“Einstein Theory Triumphs” was the headline in The New York Times). He was
† Below we will cite some things as turning points that occur before 1930; how can that be? For one
thing, cultural shifts always involve muddled timelines. For another, this is Hilbert’s retirement address
so we can reasonably take his as a lagging view. Finally, in Mathematics the shift occurred later than in
the general culture. We mark that shift with the announcement of Gödel’s Incompleteness Theorem.
That announcement came at the same meeting as Hilbert’s speech, on the day before. Gödel was in the
audience for Hilbert’s address and whispered to O Taussky-Todd, “He doesn’t get it.”
seen by the public as having changed our view of the universe from Newtonian
clockwork to one where “everything is relative.” His work showed that the universe
has limits and that everyday perceptions break down: nothing can travel faster
than light, time bends, and even the commonsense idea of two things happening
at the same instant falls apart.
In the general culture there were many reflections of this
loss of certainty. For example, the generation of writers and
artists who came of age in World War I — including Eliot,
Fitzgerald, Hemingway, Pound, and Stein — became known
as the Lost Generation. They expressed their experience
through themes of alienation, isolation, and dismay at the
corruption they saw around them. In music, composers such
as Debussy, Mahler, and Strauss broke with the traditional
expressive forms, in ways that were often hard for listeners to
understand — Stravinsky’s Rite of Spring caused a near riot at
its premiere in 1913. As for art, the painting here dramatically
shows that visual artists also picked up on these themes.
Salvador Dali’s 1931 Persistence of Memory. Depicts relativity’s warping of a pillar of reality, time itself.
In mathematics, much the same inversion of the standing order happened
in 1930 with K Gödel’s announcement of the Incompleteness Theorem.
This says that if we fix a (sufficiently strong) formal system such as the
elementary number theory of N with addition and multiplication then
there are statements that, while true in the system, cannot be proved in
that system. The theorem is clearly about what cannot be done — there
are things that are true that we shall never prove. This statement of hard
limits seemed to many observers to be especially striking in mathematics,
which had held a traditional place as the most solid of knowledge. For example,
I Kant said, “I assert that in any particular natural science, one encounters genuine
scientific substance only to the extent that mathematics is present.”
Gödel and his best friend.
Gödel’s Theorem is closely related to the Halting problem. In a mathematical
proof, each step must be verifiable as either an axiom or as a deduction that is valid
from the prior steps. So proving a mathematical theorem is a kind of computation.†
Thus, Gödel’s Theorem and other uncomputability results are in the same vein.
To people at the time these results were deeply shocking, revolutionary. And
while we work in an intellectual culture that has absorbed this shock, we must still
recognize them as bedrock.
†
This implies that you could start with all of the axioms and apply all of the logic rules to get a set of
theorems. Then application of all of the logic rules to those will give all the second-rank theorems, etc.
In this way, by dovetailing from the axioms you can in principle computably enumerate the theorems.
Extra
II.C Self Reproduction
Paley’s watch In 1802, W Paley famously argued for the existence of a god from a
perception of unexplained order in the natural world.
In crossing a heath, . . . suppose I had found a watch upon the ground . . . [W]hen
we come to inspect the watch we perceive . . . that its several parts are framed and put
together for a purpose, e.g., that they are so formed and adjusted as to produce motion,
and that motion so regulated as to point out the hour of the day . . . the inference we
think is inevitable, that the watch must have a maker — that there must have existed,
at some time and at some place or other, an artificer or artificers who formed it for the
purpose which we find it actually to answer, who comprehended its construction and
designed its use.
The marks of design are too strong to be got over. Design must have had a designer.
That designer must have been a person. That person is GOD.
Paley then gives his strongest argument, that the most incredible
thing in the natural world, that which distinguishes living things from
stones or machines, is that they can, if given a chance, self-reproduce.
Suppose, in the next place, that the person, who found the watch,
would, after some time, discover, that, in addition to all the properties
which he had hitherto observed in it, it possessed the unexpected property
of producing, in the course of its movement, another watch like itself . . . If
that construction without this property, or which is the same thing, before
this property had been noticed, proved intention and art to have been
employed about it; still more strong would the proof appear, when he
came to the knowledge of this further property, the crown and perfection
of all the rest.
William Paley, 1743–1805
This argument was very influential before the discovery by Darwin and
Wallace of descent with modification through natural selection. It shows that from
among all the things in the natural world to marvel at — the graceful shell of a
nautilus, the precision of an eagle’s eye, or consciousness — the greatest wonder
for many observers was self-reproduction.
This is close. Just escape some newlines and quotation marks.|| This program,
try3.c, works.
†
The easiest such program finds its source file on the disk and prints it. That is cheating. ‡ The
backslash-n gives a newline character. § The char *e="..." construct gives a string. In the C
language printf(...) command the first argument is a string. In that string double quotes expand to
single quotes, %c takes a character substitution from any following arguments, and %s takes a string
substitution. || The 10 is the ASCII encoding for newline and 34 is ASCII for a double quotation mark.
Quines are possible in any complete model of computation; the exercises ask
for them in a few languages.
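For instance, here is a Python quine, a standard construction rather than anything from the text. It is shown without comments, since a comment would itself have to appear in the output; the string s is the data, and print(s % s) substitutes s into itself.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly those two lines, so the output is the source.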
Know thyself A program that prints itself can seem to be a parlor trick. But
for routines to have access to their code is useful. For example, to write a
toString(obj) method you probably want your method to ask obj for its source.
Another example, more nefarious, is a computer virus that transmits copies of its
code to other machines.
We will show how a routine can know its source. We will start with an alternate
presentation of a machine that prints itself.
First, two technical points. One is that given two programs we can combine them
into one, so that we run the first and then run the second. The other point is that we
have fixed a numbering of Turing machines that is ‘acceptable’, meaning that there
is a computable function from indices to machines and another computable function
back. Write T for the set of Turing machines and let the function str : T → B∗
input a Turing machine and return a standard bitstring representation of that
machine (i.e., its source), let machine : N → T input an index e and return the
machine Pe , and let idx : B∗ → N input the string representation of Pe and return
the index of that machine, e (if the input string doesn’t represent a Turing machine
then it doesn’t matter what this function does). Do this in such a way that idx is
the inverse of the function str ◦ machine.
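As a toy illustration of these three functions, take the ‘machines’ to simply be bitstrings listed in shortlex order. This is only a sketch with made-up names (machine, str_, idx); a real acceptable numbering encodes actual Turing machines.

```python
def machine(e):
    # the e-th bitstring in shortlex order: epsilon, 0, 1, 00, 01, ...
    # obtained by writing e + 1 in binary and dropping the leading 1
    return bin(e + 1)[3:]

def str_(m):
    # the 'standard representation' of a machine; here just the identity
    return m

def idx(s):
    # recover the index; the inverse of str_ composed with machine
    return int('1' + s, 2) - 1
```

Then idx(str_(machine(e))) == e for every e, which is the inverse property asked of idx.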
Consider the machines sketched below. The first computes the function
echo(σ ) = σ . Let it have index e 0 and apply the s-m-n Theorem to get the family
of machines sketched in the middle, Ps(e0,σ ) , each of which ignores its input and
just prints σ .
Left (echo): Start → Read σ → Print σ → End
Middle (Ps(e0, σ)): Start → Print σ → End
Right: Start → Read σ → Erase σ → Print str ◦ machine(s(e0, σ)) → End
On the right, s(e 0 , σ ) is the index of the middle machine so str ◦ machine(s(e 0 , σ ))
is the standard string representation of the middle machine for σ . Thus, if σ is
the standard representation of a Turing machine P then when the machine on
the right is done, the tape will contain only the standard representation of the
middle machine for σ , the machine that ignores its input and prints out σ . Call the
machine on the right Q and call the function that it computes q : Σ∗ → Σ∗ .
The machine that prints itself is a combination of two machines, A and B.
Here’s B.
Start → Read β → Compute α = q(β) → Print α ⌢ β → End
The other machine is A = Ps(e0, str(B)) , which ignores anything on the tape and
prints out the string representation of B.
The action of the combination on an empty tape is that first the A part prints
out the standard string representation str(B). Then B reads it in as β , computes
α = str(A), concatenates the two string representations α ⌢ β , and prints it. This is
the string representation of itself, of the combination of A with B.
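The A-then-B structure can be transcribed into Python (a sketch, with names of our choosing). The string literal assigned to b is what A prints, the representation of B; the exec(b) line then acts as B, recomputing the b = ... line, which plays the role of α, and printing it followed by b itself, the role of β. Comments are omitted because the output must match the source exactly.

```python
b = 'print("b = %r\\nexec(b)" % b)'
exec(b)
```

The output is the two source lines, the representation of the combination of the data with the doer.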
To get a machine that computes with its own source we will extend this
approach. The idea is to start with a machine C that takes two inputs, a string
representation of a machine and a string, and then get the desired machine D that
uses its own representation.
C.1 Theorem For any Turing machine C that computes a two-input function
c : Σ∗ × Σ∗ → Σ∗ there is a machine D that computes a one-input function
d : Σ∗ → Σ∗ where d(ω) = c(str(D), ω).
The machine D is the combination of three machines, A, B, and C . First, as
shown on the left, modify Q to write its output after a string already on the tape,
because we need to leave the input ω on the tape.
Left: Start → Read σ → . . . → End
Right: Start → Read ω, τ → . . . → End
Second, modify A to be Ps(e0, str(B)⌢str(C )) , which ignores anything on the tape and
prints out the string representation of the combination of machines B and C .
Next, as shown on the right, modify B to input two strings, ω and τ . Apply q to
the second to compute A’s standard representation α . Print out the concatenation
of α and τ , then a blank, and then the input ω . That has the form of two
inputs; finish by running the machine C on them.
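The construction of Theorem C.1 is short in software. Below is a Python sketch in which the two-input function c is a made-up example that just measures its first argument; the template t plays the combined role of A and B, and src = t % t rebuilds the full program text, which is then handed to c together with the ordinary input.

```python
def c(src, w):
    return "len=%d input=%r" % (len(src), w)

t = 'def c(src, w):\n    return "len=%%d input=%%r" %% (len(src), w)\n\nt = %r\nsrc = t %% t\nprint(c(src, "hi"))'
src = t % t
print(c(src, "hi"))
```

The string src is itself a complete program: run on its own, it applies c to its own source, so it computes d(ω) = c(str(D), ω).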
Verbing In English you can accomplish a self-reference with, “This sentence has
32 characters.” But formal languages such as programming languages usually don’t
have a self-reference operator like the ‘this’ in that sentence. The above discussion
shows that no such operator is necessary. We can also use those techniques in
English, as here.
Print out two copies of the following, the second in quotes: “Print out two copies of
the following, the second in quotes:”
The verb ‘to quine’ means “to write a sentence fragment a first time, and then
to write it a second time, but with quotation marks around it.” For example, from
‘say’ we get “say ‘say’ ”. And, quining ‘quine’ gives “quine ‘quine’.”
In this linguistic analogy of the self-reproducing programs, the word plays the
role of the data, the part played by the machine A or the part played by try3.c’s
string char *e. In the slogan “Produce the machine, and then do the machine,”
they are the ‘produce’ part. The machine B plays the role of the verb ‘quine’, and is
the ‘do’ part.
Reflections on Trusting Trust K Thompson is one of the two main creators
of the UNIX operating system. For this and other accomplishments
he won the Turing Award, the highest honor in computer science. He
began his acceptance address with this.
In college, before video games, we would amuse ourselves by posing
programming exercises. One of the favorites was to write the shortest
self-reproducing program. . . .
More precisely stated, the problem is to write a source program that,
when compiled and executed, will produce as output an exact copy of its
source. If you have never done this, I urge you to try it on your own. The
discovery of how to do it is a revelation that far surpasses any benefit obtained by being
told how to do it. The part about “shortest” was just an incentive to demonstrate skill
and determine a winner.
Ken Thompson, b 1943
This celebrated essay develops a quine and goes on to show how the existence
of such code poses a security threat that is very subtle and just about undetectable.
The entire address (Thompson 1984) is widely available; everyone should read it.
II.C Exercises
C.2 Produce a Scheme quine.
C.3 Produce a Python quine.
C.4 Consider a Scheme function diag that is given a string σ and returns a
string with each instance of x in σ replaced with a quoted version of σ . Thus
diag("hello x world") returns hello ’hello x world’ world. Show that
print(diag('print(diag(x))')) is a quine.
C.5 Write a program that defines a function f taking a string as input, and
produces its output by applying f to its source code. For example, if f reverses
the given string, then the program should output its source code backwards.
C.6 Write a two-level polyglot quine, a program in one language that outputs a
program in a second language, which outputs the original program.
Extra
II.D Busy Beaver
Here is a try at solving the Halting problem: “For any n ∈ N the set of Turing
machines having n many tuples or fewer is finite. For some members of this set
Pe (e) halts and for some members it does not, but because the set is finite the list of
which Turing machines halt must also be finite. Finite sets are computable. So to
solve the Halting problem, given a Turing Machine P , find how many instructions
it has and just compute the associated finite halting information set.” The problem
with this plan is uniformity, or rather lack of it — there is no single computable
function that accepts inputs of the form ⟨n, e⟩ and that outputs 1 if the n -instruction
machine Pe (e) halts, or 0 otherwise.
The natural adjustment of that plan, the uniform attack, is to start all of the
machines having n or fewer instructions and dovetail their computations until no
more of them will ever converge.
That is, consider D : N → N, where D(n) is the minimal number of steps after
which all of the n -instruction machines that will ever converge have done so. We
can prove that D is not computable. For, assume otherwise. Then to compute
whether Pe halts on input e , find how many instructions n are in the machine Pe ,
compute D(n), and run Pe (e) for D(n)-many steps. If Pe (e) has not halted by then,
it never will. Of course, this contradicts the unsolvability of the Halting problem.
The function D may seem like just another uncomputable function; why is it
especially enlightening? Observe that if a function D̂ has values larger than D , if
D̂(n) ≥ D(n) for all sufficiently large n, then D̂ is also not computable. This gives
us an insight into one way that functions can fail to be computable: they can grow
too fast.†
So, which n -line program is the most productive? The Busy
Beaver problem is: which n -state Turing Machine leaves the most
1’s after halting, when started on an empty tape?
Think of this as a competition — who can write the busiest
machine? To have a competition we need precise rules, which
differ in unimportant ways from the conventions we have adopted
in this book. So we fix a definition of Turing Machines where
there is a single tape that is unbounded at one end, there are
two tape symbols 1 and B, and where transitions are of the form
∆(state, tape symbol) = ⟨state, tape symbol, head shift⟩ .
Rare moment of rest.
Busy Beaver is unsolvable Write Σ(n) for the largest number of 1’s that any
n state machine, when started on a blank tape, leaves on the tape after halting.
Write S(n) for the most moves, that is, transitions.
Why isn’t Σ computable? The obvious thing is to do a breadth-first search: there
are finitely many n -state machines, start them all on a blank tape, and await
developments.
† Note the connection with the Ackermann function: we showed that it is not primitive recursive because
it grows faster than any primitive recursive function.
That won’t work because some of the machines won’t halt. At any
moment you have some machines that have halted and you can see
how many 1’s are on each such tape, so you know the longest so far.
But as to the not-yet-halted ones, who knows? You can by-hand see
that this one or that one will never halt and so you can figure out the
answer for n = 1 or n = 2. But there is no algorithm to decide the
question for an arbitrary number of states.
D.1 Theorem (Radó, 1962) The function Σ is not computable.
Tibor Radó, 1895–1965
Proof Let f : N → N be computable. We will show that Σ ≠ f by
showing that Σ(n) > f (n) for infinitely many n .
First note that there is a Turing Machine Mj having j many states
that writes j -many 1’s to a blank tape. For instance, here is M4 .
q0 →(B,1,R) q1 →(B,1,R) q2 →(B,1,R) q3 →(B,1,R) Halt
Also note that we can compose two Turing machines, combining the final states
of the first machine with the start state of the second, so that the composite
performs the first machine and then the second.
Next fix a function F : N → N, say F (m) = max{f (0), ... , f (m)} + m² + 1.
It has the properties: if 0 < m then f (m) < F (m), and m² ≤ F (m), and F (m) <
F (m + 1). It is intuitively computable so Church’s Thesis says there is a Turing
machine MF that computes it. Let that machine have nF many states.
Now consider the Turing machine P that performs Mj and follows that with
the machine MF , and then follows that with another copy of the machine MF . If
started on a blank tape this machine will first produce j -many 1’s, then produce
F (j)-many 1’s, and finally will leave the tape with F (F (j))-many 1’s. Thus its
productivity is F (F (j)). It has j + 2nF many states.
Compare that with the j + 2nF -state Busy Beaver machine. By definition
F (F (j)) ≤ Σ(j + 2nF ). Because nF is constant (it is the number of states in the
machine MF ), the relation j + 2nF ≤ j² < F (j) holds for sufficiently large j .
Since F is strictly increasing, F (j + 2nF ) < F (F (j)). Combining gives f (j + 2nF ) <
F (j + 2nF ) < F (F (j)) ≤ Σ(j + 2nF ), as required.
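The arithmetic in this proof can be sanity-checked numerically. The check below, a sketch, uses one suitable choice of F, namely F(m) = max{f(0), ..., f(m)} + m² + 1, along with a sample f and a stand-in value for nF; any F with the three listed properties would do.

```python
def make_F(f):
    # F dominates f, dominates m^2, and is strictly increasing
    def F(m):
        return max(f(i) for i in range(m + 1)) + m * m + 1
    return F

f = lambda m: 3 * m + 5     # a sample computable function
F = make_F(f)
nF = 10                     # stand-in for the number of states of MF

for j in range(7, 60):      # j large enough that j + 2*nF <= j*j < F(j)
    assert j + 2 * nF <= j * j < F(j)
    assert f(j + 2 * nF) < F(j + 2 * nF) < F(F(j))
```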
What is known That Σ(0) = 0 and Σ(1) = 1 follow straight from the definition.
(The convention is to not count the halt state, so Σ(0) refers to a machine consisting
only of a halting state.) Radó noted in his 1962 paper that Σ(2) = 4. In 1964 Radó
and Lin showed that Σ(3) = 6.
D.2 Example This is the three-state Busy Beaver machine (the halt state is q3).
∆    B          1
q0   q1, 1, R   q3, 1, R
q1   q2, B, R   q1, 1, R
q2   q2, 1, L   q0, 1, L
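A few lines of Python let us check such a machine mechanically. This sketch runs the standard published three-state champion on a two-way-infinite tape, the usual Busy Beaver convention, with q3 as the halt state, and reports how many 1's remain and how many steps were taken.

```python
# transition function: (state, symbol) -> (new state, written symbol, head move)
delta = {
    ('q0', 'B'): ('q1', '1', +1), ('q0', '1'): ('q3', '1', +1),
    ('q1', 'B'): ('q2', 'B', +1), ('q1', '1'): ('q1', '1', +1),
    ('q2', 'B'): ('q2', '1', -1), ('q2', '1'): ('q0', '1', -1),
}
tape, pos, state, steps = {}, 0, 'q0', 0
while state != 'q3':                      # q3 is the halt state
    sym = tape.get(pos, 'B')              # unwritten cells read as blank
    state, tape[pos], move = delta[(state, sym)]
    pos += move
    steps += 1
print(sum(v == '1' for v in tape.values()), steps)   # prints: 6 14
```

So the machine leaves six 1's, in fourteen steps.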
In 1983 A Brady showed that Σ(4) = 13. As to Σ(5), even today no one knows.
Here are the current world records.
n      1   2   3    4     5              6
Σ(n)   1   4   6    13    ≥ 4 098        ≥ 1.29 × 10^865
S(n)   1   6   21   107   ≥ 47 176 870   ≥ 3 × 10^1730
Not only are Busy Beaver numbers very hard to compute, at some point they
become impossible. In 2016, A Yedidia and S Aaronson obtained an n for which Σ(n)
is unknowable. To do that, they created a programming language where programs
compile down to Turing machines. With this, they constructed a 7918-state
Turing machine that halts if there is a contradiction within the standard axioms
for Mathematics, and never halts if those axioms are consistent. We believe that
these axioms are consistent, so we believe that this machine doesn’t halt. However,
Gödel’s Second Incompleteness Theorem shows that there is no way to prove the
axioms are consistent using the axioms themselves, so Σ(n) is unknowable in that
even if we were given the number n , we could not use our axioms to prove that it
is right, to prove that this machine halts.
So one way for a function to fail to be computable is if it grows faster than
any computable function. Note, however, that this is not the only way. There are
functions that grow slower than some computable function but are nonetheless not
computable.
II.D Exercises
✓ D.3 Give the computation history, the sequence of configurations, that come from
running the three-state Busy Beaver machine. Hint: you can run it on the Turing
machine simulator.
✓ D.4 (a) How many Turing machines with tape alphabet { B, 1 } are there having
one state? (b) Two? (c) How many with n states?
D.5 How many Turing machines are there, with a tape alphabet Σ of n characters
and having n states?
D.6 Show that there are uncomputable functions that grow slower than some
computable function. Hint: There are uncountably many functions with output in
the set B.
D.7 Give a diagonal construction of a function that is greater than any computable
function.
Extra
II.E Cantor in Code
The definitions of cardinality and countability do not require that the functions
must be effective. In this section we effectivize, counting sets such as { 0, 1 } × N and
N × N using functions that are mechanically computable. The most straightforward
way to show that these functions can be computed is to exhibit code, so here it is.
Scheme’s let creates a local variable.
(use numbers)
We will need both the map and its inverse, which goes from the number to
the pair. Here is the routine that inverts cantor. The let* variant allows us to
compute the local variable t by using the local variable d computed before it, in
the prior line.
;; xy given the cantor number, return (x y)
(define (xy c)
(let* ((d (diag-num c))
(t (triangle-num d)))
(list (- c t)
(- d (- c t)))))
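For comparison, the pairing and its inverse can be sketched in Python, matching the convention used above in which the pair (x, y) receives the number triangle(x + y) + x.

```python
import math

def cantor(x, y):
    d = x + y                        # which diagonal the pair lies on
    return d * (d + 1) // 2 + x      # triangle number plus offset

def xy(c):
    d = (math.isqrt(8 * c + 1) - 1) // 2   # diag-num
    t = d * (d + 1) // 2                   # triangle-num
    x = c - t
    return [x, d - x]
```

Here math.isqrt works in exact integers, which sidesteps the floating point issues that the footnote describes for the Scheme version.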
; xy-4 Un-number quads: give (x0 x1 x2 x3) so that (cantor-4 x0 x1 x2 x3) => c
(define (xy-4 c)
(let ((pr (xy c)))
(cons (car pr)
(xy-3 (cadr pr)))))
What the heck, let’s extend to tuples of any size. We don’t need these but they
are fun. The cantor-n routine takes a tuple of any length and outputs the Cantor
number of that tuple. Also there is xy-arity that takes two inputs, the length of a
tuple and its Cantor number, and produces the tuple.
†
The code for diag-num has two implementation details of interest. One is that in Scheme the floor
function returns a floating point number. We want xy to be the inverse of cantor, which inputs
integers, so we want diag-num to return an integer. That explains the inexact->exact conversion.
The second detail is that the code leads to numbers large enough to give floating point overflows.
For instance, (cantor-n 1 2 3 4 5 6 7) returns 1.05590697087673e+55. So the code shown has
the naive version of diag-num commented out and instead uses a library for bignums, integers of
unbounded size.
;; These routines generalize: number any tuple, or find the tuple corresponding
;; to a number.
;; The only ugliness is that the empty tuple is unique, so there is only
;; one tuple of that arity.
;; xy-arity return the list of the given arity making the cantor number c
;; If arity=0 then only c=0 is valid (others return #f)
(define (xy-arity arity c)
(cond ((= 0 arity)
(if (= 0 c )
'()
(begin
(display "ERROR: xy-arity with arity=0 requires c=0") (newline)
#f)))
((= 1 arity) (list c))
(else (cons (car (xy c))
(xy-arity (- arity 1) (cadr (xy c)))))))
The xy-arity routine is not uniform in that it covers only one arity at a time.
Said another way, xy-arity is not the inverse of cantor-n in that we have to tell
it the tuple’s arity.
To cover tuples of all lengths we define two matched routines, cantor-omega
and xy-omega that communicate using a simple data structure, a pair where
the first element is the length of the tuple and the second is the tuple’s cantor
number. These two are correspondences between the natural numbers and the set
of sequences of natural numbers. They are inverse.
;; cantor-omega encode the arity in the first component
(define (cantor-omega . tuple)
(let ((arity (length tuple)))
(cond ((= arity 0) (cantor 0 0))
((= arity 1) (cantor 0 (+ 1 (car tuple))))
(else
(let ((newtuple (list (- arity 1)
(apply cantor-n tuple))))
(apply cantor newtuple))))))
#;5> (xy-omega 1)
(0)
#;6> (xy-omega 2)
(0 0)
#;7> (xy-omega 3)
(1)
#;8> (xy-omega 4)
(0 1)
#;9> (xy-omega 5)
(0 0 0)
#;10> (cantor-omega 1 2 3 4)
12693900784
#;11> (xy (cantor-omega 1 2 3 4))
(4 159331)
#;12> (xy-omega (cantor-omega 1 2 3 4))
(1 2 3 4)
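A Python transcription of the same scheme is below. It is a sketch: the helper names are ours, and the tests against it reproduce the small examples in the transcript above.

```python
import math

def cantor(x, y):
    d = x + y
    return d * (d + 1) // 2 + x

def xy(c):
    d = (math.isqrt(8 * c + 1) - 1) // 2
    t = d * (d + 1) // 2
    return (c - t, d - (c - t))

def cantor_n(t):
    # number a tuple of length >= 1 by nesting pairs to the right
    return t[0] if len(t) == 1 else cantor(t[0], cantor_n(t[1:]))

def cantor_omega(t):
    # encode the arity in the first component
    if len(t) == 0:
        return cantor(0, 0)
    if len(t) == 1:
        return cantor(0, t[0] + 1)
    return cantor(len(t) - 1, cantor_n(t))

def xy_arity(arity, c):
    if arity == 1:
        return (c,)
    x, rest = xy(c)
    return (x,) + xy_arity(arity - 1, rest)

def xy_omega(c):
    a, v = xy(c)
    if a == 0:
        return () if v == 0 else (v - 1,)
    return xy_arity(a + 1, v)
```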
The transcript below illustrates. The last line associates the number 2558 with a
three-element quadlist.
#;1> (natural->quad 1)
(0 0 0 1)
#;2> (natural->quad 2)
(1 0 0 0)
#;3> (natural->quad 3)
(0 1 0 0)
#;4> (cantor-omega 3 2 1)
2558
#;5> (get-nth-quadlist 2558)
((0 1 0 0) (1 0 0 0) (0 0 0 1))
two numbers. The second condition implies the first so we check here only the
second.
This routine checks for determinism by sorting the quadlist alphabetically,
so that if there are two quad’s beginning with the same pair they will then be
adjacent. Checking for adjacent quad’s with the same first two elements only
requires walking once down the list.
;; quadlist-is-deterministic? Is the list of quads deterministic?
;; qlist list of length 4 lists of numbers
(define (quadlist-is-deterministic? qlist)
(let ((sorted-qlist (sort qlist quad-less?)))
(quadlist-is-deterministic-helper sorted-qlist)))
;; quadlist-is-deterministic-helper look for adjacent quads that differ
;; sq sorted list of quads
(define (quadlist-is-deterministic-helper sq)
(cond
((null? sq) #t)
((= 1 (length sq)) #t)
((first-two-equal? (car sq) (cadr sq)) #f)
(else (quadlist-is-deterministic-helper (cdr sq)))))
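The same check reads naturally in Python; this sketch sorts the quads and scans once for an adjacent pair that agrees in the first two slots.

```python
def quadlist_is_deterministic(qlist):
    # sort so quads sharing (present state, present symbol) become adjacent,
    # then look for any adjacent pair agreeing in the first two slots
    s = sorted(qlist)
    return all(a[:2] != b[:2] for a, b in zip(s, s[1:]))
```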
With that, here is the function that takes in a Turing machine as a list of quads
and finds a natural number index for that Turing machine, along with the inverse
function machine, taking an index to its Turing machine.
;; godel Return the index number of Turing machine tm
(define (godel tm)
(let ((c 0))
(do ((dex 0 (+ 1 dex)))
((equal? tm (car (tm-next c))) dex)
(set! c (+ 1 (cadr (tm-next c)))))))
; Here to reuse if a bug appears
; (display "godel tm-next=") (write (tm-next dex)) (newline)
#;2> (machine 0)
()
#;3> (machine 1)
((0 0 0 0))
#;4> (machine 2)
((0 0 0 1))
#;5> (machine 3)
((1 0 0 0))
#;6> (godel '((0 0 0 0)))
1
#;7> (godel '((0 1 1 1)))
298
These rely on helper routines. Handling the states is trivial: for instance,
a0 = 0 and a3 = 0 translate to the state q 0 .
(define (nat->inst-zero i)
i)
(define (inst->nat-zero i)
i)
(define (nat->inst-three i)
i)
(define (inst->nat-three i)
i)
The present tape character Tp has three possibilities. It can be a blank, which
we associate with a1 = 0. Second, for readability we allow lower case letters a–z,
which we associate with a1 = 1 through a1 = 26. Finally, for higher-numbered
a1’s we just punt and write them as natural numbers. For instance, a1 = 27 is
associated with Tp = 0.
(define ASCII-a (char->integer #\a))
(define (nat->inst-one i)
(cond
((= i 0) #\B)
((and (> i 0) (<= i 26))
(integer->char (+ (- i 1) ASCII-a)))
(else (- i 27))))
(define (inst->nat-one i)
(cond
((equal? i #\B) 0)
((char? i) (+ 1 (- (char->integer i) ASCII-a)))
(else (+ i 27))))
Note Scheme’s notation for characters: for instance, #\a and #\B represent the
characters a and B.
The tape-next description Tn is much the same, except that it also can be L or R.
(define (nat->inst-two i)
(cond
((= i 0) #\L)
((= i 1) #\R)
((= i 2) #\B)
((and (> i 2) (<= i 28))
(integer->char (+ (- i 3) ASCII-a)))
(else (- i 29))))
(define (inst->nat-two i)
(cond
((equal? i #\L) 0)
((equal? i #\R) 1)
((equal? i #\B) 2)
((char? i) (+ 3 (- (char->integer i) ASCII-a)))
(else (+ i 29))))
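As a cross-check on the coding, here is the same symbol correspondence rendered in Python. This sketch is only for independent verification (the book's code is Scheme); it uses one-character strings where Scheme has characters.

```python
# Mirror of nat->inst-two / inst->nat-two from the text, for checking.
ASCII_A = ord('a')

def nat_to_inst_two(i):
    # 0, 1, 2 stand for L, R, and the blank; 3..28 are the letters a..z;
    # anything larger is kept as a plain natural number.
    if i == 0: return 'L'
    if i == 1: return 'R'
    if i == 2: return 'B'
    if 3 <= i <= 28: return chr(i - 3 + ASCII_A)
    return i - 29

def inst_to_nat_two(c):
    if c == 'L': return 0
    if c == 'R': return 1
    if c == 'B': return 2
    if isinstance(c, str): return 3 + ord(c) - ASCII_A
    return c + 29

# round trip: every code number comes back unchanged
assert all(inst_to_nat_two(nat_to_inst_two(i)) == i for i in range(200))
```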
The machine here is simple; if started on a blank tape it writes an a and then
halts in the next step.
#;1> (instructionlist->quadlist '((0 #\B #\a 0) (0 #\a #\a 1)))
((0 0 3 0) (0 1 3 1))
The list (machine 0), (machine 1), . . . contains all the Turing machines.
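The enumeration rests on matching N with the set of all quads. The book's tm-next routine is not shown in this excerpt; the Python sketch below shows the standard Cantor pairing function that such an enumeration can be built on. It is an illustrative assumption, not the book's own code.

```python
# Cantor's pairing function matches N x N with N, diagonal by diagonal.
def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # invert by finding the diagonal d with d(d+1)/2 <= z
    d = 0
    while (d + 1) * (d + 2) // 2 <= z:
        d += 1
    y = z - d * (d + 1) // 2
    return (d - y, y)

# every natural number decodes to exactly one pair
assert all(pair(*unpair(z)) == z for z in range(200))

# nesting pairs turns a quad <a0, a1, a2, a3> into a single natural number
def quad_to_nat(q):
    a0, a1, a2, a3 = q
    return pair(pair(a0, a1), pair(a2, a3))
```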
II.E Exercises
E.1 The code for machine, the routine that inputs a natural number and produces
the Turing machine corresponding to that number, is slow. Find how long it
takes to produce Pn for the numbers n = 0, 100, ... , 700. You can use, e.g.,
(time (machine 100)). Graph n against the time.
E.2 What does Turing machine number 666 do? Does it halt on input 0? On
input 666?
E.3 The set of Turing machines can be numbered in ways other than the one
given here. One is to use the same coding of states and tape symbols but instead
of leveraging Cantor’s correspondence, it uses the powers of primes to get the
final index. For instance, the Turing machine P = { q0B1q0, q011q1 } has the
two quads (0 0 4 0) and (0 3 4 1). We can take the index of P to be the
natural number 2^1 · 3^1 · 5^5 · 7^1 · 11^1 · 13^4 · 17^5 · 19^2 (we add 1 to the exponents because if
we did not then we could not tell whether the four-tuple ⟨0, 0, 0, 0⟩ is one of
the instructions). (a) What are some advantages and disadvantages of the two
encodings? (b) Compute the index of the example P under this encoding.
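For experimenting with part (b), here is a Python sketch of the prime-power encoding described in the exercise; the helper names are mine.

```python
def nth_primes(n):
    """First n primes, by trial division (fine for small n)."""
    ps = []
    candidate = 2
    while len(ps) < n:
        if all(candidate % p for p in ps):
            ps.append(candidate)
        candidate += 1
    return ps

def prime_power_index(quads):
    """Raise the k-th prime to (k-th entry + 1), over all flattened entries."""
    entries = [e for quad in quads for e in quad]
    ps = nth_primes(len(entries))
    index = 1
    for p, e in zip(ps, entries):
        index *= p ** (e + 1)   # +1 so a zero entry still leaves a trace
    return index

# the example machine P from the exercise
assert prime_power_index([(0, 0, 4, 0), (0, 3, 4, 1)]) == \
    2**1 * 3**1 * 5**5 * 7**1 * 11**1 * 13**4 * 17**5 * 19**2
```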
Part Two
Automata
Chapter
III Languages
Turing machines input strings and output strings, sequences of tape symbols. So a
natural way to work is to represent a problem as a string, put it on the tape, run a
computation, and end with the solution as a string.
Everyday computers work the same way. Consider a program that finds the
shortest driving distance between cities. Probably we work by inputting the map
distances as strings of symbols and inputting the desired two cities as two strings,
and after running the program we have the output directions as a string. So strings,
and collections of strings, are essential.
Section
III.1 Languages
Our machines input and output strings of symbols. We take a symbol (sometimes
called a token) to be an atomic unit that a machine can read and write.† On
everyday binary computers the symbols are the bits, 0 and 1. An alphabet is a
nonempty and finite set of symbols. We usually denote an alphabet with the upper
case Greek letter Σ, although an exception is the alphabet of bits, B = { 0, 1 }. A
string over an alphabet is a sequence of symbols from that alphabet. We use lower
case Greek letters such as σ and τ to denote strings. We use ε to denote the empty
string, the length zero sequence of symbols. The set of all strings over Σ is Σ∗ .‡
1.1 Definition A language L over an alphabet Σ is a set of strings drawn from that
alphabet. That is, L ⊆ Σ∗ .
1.2 Example The set of bitstrings that begin with 1 is L = { 1, 10, 11, 100, ... }.
1.3 Example Another language over B is the finite set { 1000001, 1100001 }.
1.4 Example Let Σ = { a, b }. The language consisting of strings where the number of
a’s is twice the number of b’s is L = {ε, aab, aba, baa, aaaabb, ... }.
1.5 Example Let Σ = { a, b, c }. The language of length-two strings over that alphabet
is L = Σ^2 = { aa, ab, ... , cc }. Over the same alphabet this is the language of
length-three strings whose characters come in ascending order.
Image: The Tower of Babel, by Pieter Bruegel the Elder (1563) † We can imagine Turing’s clerk
calculating without reading and writing symbols, for instance by keeping track of information by having
elephants move to the left side of a road or to the right. But we could translate any such procedure into
one using marks that our mechanism’s read/write head can handle. So readability and writeability are
not essential but we require them in the definition of symbols as a convenience; after all, elephants are
inconvenient. ‡ For more on strings see the Appendix on page 354.
{ aaa, bbb, ccc, aab, aac, abb, abc, acc, bbc, bcc }
(It is not that the set is sorted in ascending order since sets don’t have an order.
Instead, each string has its characters come in ascending order.)
1.6 Definition A palindrome is a string that reads the same forwards as backwards.
Some words from English that are palindromes are ‘kayak’, ‘noon’, and ‘racecar’.
1.7 Example The language of palindromes over Σ = { a, b } is L = { σ ∈ Σ∗ | σ = σ^R }.
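A sketch of the membership test, in Python: a string is in the palindrome language exactly when it equals its own reversal.

```python
def is_palindrome(s):
    """Membership test for the language of palindromes over any alphabet."""
    return s == s[::-1]

assert is_palindrome('kayak') and is_palindrome('noon') and is_palindrome('racecar')
assert not is_palindrome('ab')
assert is_palindrome('')   # the empty string reads the same both ways
```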
L = { a^i b^j c^k ∈ Σ∗ | i, j, k ∈ N and i^2 + j^2 = k^2 }
1.11 Example Fix an alphabet Σ. The collection of all finite languages over that
alphabet is a class.
1.12 Example Let Pe be a Turing machine, using the input alphabet Σ = { B, 1 }. The
set of strings Le = { σ ∈ Σ∗ | Pe halts on input σ } is a language. The collection of
all such languages, of the Le for all e ∈ N, is the class of computably enumerable
languages over Σ.
We next consider operations on languages. They are sets so the operations
of union, intersection, etc., apply. However, for instance the union of a language
over { a }∗ with a language over { b }∗ is an awkward marriage, a combination of
strings of a’s with strings of b’s. That is, the union of a language over Σ0 with
a language over Σ1 is a language over Σ0 ∪ Σ1 . The same thing happens for
intersection.
1.14 Example Where the language is the set of bitstrings L = { 1000001, 1100001 }
then the reversal is L^R = { 1000001, 1000011 }.
1.15 Example If the language L consists of two strings { a, bc } then the second power
of that language is L^2 = { aa, abc, bca, bcbc }. Its Kleene star is
L∗ = { ε, a, aa, bc, aaa, abc, bca, ... }.
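These operations are easy to experiment with. The following Python sketch is illustrative, not from the text; it computes powers of a language and the strings of its Kleene star up to a length bound.

```python
def concat(L1, L2):
    """Concatenation of languages: every string of L1 followed by one of L2."""
    return {s + t for s in L1 for t in L2}

def power(L, n):
    result = {''}               # L^0 is the language holding only epsilon
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, max_len):
    """All strings of L* having length at most max_len."""
    seen, frontier = {''}, {''}
    while frontier:
        frontier = {s for s in concat(frontier, L)
                    if len(s) <= max_len and s not in seen}
        seen |= frontier
    return seen

L = {'a', 'bc'}
assert power(L, 2) == {'aa', 'abc', 'bca', 'bcbc'}   # Example 1.15's L^2
assert star_up_to(L, 2) == {'', 'a', 'aa', 'bc'}
```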
Deciding a language requires that on any input the machine correctly computes
all ‘yes’ and all ‘no’ answers, while recognizing a language requires only that it
correctly computes all ‘yes’ answers.
III.1 Exercises
1.17 List five of the shortest strings in each language, if there are five.
(a) { σ ∈ B∗ | the number of 0’s plus the number of 1’s equals 3 }
(b) { σ ∈ B∗ | σ’s first and last characters are equal }
L̂ = { σ ⌢ σ^R | σ ∈ Σ∗ }?
1.35 For any language L ⊆ Σ∗ we can form the set of prefixes.
Section
III.2 Grammars
We have defined that a language is a set of strings. But this allows for any willy-nilly
set. In practice a language is usually given by rules.
Here is an example. Native English speakers will say that the noun phrase
“the big red barn” sounds fine but that “the red big barn” sounds wrong. That is,
sentences in natural languages are constructed in patterns and the second of those
does not follow the English pattern. Artificial languages such as programming
languages also have syntax rules, usually very strict rules.
A grammar is a set of rules for the formation of strings in a language, that is, it
is an analysis of the structure of a language. In an aphorism, grammars are the
language of languages.
The rules use two different components. The ones written in typewriter type,
such as young, are from the alphabet Σ of the language. These are terminals. The
ones written with angle brackets and in italics, such as ⟨article⟩ , are nonterminals.
These are like variables, and are used for intermediate steps.
The two symbols ‘→’ and ‘|’ are neither terminals nor nonterminals. They are
metacharacters, part of the syntax of the rules themselves.
These rewrite rules govern the derivation of strings in the language. Under
the English grammar every derivation starts with ⟨sentence⟩ . Along the way,
intermediate strings contain a mix of nonterminals and terminals. The rules
all have a head with a single nonterminal. So to derive the next string, pick a
nonterminal in the present string and substitute an associated rule body.
⟨sentence⟩ ⇒ ⟨noun phrase⟩ ⟨verb phrase⟩
⇒ ⟨article⟩ ⟨adjective⟩ ⟨noun⟩ ⟨verb phrase⟩
⇒ the ⟨adjective⟩ ⟨noun⟩ ⟨verb phrase⟩
⇒ the young ⟨noun⟩ ⟨verb phrase⟩
⇒ the young man ⟨verb phrase⟩
⇒ the young man ⟨verb⟩ ⟨noun phrase⟩
⇒ the young man caught ⟨noun phrase⟩
⇒ the young man caught ⟨article⟩ ⟨noun⟩
⇒ the young man caught the ⟨noun⟩
⇒ the young man caught the ball
Note that the single line arrow → is for rules, while the double line arrow ⇒ is for
derivations.†
The derivation above always substitutes for the leftmost nonterminal, so it is a
leftmost derivation. However, in general we could substitute for any nonterminal.
The derivation tree or parse tree is an alternative representation.‡
(Parse tree omitted: its root is ⟨sentence⟩ and its leaves, read left to right, spell
the young man caught the ball.)
⟨expr⟩ ⇒ ⟨term⟩
⇒ ⟨term⟩ * ⟨factor⟩
⇒ ⟨factor⟩ * ⟨factor⟩
⇒ x * ⟨factor⟩
⇒ x * ( ⟨expr⟩ )
⇒ x * ( ⟨term⟩ + ⟨expr⟩ )
⇒ x * ( ⟨term⟩ + ⟨term⟩ )
⇒ x * ( ⟨factor⟩ + ⟨term⟩ )
⇒ x * ( ⟨factor⟩ + ⟨factor⟩ )
⇒ x * ( y + ⟨factor⟩ )
⇒ x * ( y + z )
(Parse tree omitted: its root is ⟨expr⟩ and its leaves spell x * ( y + z ).)
In that example the rules for ⟨expr⟩ and ⟨term⟩ are recursive. But we don’t
get stuck in an infinite regress because the question is not whether you could
perversely keep expanding ⟨expr⟩ forever; the question is whether, given a string
such as x*(y+z), you can find a terminating derivation.
In the prior example the nonterminals such as ⟨expr⟩ or ⟨term⟩ describe the
role of those components in the language, as did the English grammar fragment’s
⟨noun phrase⟩ and ⟨article⟩ . But in the examples and exercises below we often use
small grammars whose terminals and nonterminals do not have any particular
meaning. For these cases, we often move from the verbose notation like ‘ ⟨sentence⟩
→ ⟨noun phrase⟩ ⟨verb phrase⟩ ’ to writing single letters, with nonterminals in
upper case and terminals in lower case.
2.4 Example This two-rule grammar has one nonterminal, S.
S → aSb | ε
Here is a derivation of the string a^2b^2.
S ⇒ aSb ⇒ aaSbb ⇒ aabb
A similar derivation gives a^3b^3. For this grammar, derivable strings have the
form a^n b^n for n ∈ N.
We next give a complete description of how the production rules govern the
derivations. Each rule in a context free grammar has the form ‘head → body’
where the head consists of a single nonterminal. The body is a sequence of
terminals and nonterminals. Each step of a derivation has the form below, where
τ0 and τ1 are sequences of terminals and non-terminals.
τ0 ⌢ head ⌢τ1 ⇒ τ0 ⌢ body ⌢τ1
That is, if there is a match for the rule’s head then we can replace it with the body.
Where σ0 , σ1 are sequences of terminals and nonterminals, if they are related
by a sequence of derivation steps then we may write σ0 ⇒∗ σ1 . Where σ0 = S is
the start symbol, if there is a derivation σ0 ⇒∗ σ1 that finishes with a string of
terminals σ1 ∈ Σ∗ then we say that σ1 has a derivation from the grammar.†
This description is like the one on page 8 detailing how a Turing machine’s
instructions determine the evolution of the sequence of configurations that is a
computation. That is, production rules are like a program, directing a derivation.
However, one difference from that page’s description is that there Turing machines
are deterministic, so that from a given input string there is a determined sequence
of configurations. Here, from a given start symbol a derivation can branch out to
go to many different ending strings.
2.5 Definition The language derived from a grammar is the set of strings of
terminals having derivations that begin with the start symbol.
2.6 Example This grammar’s language is the set of representations of natural numbers.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
This is a derivation for the string 321, along with its parse tree.
†
This definition of rules, grammars, and derivations suffices for us but it is not the most general one.
One more general definition allows heads of the form σ0 X σ1 , where σ0 and σ1 are strings of terminals.
(The σi ’s can be empty.) For example, consider this grammar: (i) S → aBSc | abc, (ii) Ba →
aB, (iii) Bb → bb. Rule (ii) says that wherever you see the substring Ba you can
replace it with aB. Grammars with heads of the form σ0 X σ1 are context
sensitive because we can only substitute for X in the context of σ0 and σ1 . These grammars describe
more languages than the context free ones that we are using. But our definition satisfies our needs and
is the class of grammars that you will see in practice.
⟨natural⟩ ⇒ ⟨digit⟩⟨natural⟩
⇒ 3 ⟨natural⟩
⇒ 3 ⟨digit⟩⟨natural⟩
⇒ 32 ⟨natural⟩
⇒ 32 ⟨digit⟩
⇒ 321
(Parse tree omitted: its root is ⟨natural⟩ and its leaves are 3, 2, and 1.)
2.7 Example This grammar’s language is the set of strings representing natural
numbers in unary.
⟨natural⟩ → ε | 1 ⟨natural⟩
2.8 Example Any finite language is derived from a grammar. This one gives the
language of all length 2 bitstrings, using the brute force approach of just listing all
the member strings.
S → 00 | 01 | 10 | 11
This gives the length 3 bitstrings by using the nonterminals to keep count.
A → 0B | 1B
B → 0C | 1C
C → 0|1
2.9 Example For this grammar
S → aSb | T | U
T → aS | a
U → Sb | b
an alternative is to replace T and U by their expansions to get this.
S → aSb | aS | a | Sb | b
It generates the language L = { a^i b^j ∈ { a, b }∗ | i ≠ 0 or j ≠ 0 }.
The prior example is the first one where the generated language is not clear
so we will do a formal verification. We will show mutual containment, that the
generated language is a subset of L and that it is also a superset. The rule that
eliminates T and U shows that any derivation step τ0 ⌢ head ⌢τ1 ⇒ τ0 ⌢ body ⌢τ1
only adds a’s on the left and b’s on the right, so every string in the language has
the form ai bj . That same rule shows that in any terminating derivation S must
eventually be replaced by either a or b. Together these two give that the generated
language is a subset of L.
For containment the other way, we will prove that every σ ∈ L has a derivation.
We will use induction on the length |σ|. The base case is |σ| = 1, where by the
definition of L the string σ is a or b, each of which has a one-step derivation from S.
2.10 Example The fact that derivations can go more than one way leads to an important
issue with grammars, that they can be ambiguous. Consider this fragment of a
grammar for if statements in a C-like language
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
and this code string.
if enrolled(s) if studied(s) grade='P' else grade='F'
This string has two readings. In one the else goes with the inner if
    if enrolled(s)
        if studied(s)
            grade='P'
        else
            grade='F'
while in the other it goes with the outer if.
    if enrolled(s)
        if studied(s)
            grade='P'
    else
        grade='F'
(Two parse trees omitted: both derive a + b * c from ⟨expr⟩; one groups b * c
together while the other groups a + b together.)
Again, the issue is that we get two different behaviors. For instance, substitute 1
for a, and 2 for b, and 3 for c. The left tree gives 1 + (2 · 3) = 7 while the right
tree gives (1 + 2) · 3 = 9.
In contrast, this grammar for elementary algebra expressions is unambiguous.
⟨expr⟩ → ⟨expr⟩ + ⟨term⟩
| ⟨term⟩
⟨term⟩ → ⟨term⟩ * ⟨factor⟩
| ⟨factor⟩
⟨factor⟩ → ( ⟨expr⟩ )
| a | b | ... | z
Choosing grammars that are not ambiguous is important in practice.
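One payoff of an unambiguous grammar is that it translates directly into a recursive-descent evaluator, with one function per nonterminal; because ⟨term⟩ sits below ⟨expr⟩, the * operator binds more tightly than +. This Python sketch is illustrative: it replaces the grammar's left recursion with the equivalent iteration, and supplies letter values through a hypothetical environment so the result can be checked.

```python
def evaluate(s, env):
    """Evaluate an expression in the unambiguous expr/term/factor grammar."""
    pos = 0
    def peek():
        return s[pos] if pos < len(s) else None
    def expr():                       # <expr> -> <term> { + <term> }
        nonlocal pos
        value = term()
        while peek() == '+':
            pos += 1
            value += term()
        return value
    def term():                       # <term> -> <factor> { * <factor> }
        nonlocal pos
        value = factor()
        while peek() == '*':
            pos += 1
            value *= factor()
        return value
    def factor():                     # <factor> -> ( <expr> ) | letter
        nonlocal pos
        if peek() == '(':
            pos += 1
            value = expr()
            pos += 1                  # skip the closing ')'
            return value
        value = env[s[pos]]           # look the letter up in the environment
        pos += 1
        return value
    return expr()

env = {'a': 1, 'b': 2, 'c': 3}
assert evaluate('a+b*c', env) == 7    # the grammar forces 1 + (2*3)
assert evaluate('(a+b)*c', env) == 9  # parentheses override that grouping
```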
III.2 Exercises
✓ 2.12 Use the grammar of Example 2.3. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar, besides the string
in the example. (f) Give three strings in the language { +, *, ), (, a ... , z }∗ that
cannot be derived.
2.13 Use the grammar of Exercise 2.15. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar besides the ones
in the exercise, or show that there are not three such strings. (f) Give three
strings in the language L = { σ ∈ (Σ ∪ { space })∗ | Σ is the set of terminals } that
cannot be derived from this grammar, or show there are not three such strings.
2.14 Use this grammar.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | 1 | . . . | 9
(a) What is the alphabet? What are the terminals? The nonterminals? What
is the start symbol? (b) For each production, name the head and the body.
(c) Which are the metacharacters that are used? (d) Derive 42. Also give its
parse tree. (e) Derive 993 and give the associated parse tree. (f) How can
⟨natural⟩ be defined in terms of ⟨natural⟩ ? Doesn’t that lead to infinite regress?
(g) Extend this grammar to cover the integers. (h) With this grammar, can you
derive +0? -0?
✓ 2.15 From this grammar
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct object⟩
⟨direct object⟩ → ⟨article⟩ ⟨noun⟩
⟨article⟩ → the | a
⟨noun⟩ → car | wall
⟨verb⟩ → hit
derive each of these: (a) the car hit a wall (b) the car hit the wall
(c) the wall hit a car.
2.16 Consider the language generated by this grammar.
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun1⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct-object⟩
⟨direct-object⟩ → ⟨article⟩ ⟨noun2⟩
⟨article⟩ → the | a | ε
⟨noun1⟩ → dog | flea
⟨noun2⟩ → man | dog
⟨verb⟩ → bites | licks
(a) Give a derivation for dog bites man.
(b) Show that there is no derivation for man bites dog.
✓ 2.17 Your friend tries the prior exercise and you see their work so far.
⟨sentence⟩ ⇒ ⟨subject⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨verb⟩ ⟨direct object⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨noun2⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨man|dog⟩
Stop them and explain what they are doing wrong.
2.18 With the grammar of Example 2.3, derive (a+b)*c.
✓ 2.19 Use this grammar
S → TbU
T → aT | ε
U → aU | bU | ε
for each part. (a) Give both a leftmost derivation and rightmost derivation of
aabab. (b) Do the same for baab. (c) Show that there is no derivation of aa.
2.20 Use this grammar.
S → aABb
A → aA | a
B → Bb | b
(a) Derive three strings.
(b) Name three strings over Σ = { a, b } that are not derivable.
(c) Describe the language generated by this grammar.
2.21 Give a grammar for the language { a^n b^(n+m) a^m | n, m ∈ N }.
✓ 2.22 Give the parse tree for the derivation of aabb in Example 2.4.
2.23 Verify that the language derived from the grammar in Example 2.4 is
L = { a^n b^n | n ∈ N }.
2.24 What is the language generated by this grammar?
A → aA | B
B → bB | cA
✓ 2.25 In many programming languages identifier names consist of a string of letters
or digits, with the restriction that the first character must be a letter. Create a
grammar for this, using ASCII letters.
2.26 Early programming languages had strong restrictions on what could be a
variable name. Create a grammar for a language that consists of strings of at
most four characters, upper case ASCII letters or digits, where the first character
must be a letter.
2.27 What is the language generated by a grammar with a set of production rules
that is empty?
separate rules for every valid address in the world, which is just silly.)
2.30 Recall Turing’s prototype computer, a clerk doing the symbolic manipulations
to multiply two large numbers. Deriving a string from a grammar has a similar
feel and we can write grammars to do computations. Fix the alphabet Σ = { 1 },
so that we can interpret derived strings as numbers represented in unary.
(a) Produce a grammar whose language is the even numbers, { 1^(2n) | n ∈ N }.
E → E-E |a |b
(b) Derive a-b-a from this grammar, which is unambiguous.
E → E-T |T
T → a |b
2.37 Use the grammar from the footnote on page 153 to derive aaabbbccc.
Section
III.3 Graphs
Researchers in the Theory of Computation often state their problems, and the
solution of those problems, in the language of Graph Theory. Here are two examples
we have already seen. Both have vertices connected by edges that represent a
relationship between the vertices.
(Figures omitted: the first is the transition graph of a Turing machine, with
states q0, q1, q2, q3 and edges labeled by tape actions such as B,L and 1,R; the
second is the parse tree of x*(y+z) from the previous section.)
3.2 Example This simple graph G has five vertices N = {v 0 , ... , v 4 } and eight edges.
E = { {v0, v1}, {v0, v2}, ... , {v3, v4} }
(Picture omitted: the five vertices with the eight edges drawn.)
Important: a graph is not its picture. Both of these pictures show the same graph
†
Graphs can have infinitely many vertices but we won’t ever need them. For convenience of notation
we will stick to finite ones.
as above because they show the same vertices and the same connections.
(Two pictures omitted, each drawing the same five vertices and edges in a
different arrangement.)
Instead of writing e = {v, v̂ } we often write e = vv̂ . Since sets are unordered
we could write the same edge as e = v̂v .
3.3 Definition Two graph edges are adjacent if they share a vertex, so that they are
uv and vw . A walk is a sequence of adjacent edges ⟨v 0v 1 , v 1v 2 , ... , vn−1vn ⟩ . Its
length is the number of edges, n. If the initial vertex v0 equals the final vertex vn
then it is a closed walk, otherwise it is open. If no edge occurs twice then it is a trail.
If a trail’s vertices are distinct, except possibly that the initial vertex equals the
final vertex, then it is a path. A closed path with at least one edge is a cycle. A
graph is connected if between any two vertices there is a path.
3.4 Example On the left is a path from u 0 to u 3 ; it is also a trail and a walk. On the
right is a cycle.
(Pictures omitted: on the left, a graph on vertices u0, u1, u2, u3 with a
highlighted path from u0 to u3; on the right, a cycle through the vertices
v0, ... , v7.)
There are many variations of Definition 3.1, used for modeling circumstances
that a simple graph cannot model. One variant allows that some vertices to connect
to themselves, forming a loop. Another is a multigraph, which allows two vertices
to have more than one edge between them.
Still another is a weighted graph, which gives each edge a real number
weight, perhaps signifying the distance or the cost in money or in time
to traverse that edge.
A very common variation is a directed graph or digraph, where edges
have a direction, as in a road map that includes one-way streets. In a
digraph, if an edge is directed from v to v̂ then we can write it as vv̂ but
not in the other order. The Turing machine at the start of this section is
a digraph and also has loops.
Some important graph variations involve the nature of the connections.
A tree is an undirected connected graph with no cycles. At the start of
this section is a syntax tree. A directed acyclic graph or DAG is a directed
graph with no directed cycles. (Margin illustration courtesy xkcd.com.)
3.5 Definition From one vertex v 0 , another vertex v 1 is reachable if there is a path
from the first to the second.
3.6 Definition In a graph, a circuit is a closed walk that either contains all of the
edges, making it an Euler circuit, or all of the vertices, making it a Hamiltonian
circuit.
3.7 Example The graph on the right of Example 3.4 is a Hamiltonian circuit but not
an Euler circuit.
3.8 Definition Where G = ⟨N , E⟩ is a graph, a subgraph Ĝ = ⟨N̂ , Ê⟩ satisfies
N̂ ⊆ N and Ê ⊆ E. A subgraph with every possible edge, that is, one where
for every e = vi vj ∈ E with vi, vj ∈ N̂ we have e ∈ Ê, is an induced subgraph.
3.9 Example In the graph G on the left of Example 3.4, consider the highlighted path
with edge set Ê = { u0u1, u1u3 }. Taking those edges along with the vertices that
they contain, N̂ = { u0, u1, u3 }, gives a subgraph Ĝ.
Also in G , the induced subgraph involving the set of vertices {u 0 , u 2 , u 3 } is the
outer triangle.
            v0 v1 v2 v3 v4
       v0 (  0  1  1  0  0 )
       v1 (  1  0  1  1  1 )
M(G) = v2 (  1  1  0  1  1 )      (∗)
       v3 (  0  1  1  0  1 )
       v4 (  0  1  1  1  0 )
3.10 Definition For a graph G , the adjacency matrix M(G ) representing the graph
has i, j entries equal to the number of edges from vi to v j .
This definition covers graph variants that were listed earlier. For instance, the
graph represented in (∗) is a simple graph because the matrix has only 0 and 1
entries, because all the diagonal entries are 0, and because the matrix is symmetric,
meaning that the i, j entry has a 1 if and only if the j, i entry is also 1. If a graph has
a loop then the matrix has a diagonal entry that is a positive integer. If the graph
is directed and has a one-way edge from vi to v j then the i, j entry records that
edge but the j, i entry does not. And for a multigraph, where there are multiple
edges from one vertex to another, the associated entry will be larger than 1.
3.11 Lemma Let the matrix M(G) represent the graph G. Then in its n-th matrix
power the i, j entry is the number of walks of length n from vertex vi to
vertex vj.
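The lemma can be checked numerically on the matrix (∗) above. A Python sketch, illustrative only:

```python
def mat_mul(A, B):
    """Product of square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# the adjacency matrix (*) of the five-vertex graph
M = [[0, 1, 1, 0, 0],
     [1, 0, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [0, 1, 1, 0, 1],
     [0, 1, 1, 1, 0]]

M2 = mat_mul(M, M)
# v0 has two length-2 walks back to itself, one through v1 and one through v2
assert M2[0][0] == 2
# each diagonal entry of the square is that vertex's degree
assert all(M2[i][i] == sum(M[i]) for i in range(5))
```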
On the right the graph has no 3-coloring. The argument goes: the four vertices are
completely connected to each other. If two get the same color then they will be
adjacent same-colored vertices. So a coloring requires four colors.
3.13 Example This shows five committees, where some committees share some
members. How many time slots do we need in order to schedule all committees so
no members must be in two places at once?
Committee A: Armis, Jones, Smith
Committee B: Crump, Edwards, Robinson
Committee C: Burke, Frank, Ke
Committee D: India, Harris, Smith
Committee E: Burke, Jones, Robinson
Model this with a graph by taking each vertex to be a committee, and if two
committees share a member then putting an edge between them.
(Picture omitted: vertices A, ... , E with an edge between each pair of
committees that share a member.)
The picture shows that three colors are enough, that is, three time slots suffice.
Graph isomorphism We sometimes want to know when two graphs are essentially
identical. Consider these two.
(Pictures omitted: on the left, a graph on vertices v0, ... , v5 drawn as two rows
of three; on the right, a graph on vertices w0, ... , w5 drawn as a hexagon.)
They have the same number of vertices and the same number of edges. Further, on
the right as well as on the left there are two classes of vertices where all the vertices
in the first class connect to all the vertices in the second class (on the left the two
classes are the top and bottom rows while on the right they are {w 0 , w 2 , w 4 } and
{w 1 , w 3 , w 5 }). A person may suspect that as in Example 3.2 these are two ways to
draw the same graph, with the vertex names changed for further obfuscation.
That’s true: if we make a correspondence between the vertices in this way
Vertex on left v0 v1 v2 v3 v4 v5
Vertex on right w0 w2 w4 w1 w3 w5
then as a consequence the edges also correspond.
Edge on left          {v0, v3} {v0, v4} {v0, v5} {v1, v3} {v1, v4} {v1, v5}
Edge on right         {w0, w1} {w0, w3} {w0, w5} {w2, w1} {w2, w3} {w2, w5}
Edge on left (cont.)  {v2, v3} {v2, v4} {v2, v5}
Edge on right         {w4, w1} {w4, w3} {w4, w5}
3.14 Definition Two graphs G and Ĝ are isomorphic if there is a one-to-one and onto
map f : N → N̂ such that G has an edge {vi , v j } ∈ E if and only if Ĝ has the
associated edge { f (vi ), f (v j ) } ∈ Ê .
To verify that two graphs are isomorphic the most natural thing is to produce
the map f and then verify that in consequence the edges also correspond. The
exercises have examples.
Showing that graphs are not isomorphic usually entails finding some graph-
theoretic way in which they differ. A common and useful such property is to
consider the degree of a vertex, the total number of edges touching that vertex with
the proviso that a loop from the vertex to itself counts as two. The degree sequence
of a graph is the non-increasing sequence of its vertex degrees. Thus, the graph in
Example 3.13 has degree sequence ⟨3, 2, 1, 1, 1⟩ . Exercise 3.32 shows that if graphs
are isomorphic then associated vertices have the same degree and thus graphs with
different degree sequences are not isomorphic. Also, if the degree sequences are
equal then they help us construct an isomorphism, if there is one; examples of this
are in the exercises. (Note, though, that there are graphs with the same degree
sequence that are not isomorphic.)
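The degree sequence is cheap to compute from an edge list, which makes it a handy first check before hunting for an isomorphism. A Python sketch, using the committee graph of Example 3.13 (the edge list below is read off the table of shared members):

```python
from collections import Counter

def degree_sequence(edges):
    """Non-increasing list of vertex degrees, from a list of edge pairs."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return sorted(degrees.values(), reverse=True)

# committees share members: A-D (Smith), A-E (Jones), B-E (Robinson), C-E (Burke)
committee_edges = [('A', 'D'), ('A', 'E'), ('B', 'E'), ('C', 'E')]
assert degree_sequence(committee_edges) == [3, 2, 1, 1, 1]
```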
III.3 Exercises
✓ 3.15 Draw a picture of a graph illustrating each relationship. Some graphs will be
digraphs, or may have loops or multiple edges between some pairs of vertices.
(Picture omitted: a graph on vertices v0, ... , v9.)
subgraphs with four nodes and four edges. (d) Find all induced subgraphs with
four nodes and four edges.
3.19 A graph is a collection of vertices and edges, not a drawing. So a single
graph may have quite different pictures. Consider a graph G with the vertices
N = {A, ... H } and these edges.
(Edge list and picture omitted.)
(b) A planar graph is one that can be drawn in the plane so that its edges do not
cross. Show that G is planar.
3.20 Fill in the table’s blanks.
✓ 3.21 Morse code represents text with a combination of a short sound, written ‘.’
and pronounced “dit,” and a long sound, written ‘-’ and pronounced “dah.” Here
are the representations of the twenty-six English letters.
Some representations are prefixes of others. Give the graph for the prefix relation.
3.22 Show that every tree has a 2-coloring.
3.23 A person keeps six species of fish as pets. Species A cannot be in a tank with
species B or C . Species B cannot be with A, C , or E . Species C cannot be with A,
B , D , or E . Species D cannot be with C or F . Species E cannot be together with
B , C , or F . Finally, species F cannot be in with D or E . (a) Draw the graph where
the nodes are species and the edges represent the relation ‘cannot be together’.
(b) Find the chromatic number. (c) Interpret it.
✓ 3.24 If two cell towers are within line of sight of each other then they must get
different frequencies. Here each tower is a vertex and an edge between towers
denotes that they can see each other.
(Picture omitted: eleven towers v0, ... , v10 with an edge between each pair of
towers in line of sight.)
( 0 1 1 0 )
( 1 0 0 1 )
( 1 0 0 1 )
( 0 1 1 0 )
(Pictures for several intervening exercises omitted.)
(f) Use the prior result to show that the two graphs of Example 3.4 are not
isomorphic.
As in the final item, in arguments we often use the contrapositive of these
statements. For instance, the first item implies that if they do not have the same
number of vertices then they are not isomorphic.
3.33 Prove Lemma 3.11.
(a) An edge is a length-1 walk. Show that in the product of the matrix with
itself, M(G)^2, the entry i, j is the number of length-two walks.
(b) Show that for n > 2, the i, j entry of the power M(G)^n equals the number
of length-n walks from vi to vj.
3.34 Consider these two graphs, G0 and G1 .
(Pictures omitted: G0 has vertices v0, ... , v7 and G1 has vertices n0, ... , n7.)
Find the image, under the correspondence, of the edges of G0 . Do they match
the edges of G1 ?
(e) Of course, failure of any one proposed map does not imply that the two
cannot be isomorphic. Nonetheless, argue that they are not isomorphic.
3.35 In a graph, for a node q 0 there may be some nodes qi that are unreachable,
so there is no path from q 0 to qi .
(a) Devise an algorithm that inputs a directed graph and a start node q 0 , and
finds the set of nodes that are unreachable from q 0 .
(b) Apply your algorithm to these two starting with w 0 .
(Pictures omitted: two directed graphs, the first on vertices w0, ... , w4 and
the second on vertices w0, ... , w3.)
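One standard approach to part (a), sketched in Python: a breadth-first search from q0 marks every reachable node, and the unreachable nodes are the rest. (The example digraph at the end is hypothetical, not the one pictured in the exercise.)

```python
from collections import deque

def unreachable(nodes, edges, start):
    """Nodes with no directed path from start. edges is a set of (from, to)."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return set(nodes) - seen

# a small hypothetical digraph: w3 points into the rest but nothing reaches it
nodes = ['w0', 'w1', 'w2', 'w3']
edges = {('w0', 'w1'), ('w1', 'w2'), ('w3', 'w0')}
assert unreachable(nodes, edges, 'w0') == {'w3'}
```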
Extra
III.A BNF
III.A Exercises
✓ A.5 US ZIP codes have five digits, and may have a dash and four more digits at
the end. Give a BNF grammar.
A.6 Write a grammar in BNF for the language of palindromes.
✓ A.7 At a college, course designations have a form like ‘MA 208’ or ‘PSY 101’, where
the department is two or three capital letters and the course is three digits. Give
a BNF grammar.
✓ A.8 Example A.3 uses some BNF convenience abbreviations.
(a) Give a grammar equivalent to ⟨pointfloat⟩ that doesn’t use square brackets.
(b) Do the same for the repetition operator in ⟨intpart⟩ ’s rule, and for the
grouping in ⟨exponent⟩ ’s rule (you can use ⟨intpart⟩ here).
✓ A.9 In Roman numerals the letters I, V, X, L, C, D, and M stand for the values 1, 5,
10, 50, 100, 500, and 1 000. We write the letters from left to right in descending
order of value, so that XVI represents the number that we would ordinarily write
as 16, and MDCCCCLVIII represents 1958. We always write the shortest possible
string, so we do not write IIIII because we can instead write V. However, as we
don’t have a symbol whose value is larger than 1 000 we must represent large
numbers with lots of M’s.
(a) Give a grammar for the strings that make sense as Roman numerals.
(b) Often Roman numerals are written in subtractive notation: for instance, 4 is
represented as IV, because four I’s are hard to distinguish from three of them
in a setting such as a watch face. In this notation 9 is IX, 40 is XL, 90 is XC,
400 is CD, and 900 is CM. Give a grammar for the strings that can appear in
this notation.
A.10 This grammar is for a small C-like programming language.
⟨program⟩ ::= { ⟨statement-list⟩ }
⟨statement-list⟩ ::= [ ⟨statement⟩ ; ]*
⟨gr⟩ ::= - | ,
⟨precision⟩ ::= ⟨integer⟩
⟨type⟩ ::= b | c | d | e | E | f | F | g | G | n | o | s | x | X | %
Take ⟨integer⟩ to produce ⟨digit⟩ ⟨integer⟩ or ⟨digit⟩ . Give a derivation of these
strings: (a) 03f (b) +#02X.
Chapter IV Automata
Section IV.1 Finite State Machines
We produce a new model of computation by modifying the definition of Turing
Machine. We will strip out the capability to write, changing the tape head from
read/write to read-only. This gives us insight into what can be done with states
alone. It will turn out that this type of machine can do many things, but not as
many as a Turing machine.
Definition We will use the same type of transition tables and transition graphs as
with Turing machines.
1.1 Example A power switch has two states, q off and q on and its input alphabet has
one symbol, toggle.
(transition graph: arrows labeled toggle lead from qoff to qon and from qon back to qoff)
1.2 Example Operate this turnstile by putting in two tokens and then pushing through.
It has three states and its input alphabet is Σ = { token, push }.
(transition graph omitted; its three states track how many tokens have been put in)
As we saw with Turing machines, the states are a limited form of memory. For
instance, q one is how the turnstile “remembers” that it has so far received one
token.
Image: The astronomical clock in Notre-Dame-de-Strasbourg Cathedral, for computing the date of
Easter. Easter falls on the first Sunday after the full moon on or after the spring equinox. Calculating
this date was a great challenge for the mechanisms of the time; the clock dates from 1843.
1.3 Example This vending machine dispenses items that cost 30 cents.† The picture
is complex so we will show it in three layers. First are the arrows for nickels and
pushing the dispense button.
(diagram layer with the nickel and push arrows omitted)
After receiving 30 cents and getting another nickel, this machine does something
not very sensible: it stays in q 30 . In practice a machine would have further states
to keep track of overages so that we could give change, but here we ignore that.
Next come the arrows for dimes.
(diagram layer with the dime arrows omitted)
1.4 Example This machine, when started in state q 0 and fed bit strings, will keep
track of the remainder modulo 4 of the number of 1’s.
(transition graph: each state has a self-loop labeled 0, and arrows labeled 1 form the cycle q0 → q1 → q2 → q3 → q0)
1.5 Definition A Finite State machine, or Finite State automaton, is composed of five
things, M = ⟨Q, q start , F , Σ, ∆⟩ . They are a finite set of states Q , one of which is
the start state q start , a subset F ⊆ Q of accepting states or final states, a finite input
alphabet set Σ, and a next-state function or transition function ∆ : Q × Σ → Q .
This may not immediately appear to be like our definition of a Turing Machine.
Some of that is because we have already defined the terms ‘alphabet’ and ‘transition
function’. The other differences follow from the fact that Finite State machines
cannot write. For one thing, because Finite State machines cannot write they don’t
†
US coins are: 1 cent coins that are not used here, nickels are 5 cents, dimes are 10 cents, and quarters
are 25 cents.
need to move the tape for scratch work, so we’ve dropped the tape action symbols
L and R.
The other difference between Finite State machines and Turing machines is the
presence of the accepting states. Consider, in the vending machine of Example 1.3,
the state q 30 . It is an accepting state, meaning that the machine has seen in the
input what it is looking for. The same goes for Example 1.2’s turnstile state q ready
and Example 1.1’s power switch state q on . While we can design a Turing Machine
to indicate a choice by arranging so that for each input the machine will halt and
the only thing on the tape will be either a 1 or 0, a Finite State machine gives a
decision by ending in one of these designated states. Below, we’ve pictured that the
accepting states are wired to the red light so that we know when a computation
succeeds. In the transition graphs we denote the final states with double circles
and in the transition function tables we mark them with a ‘+’.
To work a Finite State machine device, put the finite-length input on the tape
and press Start.
The machine consumes the input, at each step deleting the prior tape character
and then reading the next one. We can trace through the steps when Example 1.4’s
modulo 4 machine gets the input 10110.
step 0: ⟨q0, 10110⟩
step 1: ⟨q1, 0110⟩
step 2: ⟨q1, 110⟩
step 3: ⟨q2, 10⟩
step 4: ⟨q3, 0⟩
step 5: ⟨q3, ε⟩
Consequently there is no Halting problem for Finite State machines — they always
halt after a number of steps equal to the length of the input. At the end, either the
Accept light is on or it isn’t. If it is on then we say that the machine accepts the
input string, otherwise it rejects the string.
1.6 Example This machine accepts a string if and only if it contains at least two 0’s as
well as an even number of 1’s. (The + next to q 2 marks it as an accepting state.)
∆      0    1
q0     q1   q3
q1     q2   q4
+ q2   q2   q5
q3     q4   q0
q4     q5   q1
q5     q5   q2
(The transition graph, omitted here, lays these states out in a grid of two rows and three columns.)
This machine illustrates the key to designing Finite State machines, that each state
has an intuitive meaning. The state q 4 means “so far the machine has seen one 0
and an odd number of 1’s.” And q 5 means “so far the machine has seen two 0’s
but an odd number of 1’s.” The drawing brings out this principle. Its first row has
states that have so far seen an even number of 1’s, while the second row’s states
have seen an odd number. Its first column holds states that have seen no 0’s, the
second column holds states that have seen one, and the third column holds states
that have seen two 0’s.
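Example 1.6’s transition function translates directly into a table-driven loop. The sketch below is ours, not the book’s (its own code examples use Scheme); the dictionary encodes the transition table, with q2 the lone accepting state.

```python
# Transition table for Example 1.6: accept strings with at least
# two 0's and an even number of 1's.
DELTA = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q3',
    ('q1', '0'): 'q2', ('q1', '1'): 'q4',
    ('q2', '0'): 'q2', ('q2', '1'): 'q5',
    ('q3', '0'): 'q4', ('q3', '1'): 'q0',
    ('q4', '0'): 'q5', ('q4', '1'): 'q1',
    ('q5', '0'): 'q5', ('q5', '1'): 'q2',
}
ACCEPTING = {'q2'}

def accepts(string):
    """Run the machine: consume the string one character at a time."""
    state = 'q0'
    for ch in string:
        state = DELTA[(state, ch)]
    return state in ACCEPTING
```

Note that the loop is exactly the machine: the only memory used between characters is the single variable state.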
1.7 Example This machine accepts strings that are valid as decimal representations
of integers. Thus, it accepts ‘21’ and ‘-707’ but does not accept ‘501-’. Both the
transition graph and the table group some inputs together when they result in
the same action. For instance, when in state q 0 this machine does the same thing
whether the input is + or -, namely it passes into q 1 .
(transition graph omitted)
∆      +, -   0, . . . 9   else
q0     q1     q2           e
q1     e      q2           e
+ q2   e      q2           e
e      e      e            e
Any bad input character sends the machine to the error state e, which is a sink
state, meaning that the machine never leaves that state.
Our Finite State machine descriptions will usually assume that the alphabet is
clear from the context. For instance, the prior example just says ‘else’. In practice
we take the alphabet to be the set of characters that someone could conceivably
enter, including letters such as a and A or characters such as exclamation point or
open parenthesis. Thus, design of a Finite State machine up to a modern standard
might use all of Unicode. But for the examples and exercises here, we will use
small alphabets.
1.8 Example This machine accepts strings that are members of the set { jpg, pdf, png }
of filename extensions. Notice that it has more than one final state.
(transition graph omitted: from q0 a j leads to q1, then p to q2, then g to the accepting state q3; also from q0 a p leads to q4, from which d then f lead through q5 to the accepting state q6, and n then g lead through q7 to the accepting state q8)
That drawing omits many edges, the ones involving the error state e . For instance,
from state q 0 any input character other than j or p is an error. (Putting in all the
edges would make a mess. Cases such as this are where the transition table is
better than the graph picture. But most of our machines are small so we typically
prefer the picture.)
This example illustrates that for any finite language there is a Finite State
machine that accepts a string if and only if it is a member of the language. The idea
is: for strings that have common prefixes, the machine steps through the shared
parts together, as here with pdf and png. Exercise 1.46 asks for a proof.
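Exercise 1.46’s construction can be sketched concretely: make one state for each distinct prefix of the words, plus an implicit error sink. The function below is our own illustration in Python, not the book’s.

```python
def finite_language_fsm(words):
    """Build an acceptor for a finite language.  The states are the
    distinct prefixes of the words; a string is accepted iff consuming
    it walks from prefix to prefix and ends at a whole word."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    def accepts_ext(string):
        seen = ''
        for ch in string:
            seen += ch
            if seen not in prefixes:   # the implicit error sink state
                return False
        return seen in words
    return accepts_ext

# The machine of Example 1.8
accepts_ext = finite_language_fsm({'jpg', 'pdf', 'png'})
```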
1.9 Example Although they have no scratch memory, Finite State machines can
accomplish useful work such as some kinds of arithmetic. This machine accepts
strings representing a natural number that is a multiple of three, such as 15
and 5013.
∆      0,3,6,9   1,4,7   2,5,8
+ q0   q0        q1      q2
q1     q1        q2      q0
q2     q2        q0      q1
Because q 0 is an accepting state, this machine accepts the empty string. Exercise 1.23
asks for a modification of this machine to accept only non-empty strings.
1.10 Example Finite State machines are easy to translate to code. Here is a Scheme
version of the multiple of three machine.†
;; Decide if the input represents a multiple of three
(define (multiple-of-three-fsm input-string)
  (let ((state 0))
    (if (= 0 (multiple-of-three-fsm-helper state input-string))
        (display "accepted")
        (display "rejected"))
    (newline)))
1.11 Example This is a simplified version of how phone numbers used to be handled in
North America. Consider the number 1-802-555-0101. The initial 1 signifies that
the call should leave the local exchange office to go to the long lines. The 802 is an
area code; the system can tell this is so because its second digit is either 0 or 1 so
it is not a same-area local exchange. Next the system processes the local exchange
number of 555, routing the call to a particular physical local switching office. That
office processes the line number of 0101, and makes the connection.
(transition diagram omitted. Its legend: x stands for 0, . . . 9; n stands for 2, . . . 9; p stands for 0, 1.)
†
One of the great things about the Scheme programming language is that, because the last thing
called in multiple-of-three-fsm-helper is itself, the compiler converts the recursion to iteration.
So we get the expressiveness of recursion with the space conservation of iteration.
Today the picture is much more complicated. For example, no longer are area codes
required to have a middle digit of 0 or 1. This additional complication is possible
because instead of switching with physical devices, we now do it in software.
After the definition of Turing machine we gave a formal description of the
action of those machines. We next do the same here.
A configuration of a Finite State machine is a pair C = ⟨q, τ ⟩ , where q is a state,
q ∈ Q , and τ is a (possibly empty) string, τ ∈ Σ∗ . We start a machine with some
input string τ0 and say that the initial configuration is C0 = ⟨q 0 , τ0 ⟩ .
A Finite State machine acts by a sequence of transitions from one configuration
to another. For s ∈ N+ the machine’s configuration after the s -th transition is its
configuration at step s , Cs .
Here is the rule for making a transition (we sometimes say it is an allowed or
legal transition, for emphasis). Suppose that the machine is in the configuration
Cs = ⟨q, τs ⟩ . In the case that τs is not empty, pop the string’s leading symbol c .
That is, where c = τs [0], take τs+1 = ⟨τs [1], ... τs [k]⟩ for k = |τs | − 1. Then the
machine’s next state is q̂ = ∆(q, c) and its next configuration is Cs+1 = ⟨q̂, τs+1 ⟩ .
Denote this before-after relationship between configurations by Cs ⊢ Cs+1 .†
The other case is that the string τs is empty. This is the halting configuration
Ch . No transitions follow a halting configuration.
At each transition the length of the tape string drops by one so every computation
eventually reaches a halting configuration Ch = ⟨q, ε⟩ . A Finite State machine
computation is a sequence C0 ⊢ C1 ⊢ C2 ⊢ · · · ⊢ Ch . We can abbreviate such a
sequence with ⊢∗ , as in C0 ⊢∗ Ch .‡
If the ending state is a final state, q ∈ F , then the machine accepts the input τ0 ,
otherwise it rejects τ0 .
Notice that, as with the formalism for Turing machines, the heart of the
definitions is the transition function ∆. It makes the machine move step-by-step,
from configuration to configuration, in response to the input.
1.12 Example The multiple of three machine of the prior example gives the computation
⟨q0 , 5013⟩ ⊢ ⟨q2 , 013⟩ ⊢ ⟨q2 , 13⟩ ⊢ ⟨q0 , 3⟩ ⊢ ⟨q0 , ε⟩ . Since q0 is an accepting state,
the machine accepts 5013.
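The computation of Example 1.12 can be generated mechanically. The Python sketch below is our own illustration (the book’s code is in Scheme); it relies on the observation that a digit d moves remainder state r to (r + d) mod 3, since 10r + d ≡ r + d (mod 3).

```python
def trace(tau):
    """Return the list of configurations (state, remaining input) of
    the multiple-of-three machine on input tau.  States 0, 1, 2 stand
    for q0, q1, q2, the remainder so far."""
    state, rest, configs = 0, tau, []
    while True:
        configs.append((state, rest))
        if rest == '':            # the halting configuration
            return configs
        state = (state + int(rest[0])) % 3   # consume the leading digit
        rest = rest[1:]
```

Running trace('5013') reproduces the computation of the example, ending in state 0, the accepting state.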
1.13 Definition The set of strings accepted by a Finite State machine M is the
language of that machine, L(M), or the language recognized, or decided, (or
accepted), by the machine.
(For Finite State machines, deciding a language is equivalent to recognizing it,
because the machine must halt. ‘Recognized’ is the more common term.)
1.14 Definition For any Finite State machine with transition function ∆ : Q × Σ → Q ,
the extended transition function ∆̂ : Σ∗ → Q gives the state in which the machine
ends, after starting in the start state and consuming the given string.
† As with Turing machines, read the symbol ⊢ aloud as “yields in one step.” ‡ Read the symbol ⊢∗ as
“yields eventually” or simply “yields.”
1.15 Example This machine’s extended transition function ∆̂
∆      a    b
q0     q1   q0
+ q1   q1   q2
q2     q1   q2
(transition graph omitted) extends its ordinary transition function ∆ in that it repeats the first row of ∆’s table.
∆̂(a) = q1    ∆̂(b) = q0
(We disregard the difference between ∆’s input characters and ∆̂’s input length
one strings.) Here is ∆̂’s effect on the length two strings.
∆̂(aa) = q1    ∆̂(ab) = q2    ∆̂(ba) = q1    ∆̂(bb) = q0
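The extended transition function is just a fold of the one-step function over the string. A sketch in Python, using the transition table of Example 1.15 (the encoding is ours, not the book’s):

```python
# Transition table of Example 1.15
TABLE = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q2',
         ('q2', 'a'): 'q1', ('q2', 'b'): 'q2'}

def delta_hat(string):
    """Extended transition function: fold the one-step transition
    function over the string, starting from the start state q0."""
    state = 'q0'
    for ch in string:
        state = TABLE[(state, ch)]
    return state
```

On the empty string it returns the start state, matching Exercise 1.34.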
IV.1 Exercises
✓ 1.16 Using this machine, trace through the computation when the input is (a) abba
(b) bab (c) bbaabbaa.
(transition graph omitted; it is the machine of Example 1.15)
b
1.17 True or false: because a Finite State machine is finite, its language must be finite.
1.18 Your classmate says, “I have a language L that recognizes the empty string ε .”
Straighten them out.
✓ 1.19 How many transitions does an input string of length n cause a Finite State
machine to undergo? n ? n + 1? n − 1? How many (not necessarily distinct)
states will the machine have visited after consuming the string?
1.20 Rebut “no Finite State machine can recognize the language { a^n b | n ∈ N }
because n is infinite.”
✓ 1.21 For each of these formal descriptions of a language, give a one or two sentence
English-language description. Also list five strings that are elements as well as
five that are not, if there are five.
(a) L = { α ∈ { a, b }∗ | α = a^n b a^n for n ∈ N }
(b) { β ∈ { a, b }∗ | β = a^n b a^m for n, m ∈ N }
(c) { ba^n ∈ { a, b }∗ | n ∈ N }
(d) { a^n b a^(n+2) ∈ { a, b }∗ | n ∈ N }
(e) { γ ∈ { a, b }∗ | γ has the form γ = α⌢α for α ∈ { a, b }∗ }
✓ 1.22 For the machines of Example 1.6, Example 1.7, Example 1.8, and Example 1.9,
answer these. (a) What are the accepting states? (b) Does it recognize the
empty string ε ? (c) What is the shortest string that each accepts? (d) Is the
language of accepted strings infinite?
1.23 Modify the machine of Example 1.9 so that it accepts only non-empty strings.
✓ 1.24 The best way to develop a Finite State machine is to think about what each
state is doing, what it means. For each language, name five strings that are in the
language and five that are not (the alphabet is Σ = { a, b }). Then design a Finite
State machine that will recognize the language by first giving a one-sentence
English description of each state that you use. Then give both a circle diagram
and a table for the transition function.
(a) L1 = { σ ∈ Σ∗ | σ has at least one a and at least one b }
(c) L3 = { σ ∈ Σ∗ | σ ends in ab }
(d) L4 = { a^n b^m ∈ Σ∗ | n, m ≥ 2 }
(e) L5 = { a^n b^m a^p ∈ Σ∗ | m = 2 }
1.25 Produce the transition graph picturing this transition function. What is the
language of this machine?
∆ a b
q0 q2 q1
+ q1 q0 q2
q2 q2 q2
✓ 1.27 Give a Finite State machine over Σ = { a, b, c } that accepts any string
containing the substring abc. As in Example 1.6, give a brief description of each
state’s intuitive meaning in the machine.
1.28 Consider the language of strings over Σ = { a, b } containing at least two a’s
and at least two b’s. Name five elements of the language, and five non-elements.
Give a Finite State machine recognizing this language. As in Example 1.6, give a
brief description of the intuitive meaning of each state.
✓ 1.29 For each language, name five strings in the language and five that are not (if
there are not five, name as many as there are). Then give a transition graph and
table for a Finite State machine recognizing the language. Use Σ = { a, b }.
(a) { σ ∈ Σ∗ | σ has at least two a’s }
(b) { σ ∈ Σ∗ | σ has exactly two a’s }
1.30 Produce a Finite State machine over the alphabet Σ = { A, ... Z, 0, ... 9 } that
accepts only the string 911, and a machine that accepts any string but that one.
1.31 Using Example 1.15, apply the extended transition function to all of the length
three and length four string inputs.
✓ 1.32 Consider a language of comments, which begin with the two-character string
/#, end with #/, and have no #/ substrings in the middle. Give a Finite State
machine to recognize that language. (Just producing the transition graph is
enough.)
1.33 For each language, give five strings from that language and five that are not
(if there are not that many then list all of the strings that are possible). Also give
a Finite State machine that recognizes the language. Use Σ = { a, b }.
(a) L = { σ ∈ { a, b }∗ | σ ends in aa }
(b) { σ ∈ { a, b }∗ | σ = ε }
(c) { σ ∈ { a, b }∗ | σ = a^3 b or σ = ba^3 }
(d) { σ ∈ { a, b }∗ | σ = a^n or σ = b^n for n ∈ N }
1.34 What happens when the input to an extended transition function is the empty
string?
✓ 1.35 Produce a Finite State machine that recognizes each.
(a) { σ ∈ { 0, . . . 9 }∗ | σ has either no 0’s or no 2’s }
(b) { σ ∈ { 0, . . . 9 }∗ | σ is the decimal representation of a multiple of 5 }
✓ 1.36 Give a Finite State machine over the alphabet Σ = { A, ... Z } that accepts
only strings in which the vowels occur in ascending order. (The traditional vowels,
in ascending order, are A, E, I, O, and U.)
✓ 1.37 Consider this grammar.
⟨real⟩ → ⟨posreal⟩ | + ⟨posreal⟩ | - ⟨posreal⟩
⟨posreal⟩ → ⟨natural⟩ | ⟨natural⟩ . | ⟨natural⟩ . ⟨natural⟩
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨natural⟩
⟨digit⟩ → 0 | 1 | . . . | 9
(a) Give five strings that are in its language and five that are not. (b) Is the
string .12 in the language? (c) Briefly describe the language. (d) Give a Finite
State machine that recognizes the language.
1.38 Produce five strings in each language and five that are not. Also produce a
Finite State machine to recognize it.
(a) { σ ∈ B∗ | every 1 in σ has a 0 just before it and just after }
(b) { σ ∈ B∗ | σ represents a number divisible by 4 in binary }
(c) { σ ∈ { 0, . . . 9 }∗ | σ represents an even number in decimal }
(d) { σ ∈ { 0, . . . 9 }∗ | σ represents a multiple of 100 in decimal }
For the last of these it is enough to describe a Finite State machine; you need not give the full graph or table.
1.40 As in Example 1.12, find the computation for the multiple of three machine
with the initial string 2332.
1.41 We will work through the formal definition of the extended transition function
that follows Definition 1.14 by applying it to the machine in Example 1.6. (a) Use
the definition to find ∆̂(0) and ∆̂(1). (b) Use the definition to find ∆̂’s output on
inputs 00, 01, 10, and 11. (c) Find its action on all length three strings.
✓ 1.42 Produce a Finite State machine that recognizes the language over Σ = { a, b }
containing no more than one occurrence of the substring aa. That is, it may
contain zero-many such substrings or one, but not two. Note that the string aaa
contains two occurrences of that substring.
1.43 Let Σ = B. (a) List all of the different Finite State machines over Σ
with a single state, Q = {q 0 }. (Ignore whether a state is final or not; we
will do that below.) (b) List all of the ones with two states, Q = {q 0 , q 1 }.
(c) How many machines are there with n states? (d) What if we distinguish
between machines with different sets of final states?
✓ 1.44 Propositiones ad acuendos iuvenes (problems to sharpen the young) is the
oldest collection of mathematical problems in Latin. It is by Alcuin of York
(735–804), royal advisor to Charlemagne and head of the Frankish court school.
One problem, Propositio de lupo et capra et fasciculo cauli, is particularly famous: A
man had to transport to the far side of a river a wolf, a goat, and a bundle of
cabbages. The only boat he could find was one that could carry only two of them.
For that reason, he sought a plan which would enable them all to get to the far side
unhurt. Let him, who is able, say how it could be possible to transport them safely.
A wolf cannot be left alone with a goat nor can a goat be left alone with cabbages.
Construct the relevant Finite State machine and use it to solve the problem.
1.45 Consider a variant on a Finite State machine, where the set of input strings
is bounded.
(a) In Rock-Paper-Scissors, two players simultaneously produce one of R, P,
or S. A player with R beats a player with S, and S beats P, and P beats R
(a repeat is a do-over). Encode a game with the two-character string
σ = ⟨player one’s play, player two’s play⟩ . Make a machine that recognizes
a win for player one.
(b) Make a machine that accepts a Tic-Tac-Toe game that is a win for the X’s. A
board has nine squares so encode a game instance with length nine strings.
1.46 Show, as suggested by Example 1.8, that for any finite language there is a
Finite State machine recognizing that language.
1.47 There are languages not recognized by any Finite State machine. Fix an
alphabet Σ with at least two members. (a) Show that the number of Finite
State machines with that alphabet is infinite. (b) Show that it is countable.
(c) Show that the number of languages over that alphabet is uncountable.
Section IV.2 Nondeterminism
Turing machines and Finite State machines both have the property that the next
state is completely determined by the current state and current character. Once
you lay out an initial tape and push Start then you just walk through the steps like,
well . . . , like an automaton. We now consider machines that are nondeterministic,
where from any configuration the machine could move to more than one next state,
or to just one, or even to no state at all.
Motivation Imagine a grammar with some rules and start symbol. We are given a
string and asked if it has a derivation. The challenge in these problems is that you
sometimes have to guess which path the derivation should take. For instance, if
you have S → aS | bA then from S you can do two different things; which one
will work?
In the grammar section’s derivation exercises, we expect that an intelligent
person will have the insight to guess the right way. However, if instead you were writing
a program then you might have it try every case; you might do a breadth-first
traversal of the tree of all derivations, until you found a success.
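That breadth-first traversal is easy to sketch. In the Python below (our own illustration; the grammar encoding and names are ours), a grammar maps each nonterminal, written as an uppercase letter, to its alternatives, and we expand the leftmost nonterminal in every possible way.

```python
from collections import deque

def derivable(grammar, start, target, max_steps=20):
    """Breadth-first search of the tree of all derivations: expand the
    leftmost nonterminal every possible way until target appears."""
    queue = deque([start])
    for _ in range(max_steps):
        if not queue:
            break
        next_queue = deque()
        for form in queue:
            if form == target:
                return True
            # locate the leftmost nonterminal, if there is one
            i = next((k for k, c in enumerate(form) if c.isupper()), None)
            # prune: a too-long form can never shrink back to target
            if i is None or len(form) > len(target) + 1:
                continue
            for alt in grammar[form[i]]:
                next_queue.append(form[:i] + alt + form[i+1:])
        queue = next_queue
    return False

# The rules S -> aS | bA and A -> a | empty string
grammar = {'S': ['aS', 'bA'], 'A': ['a', '']}
```

Each entry in the queue plays the role of one child process in the description above.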
The American philosopher and Hall of Fame baseball catcher Yogi Berra
(1925–2015) said, “When you come to a fork in the road, take it.” That’s a natural way
to attack this problem: when you come up against multiple possibilities,
fork a child for each. Thus, the routine might begin with the start state S
and for each rule that could apply it spawns a child process, deriving a
string one removed from the start. After that, each child finds each rule
that could apply to its string and spawns its own children, each of which
now has a string that is two removed from the start. Continue until the
desired string σ appears, if it ever does.
The prototypical example is the celebrated Traveling Salesman problem, that of
finding the shortest circuit of every city in a list. Start with a map of the roads in
the US lower forty-eight states. We want to know if there is a trip that visits each
state capital and returns to where it began in, say, less than 16 000 kilometers.
We’ll start at Montpelier, the capital of Vermont. From there, for each potential next
capital we could fork a process, making forty seven new processes. The process
that is assigned Concord, New Hampshire, for instance, would know that the trip
so far is 188 kilometers. In the next round, each child would fork its own child
processes, forty six of them. For instance the process that after Montpelier was
assigned Concord would have a child assigned to Augusta, Maine and would know
that so far the trip is 452 kilometers. At the end, if there is a trip of less than
16 000 kilometers then some process knows it. There will be lots of processes and
many of them will have failed to find a short trip, but if even one succeeds then we
consider the overall search a success.
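An ordinary, serial program can play out these forks one at a time by enumerating the branches. This sketch uses a made-up four-city distance table, not real capital-to-capital distances; each permutation plays the role of one forked process.

```python
from itertools import permutations

def short_tour_exists(dist, cities, start, bound):
    """Try every ordering of the remaining cities; succeed if any
    circuit starting and ending at start is shorter than bound."""
    rest = [c for c in cities if c != start]
    for order in permutations(rest):
        tour = [start, *order, start]
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < bound:
            return True
    return False

# Hypothetical distances between four cities, in kilometers
dist = {'M': {'C': 188, 'A': 300, 'B': 250},
        'C': {'M': 188, 'A': 264, 'B': 200},
        'A': {'M': 300, 'C': 264, 'B': 150},
        'B': {'M': 250, 'C': 200, 'A': 150}}
```

The number of permutations grows as a factorial, which is why the nondeterministic picture, with its free supply of child processes, is so attractive.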
This computation is nondeterministic in that while it is happening
the machine is simultaneously in many different states. It imagines an
unboundedly-parallel machine, where any time you have a job for an
additional computing agent, a CPU, you can allocate one.† Think of
such a machine as angelic in that whenever it wants more computational
resources, such as being able to allocate new children, those resources just
appear.
This section considers nondeterministic Finite State machines. (Nondeterministic
Turing machines appear in the fifth chapter.) We will have two ways to think about
nondeterminism, two mental models.‡ The first was introduced above: when such
a machine is presented with multiple possible next states then it forks, so that it
is in all of them simultaneously. The next example illustrates.
Image: Persian angel, 1555
2.1 Example The Finite State machine below is nondeterministic because leaving q 0
are two arrows labeled 0. It also has states with a deficit of edges; e.g., no 1 arrow
leaves q 1 , so if it is in that state and reads that input then it passes to no state at all.
(transition graph: q0 has a self-loop labeled 0,1 and a second arrow labeled 0 to q1; an arrow labeled 0 goes from q1 to q2, and an arrow labeled 1 from q2 to the accepting state q3)
The animation shows what happens with input 00001. We take the computation
history as a tree. For example, on the first 0 the computation splits in two.
†
This echoes our experience with everyday computers, when we are writing an email in one window
and watching a video in another. The machine appears to be in multiple states simultaneously although
it may in fact be time-slicing, dovetailing by running each process in succession for a few ticks. ‡ While
these models are helpful in learning and thinking about nondeterminism, they are not part of the
formal definitions and proofs.
Input consumed   Set of states
(start)          {q0}
0                {q0, q1}
00               {q0, q1, q2}
000              {q0, q1, q2}
0000             {q0, q1, q2}
00001            {q0, q3}
In this nondeterministic machine the entries of the array are not states, they are
sets of states.
Nondeterministic machines may seem conceptually fuzzy so the formalities are
a help. Contrast these definitions with the ones for deterministic machines.
A configuration is a pair C = ⟨q, τ ⟩ , where q ∈ Q and τ ∈ Σ∗ . A machine starts
with an initial configuration C0 = ⟨q 0 , τ0 ⟩ . The string τ0 is the input.
Following the initial configuration there may be one or more sequences of
transitions. Suppose that there is a machine configuration Cs = ⟨q, τs ⟩ . For
s ∈ N+ , in the case where τs is not the empty string, a transition pops the string’s
leading symbol c to get τs+1 , takes the machine’s next state to be a member q̂ of
the set ∆(q, c) and then takes a subsequent configuration to be Cs+1 = ⟨q̂, τs+1 ⟩ .
Denote that two configurations are connected by a transition with Cs ⊢ Cs+1 .
The other case is that τs is the empty string. This is a halting configuration, Ch .
After Ch , no transitions follow.
A nondeterministic Finite State machine computation is a sequence of transitions
Because it ends in an accepting state, the machine accepts the initial string, 00001.
2.6 Definition For a nondeterministic Finite State machine M, the set of accepted
strings is the language of the machine L(M), or the language recognized, (or
accepted), by that machine.†
We will also adapt the definition of the extended transition function ∆̂ : Σ∗ → P (Q).
Fix a nondeterministic M with transition function ∆ : Q × Σ → P (Q). Start with
∆̂(ε) = {q0 }. Where ∆̂(τ ) = {q_i0 , q_i1 , ... q_ik } for τ ∈ Σ∗ , define ∆̂(τ ⌢ t) =
∆(q_i0 , t) ∪ ∆(q_i1 , t) ∪ · · · ∪ ∆(q_ik , t) for any t ∈ Σ. Then the machine accepts
σ ∈ Σ∗ if and only if some element of ∆̂(σ ) is a final state.
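This definition of ∆̂ says how to run a nondeterministic machine directly: carry the whole set of possible states along. A Python sketch using Example 2.1’s machine (the encoding is ours; a missing table entry stands for the empty set of next states):

```python
# Transition function of Example 2.1, with sets of next states
N_DELTA = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'},
           ('q1', '0'): {'q2'},
           ('q2', '1'): {'q3'}}
N_ACCEPTING = {'q3'}

def run(string):
    """Return the set of states the machine is in simultaneously
    after consuming the string; this is Delta-hat of the string."""
    states = {'q0'}
    for ch in string:
        states = set().union(*(N_DELTA.get((q, ch), set()) for q in states))
    return states

def nfa_accepts(string):
    return bool(run(string) & N_ACCEPTING)
```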
(transition graph omitted: q0 loops on a,b and nondeterministically guesses where a match starts, reading aa through q1 or bb through q2 into the accepting state q3, which loops on a,b)
The language of this machine is the set of strings containing the substring aa or bb. For instance, the machine
accepts abaaba because there is a sequence of allowed transitions ending in an
accepting state.
recognizes the language { (ac)^n | n ∈ N } = { ε, ac, acac, ... }. The symbol b isn’t
†
Below we will define something called ε transitions that make ‘recognized’ the right idea here, instead
of ‘decided’.
2.9 Example This nondeterministic machine accepts any string whose next-to-last
character is a.
(transition graph: q0 has a self-loop labeled a,b and an arrow labeled a to q1; an arrow labeled a,b leads from q1 to the accepting state q2)
2.11 Example This is a garage door opener listener that waits to hear the re-
mote control send the signal 0101110. That is, it recognizes the language
{ σ ⌢ 0101110 | σ ∈ B∗ }.
(transition graph: q0 has a self-loop labeled 0,1; arrows labeled 0, 1, 0, 1, 1, 1, 0 then lead through q1, ... q6 to the accepting state q7)
But the machine’s chain of states is set up for a string, 0101110, that begins with
two sets of 01’s, while τ begins with three. How can it guess that it should ignore
the first 01 but act on the second? Of course, in mathematics we can consider
whatever we can define precisely. However we have so far studied what can be
done by devices that are in principle physically realizable so this may seem to be a
shift.
However, we will next show how to convert any nondeterministic Finite State
machine into a deterministic one that does the same job. So we can think of a
nondeterministic Finite State machine as an abbreviation, a convenience. This
obviates at least some of the paradox of guessing, at least for Finite State machines.
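The conversion is the subset construction: the deterministic machine’s states are sets of the nondeterministic machine’s states, and its transition function takes the union of the one-step moves. A Python sketch (for machines without ε transitions; the names are ours):

```python
def subset_construction(delta, start, accepting, alphabet):
    """Build a deterministic table whose states are frozensets of the
    nondeterministic machine's states, exploring only reachable sets."""
    d_start = frozenset([start])
    d_delta = {}
    seen, todo = {d_start}, [d_start]
    while todo:
        S = todo.pop()
        for ch in alphabet:
            T = frozenset().union(*(delta.get((q, ch), frozenset())
                                    for q in S))
            d_delta[(S, ch)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # a set-state accepts iff it contains an accepting state
    d_accepting = {S for S in seen if S & accepting}
    return d_start, d_delta, d_accepting

def d_accepts(string, machine):
    start, d_delta, d_accepting = machine
    S = start
    for ch in string:
        S = d_delta[(S, ch)]
    return S in d_accepting

# Convert the nondeterministic machine of Example 2.1
n_delta = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'},
           ('q1', '0'): {'q2'}, ('q2', '1'): {'q3'}}
machine = subset_construction(n_delta, 'q0', {'q3'}, '01')
```

Since the subsets of a finite Q are finite in number, the result is again a Finite State machine, though possibly with exponentially many states.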
(transition graph: from q0 an arrow labeled +,-,ε leads to q1; from q1 an arrow labeled 1, . . . ,9 leads to the accepting state q2, which has a self-loop labeled 0, . . . 9)
Because of the ε it can accept strings that do not start with a + or - sign. For
instance, with input 123 the machine can begin by following the ε transition to
state q 1 , then read the 1 and transition to q 2 , and stay there while processing the
2 and 3. This is a branch of the computation tree accepting the input, and so the
string 123 is in the machine’s language.
2.14 Example A machine may follow two or more ε transitions. From q 0 this machine
may stay in that state, or transition to q 2 , or q 3 , or q 5 , all without consuming any
input.
(transition graph omitted)
That is, the language of this machine is the four element set L = { abc, abd, ac, ad }.
We can give a precise definition of the action of a nondeterministic Finite State
machine with ε transitions.
First we define the collection of states reachable by ε moves from a given state.
For that we use E : Q × N → P (Q) where E(q, i) is the set of states reachable
from q within at most i-many ε transitions. That is, set E(q, 0) = {q } and where
E(q, i) = {q_i0 , ... q_ik }, set E(q, i + 1) = E(q, i) ∪ ∆(q_i0 , ε) ∪ · · · ∪ ∆(q_ik , ε). Observe
that these are nested, E(q, 0) ⊆ E(q, 1) ⊆ · · ·, and that each is a subset of Q .
But Q has only finitely many states so there must be an î ∈ N where the sequence
of sets stops growing, E(q, î ) = E(q, î + 1) = · · · . Define the ε closure function
Ê : Q → P (Q) by Ê(q) = E(q, î ).
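The fixed point Ê can be computed by iterating until the set stops growing. A Python sketch of that loop, with a small made-up table of ε edges (not a machine from the text):

```python
def epsilon_closure(q, eps_delta):
    """E(q,0) = {q}; repeatedly add every state reachable by one more
    epsilon edge; stop when the set stops growing."""
    closure = {q}
    while True:
        bigger = closure | set().union(
            *(eps_delta.get(p, set()) for p in closure))
        if bigger == closure:
            return closure
        closure = bigger

# Hypothetical epsilon edges: q0 -> q2, and q2 -> q3 and q5
eps = {'q0': {'q2'}, 'q2': {'q3', 'q5'}}
```

Because the sets are nested and bounded by Q, the loop must terminate, which is exactly the argument in the paragraph above.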
With that, we are ready to describe the machine’s action. As before, a
configuration is a pair C = ⟨q, τ ⟩ , where q ∈ Q and τ ∈ Σ∗ . A machine starts with
some initial configuration C0 = ⟨q 0 , τ0 ⟩ , where the string τ0 is the input.
The key description is that of a transition. Consider a configuration Cs = ⟨q, τs ⟩
for s ∈ N and suppose that τs is not the empty string. We will describe a
configuration Cs+1 = ⟨q̂, τs+1 ⟩ that is related to the given one by Cs ⊢ Cs+1 . (As
with the earlier description of nondeterministic machines without ε transitions,
there may be more than one configuration related in this way to Cs .)
† Assume ε ∉ Σ. ‡ Or, think of it as transitioning on consuming the empty string ε .
The string is easy; just pop the leading character to get τs = t ⌢ τs+1 where
t ∈ Σ. To get a legal state q̂ : (i) find the ε closure Ê(q) = {q_s0 , ... q_sk }, (ii) let q̄
be an element of the set ∆(q_s0 , t) ∪ ∆(q_s1 , t) ∪ · · · ∪ ∆(q_sk , t), and (iii) take q̂ to be
an element of the ε closure Ê(q̄).
If τs is the empty string then this is a halting configuration, Ch . No transitions
follow Ch .
A nondeterministic Finite State machine computation is a sequence of transitions
ending in a halting configuration, C0 = ⟨q 0 , τ0 ⟩ ⊢ C1 ⊢ C2 ⊢ · · · Ch = ⟨q, ε⟩ . From a
given C0 there may be many such sequences. If at least one ends with a halting
state, having q ∈ F , then the machine accepts the input τ0 , otherwise it rejects τ0 .
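This acceptance condition can be checked deterministically by tracking the set of all states that some branch of the computation could occupy. Below is a Python sketch; the dict representation of ∆ (with the empty string standing in for ε) and the small machine are our own, hypothetical, illustrations.

```python
# Sketch: acceptance for a nondeterministic machine with epsilon transitions,
# tracking the set of all states that some branch could occupy.

def closure(delta, states):
    result, frontier = set(states), list(states)
    while frontier:
        p = frontier.pop()
        for r in delta.get((p, ""), set()):        # "" plays the role of epsilon
            if r not in result:
                result.add(r)
                frontier.append(r)
    return result

def accepts(delta, start, finals, tau):
    current = closure(delta, {start})
    for t in tau:
        moved = set().union(*(delta.get((q, t), set()) for q in current))
        current = closure(delta, moved)
    return bool(current & finals)

# Hypothetical machine accepting {ab, b}: an epsilon move lets q0 skip the a.
delta = {("q0", "a"): {"q1"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q2"}}
print(accepts(delta, "q0", {"q2"}, "ab"), accepts(delta, "q0", {"q2"}, "b"))   # True True
```

Each step pops one character and folds the ε moves into the closure, as in the transition description above.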
With that, we will modify the definition of the extended transition func-
tion ∆̂ : Σ∗ → P(Q) from section 2. Begin by defining ∆̂(ε) = Ê(q0). Then the
rule for going from a string to its extension is that for τ ∈ Σ∗ and t ∈ Σ, where
∆̂(τ) = {qi0, qi1, ..., qik}, the set ∆̂(τ⌢t) is the union of the ε closures of the
states in ∆(qi0, t) ∪ · · · ∪ ∆(qik, t).
For instance, a machine may accept the input string abc with this sequence of
configurations.

⟨q0, abc⟩ ⊢ ⟨q1, bc⟩ ⊢ ⟨q2, c⟩ ⊢ ⟨q3, c⟩ ⊢ ⟨q4, ε⟩

(note the ε transition between q2 and q3). This sequence shows it also accepts the
input string d.

⟨q0, d⟩ ⊢ ⟨q5, d⟩ ⊢ ⟨q6, ε⟩
As with nondeterministic machines, one reason that we use ε transitions is that
they can make solving a complex job much easier.
2.17 Example An ε transition can put two machines together with a parallel connection.
This shows a machine whose states are named with q ’s combined with one whose
states are named with r ’s.
[Transition graph: a new start state s0 has ε transitions to the start states of two machines, the top machine with states q0, q1, q2 and the bottom machine with states r0, r1, with edges labeled a, b, and c.]
The top nondeterministic machine's language is {σ ∈ Σ∗ | σ ends in ab} and
the bottom machine's language is {σ ∈ Σ∗ | σ = (ac)^n for some n ∈ N}. The
combined machine accepts the union of the two languages.
2.18 Example An ε transition can also make a serial connection between machines.
The machine on the left below recognizes the language { (aab)^m | m ∈ N } and the
machine on the right recognizes a second language.

[Transition graphs: the left machine has states q0, q1, q2 and the right machine has states q3, q4, q5, with edges labeled a and b.]

If we insert an ε bridge from each of the left side's final states (in this example
there happens to be only one such state) to the right side's initial state

[Transition graph: the same two machines, joined by an ε transition from q0 to q3.]

then the combined machine accepts strings in the concatenation of those languages.
Equivalence of the machine types We next prove that nondeterminism does not
change what we can do with Finite State machines.
2.20 Theorem The class of languages recognized by nondeterministic Finite State
machines equals the class of languages recognized by deterministic Finite State
machines.
Consider this nondeterministic machine over Σ = { a, b }.

[Transition graph: states q0, q1, q2 with edges labeled a and b.]

Under the subset construction, each state of the associated deterministic machine
is a set of states of the nondeterministic machine. This is the deterministic
machine's next-state function, where a + marks an accepting state.

  ∆D                    a    b
    s0 = { }            s0   s0
  + s1 = {q0}           s4   s0
    s2 = {q1}           s0   s3
  + s3 = {q2}           s0   s0
  + s4 = {q0, q1}       s4   s3
  + s5 = {q0, q2}       s4   s0
  + s6 = {q1, q2}       s0   s3
  + s7 = {q0, q1, q2}   s4   s3
[Transition graph of the deterministic machine: states s0 through s7 with edges labeled a and b.]
If the nondeterministic machine has k -many states then under this construction
the deterministic machine has 2k -many states. Typically many of them can be
eliminated. For instance, in the above machine the state s 6 is clearly unreachable
since there are no arrows into it. The next section covers minimizing the number
of states in a machine.
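One way to avoid most of the 2^k states is to build only the deterministic states reachable from the start. Below is a Python sketch of the subset construction; the dict representation of ∆ is our own, and the machine is the one from the table above (∆(q0, a) = {q0, q1}, ∆(q1, b) = {q2}, all other entries empty, with the + rows telling us that q0 and q2 are accepting).

```python
# Sketch of the subset construction, building only the deterministic states
# reachable from the start (so unreachable ones, like s6 above, never appear).

def subset_construction(delta, start, finals, alphabet):
    dstart = frozenset([start])
    table, work = {}, [dstart]           # each deterministic state is a frozenset
    while work:
        S = work.pop()
        if S in table:
            continue
        table[S] = {}
        for t in alphabet:
            T = frozenset().union(*(delta.get((q, t), set()) for q in S))
            table[S][t] = T
            work.append(T)
    dfinals = {S for S in table if S & finals}   # accepting: contains an accepting state
    return dstart, table, dfinals

delta = {("q0", "a"): {"q0", "q1"}, ("q1", "b"): {"q2"}}
dstart, table, dfinals = subset_construction(delta, "q0", {"q0", "q2"}, "ab")
print(len(table))   # 4 reachable deterministic states, not 2^3 = 8
```

On this machine only four of the eight subsets are reachable, in line with the remark above that typically many states can be eliminated.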
We next expand that construction to cover ε transitions. Basically, we follow
those transitions. For example, the start state of the deterministic machine is the
ε closure of {q0}, the set of the states of MN that are reachable by a sequence
of ε transitions from q0. In addition, suppose that we have a nondeterministic
machine and we are constructing the associated deterministic machine's next-state
function ∆D, that the current configuration is si = {qi1, qi2, ...}, and that the
machine is reading a. If there is an ε transition from qij to some q then to the set of
next states add ∆N(q, a), and in fact add the entire ε closure of each resulting state.
2.22 Example Consider this nondeterministic machine.
[Transition graph: states q0, q1, q2, q3 with edges labeled a and b, two ε transitions, and a loop labeled a on q3.]
To find the set of next states, follow the ε transitions. For instance, suppose that
this machine is in q0 and the next tape character is a. The arrow on the left takes
the machine from q0 to q2. Alternatively, following the ε transition from q0 to q3
and then reading the a gives q3. So the machine is next in the two states q2 and q3.
These are the ε closures.

state q           q0          q1          q2      q3
ε closure Ê(q)    {q0, q3}    {q0, q1}    {q2}    {q3}

The full deterministic machine is below. The start state is the
ε closure of {q0}, the state s7 = {q0, q3}. A state is accepting if it contains any
state that is accepting in the nondeterministic machine.
∆D a b
s0 = { } s0 s0
+ s 1 = {q 0 } s 10 s2
+ s 2 = {q 1 } s 10 s2
s 3 = {q 2 } s0 s7
s 4 = {q 3 } s4 s0
+ s 5 = {q 0 , q 1 } s 10 s 12
+ s 6 = {q 0 , q 2 } s 10 s 12
+ s 7 = {q 0 , q 3 } s 10 s 12
+ s 8 = {q 1 , q 2 } s 10 s 12
+ s 9 = {q 1 , q 3 } s 10 s2
s 10 = {q 2 , q 3 } s4 s7
+ s 11 = {q 0 , q 1 , q 2 } s 10 s 12
+ s 12 = {q 0 , q 1 , q 3 } s 10 s 12
+ s 13 = {q 0 , q 2 , q 3 } s 10 s 12
+ s 14 = {q 1 , q 2 , q 3 } s 10 s 12
+ s 15 = {q 0 , q 1 , q 2 , q 3 } s 10 s 12
IV.2 Exercises
2.23 Give the transition function for the machine of Example 2.7, and of Example 2.8.
✓ 2.24 Consider this machine.
[Transition graph: states q0, q1, q2 over B, with edges labeled 0,1 and 1 and a loop labeled 1.]
(a) Does it accept the empty string? (b) The string 0? (c) 011? (d) 010?
(e) List all length five accepted strings.
2.25 Your class has a jerk who has adopted a world-weary pose and who interjects,
“Isn’t all this machine-guessing stuff just mathematical abstractions that are not
real?” How should the prof respond?
2.26 Your friend objects, “Epsilon transitions don’t make any sense because the
machine below will never get its first step done; it just endlessly follows the
epsilons.” Correct their mistake.
[Transition graph: states q0, q1, q2, with an ε transition, an edge labeled b, and an edge labeled ε,a.]
✓ 2.27 Give the transition graph of a nondeterministic Finite State machine that
accepts valid North American local phone numbers, strings of the form d^3-d^4,
with three digits, followed by a hyphen character, and then four digits.
2.28 Draw the transition graph of a nondeterministic machine that recognizes the
language {σ = τ0⌢τ1⌢τ2 ∈ B∗ | τ0 = 1, τ2 = 1, and τ1 = (00)^k for some k ∈ N}.
2.29 This machine has Σ = { a, b }.
[Transition graph: states q0, q1, q2 over { a, b }, with edges labeled a and b, an ε transition, and a loop labeled a,b.]
(a) What is the ε closure of q 0 ? Of q 1 ? q 2 ? (b) Does it accept the empty string?
(c) a? b? (d) Show that it accepts aab by producing a suitable sequence of ⊢
relations. (e) List five strings of minimal length that it accepts. (f) List five of
minimal length that it does not accept.
2.30 Produce the table description of the next-state function ∆ for the machine in
the prior exercise. It should have three columns, for a, b, and ε .
2.31 Consider this machine.

[Transition graph: states q0, q1, q2 over B with edges labeled 0 and 1.]

This table

  ∆      a       b
  q0     {q0}    {q1, q2}
  q1     {q3}    {q3}
  q2     {q1}    {q3}
+ q3     {q3}    {q3}

gives the next-state function for a nondeterministic Finite State machine. (a) Draw
the transition graph. (b) What is the recognized language? (c) Give the
next-state table for a deterministic machine that recognizes the same language.
✓ 2.34 Draw the graph of a nondeterministic Finite State machine over B that
accepts strings with the suffix 111000111.
✓ 2.35 For each draw the transition graph for a Finite State machine, which may be
nondeterministic, that accepts the given strings from { a, b }∗ .
(a) Accepted strings have a second character of a and next to last character of b.
(b) Accepted strings have second character a and the next to last character is
also a.
✓ 2.36 Make a table giving the ε closure function Ê for the machine in Example 2.14.
✓ 2.37 Find the nondeterministic Finite State machine that accepts all bitstrings that
begin with 10. Use the algorithm given above to produce the transition function
table of a deterministic machine that does the same.
2.38 Find a nondeterministic Finite State machine that recognizes this language
of three words: L = { cat, cap, carumba }.
2.39 Give a nondeterministic Finite State machine over Σ = { a, b, c } that rec-
ognizes the language of strings that omit at least one of the characters in the
alphabet.
✓ 2.40 What is the language of this nondeterministic machine with ε transitions?
[Transition graph: states q0, q1, q2 with edges labeled a and b and ε transitions.]
Consider this grammar and this Finite State machine.

A → aA | bB
B → bB | b

[Transition graph: states S, A, B, F with edges labeled a and b.]
(a) Give three strings from the language of the grammar and show that they are
accepted by the machine.
(b) Describe the language of the grammar and the machine.
2.47 Decide whether each problem is solvable or unsolvable by a Turing machine.
(a) L_DFA = { ⟨M, σ⟩ | the deterministic Finite State machine M accepts σ }
(b) L_NFA = { ⟨M, σ⟩ | the nondeterministic machine M accepts σ }
(Assume that the machine M is described in some reasonable way, say, by using
bit strings to encode the transition function.)
2.48 (a) For the machine of Example 2.22, for each q ∈ Q produce E(q, 0), E(q, 1),
E(q, 2), and E(q, 3). List Ê(q) for each q ∈ Q .
(b) Do the same for Exercise 2.29’s machine.
Section IV.3 Regular expressions
This machine

[Transition graph: states q0, q1, q2 with edges labeled a and b. Pictured alongside: Stephen Kleene, 1909–1994.]

accepts strings that have some number of b's (perhaps zero many) followed
by at least one a, possibly then followed by at least one b and then at least
one a. Kleene introduced a convenient way, called regular expressions, to
denote constructs such as "any number of" and "followed by." He gave the
definition in the first subsection below, and supported it with the theorem
in the second subsection.
That is, the rules for operator precedence are: star binds most tightly, then
concatenation, then the pipe alternation operator, |. To get another order, use
parentheses.
3.4 Definition Let Σ be an alphabet not containing any of the metacharacters ), (,
|, or *. A regular expression over Σ is a string that can be derived from this
grammar
⟨regex⟩ → ⟨concat⟩
| ⟨regex⟩ ‘|’ ⟨concat⟩
⟨concat⟩ → ⟨simple⟩
| ⟨concat⟩ ⟨simple⟩
⟨simple⟩ → ⟨char⟩
| ⟨simple⟩ *
| ( ⟨regex⟩ )
⟨char⟩ → ∅ | ε | x 0 | x 1 | . . .
where the x i characters are members of the alphabet Σ.†
As to their semantics, what regular expressions mean, we will define that
recursively. We start with the bottom line, the single-character regular expressions,
and give the language that each describes. We will then do the forms on the other
lines, for each interpreting it as the description of a language.
The language described by the single-character regular expression ∅ is the
empty set, L(∅) = { }. The language described by the regular expression consisting
of only the character ε is the one-element language consisting of only the empty
string, L(ε) = {ε }. If the regular expression consists of just one character from the
alphabet Σ then the language that it describes contains only one string and that
string has only that single character, as in L(a) = { a }.
We finish by defining the semantics of the operations. Start with regular
expressions R 0 and R 1 describing languages L(R 0 ) and L(R 1 ). Then the pipe
symbol describes the union of the languages, so that L(R 0 |R 1 ) = L(R 0 ) ∪ L(R 1 ).
Concatenation of the regular expressions describes concatenation of the languages,
L(R 0⌢R 1 ) = L(R 0 )⌢L(R 1 ). And, the Kleene star of the regular expression describes
the star of the language, L(R 0 ∗ ) = L(R 0 )∗.
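These recursive semantics translate directly into code. Below is a Python sketch that lists the members of L(R) of length at most n; the tuple encoding of regular expressions is our own.

```python
# Sketch: the recursive semantics of regular expressions, as a function
# returning every string of L(R) with length at most n. Expressions are
# tuples: ("empty",), ("eps",), ("char", c), ("union", R0, R1),
# ("cat", R0, R1), ("star", R0).

def lang(R, n):
    kind = R[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "char":
        return {R[1]} if n >= 1 else set()
    if kind == "union":
        return lang(R[1], n) | lang(R[2], n)
    if kind == "cat":
        return {s + t for s in lang(R[1], n)
                      for t in lang(R[2], n) if len(s + t) <= n}
    if kind == "star":                       # fixed point, as with the closure
        result = {""}
        while True:
            bigger = result | {s + t for s in result
                                     for t in lang(R[1], n) if len(s + t) <= n}
            if bigger == result:
                return result
            result = bigger

# aba* from Example 3.5:
aba_star = ("cat", ("cat", ("char", "a"), ("char", "b")), ("star", ("char", "a")))
print(sorted(lang(aba_star, 4)))   # ['ab', 'aba', 'abaa']
```

The length bound keeps every set finite, so the star case's fixed-point loop terminates.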
3.5 Example Consider the regular expression aba* over Σ = { a, b }. It is the
concatenation of a, b, and a*. The first describes the single-element language
L(a) = { a }. Likewise, the second describes L(b) = { b }. Thus, the concatenation
of the first two describes the language L(ab) = L(a)⌢L(b)
†
As we have done with other grammars, here we use the pipe symbol | as a metacharacter, to collapse
rules with the same left side. But pipe also appears in regular expressions. For that usage we wrap it in
single quotes, as ‘|’.
= { ab }.
The regular expression a* describes the star of the language L(a), namely
L(a*) = { a^n | n ∈ N }. Concatenating it with L(ab) gives this.

L(aba*) = { ab⌢a^n | n ∈ N } = { ab, aba, abaa, ... }
We finish this subsection with some constructs that appear often. These
examples use Σ = { a, b, c }.
3.6 Example Describe the language consisting of strings of a's whose length is a
multiple of three, L = { a^{3k} | k ∈ N } = { ε, aaa, aaaaaa, ... }, with the regular
expression (aaa)*.
Note that the empty string is a member of that language. A common gotcha is
to forget that star is for any number of repetitions, including zero-many.
3.7 Example To match any character we can list them all. The language consisting
of three-letter words ending in bc is { abc, bbc, cbc }. The regular expression
(a|b|c)bc describes it. (In practice the alphabet can be very large so that listing
all of the characters is impractical; see Extra A.)
3.8 Example The regular expression a*(ε |b) describes the language of strings that
have any number of a’s and optionally end in one b, L = {ε, b, a, ab, aa, aab, ... }.
Similarly, to describe the language consisting of words with between three and
five a’s, L = { aaa, aaaa, aaaaa } we can use aaa(ε |a|aa).
3.9 Example The language { b, bc, bcc, ab, abc, abcc, aab, ... } has words starting
with any number of a’s (including zero-many a’s), followed by a single b, and then
ending in fewer than three c’s. To describe it we can use a*b(ε |c|cc).
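The examples above can be checked with Python's re module, whose core syntax uses the same alternation, star, and grouping operators (ε is written as an empty alternation branch; re also adds many operators beyond the formal definition).

```python
# Checking Examples 3.6 through 3.9 with Python's re module. fullmatch
# requires the whole string to match, which mirrors string membership in L(R).
import re

assert re.fullmatch("(aaa)*", "") is not None          # star allows zero copies
assert re.fullmatch("(aaa)*", "aaaaaa") is not None
assert re.fullmatch("(a|b|c)bc", "cbc") is not None    # Example 3.7
assert re.fullmatch("a*(|b)", "aab") is not None       # Example 3.8; epsilon as empty branch
assert re.fullmatch("a*b(|c|cc)", "aabcc") is not None # Example 3.9
assert re.fullmatch("a*b(|c|cc)", "abccc") is None     # three c's: no match
print("all examples check out")
```

Note that the first assertion is exactly the gotcha above: (aaa)* matches the empty string.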
Kleene’s Theorem The next result justifies our study of regular expressions
because it shows that they describe the languages of interest.
3.10 Theorem (Kleene’s theorem) A language is recognized by a Finite State
machine if and only if that language is described by a regular expression.
We will prove this in separate halves. The proofs use nondeterministic machines
but since we can convert those to deterministic machines, the result holds there
also.
3.11 Lemma If a language is described by a regular expression then there is a Finite
State machine that recognizes that language.
Proof We will show that for any regular expression R there is a machine that
accepts strings matching that expression. We use induction on the structure of
regular expressions.
Start with regular expressions consisting of a single character. If R = ∅ then
L(R) = { } and the machine on the left below recognizes L(R). If R = ε then
L(R) = { ε } and the machine in the middle recognizes this language. If the regular
expression is a character from the alphabet, such as R = a, then the machine on
the right works.

[Transition graphs: a one-state machine with no accepting state, a one-state machine whose start state is accepting, and a machine whose start state has an edge labeled a to an accepting state.]
3.12 Example Building a machine for the regular expression ab(c|d)(ef)* starts with
machines for the single characters.

[Transition graphs: single-character machines joined step by step with ε transitions; states q0 through q3 handle ab, states q4 through q7 handle the alternation c|d, and states q8 through q11, together with a new state q12, handle the star (ef)*.]
3.13 Lemma If a language is recognized by a Finite State machine then there is a
regular expression that describes that language.

Below is the idea of the proof. On the left, before, the machine passes from qi
through q to qo in two transitions. On the right, after, the state q is gone.

[Transition graphs: before, qi -a→ q -b→ qo; after, qi -ab→ qo.]

In the after picture the edge is labeled ab, with more than just one character. For
the proof we will generalize transition graphs to allow edge labels that are regular
expressions. We will eliminate states, keeping the recognized language the same.
We will be done when there remain only two states, with one edge between them.
That edge's label is the desired regular expression.
Before the proof, an example. Consider the machine on the left below.
[Transition graphs: on the left, q0 -a→ q1, a loop labeled b on q1, q1 -c→ q2, and q0 -d→ q2; on the right, the same machine with a new start state e joined to q0 by an ε transition and a new accepting state f receiving an ε transition from q2.]
The proof starts as above on the right by introducing a new start state guaranteed
to have no incoming edges, e , and a new final state guaranteed to be unique, f .
Then the proof eliminates q 1 as below.
[Transition graph: e -ε→ q0, an edge from q0 to q2 labeled d|(ab*c), and q2 -ε→ f.]
Clearly this machine recognizes the same language as the starting machine.
Proof Call the machine M. If it has no accepting states then the regular expression
is ∅ and we are done. Otherwise, we will transform M to a new machine, M̂,
with the same language, on which we can execute the state-elimination strategy.
First we arrange that M̂ has a single accepting state. Create a new state f and
from each of M's accepting states make an ε transition to f (by the prior paragraph
there is at least one such accepting state). Change all the accepting states to
non-accepting ones and then make f accepting.
Next introduce a new start state, e. Make an ε transition between it and q0.
(Ensuring that M̂ has at least two states allows us to handle machines of all sizes
uniformly.)
Because the edge labels are regular expressions, we can arrange that from
any qi to any qj there is at most one edge: if M has more than one edge between
them then in M̂ use the pipe, |, to combine the labels, as here.

[Transition graphs: before, two edges from qi to qj labeled a and b; after, a single edge labeled a|b.]
Do the same with loops, that is, cases where i = j . Like the prior transformations,
clearly this does not change the language of accepted strings.
The last part of transforming to M̂ is to drop any useless states. If a state
node other than f has no outgoing edges then drop it along with the edges into it.
The language of the machine will not change because this state cannot lead to an
accepting state, since it doesn’t lead anywhere, and this state is not itself accepting
as only f is accepting.
Along the same lines, if a state node q is not reachable from the start e then
we can drop that node along with its incoming and outgoing edges. (The idea of
unreachable is clear but for a formal definition see Exercise 3.34.)
With that, M̂ is ready for state elimination. Below are before and after pictures.
The before picture shows a state q to be eliminated. There are states qi 0 , . . . qi j
with an edge leading into q , and states qo0 , . . . qok that receive an edge leading
out of q . (By the setup work above, q has at least one incoming and at least one
outgoing edge.) In addition, q may have a loop.
[Transition graphs, before and after eliminating q: before, each incoming state qi has an edge into q labeled Ri, each outgoing state qo receives an edge out of q labeled Ro, q may have a loop labeled Rℓ, and there may be a direct edge from qi to qo labeled Ri,o. After, q is gone, and each such direct-edge label Ri,o is replaced by Ri,o|(Ri Rℓ*Ro); where there was no direct edge the new label is just Ri Rℓ*Ro.]
(Here is a subtle point: possibly some of the states shown on the left of each of the
two pictures equal some shown on the right. For example, possibly qi 0 equals qo0 .
If so then the shown edge R i 0,o0 is a loop.)
Eliminate q and the associated edges by making the replacements shown in the
after picture. Observe that the set of strings taking the machine from any incoming
state qi to any outgoing state qo is unchanged. So the language recognized by the
machine is unchanged.
Repeat this elimination until all that remains are e and f , and the edge between
them. (The machine has finitely many states so this procedure must eventually
stop.) The desired regular expression is that edge's label.
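One elimination step can be sketched in code. Here edges form a dict from (source, target) pairs to label strings; this representation is our own, and for simplicity the sketch does not parenthesize compound labels, which suffices when, as in the example before the proof, the Ri, Rℓ, and Ro labels are single characters.

```python
# Sketch of one state-elimination step: eliminating q rewrites every path
# through q using R_i R_l* R_o, combined with any direct edge via the pipe.

def eliminate(edges, q):
    loop = edges.pop((q, q), None)                    # R_l, if q has a loop
    ins = [(p, r) for (p, t), r in edges.items() if t == q]
    outs = [(t, r) for (p, t), r in edges.items() if p == q]
    for key in [k for k in edges if q in k]:          # drop q's remaining edges
        del edges[key]
    for p, ri in ins:
        for t, ro in outs:
            path = ri + (loop + "*" if loop else "") + ro
            old = edges.get((p, t))
            edges[(p, t)] = path if old is None else old + "|(" + path + ")"
    return edges

# The worked example before the proof: q0 -a-> q1, a loop labeled b on q1,
# q1 -c-> q2, and a direct edge q0 -d-> q2.
edges = {("q0", "q1"): "a", ("q1", "q1"): "b",
         ("q1", "q2"): "c", ("q0", "q2"): "d"}
print(eliminate(edges, "q1"))   # {('q0', 'q2'): 'd|(ab*c)'}
```

Repeating this until only e and f remain leaves the desired regular expression on the single surviving edge.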
3.14 Example Consider M on the left. Introduce e and f to get M̂ on the right.
[Transition graphs: M has states q0, q1, q2, with edges labeled a and b and a loop labeled b on q2; M̂ adds a new start state e and a new accepting state f, joined to the machine by ε transitions.]
Start by eliminating q2. In the terms of the proof's key step, q1 = qi0 and q0 = qo0.
The regular expressions are Ri0 = a, Ro0 = b, Ri0,o0 = b, and Rℓ = b. That gives
this machine.

[Transition graph: e -ε→ q0, an edge from q0 to q1 labeled a, an edge from q1 back to q0 labeled b|(ab*b), and ε transitions into f.]
Next eliminate q 1 . There is one incoming node q 0 = qi 0 and two outgoing nodes
q 0 = qo0 and f = qo1 . (Note that q 0 is both an incoming and outgoing node; this
is the subtle point mentioned in the proof.) The regular expressions are R i 0 = a,
Ro0 = b|(ab*b), and Ro1 = ε .
[Transition graph: e -ε→ q0, a loop on q0 labeled ε|a(b|ab*b), and an edge from q0 to f labeled ε|aε.]
All that remains is to eliminate q 0 . The sole incoming node is e = qi 0 and the sole
outgoing node is f = qo0 , and so R i 0 = ε , Ro0 = ε |aε , and R ℓ = ε |a(b|ab ∗ b).
[Transition graph: a single edge from e to f labeled ε(ε|(a(b|ab*b)))*(ε|aε), the desired regular expression.]
IV.3 Exercises
3.15 Decide if the string σ matches the regular expression R . (a) σ = 0010,
R = 0*10 (b) σ = 101, R = 1*01 (c) σ = 101, R = 1*(0|1) (d) σ = 101,
R = 1*(0|1)* (e) σ = 01, R = 1*01*
✓ 3.16 For each regular expression produce five bitstrings that match and five that do
not, or as many as there are if there are not five. (a) 01* (b) (01)* (c) 1(0|1)1
(d) (0|1)(ε|1)0* (e) ∅
3.17 Give a brief plain English description of the language for each regular
expression. (a) a*cb* (b) aa* (c) a(a|b)*bb
✓ 3.18 For these regular expressions and for each element of { a, b }∗ that is of length
less than or equal to 3, decide if it is a match. (a) a*b (b) a* (c) ∅ (d) ε
(e) b(a|b)a (f) (a|b)(ε |a)a
3.19 For these regular expressions, decide if each element of B∗ of length at most 3
is a match. (a) 0*1 (b) 1*0 (c) ∅ (d) ε (e) 0(0|1)* (f) (100)(ε|1)0*
✓ 3.20 A friend says to you, “The point of parentheses is that you first do inside
the parentheses and then do what’s outside. So Kleene star means ‘match the
inside and repeat’, and the regular expression (0*1)* matches the strings 001001
and 010101 but not 01001 and 00000101, where the substrings are unequal.”
Straighten them out.
3.21 Produce a regular expression for the language of bitstrings with a substring
consisting of at least three consecutive 1’s.
3.22 Someone who sits behind you in class says, “I don’t get it. I got a regular
expression that I am sure is right. But the book got a different one.” Explain
what is up.
3.23 For each language, give five strings that are in the language and five that
are not. Then give a regular expression describing the language. Finally, give a
Finite State machine that accepts the language. (a) L0 = { a^n b^{2m} | m, n ≥ 1 }
(b) L1 = { a^n b^{3m} | m, n ≥ 1 }
3.24 Give a regular expression for the language over Σ = { a, b, c } whose strings
are missing at least one letter, that is, the strings that are either without any a’s,
or without any b’s, or without any c’s.
3.25 Give a regular expression for each language. Use Σ = { a, b }∗ . (a) The set of
strings starting with b. (b) The set of strings whose second-to-last character is a.
(c) The set of strings containing at least one of each character. (d) The strings
where the number of a’s is divisible by three.
3.26 Give a regular expression to describe each language over the alphabet
Σ = { a, b, c }. (a) The set of strings starting with aba. (b) The set of strings
ending with aba. (c) The set of strings containing the substring aba.
✓ 3.27 Give a regular expression to describe each language over B. (a) The set
of strings of odd parity, where the number of 1’s is odd. (b) The set of strings
where no two adjacent characters are equal. (c) The set of strings representing
in binary multiples of eight.
✓ 3.28 Give a regular expression to describe each language over the alphabet
Σ = { a, b }. (a) Every a is both immediately preceded and immediately followed
by a b character. (b) Each string has at least two b’s that are not followed by an
a.
3.29 Give a regular expression for each language of bitstrings. (a) The number of
0’s is even. (b) There are more than two 1’s. (c) The number of 0’s is even and
there are more than two 1’s.
3.30 Give a regular expression to describe each language.
(a) {σ ∈ { a, b }∗ | σ ends with the same symbol it began with, and σ ≠ ε }
(b) { a^i b a^j | i and j leave the same remainder on division by three }
✓ 3.35 Show that the set of languages over Σ that are described by a regular
expression is countable. Conclude that there are languages not recognized by
any Finite State machine.
3.36 Construct the parse tree for these regular expressions over Σ = { a, b, c }.
(a) a(b|c) (b) ab*(a|c)
3.37 Construct the parse tree for Example 3.3’s a(b|c)* and a(b*|c*).
✓ 3.38 Get a regular expression by applying the method of Lemma 3.13’s proof to
this machine.
[Transition graph: states q0, q1, with edges labeled a and b and a loop labeled a,b.]
(a) Get M̂ by introducing e and f . (b) Where q = q 0 , describe which state from
the machine is playing the diagram’s before picture role of qi 0 , which edge is R i 0 ,
etc. (c) Eliminate q 0 .
3.39 Apply the method of Lemma 3.13's proof to this machine. At each step describe
which state from the machine is playing the role of qi0, which edge is Ri0, etc.

[Transition graph: states q0, q1, q2 over B, with edges labeled 1 and a loop labeled 0,1.]
(a) Eliminate q 0 . (b) Eliminate q 1 . (c) q 2 (d) Give the regular expression.
3.40 Apply the state elimination method of Lemma 3.13’s proof to eliminate q 1 .
Note that each of the states q 0 and q 2 are of the kind described in the proof’s
comment on the subtle point.
[Transition graph: states q0, q1, q2 with edges labeled A, C, D, and E, and a loop labeled B on q1.]
3.41 An alternative proof of Lemma 3.11 reverses the steps of Lemma 3.13. This is
the subset method. Start by labeling the single edge on a two-state machine with
the given regular expression.

[Transition graph: e -R→ f.]

Then repeatedly expand edges: an edge from qi to qo labeled R0|R1 becomes two
parallel edges labeled R0 and R1, an edge labeled R0R1 becomes two edges in
series through a new state, and an edge labeled R* becomes qi -ε→ q -ε→ qo,
with a loop labeled R on the new state q.

Use this approach to get a machine that recognizes the language described by the
following regular expressions. (a) a|b (b) ca* (c) (a|b)c* (d) (a|b)(b*|a*)
Section
IV.4 Regular languages
We have seen that deterministic Finite State machines, nondeterministic Finite
State machines, and regular expressions all describe the same set of languages.
The fact that we can describe these languages in so many different ways suggests
that there is something natural and important about them.†
†
This is just like the fact that the equivalence of Turing machines, general recursive functions, and
all kinds of other models suggests that the computable sets form a natural and important collection.
Neither collection is just a historical artifact of what happened to be first explored.
4.2 Lemma The number of regular languages over an alphabet is countably infinite.
The collection of languages over that alphabet is uncountable, and consequently
there are languages that are not regular.
Proof Fix an alphabet Σ. Recall that, as defined in Appendix A, any alphabet is
nonempty and finite. Thus there are infinitely many regular languages over that
alphabet, because every finite language is regular — just list all the cases as in
Example 1.8 — and there are infinitely many finite languages.
Next we argue that the number of regular languages is countable. This holds
because the number of regular expressions over Σ is countable: clearly there are
finitely many regular expressions of length 1, of length 2, etc. The union of those
is a countable union of countable sets, and so is countable.
We finish by showing that the set of languages over Σ, the set of all L ⊆ Σ∗, is
uncountable. The set Σ∗ is countably infinite by the argument of the prior two
paragraphs. The set of all L ⊆ Σ∗ is the power set of Σ∗ , and so has cardinality
greater than the cardinality of Σ∗ , which makes it uncountable.
Closure properties In proving Lemma 3.11, the first half of Kleene's Theorem,
we showed that if two languages L0, L1 are regular then their union L0 ∪ L1 is
regular, their concatenation L0⌢L1 is regular, and the Kleene star L0∗ is regular
also. Briefly, where R 0 is a regular expression describing the language L0 and
R 1 describes L1 then the regular expression R 0 |R 1 describes L0 ∪ L1 , and R 0R 1
describes the concatenation L0 ⌢ L1 , and R 0 * describes L0 ∗.
Recall that a structure is closed under an operation if performing that operation
on its members always yields another member. The next result restates the above
paragraph in this language.
4.3 Lemma The collection of regular languages is closed under union, concatenation,
and Kleene star.
We can ask about the closure of regular languages under other operations. We
will use the product construction.
4.4 Example The machine on the left, M0 , accepts strings with fewer than two a’s.
The one on the right, M1 , accepts strings with an odd number of b’s.
[Transition graphs: M0 with states q0, q1, q2 and M1 with states s0, s1, with edges labeled a and b.]
∆0 a b ∆1 a b
+ q0 q1 q0 s0 s0 s1
+ q1 q2 q1 + s1 s1 s0
q2 q2 q2
Consider a machine M whose states are the pairs in
Q0 × Q1 = { (q0, s0), (q0, s1), (q1, s0), (q1, s1), (q2, s0), (q2, s1) }
and whose next-state function acts componentwise, ∆((q, s), x) = (∆0(q, x), ∆1(s, x)).
∆ a b
(q 0 , s 0 ) (q 1 , s 0 ) (q 0 , s 1 )
(q 0 , s 1 ) (q 1 , s 1 ) (q 0 , s 0 )
(q 1 , s 0 ) (q 2 , s 0 ) (q 1 , s 1 )
(q 1 , s 1 ) (q 2 , s 1 ) (q 1 , s 0 )
(q 2 , s 0 ) (q 2 , s 0 ) (q 2 , s 1 )
(q 2 , s 1 ) (q 2 , s 1 ) (q 2 , s 0 )
This machine runs M0 and M1 in parallel. For instance, if we feed the string
aba to M, then the machine’s states go from (q 0 , s 0 ) to (q 1 , s 0 ), then to (q 1 , s 1 ),
and then to (q 2 , s 1 ). This is simply because M0 passes from q 0 to q 1 , then to q 1 ,
and then q 2 , while M1 does s 0 to s 0 , then to s 1 , and finally to s 1 .
The above table does not specify which states are accepting. Suppose that we
say that accepting states (qi , s j ) are the ones where both qi and s j are accepting.
Then by the prior paragraph, M accepts a string σ if both M0 and M1 accept it.
That is, this specification of accepting states causes M to accept the intersection of
the language of M0 and the language of M1 .
4.5 Theorem The collection of regular languages is closed under set intersection,
set difference, and set complement.
Proof Start with two Finite State machines, M0 and M1 , which accept languages
L0 and L1 over some Σ. Perform the product construction to get M. If the
accepting states of M are those pairs where both the first and second component
states are accepting, then M accepts the intersection of the languages, L0 ∩ L1 .
If the accepting states of M are those pairs where the first component state is
accepting but the second is not, then M accepts the set difference of the languages,
L0 − L1 . A special case of that is when L0 is the set of all strings, Σ∗ , whereby M
accepts the complement of L1 .
These closure properties often make it easier to show that a language is regular.
4.6 Example To show that the language
IV.4 Exercises
✓ 4.7 True or false? Obviously you must justify each answer.
(a) Every regular language is finite.
(b) Over B, the empty language is not regular.
(c) The intersection of two languages is regular.
(d) Over B, the language of all strings, B∗, is not regular.
(e) Every Finite State machine accepts at least one string.
(f) For every Finite State machine there is one that has fewer states but recognizes
the same language.
4.8 One of these is true and one is false. Which is which? (a) Any finite language
is regular. (b) Any regular language is finite.
4.9 Is {σ ∈ B∗ | σ represents a power of 2 in binary } a regular language?
For these two machines

[Transition graphs: one machine with states q0, q1, q2 and another with states s0, s1, with edges labeled a and b]

give the transition table for the cross product. Specify the accepting states so that
the result will accept (a) the intersection of the languages of the two machines, and
(b) the union of the languages.
4.16 Find the machine that is the cross product of the second machine, M1, from
Example 4.4

[Transition graph: states s0, s1 with edges labeled a and b]

with itself. Set the accepting states so that it accepts the same language, L1.
4.17 One of our first examples of Finite State machines, Example 1.6, accepts a
string when it contains at least two 0’s as well as an even number of 1’s. Make
such a machine as a product of two simple machines.
4.18 For each, state True or False and give a justification.
(a) Every language is the subset of a regular language.
(b) The union of a regular language and a language that is not regular must be
not regular.
(c) Every language has a subset that is not regular.
(d) The union of two regular languages is regular, without exception.
4.19 Fill in the blank (with justification): The concatenation of a regular language
with a not-regular language ___ regular. (a) must be (b) might be, or might
not be (c) cannot be
4.20 Where L is a language, define L+ as the language L ⌢ L∗ . Show that if L is
regular then so is L+.
4.21 True or false: all finite languages are regular, and there are countably many
finite languages, and there are countably many regular sets, so therefore all
regular languages are finite.
4.22 Use closure properties to show that if L is regular then the set of even-length
strings in L is also regular.
4.23 Example 4.6 shows that closure properties can make easier some arguments
that a language is regular. They can do the same for arguments that a language
is not regular. The next section shows that { a^n b^n ∈ { a, b }∗ | n ∈ N } is not
regular (this is a restatement of Example 5.2). Use that and closure properties to
show that {σ ∈ { a, b }∗ | σ contains the same number of a's as b's } is not regular.
4.26 Lemma 4.2 shows that the collection of regular languages over B is countable.
Show that not every individual language in it is countable.
✓ 4.27 An alternative definition of a regular language is one generated by a regular
grammar, where rewrite rules have three forms: X → tY, or X → t, or X → ε .
That is, the rule head has one nonterminal and rule body has a terminal followed
by a nonterminal, or possibly a single nonterminal or the empty string. This is an
example, with the language that it generates.
S → aS | bS
S → aA
A → aB
B → ε | b
L = { σ ∈ { a, b }∗ | σ = τ ⌢ aa or σ = τ ⌢ aab }
Here we outline an algorithm that inputs a regular grammar and produces a
Finite State machine that recognizes the same language. Apply these steps to the
above grammar. (a) For each nonterminal X make a machine state q X , where the
start state is the one for the start symbol. (b) For each X → ε rule make state q X
accepting. (c) For each X → tY rule put a transition from q X to qY labeled t.
(d) If there are any X → t rules then make an accepting state q̄ , and for each
such rule put a transition from q X to q̄ labeled t.
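These four steps are mechanical enough to sketch in code. Below is one possible Python rendering; the rule encoding and names are mine, not the text's. It builds a nondeterministic machine from the example grammar and checks a few strings.

```python
# A sketch of the grammar-to-machine algorithm; rules are (head, body)
# pairs where body is 'tY' (terminal then nonterminal), 't', or ''.
def grammar_to_nfa(rules):
    trans = {}          # (state, character) -> set of next states
    accepting = set()
    for head, body in rules:
        if body == '':                           # X -> eps: q_X is accepting
            accepting.add(head)
        elif len(body) == 1:                     # X -> t: go to fresh state qbar
            trans.setdefault((head, body), set()).add('qbar')
            accepting.add('qbar')
        else:                                    # X -> tY: q_X --t--> q_Y
            trans.setdefault((head, body[0]), set()).add(body[1:])
    return trans, accepting

def accepts(trans, accepting, start, string):
    states = {start}                             # track all possible states
    for c in string:
        states = set().union(*(trans.get((q, c), set()) for q in states))
    return bool(states & accepting)

rules = [('S', 'aS'), ('S', 'bS'), ('S', 'aA'), ('A', 'aB'), ('B', ''), ('B', 'b')]
trans, acc = grammar_to_nfa(rules)
assert accepts(trans, acc, 'S', 'bbaa') and not accepts(trans, acc, 'S', 'aba')
```

As expected, the machine accepts exactly the strings ending in aa or aab.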
4.28 We can give an alternative proof of Theorem 4.5, that the collection of regular
languages is closed under set intersection, set difference, and set complement,
that does not rely on a somewhat mysterious “by construction.”
(a) Observe that the identity S ∩ T = (Sᶜ ∪ Tᶜ)ᶜ gives intersection in terms of
union and complement. Use Lemma 4.3 to argue that if regular languages
are closed under complement then they are also closed under intersection.
(b) Use the identity S − T = S ∩ Tᶜ to make a similar observation about set
difference.
(c) Show that the complement of a regular language is also a regular language.
4.29 Prove that the language recognized by a Finite State machine with n states
is infinite if and only if the machine accepts at least one string of length k , where
n ≤ k < 2n.
4.30 Fix two alphabets Σ0 , Σ1 . A function h : Σ0 → Σ1 ∗ induces a homomorphism
on Σ0 ∗ via the operation h(σ ⌢τ ) = h(σ ) ⌢h(τ ) and h(ε) = ε .
(a) Take Σ0 = B and Σ1 = { a, b }. Fix a homomorphism ĥ(0) = a and ĥ(1) = ba.
Find ĥ(01), ĥ(10), and ĥ(101).
(b) Define h(L) = { h(σ) | σ ∈ L }. Let L̂ = { σ ⌢ 1 | σ ∈ B∗ }; describe it with a
regular expression. Using the homomorphism ĥ from the prior item, describe
ĥ(L̂) with a regular expression.
(c) Prove that the collection of regular languages is closed under homomorphism,
that if L is regular then so is h(L).
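As a sanity check on the definitions, here is a small Python sketch of an induced homomorphism. The character map is a hypothetical one of my own, not the exercise's ĥ, so as not to give the exercise away.

```python
# Extend a character map h : Sigma0 -> Sigma1* to a homomorphism on
# strings by applying it character by character.
def extend(h):
    def hstar(sigma):
        return ''.join(h[c] for c in sigma)
    return hstar

h = extend({'a': '01', 'b': '1'})   # a hypothetical homomorphism
# the defining properties: h respects concatenation and the empty string
assert h('ab' + 'ba') == h('ab') + h('ba')
assert h('') == ''
```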
4.31 The proofs here work with deterministic Finite State machines. Find a
nondeterministic Finite State machine M so that producing another machine M̂
(a) Show that for any two strings the reversal of the concatenation is the
concatenation, in the opposite order, of the reversals (σ0 ⌢ σ1 ) R = σ1 R ⌢ σ0 R .
Hint: do induction on the length of σ1 .
(b) We will prove the result by showing that for any regular expression R , the
reversal L(R) R is described by a regular expression. We will construct this
expression by defining a reversal operation on regular expressions. Fix an
alphabet Σ and let (i) ∅ᴿ = ∅, (ii) εᴿ = ε, (iii) xᴿ = x for any x ∈ Σ,
(iv) (R0 ⌢ R1)ᴿ = R1ᴿ ⌢ R0ᴿ, (v) (R0 |R1)ᴿ = R0ᴿ |R1ᴿ, and (vi) (R*)ᴿ =
(Rᴿ)*. (Note the relationship between (iv) and the prior exercise item.)
Now show that R R describes L(R) R . Hint: use induction on the length of the
regular expression R .
Section
IV.5 Languages that are not regular
The prior section gave a counting argument to show that there are languages that
are not regular. Now we produce a technique to show that specific languages are
not regular.
The idea is that, although Finite State machines are finite, they can handle
arbitrarily long inputs. This chapter’s first example, the power switch from
Example 1.1, has only two states but even if we toggle it hundreds of times, it still
keeps track of whether the switch is on or off. To handle these long inputs with
only a small number of states, a machine must revisit states, that is, it must loop.
Loops cause a pattern in what a machine accepts. The diagram shows a machine
that accepts aabbbc (it only shows some of the states, those that the machine
traverses in processing this input).
[Diagram: the start state q0 has an a-transition to qi1; there is a loop
qi1 →a→ qi2 →b→ qi3 →b→ qi1; and a path qi1 →b→ qi4 →c→ qi5, with qi5 accepting.]
Besides aabbbc, this machine must also accept a(abb)2 bc because that string takes
the machine through the loop twice, and then to the accepting state. Likewise,
this machine accepts a(abb)3 bc and looping more times pumps out more accepted
strings.
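We can watch this pumping happen by simulating the machine. The transition table below is my reading of the partial diagram, so treat it as an assumption; the point is only that every string a(abb)ⁿbc is accepted.

```python
# Transitions from my reading of the diagram (an assumption).
delta = {('q0', 'a'): 'qi1', ('qi1', 'a'): 'qi2', ('qi2', 'b'): 'qi3',
         ('qi3', 'b'): 'qi1', ('qi1', 'b'): 'qi4', ('qi4', 'c'): 'qi5'}

def accepts(s, accepting={'qi5'}):
    state = 'q0'
    for c in s:
        state = delta.get((state, c))
        if state is None:
            return False            # no transition applies: reject
    return state in accepting

assert accepts('aabbbc')            # the string from the text
for n in range(2, 6):               # going around the loop pumps out more
    assert accepts('a' + 'abb' * n + 'bc')
```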
For contradiction, assume that the language L of balanced parentheses is regular.
Then the Pumping Lemma says that L has a pumping length, p.
Consider the string σ = (ᵖ)ᵖ, that is, p open parentheses followed by p close
parentheses. It is an element of L and its length is greater
than or equal to p, so the Pumping Lemma applies. So σ decomposes into three
substrings σ = α ⌢ β ⌢ γ satisfying the conditions. Condition (1) is that the length of
the prefix α ⌢ β is less than or equal to p. Because of this condition we know that
both α and β are composed exclusively of open parentheses, ('s. Condition (2) is
that β is not the empty string, so it contains at least one (.
Condition (3) is that all of the strings αγ, αβ²γ, αβ³γ, ... are members of L.
To get the desired contradiction, consider αβ²γ. Compared with σ = αβγ, this
string has an extra β, which adds at least one open parenthesis without adding any
balancing close parentheses. In short, αβ²γ has more ('s than )'s. It is therefore
not a member of L. But the Pumping Lemma says it must be a member of L, and
therefore the assumption that L is regular is incorrect.
We have seen many examples of things that regular expressions and Finite State
machines can do. Here we see something that they cannot. Matching parentheses,
and other types of matching, is something that we often want to do, for instance,
in a compiler. So the Pumping Lemma helps us show that for some common
computing tasks, regular languages are not enough.
5.3 Example Recall that a palindrome is a string that reads the same backwards
as forwards, such as bab, abbaabba, or a⁵ba⁵. We will prove that the language
L = { σ ∈ Σ∗ | σ = σᴿ } of all palindromes over Σ = { a, b } is not regular.
For contradiction assume that this language is regular. The Pumping Lemma
says that L has a pumping length. Call it p. Consider σ = aᵖbaᵖ, which is an
element of L and has more than p characters. Thus it decomposes as σ = αβγ,
subject to the three conditions. Condition (1) is that |αβ| ≤ p and so both substrings
α and β are composed entirely of a's. Condition (2) is that β is not the empty
string and so β consists of at least one a.
Consider the list from condition (3): αγ, αβ²γ, αβ³γ, ... We will get the desired
contradiction from the first element, αγ (the other list members also lead to a
contradiction but we only need one).
Compared to σ = αβγ, in αγ the β is gone. Because α and β consist entirely
of a's, the substring γ contains σ's b, and must also contain the aᵖ that follows it. So in
passing from σ = αβγ to αγ we've omitted at least one a before the b but none of
the a's after it, and therefore αγ is not a palindrome. This contradicts the Pumping
Lemma's third condition, that the strings in the list are all members of L.
5.4 Remark In that example σ has three parts, σ = aᵖ ⌢ b ⌢ aᵖ, and it decomposes
into three parts, σ = α ⌢ β ⌢ γ. Don't make the mistake of thinking that the two
decompositions match. The Pumping Lemma does not say that α = aᵖ, β = b,
and γ = aᵖ. Indeed, as the example shows, the Pumping Lemma gives that β is
not the b part. Instead, the Pumping Lemma only says that the first two strings
together, α ⌢ β, consist exclusively of a's. So it could be that αβ = aᵖ, or it could
instead be that γ starts with some a's that are then followed by baᵖ.
5.5 Example Consider the language L = { 0ᵐ1ⁿ ∈ B∗ | m = n + 1 }, whose
members have a number of 0's that is one more than the number of 1's. We will
prove that it is not regular.
For contradiction assume otherwise, that L is regular, and set p as its pumping
length. Consider σ = 0ᵖ⁺¹1ᵖ ∈ L. Because |σ| ≥ p, the Pumping Lemma gives a
decomposition σ = αβγ satisfying the three conditions. Condition (1) says that
|αβ| ≤ p, so that the substrings α and β have only 0's. Condition (2) says that β
has at least one character, necessarily a 0. Consider the list from Condition (3): αγ,
αβ²γ, αβ³γ, ... Compare its first entry, αγ, to σ (other entries would also yield
a contradiction). The string αγ has fewer 0's than does σ but the same number
of 1's. So the number of 0's in αγ is not one more than its number of 1's. Thus
αγ ∉ L, which contradicts the Pumping Lemma.
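A small script can spot-check this argument for a sample pumping length. The membership test in_L and the value p = 7 below are mine; the loop tries every decomposition with |αβ| ≤ p and nonempty β, and confirms that αγ always falls outside L.

```python
# Membership in L = { 0^m 1^n | m = n + 1 }: all 0's first, then 1's,
# with one more 0 than 1.
def in_L(s):
    m, n = s.count('0'), s.count('1')
    return s == '0' * m + '1' * n and m == n + 1

p = 7                                  # a sample pumping length
sigma = '0' * (p + 1) + '1' * p
assert in_L(sigma)

# Any alpha beta gamma with |alpha beta| <= p has beta made of 0's;
# dropping beta removes 0's but no 1's, so alpha gamma is never in L.
for a in range(0, p):                  # a = |alpha|
    for b in range(1, p - a + 1):      # b = |beta| >= 1, a + b <= p
        assert not in_L(sigma[:a] + sigma[a + b:])
```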
We can interpret that example to say that Finite State machines cannot correctly
recognize a predecessor-successor relationship. We can also use the Pumping
Lemma to show Finite State machines cannot recognize other arithmetic relations.
5.6 Example The language L = { aⁿ | n is a perfect square } = { ε, a, a⁴, a⁹, a¹⁶, ... }
is not regular. For, suppose otherwise. Fix a pumping length p and consider
the string σ consisting of p² many a's, so that |σ| = p².
By the Pumping Lemma, σ decomposes into αβγ, subject to the three conditions.
Condition (1) is that |αβ| ≤ p, which implies that |β| ≤ p. Condition (2) is that
0 < |β|. Now consider the strings αγ, αβ²γ, ...
The gap between the length |σ| = |αβγ| and the length |αβ²γ| is at most p,
because 0 < |β| ≤ p, so p² < |αβ²γ| ≤ p² + p. But the definition of the language
is that after σ the next longer string in L has length (p + 1)² = p² + 2p + 1, and
p² + p is strictly less than that. Thus the length of αβ²γ is not a perfect square,
which contradicts the Pumping Lemma's assertion that αβ²γ ∈ L.
Sometimes we can solve problems by using the Pumping Lemma in conjunction
with the closure properties of regular languages.
5.7 Example The language L = { σ ∈ { a, b }∗ | σ has as many a's as b's } is not regular.
IV.5 Exercises
A useful technique when you are stuck on a language description is to try listing
five strings that are in the language and five that are not. Another is to describe
the language in prose, as though over a telephone. Both help you think through the
formalities.
✓ 5.8 Example 5.5 shows that { 0ᵐ1ⁿ ∈ B∗ | m = n + 1 } is not regular but your
friend doesn't get it and asks you, “What's wrong with the regular expression
0ⁿ⁺¹1ⁿ?” Explain it to them.
5.9 Example 5.2 uses αβ²γ to show that the language of balanced parentheses is
not regular. Instead get the contradiction by showing that αγ is not a member of
the language.
5.10 Your friend has been thinking. They say, “Hey, the diagram just before
Theorem 5.1 doesn’t apply unless the language is infinite. Sometimes languages
are regular because they only have like three or four strings. So the Pumping
Lemma is wrong.” In what way do they need to further refine their thinking?
5.11 Someone in the class emails you, “If a language has a string with length greater
than the number of states, which is the pumping length, then it cannot be a
regular language.” Correct?
✓ 5.12 For each, give five strings that are elements of the language and five that are
not, and then show that the language is not regular. (a) L0 = { aⁿbᵐ | n + 2 = m }
(b) L1 = { aⁿbᵐcⁿ | n, m ∈ N } (c) L2 = { aⁿbᵐ | n < m }
✓ 5.13 Your study partner has read Remark 5.4 but it is still sinking in. About
the matched parentheses example, Example 5.2, they say, “So σ = (ᵖ)ᵖ, and
σ = αβγ. We know that αβ consists only of a's, so it must be that γ consists of
)'s.” Give them a goose.
5.14 In class someone asks, “Isn’t it true that languages don’t have a unique
pumping length? That if a length of p = 5 will do then p = 6 will also do?”
Before the prof answers, what do you think?
5.15 Show that the language over { a, b } consisting of strings having more a’s
than b’s is not regular.
✓ 5.16 For each language over Σ = { a, b } produce five strings that are members.
Then decide if that language is regular. You must prove each assertion by either
producing a regular expression or using the Pumping Lemma.
(a) { aⁿbᵐ ∈ Σ∗ | n = 3 } (b) { aⁿbᵐ ∈ Σ∗ | n + 3 = m } (c) { α ⌢ α | α ∈ Σ∗ }
5.17 One of these is regular and one is not. Which is which? You must prove your
assertions. (a) { aⁿbᵐ ∈ { a, b }∗ | n = m² } (b) { aⁿbᵐ ∈ { a, b }∗ | 3 < m, n }
✓ 5.18 Use the Pumping Lemma to prove that L = { aᵐ⁻¹cbᵐ | m ∈ N⁺ } is not
regular. It may help to first produce five strings from the language.
5.19 Is { σ ∈ B∗ | σ = αβαᴿ for α, β ∈ B∗ } regular? Either way, prove it.
[Machine diagram for an exercise omitted here: states q0, qi1, qi4, qi5, qi8 with
transitions labeled a and b.]
Hint: the third condition’s sequence has a constant positive length difference.
5.28 Consider { aⁱbʲcⁱʲ | i, j ∈ N }, where the exponent on c is the product i·j. (a) Give five strings from this language.
5.30 For a regular language, a pumping length p is a number with the property
that every word of length p or more can be pumped, that is, can be decomposed
so that it satisfies the three properties of Theorem 5.1. The proof of that theorem
shows that where a Finite State machine recognizes the language, the number of
states in the machine suffices as a pumping length. But p can be smaller.
(a) Consider the language L described by (01)*. Construct a deterministic
Finite State machine with three states that recognizes this language.
(b) Show that the minimal pumping length for L is 1.
5.31 Nondeterministic Finite State machines can always be made to have a single
accepting state. For deterministic machines that is not so.
(a) Show that any deterministic Finite State machine that recognizes the finite
language L1 = {ε, a } must have at least two accepting states.
(b) Show that any deterministic Finite State machine that recognizes L2 =
{ε, a, aa } must have at least three accepting states.
(c) Show that for any n ∈ N there is a regular language that is not recognized
by any deterministic Finite State machine with at most n accepting states.
Section
IV.6 Minimization
Contrast these two Finite State machines. For each, the language of accepted
strings is { σ ∈ B∗ | σ has at least one 0 and at least one 1 }.
(∗) [Two machine diagrams over B: on the left a four-state machine with states
q0, q1, q2, q3; on the right a six-state machine with states q0 through q5 that
recognizes the same language.]
Our experience from making machines is that in a properly designed machine the
states have a well-defined meaning. For instance, on the left q 2 means something
like, “have seen at least one 1 but still waiting for a 0.”
The machine on the right doesn’t satisfy this design principle because the
meaning of q 4 is the same as that of q 2 , and q 3 ’s meaning is the same as q 5 ’s. That
is, the two pairs of states have the same future. This machine has redundant states.
We will give an algorithm that starts with a Finite State machine and from
it finds the smallest machine that recognizes the same language. The algorithm
collapses together redundant states.
6.1 Definition In a Finite State machine over Σ, where n ∈ N we say that two
states q, q̂ are n -distinguishable if there is a string σ ∈ Σ∗ with |σ | ≤ n such that
starting the machine in state q and giving it input σ ends in an accepting state
while starting it in q̂ and giving it σ does not, or vice versa. Otherwise the states
are n -indistinguishable, q ∼n q̂ .
Two states q, q̂ are distinguishable if there is an n for which they are n -
distinguishable. Otherwise they are indistinguishable, q ∼ q̂ .
6.2 Example Consider the machine on the left above. Starting it in state q 0 and feeding
it σ = 0 ends in the non-accepting state q 1 , while starting it in q 2 and processing
the same input ends in the accepting state q 3 . So q 0 and q 2 are 1-distinguishable,
and therefore are distinguishable.
Another example is that q 2 and q 3 are 0-distinguishable, via σ = ε . That is, a
state that is not accepting is 0-distinguishable from a state that is accepting.
6.3 Example More happens with the machine on the right. This table gives the result
of starting in each state and feeding the machine each length 0, length 1, and
length 2 string. As called for in the definition, the table doesn’t give the resulting
state but instead records whether it is accepting, F , or nonaccepting, Q − F .
      ε     0     1     00    01    10    11
q0    Q−F   Q−F   Q−F   Q−F   F     F     Q−F
q1    Q−F   Q−F   F     Q−F   F     F     F
q2    Q−F   F     Q−F   F     F     F     Q−F
q3    F     F     F     F     F     F     F
q4    Q−F   F     Q−F   F     F     F     Q−F
q5    F     F     F     F     F     F     F
The effect of the length 0 string is that there are two kinds of states: members of
{q 0 , q 1 , q 2 , q 4 } are taken to nonaccepting resulting states and members of {q 3 , q 5 }
result in accepting states.
The length 1 strings split the machine's states into four groups. For instance,
q0 is 1-distinguishable from q1 because the two result columns say Q − F, Q − F
for q0 but say Q − F, F for q1. In total the ∼1 relation gives four classes of states,
{q0}, {q1}, {q2, q4}, and {q3, q5}.
The length 2 strings do not further divide the states; the ∼2 relation gives the
same four classes of states.
6.4 Lemma The ∼ relation and the ∼n relations are equivalences.
So consider again the machine with redundant states that we saw in (∗) above.
We use the following notation for the equivalence classes, here for the two classes of
the ∼0 relation, the four of the ∼1 relation, and the four of the ∼2 relation.
n ∼n classes
0 E0, 0 = {q 0 , q 1 , q 2 , q 4 } E0, 1 = {q 3 , q 5 }
1 E1, 0 = {q 0 } E1, 1 = {q 1 } E1, 2 = {q 2 , q 4 } E1, 3 = {q 3 , q 5 }
2 E2, 0 = {q 0 } E2, 1 = {q 1 } E2, 2 = {q 2 , q 4 } E2, 3 = {q 3 , q 5 }
The states that we spotted by eye as redundant, q 2 , q 4 and q 3 , q 5 continue to be
together in the same class.
For the algorithm, consider how states q and q̂ could be (n + 1)-distinguishable but
not n-distinguishable. Let the length n + 1 string σ = ⟨s0, s1, ... sn−1, sn⟩ = τ ⌢ sn
distinguish them. Because the states are not n-distinguishable, if the prefix
τ brings the machine from q to a state r in some class En,i, then τ must bring the
machine from q̂ to some r̂ in the same class, En,i. So distinguishing between these
states must involve σ's final character sn taking r to a state in one class, En,j, and
taking r̂ to a state in another, En,ĵ.
Therefore, at each step we don’t need to test whole strings, we need only test
single characters, to see whether they split the equivalence classes, the En,i ’s.
For instance, consider again the machine on the right above, along with its
∼1 classes E1,0, E1,1, E1,2, and E1,3. To see if there is any additional splitting in
going to the ∼2 classes, instead of checking all the length 2 strings we see if the
members of E1,2 and of E1,3 are sent to different ∼1 classes on being fed single
characters. (We need only test classes with more than one member because the
singleton classes cannot split.)

E1,2    0      1            E1,3    0      1
q2      E1,3   E1,2         q3      E1,3   E1,3
q4      E1,3   E1,2         q5      E1,3   E1,3

In both tables there is no split, because the right side of the rows is the same for
all of the class's members. So we can stop.
The examples of this algorithm below show how to translate this into a minimal
machine, and add a table notation that simplifies the computation.
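Here is one way the refinement loop might look in code. The function and names are my own sketch of the section's algorithm; the sample transition table is the machine of Example 6.5 below (it matches the ∆MI table given at the end of the section).

```python
# Partition refinement: start from the ~0 classes (accepting versus not)
# and split a class whenever some character sends two of its members to
# different current classes.
def minimize_classes(states, alphabet, delta, finals):
    parts = [p for p in (set(finals), set(states) - set(finals)) if p]
    while True:
        index = {q: i for i, p in enumerate(parts) for q in p}
        new = []
        for p in parts:
            groups = {}
            for q in p:
                key = tuple(index[delta[q, x]] for x in alphabet)
                groups.setdefault(key, set()).add(q)
            new.extend(groups.values())
        if len(new) == len(parts):        # no class split: done
            return new
        parts = new

delta = {('q0','a'): 'q1', ('q0','b'): 'q2', ('q1','a'): 'q3',
         ('q1','b'): 'q4', ('q2','a'): 'q4', ('q2','b'): 'q3',
         ('q3','a'): 'q5', ('q3','b'): 'q5', ('q4','a'): 'q5',
         ('q4','b'): 'q5', ('q5','a'): 'q5', ('q5','b'): 'q5'}
classes = minimize_classes(['q0','q1','q2','q3','q4','q5'], 'ab',
                           delta, {'q1', 'q2', 'q5'})
assert sorted(map(sorted, classes)) == [['q0'], ['q1','q2'], ['q3','q4'], ['q5']]
```

The four classes returned are exactly the ∼2 classes computed by hand in Example 6.5.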
6.5 Example We will find a machine that recognizes the same language as this one
but that has a minimum number of states.
[Diagram: a six-state machine over { a, b } with ∆(q0, a) = q1, ∆(q0, b) = q2,
∆(q1, a) = q3, ∆(q1, b) = q4, ∆(q2, a) = q4, ∆(q2, b) = q3, and
∆(q3, x) = ∆(q4, x) = ∆(q5, x) = q5 for both characters x; the accepting states
are q1, q2, and q5.]
To do bookkeeping we will use triangular tables like the one below. They have an
entry for every two-element set {i, j } where i and j are indices of states and i , j .
Start by checkmarking the i, j entries where one of qi and q j is accepting while
the other is not.
q1   ✓
q2   ✓  ·
q3   ·  ✓  ✓
q4   ·  ✓  ✓  ·
q5   ✓  ·  ·  ✓  ✓
     q0 q1 q2 q3 q4

(A · marks a pair that is not checkmarked.)
These mark states that are 0-distinguishable and the blanks denote pairs of states
that are 0-indistinguishable. In short, here are the two ∼0 -equivalence classes.
E0, 0 = {q 0 , q 3 , q 4 } E0, 1 = {q 1 , q 2 , q 5 }
   pair      a        b        a classes      b classes
   q0, q3    q1, q5   q2, q5   E0,1, E0,1     E0,1, E0,1
   q0, q4    q1, q5   q2, q5   E0,1, E0,1     E0,1, E0,1
   q3, q4    q5, q5   q5, q5   E0,1, E0,1     E0,1, E0,1
   q1, q2    q3, q4   q4, q3   E0,0, E0,0     E0,0, E0,0
✓  q1, q5    q3, q5   q4, q5   E0,0, E0,1     E0,0, E0,1
✓  q2, q5    q4, q5   q3, q5   E0,0, E0,1     E0,0, E0,1

q1   ✓
q2   ✓  ·
q3   ·  ✓  ✓
q4   ·  ✓  ✓  ·
q5   ✓  ✓  ✓  ✓  ✓
     q0 q1 q2 q3 q4
We have found that the states q 1 and q 2 are not 1-distinguishable, but that q 5 can
be 1-distinguished from q 1 and q 2 . In short, E0, 1 = {q 1 , q 2 , q 5 } splits into two
∼1 classes.
E1, 0 = {q 0 , q 3 , q 4 } E1, 1 = {q 1 , q 2 } E1, 2 = {q 5 }
We’ve updated the triangular table with marks at 1, 5 and 2, 5.
Iterate. The next iteration subdivides the ∼1 -equivalence classes, the E1,i ’s, to
compute the ∼2 -equivalence classes.
   pair      a        b        a classes      b classes
✓  q0, q3    q1, q5   q2, q5   E1,1, E1,2     E1,1, E1,2
✓  q0, q4    q1, q5   q2, q5   E1,1, E1,2     E1,1, E1,2
   q3, q4    q5, q5   q5, q5   E1,2, E1,2     E1,2, E1,2
   q1, q2    q3, q4   q4, q3   E1,0, E1,0     E1,0, E1,0

q1   ✓
q2   ✓  ·
q3   ✓  ✓  ✓
q4   ✓  ✓  ✓  ·
q5   ✓  ✓  ✓  ✓  ✓
     q0 q1 q2 q3 q4
We have found that q3 and q4 are not 2-distinguishable, but they are each
2-distinguishable from q0. The ∼1 class E1,0 splits into two ∼2 classes.
The updated triangular table contains the same information since its only blanks
are at entries 1, 2 and 3, 4.
Once more through the iteration gives this.

pair      a        b        a classes      b classes
q1, q2    q3, q4   q4, q3   E2,2, E2,2     E2,2, E2,2
q3, q4    q5, q5   q5, q5   E2,3, E2,3     E2,3, E2,3

q1   ✓
q2   ✓  ·
q3   ✓  ✓  ✓
q4   ✓  ✓  ✓  ·
q5   ✓  ✓  ✓  ✓  ✓
     q0 q1 q2 q3 q4
This shows the minimized machine, with r 0 as a name for E2, 0 and r 1 for E2, 1 , etc.
Its start state r 0 is the one containing q 0 . Its final states are the ones containing
final states of the original machine.
[Diagram: the minimized machine r0 →a,b→ r1 →a,b→ r2 →a,b→ r3, where r3 loops
on a,b; the accepting states are r1 and r3.]
6.6 Example Consider this machine, which we will minimize. [Diagram: a
six-state machine over B with states q0 through q5; the accepting states are q3
and q4, and q5 cannot be reached from the start state.]
First, q 5 cannot be reached from the start state. Drop it. That leaves this initial
triangular table.
q1   ·
q2   ·  ·
q3   ✓  ✓  ✓
q4   ✓  ✓  ✓  ·
     q0 q1 q2 q3
It gives these ∼0 classes, the non-final states and the final states.
E0, 0 = {q 0 , q 1 , q 2 } E0, 1 = {q 3 , q 4 }
   pair      0        1        0 classes      1 classes
✓  q0, q1    q1, q2   q2, q3   E0,0, E0,0     E0,0, E0,1
✓  q0, q2    q1, q2   q2, q4   E0,0, E0,0     E0,0, E0,1
   q1, q2    q2, q2   q3, q4   E0,0, E0,0     E0,1, E0,1
   q3, q4    q3, q4   q3, q4   E0,1, E0,1     E0,1, E0,1

q1   ✓
q2   ✓  ·
q3   ✓  ✓  ✓
q4   ✓  ✓  ✓  ·
     q0 q1 q2 q3
The first row gets a check mark because on being fed a 1 the states q 0 and q 1 go
to resulting states, q 2 and q 3 , that are in different ∼0 classes. The same is true
for the second row. So q 0 is 1-distinguishable from q 1 and q 2 but they are not
1-distinguishable from each other. That is, E0, 0 = {q 0 , q 1 , q 2 } splits in two.
As earlier, the updated triangular table contains the same information, since it has
only two blank entries, 1, 2 and 3, 4.
On the next iteration no more splitting happens. The minimized machine has
three states.
[Diagram: the minimized machine s0 →0,1→ s1, where s1 loops on 0 and has a
1-transition to s2, and s2 loops on 0,1; s2 is accepting.]
We will close this section with a proof that this algorithm returns a minimal
machine. For that, consider the drawing below. It has the input machine above the
output machine so we can imagine that its states project down onto the output
machine’s states with p(q 0 ) = r 0 , p(q 1 ) = p(q 2 ) = r 1 , p(q 3 ) = p(q 4 ) = r 2 , and
p(q 5 ) = r 3 .
algorithm input MI: [Diagram: the six-state machine from Example 6.5, with
states q0 through q5]

output MO: [Diagram: the minimized machine r0 →a,b→ r1 →a,b→ r2 →a,b→ r3,
with r3 looping on a,b]
The point is that the arrows work — the algorithm groups together MI ’s states to
make MO ’s states in a way that respects the starting machine’s transitions.
The tables below make the same point. The left table is the transition function
of the starting machine, ∆MI . The right table groups the q ’s into r ’s, so it shows
∆MO . The states are grouped in a way that allows the transitions in MO to be
derived from the transitions in MI . For instance, q 1 and q 2 project to r 1 , and when
presented with an input a they each transition to a state (q 3 and q 4 respectively)
that projects to r 2 .
∆MI    a    b          ∆MO    a    b
q0     q1   q2         r0     r1   r1
q1     q3   q4         r1     r2   r2
q2     q4   q3
q3     q5   q5         r2     r3   r3
q4     q5   q5
q5     q5   q5         r3     r3   r3
More precisely, the algorithm allows us to define ∆MO(p(q), x) = p( ∆MI(q, x) ) for
all q ∈ MI and x ∈ Σ.
6.7 Lemma The algorithm above returns a machine that recognizes the same language
as the input machine, L(MO) = L(MI), and that among all of the machines
recognizing the same language has the minimal number of states.
Proof We will argue that the algorithm halts for all input machines, that the
returned machine recognizes the same language, and that it has a minimal number
of states. The first is easy: the algorithm halts after a step where no class splits and
since these machines have only finitely many states, that step must appear.
The second holds because the transition function of the output machine respects
the transition function of the input. Start both machines on the same string, σ ∈ Σ∗ .
The machine MI starts in q 0 while MO starts in a state E0 that contains q 0 . The
first character of σ moves MI to a state q̂ and moves MO to a state that contains q̂ .
The processing proceeds in this way until the string runs out. Then MI is in a final
state if and only if MO is in a state that contains that final state, which is itself a
final state of MO . Thus the two machines accept the same set of strings.
For the third, let M̂ be a machine that recognizes the same language as MO .
We will show that it has at least as many states by giving an association, where
each state in MO is associated with at least one state in M̂ and never are different
states in MO associated with the same state in M̂.
Consider the union of the sets of states of the two machines (assume that they
have no states in common). We will follow the process above to find when two
states in this union are indistinguishable. As above, start by saying that two states
in the union are 0-indistinguishable if either both are final in their own machine or
neither is final. Step n + 1 of this process, also as above, begins with ∼n classes
En, 0 , ... En,k of states from the union that are n -indistinguishable. For each such
class, see if it splits. That is, see if there are two states in that class that are sent by
a character x ∈ Σ to different ∼n classes. This gives the ∼n+1 classes. When we
reach a step with no splitting then we know which states are indistinguishable and
they form the ∼ classes.
Notice that the start states in the two machines are indistinguishable, that is,
they are in the same ∼ class, because L(MO) = L(M̂). In addition, if two states are
indistinguishable then their successor states on any one input symbol x ∈ Σ are
also indistinguishable from each other, simply because if a string σ distinguishes
between the successors then x ⌢ σ distinguishes between the original two states.
In turn, the successors of these successors are indistinguishable, etc.
Now, say that states in MO and M̂ are associated if they are indistinguishable,
that is, if they are in the same ∼ class. We first show that every state q of MO
is associated with at least one state of M̂. Because MO is the output of the
minimization process, it has no inaccessible state. So there is a string that takes
the start state of MO to q . This string takes the start state of M̂ to some q̂ , and
the prior paragraph applies to give that q ∼ q̂ .
We finish by showing that there cannot be two different states of MO that are
both associated with the same state of M̂. If there were two such states then by
Lemma 6.4 they would be indistinguishable from each other. But that’s impossible
because MO is the output of the minimization process, which ensures all of its
states are distinguishable.
IV.6 Exercises
6.8 From the triangular table find the ∼i classes.
0
1
2
✓ ✓ ✓ 3
✓ ✓ ✓ ✓ 4
✓ ✓ ✓ ✓ 5
6.9 From the ∼i classes find the associated triangular table. (a) Ei, 0 = {q 0 , q 1 },
Ei, 1 = {q 2 }, and Ei, 2 = {q 3 , q 4 }, (b) Ei, 0 = {q 0 }, Ei, 1 = {q 1 , q 2 , q 4 }, and
Ei, 2 = {q 3 }, (c) Ei, 0 = {q 0 , q 1 , q 5 }, Ei, 1 = {q 2 , q 3 }, and Ei, 2 = {q 4 },
✓ 6.10 Suppose that E0, 0 = {q 0 , q 1 , q 2 , q 5 } and E0, 1 = {q 3 , q 4 }, and from the
machine you compute this table.
pair      a        b        a classes      b classes
q0, q1    q1, q1   q2, q3   E0,0, E0,0     E0,0, E0,1
q0, q2    q1, q2   q2, q4   E0,0, E0,0     E0,0, E0,1
q0, q5    q1, q5   q2, q5   E0,0, E0,0     E0,0, E0,0
q1, q2    q1, q2   q3, q4   E0,0, E0,0     E0,1, E0,1
q1, q5    q1, q5   q3, q5   E0,0, E0,0     E0,1, E0,0
q2, q5    q2, q5   q4, q5   E0,0, E0,0     E0,1, E0,0
q3, q4    q3, q4   q5, q5   E0,1, E0,1     E0,0, E0,0
(a) Which lines of the table do you checkmark? (b) Give the resulting ∼1 classes.
✓ 6.11 This machine accepts strings with an odd parity, with an odd number of 1’s.
Minimize it, using the algorithm described in this section. Show your work.
[Diagram: a three-state machine q0, q1, q2 over B in which each state has a
0-loop, with 1-transitions among the states.]
✓ 6.12 For many machines we can find the unreachable states by eye, but there is
an algorithm. It inputs a machine M and initializes the set of reachable states
to R 0 = {q 0 }. For n > 0, step n of the algorithm is: for each q ∈ R n find all
states q̂ reachable from q in one transition and add those to make Rn+1 . That
is, Rn+1 = Rn ∪ { q̂ = ∆M(q, x) | q ∈ Rn and x ∈ Σ }. The algorithm stops when
Rk = Rk +1 and the set of reachable states is R = Rk . The unreachable states are
the others, Q − R .
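A sketch of this iteration in Python, run on a small hypothetical machine where q3 is reachable only from the unreachable q2:

```python
# Grow the reachable set one transition at a time until it stabilizes.
def reachable(delta, alphabet, start='q0'):
    R = {start}
    while True:
        nxt = R | {delta[q, x] for q in R for x in alphabet if (q, x) in delta}
        if nxt == R:
            return R
        R = nxt

delta = {('q0','a'): 'q1', ('q0','b'): 'q0', ('q1','a'): 'q0',
         ('q1','b'): 'q1', ('q2','a'): 'q3', ('q2','b'): 'q0',
         ('q3','a'): 'q3', ('q3','b'): 'q3'}
assert reachable(delta, 'ab') == {'q0', 'q1'}    # q2 and q3 are unreachable
```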
For each machine, perform this algorithm. Show the steps.
(a) [Machine diagram over { a, b }: states q0, q1, q2, q3, q5.]
(b) [Machine diagram over { a, b }: states q0 through q4.]
✓ 6.13 Perform the minimization algorithm on the machine with redundant states
at the start of this section, the one on the right in (∗) on page 225.
6.14 What happens when you minimize a machine that is already minimal?
✓ 6.15 This machine accepts strings described by (ab|ba)*. Minimize it, using the
algorithm of this section and showing the work.
[Diagram: an eight-state machine with states q0 through q7 over { a, b }.]
6.16 If a machine’s start state is accepting, must the minimized machine’s start
state be accepting? If you think “yes” then prove it, and if you think “no” then
give an example machine where it is false.
[Machine diagram for an exercise omitted here: states q0 through q4 over B.]
6.18 Minimize this. Show the work, including producing the diagram of the
minimized machine.
[Diagram: a six-state machine with states q0 through q5 over { a, b }.]
[Machine diagram for an exercise omitted here: states q0 through q3 over B.]
6.21 What happens if you perform the minimization procedure in Example 6.6
without first omitting the unreachable state?
✓ 6.22 Minimize.
[Diagram: q0 →a→ q1 →a→ q2 →a→ q3 →a→ q4, with b-transitions from q0 through
q3 and with q4 looping on a,b.]
Note that the algorithm takes, roughly, a number of steps equal to the number of
states in the machine.
[Machine diagram for an exercise omitted here: states q0, q1, q2 over { a, b }.]
Section
IV.7 Pushdown machines
No Finite State machine can recognize the language of balanced parentheses.
So this machine model is not powerful enough to use, for instance, if you want
to decide whether input strings are valid programs in a modern programming
language. To handle nested parentheses the natural data structure is a stack. We
will next see a machine type consisting of a Finite State machine with access to a
pushdown stack.
Like a Turing machine tape, a stack is unbounded storage. But it has restrictions
that the tape does not. A stack doesn’t give random access to the elements. It is
like the restaurant dish dispenser below. When you pop a dish off, a spring pushes
the remaining dishes up so you can reach the next one. When you push a new dish
on, its weight compresses the spring so all the old dishes move down and the latest
dish is the only one that you can reach. We say that this stack is LIFO, Last-In,
First-Out.
Below on the right is a sequence of views of a stack data structure. First the
stack has two characters, g3 and g2. We push g1 on the stack, and then g0. Now,
although g1 is on the stack, we don’t have immediate access to it. To get at g1 we
must first pop off g0, as in the last stack shown.
g3, g2   →   g1, g3, g2   →   g0, g1, g3, g2   →   g1, g3, g2
(each view lists the top of the stack first)
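A Python list models this behavior directly, with the end of the list as the top of the stack; the pushes and pop below mirror the four views above.

```python
stack = ['g2', 'g3']            # g3 is on top of g2
stack.append('g1')              # push g1
stack.append('g0')              # push g0
assert stack == ['g2', 'g3', 'g1', 'g0']
assert stack.pop() == 'g0'      # to get at g1 we must first pop g0
assert stack[-1] == 'g1'        # now g1 is on top, as in the last view
```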
We assume that the stack alphabet Γ contains the character that we use to mark
the stack bottom, ⊥.† The rest of Γ is g0, g1, etc. We also assume that the tape
alphabet Σ does not contain the blank, B, or the character ε .‡
† Read that character aloud as “bottom.”
‡ The definition allows ε to appear in two separate places, as the second component
of ∆'s inputs and also as the empty string, from Γ∗. However, one of those is in
the inputs and the other is in the outputs so it isn't ambiguous.
The transition function describes how these machines act. For the input
⟨qi, s, gj⟩ ∈ Q × (Σ ∪ { B, ε }) × Γ there are two cases. When the character s is
an element of Σ ∪ { B } then an instruction ∆(qi, s, gj) = ⟨qk, γ⟩ applies when the
machine is in state qi with the tape head reading s and with the character gj on
top of the stack. If there is no such instruction then the computation halts, with
the input string not accepted. If there is such an instruction then the machine does
this: (i) the read head moves one cell to the right, (ii) the machine pops gj off
the stack and pushes the characters of the string γ = ⟨gi0, ... gim⟩ onto the stack
in the order from gim first to gi0 last, and (iii) the machine enters state qk. The
other case for the input ⟨qi, s, gj⟩ is when the character s is ε. Everything is the
same except that the tape head does not move. (We use this case to manipulate
the stack without consuming any input.)
As with Finite State machines, Pushdown machines don’t write to the tape but
only consume the tape characters. However, unlike Finite State machines they can
fail to halt; see Exercise 7.6.
The starting configuration has the machine in state q 0 , reading the first character
of σ ∈ Σ∗ , and with the stack containing only ⊥. A machine accepts its input σ if,
after starting in its starting configuration and after scanning all of σ , it eventually
enters an accepting state q ∈ F .
Notice that at each step the machine pops a character off the stack. If we want
to leave the stack unchanged then as part of the instruction we must push that
character back on. In addition, if the machine reaches a configuration where the
stack is empty then it will lock up and be unable to perform any more instructions.†
7.2 Example Consider the language of balanced parentheses.
The Pumping Lemma shows that no Finite State machine recognizes this language.
But it is recognized by a Pushdown machine. This machine has states Q =
{q 0 , q 1 , q 2 } with accepting states F = {q 1 }, and alphabets Σ = { [, ] } and
Γ = { g0, ⊥ }. The table gives its transition function ∆, with the instructions
numbered for ease of reference.
Instr no   Input        Output
0          q0, [, ⊥     q0, ‘g0⊥’
1          q0, [, g0    q0, ‘g0g0’
2          q0, ], g0    q0, ε
3          q0, ], ⊥     q2, ε
4          q0, B, ⊥     q1, ε
It keeps a running tally of the number of [’s minus the number of ]’s, as the number
† An alternative to the final state definition of acceptance we are using is to define that a machine
accepts its input if after consuming that input, it empties the stack. The definitions are equivalent in
that a string is accepted by either type of machine if it is accepted by the other.
of g0’s on the stack. This computation starts with the input [[]][] and ends in an
accepting state.
Step   State   Unread input   Stack
0      q0      [[]][]         ⊥
1      q0      []][]          g0 ⊥
2      q0      ]][]           g0 g0 ⊥
3      q0      ][]            g0 ⊥
4      q0      []             ⊥
5      q0      ]              g0 ⊥
6      q0      (none)         ⊥
7      q1      (none)         (empty)
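Here is a sketch of this machine in Python (an illustration, not from the text), with ASCII stand-ins: Z for the bottom marker ⊥, g for g0, and B for the blank read after the input is exhausted.

```python
# Deterministic Pushdown machine for balanced parentheses, following the
# five-instruction table: states q0 (scanning), q1 (accept), q2 (reject).
DELTA = {
    ('q0', '[', 'Z'): ('q0', 'gZ'),   # instruction 0
    ('q0', '[', 'g'): ('q0', 'gg'),   # instruction 1
    ('q0', ']', 'g'): ('q0', ''),     # instruction 2
    ('q0', ']', 'Z'): ('q2', ''),     # instruction 3
    ('q0', 'B', 'Z'): ('q1', ''),     # instruction 4
}

def accepts(sigma):
    state, stack = 'q0', ['Z']            # starting configuration
    for ch in sigma + 'B':                # the blank marks the end of input
        if not stack:
            return False                  # empty stack: the machine locks up
        top = stack.pop()                 # every step pops the stack top
        if (state, ch, top) not in DELTA:
            return False                  # no instruction applies: halt, reject
        state, gamma = DELTA[(state, ch, top)]
        stack.extend(reversed(gamma))     # push gamma, first character on top
    return state == 'q1'

print(accepts('[[]][]'))   # True
print(accepts('[[]'))      # False
```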
7.3 Example Recall that a palindrome is a string that reads the same forwards and
backwards, σ = σ R . This example’s language of palindromes uses a c character as
a middle marker, so its strings have the form τ ⌢ c ⌢ τ R with τ ∈ { a, b }∗ .
When the Pushdown machine below is reading τ it pushes characters onto the
stack; g0 when it reads a and g1 when it reads b. That’s state q 0 . When the
machine hits the middle c, it reverses. It enters q 1 and starts popping; when
reading a it checks that the popped character is g0, and when reading b it checks
that what popped is g1. If the machine hits the stack bottom at the same moment
that the input runs out, then it goes into the accepting state q 3 .
Step   State   Unread input   Stack
0      q0      bacab          ⊥
1      q0      acab           g1 ⊥
2      q0      cab            g0 g1 ⊥
3      q1      ab             g0 g1 ⊥
4      q1      b              g1 ⊥
5      q1      (none)         ⊥
6      q3      (none)         (empty)
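A Python sketch of the two phases (push before the marker, pop and compare after); here the input characters themselves stand in for the stack characters g0 and g1.

```python
# Stack-based check for marked palindromes tau + 'c' + reverse(tau) over {a, b},
# mirroring the machine's two phases: push until the c, then pop and compare.
def marked_palindrome(sigma):
    stack = []
    it = iter(sigma)
    for ch in it:                 # phase one: push until the middle marker
        if ch == 'c':
            break
        if ch not in 'ab':
            return False
        stack.append(ch)
    else:
        return False              # no 'c' found: reject
    for ch in it:                 # phase two: pop and compare
        if ch not in 'ab' or not stack or stack.pop() != ch:
            return False
    return not stack              # accept only if stack and input run out together

print(marked_palindrome('bacab'))   # True
print(marked_palindrome('bacaa'))   # False
```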
7.4 Remark Stack machines are often used in practice, particularly for running
hardware. Here is a ‘Hello World’ program in the PostScript printer language.
/Courier % name the font
20 selectfont % font size in points, 1/72 of an inch
72 500 moveto % position the cursor
(Hello world!) show % stroke the text
showpage % print the page
The interpreter pushes Courier on the stack, and then on the second line pushes
20 on the stack. It then executes selectfont, which pops two things off the stack
to set the font name and size. After that it moves the current point, and places the
text on the page. Finally, it draws that page to paper.
This language is quite efficient. But it is more suited to situations where the
code is written by a program, such as with a word processor or LATEX, than to
code written directly by a person.
7.5 Example This grammar generates LPAL , the language of palindromes over B.
P → ε | 0 | 1 | 0P0 | 1P1
LPAL = {ε, 0, 1, 00, 11, 000, 010, 101, 111, ... }
This language is not recognized by any Finite State machine, but it is recognized
by a Pushdown machine.
This machine has Q = {q 0 , q 1 , q 2 } with accepting states F = {q 2 }, and
alphabets Σ = B and Γ = { g0, g1, ⊥ }.
During its first phase it puts g0 on the stack when it reads the input 0 and
puts g1 on the stack when it reads 1. During the second phase, if it reads 0 then
it only proceeds if the popped stack character is g0 and if it reads 1 then it only
proceeds if it popped g1.
Step   State   Unread input   Stack
0      q0      0110           ⊥
1      q0      110            g0 ⊥
2      q1      10             g1 g0 ⊥
3      q1      0              g0 ⊥
4      q2      (none)         ⊥
Here is the machine accepting 01010 using instructions 0, 4, 12, 16, and 17.
Step   State   Unread input   Stack
0      q0      01010          ⊥
1      q0      1010           g0 ⊥
2      q0      010            g1 g0 ⊥
3      q1      10             g1 g0 ⊥
4      q1      0              g0 ⊥
5      q2      (none)         ⊥
The nondeterminism is crucial. In the first example, after step 1 the machine is
in state q 0 , is reading a 1, and the character that will be popped off the stack is g0.
Both instructions 3 and 9 apply to that configuration. But, applying instruction 3
would not lead to the machine accepting the input string. The computation shown
instead applies instruction 9, going to state q 1 , whose intuitive meaning is that the
machine switches from pushing to popping.
We have given two mental models of nondeterminism. One is that the machine
guesses when to switch, and that for this even-length string making that switch
halfway through is the right guess. We say the string is accepted because there
exists a guess that is correct, that ends in acceptance. (That there exist incorrect
guesses is not relevant.)
Taking the other view of nondeterminism omits guessing and instead sees
the computation as a tree. In one branch the machine applies instruction 3 and
in another it applies instruction 9. By definition, for this machine the string is
accepted because there is at least one accepting branch (the above table of the
sequence of configurations shows the tree’s accepting branch).
Input strings with odd length are different. In the language of guessing, the
machine needs to guess that it must switch from pushing to popping at the middle
character, but it must not push anything onto the stack since that thing would
never get popped off. Instead, when instruction 12 pops the top character g1 off
the stack, as all instructions do when they are executed, it immediately pushes it
back on. The net effect is that in this turn around from pushing to popping the
stack is unchanged.
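The branch-tree view can be simulated by a search over configurations. The sketch below is ours, not the text’s instruction table: state q0 pushes, an ε-move guesses the switch to q1, and q1 pops and compares, so an even-length bit palindrome is accepted exactly when some branch accepts.

```python
# Nondeterministic acceptance as a search over the tree of branches,
# for even-length palindromes over B.  A configuration records the state,
# the position in the input, and the stack (bottom marker 'Z' first).
def accepts(sigma):
    frontier = [('q0', 0, ('Z',))]
    seen = set()
    while frontier:
        config = frontier.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state == 'q0':
            frontier.append(('q1', pos, stack))     # guess: switch to popping
            if pos < len(sigma):                    # push the next character
                frontier.append(('q0', pos + 1, stack + (sigma[pos],)))
        elif state == 'q1':
            if pos == len(sigma) and stack == ('Z',):
                return True                         # an accepting branch exists
            if pos < len(sigma) and stack[-1] == sigma[pos]:
                frontier.append(('q1', pos + 1, stack[:-1]))   # pop a match
    return False                                    # no branch accepts

print(accepts('0110'))   # True
print(accepts('0100'))   # False
```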
Recall that deterministic Finite State machines can do any jobs that nondeter-
ministic ones can do. The palindrome result shows that for Pushdown machines the
situation is different. While nondeterministic Pushdown machines can recognize
the language of palindromes, that job cannot be done by deterministic Pushdown
machines. So for Pushdown machines, nondeterminism changes what can be done.
Intuitively, Pushdown machines are between Turing machines and Finite State
machines in that they have a kind of unbounded read/write memory, but it is
limited. We’ve proved that they are more powerful than Finite State machines
because they can recognize the language of balanced parentheses.
There is a relevant result that we will mention but not prove: there are jobs that
Turing machines can do but that no Pushdown machine can do. One is the decision
problem for the language {σ ⌢ σ | σ ∈ B∗ }. The intuition is that this language
contains strings such as 1010, 10101010, etc. A Pushdown machine can push the
characters onto the stack, as it does for the language of balanced parentheses, but
then to check that the second half matches the first it would need to pop them off
in reverse order.†
The diagram below summarizes. The box encloses all languages of bitstrings,
all subsets of B∗ . The nested sets enclose those languages recognized by some
Finite State machine, or some Pushdown machine, etc.
† Another way to tell that the set of languages recognized by a nondeterministic Pushdown machine
is a strict subset of the set of languages recognized by a Turing machine is to note that there is no
Halting Problem for Pushdown machines. We can write a program that inputs a string and a Pushdown
machine, and decides whether it is accepted. But of course we cannot write such a program for Turing
machines. Since the languages differ and since anything computed by a Pushdown machine can be
computed by a Turing machine, the languages of Pushdown machines must be a strict subset.
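By contrast, an unrestricted program (like a Turing machine) decides the copy language easily, by splitting the input and comparing the halves. A Python sketch:

```python
# The copy language { sigma + sigma } is easy for a general program: split the
# input in half and compare.  It is the stack's LIFO restriction (the first
# half comes back off reversed) that defeats Pushdown machines here.
def is_square(w):
    half, rem = divmod(len(w), 2)
    return rem == 0 and w[:half] == w[half:]

print(is_square('1010'))       # True  (sigma = 10)
print(is_square('10101010'))   # True  (sigma = 1010)
print(is_square('10011100'))   # False
```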
IV.7 Exercises
✓ 7.6 Produce a Pushdown machine that does not halt.
✓ 7.7 Produce a Pushdown machine to recognize each language over Σ = { a, b, c }.
(a) { a^n c b^2n | n ∈ N }
(b) { a^n c b^n−1 | n > 0 }
(c) The language of strings where the number of a’s before the b is the same as
the number of c’s after it.
7.14 Find a grammar that generates the language {σ ⌢ b ⌢ σ R | σ ∈ { a, b }∗ }.
7.17 Show that the language of all palindromes from Example 7.5 is not recognized
by any Finite State machine. Hint: you can use the Pumping Lemma.
7.18 Show that a string σ ∈ B∗ is a palindrome σ = σ R if and only if it is generated
by the grammar given in Example 7.5. Hint: Use induction in both directions.
7.19 Show that the set of pushdown automata is countable.
7.20 Show that any language recognized by a Pushdown machine is recognized
by some Turing machine.
7.21 There is a Pumping Lemma for Context Free languages: if L is Context Free
then it has a pumping length p ≥ 1 such that any σ ∈ L with |σ | ≥ p decomposes
into five parts σ = α ⌢ β ⌢ γ ⌢ δ ⌢ ζ subject to the conditions (i) |βγδ | ≤ p ,
(ii) |βδ | ≥ 1, and (iii) α β^n γ δ^n ζ ∈ L for all n ∈ N.
(a) Use it to show that { a^n b^n c^n | n ∈ N } is not Context Free.
(b) Show that {σ ⌢ σ | σ ∈ B∗ } is not Context Free.
7.22 For both Turing machines and Finite State machines, after we gave an
informal description of how they act we supplemented that with a formal one.
Supply that for Pushdown machines.
(a) Define a configuration.
(b) Define the meaning of the yields symbol ⊢ and a transition step.
(c) Define when a machine accepts a string.
Extra
IV.A Regular expressions in the wild
Regular expressions are often used in practice. For instance, imagine that you
need to search a web server log for the names of all the PDF’s downloaded from a
subdirectory. A user on a Unix-derived system might type this.
The grep utility looks through the file line by line, and if a line matches the
pattern then grep prints that line. That pattern, starting with the subdirectory
/linearalgebra/, is an extended regular expression.
That is, in practice we often need text operations, and regular expressions are
an important tool. Modern programming languages such as Python and Scheme
include capabilities for extended regular expressions, sometimes called regexes,
that go beyond the small-scale theory examples we saw earlier. These extensions
fall into two categories. The first is convenience constructs that make easier
something that would otherwise be doable, but awkward. The second is that some
of the extensions to regular expressions in modern programming languages go
beyond mere abbreviations. More on this later.
(|0|1)\d:\d\d\s(am|pm)
Recall that in the regular expression a(b|c)d the parentheses and the pipe
† The digits are contiguous in ASCII and their descendants are contiguous in Unicode. ‡ Programming
languages in practice by default have the dot match any character except newline. In addition, they
have a way to make it also match newline.
are not there to be matched. They are metacharacters, part of the syntax of the
regular expression. Once we expand the alphabet Σ to include all characters, we
run into the problem that we are already using some of the additional characters
as metacharacters.
To match a metacharacter prefix it with a backslash, ‘\’. Thus, to look for
the string ‘(Note’ put a backslash before the open parenthesis: \(Note.
Similarly, \| matches a pipe and \[ matches an open square bracket. Match
backslash itself with \\. This is called escaping the metacharacter. The
scheme described above for representing lists with \d, \D, etc. is an
extension of escaping.
Operator precedence is: repetition binds most strongly, then concatenation,
and then alternation (force different meanings with parentheses). Thus, ab* is
equivalent to a(b*), and ab|cd is equivalent to (ab)|(cd).
Quantifiers In the theoretical cases we saw earlier, to match ‘at most one a’ we
used ε |a. In practice we can write something like (|a), as we did above for the
twelve hour times. But depicting the empty string by just putting nothing there
can be confusing. Modern languages make question mark a metacharacter and
allow you to write a? for ‘at most one a’.
For ‘at least one a’ modern languages use a+, so the plus sign is another
metacharacter. More generally, we often want to specify quantities. For instance, to
match five a’s extended regular expressions use the curly braces as metacharacters,
with a{5}. Match between two and five of them with a{2,5} and match at least
two with a{2,}. Thus, a+ is shorthand for a{1,}.
As earlier, to match any of these metacharacters you must escape them. For
instance, To be or not to be\? matches the famous question.
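These quantifiers can be checked with Python’s re module (fullmatch requires the entire string to match):

```python
import re

# The quantifier metacharacters in a modern regex engine: ? for at most one,
# + for at least one, {m,n} for a range, and backslash-escapes for literals.
assert re.fullmatch(r'a?', '')            # zero a's
assert re.fullmatch(r'a?', 'a')           # one a
assert not re.fullmatch(r'a+', '')        # + needs at least one
assert re.fullmatch(r'a{2,5}', 'aaaa')    # between two and five
assert not re.fullmatch(r'a{2,}', 'a')    # at least two
assert re.fullmatch(r'To be or not to be\?', 'To be or not to be?')
print('all quantifiers behave as described')
```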
Cookbook All of the extensions to regular expressions that we are seeing are
driven by the desires of working programmers. Here is a pile of examples showing
them accomplishing practical work, matching things you’d want to match.
A.4 Example US postal codes, called ZIP codes, are five digits. We can match them
with \d{5}.
A.5 Example North American phone numbers match \d{3} \d{3}-\d{4}.
A.8 Example A C language identifier begins with an ASCII letter or underscore and
then can have arbitrarily many more letters, digits, or underscores: [a-zA-Z_]\w*.
A.9 Example Match a user name of between three and twelve letters, digits, under-
scores, or periods with [\w\.]{3,12}. Use .{8,} to match a password that is at
least eight characters long.
A.10 Example Match a valid username on Reddit: [\w-]{3,20}. The hyphen, because
it comes last in the square brackets, matches itself. And no, Reddit does not allow
a period in a username.
A.11 Example For email addresses, \S+@\S+ is a commonly used extended expression.†
A.12 Example Match the text inside a single set of parentheses with \([^()]*\).
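Several of the cookbook patterns above, tried in Python’s re module (the test strings are made up):

```python
import re

# The patterns from the examples, checked against sample strings.
assert re.fullmatch(r'\d{5}', '05439')                      # ZIP code
assert re.fullmatch(r'\d{3} \d{3}-\d{4}', '802 555-0123')   # phone number
assert re.fullmatch(r'[a-zA-Z_]\w*', '_count2')             # C identifier
assert not re.fullmatch(r'[a-zA-Z_]\w*', '2count')          # no leading digit
assert re.fullmatch(r'\([^()]*\)', '(a+b)')                 # one set of parens
assert re.fullmatch(r'\S+@\S+', 'user@example.com')         # naive email check
print('all cookbook patterns match')
```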
A.13 Example This matches a URL, a web address such as http://joshua.smcvt.
edu/computing. This regex is more intricate than prior ones so it deserves some
explanation. It is based on breaking URL’s into three parts: a scheme such as http
followed by a colon and two forward slashes, a host such as joshua.smcvt.edu,
and a path such as /computing (the standard also allows a query string that follows
a question mark but this regex does not handle those).
(https?|ftp)://([^\s/?\.#]+\.?){1,4}(/[^\s]*)?
Notice the https?, so the scheme can be http or https, as well as ftp. After
a colon and two forward slashes comes the host part, consisting of some fields
separated by periods. We allow almost any character in those fields, except for
a space, a question mark, a period or a hash. At the end comes a path. The
specification allows paths to be case sensitive but the regex here has only lower
case.
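Trying this URL regex in Python’s re module (the test strings below are made up):

```python
import re

# The URL regex from the text; fullmatch demands the whole string fit it.
url_re = r'(https?|ftp)://([^\s/?\.#]+\.?){1,4}(/[^\s]*)?'

assert re.fullmatch(url_re, 'http://joshua.smcvt.edu/computing')
assert re.fullmatch(url_re, 'https://hefferon.net')
assert re.fullmatch(url_re, 'ftp://archive.example.org/pub')
assert not re.fullmatch(url_re, 'gopher://old.example.org')   # scheme not covered
print('URL regex behaves as described')
```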
But wait! There’s more! You can also match the start of a line and end of line
with the metacharacters caret ‘^’ and dollar sign ‘$’.
A.14 Example Match lines starting with ‘Theorem’ using ^Theorem. Match lines ending
with end{equation*} using end{equation\*}$.
The regex engines in modern languages let you specify that the match is case
insensitive (although they differ in the syntax).
A.15 Example An HTML document tag for an image, such as <img src="logo.jpg">,
uses either of the keys src or img to give the name of the file containing the
image that will be served. Those strings can be in upper case or lower case, or
any mix. Racket uses a ‘?i:’ syntax to mark part of the regex as insensitive:
\\s+(?i:(img|src))= (note also the double backslash, which is how Racket
escapes the ‘s’).
† This is naive in that there are elaborate rules for the syntax of email addresses (see below). But it is a
reasonable sanity check.
Beyond convenience The regular expression engines that come with recent
programming languages have capabilities beyond matching only those languages
that are recognized by Finite State machines.
A.16 Example The web document language HTML uses tags such as <b>boldface
text</b> and <i>italicized text</i>. Matching any one is straightforward,
for instance <b>[^<]*</b>. But for a single expression that matches them all you
would seem to have to do each as a separate case and then combine cases with a
pipe. However, instead we can have the system remember what it finds at the start
and look for that again at the end. Thus, Racket’s regex <([^>]+)>[^<]*</\\1>
matches HTML tags like the ones given. Its second character is an open parenthesis,
and the \\1 refers to everything between that open parenthesis and the matching
close parenthesis. (As you might guess from the 1, you can also have a second
match with \\2, etc.)
That is a back reference. It is very convenient. However, it gives extended
regular expressions more power than the theoretical regular expressions that we
studied earlier.
A.17 Example This is the language of squares over Σ = { a, b }.
L = {σ ∈ Σ∗ | σ = τ ⌢τ for some τ ∈ Σ∗ }
Some members are aabaab, baaabaaa, and aa. The Pumping Lemma shows that
the language of squares is not regular; see Exercise A.35. Describe this language
with the regex (.+)\1; note the back-reference.
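The same back references work in Python’s re module, where a raw string needs only a single backslash:

```python
import re

# \1 repeats whatever the first parenthesized group matched.
tag_re = r'<([^>]+)>[^<]*</\1>'
assert re.fullmatch(tag_re, '<b>boldface text</b>')
assert re.fullmatch(tag_re, '<i>italicized text</i>')
assert not re.fullmatch(tag_re, '<b>mismatched</i>')

square_re = r'(.+)\1'                       # the language of squares
assert re.fullmatch(square_re, 'aabaab')    # tau = aab
assert re.fullmatch(square_re, 'aa')        # tau = a
assert not re.fullmatch(square_re, 'aba')   # odd length, cannot be a square
print('back references behave as described')
```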
Downsides Regular expressions are powerful tools, and this goes double for
enhanced regexes. As illustrated by the examples above, some of their uses are: to
validate usernames, to search text files, and to filter results. But they can come
with costs also.
For instance, the regular expression for twelve hour time from Example A.3,
(ε |0|1)\d:\d\d\s(am|pm), does indeed match ‘8:05 am’ and ‘10:15 pm’ but it
falls short in some respects. One is that it requires am or pm at the end, but
times are often given without them. We could change the ending to
(ε |\s am|\s pm), which is a bit more complex but does solve the issue.
Another issue is that it also matches some strings that you don’t want, such as
13:00 am or 9:61 pm. We can solve this as with the prior paragraph, by listing the
cases.†
† Some substrings are elided so it fits in the margins.
And, even if you do have an address that fits the standard, you don’t know if there
is an email server listening at that address.
At this point regular expressions may be starting to seem a little less like a
fast and neat problem-solver and a little more like a potential development and
maintenance problem. The full story is that sometimes a regular expression is just
what you need for a quick job, and sometimes they are good for more complex
tasks also. But some of the time the cost of complexity outweighs the gain in
expressiveness. This power/complexity tradeoff is often referred to online by citing
this quote from J Zawinski.
IV.A Exercises
✓ A.18 Which of the strings matches the regex ab+c? (a) abc (b) ac (c) abbb
(d) bbc
A.19 Which of the strings matches the regex [a-z]+[\.\? !]? (a) battle!
(b) Hot (c) green (d) swamping. (e) jump up. (f) undulate? (g) is.?
✓ A.20 Give an extended regular expression for each. (a) Match a string that
has ab followed by zero or more c’s, (b) ab followed by one or more c’s,
(c) ab followed by zero or one c, (d) ab followed by two c’s, (e) ab followed
by between two and five c’s, (f) ab followed by two or more c’s, (g) a followed
by either b or c.
✓ A.21 Give an extended regular expression to accept a string for each description.
(a) Containing the substring abe.
(b) Containing only upper and lower case ASCII letters and digits.
(c) Containing a string of between one and three digits.
A.22 Give an extended regular expression to accept a string for each description.
Take the English vowels to be a, e, i, o, and u.
(a) Starting with a vowel and containing the substring bc.
(b) Starting with a vowel and containing the substring abc.
(c) Containing the five vowels in ascending order.
(d) Containing the five vowels.
A.23 Give an extended regular expression matching strings that contain an open
square bracket and an open curly brace.
✓ A.24 Every lot of land in New York City is denoted by a string of digits called BBL,
for Borough (one digit), Block (five digits), and Lot (four digits). Give a regex.
✓ A.25 Example A.5 gives a regex for North American phone numbers.
(a) They are sometimes written with parentheses around the area code. Extend
the regex to cover this case.
(b) Sometimes phone numbers do not include the area code. Extend to cover
this also.
A.26 Most operating systems come with a file that has a list of words, which
can be used for spell-checking, etc. For instance, on Linux it may be at
/usr/share/dict/words but in any event you can find it by running locate
words | grep dict. Use that file to find how many words fit each criterion.
(a) contains the letter a (b) starts with A (c) contains a or A (d) contains X
(e) contains x or X (f) contains the string st (g) contains the string ing
(h) contains an a, and later a b (i) contains none of the usual vowels a, e, i, o or u
(j) contains all the usual vowels (k) contains all the usual vowels, in ascending
order
✓ A.27 Give a regex to accept time in a 24 hour format. It should match times of the
form ‘hh:mm:ss.sss’ or ‘hh:mm:ss’ or ‘hh:mm’ or ‘hh’.
A.28 Give a regex describing a floating point number.
✓ A.29 Give a suitable extended regular expression.
(a) All Visa card numbers start with a 4. New cards have 16 digits. Old cards
have 13 digits.
(b) MasterCard numbers either start with 51 through 55, or with the numbers
2221 through 2720. All have 16 digits.
(c) American Express card numbers start with 34 or 37 and have 15 digits.
✓ A.30 Postal codes in the United Kingdom have six possible formats. They are:
(i) A11 1AA, (ii) A1 1AA, (iii) A1A 1AA, (iv) AA11 1AA, (v) AA1 1AA, and (vi) AA1A
1AA, where A stands for a capital ASCII letter and 1 stands for a digit.
(a) Give a regex.
✓ A.31 You are stuck on a crossword puzzle. You know that the first letter (of eight)
is a g, the third is an n and the seventh is an i. You have access to a file that
contains all English words, each on its own line. Give a suitable regex.
A.32 In the Downsides discussion of Example A.3, we change the ending to (ε |\s
am|\s pm). Why not \s(ε |am|pm), which factors out the whitespace?
A.33 Give an extended regular expression that matches no string.
✓ A.34 The Roman numerals taught in grade school use the letters I, V, X, L, C,
D, and M to represent 1, 5, 10, 50, 100, 500, and 1000. They are written in
descending order of magnitude, from M to I, and are written greedily so that we
don’t write six I’s but rather VI. Thus, the date written on the book held by the
Statue of Liberty is MDCCLXXVI, for 1776. Further, we replace IIII with IV, and
replace VIIII with IX. Give a regular expression for valid Roman numerals less
than 5000.
A.35 Use the Pumping Lemma to show that the language of squares from
Example A.17, L = {σ ∈ Σ∗ | σ = τ ⌢τ for some τ ∈ Σ∗ }, is not regular.
A.36 Consider L = { 0^n 1 0^n | n > 0 }. (a) Show that it is not regular. (b) Find a
regex.
A.37 In regex golf you are given two lists and must produce a regex that matches
all the words in the first list but none of the words in the second. The ‘golf’ aspect
is that the person who finds the shortest regex, the one with the fewest characters,
wins. Try these: accept the words in the first list and not the words in the second.
(a) Accept: Arthur, Ester, le Seur, Silverter
Do not accept: Bruble, Jones, Pappas, Trent, Zikle
(b) Accept: alight, bright, kite, mite, tickle
Do not accept: buffing, curt, penny, tart
(c) Accept: afoot, catfoot, dogfoot, fanfoot, foody, foolery, foolish, fooster,
footage, foothot, footle, footpad, footway, hotfoot, jawfoot, mafoo, nonfood,
padfoot, prefool, sfoot, unfool
Do not accept: Atlas, Aymoro, Iberic, Mahran, Ormazd, Silipan, altared,
chandoo, crenel, crooked, fardo, folksy, forest, hebamic, idgah, manlike,
marly, palazzi, sixfold, tarrock, unfold
A.38 In a regex crossword each row and column has a regular expression. You
have to find strings for those rows and columns that meet the constraints.
(a) Columns: (AB|OE|SK) and [^SPEAK]+. Rows: HE|LL|O+ and [PLEASE]+.
(b) Columns: (A|B|C)\1 and EP|IP|EF. Rows: .*M?O.* and (AN|FE|BE).
Extra
IV.B The Myhill-Nerode Theorem
We defined regular languages in terms of Finite State machines. Here we will give
a characterization that does not depend on that.
This Finite State machine accepts strings that end in ab. Its transition table
(start state q0, accepting state q2):
       a    b
  q0   q1   q0
  q1   q1   q2
  q2   q1   q0
Consider other strings over Σ = { a, b }, not just the accepted ones, and see where
they bring the machine.
The collection of all strings Σ∗ , pictured below, breaks into three sets, those that
bring the machine to q 0 , those that bring the machine to q 1 , and those that bring
the machine to q 2 .
B.1 Definition Let M be a Finite State machine with alphabet Σ. Two strings
σ0 , σ1 ∈ Σ∗ are M-related if starting the machine with input σ0 ends with it in
the same state as does starting the machine with input σ1 .
B.2 Lemma The binary relation of M-related is an equivalence, and so partitions the
collection of all strings Σ∗ into equivalence classes.
Proof We must show that the relation is reflexive, symmetric, and transitive.
Reflexivity, that any input string σ brings the machine to the same state as itself, is
obvious. So is symmetry, that if σ0 brings the machine to the same state as σ1 then
B.5 Example Let L be the set {σ ∈ B∗ | σ has an even number of 1’s }. We can find
the parts of the partition. If two strings σ0 , σ1 both have an even number of 1’s
then they are L-related. That’s because for any τ ∈ B∗ , if τ has an even number of
1’s then σ0 ⌢τ ∈ L and σ1 ⌢τ ∈ L, while if τ has an odd number of 1’s then the
concatenations will not be members of L. Similarly, if two strings both have an odd
number of 1’s then they are L-related. So the relationship ∼L gives rise to this
partition of B∗ .
EL,0 = {ε, 0, 00, 11, 000, 011, 101, 110, ... }
EL,1 = { 1, 01, 10, 001, 010, ... }
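We can recover this partition empirically with a Python sketch (ours, not the text’s): merge short strings whenever no tested suffix τ distinguishes them, that is, whenever σ0 ⌢τ and σ1 ⌢τ always land in or out of L together.

```python
from itertools import product

def in_L(sigma):                       # L: even number of 1's
    return sigma.count('1') % 2 == 0

# All bitstrings of length at most 3, used both as test strings and suffixes.
strings = [''.join(p) for n in range(4) for p in product('01', repeat=n)]

classes = []                           # each class is a list of merged strings
for s in strings:
    for cls in classes:
        rep = cls[0]                   # compare against a class representative
        if all(in_L(rep + t) == in_L(s + t) for t in strings):
            cls.append(s)              # no suffix distinguishes them: merge
            break
    else:
        classes.append([s])            # s is distinguishable from every class

print(len(classes))      # 2
print(classes[0][:4])    # ['', '0', '00', '11']
```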
B.6 Example Let L be {σ ∈ { a, b }∗ | σ has the same number of a’s as b’s }. Then two
members of L, two strings σ0 , σ1 ∈ Σ∗ with the same number of a’s as b’s, are
L-related. This is because for any suffix τ , the string σ0 ⌢τ is an element of L if
and only if σ1 ⌢τ is an element of L, which happens if and only if τ has the same
number of a’s as b’s.
Similarly, two strings σ0 , σ1 such that the number of a’s is one more than the
number of b’s are L-related because for any suffix τ , the string σ0 ⌢τ is an element
of L if and only if σ1 ⌢τ is an element of L, namely if and only if τ has one fewer a
than b.
Following this reasoning, ∼L partitions { a, b }∗ into the infinitely many parts
EL,i = {σ ∈ { a, b }∗ | the number of a’s minus the number of b’s equals i }, where
i ∈ Z.
The machine M (start state q0):
       a    b
  q0   q1   q2
  q1   q3   q3
  q2   q4   q4
  q3   q1   q1
  q4   q2   q2
We will compare the partitions induced by the two relations introduced above.
The M-related relation breaks { a, b }∗ into five parts, one for each state (since
each state in M is reachable).
EM, 0 = {ε }
EM, 1 = { a, aaa, aab, aba, abb, aaaaa, aaaab, ... }
EM, 2 = { b, baa, bab, bba, bbb, baaaa, baaab, ... }
EM, 3 = { aa, ab, aaaa, aaab, aaba, aabb, abaa, abab, abba, abbb, aaaaaa, ... }
EM, 4 = { ba, bb, baaa, baab, baba, babb, bbaa, bbab, bbba, bbbb, baaaaa, ... }
Verify this by noting that if two strings are in EL, 0 then adding a suffix τ will result
in a string that is a member of L if and only if the length of τ is even, and the same
reasoning holds for EL, 1 and odd-length τ ’s.
The sketch below shows the universe of strings { a, b }∗ , partitioned in two ways.
There are two L-related parts, the left and right halves. The five M-related parts
are subsets of the L-related parts.
EL,0 = EM,0 ∪ EM,3 ∪ EM,4        EL,1 = EM,1 ∪ EM,2
That is, the M-related partition is finer than the L-related partition (‘fine’ in the
sense that sand is finer than gravel).
B.8 Lemma Let M be a Finite State machine that recognizes L. If two strings are
M-related then they are L-related.
Proof Assume that σ0 and σ1 are M-related, so that starting M with input σ0
causes it to end in the same state as starting it with input σ1 . Thus for any suffix τ ,
giving M the input σ0 ⌢τ causes it to end in the same state as does the input σ1 ⌢τ .
In particular, σ0 ⌢τ takes M to a final state if and only if σ1 ⌢τ does. So the two
B.10 Example We will milk Example B.7 for another observation. Take a string σ from
EM, 1 and append an a. The result σ ⌢ a is a member of EM, 3 , simply because if the
machine is in state q 1 and it receives an a then it moves to state q 3 . Likewise, if
σ ∈ EM, 4 , then σ ⌢ b is a member of EM, 2 . If adding the alphabet character x ∈ Σ
to one string σ from EL,i results in a string σ ⌢ x from EL, j then the same will
happen for any string from EL,i .
In this example we see that’s true because the EM ’s are contained in the EL ’s.
The key step of the next result is to find it even in a context where there is no
machine.
B.11 Theorem (Myhill-Nerode) A language L is regular if and only if the relation
∼L has only finitely many equivalence classes.
Proof One direction is easy. Suppose that L is a regular language. Then it is
recognized by a Finite State machine M. By Lemma B.8 the number of elements
in the partition induced by ∼L is finite because the number of elements in the
partition associated with being M-related is finite, as there is one part for each of
M’s reachable states.
For the other direction suppose that the number of elements in the partition
associated with being L-related is finite. We will show that L is regular by
producing a Finite State machine that recognizes L.
The machine’s states are the partition’s elements, the EL,i ’s. That is, si is EL,i .
The start state is the part containing the empty string ε . A state is final if that part
contains strings from the language L (Lemma B.9 (2) says that each part contains
either no strings from L or consists entirely of strings from L).
The transition function is: for any state si = EL,i and alphabet element x ,
compute the next state ∆(si , x) by starting with any string in that part σ ∈ EL,i ,
appending the character to get a new string σ̂ = σ ⌢x , and then finding the part
containing that string, the EL, j such that σ̂ ∈ EL, j . Then ∆(si , x) = s j .
We must verify that this transition function is well-defined. That is, the
definition of ∆(si , x) as given potentially depends on which string σ you choose
from si = EL,i , and we must check that choosing a different string cannot lead to
a different resulting part. This follows from (1) in Lemma B.9: take two starting
strings from the same part σ0 , σ1 ∈ EL,i and make a common extension by the
one-element string β = ⟨x⟩ so the results are in the same part σ0 ∼L σ1 .
Here is an equivalent way to describe the next-state function that is illuminating.
Recall that we write the part containing σ as Jσ K. Then the definition of the
transition function for the machine under construction is ∆(Jσ K, x) = Jσ ⌢x K. With
that, a simple induction shows that the extended transition function in the new
machine is ∆̂(α) = Jα K.
Finally, we must verify that the language recognized by this machine is L. For
any string σ ∈ Σ∗, starting this machine with σ as input will cause the machine to
end in the state JαK, the part containing σ; this is what the prior paragraph says. By
the choice of final states, σ is accepted by this machine if and only if σ ∈ L.
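To make the construction concrete, here is a sketch in Python for one particular regular language, the set of bit strings ending in 01. It approximates the relation ∼L by testing agreement under all extensions up to a small length bound; that this bound suffices for this language is an assumption of the illustration, not part of the theorem.

```python
from itertools import product

def in_L(s):                              # membership test for L
    return s.endswith("01")

def extensions(max_len, alphabet="01"):
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def signature(s, max_len=3):
    # two strings are taken as ~L-related iff they have the same signature
    return tuple(in_L(s + e) for e in extensions(max_len))

def build_dfa(alphabet="01"):
    # states are class representatives, found by search outward from the
    # empty string; the transition rule is the one from the proof,
    # Delta([sigma], x) = [sigma + x]
    start = ""
    reps = {signature(start): start}      # signature -> representative
    delta = {}
    frontier = [start]
    while frontier:
        rep = frontier.pop()
        for x in alphabet:
            sig = signature(rep + x)
            if sig not in reps:
                reps[sig] = rep + x
                frontier.append(rep + x)
            delta[(rep, x)] = reps[sig]
    finals = {r for r in reps.values() if in_L(r)}
    return start, delta, finals

def accepts(dfa, s):
    start, delta, finals = dfa
    state = start
    for x in s:
        state = delta[(state, x)]
    return state in finals
```

For this language the construction yields a three-state machine, matching the three ∼L classes JεK, J0K, and J01K.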
IV.B Exercises
✓ B.12 Find the L equivalence classes for each regular set. The alphabet is Σ =
{ a, b }.
(a) L0 = { a^n b | n ∈ N }
(b) L1 = { a^2 b^n | n ∈ N }
✓ B.13 For each language describe the L equivalence classes. The alphabet is B.
(a) The set of strings ending in 01
(b) The set of strings where every 0 is immediately followed by two 1’s
(c) The set of string with the substring 0110
(d) The set of strings without the substring 0110
✓ B.14 The language of palindromes L = { σ ∈ { a, b }∗ | σ^R = σ } is not regular.
Find infinitely many L equivalence classes to verify that it is not regular.
Part Three
Computational Complexity
Chapter
V Computational Complexity
In the first part of this book we asked what can be done with a mechanism at
all. This mirrors the history: when the Theory of Computing began there were
no physical computers. Researchers were driven by considerations such as the
Entscheidungsproblem. The subject was interesting, the questions compelling, and
there were plenty of problems, but the initial phase had a theory-only feel.
A natural next step is to ask how to do jobs efficiently. When physical computers
became widely available, that’s exactly what happened. Today, the Theory of
Computing has incorporated many questions that at least originate in applied fields,
and that need answers that are feasible.
We start by reviewing how we measure the practicality of algorithms, the orders
of growth of functions. Then we will see a collection of the kinds of problems
that drive the field today. By the end of this chapter we will be at the research
frontier, and we will state some things without proof as well as discuss some things
about which we are not sure. In particular, we will consider the celebrated
question of P versus NP.
Section
V.1 Big O
We begin by reviewing the definition of the order of growth of functions. We will
study this because of its relationship with how algorithms consume computational
resources.
First, we illustrate with an anecdote. Here is a grade school multiplication.
678
× 42
1356
2712
28476
The algorithm combines each digit of the multiplier 42 with each digit of the
multiplicand 678, in a nested loop. A person could think that this is the right
way to compute multiplication, indeed the only way, and that in general multiplying
two n-digit numbers requires about n^2-many operations.
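In code, the nested loop looks like this; it is a sketch, with the operation counter tracking only the digit-by-digit products.

```python
def gradeschool_multiply(a, b):
    """Multiply by the grade school algorithm, counting digit products."""
    xs = [int(d) for d in str(a)][::-1]      # digits, least significant first
    ys = [int(d) for d in str(b)][::-1]
    ops = 0
    result = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):               # the nested loop: one digit
        for j, y in enumerate(ys):           # product per pair of digits
            result[i + j] += x * y
            ops += 1
    carry = 0
    for k in range(len(result)):             # propagate the carries
        carry, result[k] = divmod(result[k] + carry, 10)
    value = int("".join(map(str, result[::-1])))
    return value, ops
```

Multiplying the three-digit 678 by the two-digit 42 takes 3 · 2 = 6 digit products; two n-digit numbers take n^2 of them.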
Image: Striders can walk on water because they are five orders of magnitude smaller than us. This
change of scale changes the world — bugs see surface tension as more important than gravity. Similarly,
changing an algorithm from taking n^2 time to taking time that is n · lg(n) can make some things easy
that were previously simply not practical.
[Graph: √n for inputs up to 1 000.]
However, for large n the value √n is much bigger than 10 lg(n). For instance,
√1 000 000 = 1 000 while 10 lg(1 000 000) ≈ 199.32.
[Graph: √n against 10 lg(n) for inputs up to 1 000 000; √n is far larger.]
So the first criterion is that big O's definition must focus on what happens in the
long run.
† See the Theory of Computing blog feed at http://cstheory-feed.org/ (Various authors 2017).
‡ Recall that lg(n) = log2(n). That is, compute lg(n) by starting with n and then finding the power of 2
that produces it, so if n = 8 then lg(n) = 3 and if n = 10 then lg(n) ≈ 3.32.
[Graphs: on the left, f(n) = n^2 + 5n + 6 and д(n) = n^2 for inputs up to 20; on the right, their ratio f/д.]
On the left a person's eye is struck that n^2 + 5n + 6 is ahead of n^2. But on the right
the ratios show that this is misleading. For large inputs, f's 5n and 6 are swamped
by the highest order term, the n^2. Consequently these two track together (by far
the biggest factor in the behavior of these two is that they are both quadratic)
and their long run behavior is basically the same.
1.2 Example Next compare the quadratic f(n) = n^2 + 5n + 6 with the cubic д(n) =
n^3 + 2n + 3. In contrast to the prior example, these two don't track together.
Initially f is larger, with f(0) = 6 > д(0) = 3 and f(1) = 12 > д(1) = 6. But soon
enough the cubic accelerates ahead of the quadratic, so much that at the scale of
the graph, the values of f don't rise much above the axis.
[Graphs: on the left, f and д for inputs up to 20, with д reaching about 10 000 while f stays near the axis; on the right, the ratio д/f.]
On the right side, the graph underlines that д races ahead of f because the ratios
grow without bound. So д is a faster-growing function than f. In the long run they
both go to infinity, but in a sense, д goes there faster.
† These graphs are discrete; they picture functions of natural numbers, not of real numbers. This is
because the performance functions take inputs that are natural numbers. The earlier graphs of √n and
10 lg(n) are also discrete but they have so many dots that they appear to be continuous.
1.3 Example Finally, compare the quadratics f(n) = 2n^2 + 3n + 4 and д(n) = n^2 + 5n + 6.
We've already seen that the function comparison definition needs to discount the
initial behavior that f(0) = 4 < д(0) = 6 and f(1) = 9 < д(1) = 12, and instead
focus on the long run.
[Graphs: on the left, f and д for inputs up to 20; on the right, the ratio f/д, which remains bounded.]
This example differs from Example 1.1 in that in the long run, f stays ahead
of д, and gains in an absolute sense, because of the 2 in f's dominant term 2n^2,
compared with д's n^2. So it may appear that we should count д as less than f.
However, unlike in Example 1.2, f does not accelerate away. Instead, the ratio
between the two is bounded. For O, we will consider that д's growth rate is
equivalent to f's.
1.4 Example We close the motivation with a very important example. Let the function
bits : N → N give the number of bits needed to represent its input in binary. The
bottom line of this table gives lg(n), the exponent to which 2 must be raised to produce n.
Input n   0   1   2    3     4     5     6     7     8     9
Binary    0   1   10   11    100   101   110   111   1000  1001
bits(n)   1   1   2    2     3     3     3     3     4     4
lg(n)     –   0   1    1.58  2     2.32  2.58  2.81  3     3.17
[Graph: bits(n) for n up to 100; for n ≥ 1 its value is ⌊lg(n)⌋ + 1.]
Over the long run the ‘+1’ does not matter much and the floor does not matter much
(and is algebraically awkward). A reasonable summary is that the function giving
the number of bits required to express a number n is the base 2 logarithm, lg n .
Example 1.3 notes that the function comparison definition given below disregards
constant multiplicative factors. The formula for converting among logarithmic
functions with different bases, log_c(x) = (1/log_b(c)) · log_b(x), shows that they differ
only by a constant factor, 1/log_b(c). So even the base does not matter; another
reasonable summary is that the number of bits is "a" logarithmic function.
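The summary can be checked numerically; this is a quick sketch, with `bits` standing in for the function of the example.

```python
import math

def bits(n):
    return len(bin(n)) - 2       # bin(5) == '0b101', so strip the '0b'

# for every n >= 1 the bit count is the floored base 2 log, plus one
for n in range(1, 1000):
    assert bits(n) == math.floor(math.log2(n)) + 1
```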
Definition Machine resource sizes, such as the number of bits of the input and of
memory, are natural numbers and so to describe the performance of algorithms we
may think to focus on functions that input and output natural numbers. However,
above we have already found useful a function, lg, that inputs and outputs reals.
So instead we will consider a subset of the real functions.†
1.5 Definition A complexity function f is one that inputs real number arguments
and outputs real number values, and (1) it has an unbounded domain in that there
is a number N ∈ R+ such that x ≥ N implies that f (x) is defined, and (2) it is
eventually nonnegative in that there is a number M ∈ R+ so that x ≥ M implies
that f (x) ≥ 0.
1.7 Remarks (1) Read O(д) aloud as "Big-O of д." We use 'O' because this is about
the order of growth. (2) The term 'complexity function' is not standard. We will
find it convenient for stating the results below. (3) We may say 'x^2 + 5x + 6 is
O(x^2)' instead of 'f is O(д) where f(x) = x^2 + 5x + 6 and д(x) = x^2'. (4) The
'f = O(д)' notation is very common, but is awkward in that it does not follow the
usual rules of equality. For instance f = O(д) does not allow us to write 'O(д) = f'.
Another awkwardness is that x = O(x^2) and x^2 = O(x^2) together do not imply
that x = x^2. (5) Some authors allow negative real outputs and write the inequality
with absolute values, f(x) ≤ C · |д(x)|. (6) Sometimes you see 'f is O(д)' stated
as 'f(x) is O(д(x))'. Speaking strictly, this is wrong because f(x) and д(x) are
numbers, not functions.
† Using real functions has the disadvantage that natural number functions such as n! can seem to be
left out. One way to deal with this is to extend these to take real number arguments, for instance
extending the factorial to ⌊x⌋!, whose domain is the set of nonnegative reals. In addition, we shall be
pragmatic: when working directly with natural number functions is easiest, we shall do that.
[Graphs: two illustrations of the definition; in each, past the cutoff N the graph of f stays below the graph of a constant multiple of д.]
To apply the definition, we must produce suitable N and C , and verify that
they work.
1.8 Example Let f(x) = x^2 and д(x) = x^3. Then f is O(д), as witnessed by N = 2
and C = 1. The verification is: x > N = 2 implies that д(x) = x^3 = x · x^2 is greater
than 2 · x^2, which in turn is greater than x^2 = C · f(x) = 1 · f(x).
If f(x) = 5x^2 and д(x) = x^4 then to show f is O(д) take N = 2 and C = 2.
The verification is that x > N = 2 implies that C · д(x) = 2 · x^2 · x^2 ≥ 8x^2 > 5x^2 = f(x).
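A witness pair can also be sanity-checked numerically; sampling is not a proof, but it catches a wrong N or C quickly. This sketch checks both witness pairs from the example.

```python
# Numeric sanity check (not a proof) of a Big O witness pair: sample
# inputs beyond N and confirm f(x) <= C * g(x) at each of them.

def check_witness(f, g, N, C, samples=1000):
    return all(f(x) <= C * g(x) for x in range(N + 1, N + 1 + samples))

assert check_witness(lambda x: x**2, lambda x: x**3, N=2, C=1)
assert check_witness(lambda x: 5 * x**2, lambda x: x**4, N=2, C=2)
```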
1.9 Example Don't confuse a function having values that are smaller than another's
with its growth rate being smaller. Let д(x) = x^2 and f(x) = x^2 + 1, so that
д(x) < f(x). But д's growth rate is not smaller, because f is O(д). To verify, take
N = 2 and C = 2, so that for x ≥ N = 2 we have C · д(x) = 2x^2 = x^2 + x^2 >
x^2 + 1 = f(x).
1.10 Example Let Z : R → R be the zero function Z(x) = 0. Then Z is O(д) for every
complexity function д. Verify that with N = 1 and C = 1.
1.11 Example Some pairs of functions aren't comparable, that is, neither f ∈ O(д) nor
д ∈ O(f). Let д(x) = x^3 and consider this function.
f(x) = x^2 if ⌊x⌋ is even, and f(x) = x^4 if ⌊x⌋ is odd
To see that f is not O(д), consider inputs where ⌊x⌋ is odd. There is no constant C
that gives C · x^3 ≥ x^4; for instance, C = 3 will not do because when ⌊x⌋ is greater
than 3 and odd then 3 · д(x) = 3x^3 < x^4 = f(x). Likewise, д is not O(f) because
of f's behavior when ⌊x⌋ is even.
1.16 Figure: Each bean contains the complexity functions. Faster growing functions are
higher, so that if they were shown then f0(x) = x^5 would be above f1(x) = x^4. On
the left is sketched the cone O(д) for some д, containing all of the functions with
growth rate less than or equal to д's. The ellipse at the top is Θ(д), holding functions
with growth rate equivalent to д's. The sketch on the right adds the cone O(f) for
some f in O(д).
For most of the functions that we work with, such as polynomial and logarithmic
functions, the next result often makes calculations involving Big O easier.
For instance, take f(x) = x^2 + 5x + 6 and д(x) = x^3 + 2x + 3. Applying L'Hôpital's Rule twice gives this.
  lim_{x→∞} f(x)/д(x) = lim_{x→∞} (x^2 + 5x + 6)/(x^3 + 2x + 3)
                      = lim_{x→∞} (2x + 5)/(3x^2 + 2) = lim_{x→∞} 2/(6x) = 0
Then Theorem 1.17 says that f is O(д) but д is not O(f ). That is, f ’s growth rate is
less than д’s.
Next consider f(x) = 3x^2 + 4x + 5 and д(x) = x^2.
  lim_{x→∞} (3x^2 + 4x + 5)/x^2 = lim_{x→∞} (6x + 4)/(2x) = lim_{x→∞} 6/2 = 3
So the growth rates of the two are equivalent. That is, f is Θ(д).
For f(x) = 5x^4 + 15 and д(x) = x^2 − 3x, this
  lim_{x→∞} (5x^4 + 15)/(x^2 − 3x) = lim_{x→∞} 20x^3/(2x − 3) = lim_{x→∞} 60x^2/2 = ∞
shows that f's growth rate is strictly greater than д's rate: д is O(f) but f is
not O(д).
1.20 Example The logarithmic function f(x) = log_b(x) grows very slowly: log_b(x)
is O(x), and log_b(x) is O(x^0.1), and is O(x^0.01), and in fact log_b(x) is O(x^d) for
any d > 0, no matter how small, by this equation.
  lim_{x→∞} log_b(x)/x^d = lim_{x→∞} (1/(ln(b) · x))/(d · x^(d−1)) = lim_{x→∞} 1/(d · ln(b) · x^d) = 0
By Theorem 1.17 that calculation also shows that x^d is not O(log_b(x)).
† This case is denoted f is o(д). ‡ The 'д is O(f)' is denoted f is Ω(д). § If L = 1 then f and д are
asymptotically equivalent, denoted f ∼ д.
The difference in growth rates is even stronger than that. L'Hôpital's Rule,
along with the Chain Rule, gives that (log_b(x))^2 is O(x) because this is 0.
  lim_{x→∞} (log_b(x))^2/x = lim_{x→∞} 2 · log_b(x)/(ln(b) · x) = lim_{x→∞} 2/((ln(b))^2 · x) = 0
Further, Exercise 1.46 shows that for every power k the function (log_b(x))^k is O(x^d)
for any positive d, no matter how small.
The log-linear function x · lg(x) has a similar relationship to the polynomials
x^d, where d > 1.
1.21 Example We can compare the polynomial f(x) = x^2 to the exponential д(x) = 2^x.
  lim_{x→∞} 2^x/x^2 = lim_{x→∞} 2^x · ln(2)/(2x) = lim_{x→∞} 2^x · (ln(2))^2/2 = ∞
Thus, f is in O(д) but д is not in O(f).
An easy induction argument gives that
  lim_{x→∞} 2^x/x^k = ∞
for any k, and so x^k is in O(2^x) but 2^x is not in O(x^k).
1.22 Lemma Logarithmic functions grow more slowly than polynomial functions: if
f(x) = log_b(x) for some base b and д(x) = a_m x^m + · · · + a_0 then f is O(д)
but д is not O(f). Polynomial functions grow more slowly than exponential
functions: where h(x) = b^x for some base b > 1, д is O(h) but h is not
O(д).
Tractable and intractable This table lists orders of growth that appear often in
practice. They are listed with faster-growing functions later in the table.
Order                   Name                    Examples
O(1)                    Bounded                 f0(n) = 1, f1(n) = 15
O(lg(lg(n)))            Double logarithmic      f(n) = ln(ln(n))
O(lg(n))                Logarithmic             f0(n) = ln(n), f1(n) = lg(n^3)
O((lg(n))^c)            Polylogarithmic         f(n) = (lg(n))^3
O(n)                    Linear                  f0(n) = n, f1(n) = 3n + 4
O(n lg(n)) = O(lg(n!))  Log-linear
O(n^2)                  Polynomial (quadratic)  f0(n) = 5n^2 + 2n + 12
O(n^3)                  Polynomial (cubic)      f(n) = 2n^3 + 12n^2 + 5
  ...
O(2^n)                  Exponential             f(n) = 10 · 2^n
O(3^n)                  Exponential             f(n) = 6 · 3^n + n^2
  ...
O(n!)                   Factorial
O(n^n)                  (no standard name)
1.24 Table: Orders of growth that appear often in practice, with faster-growing functions later.
We often split that hierarchy between the polynomial and exponential func-
tions; the table below shows why. It lists how long a job will take if we use an
algorithm that runs in time lg n , time n , etc. (A modern computer runs at 10 GHz,
10 000 million ticks per second, and there are 3.16 × 107 seconds in a year.)
        n = 1          n = 10         n = 50         n = 100
lg n    –              1.05 × 10^−17  1.79 × 10^−17  2.11 × 10^−17
n       3.17 × 10^−18  3.17 × 10^−17  1.58 × 10^−16  3.17 × 10^−16
n lg n  –              1.05 × 10^−16  8.94 × 10^−16  2.11 × 10^−15
n^2     3.17 × 10^−18  3.17 × 10^−16  7.92 × 10^−15  3.17 × 10^−14
n^3     3.17 × 10^−18  3.17 × 10^−15  3.96 × 10^−13  3.17 × 10^−12
2^n     6.34 × 10^−18  3.24 × 10^−15  3.57 × 10^−3   4.02 × 10^12
1.25 Table: Comparison of the times, in years, taken by algorithms whose behavior is
given by some functions from the order of growth hierarchy, on some input sizes n .
Consider the final column, n = 100. Between the initial rows the relative
change is an order of magnitude, which is a lot, but the absolute times are small.
Then we get to the final two rows. It is not a typo, the bottom really is 1012 years.
This is a huge change, both relatively and absolutely. The universe is 14 × 10^9
years old so this computation, even with input size of only 100, would take longer
than the age of the universe. Exponential growth is very, very much larger than
polynomial growth.
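The entries of Table 1.25 can be recomputed directly; this is a sketch using the clock rate and seconds-per-year figures stated above.

```python
import math

# Time in years for f(n) ticks on a machine doing 10^10 ticks per second,
# with about 3.156 * 10^7 seconds in a year.
TICKS_PER_YEAR = 1e10 * 3.156e7

def years(ticks):
    return ticks / TICKS_PER_YEAR

for f, name in [(math.log2, "lg n"), (lambda n: n, "n"),
                (lambda n: n * math.log2(n), "n lg n"),
                (lambda n: n**2, "n^2"), (lambda n: n**3, "n^3"),
                (lambda n: 2**n, "2^n")]:
    row = ["%8.2e" % years(f(n)) for n in (10, 50, 100)]
    print(name.ljust(7), *row)
```

The last row reproduces the jump to 10^12 years: 2^100 ticks is about 1.27 × 10^30, and dividing by the ticks in a year gives roughly 4 × 10^12.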
Another way to understand this point is to consider the effect of adding
one more bit to an algorithm's input, such as by passing from the length ten
σ0 = 11 0100 1010 to the length eleven σ1 = 110 1001 0101. An algorithm that
loops through the bits will just do one more loop, so it takes ten percent more time.
But an algorithm that takes 2^|σ| time will take double the time.
Cobham’s thesis is that the tractable problems — those that are at least con-
ceivably solvable in practice — are those for which there is an algorithm whose
resource consumption is polynomial.† For instance, if a problem’s best available
algorithm runs in exponential time then we may say that the problem is, or at least
appears, intractable.
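The two code snippets that the next paragraph compares did not survive in this text; the following pair is a reconstruction of the comparison and the exact code is an assumption: one version assigns x = 4 inside a ten-pass loop, the other hoists the assignment out.

```python
def inside():
    total = 0
    for i in range(10):
        x = 4            # reassigned on every one of the ten passes
        total += x * i
    return total

def outside():
    x = 4                # assigned once, before the loop
    total = 0
    for i in range(10):
        total += x * i
    return total
```

Both compute the same value; the first simply performs nine extra assignments.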
The code snippet with x=4 inside the loop is slower, by nine
extra assignments. But Big O disregards this constant time difference. That is,
Big O is not the right tool for characterizing fine coding details. Big O works at a
higher level, such as for comparing runtimes among algorithms.
That fits with our second point about Big O. We use it to help pick the best
algorithm, to rank them according to how much they use of some computing
resources. But algorithms are tied to an underlying computing model.‡ So for the
comparison we need a definition of the time used on a particular machine model.
† Cobham's Thesis is widely accepted, but not universally. Some researchers object that if an
algorithm runs in time n^100, or if it runs in time C · n^2 but with an enormous C, then the solution is not
actually practical. A rejoinder to that objection notes that when someone announces an algorithm with
a large exponent or large constant then typically over time the approach gets refined, shrinking those
two. In any event, polynomial time is markedly better than exponential time. In this book we accept
the thesis because it gives technical meaning to the informal 'acceptably fast'. ‡ More discussion of the
relationship between algorithms and machine models is in Section 3.
† A more extreme example of a model-based difference is that addition of two n × n matrices on a
RAM model takes time that is O(n^2), but on an unboundedly parallel machine model takes constant
time, O(1). ‡ People do sometimes note the order of magnitude of these constants.
linear, as O(n), with the reasoning that for the input n there are about n -many
divisions. How to explain the difference between these two Big O estimates?
This is another example of the relationship between an algorithm and an
underlying computing model. A programmer may make the engineering judgment
that for every use of their program the input will fit into a 64 bit word. They are
selecting a computation model, like the RAM model, where larger numbers take
the same time to read as smaller numbers. With this model, the prior paragraph
applies and the algorithm is linear.
So this difference in Big O estimates is in part an application versus theory
thing. In the common programming application setting, where the bit size of the
inputs is bounded, the runtime behavior is O(n). In a theoretical setting, accepting
input that is arbitrarily large so that the runtime is a function of the bit size b of the
inputs, the algorithm is O(2^b). An algorithm whose behavior as a function of the
input is polynomial, but whose behavior as a function of the bit size of the input is
exponential, is said to be pseudopolynomial.
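The naive primality test is the standard illustration (it also appears in Exercise 1.50); this sketch makes the point concrete.

```python
# About n trial divisions, so linear in the numeric value n but
# exponential in the bit size of n: an input with b bits can be as
# large as 2^b - 1, forcing about 2^b divisions.

def naive_is_prime(n):
    if n < 2:
        return False
    d = 2
    while d < n:          # roughly n - 2 trial divisions in the worst case
        if n % d == 0:
            return False
        d += 1
    return True
```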
A fifth and final point about Big O. When we are analyzing an algorithm we
can consider the behavior that is the worst case for any input of that size (as in
Definition 1.26), or the behavior that is the average over all inputs of that size.
For instance, the quicksort algorithm takes quadratic time O(n^2) at worst, but on
average is O(n lg n). Note, though, that worst-case analysis is the most common.
V.1 Exercises
1.27 True or false: if a function is O(n^2) then it is O(n^3).
✓ 1.28 Your classmate emails you, "I have an algorithm with running time that is
O(n^2). So with input n = 5 it will take 25 ticks." Tell them two things that they
have wrong.
1.29 Suppose that someone posts to a group that you are in, "I'm working on a
problem that is O(n^3)." Explain to them, gently, how their sentence is mistaken.
✓ 1.30 How many bits does it take to express each number in binary? (a) 5 (b) 50
(c) 500 (d) 5 000
✓ 1.31 One is true, the other one is not. Which is which? (a) If f is O(д) then f is
Θ(д). (b) If f is Θ(д) then f is O(д).
✓ 1.32 For each, find the function on the Hardy hierarchy, Table 1.24, that has
the same rate of growth. (a) n^2 + 5n − 2 (b) 2^n + n^3 (c) 3n^4 − lg lg n
(d) lg n + 5
1.33 For each, give the function on the Hardy hierarchy, Table 1.24, that has the
same rate of growth. That is, find д in that table where f is Θ(д).
(a) f(n) = n if n < 100, and f(n) = 0 otherwise
(b) f(n) = 1 000 000 · n if n < 10 000, and f(n) = n^2 otherwise
(c) f(n) = 1 000 000 · n^2 if n < 100 000, and f(n) = lg n otherwise
✓ 1.34 For each pair of functions decide if f is O(д), or д is O(f), or both, or
neither. (a) f(n) = 4n^2 + 3, д(n) = (1/2)n^2 − n (b) f(n) = 53n^3, д(n) = ln n
(c) f(n) = 2n^2, д(n) = √n (d) f(n) = n^1.2 + lg n, д(n) = √(n^2 + 2n) (e) f(n) = n^6,
д(n) = 2^(n/6) (f) f(n) = 3^n, д(n) = 3 · 2^n (g) f(n) = lg(3n), д(n) = lg(n)
1.35 Which of these are O(n^2)? (a) lg n (b) 3 + 2n + n^2 (c) 3 + 2n + n^3
(d) 10 + 4n^2 + ⌊cos(n^3)⌋ (e) lg(5^n)
✓ 1.36 For each, state true or false. (a) 5n^2 + 2n is O(n^3) (b) 2 + 4n^3 is O(lg n)
(c) ln n is O(lg n) (d) n^3 + n^2 + n is O(n^3) (e) n^3 + n^2 + n is O(2^n)
1.37 For each find the smallest k ∈ N so that the given function is O(n^k).
(a) n^3 + (n^4/10 000 000) (b) (n + 2)(n + 3)(n^2 − lg n) (c) 5n^3 + 25 + ⌈cos(n)⌉
(d) 9 · (n^3 + n^4) (e) ⌊√(5n^7 + 2n^2)⌋
1.38 Consider Table 1.25. (a) Add a column for n = 200. (b) Add a row for 3^n.
✓ 1.39 On a computer that performs at 10 GHz, at 10 000 million instructions per
second, what is the longest input that can be done in a year under an algorithm
with each time performance function? (a) lg n (b) √n (c) n (d) n^2 (e) n^3
(f) 2^n
1.40 Sometimes in practice we must choose between two algorithms where the
performance of one is better than the performance of the other in a big-O sense,
but where the first has a long initial segment of poorer performance. What is the
least input number such that f(n) = 100 000 · n^2 is less than д(n) = n^3?
1.41 What is the order of growth of the run time of a deterministic Finite State
machine?
✓ 1.42 (a) Verify that the function f(x) = 7 is O(1).
(b) Verify that f(x) = 7 + sin(x) is O(1). So if a function is in O(1), that does
not mean that it is a constant function.
(c) Verify that f(x) = 7 + (1/x) is also O(1).
(d) Show that a complexity function f is O(1) if and only if it is bounded above
by a constant, that is, if and only if there exists L ∈ R so that f(x) ≤ L for all
inputs x ∈ R.
1.43 Where does д(x) ≤ x^O(1) place the function д in the Hardy hierarchy?
Hint: see the prior question.
1.44 Let f(x) = 2x and д(x) = x^2. Prove directly from Definition 1.6 that f
is O(д), but that д is not O(f).
1.45 Prove that 2^n is O(n!). Hint: because of the factorial, consider these as natural
number functions and find suitable N, C ∈ N.
1.46 Use L'Hôpital's Rule as in Example 1.20 to verify these for any d ∈ R+:
(a) (log_b(x))^3 is O(x^d) (b) for any k ∈ N+, (log_b(x))^k is O(x^d).
1.47 What is the running time of the empty Turing machine?
1.48 Assume that д : R → R is increasing, so that x1 ≥ x0 implies that д(x1) ≥
д(x0). Assume also that f : R → R is a constant function. Show that f is O(д).
1.49 (a) Show that there is a computable function whose output values grow at a
rate that is O(1), one whose values grow at a rate that is O(n), one for O(n^2), etc.
(b) The Halting problem function K is uncomputable. Place its rate of growth in
the Hardy hierarchy, Table 1.24. (c) Produce a function that is not computable
because its output values are larger than those of any computable function. (You
need not show that the rate of growth is strictly larger, only that the output values
are larger.)
1.50 An algorithm that inputs natural numbers runs in pseudopolynomial time if
its runtime is polynomial in the numeric value of the input n, but exponential in
the number of bits required to represent n. Show that the naive algorithm to test
if the input is prime, which just checks whether it is divisible by any number that
is at least 2 and smaller than it, is pseudopolynomial. (Hint: we can check whether
one number divides another in quadratic time.)
✓ 1.51 Show that O(2^x) ⊆ O(3^x) but O(2^x) ≠ O(3^x).
1.52 Table 1.24 states that n! grows slower than n^n. (a) Verify this. Hint: although
n! is a natural number function, Theorem 1.17 still applies. (b) Stirling's formula
is that n! ≈ √(2πn) · (n^n/e^n). Doesn't this imply that n! is Θ(n^n)?
✓ 1.53 Two complexity functions f, д are asymptotically equivalent, f ∼ д, if
lim_{x→∞}(f(x)/д(x)) = 1. Show that each pair is asymptotically equivalent:
(a) f(x) = x^2 + 5x + 1 and д(x) = x^2, (b) lg(x + 1) and lg(x).
1.54 Is there an f so that O(f ) is the set of all polynomials?
1.55 There are orders of growth between polynomial and exponential. Specifically,
f(x) = x^lg(x) is one.
(a) Show that lg(x) ∈ O((lg(x))^2) but (lg(x))^2 ∉ O(lg(x)).
(b) Argue that for any power k, we have x^k ∈ O(x^lg(x)) but x^lg(x) ∉ O(x^k).
Hint: take the ratio, rewrite using a = 2^lg(a), and consider the limit of the
exponent.
(c) Show that x^lg(x) = 2^((lg x)^2). Hint: take the logarithm of both halves.
(d) Show that x^lg(x) is in O(2^x). Hint: form the ratio using the prior item.
The remaining exercises verify results that were presented without proof.
1.56 Verify the clauses of Lemma 1.12. (a) If a ∈ R+ then a · f is also O(д).
(b) The function f0 + f1 is O(д), where д is defined by д(n) = max(д0(n), д1(n)).
(c) The product f0 · f1 is O(д0 · д1).
1.57 Verify these clauses of Lemma 1.15. (a) The big-O relation is reflexive.
(b) It is also transitive.
1.58 Theorem 1.17 says that if the limit of the ratio of two functions exists then
we can determine the O relationship between the two. Assume that f and д are
complexity functions.
(a) Suppose that lim_{x→∞} f(x)/д(x) exists and equals 0. Show that f is O(д).
(Hint: this requires a rigorous definition of the limit.)
(b) We can give an example where f is O(д) even though lim_{x→∞} f(x)/д(x)
does not exist. Verify that, where д(x) = x and where f(x) = x when ⌊x⌋ is
odd and f(x) = 2x when ⌊x⌋ is even.
1.59 Prove Lemma 1.22.
Section
V.2 A problem miscellany
Much of today’s work in the Theory of Computation is driven by problems that
come from a field outside the subject. We will describe some problems to get a
sense of the ones that appear in the subject, and also to use for examples and
exercises. These are all well known.
Problems with stories We start with a few problems that come with stories.
Besides being fun, and an important part of the field’s culture, these also give a
sense of where problems come from.
W R Hamilton was a polymath whose genius was recognized early
and he was given a sinecure as Astronomer Royal of Ireland. He made
important contributions to classical mechanics, where his reformulation
of Newtonian mechanics is now called Hamiltonian mechanics. Other
work of his in physics helped develop classical field theories such as
electromagnetism and laid the groundwork for the development of
quantum mechanics. In mathematics, he is best known as the inventor
of the quaternion number system.
One of his ventures was a game, Around the World. The vertices
in the graph below were holes labeled with the names of world cities.
Players put pegs in the holes, looking for a circuit that visits each city
once and only once.
[Portrait: William Rowan Hamilton, 1805–1865]
It did not make Hamilton rich. But it did get him associated with a great problem.
2.2 Problem (Hamiltonian Circuit) Given a graph, decide if it contains a Hamiltonian
circuit, a cyclic path that includes each vertex once and only once.
This is stated as a type of problem called a decision problem, because it asks for
a 'yes' or 'no' answer. In this section we will see a variety of types of problems.
The next section will say more about problem types.
A special case of the Hamiltonian Circuit problem is the Knight's Tour problem,
to use a chess knight to make a circuit of the squares on the board. (Recall
that a knight moves in an L shape: two squares in one direction and then one
square perpendicular to that direction.)
[Figure: a Knight's Tour; this is the solution given by L Euler.] In graph terms,
there are sixty-four vertices, representing the board squares. An edge goes between two vertices if they are
connected by a single knight move. Knight’s Tour asks for a Hamiltonian circuit of
that graph.
Hamiltonian Circuit has another famous variant.
2.3 Problem (Traveling Salesman) Given a weighted graph, where we call the vertices
S = {c 0 , ... c k −1 } ‘cities’ and we call the edge weight d(c i , c j ) ∈ N+ for all c i , c j
the ‘distance’ between the cities, find the shortest-distance circuit that visits every
city and returns back to the start.
We can start with a map of the state capitals of
the forty eight contiguous US states and the distances
between them: Montpelier VT to Albany NY is 254
kilometers, etc. From among all trips that visit each
city and return back to the start, such as Montpelier →
Albany → Harrisburg → · · · → Montpelier, we want
the shortest one.
[Cartoon courtesy xkcd.com]
As stated, this is an optimization problem. However,
we can recast it as a decision problem. Introduce a bound B ∈ N and change the
problem statement to ‘decide if there is a circuit of total distance less than B ’. If
we had an algorithm to quickly solve this decision problem then we could also
quickly solve the optimization problem: ask whether there is a trip bounded by
length B = 1, then ask if there is a trip of length B = 2, etc. When we eventually
get a ‘yes’, we know the length of the shortest trip.
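Both versions can be sketched by brute force on a tiny instance; the four cities and their distances below are invented for the illustration.

```python
from itertools import permutations

# made-up distances for a four-city instance
d = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 9,
     ("B", "C"): 4, ("B", "D"): 7, ("C", "D"): 2}

def dist(x, y):
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def tour_length(order):
    stops = ("A",) + order + ("A",)
    return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))

def optimize():
    # optimization version: length of the shortest circuit
    return min(tour_length(p) for p in permutations(("B", "C", "D")))

def decide(B):
    # decision version: is there a circuit of total distance less than B?
    return any(tour_length(p) < B for p in permutations(("B", "C", "D")))
```

On this instance the shortest circuit has length 17, so asking decide(1), decide(2), ... gets its first 'yes' at B = 18.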
[Figure: Euler's summary sketch is in the middle and the graph, on vertices A through D, is on the right.]
2.4 Problem (Euler Circuit) Given a graph, find a circuit that traverses each edge
once and only once, or declare that no such circuit exists.
Next is a problem that sounds hard. But all of us see it solved every day, for
instance when we ask a phone for the shortest driving directions to someplace.
2.5 Problem (Shortest Path) Given a weighted graph and two vertices, find the
shortest path between them, or report that no path exists.
There is an algorithm that solves this problem quickly.† For instance, with this
graph we could look for the cheapest path from A to F .
[Figure: a weighted graph on the six vertices A, B, C, D, E, F, with edge weights 2, 6, 7, 9, 9, 10, 11, 14, 15.]
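The standard quick solution is Dijkstra's algorithm; here is a sketch on a stand-in graph (the edge list below is an assumption, not a reproduction of the figure's exact graph).

```python
import heapq

graph = {"A": [("B", 7), ("C", 9), ("D", 14)],
         "B": [("A", 7), ("C", 10), ("E", 15)],
         "C": [("A", 9), ("B", 10), ("F", 2)],
         "D": [("A", 14), ("F", 9)],
         "E": [("B", 15), ("F", 6)],
         "F": [("C", 2), ("D", 9), ("E", 6)]}

def dijkstra(start, goal):
    frontier = [(0, start)]       # priority queue of (cost so far, vertex)
    best = {}
    while frontier:
        cost, v = heapq.heappop(frontier)
        if v in best:             # already settled with a cheaper cost
            continue
        best[v] = cost
        if v == goal:
            return cost
        for w, weight in graph[v]:
            if w not in best:
                heapq.heappush(frontier, (cost + weight, w))
    return None                   # no path exists
```

On this stand-in graph the cheapest path from A to F goes through C, with total cost 9 + 2 = 11.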
2.8 Problem (Chromatic Number) Given a graph, find the smallest number k ∈ N
such that the graph is k-colorable, that is, such that its vertices can be colored
with k colors with no two adjacent vertices getting the same color.
†
This is in contrast to the goal of the Entscheidungsproblem.
P Q R P ∨Q P ∨ ¬Q ¬P ∨ Q ¬P ∨ ¬Q ∨ ¬R f (P, Q, R)
F F F F T T T F
F F T F T T T F
F T F T F T T F
F T T T F T T F
T F F T T F T F
T F T T T F T F
T T F T T T T T
T T T T T T F F
That T in the final column witnesses that this formula is satisfiable.
2.9 Problem (Satisfiability, SAT ) Decide if a given Boolean expression is satisfiable.
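For a formula this small we can settle satisfiability by brute force, exactly as the truth table above does; this sketch checks all eight rows of that formula.

```python
from itertools import product

# (P or Q) and (P or not Q) and (not P or Q) and (not P or not Q or not R)
def f(P, Q, R):
    return ((P or Q) and (P or not Q) and (not P or Q)
            and (not P or not Q or not R))

# every assignment of truth values, in truth-table order
satisfying = [v for v in product([False, True], repeat=3) if f(*v)]
```

The only satisfying row is P = T, Q = T, R = F, matching the table.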
2.12 Example These are two Western-tradition constellations, Ursa Minor and Draco.
Here we can solve the Vertex-to-Vertex Path problem by eye. For any two vertices
in Ursa Minor there is a path and for any two vertices in Draco there is a path. But
if the two are in different constellations then there is no path.
2.13 Problem (Minimum Spanning Tree) Given a weighted undirected graph, find a
minimum spanning tree, a subgraph containing all the vertices of the original
graph, whose edge weights have a minimum total.
This is an undirected graph with weights on the edges.
[Figure: a weighted undirected graph, with a spanning subgraph highlighted.]
The highlighted subgraph includes all of the vertices, that is, it spans the graph. In
addition, its weights total to a minimum from among all of the spanning subgraphs.
From that it follows that this subgraph is a tree, meaning that it has no cycles, or
else we could eliminate an edge from the cycle and thereby lower the edge weight
total without dropping any vertices.
This problem looks like the Hamiltonian Circuit problem in requiring that the
subgraph contain all the vertices. One difference is that for the Minimum Spanning Tree
problem we know algorithms that are quick, that are O(n lg n).
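One such quick method is Kruskal's algorithm: take edges cheapest first, keeping an edge only when it does not close a cycle. This sketch runs it on a small made-up edge list.

```python
edges = [(1, "A", "B"), (4, "B", "C"), (3, "A", "C"),
         (7, "C", "D"), (2, "B", "D")]

def kruskal(edges):
    parent = {}
    def find(v):                       # union-find with path compression
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for w, u, v in sorted(edges):      # consider edges cheapest first
        ru, rv = find(u), find(v)
        if ru != rv:                   # keep the edge unless it makes a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

On this instance the tree keeps the edges of weights 1, 2, and 3, for a total of 6.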
2.14 Problem (Vertex Cover) Given a graph and a bound B ∈ N, decide if the graph
has a size B vertex cover, a set of vertices, C , such that for any edge, at least one
of its ends is a member of C .
2.15 Example A museum posts guards to watch the exhibits. They have eight halls, laid
out as below, and they will post guards at corners. What is the smallest number of
guards that will suffice to watch all of the hallways?
w0 w1 w2
w3 w4 w5
Obviously, one guard will not do. A two-element list that covers is C = {w 0 , w 4 }.
2.16 Problem (Clique) Given a graph and a bound B ∈ N, decide if the graph has a
size B clique, a set of B -many vertices such that any two are connected.
If the graph nodes represent people and the edges connect friends then we
want to know if there are B -many mutual friends. Any graph with a 4-clique has
a subgraph like the one below on the left, and any graph with a 5-clique has a
subgraph like the one on the right.
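A direct decision procedure checks every B -element set of vertices, which takes exponential time in general. A sketch (the example graph is invented for illustration):

```python
from itertools import combinations

def has_clique(vertices, edges, B):
    """Decide Clique by testing each B-element vertex set.
    edges is a set of frozensets, one per undirected edge."""
    return any(all(frozenset((u, v)) in edges
                   for u, v in combinations(group, 2))
               for group in combinations(vertices, B))

edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
print(has_clique(range(4), edges, 3))  # True: {0, 1, 2} is a 3-clique
print(has_clique(range(4), edges, 4))  # False
```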
2.19 Problem (Broadcast) Given a graph with initial vertex v 0 , and a bound B ∈ N,
decide if a message can spread from v 0 to every other vertex within B steps. At
each step, any node that has heard the message can transmit it to at most one
adjacent node.
[Figure: a graph on vertices v0 through v10, with initial vertex v0]
2.21 Example In the graph no vertex is more than three edges away from the initial
one. The animation shows it taking four steps to broadcast.
M = { ⟨a, b, a⟩, ⟨a, c, a⟩, ⟨b, b, a⟩, ⟨b, c, a⟩, ⟨a, b, d⟩, ⟨a, c, d⟩, ⟨b, b, d⟩, ⟨b, c, d⟩ }
The set M̂ = { ⟨a, b, a⟩, ⟨b, c, d⟩ } has 2 elements and they disagree in the first
coordinate, as well as on the second and third.
M = { ⟨1, 10, 200⟩, ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 10, 400⟩,
⟨3, 40, 100⟩, ⟨3, 40, 200⟩, ⟨4, 10, 200⟩, ⟨4, 20, 300⟩ }
A matching is M̂ = { ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 40, 100⟩, ⟨4, 10, 200⟩ }.
2.27 Example No sum of the numbers { 831, 357, 63, 987, 117, 81, 6785, 606 } adds to
T = 2105. All of the numbers are multiples of three, while the target T is not.
2.28 Problem (Knapsack) Given a finite set S whose elements s have a weight w(s) ∈
N+ and a value v(s) ∈ N+ , along with a weight bound B ∈ N+ and a value
target T ∈ N+ , find a subset Ŝ ⊆ S whose elements have a total weight less than
or equal to the bound and total value greater than or equal to the target.
Imagine that we have items to pack in a knapsack and we can carry at most ten
pounds. Can we pack a value of T = 100 or more?
Item a b c d
Weight 3 4 5 6
Value 50 40 10 30
We pack the most value while keeping to the weight limit by taking items (a)
and (b). So we cannot meet the value target.
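For small instances like this one, the classic 0/1 knapsack dynamic program finds the best achievable value under the weight bound. A sketch, run on the table above:

```python
def best_value(weights, values, bound):
    """Max total value with total weight <= bound (0/1 knapsack DP)."""
    best = [0] * (bound + 1)                 # best[w] = top value at weight w
    for wt, val in zip(weights, values):
        for w in range(bound, wt - 1, -1):   # go downward: each item used once
            best[w] = max(best[w], best[w - wt] + val)
    return best[bound]

v = best_value([3, 4, 5, 6], [50, 40, 10, 30], 10)
print(v)           # 90: items a and b
print(v >= 100)    # False: the value target is missed
```

Note the running time is proportional to the bound B, not to the number of bits used to write B; this pseudo-polynomial behavior matters later in the chapter.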
2.29 Problem (Partition) Given a finite multiset A that has for each of its elements an
associated positive number size s(a) ∈ N+ , decide if there is a division of the set
into two halves, Â and A − Â, so that the total of the sizes is the same in both
halves, ∑_{a∈Â} s(a) = ∑_{a∉Â} s(a).
2.30 Example The set A = { I, a, my, go, rivers, cat, hotel, comb } has eight words.
The size of a word, s(σ ), is the number of letters. Then  = { cat, hotel, I, a, go }
gives ∑_{a∈Â} s(a) = ∑_{a∉Â} s(a) = 12.
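A subset-sum dynamic program decides small Partition instances; here it is run on the word-length example above. (A sketch; the set of reachable sums can grow with the total size, so this is pseudo-polynomial, not polynomial in the input length.)

```python
def can_partition(sizes):
    """Decide Partition: is some subset's total exactly half the grand total?"""
    total = sum(sizes)
    if total % 2:                       # odd total: no equal split possible
        return False
    reachable = {0}                     # subset sums seen so far
    for s in sizes:
        reachable |= {r + s for r in reachable}
    return total // 2 in reachable

words = ['I', 'a', 'my', 'go', 'rivers', 'cat', 'hotel', 'comb']
print(can_partition([len(w) for w in words]))  # True
```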
2.32 Problem (Crossword) Given an n × n grid, and a set of 2n -many strings, each of
length n , decide if the words can be packed into the grid.
2.33 Example Can we pack the words AGE, AGO, BEG, CAB, CAD, and DOG into a 3 × 3
grid?
C A B
A G E
D O G
The final three problems may seem inextricably linked, but as we understand
them today, they seem to be different in the big-O behavior of the algorithms to
solve them.
2.36 Problem (Divisor) Given a number n ∈ N, find a nontrivial divisor.
When the numbers are sufficiently large, we know of no efficient algorithm to
find divisors.† However, as is so often the case, at this time we also have no proof
that no efficient algorithm exists.‡ Not all numbers of a given length are equally
hard to factor. The hardest numbers to factor, using the best currently known
techniques, are semiprimes, the product of two prime numbers.
2.37 Problem (Prime Factorization) Given a number n ∈ N, produce its decomposition
into a product of primes.
Factoring seems, as far as we know today, to be hard. What about if you only
want to know whether a number is prime or composite, and don’t care about its
factors?
2.38 Problem (Composite) Given a number n ∈ N, determine if it has any nontrivial
factors; that is, decide if there is a number a that divides n and such that 1 < a < n .
For many years the consensus among experts was that
Composite was probably quite hard.§ One reasonable justifi-
cation was that, for centuries, many of the smartest people in
the world had worked on composites and primes, and none
of them had produced a fast test. But in 2002, M Agrawal,
N Kayal, and N Saxena proved that primality testing can be
done in time polynomial in the number of digits of the number.
This is the AKS primality test.|| Today, refinements of their
technique run in O(n^6).
[Photo: Manindra Agrawal (b 1966), Neeraj Kayal (b 1979), Nitin Saxena (b 1981)]
This dramatically illustrates that, although experts are
expert and their opinions have value, nonetheless they can be
wrong. People producing a result that gainsays established
orthodoxy has happened before and will happen again.
In short, one correct proof outweighs any number of expert opinions.
†
No efficient algorithm is known on a non-quantum computer. ‡ There is no proof despite centuries
of ingenious attacks on the problems by many of the brightest minds of the past, and of today. The
presumed difficulty of this problem is at the heart of widely used algorithms in cryptography. § There
are a number of probabilistic algorithms that are often used in practice that can test primality very
quickly, with an extremely small chance of error. || At the time that they did most of the work, Kayal
and Saxena were undergraduates.
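Footnote § mentions probabilistic algorithms. Here is a sketch of the standard Miller-Rabin test (not the AKS algorithm): a 'composite' verdict is always correct, while a 'prime' verdict is wrong only with vanishingly small probability.

```python
import random

def probably_prime(n, trials=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):                 # quick check against small primes
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                      # write n-1 as d * 2^s with d odd
        d, s = d // 2, s + 1
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                   # a witnesses that n is composite
    return True

print(probably_prime(6785))   # False: 6785 = 5 * 23 * 59
print(probably_prime(997))    # True
```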
V.2 Exercises
P Q R f (P, Q, R)
F F F T
F F T T
F T F F
F T T F
T F F T
T F T F
T T F T
T T T F
(a) The two terms P and ¬P are atoms. So are Q , ¬Q , R , and ¬R . Produce a
three-atom clause that evaluates to F only on the F -T -F line.
(b) Produce three-atom clauses for each of the other truth table lines having the
value F on the right.
(c) Take the conjunction of those four clauses and verify that it has the given
behavior.
✓ 2.43 Decide if each formula is satisfiable.
(a) (P ∧ Q) ∨ (¬Q ∧ R)
(b) (P → Q) ∧ ¬((P ∧ Q) ∨ ¬P)
✓ 2.44 Each of the five Platonic solids has a Hamiltonian circuit, as shown.
Hamilton used the fourth, the dodecahedron, for his game. Find a Hamiltonian
circuit for the third and the fifth, the octahedron and the icosahedron. To make
the connections easier to see, below we have grabbed a face in the back of each
solid, and expanded it until we could squash the entire shape down into the plane
without any edge crossings.
286 Chapter V. Computational Complexity
[Figure: the Platonic solids flattened into the plane, with vertices numbered]
(a) Is there a path from AT&T to Ford Motor? (b) Can you get from Haliburton
to Ford Motor? (c) Can you get from Caterpillar to Ford Motor? (d) JP Morgan
to Ford Motor?
✓ 2.49 A popular game extends the Vertex-to-Vertex Path problem by counting the
degrees of separation. Below is a portion of the movie connection graph, where
actors are connected if they have ever been together in a movie.
Elvis Presley Meryl Streep
Change of Habit
Ed Asner JFK The River Wild
(c) Bacon’s?
(d) How many movies separate me from Meryl Streep?
✓ 2.50 This Knapsack instance has no solution when the weight bound is B = 73
and the value target is T = 140.
Item a b c d e
Weight 21 33 49 42 19
Value 50 48 34 44 40
[Figure: the conflict graph for the classes, on vertices A through I]
What is the minimum number of class times that we must use? In coloring terms,
we define that classes meeting at the same time are the same color and we ask for
the minimum number of colors needed so that no two same-colored vertices share
an edge. (a) Show that no three-coloring suffices. (b) Produce a four-coloring.
2.57 Some authors define the Satisfiability problem as: given a finite set of propo-
sitional logic statements, find if there is a single input tuple b0 , ... b j−1 , where
each bi is either T or F , that satisfies them all. Show that this is equivalent to the
definition given in Problem 2.9.
✓ 2.58 Find all 3-cliques in this graph.
v6 v5
v1 v4 v3
v0 v2
v0 v1
v2
v3 v4
v5 v6
v3 v4 v5
(b) In this graph find a vertex cover with k = 3 elements, and an independent
set with k̂ = 3 elements.
v0 v1 v2 v3
v4 v5
(c) In this graph find a vertex cover S with k = 4 elements. Find an independent
set Ŝ with k̂ = 6 elements.
Section 3. Problems, algorithms, and programs 289
v0 v1 v2 v3 v4
v5 v6 v7 v8 v9
0 1 2 3 4
α β γ δ ε
For example, instructor A can only teach courses 1, 2, and 3. And, course 0
can only run at time α or time δ . Verify that this is an instance of the
Three-dimensional Matching problem and find a match.
2.62 Consider Three Dimensional Matching, Problem 2.22. Let X = { a, b, c },
Y = { b, c, d }, and Z = { a, d, e }.
(a) List all the elements of M = X × Y × Z .
(b) Is there a three element subset M̂ whose triples have the property that no
two of them agree on any coordinate?
2.63 In Example 2.21 the broadcast takes four steps. Can it be done in fewer?
Section
V.3 Problems, algorithms, and programs
Now, with many examples in hand, we will briefly reflect on problems and solutions.
We will keep this discussion on an intuitive level only — indeed, many of these
things have no widely accepted precise definition.
A problem is a job, a task. Somewhat more precisely, it is a uniform family of
tasks, typically with an unbounded number of instances. For a sense of ‘family’,
contrast the general Shortest Path problem with that of finding the shortest path
between Los Angeles and New York. The first is a family while the second is an
instance. We are more likely to talk about the family, both because the second is a
special case, so that any conclusions about the first subsume the second, and also
because the first feels more natural.† We are most focused on problems that can be
†
There are interesting problems with only one task, such as computing the digits of π .
Types of problems There are patterns to the problems that we see in the Theory
of Computation. As a first example, a problem type that we have already seen
is a function problem. These ask for an algorithm that has a single output
for each input. An example is the Prime Factorization problem, which takes in a
natural number and returns its prime decomposition, perhaps as a sequence of
pairs, ⟨prime, exponent⟩ . Another example is the problem of finding the greatest
common divisor of two natural numbers, where the input is a pair of natural
numbers and the output is a natural number.
A second common problem type is the optimization problem. These call for a
solution that is best, according to some metric. The Shortest Path problem is one
of these, as is the Minimum Spanning Tree problem.
†
There is no widely-accepted formal definition of ‘algorithm’. Whatever it is, it fits between ‘mathematical
function’ and ‘computer program’. For example, a ‘sort’ function takes in a set of items and returns
the sorted sequence. This behavior could be implemented using different algorithms: merge sort,
heap sort, etc. In turn, each algorithm could be implemented by many programs, written in different
languages and for different platforms. So the best handle that we have is informal — an ‘algorithm’ is
an equivalence class of programs (i.e., Turing machines), where two programs are equivalent if they do
essentially the same thing. ‡ There are now coming up on a million volunteers offering computing
time. To join them, visit https://scienceunited.org/.
A perhaps less familiar problem type is a search problem. For one of these,
while there may be many solutions in the search space, the algorithm can stop when
it has found any one. A natural example inputs a Propositional Logic statement and
outputs a line in the truth table that witnesses that the statement is satisfiable (or
signals that there is no such line). There may be many such lines but it only needs
to find one. Another example is the problem, “Given a weighted graph, and two
vertices, and a bound B ∈ R, find a path between the vertices that costs less than
the bound.” Still another example is that of finding a B -coloring for a graph, from
among possibly many such colorings. Another example is the Knapsack problem.
This problem type also appeared in the discussion on nondeterminism on page 191,
where we defined that a string is accepted if in the computation tree we could find
at least one accepting branch.
A decision problem is one with a ‘Yes’ or ‘No’ answer.† The first problem
that we saw, the Entscheidungsproblem, is one of these.‡ We have also seen
decision problems in conjunction with the Halting problem, such as the problem of
determining, given an index e , whether ϕ e will output a seven for any input. In this
chapter we saw the problem of deciding whether a given natural number is prime,
the Composite problem, as well as the Clique problem, the Partition problem, and
the Subset Sum problem.
Often a decision problem is expressed as a language decision problem, or
language recognition problem, where we are given some language and asked for an
algorithm to decide if its input is a member of that language. We did lots of these
in the Automata chapter, such as producing a Finite State machine that decides if
an input string is a member of L = {σ ∈ { a, b }∗ | σ contains at least two b’s }, or
proving that no such machine can determine membership in { a^n b^n | n ∈ N }.
This relates to the discussion from the Languages section, on page 147, about
the distinction between deciding a language and recognizing it.
3.1 Definition A language L is decided by a Turing machine, or Turing machine
decidable, if the function computed by that machine is the characteristic function
of the language. The language is recognized, or accepted, by a Turing machine
when for each input σ ∈ B∗, if σ ∈ L then the machine returns 1, while if σ ∉ L
then either the machine does not halt or it does not return 1.
Restated, P decides the language L if the machine has this input-output behavior.
ϕ_P(σ) = 1_L(σ) = 1 if σ ∈ L, and 0 otherwise
Note in particular that in this case the machine halts for all inputs. Note also that
if a machine recognizes a language then when σ ∉ L, possibly the machine just
does not halt.
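To illustrate the distinction in code (a sketch, with Python functions standing in for Turing machines): a decider computes the characteristic function and halts on every input, while a recognizer is free to diverge off the language.

```python
def decide_two_bs(sigma):
    """A decider: computes the characteristic function 1_L for
    L = { strings over {a,b} with at least two b's }.  Halts on every input."""
    return 1 if sigma.count('b') >= 2 else 0

def recognize_two_bs(sigma):
    """A recognizer for the same L.  On members it returns 1; on
    non-members this particular machine simply never halts."""
    while sigma.count('b') < 2:
        pass                      # diverge: never returns on non-members
    return 1

print(decide_two_bs('abab'))      # 1
print(decide_two_bs('aaba'))      # 0
print(recognize_two_bs('abab'))   # 1  (but do not call it on 'aaba'!)
```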
†
Although a decision problem calls for producing a function of a kind, a Boolean function, they are
common enough to be a separate category. ‡ Recall that the word is German for “decision problem”
and that it asks for an algorithm to decide, given a mathematical statement, whether that statement is
true or false.
3.2 Remark One reason that we are interested in language membership decisions
comes from practice. A language compiler must recognize whether a given source
file is a member of the language. Another reason is that Finite State machines can
only do one thing, decide languages, and so to compare these with other machines
we must do so by comparing which languages they can decide. Still another reason
is that in many contexts stating a problem in this way is natural, as we saw with
the Halting problem.
Distinctions between problem types are fuzzy and often we can describe a task
with more than one type. For the task of determining the evenness of a number,
for instance, we could consider the function problem ‘given n , return its remainder
on division by 2’, or the language decision problem of determining membership in
L = { 2k | k ∈ N }.
There, the different types are essentially the same. However, sometimes
selecting the problem type that best captures the complexities involved in a task
requires judgment. Consider the task of finding roots of a polynomial. We may
express it as a function problem with ‘given a polynomial p , return the set of its
rational number roots’, or as a language decision problem with ‘decide if a given
⟨p, r ⟩ , belongs to the set of all sequences consisting of a polynomial and one of its
rational roots’. The second option, for which the algorithm just plugs r into p , does
not seem to involve some of the essential difficulty in find a root, for instance such
as the problem of distinguishing between a single number that is a double root
and two close numbers that are each single roots.
When we have a choice of problem types, we prefer language decision problems.
It is our default interpretation of ‘problem’ and we will focus on them in the rest of
the book. In addition, we will be sloppy about the distinction between the decision
problem for a language and the language itself; for instance, we will write L for a
problem.
3.3 Example The Satisfiability problem, as stated, is a decision problem. We can
recast it as the problem of determining membership in the language SAT =
{ F | F is a satisfiable propositional logic statement }. This recasting is trivial,
suggesting that the language recognition problem form is a natural way to describe
the underlying task.
Recasting optimization problems as language decision problems often involves
using a parametrized sequence of languages.
3.4 Example The Chromatic Number problem inputs a graph and returns a minimal
number B ∈ N such that the graph is B -colorable. Recast it by considering the family
of languages LB = { G | G has a B -coloring }. If we could solve the decision
problem for those languages then we could compute the minimal chromatic number
by testing B = 1, B = 2, etc., until we find the smallest B for which G ∈ LB .
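A sketch of that scheme, with a brute-force decision procedure for each LB (exponential time, but it shows how solving the decision problems yields the optimum):

```python
from itertools import product

def is_colorable(vertices, edges, B):
    """Decide the language L_B: does the graph have a B-coloring?"""
    for coloring in product(range(B), repeat=len(vertices)):
        color = dict(zip(vertices, coloring))
        if all(color[u] != color[v] for u, v in edges):
            return True
    return False

def chromatic_number(vertices, edges):
    """Solve the optimization problem by testing B = 1, 2, ... as in the text."""
    B = 1
    while not is_colorable(vertices, edges, B):
        B += 1
    return B

triangle = [('u', 'v'), ('v', 'w'), ('u', 'w')]
print(chromatic_number('uvw', triangle))  # 3
```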
3.5 Example The Traveling Salesman problem is an optimization problem. Recast it
as a sequence of language decision problems as above: consider a parameter B ∈ N
and define TS_B = { G | the graph G has a circuit of length no more than B }.
For a task, we want to state it as a problem in a way that captures the essential
difficulty. In particular, these recastings of optimization problems preserve
polytime solvability. For instance, if there were a power k ∈ N such that for each B
we could solve TS_B in time O(n^k) then looping through B = 1, B = 2, etc., would
solve the Traveling Salesman problem in polytime, namely time O(n^{k+1}).
Statements and representations To be complete, the description of a problem
must include the form of the inputs and outputs. For instance, if we state a problem
as: ‘input two numbers and output their midpoint’ then we have not fully specified
what needs to be done. The input or output might use strings representing decimal
numbers, or might be in binary, or even might be in unary, where the number n is
represented with n -many 1’s.†
The representation of the input and output data both matters and doesn’t
matter. One sense in which it matters is that the input’s form can change the
algorithm that we choose, or its runtime behavior. Suppose for instance that
we must decide whether a number is divisible by four. If the input is a bitstring
representing the number in binary then the algorithm is immediate: a number
is divisible by four if and only if its final two bits are 00.‡ In contrast, if the
number is represented in unary then we may scan the 1’s, keeping track of the
current remainder modulo 4.
However, the representation doesn’t matter in the sense that if we have an
algorithm for one representation in hand then we can solve the problem for other
representations by translating to what the algorithm expects, and then applying
that algorithm. For example, for the divisible-by-four problem we could handle
unary inputs by converting them to binary and then applying the binary algorithm.§
The same applies to output representations.
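A sketch of the two algorithms, plus the reduction, with strings standing in for tape contents:

```python
def div4_binary(bits):
    """Binary input: divisible by four iff the final two bits are 00.
    Constant time, independent of the input length."""
    return bits == '0' or bits.endswith('00')

def div4_unary(ones):
    """Unary input: scan the 1's, tracking the remainder modulo 4."""
    r = 0
    for _ in ones:
        r = (r + 1) % 4
    return r == 0

def div4_unary_by_reduction(ones):
    """Or: translate unary to binary, then apply the binary algorithm."""
    return div4_binary(bin(len(ones))[2:])

print(div4_binary('1100'))            # True: 12 is divisible by 4
print(div4_unary('1' * 12))           # True
print(div4_unary_by_reduction('1' * 12))  # True
```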
Another way in which the representation doesn’t matter is that typically we
find that the costs of different representations don’t change the Big O runtime
behavior. For example we might have a graph algorithm whose run time is not
large at all, O(n lg n). Even for this minimal time, we can find a representation
for the input graphs, such as where inputting takes O(n) time, that leaves the
algorithm analysis conclusion unchanged at O(n lg n).
With this in mind, we will adopt the point of view, which we shall call Lipton’s
Thesis, that everything of interest can be represented with reasonable efficiency by
bitstrings.|| This applies to all of the mathematical problems stated earlier. But it
†
An experienced programmer may have the reaction that unary is wasteful — if the input is one
thousand then this representation would require the machine to take a thousand steps just to read
the input, while in binary notation the input is 1111101000, which takes only ten steps to read. But
unary is not completely useless; we have found that it suited our purpose when we simply wanted to
illustrate Turing machines. In any event, it certainly is possible. ‡ Thus, on a Turing machine, if when
the machine starts the head is under the final character, then the machine does not even need to read
the entire input to decide the question. The algorithm runs in time independent of the input length.
§
That is, the unary case reduces to the binary one. || ‘Reasonable’ means that it is not so inefficient as
to greatly change the big-O behavior.
also applies to cases that may seem less natural, such as that we can use bitstrings
to faithfully represent Beethoven’s 9th Symphony, or an exquisite Old Master.†
†
This is in some ways like Church’s Thesis. We cannot prove it, but our experience with digital
reproduction of music, movies, etc., finds that it is so. ‡ Naturally some exercises in this section cover
representations. § Many authors use diamond brackets to stand for a representation, as in ‘ ⟨G, v 0, v 1 ⟩ ’.
Here, we reserve diamond brackets for sequences.
V.3 Exercises
✓ 3.8 What is the difference — speaking informally, since some of these do not have
formal definitions — between an algorithm and: (a) a heuristic, (b) pseudocode,
(c) a Turing machine, (d) a flowchart, and (e) a process?
3.9 So, if a problem is essentially a set of strings, what constitutes a solution?
3.10 What is the difference between a decision problem and a language decision
problem?
3.11 As an illustration of the thesis that even surprising things can be represented
reasonably efficiently and with reasonable fidelity in binary, we can do a simple
calculation. (a) At 30 cm, the resolution of the human eye is about 0.01 cm.
How many such pixels are there in a photograph that is 21 cm by 30 cm?
(b) We can see about a million colors. How many bits per pixel is that?
(c) How many bits for the photo, in total?
3.12 Name something important that cannot be represented in binary.
✓ 3.13 True or false: any two programs that implement the same algorithm must
compute the same function. What about the converse?
3.14 Some tasks are hard to express as a language decision problem. Consider
sorting the characters of a string into ascending order. Briefly describe why each
of these language decision problems fails to capture the task’s essential difficulty.
(a) {σ ∈ Σ∗ | σ is sorted } (b) { ⟨σ , p⟩ | p is a permutation that orders σ }
(d) L3 = {σ ∈ B∗ | σ^R = σ }
3.16 Solve the language decision problem for (a) the empty language, (b) the
language B, and (c) the language B∗.
3.17 For each language, sketch an algorithm that solves the language decision
problem.
(a) {σ ∈ B∗ | σ matches the regular expression a*ba* }
3.19 For each language decision problem, give an algorithm that runs in O(1).
(a) The language of minimal-length binary representations of numbers that are
nonzero.
(b) The binary representations of numbers that exceed 1000.
3.20 In a graph, a bridge edge is one whose removal disconnects the graph. That
is, there are two vertices that before the bridge is removed are connected by a
path, but are not connected after it is removed. (More precisely, a connected
component of a graph is a set of vertices that can be reached from each other by
a path. A bridge edge is one whose removal increases the number of connected
components.) The problem is: given a graph, find a bridge. Is this a function
problem, a decision problem, a language decision problem, a search problem, or
an optimization problem?
✓ 3.21 For each, give the categorization that best applies: a function problem,
a decision problem, a language decision problem, a search problem, or an
optimization problem. (a) The Graph Connectedness problem, which inputs
a graph and decides whether for any two vertices there is a path between
them. (b) The problem that inputs two natural numbers and returns their
least common multiple. (c) The Graph Isomorphism problem that inputs two
graphs and determines whether they are isomorphic. (d) The problem that
takes in a propositional logic statement and returns an assignment of truth
values to its inputs that makes the statement true, if there is such an assignment.
(e) The Nearest Neighbor problem that inputs a weighted graph and a vertex,
and returns a vertex nearest the given one that does not equal the given one.
(f) The Discrete Logarithm problem: given a prime number p and two numbers
a, b ∈ N, determine if there is a power k ∈ N so that a^k ≡ b (mod p). (g) The
problem that inputs a bitstring and decides if the number that it represents in
binary will, when converted to decimal, contain only odd digits.
✓ 3.22 For each, give the characterization that best applies: a function problem,
a decision problem, a language decision problem, a search problem, or an
optimization problem.
(a) The 3-Satisfiability problem, Problem 2.10
(b) The Divisor problem, Problem 2.36
(c) The Prime Factorization problem, Problem 2.37
(d) The F − SAT problem, where the input is a propositional logic expression
and the output is either an assignment of T and F to the expression’s variables
that makes it evaluate to T , or the string None.
(e) The Composite problem, Problem 2.38
3.23 Express each task as a language decision problem. Include in the description
explicit mention of the string representation.
(a) Decide whether a number is a perfect square.
(b) Decide whether a triple ⟨x, y, z⟩ ∈ N^3 is a Pythagorean triple, that is, whether
x^2 + y^2 = z^2.
(b) The Nearest Neighbor problem, that inputs a weighted graph and a vertex
and returns the vertex nearest the given one, but not equal to it.
3.32 The Linear Programming problem starts with a list of linear inequalities
a_{i,0}·x_0 + · · · + a_{i,n−1}·x_{n−1} ≤ b_i for a_{i,0}, ... a_{i,n−1}, b_i ∈ Q and it looks for a sequence
⟨s_0, ... s_{n−1}⟩ ∈ Q^n that is feasible, in that substituting the number s_j for the variable
x_j makes each inequality true. Give a version that is a (a) language decision
problem, (b) search problem, (c) function problem, and (d) optimization
problem. (For some parts there is more than one sensible answer.)
3.33 An independent set in a graph is a collection of vertices such that no
two are connected by an edge. Give a version of the problem of finding an
independent set that is a (a) a decision problem, (b) language decision problem,
(c) search problem, (d) function problem, and (e) optimization problem. (For
some parts there is more than one reasonable answer.)
3.34 Give an example of a problem where the decision variant is solvable quickly,
but the search variant is not.
3.35 Let LF = { ⟨n, B⟩ ∈ N^2 | there is an m ∈ { 1, ... B } that divides n } and con-
Section
V.4 P
Recall that we usually are not careful to distinguish between a language L and the
problem of deciding which strings are in that language.
4.1 Definition A complexity class is a collection of languages.
The term ‘complexity’ is there because these collections are often associated
with some resource specification, so that a class consists of the languages that
are accepted by a mechanical computer whose use of some resource fits the
specification.†
4.2 Example One complexity class is the collection of languages for which there is a
deciding Turing machine that runs in time O(n^2). That is, C = { L0 , L1 , ... }, where
each L_j is decided by some machine P_{i_j} which, when given input σ , finishes
within a constant multiple of |σ|^2 steps.
4.3 Example Another is the collection of languages accepted by Turing machines that
use only logarithmic space. So, when the accepting machine gets input σ , it
halts after visiting a number of tape squares less than or equal to lg(|σ |).
Some points bear development. As to the computing machine, researchers
study not just Turing machines but other types of machines as well, including
nondeterministic Turing machines, and Turing machines with access to an oracle
for random numbers. The resource specifications often involve bounds on the time
and space behavior. But they could instead be, for instance, the complement of
O(n^2), so it isn’t always a bound.‡
Definition The complexity class that we introduce now is the most important one.
It is the collection of problems that under Cobham’s Thesis we take to be tractable.
4.4 Definition A language decision problem is a member of the class P if there is an
algorithm for it that on a deterministic Turing machine runs in polynomial time.
4.5 Example One problem that is a member of P is that of deciding whether a given
graph is connected.
{ G | for any two vertices v0, v1 ∈ G , there is a path from one to the other }
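A breadth-first search from any one vertex decides this language in polynomial time. A sketch:

```python
from collections import deque

def is_connected(vertices, edges):
    """Decide Graph Connectedness by breadth-first search."""
    vertices = list(vertices)
    if not vertices:
        return True
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, frontier = {vertices[0]}, deque([vertices[0]])
    while frontier:                        # expand outward from vertices[0]
        for w in adjacent[frontier.popleft()] - seen:
            seen.add(w)
            frontier.append(w)
    return len(seen) == len(vertices)      # did the search reach everything?

print(is_connected('abc', [('a', 'b'), ('b', 'c')]))   # True
print(is_connected('abcd', [('a', 'b'), ('c', 'd')]))  # False
```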
4.7 Example Still another problem in P is the String Search problem, to decide
substring-ness.
{ ⟨σ , τ ⟩ ∈ Σ∗ | σ is a substring of τ }
Often τ is very long and is called the haystack while σ is short and is the needle. The
algorithm that first tests σ at the initial character of τ , then at the next character,
etc., has a runtime of O(|σ | · |τ |), which is O(max(|σ |, |τ |)^2).
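That algorithm can be sketched directly:

```python
def is_substring(needle, haystack):
    """Test the needle at each starting position: O(|needle| * |haystack|)."""
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return True
    return False

print(is_substring('ana', 'banana'))  # True
print(is_substring('nab', 'banana'))  # False
```

Faster methods exist (Knuth-Morris-Pratt runs in O(|σ| + |τ|)), but for membership in P the quadratic bound already suffices.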
4.8 Example A circuit is a directed acyclic graph. Each vertex, called a gate, is labeled
with a two input/one output Boolean function. The only exception is that the
vertices on the left are the input gates that provide source bits, b0 , b1 , b2 , b3 ∈ B.
Edges are called wires. Below, ∧ is the boolean function ‘and’, ∨ is ‘or’, ⊕ is
‘exclusive or’, and ≡ is the negation of ‘exclusive or’, which returns 1 if and only if
the two inputs bits are the same.
[Figure: a circuit with input gates b0, b1, b2, b3 feeding ⊕, ∧, ∨, and ≡ gates, with output f (b0, b1, b2, b3)]
This circuit returns 1 if the sum of the input bits is a multiple of 3. The
Circuit Evaluation problem inputs a circuit like this one and computes the out-
put, f (b0 , b1 , b2 , b3 ). This problem is a member of P.
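A sketch of such an evaluator: represent the circuit as a list of gates in topological order and compute each wire's value once its inputs are known. (The small circuit here is invented for illustration; it is not the multiple-of-3 circuit from the figure.)

```python
from operator import and_, or_, xor

def evaluate(circuit, inputs):
    """Circuit Evaluation: each gate is (name, (op, left, right)) where left
    and right name earlier gates or inputs; gates are in topological order."""
    value = dict(inputs)                   # gate name -> bit
    for name, (op, left, right) in circuit:
        value[name] = op(value[left], value[right])
    return value[circuit[-1][0]]           # the last gate is the output

eqv = lambda x, y: 1 - (x ^ y)             # the gate written ≡ in the text
circuit = [('g0', (xor, 'b0', 'b1')),
           ('g1', (eqv, 'b2', 'b3')),
           ('g2', (and_, 'g0', 'g1'))]
print(evaluate(circuit, {'b0': 1, 'b1': 0, 'b2': 1, 'b3': 1}))  # 1
```

One pass over the gates, with constant work per gate, so the evaluation is linear in the size of the circuit, placing the problem comfortably in P.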
4.10 Figure: This shows all language decision problems, all L ⊆ B∗ . Shaded is P. The
shading is in layers to depict that P contains problems with a solution algorithm that
is O(1), problems with an algorithm that is O(n), ones with an O(n^2) algorithm, etc.
Naturalness We will give the class P a lot of attention because there are reasons
to think that it is the collection that best captures the notion of problems that have
a feasible solution.
†
We take a model to be ‘natural’ if it was not invented in order to be a counterexample to this. ‡ One
definition of ‘reasonable’ is “in principle physically realizable” (Bernstein and Vazirani 1997). § A Turing
machine with a random oracle.
The first reason echoes the prior subsection. There are many models of computation,
including Turing machines, RAM machines, and Racket programs. All of
them compute the same set of functions as Turing machines, by Church’s Thesis,
but they do so at different speeds. However, while the speeds differ, all these
models run within polytime of each other.† That makes P invariant under the
choice of computing model: if a problem is in P for any model then it is in P for all
of these models. The fact that Turing machines are our standard is in some ways a
historical accident, but differences between the runtime behavior of any of these
models is lost in the general polynomial sloppiness.
Another reason that P is a natural class is that we’d like that if two things, f
and g, are easy to compute then a simple combination of the two is also easy. More
precisely, fix total functions f, g : N → N and consider combinations such as f + g,
r · f, f − g, f · g, and f ∘ g, along with the associated languages of strings
str(⟨n, h(n)⟩) for each such combination h.
(Recall that str(...) means that we represent the argument reasonably efficiently
as a bitstring.) With that recasting, P is closed under function addition, scalar
multiplication by an integer, subtraction, multiplication, and composition. It is
also closed under language concatenation and the Kleene star operator. It is the
smallest nontrivial class with these appealing closure properties.
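The closure under language concatenation mentioned above has a short constructive proof: given polytime deciders for L0 and L1, try every split point of the input. This sketch (ours, in Python, with deciders modeled as plain predicates) makes |σ| + 1 pairs of calls to polytime routines, so the combined decider is still polytime.

```python
def concat_decider(in_L0, in_L1):
    """Combine deciders for L0 and L1 into a decider for the
    concatenation L0 ^ L1, by trying every split point.  With
    len(sigma) + 1 splits and polytime calls, the whole check
    remains polynomial in the input length."""
    def decide(sigma):
        return any(in_L0(sigma[:k]) and in_L1(sigma[k:])
                   for k in range(len(sigma) + 1))
    return decide

# Illustration: L0 = strings of all 0s, L1 = strings of all 1s,
# so the concatenation is the language of 0s followed by 1s.
in_zeros = lambda s: all(c == "0" for c in s)
in_ones = lambda s: all(c == "1" for c in s)
decide = concat_decider(in_zeros, in_ones)
```

The Kleene star closure in Exercise 4.37 uses the same idea, with dynamic programming over prefixes.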
But the main reason that P is our candidate is Cobham’s Thesis, the contention
that the formalization of ‘tractable problem’ should be that it has a solution
algorithm that runs in polynomial time. We discussed this on page 269; a person
may object that polytime is too broad a class to capture this idea because a problem
whose solution algorithm cannot be improved below a runtime of O(n^1000000) is
really not feasible or tractable. Further, using diagonalization we can produce such
problems. However, the problems produced in that way are artificial, and empirical
experience over close to a century of computing is that problems with solution
algorithms of very large degree polynomial time complexity do not seem to arise
in practice. We see plenty of problems with solution algorithms that are O(n lg n)
or O(n³), and we see plenty of problems that are exponential, but we just do not
see O(n^1000000).
Moreover, often in the past when a researcher has produced an algorithm for a
problem with a runtime that is even a moderately large degree polynomial, then,
with this foot in the door, over the next few years the community brings to bear an
array of mathematical and algorithmic techniques that bring the runtime degree
down to reasonable size.
Even if the objection to Cobham’s Thesis is right and P is too broad, it would
nonetheless still be useful because if we could show that a problem is not in P
then we would have shown that it has no solution algorithm that is practical.‡
† All of the non-quantum natural models. ‡ This argument has lost some of its force in recent years
with the rise of SAT solvers. These algorithms attack problems believed to not be in P, and can solve
instances of the problems of moderately large size, using only moderately large computing resources.
See Extra B.
(This is like in the first and second chapter where we considered Turing machine
computations that are unbounded. Showing that something is not solvable even
for an unbounded computation also shows that it is not solvable within bounds.)
So Cobham’s Thesis, to this point, has held up. Insofar as theory should be a
guide for practice, this is a compelling reason to use P as a benchmark for other
complexity classes.
V.4 Exercises
✓ 4.12 True or False: if the language is finite then the language decision problem is
in P.
✓ 4.13 Your coworker says something mistaken, “I’ve got a problem whose algorithm
is in P.” They are being a little sloppy with terms; how?
✓ 4.14 What is the difference between an order of growth and a complexity class?
✓ 4.15 Your friend says to you, “I think that the Circuit Evaluation problem takes
exponential time. There is a final vertex. It takes two inputs, which come from
two vertices, and each of those takes two inputs, etc., so that a five-deep circuit
can have thirty-two vertices.” Help them see where they are wrong.
4.16 In class, someone says to the professor, “Why aren’t all languages in P
according to your definition? I’ll design a Turing machine P so that no matter
what the input is, it outputs 1. It only needs one step, so it is polytime for sure.”
Perhaps you are not able rightly to apprehend the kind of confusion of ideas that
could provoke such a question, but give a calm explanation of how it is mistaken.
4.17 True or false: if a language is accepted by a machine then its complement is
also accepted by a machine.
✓ 4.18 Show that the decision problem for {σ ∈ B∗ | σ = τ³ for some τ ∈ B∗ } is
in P.
✓ 4.19 Show that the language of palindromes, {σ ∈ B∗ | σ = σ^R }, is in P.
4.34 Prove that P is closed under complement. That is, prove that if a language is
in P then so is its set complement.
4.35 Prove that the class of languages P is closed under reversal. That is, prove
that if a language is an element of P then so is the reversal of that language
(which is the language of string reversals).
4.36 Show that P is closed under the concatenation of two languages.
4.37 Show that P is closed under Kleene star, meaning that if L ∈ P then L∗ ∈ P.
(Hint: σ ∈ L∗ if σ = ε , or σ ∈ L, or σ = α ⌢ β for some α, β ∈ L∗ )
4.38 Show that this problem is unsolvable: given a Turing machine P, decide
whether it runs in polytime on the empty input. Hint: if you could solve this
problem then you could solve the Halting problem.
4.39 There are studied complexity classes besides those associated with language
decision problems. The class FP consists of the binary relations R ⊆ N² where
there is a Turing machine that, given input x ∈ N, can in polytime find a y ∈ N
where ⟨x, y⟩ ∈ R.
(a) Prove that this class is closed under function addition, multiplication by a
scalar r ∈ N, subtraction, multiplication, and function composition.
(b) Where f : N → N is computable, consider this decision problem associated
with the function, Lf = { str(⟨n, f(n)⟩) ∈ B∗ | n ∈ N } (where the numbers
are represented in binary). Assume that we have two functions f0, f1 : N → N
such that Lf0, Lf1 ∈ P. Show that the natural algorithm to check for closure
under function addition is pseudopolynomial.
4.40 Where L0, L1 ⊆ B∗ are languages, we say that L1 ≤p L0 if there is a total
computable function f : B∗ → B∗ that runs in polytime and is such that σ ∈ L1
if and only if f(σ) ∈ L0. Prove that if L0 ∈ P and L1 ≤p L0 then L1 ∈ P.
Section V.5 NP
[Figure: a nondeterministic Turing machine’s computation tree, branching from the
start configuration in state q0 through configurations in states q1, q2, q3.]
The natural approach is to compute a truth table and see whether the final column
has any T’s. Here, the formula is satisfiable because the TTF row ends in a T.
P Q R P ∨Q P ∨ ¬Q ¬P ∨ Q ¬P ∨ ¬Q ∨ ¬R Q ∨R (∗)
F F F F T T T F F
F F T F T T T T F
F T F T F T T T F
F T T T F T T T F
T F F T T F T F F
T F T T T F T T F
T T F T T T T T T
T T T T T T F T F
As to runtime, the number of table rows grows exponentially. Specifically, it is 2
raised to the number of input variables. So this approach is very slow on a serial
model of computation.
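The exponential blowup is easy to see concretely. This sketch (ours, in Python) brute-forces the example formula (∗) by enumerating all 2³ truth-table rows, exactly as the table does.

```python
from itertools import product

def phi(P, Q, R):
    """The example's formula (*): the conjunction of the five clauses
    in the truth table's middle columns."""
    return ((P or Q) and (P or not Q) and (not P or Q)
            and (not P or not Q or not R) and (Q or R))

# Enumerate all 2^3 rows; with n variables this loop runs 2^n times.
satisfying = [row for row in product([False, True], repeat=3) if phi(*row)]
```

On a serial machine the loop runs 2^n times for n variables; the nondeterministic view instead forks one child per row.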
Each line of the truth table is easy; the issue is that there are a lot of lines. So
this problem is perfectly suited for unbounded parallelism. For each line we could
fork a child process. These children are done quickly, certainly in polytime. If at
the end any child is holding a ‘T ’ then we declare that the expression is satisfiable.
In total then, a nondeterministic machine does this job in polytime while a serial
machine appears to require exponential time.
So while adding nondeterminism to Turing machines doesn’t allow them to
compute any entirely new functions, a person could sensibly conjecture that it does
allow them to compute some of those functions more quickly.
5.9 Lemma A language is in NP if and only if it has a polytime verifier. That is,
L ∈ NP exactly when there is a deterministic Turing machine V that halts on all
inputs and so that σ ∈ L if and only if there is a witness ω where V accepts ⟨σ , ω⟩ ,
in time polynomial in |σ | .
So to show that a language L is in NP, we will produce a verifier V. It takes as
input a pair containing a candidate for language membership σ and a hint ω. If
σ ∈ L then the verifier will confirm it, using the hint, in polytime. If σ ∉ L then
no hint will cause the verifier to falsely report that σ is in the language.
The lemma’s proof is below, on page 311. Before that, we will clarify some
aspects of the definition and lemma with a few examples and comments.
5.10 Example The Satisfiability problem is to decide membership in this language.
SAT = {σ | σ represents a propositional logic formula that is satisfiable }
† Very important: no one knows whether P is a strict subset, that is, whether P ≠ NP or P = NP. This is
the biggest open problem in the Theory of Computing. We will say more in a later section.
[Flowchart: the verifier V reads σ and ω, evaluates the formula σ on the
truth-table line given by ω, and Accepts if it gives T, else Rejects.]
Given a satisfiable candidate σ , such as the one from (∗), for this verifier there is a
suitable witness, namely ω = TTF, so that V can check in polytime that on that
line, the candidate evaluates to T . But, if given a candidate that is not satisfiable,
such as σ = P ∧ ¬P , then there does not exist a hint that will cause V to accept,
because no line that is pointed-to will give T .
Before the next example, a few points. First, the most striking thing about
Definition 5.8 is that it says that “there exists” a witness ω. It does not say where
that witness comes from. A person with a computational habit of thinking may
well ask, “but how will we find the ω’s?” The question is not how to find them, the
question is whether there is a Turing machine that leverages the hint ω’s to verify
that the σ’s are in L. In short, we don’t compute the ω’s, we just use them.
The second point is that if σ ∈ L then the definition requires that there exists a
witness ω. But if σ ∉ L then the definition does not require a witness to that.
Instead, the opposite is true: the definition requires that from among all possible
witnesses ω ∈ B∗, there is none such that the verifier accepts ⟨σ, ω⟩.†
One consequence of this asymmetry in the verifier definition is that if a problem L
is in NP then it is not clear whether its complement, Lc = B∗ − L, is in NP. Consider
again the Satisfiability problem. If a propositional logic expression σ is satisfiable
then a suitable witness for that is the single line of the truth table. But there is no
obvious suitable witness to non-satisfiability; the natural thing to do is to check all
lines, which appears to take more than polytime. In short, where the complexity
class co-NP is the collection of complements of problems from NP, we don’t know
whether NP = co-NP.
Third, a comment on the runtime of the verifier. Consider the problem of chess.
Imagine that a demon hands you some papers and tells you that they contain a
chess strategy where you cannot lose. Checking that by having a computer step
through the responses to each move and responses to those responses, etc., at least
appears to take exponential time. So it appears that this perfect strategy is, in a
sense, useless. The definition requires that the verifier runs in polytime in order to
make the verification tractable.
Our final point is related to that. Observe that because the verifier runs in time
polynomial in |σ | , the witness ω must have length that is polynomial in |σ | . If the
† Because of this, perhaps we should refer to ω with terms like ‘potential witness’ or ‘candidate
certificate’, but no one does.
witness were too long then the machine would not even be able to read it in the
allotted time.† So if we think of it as that ω gives proof that σ ∈ L and that the
verifier checks that proof, then the definition requires that the proof is tractable.
5.11 Example The Hamiltonian Path problem is like the Hamiltonian Circuit problem
except that, instead of requiring that the starting vertex equal the ending one, it
inputs two vertices.
{ ⟨G, v, v̂⟩ | some path in G between v and v̂ visits every vertex exactly once }
We will show that this problem is in the class NP. Lemma 5.9 requires that we
produce a deterministic Turing machine verifier V . Here is our verifier, which takes
inputs ⟨σ , ω⟩ , where the candidate is σ = ⟨G , v, v̂⟩ . We take each witness to be a
path, ω = ⟨v, v 1 , ... v̂⟩ .
[Flowchart: the verifier V reads σ and ω and checks whether the path ω visits all
of G’s vertices exactly once; if so it Accepts, else it Rejects.]
If there is a Hamiltonian path then there is a witness ω so that V will accept its
input. Clearly that acceptance takes polytime. If the graph has no Hamiltonian
path then for no ω will V be able to verify the hint and accept its input.
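The verifier's check can be made concrete. This is our own sketch in Python, with the graph as an adjacency map and the witness as a list of vertices; every step is a constant-time lookup, so the whole check is polytime in the input size.

```python
def verify_ham_path(graph, v, v_hat, omega):
    """Deterministic polytime verifier for Hamiltonian Path.
    `graph` maps each vertex to its set of neighbors; the witness
    `omega` is a claimed path from v to v_hat."""
    if len(omega) != len(graph) or set(omega) != set(graph):
        return False          # does not visit every vertex exactly once
    if omega[0] != v or omega[-1] != v_hat:
        return False          # wrong endpoints
    # Every consecutive pair of path vertices must be joined by an edge.
    return all(b in graph[a] for a, b in zip(omega, omega[1:]))
```

Note the asymmetry: a good witness makes acceptance fast, but the verifier never searches for a path itself.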
5.12 Example The Composite problem asks whether a number has a nontrivial factor.
L = {n ∈ N+ | n has a divisor a with 1 < a < n }
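For this problem the natural witness is a claimed nontrivial divisor, and the verifier's work is a single trial division, which is polytime in the number of bits of n. A minimal sketch of such a verifier, in Python:

```python
def verify_composite(n, a):
    """Verifier for the Composite problem: the witness is a claimed
    nontrivial divisor a.  One divisibility test is polytime in the
    number of bits of n."""
    return 1 < a < n and n % a == 0
```

If n is prime then no witness exists, so the verifier never wrongly accepts.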
a branch of a computation tree with a string of numbers less than k . With that
witness, a deterministic verifier can retrace P ’s accepting branch. The branch’s
length must be less than p(|σ |), so the verifier V can do the retracing in polynomial
time.
Conversely, suppose that the language L is accepted by a verifier V that runs in
time bounded by a polynomial q , and that takes input ⟨σ , ω⟩ . We will construct a
nondeterministic Turing machine P that accepts an input bitstring τ if and only if
τ ∈ L.
The key is that this machine is allowed to be nondeterministic. Given a
candidate bitstring τ, (1) P nondeterministically produces a witness bitstring κ of
length less than q(|τ|) (informally speaking, it guesses κ, or gets it from a demon),
(2) it then runs ⟨τ, κ⟩ through the verifier V, and (3) if the verifier accepts its input
then P accepts τ, while if the verifier does not accept then P also does not accept.
By definition, the nondeterministic machine P accepts the string if there is a
branch that accepts the string, and P rejects the string if every branch rejects it.
Suppose first that τ ∈ L. Because V is a verifier, in this case there exists a witness κ
that will result in V accepting ⟨τ, κ⟩, so there is a way for the prior paragraph
to result in acceptance of τ, and so P accepts τ. Now suppose that τ ∉ L. By
the definition of a verifier, no hint κ will result in V accepting ⟨τ, κ⟩, and thus P
rejects τ.
A common reaction to the second half of that proof is something like, “Wait,
the machine pulls the witness κ out of thin air? How is that possibly legal?”
This reaction — about nondeterministic Turing machines and everyday experience
versus abstraction — is common and very reasonable, so we will address it.
As to everyday reality, we today know of no way to build physical devices that
bear the same relationship to nondeterministic Turing machines that ordinary
computers bear to deterministic ones. (Of course, you can write a program to
simulate nondeterministic behavior, at a cost in efficiency, but no device does it
natively.) When Turing formulated his definition there were no practical physical
computers matching it, but they were clearly coming and appeared soon after;
will we someday have nondeterministic computers? Putting aside proposals that
involve things like time travel through wormholes as too exotic, we will address
the devices that seem most likely to be coming, quantum computers.
Well-established physical theory says that subatomic particles can be in a
superposition of many states at once. Naively, it might seem that because of
this multi-way branching, if we could manipulate these then we would have
nondeterministic computation. But, as far as we know, this is false: to get
information out of a quantum computer we must use interference, and we cannot
read individual particles.†
However, the fact that we do not have practical nondeterministic devices, and
do not believe that we will in the near future, does not mean that their study is a
† Some popularizations wrongly suggest that quantum computers are nondeterministic. That is, they
miss the point about interference.
V.5 Exercises
✓ 5.13 Your study partner asks, “In Lemma 5.9, since the witness ω is not required
to be effectively computable, why can’t I just take it to be the bit 1 if σ ∈ L, and 0
if not? Then writing the verifier is easy: just ignore σ and follow the bit.” They
are confused. Straighten them out.
✓ 5.14 Decide if each formula is satisfiable.
(a) (P ∧ Q) ∨ (¬Q ∧ R)
(b) (P → Q) ∧ ¬((P ∧ Q) ∨ ¬P)
5.15 True or false? If a language is in P then it is in NP.
5.16 Uh-oh. You find yourself with a nondeterministic Turing machine where on
input σ, one branch of the computation tree accepts and one rejects. Some
branches don’t halt at all. What is the upshot?
✓ 5.17 You get an exercise, Write a nondeterministic algorithm that inputs a maze
and outputs 1 if there is a path from the start to the end.
(a) You hand in an algorithm that does backtracking to find any possible solution.
Your professor sends it back, and says to try again. What was wrong?
(b) You hand in an algorithm that, each time it comes to a fork in the maze,
chooses at random which way to go. Again you get it back with a note to
work out another try. What is wrong with this one?
(c) Give a right answer.
5.18 Sketch a nondeterministic algorithm to search an unordered array of numbers,
to see if it contains the number k . Describe it both in terms of unbounded
parallelism and in terms of guessing.
† Not that there is anything wrong with that.
5.19 A simple substitution cipher encrypts text by substituting one letter for
another. Start by fixing a permutation of the letters, for example ⟨F, P, ...⟩. Then
the cipher is that any A is replaced by an F, any B is replaced by a P, etc. Sketch three
algorithms for decoding a substitution cipher (assume that you have a program that
can recognize a correctly decoded string): (a) one that is deterministic, (b) one
that is nondeterministic and is expressed in terms of unbounded parallelism, and
(c) one expressed in terms of guessing.
✓ 5.20 Outline a nondeterministic algorithm that inputs a finite planar graph and
outputs Yes if and only if the graph has a four-coloring (that is, the algorithm
recognizes a correct four-coloring). Describe it both in terms of unbounded
parallelism and in terms of a demon providing a witness.
5.21 The Integer Linear Programming problem is to maximize a linear objective
function f(x0, ... xn) = d0·x0 + · · · + dn·xn subject to constraints ai,0·x0 + · · · +
ai,n·xn ≤ bi, where all of xj, dj, bj, ai,j are integers. Recast it as a family of
language decision problems. Sketch a nondeterministic algorithm, giving both an
unbounded parallelism formulation and a guessing formulation.
✓ 5.22 The Semiprime problem inputs a number n ∈ N and decides if its prime
factorization has exactly two primes, n = p0^e0 · p1^e1 where ei > 0. State it as
a language decision problem. Sketch a nondeterministic algorithm that runs
in polytime. Give both an unbounded parallelism formulation and a guessing
formulation.
5.23 For each, give a language so that it is a language decision problem. Then
give a polytime nondeterministic algorithm. State it in terms of guessing.
(a) Three Dimensional Matching: where X , Y , Z are sets of integers having n
elements, given as input a set of triples M ⊆ X × Y × Z , decide if there
is an n -element subset M̂ ⊆ M so that no two triples agree on their first
coordinates, or second, or third.
(b) Partition: given a finite multiset A of natural numbers, decide if A splits into
multisets Â and A − Â so that the elements total to the same number,
Σa∈Â a = Σa∉Â a.
5.24 Sketch a nondeterministic algorithm that inputs a planar graph and a bound
B ∈ N and decides whether the graph is B -colorable. Describe it in terms of
unbounded parallelism and also in terms of the machine guessing.
✓ 5.25 For each problem, cast it as a language decision problem and then prove that
it is in NP by filling in the blanks in this argument.
Lemma 5.9 requires that we produce a deterministic Turing machine verifier, V. It must
input pairs of the form ⟨σ, ω⟩, where σ is (1) . It must have the property that if
σ ∈ L then there is an ω such that V accepts the input, while if σ ∉ L then there is no
such witness ω. And it must run in time polynomial in |σ|.
The verifier interprets the bitstring witness ω as (2) , and checks that (3) .
Clearly that check can be done in polytime.
If σ ∈ L then by definition there is (4) , and so a witness ω exists that will cause V
to accept the input pair ⟨σ, ω⟩. If σ ∉ L then there is no such (5) , and therefore no
witness ω will cause V to accept the input pair.
(a) The Double-SAT problem inputs a propositional logic statement and decides
whether it has at least two different substitutions of Boolean values that
make it true.
(b) The Subset Sum problem inputs a set of numbers S ⊂ N and a target
sum T ∈ N, and decides whether at least one subset of S adds to T.
✓ 5.26 In the British TV game show Countdown, which is quite popular, players
are given six numbers from S = { 1, 2, ... 10, 25, 50, 75, 100 } (numbers may be
repeated), along with a target integer T ∈ [100 .. 999]. They must construct an
arithmetic expression that evaluates to the target, using the given numbers at
most once. The expression can involve addition, subtraction, multiplication, and
division (the division must have no remainder). Show that the decision problem
for this language
✓ 5.34 Show that this problem is in NP. A company has two delivery trucks.
They work with a weighted graph that is called the road map. (Some vertex is
distinguished as the start/finish.) Each morning the company gets a set of vertices,
V. They must decide if there are two cycles such that every vertex in V is on at
least one of the two cycles, and both cycles have length at most B ∈ N.
✓ 5.35 Two graphs G0 , G1 are isomorphic if there is a one-to-one and onto function
f : N0 → N1 such that {v, v̂ } is an edge of G0 if and only if { f (v), f (v̂) } is
an edge of G1 . Consider the problem of computing whether two graphs are
isomorphic.
(a) Define the appropriate language.
(b) Show that the problem of determining membership in that language is a
member of the class NP.
5.36 The proof of Lemma 5.4 leaves two things undone.
(a) (König’s Lemma) Prove that if a connected tree has infinitely many vertices,
but each vertex has finite degree, then the tree has an infinite path. Hint: fix
a vertex v 0 and for each of its neighbors, look at how many vertices can be
reached without going through v 0 . One of the neighbors must have infinitely
many such vertices; call it v 1 . Iterate.
(b) Prove that if a nondeterministic Turing machine recognizes a language then
there is a deterministic machine that also recognizes it.
5.37 Following the definition of Turing machine, on page 8, we gave a formal
description of how these machines act. We did the same for Finite State machines
on page 184, and for nondeterministic Finite State machines on page 192. Give a
formal description of the action of a nondeterministic Turing machine.
5.38 (a) Show that the Halting problem is not in NP.
(b) What is wrong with this reasoning? The Halting problem is in NP because
given ⟨P , x⟩ , we can take as the witness ω a number of steps for P to halt on
input x . If it halts in that number of steps then the verifier accepts, and if
not then the verifier rejects.
Section V.6 Polytime reduction
When we studied incomputability we found a sense in which we could think of
some problems as harder than others. Consider the halts_on_three_checker
routine that inputs a Turing machine P and decides whether P halts on input 3. We
showed that with such a program we could solve the Halting problem. We denoted
this with K ≤T halts_on_three_checker.
Formally, we write B ≤T A when there is a computable function f such that
x ∈ B if and only if f (x) ∈ A. We say that B is Turing-reducible to A, because to
solve B , it suffices to solve A.
6.2 Example Recall the Shortest Path problem that inputs a weighted graph, two
vertices, and a bound, and decides if there is a path between the vertices of length
less than the bound.
L0 = { ⟨G, v0, v1, B⟩ | there is a path between the vertices of length less than B }
Recall also the Vertex-to-Vertex Path problem that inputs a graph and two vertices,
and decides if there is a path between the two.
L1 = { ⟨H, w0, w1⟩ | there is a path between the vertices }
Proof The first two sentences, and downward closure of NP, are in Exercise 6.23.
6.4 Figure: The bean encloses all problems (we only show a few problems for graphical
clarity). Problems are shown connected if there is a polynomial time reduction from
one to the other. Highlighted are connections within the complexity class P.
6.5 Example We will show that Subset Sum ≤p Knapsack. Recall that the Subset Sum
problem starts with a multiset S = {s0, ... sk−1} ⊂ N (a set in which repeated
numbers are allowed; basically a list of numbers) and a target T ∈ N+. It asks if
there is a subset whose elements add to the target.
L0 = { ⟨S, T⟩ | some subset of S adds to T }
The Knapsack problem starts with a multiset of objects K = {k0, ... kn−1}, along
with a bound W ∈ N and a target V ∈ N. There are also two functions, w, v : K → N+,
giving each ki a weight w(ki) and a value v(ki). The problem is to decide if there
is a subset A ⊆ K such that the sum of the element weights is less than or equal
to W while the sum of the element values is greater than or equal to V.
L1 = { ⟨K, w, v, W, V⟩ | some A ⊆ K has Σa∈A w(a) ≤ W and Σa∈A v(a) ≥ V }
A reduction function f must input pairs ⟨S,T ⟩ , must output 5-tuples ⟨K, w, v,W , V ⟩ ,
must run in polytime, and must be such that ⟨S,T ⟩ ∈ L0 holds if and only if
⟨K, w, v,W , V ⟩ ∈ L1 holds.
As an illustration, suppose that we want to know if there is a subset of
S = { 18, 23, 31, 33, 72, 86, 94 } that adds to T = 126, and we have access to an
oracle that can quickly solve any Knapsack problem. We could let K equal S , let
w and v be such that w(18) = v(18) = 18, w(23) = v(23) = 23, etc., and set the
weight and value targets W and V to be T = 126.
In general, given ⟨S,T ⟩ , take f (⟨S,T ⟩) = ⟨S, w, v,T ,T ⟩ , where the functions are
given by w(si ) = v(si ) = si . Then clearly ⟨S,T ⟩ ∈ L0 if and only if f (⟨S,T ⟩) ∈ L1 ,
and clearly f is polytime.
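Example 6.5's reduction is short enough to write out. Below is our sketch in Python, together with naive exponential-time deciders used only to check, on a tiny instance, that the reduction preserves membership; for simplicity it assumes the multiset's elements are distinct.

```python
from itertools import combinations

def reduce_subset_sum_to_knapsack(S, T):
    """Example 6.5's reduction f: each number is its own weight and
    value, and both targets are T (elements assumed distinct here)."""
    w = {s: s for s in S}
    v = dict(w)
    return (list(S), w, v, T, T)

# Naive brute-force deciders, only for checking on tiny instances.
def subset_sum(S, T):
    return any(sum(c) == T
               for r in range(len(S) + 1) for c in combinations(S, r))

def knapsack(K, w, v, W, V):
    return any(sum(w[a] for a in c) <= W and sum(v[a] for a in c) >= V
               for r in range(len(K) + 1) for c in combinations(K, r))
```

Because w = v, requiring weight ≤ T and value ≥ T forces the chosen subset to sum to exactly T, which is why the two memberships coincide.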
The prior two examples show one kind of natural reduction, when one problem
is a special case of another, or at least closely related.
In addition, the prior example suggests that where the transformation from one
problem set to another is concerned, the details can hide the ideas. Often authors
will suppress the details and instead outline the transformation. We will do the
same here.
6.6 Example We will sketch an argument that the Graph Colorability problem reduces
to the Satisfiability problem, that Graph Colorability ≤p Satisfiability.
Recall that a graph is k-colorable if we can partition the vertices into k many
classes, called ‘colors’ because that’s how they are pictured, so that there is no edge
between two same-colored vertices.
For each vertex vi, introduce three variables ai, bi, and ci, one per color, and
create a clause asserting that vi gets at least one color.
(ai ∨ bi ∨ ci)
In addition, for each edge {vi, vj}, create three clauses that together ensure that
the edge does not connect two same-color vertices.
The graph has four vertices, so the expression starts with four clauses, saying that
for each vertex vi at least one of the associated variables ai , bi , or c i is T . The
graph has four edges, v 0v 1 , v 0v 3 , v 1v 2 , and v 2v 3 . The expression continues with
three clauses for each edge, together ensuring that the variables associated with
the edge’s vertices do not both have the value T . Thus, E is satisfiable if and only if
G has a 3-coloring.
Completing the proof means checking that the translation function, which
inputs a bitstring representation of G and outputs a bitstring representation of the
expression E, is polynomial. That’s clear, although the argument is messy so we omit it.
Echoing what we said above, the significance of the reduction is that we now
know that if we could solve the Satisfiability problem in polynomial time then we
could solve the Graph Colorability problem in polynomial time.
So in this sense, the Satisfiability problem is at least as hard as Graph Colorability.
This section’s final example gives a problem that is at least as hard as Satisfiability.
6.8 Example Recall that the Clique problem is the decision problem for the lan-
guage L = { ⟨G, B⟩ | G has a clique of at least B vertices }, where a clique is a set
of vertices that are all mutually connected. We will sketch the argument that
Satisfiability ≤p Clique.
The reduction f inputs a propositional logic expression E and outputs a pair
f (E) = ⟨G , B⟩ . It must run in polytime, and must be such that E ∈ SAT if and only
if f (E) ∈ L.
Consider this expression.
E = (x 0 ∨ x 1 ) ∧ (¬x 0 ∨ ¬x 1 ) ∧ (x 0 ∨ ¬x 1 )
[Figure: the graph f(E), with a vertex vi,j for the j-th literal of clause i:
v0,0, v0,1, v1,0, v1,1, v2,0, v2,1.]
Observe that E is satisfiable if and only if the graph has a 3-clique. Showing that
the translation function f is polytime is routine.
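Example 6.8's construction can also be sketched in code. In this illustration (ours; the (name, polarity) literal encoding is an assumption) we create one vertex per literal occurrence and join vertices from different clauses unless their literals are complementary, so E with k clauses is satisfiable exactly when the graph has a k-clique.

```python
from itertools import combinations

def sat_to_clique(clauses):
    """Sketch of Example 6.8's reduction: vertex (i, j) is the j-th
    literal of clause i; edges join vertices in different clauses
    whose literals do not clash (same name, opposite polarity)."""
    vertices = [(i, j) for i, cl in enumerate(clauses)
                for j in range(len(cl))]
    edges = set()
    for (i, j), (k, l) in combinations(vertices, 2):
        x, y = clauses[i][j], clauses[k][l]
        if i != k and not (x[0] == y[0] and x[1] != y[1]):
            edges.add(frozenset([(i, j), (k, l)]))
    return vertices, edges, len(clauses)   # clique bound B = #clauses

def has_clique(vertices, edges, B):
    # Brute force, only to illustrate the equivalence on tiny inputs.
    return any(all(frozenset(p) in edges for p in combinations(c, 2))
               for c in combinations(vertices, B))

# E = (x0 or x1) and (not x0 or not x1) and (x0 or not x1)
E = [[("x0", True), ("x1", True)],
     [("x0", False), ("x1", False)],
     [("x0", True), ("x1", False)]]
```

A clique of size k must take one non-clashing literal from each clause, which is exactly a partial assignment satisfying every clause.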
Those examples give some sense of why the Satisfiability problem can be
convenient, a benchmark problem for reducibility. Often it is natural to describe
the conditions in a problem with logical statements. In the next section we will
give a theorem saying that Satisfiability is at least as hard as every problem in NP.
We close with a comment on ≤p. The definition of L1 ≤p L0 is that σ ∈ L1 if
and only if f(σ) ∈ L0 for some polytime computable f. So f takes the input σ
and does a computation, and at the end asks the L0 oracle the single question of
the membership of f(σ). Other reductions are possible, for instance one that can
consult the L0 oracle any finite number of times, called Cook reducibility.
V.6 Exercises
6.9 Your friend is confused. “Lemma 6.3 says that every language in P is ≤p
to every other language. But there are uncountably many languages and only
countably many f ’s because they each come from some Turing machine. So I’m
not seeing how there are enough reduction functions for a given language to
be related to all others.” Straighten them out. Hint: the definition of reduction
function is asymmetric. Such a function must do the right thing for every input,
but it need not be onto and so it may leave elements of the codomain untouched,
that is, free to vary.
✓ 6.10 Show that if L0 ∉ P and L0 ≤p L1 then L1 ∉ P also. What about NP?
6.11 Prove that L ≤p Lc if and only if Lc ≤p L.
6.12 Example 6.5 includes as illustration a Subset Sum problem, where S =
{ 18, 23, 31, 33, 72, 86, 94 } and T = 126. Solve it.
✓ 6.13 Suppose that the language A is polynomial time reducible to the language B ,
A ≤p B . Which of these are true?
(a) A tractable way to decide A can be used to tractably decide B .
(b) If A is tractably decidable then B is tractably decidable also.
(c) If A is not tractably decidable then B is not tractably decidable too.
✓ 6.14 Fix an alphabet Σ. The Substring problem inputs two strings and decides if
the second is a substring of the first. The Cyclic Shift problem inputs two strings
and decides if the second is a cyclic shift of the first. (Where α = a 0a 1 ... an−1
and β = b0b1 ... bn−1 are length n strings, β is a cyclic shift of α if there is an
index k ∈ [0 .. n − 1] such that ai = b(k +i) mod n for all i .)
[Figure: a graph with vertices v0, ..., v4 in a top row and v5, ..., v9 in a bottom row.]
(c) Make a set S consisting of all of that graph’s edges, and for each v make a
subset Sv of the edges incident on that vertex. Find a set cover.
(d) Show that Vertex Cover ≤p Set Cover.
✓ 6.18 In this network, each edge is labeled with a capacity. (Imagine railroad lines
going from q 0 to q 6 .)
[Figure: a capacitated network from source q0 to sink q6, through intermediate
vertices q1, ..., q5, with edge capacities between 1 and 4.]
The Max-Flow problem is to find the maximum amount that can flow from left to
right. That is, we will find a flow Fqi ,q j for each edge, subject to the constraints
that the flow through an edge must not exceed its capacity and that the flow
into a vertex must equal the flow out (except for the source q 0 and the sink q 6 ).
The problem is to find the edge flows so that the source and sink see maximal
total flow. The Linear Programming optimization problem starts with a list of
linear equalities and inequalities, such as ai,0·x0 + · · · + ai,n−1·xn−1 ≤ bi for
ai,0, ... ai,n−1, bi ∈ Q, and it looks for a sequence ⟨s0, ... sn−1⟩ ∈ Q^n that satisfies
all of the constraints, and such that a linear expression c0·x0 + · · · + cn−1·xn−1 is
maximal.
(a) Express each as a language decision problem, remembering the technique of
converting optimization problems using bounds.
(b) By eye, find the maximum flow for the above network.
(c) For each edge vi v j , define a variable x i, j . Describe the constraints on that
variable imposed by the edge’s capacity. Also describe the constraints on the
set of variables imposed by the limitation that for many vertices the flow in
must equal the flow out. Finally, use the variables to give an expression to
optimize in order to get maximum flow.
(d) Show that Max-Flow ≤p Linear Programming.
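A classic way to compute Max-Flow is the Edmonds-Karp augmenting-path algorithm: repeatedly find a shortest source-to-sink path with spare capacity and push flow along it. A Python sketch; the capacities in the usage example are illustrative only, not necessarily those of the network above.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths.
    capacity maps edge (u, v) to its capacity."""
    res = dict(capacity)                       # residual capacities
    nodes = {u for u, v in capacity} | {v for u, v in capacity}
    for (u, v) in capacity:
        res.setdefault((v, u), 0)              # reverse edges start empty
    flow = 0
    while True:
        parent = {source: None}                # BFS for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and res.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path = []                              # recover the path, find bottleneck
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u, v] for u, v in path)
        for u, v in path:                      # push flow, update residuals
            res[u, v] -= bottleneck
            res[v, u] += bottleneck
        flow += bottleneck

# Hypothetical small network, for illustration:
cap = {("q0", "q1"): 3, ("q0", "q2"): 2, ("q1", "q3"): 2,
       ("q2", "q3"): 2, ("q3", "q6"): 4}
```

On that hypothetical network, `max_flow(cap, "q0", "q6")` pushes two units through q1 and two through q2.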
6.19 The Max-Flow problem inputs a directed graph where each edge is labeled
with a capacity, and the task is to find the maximum amount that can flow from
the source node to the sink node (for more, see Exercise 6.18). The Drummer
problem starts with two same-sized sets, the rock bands, B , and potential
drummers, D . Each band b ∈ B has a set Sb ⊆ D of drummers that they would
agree to take on. The goal is to make the greatest number of matches.
(a) Consider four bands B = {b 0 , b 1 , b 2 , b 3 } and drummers D = {d 0 , d 1 , d 2 , d 3 } .
Band b0 likes drummers d 0 and d 2 . Band b1 likes only drummer d 1 , and b2
also likes only d 1 . Band b3 likes the sound of both d 2 and d 3 . What is the
largest number of matches?
(b) Express each as a language decision problem.
(c) Draw a graph with the bands on the left and the drummers on the right.
Make an arrow from a band to a drummer if there is a connection. Now add
a source and a sink node to make a flow diagram.
(d) Show that Drummer ≤p Max-Flow.
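For part (a)'s instance, an augmenting-path search settles the answer directly. A Python sketch (the dictionary encoding of the bands' preferences is ours):

```python
def max_matching(likes):
    """Augmenting-path bipartite matching.
    likes: dict mapping each band to the set of drummers it accepts."""
    match = {}  # drummer -> band currently holding that drummer

    def try_assign(band, seen):
        # Try each acceptable drummer; evict and reassign if needed.
        for d in likes[band]:
            if d not in seen:
                seen.add(d)
                if d not in match or try_assign(match[d], seen):
                    match[d] = band
                    return True
        return False

    return sum(try_assign(b, set()) for b in likes)

# The instance from part (a):
likes = {"b0": {"d0", "d2"}, "b1": {"d1"},
         "b2": {"d1"}, "b3": {"d2", "d3"}}
```

Since b1 and b2 both accept only d1, at most one of them can be matched.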
6.20 In a propositional logic expression, a single variable is an atom, and an
atom or its negation is a literal. We shall say that a clause is a disjunction of
literals, so that P 0 ∨ ¬P 1 ∨ ¬P 2 is a 3-literal clause. Note that a clause evaluates
to T if and only if at least one of the literals evaluates to T . A propositional
[graph on vertices q0, ..., q5]
✓ 6.22 We can do reductions between problems of types other than language decision
problems. Here are two optimization problems. The Assignment problem inputs
two same-sized sets, of workers W = {w 0 , ... w n−1 } and tasks T = {t 0 , ... tn−1 }.
For each worker-task pair there is a cost C(w i , t j ). The goal is to assign each of
the tasks, one per worker, at minimal total cost. The Traveling Salesman problem,
of course, inputs a graph whose edge weights give a cost for traversing that edge,
and asks for a circuit of minimal total cost.
(a) By eye, solve this Assignment problem instance.
Cost C(ti , w j )   w0   w1   w2   w3
t0                  13    4    7    6
t1                   1   11    5    4
t2                   6    7    2    8
t3                   1    3    5    9
(b) Consider this bipartite graph.
[bipartite graph: vertices w0 , w1 , w2 , w3 on top and t0 , t1 , t2 , t3 on the bottom]
[grid of labels L0 , ... , L3 with entries (i, j) for 0 ≤ i, j ≤ 3]
Section V.7 NP completeness
Because P ⊆ NP, the class NP contains lots of easy problems, ones with
a fast algorithm. For instance, one member of NP is the problem of
determining whether a number is odd. Nonetheless, the interest in the
class is that it also contains lots of problems that seem to be hard. Can
we prove that these problems are indeed hard?
This question was raised by S Cook in 1971. He noted that with
polynomial time reducibility we have a way to make precise that an
efficient solution for one problem implies an efficient solution for the
other. And, he showed that among the problems in NP, there are ones
that are maximally hard.†
7.1 Theorem (Cook-Levin theorem) The Satisfiability problem is in NP,
and has the property that any problem in NP reduces to it: L ≤p SAT for
any L ∈ NP.
First, SAT ∈ NP because a nondeterministic machine can guess which line
of the truth table to verify. Said another way: given a Boolean formula, use as
a witness ω a sequence giving an assignment of truth values that satisfies the
formula.
We will not step through the proof here, but here is the basic idea.
We are given L ∈ NP and must show that L ≤p SAT . For this, we must
produce a function f L that translates membership questions for L into
Boolean formulas, such that the membership answer is ‘yes’ if and only
if the formula is satisfiable.
The only thing that we know about L is that its member σ ’s are
accepted by a nondeterministic machine P in time given by a polyno-
mial q . So the proof constructs, from ⟨P , σ , q⟩ , a Boolean formula that
yields T if and only if P accepts σ . The Boolean formula encodes the
constraints under which a Turing machine operates, such as that the
only tape symbol that can be changed in one step is the symbol under
the machine’s Read/Write head.
[photo: Leonid Levin, b. 1948]
7.2 Definition A problem L is NP complete if it is a member of NP and any
member L̂ of NP is polynomial time reducible to it, L̂ ≤p L.
[diagram: problems arranged from Fast at the bottom to Slow at the top]
7.4 Figure: The bean contains all problems. The subset in the bottom right is NP, drawn
with P as a proper subset (although, strictly speaking, we don’t know that is true).
The top right, shaded, has the NP-hard problems. The highlighted intersection is the
set of NP complete problems.
ends is a member of C .
Clique Given a graph and a bound B ∈ N, decide if the graph has a B -clique, a
set of B -many vertices such that any two are connected.
Hamiltonian Circuit Given a graph, decide if it contains a Hamiltonian circuit, a
cyclic path that includes each vertex.
Partition Given a finite multiset S , decide if there is a division of the set into
two parts Ŝ and S − Ŝ so that the totals of the elements in the two parts are the
same, ∑_{s ∈ Ŝ} s = ∑_{s ∉ Ŝ} s .
We will not show here that these are all NP complete; for that, see (Garey and
Johnson 1979).
7.7 Example The Traveling Salesman problem is NP complete. We can prove this by
showing that the Hamiltonian Circuit problem reduces to it: Hamiltonian Circuit ≤p
Traveling Salesman. Recall that we have recast Traveling Salesman as the decision
problem for the language of pairs ⟨G , B⟩ , where B is a parameter bound. Recall
also that this problem is a member of NP.
The reduction function inputs an instance of Hamiltonian Circuit, a graph Ĝ =
⟨N̂ , Ê ⟩ whose edges are unweighted. It returns the instance of Traveling Salesman
that uses the vertex set N̂ as cities, that takes the distances between the cities to
be d(vi , v j ) = 1 if vi v j ∈ Ê and d(vi , v j ) = 2 if vi v j ∉ Ê , and such that the bound
is the number of vertices, B = | N̂ | .
This bound means that there will be a Traveling Salesman solution if and only if
there is a Hamiltonian Circuit solution, namely the salesman uses the edges of the
Hamiltonian circuit. All that remains is to argue that the reduction function runs
in polytime. The number of edges in a graph is at most the square of the number of
vertices, so polytime in the input graph size is the same as polytime in the number
of vertices. The reduction function’s algorithm examines all pairs of vertices, which
takes time that is quadratic in the number of vertices.
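The reduction function of this example can be written out directly; a Python sketch (the encoding of graphs as vertex lists and edge pairs is ours):

```python
def ham_to_tsp(vertices, edges):
    """Reduction of Example 7.7: distance 1 between cities joined by a
    graph edge, 2 otherwise, with bound B equal to the vertex count."""
    edgeset = {frozenset(e) for e in edges}
    dist = {}
    for u in vertices:
        for v in vertices:
            if u != v:
                dist[u, v] = 1 if frozenset((u, v)) in edgeset else 2
    return dist, len(vertices)
```

A circuit of total cost at most B exists exactly when it uses only cost-1 edges, that is, exactly when the input graph has a Hamiltonian circuit.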
A common strategy to show that a given problem is NP complete using the List
of Basic NP Complete Problems is to show that a special case of it is on the list.
7.8 Example The Knapsack problem starts with a multiset of objects S = {s 0 , ... sk −1 },
where each element has a weight w(si ) ∈ N+ and a value v(si ) ∈ N+, and where
there are two overall criteria: a weight bound B ∈ N+ and a value target T ∈ N+.
The problem is to find a knapsack C ⊆ S whose elements have total weight less
than or equal to the bound, and total value greater than or equal to the target.
Observe first that this is an NP problem. As the witness we can use the k -bit
string ω such that ω[i] = 1 if si is in the knapsack C , and ω[i] = 0 if it is not. A
deterministic machine can verify this witness in polynomial time since it only has
to total the weights and values of the elements of C .
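The verification step can be made concrete; a Python sketch of the witness check (the encoding of items as weight-value pairs is ours):

```python
def verify_knapsack(items, weight_bound, value_target, witness):
    """Polytime check of a witness bit string. items is a list of
    (weight, value) pairs; witness[i] == '1' puts item i in C."""
    chosen = [items[i] for i in range(len(items)) if witness[i] == "1"]
    total_w = sum(w for w, v in chosen)
    total_v = sum(v for w, v in chosen)
    return total_w <= weight_bound and total_v >= value_target
```

The check is a single pass over the items, so it is clearly polynomial time.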
To finish, we must show that Knapsack is NP-hard. We will show that a special
case is NP-hard. Consider the case where w(si ) = v(si ) for all si ∈ S , and where the
two criteria each equal half of the total of all the weights, B = T = 0.5 · ∑_{0 ≤ i < k} w(si ).
Student   ℓ                       ℓ̂
sj        { c i 0 , c i 1 , t j }   { c i 2 , f j }
uj        { t j , z j }             { F }
vj        { T }                     { f j , z j }
To confirm that this reduction function suffices, we must show that if for the
input instance there is an assignment of truth values to the Boolean variables that
satisfies the expression then for the output course scheduling instance there is a
solution, and we must also show the converse.
So first suppose that there is a way to give each Boolean variable a value of T or
F such that the entire propositional logic formula evaluates to T . Then we get a
non-conflicting course time slot allotment by: where x i = T , put the course c i in
the same time slot as course ‘T’, and where x j = F , put the course c j in the same
slot as course ‘F’. This clearly works for the all-positive atom or all-negative atom clauses.
The interesting clauses are the mixed positive and negative atom ones, such as our
example x i 0 ∨ x i 1 ∨ ¬x i 2 . We can check that for each possible assignment making
this clause evaluate to T , there is a non-conflicting way to arrange the courses. For
instance, one such assignment is x i 0 = T , x i 1 = F , and x i 2 = T . Here is a way to
assign courses to time slots that will avoid conflicts. We have put course c i 0 in the
time slot with course ‘T’, and student s j can take it then. This student can also take
course c i 2 in the time slot when ‘F’ is offered. As to student u j , they take course t j
when ‘T’ is offered and also take course ‘F’. Likewise, student v j takes ‘T’ and also
takes course f j when ‘F’ is offered.
Conversely, suppose that for every assignment of truth values to the Boolean
variables, the input expression is not satisfied. We will show that as a result, for
any allocation of classes into time slots there is a conflict.
So fix an allocation of classes into time slots, say with the classes c i 0 , c i 1 , . . .
being offered in the same slot as class ‘T’ and the rest with class ‘F’. Associate with
that allocation of classes the assignment of Boolean variables that sets x i 0 = T ,
x i 1 = T , etc. and the rest to F . By our assumption the expression is not satisfiable,
so this assignment causes the formula to yield a value of F . Thus the expression
has at least one clause, clause j , that does not evaluate to T .
The first possibility is that clause j has either all positive atoms or all negative
atoms. An example is the clause x i 0 ∨ x i 1 ∨ x i 2 . To make this clause evaluate to F ,
the assignment must have x i 0 = F , x i 1 = F , and x i 2 = F , and so the allocation
of classes must be that all three of c i 0 , c i 1 , and c i 2 go with class ‘F’. That gives a
conflict in the problem instance’s course assignments, because we created a student s j
with the lists ℓs j = {c i 0 , c i 1 , c i 2 } and ℓˆs j = { F }.
The other case is that clause j has a mixed form, such as x i 0 ∨ x i 1 ∨ ¬x i 2 , so
that the Boolean variable assignment is x i 0 = F , x i 1 = F , and x i 2 = T . The output
course assignment problem instance puts c i 0 and c i 1 into a course at the same
time as class ‘F’, and c i 2 into a course at the same time as ‘T’. We claim that there
is no non-conflicting way to assign these students to courses. Refer to the table
above. This allocation of classes could only be non-conflicting if student s j selected
classes t j and f j . But then because of the capacity of one in these classes, student u j
must select z j and ‘F’, leaving student v j with a conflict.
Whew! To close, observe that the reduction function that creates the output L
instance from the input propositional logic formula runs in polytime.
Before we leave this discussion, we ask the natural question: what problems
are not complete? The short answer is that we don’t know. First, it is trivial from
the definition that if a problem is NP hard but not in NP then it is not NP complete.
Likewise, the empty language and the language of all strings are trivially not
complete. But as to proving that interesting problems from NP are not complete,
that is another matter. It is tied up with the question of whether P is unequal to NP,
which we address in the next subsection.
However, so that we have not just brushed past the question, with the
assumption that P ≠ NP, here are a few problems that are in NP and that
many people conjecture are tough but not NP complete. Most experts believe
that the Factoring problem is hard for classical computers† but that it is not
NP complete. Experts also suspect that the Graph Isomorphism problem and
the Vertex to Vertex Path problem are not NP complete. As always though, the
standard caution applies that without proof these judgements could be mistaken.
†
In 1994, P Shor discovered an algorithm for a quantum computer that solves the Factoring problem in
polynomial time. This will have significant implications if quantum computation proves to be possible
to engineer.
In principle there are simple ways to settle the question. By Lemma 7.5, if someone
shows that any NP complete problem is a member of P then P = NP. In addition, if
someone shows that there is an NP problem that is not a member of P then P ≠ NP.
However, despite nearly a half century of effort by many extremely smart people,
no one has done either one.
As formulated in Karp’s original paper, the question of whether P equals NP
might seem of only technical interest.
A large class of computational problems involve the determination of properties
of graphs, digraphs, integers, arrays of integers, finite families of finite sets, boolean
formulas and elements of other countable domains. Through simple encodings . . .
these problems can be converted into language recognition problems, and we can
inquire into their computational complexity. It is reasonable to consider such a problem
satisfactorily solved when an algorithm for its solution is found which terminates
within a number of steps bounded by a polynomial in the length of the input. We
show that a large number of classic unsolved problems of covering, matching, packing,
routing, assignment and sequencing are equivalent, in the sense that either each of
them possesses a polynomial-bounded algorithm or none of them does.
But Karp demonstrated that many of the problems that people had been struggling
with in practical applications fall into this category. Researchers who had been
trying to find an efficient solution to Vertex Cover, and those who had been
working on Clique, found that they were in some sense working on the same problem.
By now the list of NP complete problems includes determining the best layout of
transistors on a chip, developing accurate financial-forecasting models, analyzing
protein-folding behavior in a cell, and finding the most energy-efficient airplane
wing. So the question of whether P = NP is extremely practical, and extremely
important.†
In practice, proving that a problem is NP complete is often an ending point;
a researcher may well reason that continuing to try to find a fast algorithm will not
be fruitful, since many of the best minds of Mathematics, Computer Science, and the
natural sciences have failed at it. They may instead turn their attention elsewhere,
perhaps to approximations that are good enough; see Extra B.
†
One indication of its importance is its inclusion on Clay Mathematics Institute’s list of problems for
which there is a one million dollar prize; see http://www.claymath.org/millennium-problems.
Discussion The P versus NP question is certainly the sexiest one in the Theory
of Computing today. It has attracted a great deal of speculation, and gossip. In 2018
a poll of experts found that out of 152 respondents, 88% thought that P ≠ NP while
only 12% thought that P = NP. This subsection discusses some of the intuition
around the question.
First, the intuition around the P ≠ NP conjecture. Imagine a
jigsaw puzzle. We perceive that if a demon gave us an assembled
puzzle ω , then checking that it is correct is very much easier than it
would have been to work out the solution from scratch. Checking
for correctness is mechanical, tedious. But the finding, we think,
is creative — we expect that solving a jigsaw puzzle by brute-force
trying every possible piece against every other is far too much
computation to be practical.
Similarly, schemes for encryption are engineered so that, given an encrypted
message, decrypting it with the key is fast and easy but trying to decrypt it by
trying all possible keys is, we think, just not tractable.
A problem is in P if finding a solution is fast, while a problem is in NP if verifying
the correctness of a given witness ω is fast. From this point of view, the result that
P ⊆ NP becomes the observation that if a problem is fast to solve then it must be
fast to verify. But most experts perceive that inclusion in the other direction is
extremely unlikely.
Restated informally, the P versus NP question asks if finding a solution is as fast
as recognizing one. If P = NP then the two jobs are, in a sense, equally difficult.
Some commentators have extended this thinking outside of
Theoretical Computer Science. Cook is one, “Similar remarks apply
to diverse creative human endeavors, such as designing airplane
wings, creating physical theories, or even composing music. The
question in each case is to what extent an efficient algorithm for
recognizing a good result can be found.” Perhaps it is hyperbole to
say that if P = NP then writing great symphonies would be a job
for computers, a job for mechanisms, but it is correct to say that if
P = NP and if we can write fast algorithms to recognize excellent music — and our
everyday experience with Artificial Intelligence makes this seem more and more
likely — then we could have fast mechanical writers of excellent music.
[photo: A. Selman’s license plate, courtesy S. Selman]
We finish with a taste of the contrarian view, the conjecture that P = NP.
Many observers have noted that there are cases where everyone “knew” that
some algorithm was the fastest but in the end it proved not to be so. The section on
Big-O begins with one, the grade school algorithm for multiplication. Another is
the problem of solving systems of linear equations. Gauss’s Method,
which runs in time O(n^3), is perfectly natural and had been known for centuries
without anyone making an improvement. However, while trying to prove that Gauss’s
Method is optimal, Strassen found an O(n^{lg 7}) method (lg 7 ≈ 2.81).†
A more dramatic speedup happens with the Matching problem: given a graph
with the vertices representing people, connect two vertices if the people are
compatible. We want a set of edges that is as large as possible, and such that no two
edges share a vertex. The naive algorithm tries all possible match sets, which takes 2^m
checks where m is the number of edges. Even with only a hundred people, there
are more things to try than atoms in the universe. But since the 1960s there has been
an algorithm that runs in polytime.
Every day on the Theory of Computing blog feed there are examples of this,
of researchers producing algorithms faster than the ones previously known. A
person can certainly have the sense that we are only just starting to explore what
is possible with algorithms. R J Lipton captured this sense.
Since we are constantly discovering new ways to program our “machines”, why not a
discovery that shows how to factor? or how to solve SAT ? Why are we all so sure that
there are no great new programming methods still to be discovered? . . . I am puzzled
that so many are convinced that these problems could not fall to new programming
†
Here is an analogy: consider the problem of evaluating 2p^3 + 3p^2 + 4p + 5. Someone might claim
that writing it as 2 · p · p · p + 3 · p · p + 4 · p + 5 makes obvious that it requires six multiplications. But
rewriting it as p · (p · (2 · p + 3) + 4) + 5 shows that it can be done with just three. That is, naturalness
and obviousness do not guarantee that something is correct. Without a proof, we must worry that
someone will produce a clever way to do the job with less.
tricks, yet that is what is done each and every day in their own research.
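The footnote's rewriting is Horner's rule; a few lines of Python confirm that one multiplication per step suffices:

```python
def horner(coefficients, p):
    """Evaluate a polynomial with one multiplication per coefficient.
    Coefficients are listed highest power first, e.g. [2, 3, 4, 5]
    stands for 2p^3 + 3p^2 + 4p + 5."""
    result = 0
    for c in coefficients:
        result = result * p + c
    return result
```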
Knuth has a related but somewhat different take.
Some of my reasoning is admittedly naive: It’s hard to believe that P ≠ NP and that
so many brilliant people have failed to discover why. On the other hand if you imagine
a number M that’s finite but incredibly large . . . then there’s a humongous number of
possible algorithms that do n M bitwise or addition or shift operations on n given bits,
and it’s really hard to believe that all of those algorithms fail.
My main point, however, is that I don’t believe that the equality P = NP will turn
out to be helpful even if it is proved, because such a proof will almost surely be
nonconstructive. Although I think M probably exists, I also think human beings will
never know such a value. I even suspect that nobody will even know an upper bound
on M .
Mathematics is full of examples where something is proved to exist, yet the proof
tells us nothing about how to find it. Knowledge of the mere existence of an algorithm
is completely different from the knowledge of an actual algorithm.
Of course, all this is speculation. Speculating is fun, and in order to make
progress in their work, investigators need to have intuition, need to have some
educated guesses. But in the end these researchers, and all of us, look to settle the
question with proof.
V.7 Exercises
✓ 7.11 You hear someone say, “The Satisfiability problem is NP because it is not
computable in polynomial time, so far as we know.” It’s a short sentence but find
three things wrong with it.
✓ 7.12 You have this person in your class who is no genius, which is fine, except that
they don’t know that. They say, “I will show that the Hamiltonian Circuit problem
is not in P, which will demonstrate that P ≠ NP. The algorithm to solve a given
instance G of the Hamiltonian Circuit problem is: generate all permutations of G ’s
vertices, test each to find if it is a circuit, and if any circuits appear then accept
the input, else reject the input. For sure that algorithm is not polynomial, since
the first step is exponential.” Where is the mistake?
✓ 7.13 Your friend says, “The problem of recognizing when one string is a substring
of another has a polytime algorithm, so it is not in NP.” They have misspoken;
help them out.
7.14 Someone in your study group wants to ask your professor, “Is the brute force
algorithm for solving the Satisfiability problem NP complete?” Explain to them
that it isn’t a sensible question, that they are making a type error.
7.15 True or false?
(a) The collection NP is a subset of the NP complete sets, which is a subset of
NP-hard.
(b) The collection NP is a specialization of P to nondeterministic machines, so it
is a subset of P.
✓ 7.16 Assume that P ≠ NP. Which of these statements can we infer from the fact
that the Prime Factorization problem is in NP, but is not known to be NP-complete?
(a) There exists an algorithm that solves arbitrary instances of the Prime Factorization
problem.
(b) There exists an algorithm that efficiently solves arbitrary instances of this
problem.
(c) If we found an efficient algorithm for the Prime Factorization problem then
we could immediately use it to solve Traveling Salesman.
✓ 7.17 Suppose that L1 ≤p L0 . For each, decide if you can conclude it. (a) If
L0 is NP complete then so is L1 . (b) If L1 is NP complete then so is L0 .
(c) If L0 is NP complete and L1 is in NP then L1 is NP complete. (d) If L1
is NP complete and L0 is in NP then L0 is NP complete. (e) It cannot be the
case that both L0 and L1 are NP complete (f) If L1 is in P then so is L0 .
(g) If L0 is in P then so is L1 .
7.18 Show that each of these is in NP but is not NP complete, assuming that
P ≠ NP.
(a) The language of even numbers.
(b) The language { G | G has a vertex cover of size at most four } .
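For (b), a direct check suggests why the language is tractable: with the bound fixed at four there are only O(n^4) candidate covers to test. A Python sketch (names ours):

```python
from itertools import combinations

def has_cover_at_most_4(vertices, edges):
    """Polytime check: try every vertex set of size at most four."""
    for r in range(5):
        for cover in combinations(vertices, r):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return True
    return False
```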
✓ 7.19 Traveling Salesman is NP complete. From P ≠ NP, which of the following
statements could we infer?
(a) No algorithm solves all instances of Traveling Salesman.
(b) No algorithm quickly solves all instances of Traveling Salesman.
(c) Traveling Salesman is in P.
(d) All algorithms for Traveling Salesman run in polynomial time.
✓ 7.20 Prove that the 4-Satisfiability problem is NP hard.
✓ 7.21 The Hamiltonian Path problem inputs a graph and decides if there are two
vertices in that graph such that there is a path between those two that contains
all the vertices.
(a) Show that Hamiltonian Path is in NP.
(b) This graph has a Hamiltonian path. Find it.
[graph on vertices v0, ..., v8]
✓ 7.25 The difficulty in settling P = NP is to get lower bounds. That is, the trouble
lies in showing that the given problem cannot be solved by any algorithm without
such-and-such many steps. A common mistake is to think that any algorithm
must visit all of its input and then to produce a problem with lots of input. Show
that the successor function can be done on a Turing machine in constant time,
in only a few steps, so that the running time on a large input is the same as
the time on a small input. That is, show that this problem can be done with
an algorithm that does not visit all the input: on a Turing machine given input n
in unary, with the head under the leftmost 1, end with (n + 1)-many 1’s, with the
head under the leftmost 1.
7.26 If P = NP then what happens to NP complete sets?
7.27 Are there any problems in NP and not in P that are known to not be NP
complete?
7.28 Find three languages so that L2 ≤p L1 ≤p L0 , and L2 , L0 are NP complete,
while L1 ∈ P.
7.29 Prove that if P = NP then every L ∈ P is NP complete, except for the problems
of determining membership in the empty language and the full language, L = ∅
and L = Σ∗ .
7.30 Prove Lemma 7.5.
7.31 The class P has some nice closure properties, and so does NP.
(a) Prove that NP is closed under union, so that if L, L̂ ∈ NP then L ∪ L̂ ∈ NP.
(b) Prove that NP is closed under concatenation.
(c) Argue that no one can prove that NP is not closed under set complement.
7.32 Is the set of NP complete sets countable or uncountable?
7.33 We will sketch a proof that the Halting problem is NP hard but not in NP.
Consider the language HP = { ⟨Pe , x⟩ | ϕ e (x)↓ }. (a) Show that HP ∉ NP.
(b) Sketch an argument that for any problem L ∈ NP, there is a polynomial time
computable verifier, f : B∗ → B∗ , such that σ ∈ L if and only if f (σ ) ∈ HP .
Section V.8 Other classes
There are many other defined complexity classes. The first one below is very
natural in the light of what we have seen.
We have used the Satisfiability problem as a touchstone result among problems
in NP. We have discussed computing it using a nondeterministic Turing machine
that is unboundedly parallel, or alternatively using a witness and verifier. But,
naively, in the more familiar computational setting of a deterministic machine, it
appears that we must enumerate the truth table. That is, it appears to take time
that is exponential.
EXP In this chapter’s first section we included O(2^n) and O(3^n), and by extension
other exponentials, in the list of commonly encountered orders of growth.
Whereas a first take on polytime is “can conceivably be used,” a first approx-
imation of EXP is that for some of its problems the best algorithms are just too
slow to imagine using. However, the big take-away from EXP is that it contains
nearly every problem that we concern ourselves with in practice. We can construct
theories about still harder problems, but EXP is of interest because it is big enough
that it contains most problems that we seriously hope to ever attack.
8.1 Definition A language decision problem is an element of the complexity class
EXP if there is an algorithm for solving it that runs in time O(b^{p(n)}) for some
constant base b and polynomial p .
Satisfiability can be solved in exponential time by checking each row of the
truth table, and any NP problem can be solved from Satisfiability with only an
addition of polytime. So EXP has this relationship to the classes we have already
studied.
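The row-by-row truth table check can be sketched in a few lines of Python (the clause encoding by signed integers is ours: literal +i stands for variable i, and −i for its negation):

```python
from itertools import product

def brute_force_sat(num_vars, clauses):
    """Try every row of the truth table, exponentially many of them.
    A clause is a list of literals; a formula is a list of clauses."""
    for row in product([False, True], repeat=num_vars):
        def lit_true(lit):
            value = row[abs(lit) - 1]
            return value if lit > 0 else not value
        # The formula holds when every clause has a true literal.
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return True
    return False
```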
8.2 Lemma P ⊆ NP ⊆ EXP
Proof Fix L ∈ NP. We can verify L on a deterministic Turing machine P in
polynomial time using a witness whose length is bounded by the same polynomial.
Let this problem’s bound be n^c.
We will decide L in exponential time by brute-forcing it: we will use P to run
every possible verification. Trying any single witness requires polynomial time,
n^c. Witnesses are in binary, so for lengths up to ℓ there are ∑_{0 ≤ i ≤ ℓ} 2^i = 2^{ℓ+1} − 1
many possible ones. In total then, brute force requires O(n^c · 2^{n^c}) operations.
Finish by observing that n^c · 2^{n^c} is in O(2^{2n^c}).
We don’t know whether there are any NP problems that absolutely require
exponential time. Conceivably NP is contained in a smaller deterministic time
complexity class — for instance, maybe Satisfiability can be solved in less than
exponential time. But we just don’t know.
[diagram: problems arranged from Fast at the bottom to Slow at the top]
8.3 Figure: The bean encloses all problems. Shaded are the three classes P, NP, and
EXP, with EXP’s outline highlighted. They are drawn with strict containment, which
most experts guess is the true arrangement, but no one knows for sure.
Proof The only equality that is not immediate is the last one. Recall that a
problem is in EXP if there is an algorithm for it that runs in time O(b^{p(n)}) for some
constant base b and polynomial p . The equality above only uses the base 2. To cover
the discrepancy, we will show that 3^n ∈ O(2^{n^2}). Consider lim_{x→∞} 2^{x^2}/3^x. Rewrite
the fraction as (2^x/3)^x, which when x > 2 is larger than (4/3)^x, which goes to
infinity. This argument works for any base, not just b = 3.
8.6 Remark While the above description of NP reiterates its naturalness, as we saw
earlier, the characterization that proves to be most useful in practice is that a
problem L is in NP if there is a deterministic Turing machine such that for each
input σ there is a polynomial length witness ω and the verification on that machine
for σ using ω takes polytime.
Space Complexity We can consider how much space is used in solving a problem.
8.7 Definition A deterministic Turing machine runs in space s : N → R+ if for
all but finitely many inputs σ , the computation on that input uses less than or
equal to s(|σ |)-many cells on the tape. A nondeterministic Turing machine runs
in space s if for all but finitely many inputs σ , every computation path on that
input uses less than or equal to s(|σ |)-many cells.
The machine must use less than or equal to s(|σ |)-many cells even on non-
accepting computations.
8.8 Definition Let s : N → N. A language decision problem is an element of
DSPACE(s), or SPACE(s), if that language is decided by a deterministic Turing
machine that runs in space O(s). A problem is an element of NSPACE(s) if it is
decided by a nondeterministic Turing machine that runs in space O(s).
The Zoo Researchers have studied a great many complexity classes. There
are so many that they have been gathered into an online Complexity Zoo, at
complexityzoo.uwaterloo.ca/.
One way to understand these classes is that defining a class asks a type of
Theory of Computing question. For instance, we have already seen that asking
whether NP equals P is a way of asking whether unbounded parallelism makes any
essential difference — can a problem change from intractable to tractable if we
switch from a deterministic to a nondeterministic machine? Similarly, we know
that P ⊆ PSPACE. In thinking about whether the two are equal, researchers are
considering the space-time tradeoff: if you can solve a problem without much
memory does that mean you can solve it without using much time?
Here is one extra class, to give some flavor of the possibilities. For more, see
the Zoo.
The class BPP, Bounded-Error Probabilistic Polynomial Time, contains the
problems solvable by a nondeterministic polytime machine such that if the answer
is ‘yes’ then at least two-thirds of the computation paths accept and if the answer is
‘no’ then at most one-third of the computation paths accept. (Here all computation
paths have the same length.) This is often identified as the class of feasible problems
for a computer with access to a genuine random-number source. Investigating
whether BPP equals P is asking whether every efficient randomized
algorithm can be made deterministic: are there some problems for which there are
fast randomized algorithms but no fast deterministic ones?
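For a taste of what randomness buys, here is a sketch of Freivalds' check, a classic randomized algorithm that is not covered in this text: it tests whether A · B = C without recomputing the product, using about n² steps per round instead of the roughly n³ steps of multiplication. A wrong C survives a round with probability at most one half, so a handful of rounds pushes the error well under BPP's one-third bound.

```python
import random

def mat_vec(M, v):
    """Multiply matrix M by column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def freivalds(A, B, C, rounds=20):
    """Randomized check of A·B = C: compare A·(B·r) with C·r for
    random 0/1 vectors r."""
    n = len(A)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False    # r is a witness vector: certainly A·B ≠ C
    return True             # equal with probability at least 1 - (1/2)^rounds

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(freivalds(A, B, [[19, 22], [43, 50]]))    # the true product, so True
print(freivalds(A, B, [[19, 22], [43, 51]]))    # one wrong entry: almost surely False
```

No deterministic check this fast is known, which is exactly the flavor of question that comparing BPP with P asks about.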
On reading in the Zoo, a person is struck by two things. There are many, many
results listed — we know a lot. But there also are many questions to be answered —
breakthroughs are there waiting for a discoverer.
V.8 Exercises
✓ 8.14 Give a naive exponential-time algorithm for each problem. (a) Subset Sum
problem (b) k Coloring problem
8.15 Show that n ! is 2^O(n²). Show that Traveling Salesman ∈ EXP.
✓ 8.16 This illustrates how large a problem can be and still be in EXP. Consider a
game that has two possible moves at each step. The game tree is binary.
(a) How many elementary particles are there in the universe?
(b) At what level of the game tree will there be more possible branches than
there are elementary particles?
(c) Is that longer than a chess game can reasonably run?
8.17 We will show that a polynomial time algorithm that calls a polynomial time
subroutine can run, altogether, in exponential time.
(a) Verify that the grade school algorithm for multiplication gives that squaring
an n -bit integer takes time O(n²).
(b) Verify that repeated squaring of an n -bit integer gives a result that has length
2^i · n , where i is the number of squarings.
(c) Verify that if your polynomial time algorithm calls a squaring subroutine n
times then the complexity is O(4^n n²), which is exponential.
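The blow-up in this exercise is easy to watch happen. A sketch: repeatedly square a small integer and print the bit lengths, which roughly double at each call.

```python
# Each squaring about doubles the bit length, so after i squarings an
# n-bit integer has grown to roughly 2^i * n bits: a polynomial-time
# subroutine, called only a few times, producing exponentially long output.
x = 3
for i in range(6):
    print(f"after {i} squarings: {x.bit_length()} bits")
    x = x * x
```

The lengths run 2, 4, 7, 13, 26, 51: the cost of each subroutine call is polynomial in its own input, but those inputs are growing exponentially.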
Extra
V.A RSA Encryption
One of the great things about the interwebs, besides that you can get free Theory
of Computing books, is that you can buy stuff. You send a credit card number and a
couple of days later the stuff appears.
For this to be practical your credit card number must be kept secret. It must be
encrypted.
When you visit a web site using an https address, that site sends you information,
called a key, that your browser uses to encrypt your card number. The web site
then uses a different key to decrypt. This is an important point: the decrypter
must differ from the encrypter since anyone on the net can see the encrypter
information that the site sent you. But the site keeps the decrypter information
private. These two, encrypter and decrypter, form a matched pair. We will describe
the mathematical technologies that make this work.
The key fact is that multiplying numbers is quick but the reverse, factoring, is
quite slow. To illustrate this, you might contrast the time it takes you to multiply
two four-digit numbers by hand with the time it takes you to factor an eight-digit
number chosen at random. Set aside an afternoon for that second job, it’ll take a
while.
The algorithm that we shall describe exploits the difference.
It was invented in 1976 by three young MIT researchers, R Rivest,
A Shamir, and L Adleman. Rivest read a paper proposing key
pairs and decided to implement the idea. Over the course of
a year, he and Shamir came up with a number of ideas and for
each Adleman would then produce a way to break it. Finally
they thought to use Fermat's Little Theorem. Adleman was
unable to break it since, he said, it seemed that only solving
Factoring would break it and no one knew how to do that. Their
algorithm, called RSA, was first announced in Martin Gardner's
Mathematical Games column in the August 1977 issue of Scientific American. It
generated a tremendous amount of interest and excitement.
[Pictured: Adi Shamir (b 1952), Ron Rivest (b 1947), Leonard Adleman (b 1945)]
The basis of RSA is to find three numbers, a modulus n , an encrypter e , and a
decrypter d , related by this equation (here m is the message, as a number).
(m^e)^d ≡ m (mod n)
Bob wants to say ‘Hi’. In ASCII that’s 01001000 01101001. If he converted that
string into a single decimal number it would be bigger than n so he breaks it into
two substrings, getting the decimals 72 and 105. Using her public key he computes
and sends Alice the sequence ⟨10496, 4861⟩ . Alice recovers his message by using
her private key.
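A round trip like Bob's can be sketched in a few lines. Alice's actual n and e come from a passage not shown here, so this sketch instead uses the toy key pair of the exercises below: p = 11 and q = 13, giving n = 143, φ(n) = 120, e = 7, and d = 103 (since 7 · 103 = 721 ≡ 1 (mod 120)). Real keys are hundreds of digits long.

```python
# Toy RSA round trip with small, hypothetical keys: n = 11 * 13.
n, e, d = 143, 7, 103

def encrypt(m):
    return pow(m, e, n)    # m^e mod n, using the public key (n, e)

def decrypt(c):
    return pow(c, d, n)    # c^d mod n, using the private key (n, d)

message = [72, 105]        # 'Hi' as the ASCII codes from the text
ciphertext = [encrypt(m) for m in message]
print([decrypt(c) for c in ciphertext])    # recovers [72, 105]
```

As in Bob's exchange, the message must be broken into blocks smaller than n before encrypting.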
This theorem says that primes are common. For example, the number of primes
less than 2^1024 is about 2^1024/ln(2^1024) ≈ 2^1024/709.78 ≈ 2^1024/2^9.47 ≈ 2^1015.
Said another way, if we choose a number n at random then the probability that it
is prime is about 1/ln(n), and so a random number that is 1024 bits long will be
a prime with probability about 1/ln(2^1024) ≈ 1/710. On average we need only
select 355 odd numbers of about that size before we find a prime. Hence we can
efficiently generate large primes by just picking random numbers, as long as we
can efficiently test their primality.
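That recipe is short enough to sketch. The primality test below is the Fermat test that the next pages build toward; a production system would use the stronger Miller–Rabin test, since rare composites (the Carmichael numbers) can fool the plain Fermat test.

```python
import random

def probably_prime(n, trials=40):
    """Fermat test: a composite n usually has many witnesses a with
    a^(n-1) mod n != 1."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False    # a is a witness, so n is certainly composite
    return True             # no witness found: very likely prime

def random_prime(bits):
    """Pick random odd numbers of the given bit length until one passes."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if probably_prime(candidate):
            return candidate

p = random_prime(64)
print(p, p.bit_length())    # a (probable) 64-bit prime
```

By the estimate above, the loop in random_prime expects to examine only a few hundred candidates even at 1024 bits, so the whole recipe is efficient.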
On our way to giving an efficient way to test primality, we observe that the
operations of multiplication and addition modulo m are efficient. (We will give
examples only, rather than the full analysis of the operations.)
A.3 Example Multiplying 3 915 421 by 52 567 004 modulo 3 looks hard. The naive
approach is to first take their product and then divide by 3 to find the remainder.
But there is a more efficient way. Rather than multiply first and then reduce
modulo m , reduce first and then multiply. That is, we know that if a ≡ b (mod m)
then we can substitute b for a in any product without changing its value modulo m .
Here 3 915 421 ≡ 1 (mod 3) and 52 567 004 ≡ 2 (mod 3), so the product is congruent
to 1 · 2 = 2 (mod 3).
a · 2a · · · (p − 1)a ≡ 1 · 2 · · · (p − 1) (mod p)
(p − 1)! · a^(p−1) ≡ (p − 1)! (mod p)
A.7 Example Let the prime be p = 7. Any number a with 0 < a < p is not divisible
by p . Here is the list.
a             1   2    3     4      5       6
a^(7−1)       1   64   729   4 096  15 625  46 656
(a^6 − 1)/7   0   9    104   585    2 232   6 665
† In this case a is a witness to the fact that n is not prime.
Proof By Fermat, a^(p−1) ≡ 1 (mod p) and a^(q−1) ≡ 1 (mod q). Raise the first to the
q − 1 power and the second to the p − 1 power.
a^((p−1)(q−1)) ≡ 1 (mod p)    a^((p−1)(q−1)) ≡ 1 (mod q)
Since a^((p−1)(q−1)) − 1 is divisible by both p and q , it is divisible by their product
pq = n .
Experts think that the most likely attack on RSA encryption is by factoring the
modulus n . Anyone who factors n can use the same method as the RSA key setup
process to turn the encrypter e into the decrypter d . That’s why n is taken to be
the product of two large primes; it makes factoring as hard as possible.
There is a factoring algorithm that takes only O(b³) time (and O(b) space),
called Shor’s algorithm. But it runs only on quantum computers. At this moment
there are no such computers built, although there has been progress on that. For
the moment, RSA seems safe. (There are schemes that could replace it, if needed.)
V.A Exercises
✓ A.11 There are twenty-five primes less than or equal to 100. Find them.
✓ A.12 We can walk through an RSA calculation.
(a) For the primes, take p = 11, q = 13. Find n = pq and φ(n) = (p − 1) · (q − 1).
(b) For the encoder e use the smallest prime 1 < e < φ(n) that is relatively
prime to φ(n).
(c) Find the decoder d , the multiplicative inverse of e modulo φ(n). (You can use
Euclid's algorithm, or just test the candidates.)
(d) Take the message to be represented as the number m = 9. Encrypt it and
decrypt it.
A.13 To test whether a number n is prime, we could just try dividing it by all
numbers less than it.
(a) Show that we needn’t try all numbers less than n , instead we can just try
√
all k with 2 ≤ k ≤ n . √
(b) Show that we cannot lower that any further than n .
(c) For input n = 1012 how many numbers would you need to test?
(d) Show that this is a terrible algorithm since it is exponential in the size of the
input.
A.14 Show that the probability that a random b -bit number is prime is about 1/b .
Extra
V.B Tractability and good-enoughness
[Figure: Tour of Sweden]
that our theory suggests are intractable. And many problems that are
attackable in theory but that turn out to be awkward in practice. So much
more work needs to be done.
Part Four
Appendix
Appendix A. Strings
An alphabet is a nonempty and finite set of symbols (sometimes called tokens). We
write symbols in a distinct typeface, as in 1 or a, because the alternative of quoting
them would be clunky.† A string or word over an alphabet is a finite sequence
of elements from that alphabet. The string with no elements is the empty string,
denoted ε .
One potentially surprising aspect of a ‘symbol’ is that a symbol may contain
more than one glyph. For instance, a programming language may have if as a
symbol, indecomposable into separate letters. For example, the Scheme alphabet
contains the symbols or and car, and allows variable names such as a, x, or
lastname. An example of a string is (or a ready), which is a sequence of five
alphabet elements ⟨(, or, a, ready, )⟩ .
Traditionally, alphabets are denoted with the Greek letter Σ. We will name
strings with lower case Greek letters, and denote the items in the string with the
associated lower case roman letter, as in σ = ⟨s 0 , ... , si−1 ⟩ or τ = ⟨t 0 , ... , t j−1 ⟩ . In
place of si we may write σ [i]. We also may write σ [i : j] for the substring between
terms i and j , including the first term but not the second, and we write σ [i :] for
the tail substring that starts with term i . We also use σ [−1] for the final character,
σ [−2] for the one before it, etc. For the string σ = ⟨s 0 , ... , sn−1 ⟩ , the length |σ | is
the number of symbols that it contains, n . In particular, the length of the empty
string is |ε | = 0.
The diamond brackets and commas are ungainly. For small-scale examples
and exercises, we use the shortcut of working with alphabets of single-character
symbols and then writing strings by omitting the brackets and commas. That is,
we write abc instead of ⟨a, b, c⟩ .‡ This convenience comes with the disadvantage
that without the diamond brackets the empty string is just nothing, which is why
we use the separate symbol ε .§
The alphabet consisting of the zero and one characters is B = { 0, 1 }. Strings
over this alphabet are bitstrings or bit strings.||
Where Σ is an alphabet, for k ∈ N the set of length k strings over that alphabet
is Σk . The set of strings over Σ of any (finite) length is Σ∗ = ∪k ∈N Σk . The asterisk
is the Kleene star, read aloud as “star.”
Strings are simple, so there are only a few operations. Let σ = ⟨s 0 , ... , si−1 ⟩
and τ = ⟨t 0 , ... , t j−1 ⟩ be strings over an alphabet Σ. The concatenation σ ⌢τ or στ
appends the second sequence to the first: σ ⌢τ = ⟨s 0 , ... , si−1 , t 0 , ... , t j−1 ⟩ . Where
†
We give them a distinct look to distinguish the symbol ‘a’ from the variable ‘a ’, so that we can tell “let
x = a” apart from “let x = a .” Symbols are not variables — they don’t hold a value, they are themselves
a value. ‡ To see why when we drop the commas we want the alphabet to consist of single-character
symbols, consider Σ = { a, aa } and the string aaa. Without the commas this string is ambiguous: it
could mean ⟨a, aa ⟩ , or ⟨aa, a ⟩ , or ⟨a, a, a ⟩ . § Omitting the diamond brackets and commas also blurs
the distinction between a symbol and a one-symbol string, between a and ⟨a ⟩ . However, dropping the
brackets is so convenient that we accept this disadvantage. || Caution: in some contexts authors
consider infinite bitstrings, although ours will always be finite.
σ = τ0 ⌢ · · · ⌢τk −1 then we say that σ decomposes into the τ ’s and that each τi is a
substring of σ . The first substring, τ0 , is a prefix of σ . The last, τk −1 , is a suffix.
A power or replication of a string is an iterated concatenation with itself, so
that σ 2 = σ ⌢σ and σ 3 = σ ⌢σ ⌢σ , etc. We write σ 1 = σ and σ 0 = ε . The reversal
σ R of a string takes the symbols in reverse order: σ R = ⟨si−1 , ... , s 0 ⟩ . The empty
string’s reversal is ε R = ε .
For example, let Σ = { a, b, c } and let σ = abc and τ = bbaac. Then the
concatenation στ is abcbbaac. The third power σ 3 is abcabcabc, and the reversal
τ R is caabb. A string that equals its own reversal is a palindrome; examples are
α = abba, β = cdc, and ε .
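These operations translate directly into code. The sketch below uses Python strings, which match our convention of single-character symbols with the brackets and commas dropped.

```python
def power(sigma, n):
    """Replication: sigma^n is n copies of sigma concatenated."""
    return "" if n == 0 else power(sigma, n - 1) + sigma

def reverse(sigma):
    """Reversal: the symbols in the opposite order."""
    return sigma[::-1]

def is_palindrome(sigma):
    """A palindrome equals its own reversal; the empty string qualifies."""
    return sigma == reverse(sigma)

sigma, tau = "abc", "bbaac"
print(sigma + tau)           # concatenation: abcbbaac
print(power(sigma, 3))       # third power: abcabcabc
print(reverse(tau))          # reversal: caabb
print(is_palindrome("abba"), is_palindrome(""))    # True True
```

Note that power follows the constructive definition in the exercises, with σ⁰ = ε as the base case.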
Exercises
A.1 Let σ = 10110 and τ = 110111 be bit strings. Find each. (a) σ ⌢τ (b) σ ⌢τ ⌢σ
(c) σ R (d) σ 3 (e) 03 ⌢ σ
A.2 Let the alphabet be Σ = { a, b, c }. Suppose that σ = ab and τ = bca. Find
each. (a) σ ⌢τ (b) σ 2 ⌢τ 2 (c) σ R ⌢τ R (d) σ 3
A.3 Let L = { σ ∈ B∗ : |σ | = 4 and σ starts with 0 }. How many elements are in
that language?
A.4 Suppose that Σ = { a, b, c } and that σ = abcbccbba. (a) Is abcb a prefix of σ ?
(b) Is ba a suffix? (c) Is bab a substring? (d) Is ε a suffix?
A.5 What is the relation between |σ | , |τ | , and |σ ⌢ τ | ? You must justify your
answer.
A.6 The operation of string concatenation follows a simple algebra. For each
of these, decide if it is true. If so, prove it. If not, give a counterexample.
(a) α ⌢ε = α and ε ⌢α = α (b) α ⌢β = β ⌢α (c) (α ⌢β)^R = β^R ⌢α^R (d) (α^R)^R = α
(e) (α^i)^R = (α^R)^i
A.7 Show that string concatenation is not commutative, that there are strings σ
and τ so that σ ⌢τ ≠ τ ⌢ σ .
A.8 In defining decomposition above we have ‘σ = τ0 ⌢ · · · ⌢ τn−1 ’, without
parentheses on the right side. This takes for granted that the concatenation
operation is associative, that no matter how we parenthesize it we get the same
string. Prove this. Hint: use induction on the number of substrings, n .
A.9 Prove that this constructive definition of string power is equivalent to the one
above.
σ^n = ε if n = 0, and σ^n = σ^(n−1) ⌢ σ if n > 0.
Appendix B. Functions
A function is an input-output relationship: each input is associated with a unique
output. An example is the association of each input natural number with an
output number that is twice as big. Another is the association of each string
of characters with the length of that string. A third is the association of each
polynomial an x n + · · · + a 1x + a 0 with a Boolean value T or F , depending on
whether 1 is a root of that polynomial.
For the precise definition, fix two sets, a domain D and a codomain C . A
function, or map, f : D → C is a set of pairs (x, y) ∈ D × C , subject to the
restriction of being well-defined, that every x ∈ D appears in one and only one
pair (more on this below). We write f (x) = y or x 7→ y and say ‘x maps to y ’.
(Note the difference between the arrow symbols in f : D → C and x 7→ y ). We
say that x is an input or argument to the function, and that y is an output or value.
An important point is what a function isn’t: it isn’t a formula or rule. The
function that gives the US presidents, f (0) = George Washington, etc., has no
sensible formula and isn’t determined by any rule less complex than an exhaustive
listing of cases. The same holds for a function that returns winners of the US
World Series, including next year’s winner. True, many functions are described by
a formula, such as E(m) = mc 2 , and as well, many functions are computed by a
program. But what makes something a function is that for each input there is one
and only one associated output. If we can calculate the outputs from the inputs,
that’s great, but that is not required.
Some functions take more than one input, for instance dist(x, y) = √(x² + y²).
We say that dist is 2-ary, and other functions are 3-ary, etc. The number of inputs
is the function’s arity. If the function takes only one input but that input is a tuple,
as with x = (3, 5), then we often drop the extra parentheses, so that instead of
f (x) = f ((3, 5)) we write f (3, 5).
Pictures We often illustrate functions using the familiar xy axes; here are graphs
of f (x) = x 3 and f (x) = ⌊x⌋ .
We also illustrate functions with a bean diagram, which separates the domain and
the codomain sets. Below on the left is the action of the exclusive or operator.
−3 −2 −1 0 1 2 3
F, F
F ,T F
T,F
T
T ,T
−3 −2 −1 0 1 2 3
On the right is a variant of the bean diagram, using the number line to show the
absolute value function mapping integers to integers.
Well-defined The definition of a function contains the condition that each domain
element maps to one and only one codomain element, y = f (x). We refer to this
condition by saying that functions are well-defined.
When we are considering a relationship between x ’s and y ’s and asking if it is
a function, well-definedness is typically what is at issue.† For instance, consider
the set of ordered pairs (x, y) where the square of y is x . If x = 9 then both
y = 3 and y = −3 are related to x , so this is not a functional relationship — it is
not well-defined — because there is not one and only one y for each x . Another
example is that when setting up a company’s email we may decide to use each
†
Sometimes people say that they are, “checking that the function is well-defined.” In a strict sense this
is confused, because if it is a function then it is by definition well-defined. However, while all tigers
have stripes, we do sometimes say “striped tiger.” Natural language is funny that way.
person’s first initial and last name, but the problem is that there can easily be more
than one, say, mdouglas. The issue here is that the relation (email, person) is not
well-defined.
If a function is suitable for graphing on xy axes then visual proof of well-
definedness is that for any x in the domain, the vertical line at x intercepts the
graph in one and only one point.
One-to-one and onto The definition of function has an asymmetry: among the
ordered pairs (x, y), it requires that each domain element x be in one pair and
only one pair, but it does not require the same of the codomain elements.
A function is one-to-one (or 1-1, or an injection) if each codomain element y is
in at most one pair. The function below is one-to-one because for every element y
in the codomain bean, the bean on the right, there is at most one arrow ending
at y .
The most common way to verify that a function f is one-to-one is to assume that
f (x 0 ) = f (x 1 ) and then deduce that therefore x 0 = x 1 . If a function is suitable
for graphing on xy axes then visual proof is that for any y in the codomain, the
horizontal line at y intercepts the graph in at most one point.
A function is onto (or a surjection) if each codomain element y is in at least
one pair. Thus, a function is onto if its codomain equals its range. The function
below is onto because every element in the codomain bean has at least one arrow
ending at it.
The most common way to verify that a function is onto is to start with a codomain
element y and then produce a domain element x that maps to it. If a function is
suitable for graphing on xy axes then visual proof is that for any y in the codomain,
the horizontal line at y intercepts the graph in at least one point.
As the above pictures suggest, where the domain and codomain are finite,
when there is a function f : D → C then we can conclude some things about the
number of elements in the sets. The first is that if the function is one-to-one then
the number of elements in the domain is less than or equal to the number in the
codomain. The other is that if the function is onto then the domain has at least as
many elements as the codomain, and the two numbers are equal if and only if the
function is also one-to-one.
Correspondence A function is a correspondence (or bijection) if it is both one-to-
one and onto. The picture on the left shows a correspondence between two finite
sets, both with four elements, and the one on the right shows a correspondence
between the natural numbers and the primes.
0 1 2 3 4 5 6 7 ...
...
2 3 5 7 11 13 17 19
Exercises
B.1 Let f : R → R be f (x) = 3x + 1 and g : R → R be g(x) = x² + 1. (a) Show
that f is one-to-one and onto. (b) Show that g is not one-to-one and not onto.
B.2 Show each.
(a) Let g : R³ → R² be the projection map (x, y, z) ↦ (x, y) and let f : R² → R³
be the map (x, y) ↦ (x, y, 0). Then g is a left inverse of f but not a right
inverse.
(b) The function f : Z → Z given by f (n) = n² has no left inverse.
(c) Where D = { 0, 1, 2, 3 } and C = { 10, 11 } , the function f : D → C given by
0 ↦ 10, 1 ↦ 11, 2 ↦ 10, 3 ↦ 11 has more than one right inverse.
B.3 (a) Where f : Z → Z is f (a) = a + 3 and g : Z → Z is g(a) = a − 3, show
that g is inverse to f .
(b) Where h : Z → Z is the function that returns n + 1 if n is even and returns
n − 1 if n is odd, find a function inverse to h .
(c) If s : R+ → R+ is s(x) = x², find its inverse.
B.4 Let D = { 0, 1, 2 } and C = { 10, 11, 12 }. Also let f , g : D → C be f (0) = 10,
f (1) = 11, f (2) = 12, and g(0) = 10, g(1) = 10, g(2) = 12. Then: (a) verify
that f is a correspondence (b) construct an inverse for f (c) verify that g is not
a correspondence (d) show that g has no inverse.
B.5 (a) Prove that a composition of one-to-one functions is one-to-one. (b) Prove
that a composition of onto functions is onto. With the prior item this gives that a
composition of correspondences is a correspondence. (c) Prove that if g ◦ f is
one-to-one then f is one-to-one. (d) Prove that if g ◦ f is onto then g is onto.
(e) If g ◦ f is onto, is f onto? If it is one-to-one, is g one-to-one?
B.6 Prove.
(a) A function has an inverse if and only if that function is a correspondence.
(b) If a function has an inverse then that inverse is unique.
(c) The inverse of a correspondence is a correspondence.
(d) If f and g are each invertible then so is g ◦ f , and (g ◦ f )⁻¹ = f ⁻¹ ◦ g⁻¹.
B.7 Let D and C be finite sets. Prove that if there is a correspondence f : D → C
then the two have the same number of elements. Hint: for each you can do
induction either on |C | or |D| .
(a) If f is one-to-one then |C | ≥ |D| .
(b) If f is onto then |C | ≤ |D| .
Part Five
Notes
These are citations or discussions that supplement the text body. Each refers to a word or phrase from that text
body, in italics, and then the note is in plain text. Many of the entries include links to more detail.
Cover
Calculating the bonus http://www.loc.gov/pictures/item/npc2007012636/
Preface
in addition to technical detail, also attends to a breadth of knowledge S Pinker emphasizes that a liberal approach
involves understanding in a context (Pinker 2014). “It seems to me that educated people should know something
about the 13-billion-year prehistory of our species and the basic laws governing the physical and living world,
including our bodies and brains. They should grasp the timeline of human history from the dawn of agriculture
to the present. They should be exposed to the diversity of human cultures, and the major systems of belief
and value with which they have made sense of their lives. They should know about the formative events in
human history, including the blunders we can hope not to repeat. They should understand the principles behind
democratic governance and the rule of law. They should know how to appreciate works of fiction and art as
sources of aesthetic pleasure and as impetuses to reflect on the human condition. On top of this knowledge,
a liberal education should make certain habits of rationality second nature. Educated people should be able
to express complex ideas in clear writing and speech. They should appreciate that objective knowledge is
a precious commodity, and know how to distinguish vetted fact from superstition, rumor, and unexamined
conventional wisdom. They should know how to reason logically and statistically, avoiding the fallacies and
biases to which the untutored human mind is vulnerable. They should think causally rather than magically, and
know what it takes to distinguish causation from correlation and coincidence. They should be acutely aware of
human fallibility, most notably their own, and appreciate that people who disagree with them are not stupid
or evil. Accordingly, they should appreciate the value of trying to change minds by persuasion rather than
intimidation or demagoguery.” See also https://www.aacu.org/leap/what-is-a-liberal-education
Who among us has not had an Ah-ha! moment I ask the indulgence of these experts. I believe that I have used their
informal speech only in appropriate contexts, but I would like to be corrected if not.
Prologue
Entscheidungsproblem Pronounced en-SHY-dungs-problem.
D Hilbert and W Ackermann Hilbert was a very prominent mathematician, perhaps the world’s most prominent
mathematician, and Ackermann was his student. So they made an impression when they wrote, “[This] must
be considered the main problem of mathematical logic” (Hilbert and Ackermann 1950), p 73.
mathematical statement Specifically, the statement as discussed by Hilbert and Ackermann comes from a first-order
logic (versions of the Entscheidungsproblem for other systems had been proposed by other mathematicians).
First-order logic differs from propositional logic, the logic of truth tables, in that it allows variables. Thus for
instance if you are studying the natural numbers then you can have a Boolean function Prime(x). (In this
context a Boolean function is traditionally called ‘predicate’.) To make a statement that is either true or false we
must then quantify statements, as in the (false) statement “for all x ∈ N, Prime(x) implies PerfectSquare(x).”
The modifier “first-order” means that the variables used by the Boolean functions are members of the domain
of discourse (for Prime above it is N), but we cannot have that variables themselves are Boolean functions.
(Allowing Boolean functions to take Boolean functions as input is possible, but would make this a second-order,
or even higher-order, logic.)
after a run He was 22 years old at the time. (Hodges 1983), p 96. This book is the authoritative source for Turing’s
fascinating life. During the Second World War, he led a group of British cryptanalysts at Bletchley Park, Britain’s
code breaking center, where his section was responsible for German naval codes. He devised a number of
techniques for breaking German ciphers, including an electromechanical machine that could find settings for
the German coding machine, the Enigma. Because the Battle of the Atlantic was critical to the Allied war effort,
and because cracking the codes was critical to defeating the German submarine effort, Turing’s work was very
important. (The major motion picture on this, The Imitation Game (Wikipedia 2016), is a fun watch but is not a
slave to historical accuracy.) After the war, at the National Physical Laboratory he made one of the first designs
for a stored-program computer. In 1952, when it was a crime in the UK, Turing was prosecuted for homosexual
acts. He was given chemical castration as an alternative to prison. He died in 1954 from cyanide poisoning
which an inquest determined was suicide. In 2009, following an Internet campaign, British Prime Minister
Gordon Brown made an official public apology on behalf of the British government for “the appalling way he
was treated.”
Olympic marathon His time at the qualifying event was only ten minutes behind what was later the winning time
in the 1948 Olympic marathon. For more, see https://www.turing.org.uk/book/update/part6.html and
http://www-groups.dcs.st-and.ac.uk/~history/Extras/Turing_running.html.
clerk Before the engineering of computing machines had advanced enough to make capable machines widely
available, much of what we would today do with a program was done by people, then called “computers.” This
book’s cover shows human computers at work.
Another example is that, as told in the film Hidden Figures, the trajectory for US astronaut John Glenn’s
pioneering orbit of Earth was found by the human computer Katherine Johnson and her colleagues, African
American women whose accomplishments are all the more impressive because they occurred despite appalling
discrimination.
don’t involve random methods We can build things that return completely random results; one example is a device
that registers consecutive clicks on a Geiger counter and if the second gap between clicks is longer than the
first it returns 1, else it returns 0. See also https://blog.cloudflare.com/randomness-101-lavarand-in-
production/.
analog devices See (A/V Geeks 2013) about slide rules, (Wikipedia contributors 2016c) about nomograms,
(navyreviewer 2010) about a naval firing computer, and (Unknown 1948) about a more general-purpose machine.
reading results off of a slide rule or an instrument dial Suppose that an intermediate result of a calculation is 1.23.
If we read it off the slide rule with the convention that the resolution accuracy is only one decimal place then we
write down 1.2. Doubling that gives 2.4. But doubling the original number 2 · 1.23 = 2.46 and then rounding
to one place gives 2.5.
no upper bound This explication is derived from (Rogers 1987), p 1–5.
more is provided Perhaps the clerk has a helper, or the mechanism has a tender.
A reader may object that this violates the goal of the definition, to model physically-realizable computations We often
describe computations that do not have a natural resource bound. The algorithm for long division that we learn
in grade school has no inherent bounds on the lengths of either inputs or outputs, or on the amount of available
scratch paper.
are so elementary that we cannot easily imagine them further divided (Turing 1937)
LEGO’s See for instance https://www.youtube.com/watch?v=RLPVCJjTNgk&t=114s.
Finally, it trims off a 1 The instruction q 4 11q 5 won’t ever be reached, but it does no harm. It is there for the
definition of a Turing machine, to make ∆ defined on all qp Tp . See also the note to that definition.
transition function The definition describes ∆ as a function ∆ : Q × Σ → (Σ ∪ { L, R }) × Q . That is a white lie. In
Ppred the state q 3 is used only for the purpose of halting the machine, and so there is no defined next state. In
Padd , the state q 5 plays the same role. So strictly speaking, the transition function is a partial function, one
where for some members of the domain there is no associated value; see page 357. (Alternatively, we could
write the set of states as Q ∪ Q̂ where the states in Q̂ are there only for halting, and the transition function’s
definition is ∆ : Q × Σ → (Σ ∪ { L, R }) × (Q ∪ Q̂).) We have left this point out of the main presentation since it
doesn't seem to cause confusion and the discussion can be a distraction.
a complete description of how these machines act It is reasonable to ask why our standard model is one that is so
basic that programming can be annoying. Why not choose a real world machine? The reason is that, as here,
we can completely describe the actions of the Turing machine model, or of any of the other simple models that
are sometimes used, in only a few paragraphs. A real machine might take a full book. We use Turing machines
because they are simple to describe, historically important, and because the work in Chapter Five needs them.
q is a state, a member of Q We are vague about what ‘states’ are but we assume that whatever they are, the set of
states Q is disjoint from the set Σ ∪ { L, R }.
a snapshot, an instant in a computation So the configuration, along with the Turing machine, encapsulates the
future history of the computation.
omit the part about interpreting the strings We do this for the same reason that we would say, “This is me when I
was ten.” instead of, “This is a picture of me when I was ten.”
a physical system evolves through a sequence of discrete steps that are local, meaning that all the action takes place
within one cell of the head Adapted from (Wigderson 2017).
constructed the first machine See (Leupold 1725).
A number of mathematicians See also (Wikipedia contributors 2014).
Church suggested to Gödel (Soare 1999)
established beyond any doubt (Gödel 1995)
Church’s Thesis is central to the Theory of Computation Some authors have claimed that neither Church nor Turing
stated anything as strong as is given here but instead that they proposed that the set of things that can be done
by a Turing machine is the same as the set of things that are intuitively computable by a human computer; see
for instance (B. J. Copeland and Proudfoot 1999). But the thesis as stated here, that what can be done by a
Turing machine is what can be done by any physical mechanism that is discrete and deterministic, is certainly
the thesis as it is taken in the field today. And besides, Church and Turing did not in fact distinguish between
the two cases; (Hodges 2016) points to Church’s review of Turing’s paper in the Journal of Symbolic Logic: “The
author [i.e. Turing] proposes as a criterion that an infinite sequence of digits 0 and 1 be ‘computable’ that it
shall be possible to devise a computing machine, occupying a finite space and with working parts of finite size,
which will write down the sequence to any desired number of terms if allowed to run for a sufficiently long
time. As a matter of convenience, certain further restrictions are imposed on the character of the machine,
but these are of such a nature as obviously to cause no loss of generality — in particular, a human calculator,
provided with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine.” This
has Church referring to the human calculator not as the prototype but instead as a special case of the class of
defined machines.
we cannot give a mathematical proof We cannot give a proof that starts from axioms whose justification is on
firmer footing than the thesis itself. R Williams has commented, “[T]he Church-Turing thesis is not a formal
proposition that can be proved. It is a scientific hypothesis, so it can be ‘disproved’ in the sense that it is
falsifiable. Any ‘proof’ must provide a definition of computability with it, and the proof is only as good as that
definition.” (SE user Ryan Williams 2010)
formalizes the notion of ‘effective’ or ‘intuitively mechanically computable’ Kleene wrote that “its role is to delimit
precisely an hitherto vaguely conceived totality.” (Kleene 1952), p 318.
Turing wrote (Turing 1937)
systematic error (Dershowitz and Gurevich 2008) p 304.
it is the right answer Gödel wrote, “the great importance . . . [of] Turing’s computability [is] largely due to the
fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting
epistemological notion, i.e., one not depending on the formalism chosen.” (Gödel 1995), pages 150–153.
can compute all of the functions that can be done by a machine with two or more tapes For instance, we can simulate
a two-tape machine P2 on a one-tape machine P1 . One way to do this is by having P1 use its even-numbered
tape positions for P2 ’s first tape and using its odd tape positions for P2 ’s second tape. (A more hand-wavy
explanation is: a modern computer can clearly simulate a two-tape Turing machine but a modern computer has
sequential memory, which is like the one-tape machine’s sequential tape.)
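The even/odd interleaving can be made concrete. A sketch (the function names are ours, not from the text): cell i of tape k is stored at index 2i + k of the merged tape, with B standing for the blank symbol.

```python
def merge_tapes(tape1, tape2, blank='B'):
    """Interleave two tapes onto one: cell i of tape k goes to index 2*i + k."""
    n = max(len(tape1), len(tape2))
    merged = []
    for i in range(n):
        merged.append(tape1[i] if i < len(tape1) else blank)
        merged.append(tape2[i] if i < len(tape2) else blank)
    return merged

def read_cell(merged, tape_number, i):
    """Read cell i of the given tape (0 or 1) out of the merged tape."""
    return merged[2 * i + tape_number]
```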
evident immediately (Church 1937)
S Aaronson has made this point From his blog Shtetl-Optimized, (Aaronson 2012b).
supply a stream of random bits Some CPUs come with that capability built in; see for instance https://en.wikipedia.org/wiki/RdRand.
beyond discrete and deterministic From (SE author Andrej Bauer 2016): “Turing machines are described concretely
in terms of states, a head, and a working tape. It is far from obvious that this exhausts the computing possibilities
of the universe we live in. Could we not make a more powerful machine using electricity, or water, or quantum
phenomena? What if we fly a Turing machine into a black hole at just the right speed and direction, so that it
can perform infinitely many steps in what appears finite time to us? You cannot just say ‘obviously not’ — you
need to do some calculations in general relativity first. And what if physicists find out a way to communicate
and control parallel universes, so that we can run infinitely many Turing machines in parallel time?”
everything that experiments with reality would ever find to be possible Modern Physics is a sophisticated and
advanced field of study so we could doubt that anything large has been overlooked. However, there is historical
reason for supposing that such a thing is possible. The physicists H von Helmholtz in 1856, and S Newcomb
in 1892, calculated that the Sun is about 20 million years old (they assumed that the Sun glowed from the
energy provided by its gravitational contraction in condensing from a nebula of gas and dust to its current
state). Consistently with that, one of the world’s most reputable physicists, W Kelvin, estimated in 1897 that
the Earth was, “more than 20 and less than 40 million years old, and probably much nearer 20 than 40” (he
calculated how long it would take the Earth to cool from a completely molten object to its present temperature).
He said, “unless sources now unknown to us are prepared in the great storehouse of creation” then there was
not enough energy in the system to justify a longer estimate. One person very troubled by this was Darwin, who had himself estimated that a valley in England took 300 million years to erode, and who consequently believed that there was enough time, called “deep time,” for the slow but steady evolution of species to happen. Then, in 1896, everything changed: A Becquerel discovered radiation. None of the prior calculations had accounted for it, and with it included the apparent discrepancy vanished. (Wikipedia contributors 2016a)
the unique solution is not computable See (Pour-El and Richards 1981).
compute a solution See http://www.smbc-comics.com/?id=3054.
Three-Body Problem See https://en.wikipedia.org/wiki/Three-body_problem
we can still wonder See (Piccinini 2017).
This big question remains open A sample of readings: frequently cited is (Black 2000), which takes the thesis to
be about what is humanly computable, and (B. Jack Copeland 1996), (B. Jack Copeland 1999), and (B. Jack
Copeland 2002) argue that computations can be done that are beyond the capabilities of Turing machines, while
(Davis 2004), (Davis 2006), and (Gandy 1980) give arguments that most Theory of Computing researchers
consider persuasive.
Often when we want to show that something is computable by a Turing machine The same point stated another
way, from (SE author Andrej Bauer 2018): In books on computability theory it is common for the text to skip
details on how a particular machine is to be constructed. The author of the computability book will mumble
something about the Turing-Church thesis somewhere in the beginning. This is to be read as “you will have to
do the missing parts yourself, or equip yourself with the same sense of inner feeling about computation as I did”.
Often the author will give you hints on how to construct a machine, and call them “pseudo-code”, “effective
procedure”, “idea”, or some such. The Church-Turing thesis is the social convention that such descriptions of
machines suffice. (Of course, the social convention is not arbitrary but rather based on many years of experience
on what is and is not computable.) . . . I am not saying that this is a bad idea, I am just telling you honestly what
is going on. . . . So what are we supposed to do? We certainly do not want to write out detailed constructions
of machines, because then students will end up thinking that’s what computability theory is about. It isn’t.
Computability theory is about contemplating what machines we could construct if we wanted to, but we don’t.
As usual, the best path to wisdom is to pass through a phase of confusion.
Plonk! See (Wikipedia contributors 2015a).
Suppose that you have infinitely many dollars. (Joel David Hamkins 2010)
H Grassmann produced a more elegant definition Dedekind used this definition to give the first rigorous proof of
the laws of elementary school arithmetic.
logically problematic The sense of something perplexing about recursion is often expressed with a story. The
philosopher W James gave a public lecture on cosmology, and was approached by an older woman from the
audience. “Your theory that the sun is the center of the solar system, and the earth orbits around it, has a
good ring, Mr James, but it’s wrong.” she said. “Our crust of earth lies on the back of a giant turtle.” James
gently asked, “If your theory is correct then what does this turtle stand on?” “You’re very clever, Mr James,”
she replied, “but I have an answer. The first turtle stands on the back of a second, far larger, turtle.” James
persisted, “And this second turtle, Madam?” Immediately she crowed, “It’s no use Mr James — it’s turtles all the
way down.” (Wikipedia contributors 2016e)
See also Room 8, winner of the 2014 short film award from the British Academy of Film and Television Arts.
define the function on higher-numbered inputs using only its values on lower-numbered ones For the function specified
by f (0) = 1 and f (n) = n · f (f (n − 1) − 1), try computing the values f (0) through f (5).
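Transcribing the exercise into code makes the trap visible (this run is ours, not in the text): f(0) through f(4) evaluate to 1, 1, 2, 3, 8, but f(5) calls f(7), which circles back to f(5), so the recursion never bottoms out.

```python
def f(n):
    # Not well-founded: f(5) requires f(7), a higher-numbered input,
    # and f(7) leads back to f(5), so the calls never terminate
    if n == 0:
        return 1
    return n * f(f(n - 1) - 1)
```

Evaluating f(5) in Python overflows the call stack with a RecursionError.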
One elegant thing about Grassmann’s approach A Perlis’s epigram, “Recursion is the root of computation since it trades description for time,” expresses this elegance. The grade school definition of addition is prescriptive in that it gives a procedure. But Grassmann’s definition is descriptive in giving the meaning, the semantics, of
the operation. The recursive definition implicitly includes steps, and with them time, in that you need to keep
expanding the recursive calls. But it does not include them in preference to what they are about.
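Grassmann’s definition of addition, a + 0 = a and a + (b + 1) = (a + b) + 1, transcribes directly; the expansion of the recursive calls is where the implicit steps, and time, appear.

```python
def add(a, b):
    # Grassmann-style recursion: a + 0 = a, and a + (b + 1) = (a + b) + 1
    if b == 0:
        return a
    return add(a, b - 1) + 1
```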
the first sequence of numbers ever computed on an electronic computer It was computed on EDSAC, on 1949-May-06.
See (N. J. A. Sloane 2019) and (William S. Renwick 1949).
Towers of Hanoi The puzzle was invented by E Lucas in 1883; the next year H De Parville popularized it with a delightful problem statement.
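The puzzle’s well-known recursive solution: to move n disks, move the top n − 1 aside, move the bottom disk, then move the n − 1 back on top, for 2^n − 1 moves in all. A sketch:

```python
def hanoi(n, src, dst, aux):
    """Return the list of (from, to) moves transferring n disks from src to dst."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, aux, dst)    # clear the top n-1 onto the spare peg
            + [(src, dst)]                 # move the bottom disk
            + hanoi(n - 1, aux, dst, src)) # restack the n-1 on top of it
```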
hyperoperation (Goodstein 1947)
H3(4, 4) is much greater than the number of elementary particles in the universe The radius of the universe is about 45 × 10^9 light years. That’s about 10^62 Planck units. A system of much more than r^1.5 particles packed in r Planck units will collapse rapidly. So the number of particles is less than 10^92, which is about 2^305, which is much less than H3(4, 4). (Levin 2016)
a programming language having only bounded loops computes all of the primitive recursive functions (Meyer and
Ritchie 1966)
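A hypothetical illustration of the restriction (our sketch, not the Meyer–Ritchie syntax): in LOOP style every loop’s iteration count is fixed at the moment the loop is entered, and there are no while loops.

```python
def loop_mult(x, y):
    # LOOP-style multiplication: only bounded loops, whose iteration
    # counts are fixed before the loop begins; no unbounded while loops
    result = 0
    for _ in range(y):
        for _ in range(x):
            result = result + 1
    return result
```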
output only primes In fact, there is no polynomial with integer coefficients that outputs a prime for all integer
inputs, except if the polynomial is constant. This was shown in 1752 by C Goldbach. The proof is so simple, and
delightful, and not widely known, that we will give it here. Suppose p is a polynomial with integer coefficients
that on integer inputs returns only primes. Fix some n̂ ∈ N, and then p(n̂) = m̂ is a prime. Into the polynomial
plug n̂ + k · m̂ , where k ∈ Z. Expanding gives lots of terms with m̂ in them, and gathering together like terms
shows this.
p(n̂ + k · m̂) ≡ p(n̂) mod m̂
Because p(n̂) = m̂, this gives that p(n̂ + k · m̂) is a multiple of m̂. But p outputs only primes, and the only prime that is a multiple of m̂ is m̂ itself, so p(n̂ + k · m̂) = m̂ for every k ∈ Z. Then the polynomial p(x) − m̂ has infinitely many roots and so is identically zero, making p the constant polynomial.
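We can watch the congruence at work with Euler’s polynomial n² + n + 41, a stand-in example of our choosing (it returns primes for n = 0, …, 39, though of course not for all n). Taking n̂ = 0 gives m̂ = 41, and each p(n̂ + k·m̂) is then a multiple of 41.

```python
def p(n):
    # Euler's famous polynomial: prime for n = 0, ..., 39
    return n * n + n + 41

n_hat = 0
m_hat = p(n_hat)  # 41, a prime
# the congruence p(n_hat + k*m_hat) = p(n_hat) (mod m_hat) in action
multiples = [p(n_hat + k * m_hat) % m_hat for k in range(1, 6)]
```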
looking for something that is not there Goldbach’s conjecture is that every even number can be written as the sum
of at most two primes. Here are the first few instances: 2 = 2, 4 = 2 + 2, 6 = 3 + 3, 8 = 5 + 3, 10 = 7 + 3. A
natural attack is to do an unbounded computer search. As of this writing the conjecture has been tested up to
10^18.
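A sketch of such a search (the helper names are ours): for each even number we look for a witnessing pair of primes; the hunt for a counterexample is the unbounded part, since we never know in advance whether it will ever succeed.

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pair(n):
    """Return primes (p, q) with p + q = n, or None if no pair exists."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None
```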
Collatz conjecture See (Wikipedia contributors 2019a).
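The Collatz iteration itself is a one-line loop (this transcription is ours); what no one can prove is that the loop terminates for every starting value.

```python
def collatz_steps(n):
    """Count iterations of n -> n/2 (n even) or n -> 3n+1 (n odd) until n = 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```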
sin(x) may be calculated via its Taylor polynomial The Taylor series is sin(x) = x − x^3/3! + x^5/5! − x^7/7! + · · · . We might do a practical calculation by deciding that a sufficiently good approximation is to terminate that series at the x^5 term, giving a Taylor polynomial.
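A sketch of that practical calculation (the parameter name is ours): with terms=3 the sum stops at the x^5 term.

```python
import math

def sin_taylor(x, terms=3):
    """Taylor polynomial for sin(x); terms=3 truncates after the x^5 term."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total
```

Near x = 0 the truncation error is roughly x^7/7!, so the approximation is already quite good.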
C Shannon See http://www.newyorker.com/tech/elements/claude-shannon-the-father-of-the-information-age-turns-1100100.
master’s thesis See https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_and_Switching_Circuits.
kind of not gate This shows an N-type Metal Oxide Semiconductor Transistor. There are many other types.
problem of humans on Mars To get there the idea was to use a rocket ship impelled by dropping a sequence of
atom bombs out the bottom; the energy would let the ship move rapidly around the solar system. This sounds
like a crank plan but it is perfectly feasible (Brower 1983). Having been a key person in the development of the atomic bomb, von Neumann was keenly aware of its capabilities.
J Conway Conway was a magnetic person, and extraordinarily creative. See an excerpt from an excellent biography
at https://www.ias.edu/ideas/2015/roberts-john-horton-conway.
earliest computer crazes (Bellos 2014)
zero-player game See https://www.youtube.com/watch?v=R9Plq-D1gEk.
a rabbit Discovered by A Trevorrow in 1986.
For technical convenience This presentation is based on that of (Hennie 1977), (Smoryński 1991), and (Robinson
1948).
giving a programming language that computes primitive recursive functions See the history at (Brock 2020).
LOOP program (Meyer and Ritchie 1966)
Background
Deep Field movie https://www.youtube.com/watch?v=yDiD8F9ItX0
two paradoxes These are veridical paradoxes: they may at first seem absurd but we will demonstrate that they are
nonetheless true. (Wikipedia contributors 2018)
Galileo’s Paradox He did not invent it but he gave it prominence in his celebrated Discourses and Mathematical
Demonstrations Relating to Two New Sciences.
same cardinality Numbers have two natures. First, in referring to the set of stars known as the Pleiades as the
“Seven Sisters” we mean to take them as a set, not ordered in any way. In contrast, second, in referring to
the “Seven Deadly Sins,” well, clearly some of them rate higher than others. The first reference speaks to
the cardinal nature of numbers and the second is their ordinal nature. For finite numbers the two are bound
together, as Lemma 1.5 says, but for infinite numbers they differ.
was proposed by G Cantor in the 1870’s For his discoveries, Cantor was reviled by a prominent mathematician and
former professor L Kronecker as a “corrupter of youth.” That was pre-Elvis.
which is Cantor’s definition (Gödel 1964)
the most important infinite set is N Its existence is guaranteed by the Axiom of Infinity, one of the standard axioms
of Mathematics, the Zermelo–Fraenkel axioms.
due to Zeno Zeno gave a number of related paradoxes of motion. See (Wikipedia contributors 2016f), (Huggett 2010), (Bragg 2016), as well as http://www.smbc-comics.com/comic/zeno and this xkcd.
Courtesy xkcd.com
the distances x i+1 − x i shrink toward zero, there is always further to go because of the open-endedness at the left of the
interval (0 .. ∞) A modern version of exploiting open-endedness is the Thomson’s Lamp Paradox: a person
turns on the room lights and then a minute later turns them off, a half minute later turns them on again, and a
quarter minute later turns them off, etc. After two minutes, are the lights on or off? This paradox was devised
in 1954 by J F Thomson to analyze the possibility of a supertask, the completion of an infinite number of tasks.
Thomson’s answer was that it creates a contradiction: “It cannot be on, because I did not ever turn it on without
at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned
it off without at once turning it on. But the lamp must be either on or off” (Thomson 1954). See also the
discussion of the Littlewood Paradox (Wikipedia contributors 2016d).
numbers the diagonals Really, these are the anti-diagonals, since the diagonal is composed of the pairs ⟨n, n⟩ .
arithmetic series with total d(d + 1)/2 It is called the d-th triangular number.
cantor(x, y) = x + [(x + y)(x + y + 1)/2] The Fueter-Pólya Theorem says that this is essentially the only quadratic
function that serves as a pairing; see (Smoryński 1991). No one knows whether there are pairing functions that
are any other kind of polynomial.
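The pairing function transcribes directly, using exact integer arithmetic since (x + y)(x + y + 1) is always even; walking the anti-diagonals in order shows it hitting 0, 1, 2, … exactly once.

```python
def cantor(x, y):
    # x + (x + y)(x + y + 1)/2; the product is even, so // divides exactly
    return x + (x + y) * (x + y + 1) // 2

# enumerate the first few anti-diagonals d = x + y, left to right
values = [cantor(x, d - x) for d in range(4) for x in range(d + 1)]
```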
memoization The term was invented by Donald Michie (Wikipedia contributors 2016b), who among other
accomplishments was a coworker of Turing’s in the World War II effort to break the German secret codes.
assume that we have a family of correspondences f_j : N → S_j To pass from the original collection of infinitely many onto functions f_i : N → S_i to a single, uniform, family of onto functions given by f_j(i) = f(j, i), we need some version of the Axiom of Choice, perhaps Countable Choice. We omit discussion of that because it would take us far afield.
doesn’t matter much For more on “much” see (Rogers 1958).
but that we won’t make precise One problem with this scheme is that it depends on the underlying computer.
Imagine that your computer uses eight bit words. If we want the map from a natural number to a source code
and the input number is 9 then in binary that’s 1001, which is not eight bits, and to disassemble it you need to pad it out to the machine’s word length, as 00001001. Another issue is the ambiguity caused by leading 0’s, e.g. the bit string 00000000 00001001 also represents 9 but disassembles to a two-operation source. We could
address this by imagining that the operation with instruction code 00000000 is NOP and then disallow source
code that starts with such an instruction (reasoning that starting a serial program with fewer NOP’s won’t change
its input-output behavior), except for the source consisting of a single NOP. But we are getting into the weeds
of computer architecture here, which is not where we want to be, so we take this numbering scheme only
informally.
adding the instruction q_{j+k}BBq_{j+k} This is essentially what a compiler calls ‘unreachable code’, in that it is not a state that the machine will ever be in.
central to the entire Theory of Computation The classic text (Rogers 1987) says, “It is not inaccurate to say that our
theory is, in large part, a ‘theory of diagonalization’.”
This technique is diagonalization The argument just sketched is often called Cantor’s diagonal proof, although it
was not Cantor’s original argument for the result, and although the argument style is not due to Cantor but
instead to Paul du Bois-Reymond. The fact that scientific results are often attributed to people who are not
their inventor is Stigler’s law of eponymy, so named because it wasn’t invented by Stigler (who attributes it to Merton). In mathematics this is called Boyer’s Law, after Boyer, who didn’t invent it either. (Wikipedia contributors 2015b)
Musical Chairs It starts with more children than chairs. Some music plays and the children walk around the chairs.
But the music stops suddenly and each child tries to sit, leaving someone without a chair. That child has to
leave the game, a chair is removed, and the game proceeds.
so many real numbers This is a Pigeonhole Principle argument.
there are jobs that no computer can do To a person with training in programming, where all of the focus is on
getting the computer to do things, the existence of jobs that cannot be done can be a surprise, perhaps even
a shock. One thing that it points out is that the topics introduced here are nontrivial, that formalizing the
definition of mechanical computation and the results about infinity leads to interesting conclusions.
Your friend is confused about the diagonal argument From (SE author Kaktus and various others 2019).
ENIAC, reconfigure by rewiring. Jean Jennings (left), Marlyn Wescoff (center), and Ruth Lichterman program the
ENIAC, circa 1946. U. S. Army Photo
A pattern in technology is for jobs done in hardware to migrate to software One story that illustrates the naturalness
of this involves the English mathematician C Babbage, and his protégée A Lovelace. In 1812 Babbage was
developing tables of logarithms. These were calculated by computers — the word then current for the people
who computed them by hand. To check the accuracy he had two people do the same table and compared. He
was annoyed at the number of discrepancies and had the idea to build a machine to do the computing. He got a
government grant to design and construct a machine called the difference engine, which he started in 1822.
This was a single-purpose device, what we today would call a calculator. One person who became interested in
the computations was an acquaintance of his, Lovelace (who at the time was named Byron, as she was the
daughter of the poet Lord Byron).
However, this machine was never finished because Babbage had the thought to make a device that would be
programmable, and that was too much of a temptation. Lovelace contributed an extensive set of notes on a
proposed new machine, the analytical engine, and has become known as the first programmer.
controlled by paper cards It weaves with hooks whose positions, raised or lowered, are determined by holes
punched in the cards.
have the same output behavior A technical point: Turing machines have a tape alphabet. So a universal machine’s
input or output can only involve symbols that it is defined as able to use. If another machine has a different
tape alphabet then how can the universal machine simulate it? As usual, we define things so that the universal
machine manipulates representations of the other machine’s alphabet. This is similar to the way that an
everyday computer represents decimals using binary.
flow chart Flowcharts are widely used to sketch algorithms; here is one from XKCD.
Courtesy xkcd
won’t be a physically-realizable discrete and deterministic mechanism Turing introduced oracles in his PhD thesis. He
said, “We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.”
(Turing 1938)
magic smoke See (Wikipedia contributors 2017f).
we will instead use Church’s Thesis For a full treatment see (Rogers 1987).
the notion of partial computable function seems to have an in-built defense against diagonalization (Odifreddi 1992),
p 152.
this machine’s name is its behavior Nominative determinism is the theory that a person’s name has some influence
over what they do with their life. Examples are: the sprinter Usain Bolt, the US weatherman Storm Fields, the
baseball player Prince Fielder, and the Lord Chief Justice of England and Wales named Igor Judge, I Judge. See
https://en.wikipedia.org/wiki/Nominative_determinism.
considered mysterious, or at any rate obscure For example, “The recursion theorem . . . has one of the most
unintuitive proofs where I cannot explain why it works, only that it does.” (Fortnow and Gasarch 2002)
Once upon a time This mathematical fable came from David Hilbert in 1924. It was popularized by George Gamow
in One, Two, Three . . . Infinity. (Kragh 2014).
Napoleon’s downfall in the early 1800’s See (Wikipedia contributors 2017d).
period of prosperity and peace See (Wikipedia contributors 2017i).
A A Michelson, who wrote in 1899, “The more important fundamental laws and facts of physical science have all been
discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence
of new discoveries is exceedingly remote.” Michelson was a major figure. From 1901 to 1903 he was president of
the American Physical Society. In 1910–1911 he was president of the American Association for the Advancement
of Science and from 1923–1927 he was president of the National Academy of Sciences. In 1907 he received the
Copley Medal from the Royal Society in London, and the Nobel Prize. He remains well known today for the
Michelson–Morley experiment that tried to detect the presence of aether, the hypothesized medium through
which light waves travel.
working out the rules of a game by watching it being played See https://www.youtube.com/watch?v=o1dgrvlWML4
many observers thought that we basically had got the rules An example is that Max Planck was advised not to
go into physics by his professor, who said, “in this field, almost everything is already discovered, and all that
remains is to fill a few unimportant holes.” (Wikipedia contributors 2017)
the discovery of radiation This happened in 1896, before Michelson’s statement. Often the significance of things takes time to become apparent.
“everything is relative.” Of course, the history around Einstein’s work is vastly more complex and subtle. But we
are speaking of the broad understanding, not of the truth.
loss of certainty This phrase is the title of a famous popular book on mathematics by M Kline. The book is a fun and thought-provoking read. Also thought-provoking are some criticisms of the book; (Wikipedia contributors 2019b) is a good introduction to both.
the development of a fetus is that it basically just expands The issue was whether the fetus began preformed or as a
homogeneous mass, see (Maienschein 2017). Today we have similar questions about the Big Bang — we are
puzzled to explain how a mathematical point, which is without internal structure and entirely homogeneous,
could develop into the very non-homogeneous universe that we see today.
potential infinite regress This line of thinking often depends on the suggestion that all organisms were created at
the same time, that they have existed since the beginning of the posited creation.
discovery by Darwin and Wallace of descent with modification through natural selection Darwin wrote in his
autobiography, “The old argument of design in nature, as given by Paley, which formerly seemed to me so
conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for
instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a
door by man. There seems to be no more design in the variability of organic beings and in the action of natural
selection, than in the course which the wind blows. Everything in nature is the result of fixed laws.”
the car is in some way less complex than the robot This is an information theoretic analog of the Second Law
of Thermodynamics. E Musk has expressed something of the same sentiment, “The extreme difficulty of
scaling production of new technology is not well understood. It’s 1000% to 10,000% harder than making
a few prototypes. The machine that makes the machine is vastly harder than the machine itself.” See
https://twitter.com/elonmusk/status/1308284091142266881.
self-reference ‘Self-reference’ describes something that refers to itself. The classic example is the Liar paradox, the statement attributed to the Cretan Epimenides, “All Cretans are liars.” Because he is Cretan we take the statement to be an utterance about utterances by him, that is, to be about itself. If we suppose that the statement is true then it asserts that anything he says is false, so the statement is false. But if we suppose that it is false then we take that he is saying the truth, that all his statements are false. It’s a paradox, meaning that the reasoning seems locally sound but it leads to a global impossibility.
This is related to Russell’s paradox, which lies at the heart of the diagonalization technique, that if we define
the collection of sets R = { S | S ∉ S } then R ∈ R holds if and only if R ∉ R holds.
Self-reference is obviously related to recurrence. You see it sometimes pictured as an infinite recurrence, as
here on the front of a chocolate product.
Because of this product, having a picture contain itself is sometimes known as the Droste effect. See also https://www.smithsonianmag.com/science-nature/fresh-off-the-3d-printer-henry-segermans-mathematical-sculptures-2894574/?no-ist
Besides the Liar paradox there are many others. One is Quine’s paradox, a sentence that asserts its own
falsehood.
“Yields falsehood when preceded by its quotation”
yields falsehood when preceded by its quotation.
If this sentence were false then it would be saying something that is true. If this sentence were true then what it
says would hold and it would be not true.
A wonderful popular book exploring these topics and many others is (Hofstadter 1979).
quine Named for the philosopher Willard Van Orman Quine.
for routines to have access to their code Introspection is the ability to inspect code in the system, such as to inspect
the type of objects. Reflection is the ability to make modifications at runtime.
We will show how a routine can know its source This is derived from the wonderful presentation in (Sipser 2013).
The verb ‘to quine’ Invented by D Hofstadter.
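For the record, here is a Python quine built on the standard %r trick (this particular construction is ours, not from the text): the string s serves both as the program’s data and, via its repr, as the quoted copy of itself.

```python
# A quine: running this program prints its own source exactly.
# %r inserts repr(s), reproducing the quoted first line; %% prints %.
s = 's = %r\nprint(s %% s)'
print(s % s)
```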
which n -state Turing Machine leaves the most 1’s after halting R H Bruck famously wrote (R H Bruck n.d.), “I might
compare the high-speed computing machine to a remarkably large and awkward pencil which takes a long time
to sharpen and cannot be held in the fingers in the usual manner so that it gives the illusion of responding to my
thoughts, but is fitted with a rather delicate engine and will write like a mad thing provided I am willing to let
it dictate pretty much the subjects on which it writes.” The Busy Beaver machine is the maddest writer possible.
Radó noted in his 1962 paper This paper (Radó 1962) is exceptionally clear and interesting.
Σ(n) is unknowable See (Aaronson 2012a). See also https://www.quantamagazine.org/the-busy-beaver-game-illuminates-the-fundamental-limits-of-math-20201210/.
a 7918-state Turing machine The number of states needed has since been reduced. As of this writing it is 1919.
the standard axioms for Mathematics This is ZFC, the Zermelo–Fraenkel axioms with the Axiom of Choice. (In
addition, they also took the hypothesis of the Stationary Ramsey Property.)
take the floor Let the n-th triangle number be t(n) = 0 + 1 + · · · + n = n(n + 1)/2. The function t is monotonically increasing and there are infinitely many triangle numbers. Thus for every natural number c there is a unique triangle number t(n) that is maximal so that c = t(n) + k for some k ∈ N. Because t(n + 1) = t(n) + n + 1, we see that k < n + 1, that is, k ≤ n. Thus, to compute the diagonal number d from the Cantor number c of a pair, we have (1/2)d(d + 1) ≤ c < (1/2)(d + 1)(d + 2). Applying the quadratic formula to the left and right halves gives (1/2)(−3 + √(1 + 8c)) < d ≤ (1/2)(−1 + √(1 + 8c)). Taking α = (1/2)(−1 + √(1 + 8c)) gives that d ∈ (α − 1 .. α], so that d = ⌊α⌋. (Scott 2020)
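That floor computation inverts the pairing function. A sketch using math.isqrt, which keeps the square root in exact integer arithmetic (the standard identity ⌊(⌊√m⌋ − 1)/2⌋ = ⌊(√m − 1)/2⌋ makes the integer version safe); cantor is repeated so the sketch is self-contained.

```python
import math

def cantor(x, y):
    return x + (x + y) * (x + y + 1) // 2

def uncantor(c):
    """Invert the pairing: recover the pair (x, y) from its Cantor number c."""
    # d = floor(alpha), alpha = (-1 + sqrt(1 + 8c)) / 2, in exact arithmetic
    d = (math.isqrt(1 + 8 * c) - 1) // 2
    t = d * (d + 1) // 2          # the d-th triangle number
    x = c - t                     # position along the anti-diagonal
    return (x, d - x)
```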
let’s extend to tuples of any size See https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it.
Languages
having elephants move to the left side of a road or to the right Less fancifully, we could be making a Turing machine
out of LEGO’s and want to keep track by sliding a block from one side of a column to the other. Or, we could
use an abacus.
we could translate any such procedure While a person may quite sensibly worry that elephants could be not just on
the left side or the right, but in any of the continuum of points in between, we will make this assertion without
more philosophical analysis than by just referring to the discrete nature of our mechanisms (as Turing basically
did). That is, we take it as an axiom.
finite set { 1000001, 1100001 } Although it looks like two strings plucked from the air, the language is not without
sense. The bitstring 1000001 represents capital A in the ASCII encoding, while 1100001 is lower case a. The
American Standard Code for Information Interchange, ASCII, is a widely used, albeit quite old, way of encoding
character information in computers. The most common modern character encoding is UTF-8, which extends
ASCII. For the history see https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt.
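We can confirm those two bitstrings directly (a quick check of ours, rendering the 7-bit ASCII codes):

```python
# 7-bit ASCII codes as bitstrings: 'A' is 65, 'a' is 97
a_upper = format(ord('A'), '07b')
a_lower = format(ord('a'), '07b')
```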
palindrome Sometimes people tease Psychology by labeling it the study of college freshmen because so many
studies start, roughly, “we put a bunch of college freshmen in a room, lied to them about what we were doing,
and . . . ” In the same way, Theory of Computing sometimes seems like the study of palindromes.
words from English that are palindromes Some people like to move beyond single word palindromes to make
sentence-length palindromes that make some sense. Some of the more famous are: (1) supposedly the first
sentence ever uttered, “Madam, I’m Adam” (2) Napoleon’s lament, “Able was I ere I saw Elba” and (3) “A man, a
plan, a canal: Panama”, about Theodore Roosevelt.
In practice a language is usually given by rules Linguists started formalizing the description of language, including
phrase structure, at the start of the 1900’s. Meanwhile, string rewriting rules as formal, abstract systems were
introduced and studied by mathematicians including Axel Thue in 1914, Emil Post from the 1920’s through the
1940’s and Turing in 1936. Noam Chomsky, while teaching linguistics to students of information theory at MIT,
combined linguistics and mathematics by taking Thue’s formalism as the basis for the description of the syntax
of natural language. (Wikipedia contributers 2017e)
“the red big barn” sounds wrong. Experts vary on the exact rules but one source gives (article) + number +
judgment/attitude + size, length, height + age + color + origin + material + purpose + (noun), so that “big
red barn” is size + color + noun, as is “little green men.” This is called the Royal Order of Adjectives; see
http://english.stackexchange.com/a/1159. A person may object by citing “big bad wolf” but it turns out
there is another, stronger, rule that if there are three words then they have to go I-A-O and if there are two
words then the order has to be I followed by either A or O. Thus we have tick tock but not tock tick. Similarly
for tic-tac-toe, mishmash, King Kong, or dilly dally.
very strict rules Everyone who has programmed has had a compiler chide them about a syntax violation.
grammars are the language of languages. From Matt Swift, http://matt.might.net/articles/grammars-bnf-
ebnf/.
this grammar Taken from https://en.wikipedia.org/wiki/Formal_grammar.
dangling else See https://en.wikipedia.org/wiki/Dangling_else.
postal addresses. Adapted from https://en.wikipedia.org/wiki/BackusNaur_Form.
Recall Turing’s prototype computer In this book we stick to grammars where each rule head is a single nonterminal.
That greatly restricts the languages that we can compute. More general grammars can compute more, including
every set that can be decided by a Turing machine.
often state their problems For instance, see the blogfeed for Theoretical Computer Science http://cstheory-
feed.org/ (Various authors 2017)
represent a graph in a computer Example 3.2 makes the point that a graph is about the connections between vertices,
not about how it is drawn. This graph representation via a matrix also illustrates that point because it is, after
all, not drawn.
a standard way to express grammars One factor influencing its adoption was a letter that D Knuth wrote to the
Communications of the ACM (D. E. Knuth 1964). He listed some advantages over the grammar-specification
methods that were then widely used. Most importantly, he contrasted BNF’s ‘<addition operator>’ with ‘A’,
saying that the difference is a great addition to “the explanatory power of a syntax.” He also proposed the name
Backus Naur Form.
some extensions for grouping and replication The best current standard is https://www.w3.org/TR/xml/.
Time is a difficult engineering problem One complication of time, among many, is leap seconds. The Earth is
constantly undergoing deceleration caused by the braking action of the tides. The average deceleration of the
Earth is roughly 1.4 milliseconds per day per century, although the exact number varies from year to year
depending on many factors, including major earthquakes and volcanic eruptions. To ensure that atomic clocks
and the Earth’s rotational time do not differ by more than 0.9 seconds, occasionally an extra second is added to
civil time. This leap second can be either positive or negative depending on the Earth’s rotation — a negative
leap second would give a minute with only 59 seconds, and a positive one gives a minute with 61.
Adding to the confusion is that the changes in rotation are uneven and we cannot predict leap seconds far into
the future. The International Earth Rotation Service publishes bulletins that announce leap seconds with a
few weeks warning. Thus, there is no way to determine how many seconds there will be between the current
instant and ten years from now. Since the first leap second in 1972, all leap seconds have been positive and
there were 23 leap seconds in the 34 years to January 2006. (U.S. Naval Observatory 2017)
RFC 3339 (Klyne and Newman 2002)
strings such as 1958-10-12T23:20:50.52Z This format has a number of advantages including human readability,
that if you sort a collection of these strings then earlier times will come earlier, simplicity (there is only one
format), and that they include the time zone information.
a BNF grammar Some notes: (1) Coordinated Universal Time, the basis for civil time, is often called UTC, but is
sometimes abbreviated Z, (2) years are four digits to prevent the Y2K problem (Encyclopædia Britannica 2017),
(3) the only month numbers allowed are 01–12 and in each month only some day numbers are allowed, and
(4) the only time hours allowed are 00–23, minutes must be in the range 00–59, etc. (Klyne and Newman 2002)
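The sorting property claimed above is easy to demonstrate. A minimal sketch, standard library only; it handles just the 'Z'-suffixed form shown in the text, while a full RFC 3339 parser also accepts numeric UTC offsets:

```python
from datetime import datetime, timezone

# Parse the 'Z'-suffixed RFC 3339 form used in the text.
def parse(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

stamps = ["1996-12-19T16:39:57.00Z", "1958-10-12T23:20:50.52Z", "2002-10-02T10:00:00.05Z"]
# Sorting the raw strings agrees with sorting by the times they denote.
print(sorted(stamps) == sorted(stamps, key=parse))   # True
print(parse("1958-10-12T23:20:50.52Z").microsecond)  # 520000
```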
Automata
what jobs can be done by a machine with bounded memory From Rabin, Scott, Finite Automata and Their Decision
Problems, 1959: Turing machines are widely considered to be the abstract prototype of digital computers; workers
in the field, however, have felt more and more that the notion of a Turing machine is too general to serve as an
accurate model of actual computers. It is well known that even for simple calculations it is impossible to give an a
priori upper bound on the amount of tape a Turing machine will need for any given computation. It is precisely this
feature that renders Turing’s concept unrealistic. In the last few years the idea of a finite automaton has appeared
in the literature. These are machines having only a finite number of internal states that can be used for memory
and computation. The restriction on finiteness appears to give a better approximation to the idea of a physical
machine. Of course, such machines cannot do as much as Turing machines, but the advantage of being able to
compute an arbitrary general recursive function is questionable, since very few of these functions come up in practical
applications.
transition function ∆ : Q × Σ → Q Some authors allow the transition function to be partial. That is, some authors
allow that for some state-symbol pairs there is no next state. This choice by an author is a matter of convenience,
as for any such machine you can create an error state q_error or dead state, that is not an accepting state and that
transitions only to itself, and send all such pairs there. This transition function is total, and the new machine
has the same collection of accepted strings as the old.
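The completion construction can be sketched in a few lines (the names `complete` and `q_error` are ours):

```python
# Route every missing (state, symbol) pair to a dead state that loops to
# itself; the accepted language is unchanged.
def complete(delta, states, alphabet, dead="q_error"):
    total = dict(delta)
    for q in list(states) + [dead]:
        for a in alphabet:
            total.setdefault((q, a), dead)
    return total

# Partial machine over {0, 1} that only has a rule for reading 0 in q0.
delta = {("q0", "0"): "q0"}
total = complete(delta, {"q0"}, {"0", "1"})
print(total[("q0", "1")], total[("q_error", "1")])  # q_error q_error
```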
Unicode While in the early days of computers characters could be encoded with standards such as ASCII, which
includes only upper and lower case unaccented letters, digits, a few punctuation marks, and a few control
characters, today’s global interconnected world needs more. The Unicode standard assigns a unique number
called a code point to every character in every language (to a fair approximation). See (Wikipedia contributers
2017k).
how phone numbers used to be handled in North America See the description of the North America Numbering Plan
(Wikipedia contributers 2017g).
same-area local exchange Initially, large states (those divided into multiple numbering plan areas) were assigned
area codes with a 1 in the second position, while areas that covered entire states or provinces got codes with 0
as the middle digit. This was abandoned by the early 1950s. (Wikipedia contributers 2017g).
switching with physical devices The devices to do the switching were invented in 1889 by an undertaker whose
competitor’s wife was the local telephone operator and routed calls to her husband’s business. (Wikipedia
contributers 2017b)
Alcuin of York (735–804) See https://www.bbc.co.uk/programmes/m000dqy8.
a wolf, a goat, and a bundle of cabbages This translation is from A Raymond, from the University of Washington.
Traveling Salesman problem See https://nbviewer.jupyter.org/url/norvig.com/ipython/TSP.ipynb.
roads in the US lower forty eight states
See https://wiki.openstreetmap.org/wiki/TIGER.
no-state A person can wonder about no-state. Where is it, exactly? We can think that it is like what happens if
you write a program with a sequence of if-then statements and forget to include an else. Obviously a computer
goes somewhere, the instruction pointer points to some next address, but what happens is not sensible in terms
of the model you’ve written.
Alternatively, the wonderful book (Hofstadter 1979) describes a place named Tumbolia, which is where holes
go when they are filled (also where your lap goes when you stand). Perhaps the machines go there.
amb(S, R 0 , R 1 ... R n−1 ) The name amb abbreviates ‘ambiguous function’. Here is a small example. Essentially
Amb(x,y,z) splits the computation into three possible futures: a future in which the value x is yielded, a future
in which the value y is yielded, and a future in which the value z is yielded. The future which leads to a
successful subsequent computation is chosen. The other “parallel universes” somehow go away. (Amb called
with no arguments fails.) The output is 2 4 because Amb(1,2,3) correctly chooses the future in which x has
value 2, Amb(7,6,4,5) chooses 4, and consequently Amb(x*y = 8) produces a success.
These were described by John McCarthy in (McCarthy 1963). “Ambiguous functions are not really functions.
For each prescription of values to the arguments the ambiguous function has a collection of possible values. An
example of an ambiguous function is less(n) defined for all positive integer values of n . Every non-negative
integer less than n is a possible value of less(n). First we define a basic ambiguity operator amb(x, y) whose
possible values are x and y when both are defined: otherwise, whichever is defined. Now we can define less(n)
by less(n) = amb(n − 1, less(n − 1)).”
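The example's semantics can be mimicked by exhaustive search over the possible futures. A minimal sketch (the name `amb_run` is ours, and this is not the book's Scheme implementation; real implementations backtrack rather than enumerate every combination):

```python
from itertools import product

# Try each "future" of Amb(1,2,3) and Amb(7,6,4,5) until the
# requirement x*y = 8 succeeds; with no viable future, amb fails.
def amb_run():
    for x, y in product((1, 2, 3), (7, 6, 4, 5)):
        if x * y == 8:
            return x, y
    return None

print(amb_run())  # (2, 4)
```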
a demon The term ‘demon’ comes from Maxwell’s demon. This is a thought experiment created in 1867 by the
physicist J C Maxwell, about the second law of thermodynamics, which says that it takes energy to raise the
temperature of a sealed system. Maxwell imagined a chamber of gas with a door controlled by an all-knowing
demon. When the demon sees a slow-moving gas molecule approaching, it opens the door and lets
that molecule out of the chamber, thereby raising the chamber’s temperature without applying any external
heat. See (Wikipedia contributors 2019c).
Pronounced KLAY-nee His son, Ken Kleene, wrote, “As far as I am aware this pronunciation is incorrect in all known
languages. I believe that this novel pronunciation was invented by my father.” (Computing 2017)
mathematical model of neurons (Wikipedia contributers 2017c)
have a vowel in the middle Most speakers of American English cite the vowels as ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’. See (Bigham
2014).
before and after pictures This diagram is derived from (Hopcroft, Motwani, and Ullman 2001).
The fact that we can describe these languages in so many different ways (SE author David Richerby 2018).
just list all the cases In practice the suggestion in the first paragraph to list all the cases may not be reasonable.
For example, there are finitely many people and each has finitely many active phone numbers so the set of all
currently-active phone numbers is a regular language. But constructing a Finite State machine for it is silly. In
addition, a finite regular language doesn’t have to be large for it to be difficult, in a sense. Take Goldbach’s
conjecture, that every even number greater than 2 is the sum of two primes, as in 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,
. . . (see https://en.wikipedia.org/wiki/Goldbach%27s_conjecture). Computer testing shows that this
pattern continues to hold up to very large numbers but no one knows if it is true for all evens. Now consider the
set consisting of the string σ ∈ { 0, ... 9 }∗ representing the smallest even number that is not the sum of two
primes. This set is finite since it has either one member or none. But while that set is tiny, we don’t know what
it contains.
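The conjecture is easy to exercise for small cases. A brute-force sketch (helper names are ours):

```python
# Check Goldbach's conjecture for small even numbers by brute force.
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def goldbach_pair(n):
    # smallest prime p with n - p also prime, or None
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

print(goldbach_pair(28))                                 # (5, 23)
print(all(goldbach_pair(n) for n in range(4, 2000, 2)))  # True
```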
performing that operation on its members always yields another member Familiar examples are that adding two
integers always gives an integer so the integers are closed under the operation of addition, and that squaring an
integer always results in an integer so that the integers are closed under squaring.
the machine accepts at least one string of length k , where n ≤ k < 2n This gives an algorithm that inputs a Finite
State machine and determines, in a finite time, if it recognizes an infinite language.
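For tiny machines the algorithm can be sketched by brute force over the length window n ≤ k < 2n (helper names are ours):

```python
from itertools import product

# A DFA with n states recognizes an infinite language iff it accepts
# some string of length k with n <= k < 2n.
def accepts(delta, start, finals, s):
    q = start
    for ch in s:
        q = delta[(q, ch)]
    return q in finals

def recognizes_infinite(delta, states, alphabet, start, finals):
    n = len(states)
    return any(accepts(delta, start, finals, s)
               for k in range(n, 2 * n)
               for s in product(alphabet, repeat=k))

# Two states over {a}, accepting even-length strings: (aa)*, infinite.
delta = {("q0", "a"): "q1", ("q1", "a"): "q0"}
print(recognizes_infinite(delta, {"q0", "q1"}, ("a",), "q0", {"q0"}))  # True
```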
be aware that another algorithm See (Knuutila 2001).
For the third This is derived from the presentation in (Hopcroft, Motwani, and Ullman 2001).
\d We shall ignore cases of non-ASCII digits, that is, cases outside 0–9.
ZIP codes ZIP stands for Zone Improvement Plan. The system has been in place since 1963 so it, like the music
movement called ‘New Wave’, is an example of the danger of naming your project something that will become
obsolete if that project succeeds.
a colon and two forward slashes The inventor of the World Wide Web, T Berners Lee, has admitted that the two
slashes don’t have a purpose (Firth 2009).
more power than the theoretical regular expressions that we studied earlier Omitting this power, and keeping the
implementation in sync with the theory, has the advantage of speed. See (Cox 2007).
valid email addresses This expression follows the RFC 822 standard. The full listing is at
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html. It is due to Paul Warren who did not write it by hand but
instead used a Perl program to concatenate a simpler set of regular expressions that relate directly to the
grammar defined in the RFC. To use the regular expression, should you be so reckless, you would need to
remove the formatting newlines.
J Zawinski The post is from alt.religion.emacs on 1997-Aug-12. For some reason it keeps disappearing from
the online archive.
Now they have two problems. A classic example is trying to use regular expressions to parse significant parts of an
HTML document. See (bobnice 2009).
regex golf See https://alf.nu/RegexGolf, and https://nbviewer.jupyter.org/url/norvig.com/ipython/
xkcd1313.ipynb.
Complexity
A natural next step is to look to do jobs efficiently S Aaronson states it more provocatively as, “[A]s computers
became widely available starting in the 1960s, computer scientists increasingly came to see computability
theory as not asking quite the right questions. For, almost all the problems we actually want to solve turn out
to be computable in Turing’s sense; the real question is which problems are efficiently or feasibly computable.”
(Aaronson 2011)
A Karatsuba See https://en.wikipedia.org/wiki/Anatoly_Karatsuba.
clever algorithm The idea is: let k = ⌈n/2⌉ and write x = x1·2^k + x0 and y = y1·2^k + y0 (so for instance,
678 = 21 · 2^5 + 6 and 42 = 1 · 2^5 + 10). Then xy = A · 2^(2k) + B · 2^k + C where A = x1y1, and B = x1y0 + x0y1,
and C = x0y0 (for example, 28 476 = 21 · 2^10 + 216 · 2^5 + 60). The multiplications by 2^(2k) and 2^k are just
bit-shifts to known locations independent of the values of x and y, so they don’t affect the time much. But the
two multiplications for B seem to remove all the advantage and still give n^2 time. However, Karatsuba noted that
B = (x0 + x1) · (y0 + y1) − A − C. Boom: done. Just one multiplication.
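A direct transcription of the identity, as a sketch (recursing on halves of the bit length; this follows the formula in the note, not any particular production implementation):

```python
# Karatsuba multiplication of nonnegative integers.
def karatsuba(x, y):
    if x < 2 or y < 2:
        return x * y
    k = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> k, x & ((1 << k) - 1)      # x = x1*2^k + x0
    y1, y0 = y >> k, y & ((1 << k) - 1)
    a = karatsuba(x1, y1)                     # A = x1*y1
    c = karatsuba(x0, y0)                     # C = x0*y0
    b = karatsuba(x0 + x1, y0 + y1) - a - c   # the one extra multiplication
    return (a << (2 * k)) + (b << k) + c

print(karatsuba(678, 42))  # 28476
```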
table below shows why This table is adapted from (Garey and Johnson 1979).
there are 3.16 × 10^7 seconds in a year The easy way to remember this is the bumper sticker slogan by Tom Duff
from Bell Labs: “π seconds is a nanocentury.”
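A quick check of the slogan's arithmetic:

```python
# A year is about pi * 10^7 seconds, so a nanocentury is about pi seconds.
seconds_per_year = 365.25 * 24 * 3600
print(round(seconds_per_year / 1e7, 2))         # 3.16, close to pi
print(round(100 * seconds_per_year * 1e-9, 2))  # a nanocentury in seconds: 3.16
```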
very, very much larger than polynomial growth According to an old tale from India, the Grand Vizier Sissa Ben
Dahir was granted a wish for having invented chess for the Indian King, Shirham. Sissa said, “Majesty, give
me a grain of wheat to place on the first square of the board, and two grains of wheat to place on the second
square, and four grains of wheat to place on the third, and eight grains of wheat to place on the fourth, and so
on. Oh, King, let me cover each of the 64 squares of the board.”
“And is that all you wish, Sissa, you fool?” exclaimed the astonished King.
“Oh, Sire,” Sissa replied, “I have asked for more wheat than you have in your entire kingdom. Nay, for more
wheat that there is in the whole world, truly, for enough to cover the whole surface of the earth to the depth of
the twentieth part of a cubit.”
Sissa has the right idea but his arithmetic is slightly off. A cubit is the length of a forearm, from the tip of
the middle finger to the bottom of the elbow, so perhaps twenty inches. The geometric series formula gives
1 + 2 + 4 + · · · + 2^63 = 2^64 − 1 = 18 446 744 073 709 551 615 ≈ 1.84 × 10^19 grains of wheat. The surface area
of the earth, including oceans, is 510 072 000 square kilometers. There are 10^10 square centimeters in each
square kilometer so the surface of the earth is 5.10 × 10^18 square centimeters. That’s between three and four
grains of wheat on every square centimeter of the earth. Not wheat an inch deep, but still a lot.
Another way to get a sense of the amount of wheat is: there are about 7.5 billion people on earth so it is on the
order of 10^9 grains of wheat for each person in the world. There are about 1 000 000 = 10^6 grains of wheat in a
bushel. In sum, a few thousand bushels for each person.
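The figures are easy to recheck (surface area and population taken from the text, and a rough 10^6 grains per bushel assumed):

```python
# Rechecking the chessboard arithmetic.
grains = 2**64 - 1                        # total grains over the 64 squares
surface_cm2 = 510_072_000 * 10**10        # 1 km^2 = 10^10 cm^2
print(grains)                             # 18446744073709551615
print(round(grains / surface_cm2, 1))     # 3.6 grains per square centimeter
print(round(grains / 7.5e9 / 10**6))      # 2460 bushels per person
```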
Cobham’s thesis Credit for this goes to both A Cobham and J Edmonds, separately; see (Cobham 1965) and
(Edmunds 1965).
Cobham’s paper starts by asking, “is it harder to multiply than to add?”, a question that we still cannot
answer. Clearly we can add two n-bit numbers in O(n) time, but we don’t know whether we can multiply in
linear time.
Cobham then goes on to point out the distinction between the complexity of a problem and the running time
of a particular algorithm to solve that problem, and notes that many familiar functions, such as addition,
multiplication, division, and square roots, can all be computed in time “bounded by a polynomial in the lengths
of the numbers involved.” He suggests we consider the class of all functions having this property.
As for Edmunds, in a “Digression” he writes: “An explanation is due on the use of the words ‘efficient algorithm.’
According to the dictionary, ‘efficient’ means ‘adequate in operation or performance.’ This is roughly the
meaning I want — in the sense that it is conceivable for [this problem] to have no efficient algorithm. . . . There
is an obvious finite algorithm, but that algorithm increases in difficulty exponentially with the size of the graph.
It is by no means obvious whether or not there exists an algorithm whose difficulty increases only algebraically
with the size of the graph . . . If only to motivate the search for good, practical algorithms, it is important to
realize that it is mathematically sensible even to question their existence.”
tractable Another word that you can see in this context is ‘feasible’. Some authors use them to mean the same
thing, roughly that we can solve reasonably-sized problem instances using reasonable resources. But some
authors use ‘feasible’ to have a different connotation, for instance explicitly disallowing inputs that are too large,
such as having too many bits to fit in the physical universe. The word ‘tractable’ is more standard and works
better with the definition that includes the limit as the input size goes to infinity, so here we stick with it.
slower by nine extra assignments Assuming that the compiler does not optimize it out of the loop.
The definition of Big O ignores constant factors This discussion originated as (SE author babou and various others
2015).
the order of magnitude of these constants For a rough idea of what these may be, here are some numbers that every
programmer should know.
S A T O R
A R E P O
T E N E T
O P E R A
R O T A S
It appears in many places in the Roman Empire, often as graffiti. For instance, it was found in the ruins of
Pompeii. Like many word game solutions it sacrifices comprehension for form but it is a perfectly grammatical
sentence that translates as something like, “The farmer Arepo works the wheel with effort.”
popularized as a toy It was invented by Noyes Palmer Chapman, a postmaster in Canastota, New York. As early as
1874 he showed friends a precursor puzzle. By December 1879 copies of the improved puzzle were circulating
in the northeast and students in the American School for the Deaf and others started manufacturing it. It
became popular as the “Gem Puzzle.” Noyes Chapman had applied for a patent in February, 1880. By that time
the game had become a craze in the US, somewhat like Rubik’s Cube a century later. It was also popular in
Canada and Europe. See (Wikipedia contributers 2017a)
we know of no efficient algorithm to find divisors An effort in 2009 to factor a 768-bit number (232 digits) used
hundreds of machines and took two years. The researchers estimated that a 1024-bit number would take about
a thousand times as long.
Factoring seems, as far as we know today, to be hard Finding factors has for many years been thought hard. For
instance, a number is called a Mersenne prime if it is a prime number of the form 2^n − 1. They are named after
M Mersenne, a French friar and important figure in the early sharing of scientific results, who studied them
in the early 1600’s. He observed that if n is prime then 2^n − 1 may be prime, for instance with n = 3, n = 7,
n = 31, and n = 127. He suspected that others of that form were also prime, in particular n = 67.
On 1903-Oct-31 F N Cole, then Secretary of the American Mathematical Society, made a presentation at a
math meeting. When introduced, he went to the chalkboard and in complete silence computed 2^67 − 1 =
147 573 952 589 676 412 927. He then moved to the other side of the board, wrote 193 707 721 times
761 838 257 287, and worked through the calculation, finally finding equality. When he was done Cole returned
to his seat, having not uttered a word in the hour-long presentation. His audience gave him a standing ovation.
Cole later said that finding the factors had been a significant effort, taking “three years of Sundays.”
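Today Cole's chalkboard equality is instant to verify:

```python
# Verifying Cole's 1903 computation.
m67 = 2**67 - 1
print(m67)                              # 147573952589676412927
print(m67 == 193707721 * 761838257287)  # True
```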
Platonic solids See (Wikipedia contributers 2017j).
as shown Some PDF readers cannot do opacity, so you may not see the entire Hamiltonian path.
Six Degrees of Kevin Bacon One night, three college friends, Brian Turtle, Mike Ginelli, and Craig Fass, were
watching movies. Footloose was followed by Quicksilver, and between was a commercial for a third Kevin Bacon
movie. It seemed like Kevin Bacon was in everything! This prompted the question of whether Bacon had ever
worked with De Niro. The answer at that time was no, but De Niro was in The Untouchables with Kevin Costner,
who was in JFK with Bacon. The game was born. It became popular when they wrote to Jon Stewart about it
and appeared on his show. (From (Blanda 2013).) See https://oracleofbacon.org/.
uniform family of tasks From (Jones 1997).
There is no widely-accepted formal definition of ‘algorithm’ This discussion derives from (Pseudonym 2014).
default interpretation of ‘problem’ Not every computational problem is naturally expressible as a language decision
problem. Consider the task of sorting the characters of strings into ascending order. We could try to express it as
the language of sorted strings { σ ∈ Σ∗ | σ is sorted }. But this does not require that we find a good way to sort
an unsorted input. Another thought is to consider the language of pairs ⟨σ , p⟩ where p is a permutation of the
numbers 0, ... |σ | − 1 that brings the string into ascending order. But here also the formulation seems to not
capture the sorting problem, in that recognizing a correct permutation feels different than generating one from
scratch.
input two numbers and output their midpoint See https://hal.archives-ouvertes.fr/file/index/docid/
576641/filename/computing-midpoint.pdf.
final two bits are 00 Decimal representation is not much harder since a decimal number is divisible by four if and
only if the final two digits are in the set { 00, 04, ... 96 }.
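Both tests can be sketched directly (the helper names are ours):

```python
# Divisibility by four from the trailing digits only.
def div4_binary(bits):
    # a binary numeral is divisible by four iff it ends in 00
    return bits.endswith("00")

def div4_decimal(digits):
    # only the last two decimal digits matter
    return int(digits[-2:]) % 4 == 0

print(div4_binary(format(156, "b")), div4_decimal("156"))  # True True
print(div4_binary(format(157, "b")), div4_decimal("157"))  # False False
```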
everything of interest can be represented with reasonable efficiency by bitstrings See https://rjlipton.wordpress.
com/2010/11/07/what-is-a-complexity-class/. Of course, a wag may say that if it cannot be represented
by bitstrings then it isn’t of interest. But we mean something less tautological: we mean that if we could want to
compute with it then it can be put in bitstrings. For example, we find that we can process speech, adjust colors
on an image, or regulate pressure in a rocket fuel tank, all in bitstrings, despite what may at first encounter
seem to be the inherently analog nature of these things.
Beethoven’s 9th Symphony The official story is that CDs are 74 minutes long so that they can hold this piece.
researchers often do not mention representations This is like a programmer saying, “My program inputs a number”
rather than, “My program inputs the binary representation of a number.” It is also like a person saying, “That’s
me on the card” rather than “On that card is a picture of me.”
leaving implementation details to a programmer (Grossman 2010)
complexity class There are various definitions, which are related but not equivalent. Some authors fold in the
requirement that a class be associated with some resource specification. This has some implications because
if an author, for instance, requires that each class be problems that are somehow solvable by Turing machines
then each class is countable. Our definition is more general and does not imply that a class is countable.
the time and space behavior We will concentrate our attention on resource bounds in the range from logarithmic to
exponential, because these are the most useful for understanding problems that arise in practice.
less than centuries See the video from Google at https://www.youtube.com/watch?v=-ZNEzzDcllU and S Aaron-
son’s Quantum Supremacy FAQ at https://www.scottaaronson.com/blog/?p=4317.
the claim is the subject of scholarly controversy See the posting from IBM Research at https://www.ibm.com/blogs/
research/2019/10/on-quantum-supremacy/ and G Kalai’s Quantum Supremacy Skepticism FAQ at https:
//gilkalai.wordpress.com/2019/11/13/gils-collegial-quantum-supremacy-skepticism-faq/.
We will give the class P a lot of attention This discussion gained much from the material in (Allender, Loui, and
Regan 1997). This includes several direct quotations.
not able rightly to apprehend This refers to a time when C Babbage was looking for money to build a mechanical
computer and was asked in a parliamentary committee, “Pray, Mr. Babbage, if you put into the machine wrong
figures, will the right answers come out?” and he said, “I am not able rightly to apprehend the kind of confusion
of ideas that could provoke such a question.” Babbage for the win.
RE Recall that ‘recursively enumerable’ is an older term for ‘computably enumerable’.
adds some wrinkles But it avoids a wrinkle that we needed for Finite State machines, ε transitions, since Turing
machines are not required to consume their input one character at a time.
function computed by a nondeterministic machine One thing that we can do is to define that the nondeterministic
machine computes f : B∗ → B∗ if on an input σ , all branches halt and they all leave the same value on the
tape, which we call f (σ ). Otherwise, the value is undefined, f (σ )↑.
might be much faster R Hamming gives this example to demonstrate that an order of magnitude change in speed
can change the world, can change what can be done: we walk at 4 mph, a car goes at 40 mph, and an airplane
goes at 400 mph. This relates to the bug picture that opens this chapter.
the problem of chess Chess is known to be a solvable game. This is Zermelo’s Theorem (Wikipedia contributers
2017l) — there is a strategy for one of the two players that forces a win or a draw, no matter how the opponent
plays.
at least appears to take exponential time In the terminology of a later section, chess is known to be EXP complete.
See (Fraenkel and Lichtenstein 1981).
in a sense, useless Being given an answer with no accompanying justification is a problem. This is like the Feynman
algorithm for doing Physics: “The student asks . . . what are Feynman’s methods? [M] Gell-Mann leans coyly
against the blackboard and says: Dick’s method is this. You write down the problem. You think very hard. (He
shuts his eyes and presses his knuckles parodically to his forehead.) Then you write down the answer.” (Gleick
1992) It is also like the mathematician S Ramanujan, who relayed that the advanced formulas that he produced
came in dreams from the god Narasimha. Some of these formulas were startling and amazing, but some of
them were wrong. (India Today 2017) And of course the most famous example of a failure to provide backing is
Fermat writing in a book he owned that there are no nontrivial solutions of x^n + y^n = z^n for n > 2 and
then saying, “I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.”
Countdown See https://en.wikipedia.org/wiki/Countdown_(game_show).
quite popular See also https://en.wikipedia.org/wiki/8_Out_of_10_Cats_Does_Countdown.
a graph This is the Petersen graph, a rich source of counterexamples for conjectures in Graph Theory.
Drummer problem This is often called the Marriage problem, where the men name suitable women. But perhaps
it is time for a new paradigm.
NP complete The name came from a contest run by Knuth; see http://blog.computationalcomplexity.org/
2010/11/by-any-other-name-would-be-just-as-hard.html.
there are many such problems The “at least as hard” is true in the sense that such problems can answer questions
about any other problem in that class. However, note that it might be that one NP complete problem runs in
nondeterministic time that is O(n) while another runs in O(n^(1 000 000)) time. So this sense is at odds with our
earlier characterization of problems that are harder to solve.
The following list These are from the classic standard reference (Garey and Johnson 1979).
tied up with the question of whether P is unequal to NP Ladner’s theorem is that if P ≠ NP then there is a problem
in NP − P that is not NP complete.
A large class See (Karp 1972).
often an ending point That is, as P Pudlàk observes, we treat P ≠ NP as an informal axiom. (Pudlàk 2013)
caricature Paul Erdős joked that a mathematician is a machine for turning coffee into theorems.
completely within the realm of possibility that ϕ(n) grows that slowly Hartmanis observes in (Hartmanis 2017) that
it is interesting that Gödel, the person who destroyed Hilbert’s program of automating mathematics, seemed to
think that these problems quite possibly are solvable in linear or quadratic time.
In 2018 a poll The poll was conducted by W Gasarch, a prominent researcher and blogger in Computational
Complexity. There were 124 respondents. For the description see https://www.cs.umd.edu/users/gasarch/
BLOGPAPERS/pollpaper3.pdf. Note that both the respondents and even the surveyor took the
enterprise in a light-hearted way.
88% thought that P ≠ NP Gasarch divided respondents into experts, the people who are known to have seriously
thought about the problem, and the masses. The experts were 99% for P ≠ NP.
Cook is one See (S. Cook 2000).
Many observers For example, (Viola 2018)
O(n^lg 7) method (lg 7 ≈ 2.81) Strassen’s algorithm is used in practice. The current record is O(n^2.37) but it is
not practical. It is a galactic algorithm: while it runs faster than any other known algorithm when the
problem is sufficiently large, the first such problem is so big that we never use the algorithm. For other
examples see (Wikipedia contributors 2020b).
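Strassen’s saving is already visible at the 2×2 base case: seven multiplications where the schoolbook method uses eight. This sketch (our illustration, not from the text; the names are ours) checks the seven-product recurrences against an ordinary product.

```python
def strassen_2x2(A, B):
    # Strassen's seven products for a 2x2 multiplication. The entries here
    # are numbers, but the same recurrences apply to matrix blocks, and
    # recursing on blocks is what gives the O(n^lg 7) running time.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

# Compare with the ordinary eight-multiplication product.
print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```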
Matching problem The Drummer problem described earlier is a special case of this for bipartite graphs.
Even with only a hundred people There are about 10^80 atoms in the universe. A graph with 100 vertices has the
potential for about 100^2 edges. Trying every set of edges would be 2^10000 ≈ 10^(10000/3.32) cases, which
is much greater than 10^80.
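The counting here is easy to check exactly, since Python integers have arbitrary precision (a quick illustration, not from the text; the exact number of possible edges is the binomial coefficient, which the note rounds).

```python
from math import comb

edges = comb(100, 2)   # possible edges among 100 vertices: exactly 4 950
subsets = 2 ** edges   # candidate sets of edges a brute-force search must try
atoms = 10 ** 80       # rough count of atoms in the universe

print(edges)              # 4950
print(len(str(subsets)))  # the number of decimal digits: about 1 490
print(subsets > atoms)    # True, by an enormous margin
```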
since the 1960’s we have an algorithm Due to J Edmonds.
Theory of Computing blog feed (Various authors 2017)
R J Lipton captured this sense (Lipton 2009)
Knuth has a related but somewhat different take (D. Knuth 2014)
exploits the difference Recent versions of the algorithm used in practice incorporate refinements that we shall not
discuss. The core idea is unchanged.
Their algorithm, called RSA Originally the authors were listed in the standard alphabetic order: Adleman, Rivest,
and Shamir. Adleman objected that he had not done enough work to be listed first and insisted on being listed
last. He said later, “I remember thinking that this is probably the least interesting paper I will ever write.”
tremendous amount of interest and excitement In his 1977 column, Martin Gardner posed a $100 challenge, to
crack this message: 9686 9613 7546 2206 1477 1409 2225 4355 8829 0575 9991 1245 7431 9874 6951
2093 0816 2982 2514 5708 3569 3147 6622 8839 8962 8013 3919 9055 1829 9451 5781 5254 The
ciphertext was generated by the MIT team from a plaintext (English) message using e = 9007 and this number
n (which is too long to fit on one line).
114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,
362,562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,
058,989,075,147,599,290,026,879,543,541
In 1994, a team of about 600 volunteers announced that they had factored n .
p = 3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,
638,493,387,843,990,820,577
and
q = 32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,
992,942,539,798,288,533
That enabled them to decrypt the message: the magic words are squeamish ossifrage.
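The mechanism behind the challenge is simple to state. This toy sketch (our illustration with tiny primes; the real modulus above has 129 digits, and the variable names are ours) follows the standard RSA recipe.

```python
p, q = 61, 53              # the secret primes; RSA-129's have 64 and 65 digits
n = p * q                  # the public modulus
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, chosen coprime to phi
d = pow(e, -1, phi)        # private exponent: the inverse of e modulo phi
                           # (three-argument pow with -1 needs Python 3.8+)

message = 42                          # a plaintext encoded as a number below n
cipher = pow(message, e, n)           # encrypt: cipher = message^e mod n
print(pow(cipher, d, n) == message)   # True: decrypting recovers the plaintext
```

Factoring n reveals p and q, hence phi and d, which is why the 1994 factorization broke the challenge message.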
computer searches suggest that these are very rare For instance, among the numbers less than 2.5 × 10^10 there are
only 21 853 ≈ 2.2 × 10^4 pseudoprimes base 2; that’s six orders of magnitude.
any reasonable-sized k Selecting an appropriate k is an engineering choice between the cost of extra iterations
and the gain in confidence.
we are quite confident that it is prime We are confident, but not certain. There are numbers, called Carmichael
numbers, that are pseudoprime for every base a relatively prime to n. The smallest example is n = 561 = 3 · 11 · 17,
and the next two are 1 105 and 1 729. Like pseudoprimes, these seem to be very rare. Among the numbers
less than 10^16 there are 279 238 341 033 922 primes, about 2.7 × 10^14, but only 246 683 ≈ 2.4 × 10^5
Carmichael numbers.
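Fermat’s test, and the way 561 fools it, takes only a few lines to demonstrate (our Python illustration; the function name is ours).

```python
from math import gcd

def fermat_passes(n, a):
    """Does n pass Fermat's test to base a, that is, a^(n-1) = 1 modulo n?"""
    return pow(a, n - 1, n) == 1

n = 561  # = 3 * 11 * 17, the smallest Carmichael number
print(all(fermat_passes(n, a) for a in range(2, n) if gcd(a, n) == 1))
# True: every base relatively prime to 561 passes, yet 561 is composite
```

Bases that share a factor with n, such as a = 3 here, do expose 561, but a randomly chosen base is overwhelmingly likely to be relatively prime to a large n.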
the minimal pub crawl See (W. Cook et al. 2017).
Appendix
empty string, denoted ε Possibly ε came as an abbreviation for ‘empty’. Some authors use λ , possibly from the
German word for ‘empty’, leer. (Sirén 2016)
reversal σ R of a string The most practical current notion of a string, the Unicode standard, does not have string
reversal. All of the naive ways to reverse a string run into problems for arbitrary Unicode strings which may
contain non-ASCII characters, combining characters, ligatures, bidirectional text in multiple languages, and
so on. For example, merely reversing the chars (the Unicode scalar values) in a string can cause combining
marks to become attached to the wrong characters. Another example is: how to reverse ab<backspace>ab?
The Unicode Consortium has not gone through the effort to define the reverse of a string because there is no
real-world need for it. (From https://qntm.org/trick.)
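The combining-character problem is visible in any language whose strings are sequences of code points; here is a short Python illustration (ours, not from the text).

```python
s = "cafe\u0301"            # "café": the letter e followed by a combining acute
assert len(s) == 5          # five code points, but only four visible characters

backwards = s[::-1]         # naive reversal: reverse the sequence of code points
assert backwards[0] == "\u0301"  # the accent now leads the string, detached
                                 # from the e that it was supposed to modify
```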
Credits
Prologue
I.1.11 SE user Shuzheng, https://cs.stackexchange.com/q/45589/50343
I.1.12 Question by SE user Arsalan MGR, https://cs.stackexchange.com/q/
135343/50343
Background
II.2 Image credit: Robert Williams and the Hubble Deep Field Team (STScI) and
NASA.
II.1 Image credit File:Galilee.jpg. (2018, September 27). Wikimedia Commons, the
free media repository. Retrieved 22:19, January 26, 2020 from https://commons.
wikimedia.org/w/index.php?title=File:Galilee.jpg&oldid=322065651.
II.2.39 The answer derives from one by Edward James, along with one by Keith
Ramsay.
II.3.17 User scherk at pbworks.com.
II.3.27 Michael J Neely
II.3.29 Answer from Stack Exchange member Alex Becker.
II.4 ENIAC Programmers, 1946 U. S. Army Photo from Army Research Labs Technical
Library
II.4.5 Started on Stack Exchange
II.4.8 From a Stack Exchange question.
II.5.12 SE user npostavs, https://cs.stackexchange.com/a/44875/50343
II.5.29 SE user Raphael https://cs.stackexchange.com/a/44901/50343
II.6.10 Question by SE user MathematicalOrchid, https://cs.stackexchange.
com/q/2811/67754, and answer by SE user Andrej Bauer.
II.6.24 SE user Rajesh R
II.8.13 http://people.cs.aau.dk/~srba/courses/tutorials-CC-10/t5-sol.
pdf
II.8.15 SE user Karolis Juodelė
II.8.18 SE user Noah Schweber
II.9.10 (Rogers 1987), p 214.
II.9.12 (Rogers 1987), p 214.
II.9.15 (Rogers 1987), p 214.
II.A.1 https://www.ias.edu/ideas/2016/pires-hilbert-hotel
II.C.2 https://research.swtch.com/zip and Kevin Matulef
II.C.3 http://en.wikipedia.org/wiki/Quine_%28computing%29
II.C.4 J Avigad Computability and Incompleteness Lecture notes, https://www.
andrew.cmu.edu/user/avigad/Teaching/candi_notes.pdf
Languages
III.1.25 F Stephan, https://www.comp.nus.edu.sg/~fstephan/toc01slides.
pdf
III.1.36 SE user babou
III.2.9 SE user Rick Decker
III.2.16 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.2.19 (Hopcroft, Motwani, and Ullman 2001), exercise 5.1.2.
III.2.31 https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_
buffalo_buffalo_Buffalo_buffalo, https://cse.buffalo.edu/~rapaport/
BuffaloBuffalo/buffalobuffalo.html
III.2.35 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.3.23 T Zaremba, http://www.geom.uiuc.edu/~zarembe/graph3.html.
III.A.10 http://people.cs.ksu.edu/~schmidt/300s05/Lectures/GrammarNotes/
bnf.html
Automata
IV.1.42 From Introduction to Languages by Martin, edition four, p 77.
IV.4.24 (Rich 2008), https://math.stackexchange.com/a/1102627
IV.4.27 (Rich 2008)
IV.5.19 SE user David Richerby, https://cs.stackexchange.com/a/97885/67754
Complexity
V.4 Some of the discussion is from https://softwareengineering.stackexchange.
com/a/20833.
V.4 Discussion of the third issue started as https://cs.stackexchange.com/
questions/9957/justification-for-neglecting-constants-in-big-o.
V.4 The fourth point derives from https://stackoverflow.com/a/19647659.
V.4 This discussion originated as (SE author templatetypedef 2013).
V.1.50 Stack Exchange user templatetypedef https://stackoverflow.com/a/
19647659/7168267
V.1.52 Stack Exchange user Daniel Fischer, https://math.stackexchange.com/
a/674039, and Stack Exchange user anon, https://math.stackexchange.com/
a/61741
V.1.58 Stack Exchange user Ilmari Karonen, https://math.stackexchange.com/
questions/925053/using-limits-to-determine-big-o-big-omega-and-big-
theta
V.2.24 Sean T. McCulloch, https://npcomplete.owu.edu/2014/06/03/3-dimensional-
matching/
V.2.61 Jan Verschelde, http://homepages.math.uic.edu/~jan/mcs401/partitioning.
pdf
V.3.9 A.A. at https://rjlipton.wordpress.com/2010/11/07/what-is-a-complexity-
class/#comment-8872
V.4.16 https://cs.stackexchange.com/q/57518
V.5.18 Paul Black, https://xlinux.nist.gov/dads/HTML/nondetermAlgo.html
V.7.16 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np.html
V.7.17 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np-sol.html
V.7.19 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np.html
V.7.22 Y Lyuu, https://www.csie.ntu.edu.tw/~lyuu/complexity/2016/20161129s.
pdf
V.7.28 SE user Yuval Filmus https://cs.stackexchange.com/a/132902/50343
V.8.17 SE user Yuval Filmus https://cs.stackexchange.com/a/54452/50343
Appendix
Notes
Bibliography
A/V Geeks, YouTube user, ed. (2013). Slide Rule - Proportion, Percentage, Squares And
Square Roots (1944). Division of Visual Aids, US Office of Education. url: https://www.
youtube.com/watch?v=dT7bSn03lx0 (visited on 08/09/2015).
Aaronson, Scott (Aug. 14, 2011). Why Philosophers Should Care About Computational Com-
plexity. url: https://arxiv.org/abs/1108.1791.
— (May 3, 2012a). The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam
Yedidia and me. url: http://www.scottaaronson.com/blog/?p=2725.
— (Aug. 30, 2012b). The Toaster-Enhanced Turing Machine. url: http://www.scottaaronson.
com/blog/?p=1121 (visited on 05/28/2015).
Adams, Douglas (1979). The Hitchhiker’s Guide to the Galaxy. Harmony Books. isbn:
9780345391803.
Allender, Eric, Michael C. Loui, and Kenneth W. Regan (1997). “Complexity Classes”. In:
ed. by Mikhail J. Atallah and Marina Blanton. Boca Raton, Florida: CRC Press. Chap. 27.
Bellos, Alex (Dec. 15, 2014). “The Game of Life: a beginner’s guide”. In: The Guardian.
url: http://www.theguardian.com/science/alexs-adventures-in-numberland/
2014/dec/15/the-game-of-life-a-beginners-guide (visited on 07/14/2015).
Bernstein, Ethan and Umesh Vazirani (1997). “Quantum Complexity Theory”. In: SIAM
Journal on Computing 26.5, pp. 1411–1473.
Bigham, D S (Aug. 19, 2014). How Many Vowels Are There in English? (Hint: It’s More Than
AEIOUY.) Slate. url: http://www.slate.com/blogs/lexicon_valley/2014/08/
19/aeiou_and_sometimes_y_how_many_english_vowels_and_what_is_a_vowel_
anyway.html (visited on 06/12/2017).
Black, Robert (2000). “Proving Church’s Thesis”. In: Philosophia Mathematica 8, pp. 244–258.
Blanda, Stephanie (2013). The Six Degrees of Kevin Bacon. [Online; accessed 2019-Apr-
01]. url: https://blogs.ams.org/mathgradblog/2013/11/22/degrees-kevin-
bacon/.
bobnice, stackoverflow user (2009). Answer to: RegEx match open tags except XHTML self-
contained tags. url: https://stackoverflow.com/a/1732454/7168267 (visited on
01/27/2019).
Bragg, Melvyn (Sept. 2016). Zeno’s Paradoxes. Podcast. Guests: Marcus du Sautoy, Barbara
Sattler, and James Warren. British Broadcasting Corporation. url: https://www.bbc.
co.uk/programmes/b07vs3v1.
Brock, David C. (2020). Discovering Dennis Ritchie’s Lost Dissertation. [Online; accessed
2020-Jun-20]. url: https://computerhistory.org/blog/discovering-dennis-
ritchies-lost-dissertation/.
Brower, Kenneth (1983). The Starship and the Canoe. Harper Perennial; Reprint edition. isbn:
978-0060910303.
Church, Alonzo (1937). “Review of Alan M. Turing, On computable numbers, with an
application to the Entscheidungsproblem”. In: Journal of Symbolic Logic 2, pp. 42–43.
Cobham, A (1965). “The intrinsic computational difficulty of functions”. In: Logic, Methodology
and Philosophy of Science: Proceedings of the 1964 International Congress. Ed. by Y Bar-
Hillel. North-Holland Publishing Company, pp. 24–30.
BIBLIOGRAPHY 397
Computing, Free Online Dictionary of (2017). Stephen Kleene. [Online; accessed 21-June-2017].
url: http://foldoc.org/Stephen%20Kleene.
Cook, Stephen (2000). The P vs NP Problem. Official problem description. Clay Mathematics
Institute. url: https://www.claymath.org/sites/default/files/pvsnp.pdf
(visited on 01/11/2018).
Cook, William et al. (2017). UK Pubs Travelling Salesman Problem. url: http://www.math.
uwaterloo.ca/tsp/pubs/index.html (visited on 12/16/2017).
Copeland, B. J. and D. Proudfoot (1999). “Alan Turing’s Forgotten Ideas in Computer
Science”. In: Scientific American 280.4, pp. 99–103.
Copeland, B. Jack (Sept. 1996). “What is Computation?” In: Computation, Cognition and AI,
pp. 335–359.
— (1999). “Beyond the universal Turing machine”. In: Australasian Journal of Philosophy
77.1, pp. 46–67.
— (Aug. 19, 2002). The Church-Turing Thesis; Misunderstandings of the Thesis. url: http://
plato.stanford.edu/entries/church-turing/#Bloopers (visited on 01/07/2016).
Cox, Russ (2007). Regular Expression Matching Can Be Simple And Fast (but is slow in Java,
Perl, PHP, Python, Ruby, . . .) url: https://swtch.com/~rsc/regexp/regexp1.html
(visited on 06/29/2019).
Davis, Martin (2004). “The Myth of Hypercomputation”. In: Alan Turing: Life and Legacy of
a Great Thinker. Ed. by Christof Teuscher. Springer, pp. 195–211. isbn: 978-3-662-
05642-4.
— (2006). “Why there is no such discipline as hypercomputation”. In: Applied Mathematics
and Computation 178, pp. 4–7.
Dershowitz, Nachum and Yuri Gurevich (Sept. 2008). “A Natural Axiomatization of Com-
putability and Proof of Church’s Thesis”. In: Bulletin of Symbolic Logic 14.3, pp. 299–
350.
Edmonds, Jack (1965). “Paths, trees, and flowers”. In: Canadian Journal of Mathematics 17,
pp. 449–467.
Encyclopædia Britannica, The Editors of (2017). Y2K bug. url: https://www.britannica.
com/technology/Y2K-bug (visited on 05/10/2017).
Euler, L (1766). “Solution d’une question curieuse que ne paroit soumise a aucune analyse
(Solution of a curious question which does not seem to have been subjected to any
analysis)”. In: Mémoires de l’Academie Royale des Sciences et Belles Lettres, Année 1759 15.
[Online; accessed 2017-Sep-23, article 309], pp. 310–337. url: http://eulerarchive.
maa.org/.
Firth, Niall (Oct. 14, 2009). “Sir Tim Berners-Lee admits the forward slashes in every
web address ‘were a mistake’”. In: Daily Mail. url: https://www.dailymail.co.
uk/sciencetech/article-1220286/Sir-Tim-Berners-Lee-admits-forward-
slashes-web-address-mistake.html (visited on 11/29/2018).
Fortnow, Lance and Bill Gasarch (2002). Computational Complexity Blog. [Online; ac-
cessed 2017-Nov-13]. url: http://blog.computationalcomplexity.org/2002/11/
foundations-of-complexitylesson-7.html.
Fraenkel, Aviezri S. and David Lichtenstein (1981). “Computing a Perfect Strategy for n × n
Chess Requires Time Exponential in n ”. In: Journal Of Combinatorial Theory, Series A,
pp. 199–214.
Gandy, Robin (1980). “Church’s Thesis and Principles for Mechanisms”. In: The Kleene
Symposium. Ed. by J. Barwise, H. J. Keisler, and K Kunen. North-Holland Amsterdam,
pp. 123–148. isbn: 978-0-444-85345-5.
Garey, Michael and David Johnson (1979). Computers and Intractability, A Guide to the
Theory of NP Completeness. W. H. Freeman.
Gleick, James (Sept. 20, 1992). “Part Showman, All Genius”. In: New York Times Magazine.
url: https://www.nytimes.com/1992/09/20/magazine/part-showman-all-
genius.html (visited on 11/27/2020).
Gödel, K. (1964). “What is Cantor’s Continuum Problem?” In: Philosophy of Mathematics:
Selected Readings. Ed. by Paul Benacerraf and Hilary Putnam. Cambridge University
Press, pp. 470–494.
— (1995). “Undecidable diophantine propositions”. In: Collected works Volume III: Unpub-
lished essays and lectures. Ed. by S. Feferman et al. Oxford University Press.
Goodstein, R. L. (Dec. 1947). “Transfinite Ordinals in Recursive Number Theory”. In: Journal
of Symbolic Logic 12.4, pp. 123–129.
Grossman, Lisa (2010). Metric Math Mistake Muffed Mars Meteorology Mission. [Online;
accessed 2017-May-25]. url: https://www.wired.com/2010/11/1110mars-climate-
observer-report/.
Hartmanis, J (2017). Gödel, von Neumann and the P =?NP Problem. url: http://www.cs.
cmu.edu/~15455/hartmanis-on-godel-von-neumann.pdf (visited on 12/25/2017).
Hennie, Fred (1977). Introduction to Computability. Addison-Wesley.
Hilbert, David and Wilhelm Ackermann (1950). Principles of theoretical logic. Trans. by
R E Luce. AMS Chelsea Publishing.
Hodges, Andrew (2016). Alan Turing in the Stanford Encyclopedia of Philosophy. url: http:
//www.turing.org.uk/publications/stanford.html (visited on 04/06/2016).
— (1983). Alan Turing: the enigma. Simon and Schuster. isbn: 0-671-49207-1.
Hofstadter, Douglas R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.
Hopcroft, John E, Rajeev Motwani, and Jeffrey D Ullman (2001). Introduction to Automata
Theory, Languages, and Computation. 2nd ed. Pearson Education. isbn: 0201441241.
Huggett, Nick (2010). Zeno’s Paradoxes — Stanford Encyclopedia of Philosophy. [Online;
accessed 23-Dec-2016]. url: https://plato.stanford.edu/entries/paradox-
zeno/#ParMot.
India Today (Apr. 26, 2017). “Srinivasa Ramanujan: The mathematical genius who credited his
3900 formulae to visions from Goddess Mahalakshmi”. In: India Today. url: https://
www.indiatoday.in/education-today/gk-current-affairs/story/srinivasa-
ramanujan-life-story-973662-2017-04-26 (visited on 11/27/2020).
Joel David Hamkins, mathoverflow.net user (2010). Answer to: Infinite CPU clock rate and
hotel Hilbert. url: https://mathoverflow.net/a/22038 (visited on 04/19/2017).
Jones, Neil D. (1997). Computability and Complexity From a Programming Perspective. 1st ed.
MIT Press. isbn: 978-0262100649.
Karp, Richard M (1972). “Reducibility Among Combinatorial Problems”. In: Complexity of
Computer Computations. Ed. by R. E. Miller and J. W. Thatcher. New York: Plenum,
pp. 85–103.
Kleene, Stephen (1952). Introduction to Metamathematics. North-Holland Amsterdam.
Klyne, G. and C. Newman (July 2002). Date and Time on the Internet: Timestamps. RFC 3339.
RFC Editor, pp. 1–18. url: https://www.ietf.org/rfc/rfc3339.txt.
Knuth, Donald (May 20, 2014). Twenty Questions for Donald Knuth. url: http://www.
informit.com/articles/article.aspx?p=2213858 (visited on 02/17/2018).
Knuth, Donald E. (Dec. 1964). Backus Normal Form vs. Backus Naur Form. Letter to the
Editor.
Knuutila, Timo (2001). “Redescribing an algorithm by Hopcroft”. In: Theoretical Computer
Science 250, pp. 333–363.
Kragh, Helge (Mar. 27, 2014). The True (?) Story of Hilbert’s Infinite Hotel. url: http:
//arxiv.org/abs/1403.0059.
Leupold, Jacob, 1674–1727 (1725). “Details of the mechanisms of the Leibniz calculator, the
most advanced of its time”. In: Illustration in: Theatrum arithmetico-geometricum, das
ist . . . [bound with Theatrum machinarium, oder, Schau-Platz der Heb-Zeuge/Jacob
Leupold. Leipzig, 1725]. Leipzig: Zufinden bey dem Autore und Joh. Friedr. Gleditschens
seel. Sohn: Gedruckt bey Christoph Zunkel, 1727. url: https://www.loc.gov/
resource/cph.3c10471/ (visited on 11/14/2016).
Levin, Leonid A. (Dec. 7, 2016). Fundamentals of Computing. url: https://www.cs.bu.
edu/fac/lnd/toc/.
Lipton, Richard Jay (Sept. 22, 2009). It’s All Algorithms, Algorithms and Algorithms.
url: https://rjlipton.wordpress.com/2009/09/22/its-all-algorithms-
algorithms-and-algorithms/ (visited on 02/17/2018).
Maienschein, Jane (2017). “Epigenesis and Preformationism”. In: The Stanford Encyclopedia
of Philosophy. Ed. by Edward N. Zalta. Spring 2017. Metaphysics Research Lab, Stanford
University.
McCarthy, John (1963). A Basis for a Mathematical Theory of Computation. url: http://www-
formal.stanford.edu/jmc/basis1.pdf (visited on 06/15/2017).
Meyer, Albert R. and Dennis M. Ritchie (1966). Research report: The complexity of loop
programs. Tech. rep. 1817. IBM.
N. J. A. Sloane, editor (2019). The On-Line Encyclopedia of Integer Sequences, A000290. url:
https://oeis.org/A000290 (visited on 03/02/2019).
navyreviewer, YouTube user (2010). Mechanical computer part 1. url: https://www.
youtube.com/watch?v=mpkTHyfr0pM (visited on 08/09/2015).
Odifreddi, Piergiorgio (1992). Classical Recursion Theory. Elsevier Science. isbn: 0-444-87295-7.
Piccinini, Gualtiero (2017). “Computation in Physical Systems”. In: The Stanford Encyclopedia
of Philosophy. Ed. by Edward N. Zalta. Summer 2017. Metaphysics Research Lab, Stanford
University.
Pinker, Steven (Sept. 4, 2014). The Trouble With Harvard. url: https://newrepublic.com/
article/119321/harvard-ivy-league-should-judge-students-standardized-
tests (visited on 12/23/2020).
Pour-El, M. B. and I. Richards (1981). “The wave equation with computable initial data such
that its unique solution is not computable”. In: Adv. in Math 39, pp. 215–239.
Pseudonym, cs.stackexchange user (2014). Answer to: What exactly is an algorithm? url:
https://cs.stackexchange.com/a/31953 (visited on 12/27/2018).
Pudlák, Pavel (2013). Logical Foundations of Mathematics and Computational Complexity.
Springer. isbn: 978-3-319-34268-9.
R H Bruck (n.d.). “Computational Aspects of Certain Combinatorial Problems”. In: AMS
Symposium in Applied Mathematics 6, p. 31.
Radó, Tibor (May 1962). “On Non-computable Functions”. In: Bell Systems Technical Journal,
pp. 877–884. url: https://ia601900.us.archive.org/0/items/bstj41-3-
877/bstj41-3-877.pdf.
Rendell, Paul (2011). http://rendell-attic.org/gol/tm.htm. url: http://rendell-attic.
org/gol/tm.htm (visited on 07/21/2015).
Rich, Elaine (2008). Automata, Computability, and Complexity. Pearson. isbn: 978-0-13-
228806-4.
Robinson, Raphael (1948). “Recursion and Double Recursion”. In: Bulletin of the American
Mathematical Society 10, pp. 987–993.
Rogers Jr., Hartley (Sept. 1958). “Gödel numberings of partial recursive functions”. In:
Journal of Symbolic Logic 23.3, pp. 331–341.
— (1987). Theory of Recursive Functions and Effective Computability. MIT Press. isbn:
0-262-68052-1.
Scott, Brian M. (Feb. 14, 2020). Inverting the Cantor pairing function. Stack Exchange
user http://math.stackexchange.com/users/12042/brian-m-scott. url:
http://math.stackexchange.com/q/222835 (visited on 10/28/2012).
SE author Andrej Bauer (2016). Answer to: Is a Turing Machine “by definition” the most
powerful machine? [Online; accessed 2017-Nov-05]. Stack Overflow discussion board.
url: https://cs.stackexchange.com/a/66753/78536.
— (2018). Answer to: Problems understanding proof of smn theorem using Church-Turing
thesis. [Online; accessed 2020-Feb-13]. Stack Overflow discussion board. url: https:
//cs.stackexchange.com/a/97946/67754.
SE author babou and various others (2015). Justification for neglecting constants in Big O.
[Online; accessed 2017-Oct-29]. Computer Science Stack Exchange discussion board.
url: https://cs.stackexchange.com/a/41000/78536.
SE author David Richerby (2018). Why is there no permutation in Regexes? (Even if regular
languages seem to be able to do this). [Online; accessed 2020-Jan-01]. Stack Overflow
discussion board. url: https://cs.stackexchange.com/a/100215/67754.
SE author JohnL (2020). How to decide whether a language is decidable when not involving
turing machines? [Online; accessed 2020-Jun-11]. Computer Science Stack Exchange
discussion board. url: https://cs.stackexchange.com/a/127035/67754.
SE author Kaktus and various others (2019). Georg Cantor’s diagonal argument, what
exactly does it prove? [Online; accessed 2019-Dec-25]. Computer Science Stack Exchange
discussion board. url: https://math.stackexchange.com/q/2176304.
SE author templatetypedef (2013). What is pseudopolynomial time? How does it differ from
polynomial time? [Online; accessed 2017-Oct-29]. Stack Overflow discussion board. url:
https://stackoverflow.com/a/19647659.
SE user Ryan Williams (Sept. 2, 2010). Comment to answer for What would it mean to disprove
Church-Turing thesis? url: https://cstheory.stackexchange.com/a/105/4731
(visited on 06/24/2019).
Sipser, Michael (2013). Introduction to the Theory of Computation. 3rd ed. Cengage. isbn:
978-1-133-18779-0.
Sirén, Jouni (CS StackExchange user) (2016). Answer to: What is the origin of λ for empty
string? Accessed 2016-October-20. url: http://cs.stackexchange.com/a/64850/
50343.
Smoryński, Craig (1991). Logical Number Theory I. Springer-Verlag.
Wikipedia contributors (2016f). Zeno’s paradoxes — Wikipedia, The Free Encyclopedia. [Online;
accessed 23-December-2016]. url: https://en.wikipedia.org/w/index.php?title=
Zeno%27s_paradoxes&oldid=752685211.
— (2017a). 15 puzzle — Wikipedia, The Free Encyclopedia. [Online; accessed 16-September-
2017]. url: https://en.wikipedia.org/w/index.php?title=15_puzzle&oldid=
789930961.
— (2017b). Almon Brown Strowger — Wikipedia, The Free Encyclopedia. [Online; accessed
9-June-2017]. url: https://en.wikipedia.org/w/index.php?title=Almon_Brown_
Strowger&oldid=783883144.
— (2017c). Artificial neuron — Wikipedia, The Free Encyclopedia. [Online; accessed 21-June-
2017]. url: https://en.wikipedia.org/w/index.php?title=Artificial_neuron&
oldid=780239713.
— (2017d). Aubrey–Maturin series — Wikipedia, The Free Encyclopedia. [Online; accessed
28-March-2017]. url: https://en.wikipedia.org/w/index.php?title=Aubrey%E2%
80%93Maturin_series&oldid=771937634.
— (2017e). Backus–Naur form — Wikipedia, The Free Encyclopedia. [Online; accessed
7-May-2017]. url: https://en.wikipedia.org/w/index.php?title=Backus%E2%
80%93Naur_form&oldid=778354081.
— (2017f). Magic smoke — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-
11]. url: https://en.wikipedia.org/w/index.php?title=Magic_smoke&oldid=
785207817.
— (2017g). North American Numbering Plan — Wikipedia, The Free Encyclopedia. [Online;
accessed 9-June-2017]. url: https://en.wikipedia.org/w/index.php?title=
North_American_Numbering_Plan&oldid=780178791.
— (2017h). Ouija — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Ouija&oldid=776109372.
— (2017i). Pax Britannica — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-
2017]. url: https://en.wikipedia.org/w/index.php?title=Pax_Britannica&
oldid=775067301.
— (2017j). Platonic solid — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-
October-22]. url: https://en.wikipedia.org/w/index.php?title=Platonic_
solid&oldid=801264236.
— (2017k). Unicode — Wikipedia, The Free Encyclopedia. url: https://en.wikipedia.
org/w/index.php?title=Unicode&oldid=784443067.
— (2017l). Zermelo’s theorem (game theory) — Wikipedia, The Free Encyclopedia. [Online;
accessed 2017-Nov-26]. url: https://en.wikipedia.org/w/index.php?title=
Zermelo%27s_theorem_(game_theory)&oldid=806070716.
— (2018). Paradox — Wikipedia, The Free Encyclopedia. [Online; accessed 14-December-
2018]. url: https://en.wikipedia.org/w/index.php?title=Paradox&oldid=
871193884.
Wikipedia contributors (2017). Philipp von Jolly — Wikipedia, The Free Encyclopedia. [Online;
accessed 30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=
Philipp_von_Jolly&oldid=764485788.
— (2019a). Collatz conjecture — Wikipedia, The Free Encyclopedia. [Online; accessed
15-February-2019].
— (2019b). Mathematics: The Loss of Certainty — Wikipedia, The Free Encyclopedia. [Online;
accessed 30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=
Mathematics:_The_Loss_of_Certainty&oldid=879406248.
— (2019c). Maxwell’s demon — Wikipedia, The Free Encyclopedia. [Online; accessed 1-
January-2020]. url: https://en.wikipedia.org/w/index.php?title=Maxwell%27s_
demon&oldid=930445803.
— (2019d). Partial application — Wikipedia, The Free Encyclopedia. [Online; accessed
26-December-2019].
— (2020a). Foobar — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Feb-14].
url: https://en.wikipedia.org/w/index.php?title=Foobar&oldid=934819128.
— (2020b). Galactic algorithm — Wikipedia, The Free Encyclopedia. [Online; accessed
2020-Jun-17]. url: https://en.wikipedia.org/w/index.php?title=Galactic_
algorithm&oldid=957279293.
William S. Renwick (May 6, 1949). The start of the EDSAC log. [Online; accessed 2019-Mar-02].
url: https://www.cl.cam.ac.uk/relics/elog.html.
Index
+ operation on a language, 217 picture, 170
15 Game problem, 283, 297 Backus-Naur form, BNF, 170
3 Dimensional Matching problem, 327 Berra, Y
3-SAT , see 3-Satisfiability problem picture, 189
3-Satisfiability problem, 279, 296, 324, 327, Big O, 263
338 Big Θ, 265
4-Satisfiability problem, 337 bijection, 359
bit string, see bitstring
accept a language, 147 bitstring, 354
accept an input, 184, 193, 196 blank, 8, 306
acceptable numbering, 73 blank, B, 5
accepted language, see recognized, 291 BNF, 170–175
accepting state, 13, 179, 306 body of a production, 150
nondeterministic Finite State machine,
Boole, G
192
picture, 278
Pushdown machine, 236
boolean, 279
accepts, 180, 184, 193, 196
expression, 279
Ackermann function, 30–33, 35, 49–53
function, 279
Ackermann, W, 3
variable, 279
picture, 33
bottom, ⊥, 236
action set, 8
BPP, Bounded-Error Probabilistic Polynomial
action symbol, 8
Time problem, 343
addition, 6
bridge, 296
adjacency matrix, 163
Broadcast problem, 281
adjacent, 162
Busy Beaver, 132–135
Adleman, L
button
picture, 345
start, 5
Agrawal, M
picture, 284
c.e. set, see computably enumerable
AKS primality test, 284
caching, 71
algorithm, 290
Cantor’s correspondence, 68–76
definition, 290
Cantor’s Theorem, 78
reliance on model, 290
alphabet, 145, 354 Cantor, G
input, 179 and diagonalization, 372
Pushdown machine, 236 picture, 63
tape, 8 cardinality, 61–83
amb function, 381 less than or equal to, 78
ambiguous grammar, 155 Chromatic Number problem, 278
argument, to a function, 356 Church’s Thesis, 14–21
Aristotle’s Paradox, 61, 63 and uncountability, 79
Assignment problem, 324 argument by, 19
asymptotically equivalent, 266, 274 clarity, 17
atom, 285 consistency, 16
convergence, 16
Backus, J coverage, 15
Extended, 301 K is complete, 114
Church, A computably enumerable set, 107
picture, 14 collection of, RE, 304
Thesis, 15 computation
circuit, 163, 300 distributed, 290
Euler, 163 Finite State machine, 184
gate, 300 nondeterministic Finite State machine,
Hamiltonian, 163 192, 196
wire, 300 relative to an oracle, 112
Circuit Evaluation problem, 300 step, 8
class, 146, 298 Turing machine, 9
complexity, 298 concatenation of languages, 147
Clique problem, 281, 291, 303, 320, 328 concatenation of strings, 354
clique, in a graph, 281 configuration, 8, 184, 192, 195
closed walk, 162 halting, 184, 192, 196
closure under an operation, 214 initial, 8
CNF, Conjunctive Normal Form, 279 conjunctive normal form, 43, 279
co-NP, 310 connected component, 296
Cobham’s Thesis, 269 connected graph, 162
Cobham, A context free
picture, 383 grammar, 151
Thesis, 269 language, 243
codomain, 356 control, of a machine, 5
codomain versus range, 357 converge, 10
Collatz conjecture, 100 Conway, J
coloring of a graph, 164 picture, 46
colors, 278 Cook reducibility, 321
complete, 114 Cook, S
for a class, 326 picture, 325
NP, 326 Cook-Levin theorem, 326
complete graph, 166 correspondence, 62, 359
complexity class, 298 Cantor’s, 70
canonical, 342 countable, 64
P, 299 countably infinite, 64
polytime, 299 Course Scheduling problem, 287
complexity function, 263 CPU of Turing machine, 5
Complexity Zoo, 299, 343 Crossword problem, 283
Composite problem, 284, 291, 296, 311 current symbol, 8
composition, 359 CW, 167
computable cycle, 162
relative to a set, 112 Cyclic Shift problem, 321
set, 107
computable function, 11 daemon, see demon
computable relation, 11 DAG, directed acyclic graph, 162
computable set, 11 dangling else, 155
computably enumerable, 107–111 De Morgan, A
in an oracle, 117 picture, 277
decidable, 291
  language, 102
  set, 107
decidable language, 304
decide a language, 147
decided, 13
decided language, 184, 291
  of a nondeterministic Turing machine, 307
decider, 11
decides
  language, 304
  set, 11
decides language, 304
decision problem, 3, 291
decrypter, 345
degree of a vertex, 165
degree sequence, 165
demon, or daemon, 191
derivation, 151, 153
derivation tree, 151
description number, 73
determinism, 8, 16
diagonal enumeration, 70
diagonalization, 76–83, 117
  effectivized, 93
digraph, 162
directed acyclic graph, 162
directed graph, 162
Discrete Logarithm problem, 296
disjunctive normal form, DNF, 43
distinguishable states, 226
distributed computation, 290
diverge, 10
Divisor problem, 284, 296
domain, 356, 357
Double-SAT problem, 315
doubler function, 3, 13
dovetailing, 107
Droste effect, 376
Drummer problem, 323
DSPACE, 341
DTIME, 341

edge, 161
edge weight, 162
Edmunds, J
  picture, 383
effective, 3
effective function, 9–11
Electoral College, 283
empty language
  decision problem, 295
empty string, ε, 8, 354
encrypter, 345
Entscheidungsproblem, 3, 14, 291, 333
  unsolvability, 100
enumerate, 64
ε moves, 194
ε transitions, 194–197
equinumerous sets, 63
equivalent growth rates, 265
Euler Circuit problem, 277, 297
Euler circuit, 163
Euler, L
  picture, 276
eval, 85
EXP, 340
expansion of a production, 150
Extended Church's Thesis, 301
extended regular expression, 244
extended transition function, 184
  for nondeterministic Finite State machines, 196
  nondeterministic Finite State machine, 193

F-SAT problem, 296
Factoring problem, 331, 345
  Prime Factorization problem, 296
Fibonacci numbers, 29
final state, 179
  nondeterministic Finite State machine, 192
finite set, 64
Finite State automata, see Finite State machine
Finite State machine, 178–189
  accept string, 184, 196
  accepting state, 179
  alphabet, 179
  computation, 184
  configuration, 184
  final state, 179
  halting configuration, 184
  initial configuration, 184
  input string, 184
  language of, 184
  minimization, 225–235
  next-state function, 179
  nondeterministic, 192
  powerset construction, 198
  product, 214
  reject string, 184
  state, 179
  step, 184
  transition function, 179
Fixed point theorem, 117–123
  discussion, 120–122
Flauros, Duke of Hell
  picture, 191
flow, 323
flow chart, 85
Four Color problem, 278
function, 356
  91 (McCarthy), 30
  Ackermann, 50
  argument, 356
  Big O, 263
  Big Θ, 265
  boolean, 279
  codomain, 356
  composition, 359
  computable, 11
  computed by a Turing machine, 9
  converge, 10
  correspondence, 62, 359
  definition, 356
  diverge, 10
  domain, 356
  doubler, 3, 13
  effective, 3
  enumeration, 64
  exponential growth, 267
  extended transition, 184
  general recursive, 35
  identity, 359
  image under, 357
  index, 356
  injection, 358
  inverse, 359
  left inverse, 359
  logarithmic growth, 267
  µ recursive function (mu recursive), 35
  next-state, 8, 179
  one-to-one, 62, 358
  onto, 62, 358
  order of growth, 263
  output, 356
  pairing, 69, 70
  partial, 10, 357
  partial recursive, 35
  polynomial growth, 267
  predecessor, 6
  projection, 24, 35
  range, 357
  recursive, 11, 35
  reduction, 317
  restriction, 357
  right inverse, 359
  successor, 12, 21, 24, 35
  surjection, 358
  total, 10, 357
  transition, 8, 179, 306
  unpairing, 69, 70
  value, 356
  well-defined, 356, 357
  zero, 24, 35
function problem, 290
functions, 356–360
  same behavior, 102
  same order of growth, 265

Galilei, G (Galileo)
  picture, 61
Galileo, 61
Galileo's Paradox, 61, 63, 65
Game of Life, 46–49
gate, 43, 300
general recursion, 30–37
general recursive function, 35
general unsolvability, 96–100
Gödel number, 73
Gödel, K, 14
  letter to von Neumann, 333
  picture, 15
  picture with Einstein, 126
Gödel’s theorem, 14 planar, 167, 278
Goldbach’s conjecture, 102 representation, 163–164
grammar, 150–161 simple, 161
ambiguous, 155 spanning subgraph, 280
Backus-Naur form, BNF, 170 subgraph, 163
BNF, Backus-Naur form, 170 trail, 162
body of a production, 150 transition, 7
context free, 151 traversal, 162–163
derivation, 151 tree, 162, 280
expansion of a production, 150 vertex, 161
head, 150 vertex cover, 280
linear, 203 vertex degree, 165
nonterminal, 151 walk, 162
production, 150, 152 walk length, 162
rewrite rule, 150, 152 weighted, 162
right linear, 203 Graph Colorability problem, 278, 297, 319
start symbol, 152 Graph Connectedness problem, 296, 298
syntactic category, 151 Graph Isomorphism problem, 296, 331
terminal, 151 Grassmann, H, 22
graph, 161–170 picture, 21
adjacent edges, 162 guessing by a machine, 191, 194
bridge edge, 296
circuit, 163 Hailstone function, 100
clique, 281 Halt light, 5
closed walk, 162 halting configuration, 184, 192, 196
coloring, 164 Halting problem, 92–94, 102
colors, 278 as a decision problem, 297
complete, 166 discussion, 94–95
connected, 162 in wider culture, 124–126
connected component, 296 reduction to another problem, 99
cycle, 162 significance, 95–96
degree sequence, 165 unsolvability, 94
digraph, 162 halting state, 12
directed, 162 Halts on Three problem, 316
directed acyclic, 162 Hamilton, W R
edge, 161 picture, 275
edge weight, 162 Hamiltonian circuit, 163
Euler circuit, 163 Hamiltonian Circuit problem, 276, 297, 311,
Hamiltonian circuit, 163 322, 328
induced subgraph, 163 Hamiltonian Path problem, 311, 337
isomorphism, 164–165 hard
loop, 162 for a class, 326
matrix representation, 163 NP, 326
multigraph, 162 haystack, 300
node, 161 head
open walk, 162 read/write, 4
path, 162 head of a production, 150
Hilbert’s Hotel, 123–124 picture, 203
Hilbert, D, 3 Kn , 166
picture, 125 Knapsack problem, 282, 315, 328
Hofstadter, D, 377 Knight’s Tour problem, 276
hyperoperation, 32 Knuth, D
picture, 270
I/O head, see read/write head Kolmogorov, A
identity function, 359 picture, 259
Ignorabimus, 125 Königsberg, 277
image under a function, 357
Incompleteness theorem, 14 L’Hôpital’s Rule, 266
Independent Set problem, 288, 298, 322, 324, lambda calculus, λ calculus, 15
338 language, 145–150
index number, 73 + operation, 217
index set, 103 accept, 147
induced subgraph, 163 accepted by a Finite State machine, see
infinite set, 64 language,recognized by a Finite
infinity, 61–83 State machine
initial configuration, 8, 184, 192, 195 accepted by Turing machine, 106, 291
injection, 358 class, 146
input alphabet, 179 concatenation, 147
input string, 184, 192, 195 context free, 243
input symbol, 8 decidable, 102, 304
input, to a function, 356 decide, 147
instruction, 5, 8, 306 decided, 291
Integer Linear Programming problem, 314, 324 decided by a Finite State machine, 184
inverse of a function, 359 decided by a Turing machine, 13, 304
left, 359 decision problem, 291
right, 359 derived from a grammar, 153
two-sided, 359 grammar, 151
isomorphic, 165 Kleene star, 147
isomorphism, 165 non-regular, 219–225
of a Finite State machine, 184
K , the Halting problem set, 93, 109 of a nondeterministic Finite State ma-
complete among c.e. sets, 114 chine, 193
K 0 , set of halting pairs, 101, 110, 113 operations on, 147
Karatsuba, A, 260 power, 147
Karp reducible, 317 recognize, 147
Karp, R recognized, 291
picture, 327 recognized by a Finite State machine,
Kayal, N 184
picture, 284 recognized by Turing machine, 291
Kleene star, 64, 145, 147, 354 regular, 213–219
regular expression, 205 reversal, 147
Kleene’s fixed point theorem, 118 language decision problem, 291
Kleene’s theorem, 206–210 language recognition problem, 291
Kleene, S, 35 last in, first out (LIFO) stack, 236
left inverse, 359
leftmost derivation, 151
LEGO, 5
length, 162
length of a string, 354
Life, Game of, 46–49
LIFO stack, 236
light
  Halt, 5
Linear Divisibility problem, 315
Linear Programming language decision problem, 298, 324
Linear Programming optimization problem, 323
Lipton's Thesis, 293
Longest Path problem, 315, 337
loop, 162
LOOP program, 53–58

machine
  state, 9
map, see function
Marriage problem, see Drummer problem
Matching problem, 335
matching, three dimensional, 282
Max-Flow problem, 323
McCarthy's 91 function, 30
memoization, 71
memory, 4
metacharacter, 151, 171, 204
minimization, 33
minimization of a Finite State machine, 225–235
minimization, unbounded, 35
Minimum Spanning Tree problem, 280
modulus, 345
Morse code, 167
µ-recursion (mu recursion), 33
µ recursive function, 35
multigraph, 162
multiset, 282
Musical Chairs, 78

n-distinguishable states, 225
n-indistinguishable states, 226
Naur, P
  picture, 170
Nearest Neighbor problem, 296, 298
next state, 5, 8
next tape symbol, 5
next-state function, 8, 179
  nondeterministic Finite State machine, 192
NFSM, see nondeterministic Finite State machine
node, 161
nondeterminism, 189–203
  for Finite State machines, 192
  for Turing machines, 305
nondeterministic Finite State machine, 192
  accept string, 193, 196
  computation, 192, 196
  configuration, 192, 195
  convert to a deterministic machine, 198
  ε moves, 194
  ε transitions, 194
  halting configuration, 192, 196
  initial configuration, 192, 195
  input string, 192, 195
  language of, 193
  language recognized, 193
  reject string, 193, 196
nondeterministic Pushdown machine, 240–242
nondeterministic Turing machine
  accepting state, 306
  decided language, 307
  definition, 306
  instruction, 306
  recognized language, 307
  rejecting state, 306
  transition function, 306
nonterminal, 151
NP, 305–316
NP complete, 325–331
  basic problems, 327
NP hard, 326
NSPACE, 341
NTIME, 341
numbering, 73
  acceptable, 73

Ω, Big Omega, 266
o, omicron, 266
one-to-one function, 62, 358
onto function, 62, 358
open walk, 162
optimization problem, 290
oracle, 111–117
  computably enumerable in, 117
oracle Turing machine, 112
order of growth, 259–275
  function, 263
  Hardy hierarchy, 268
ouroboros, 84
output, from a function, 356

P, 298–305
P hard, 317
P versus NP, 331–336
pairing function, 69, 70
Paley, W
  picture, 127
palindrome, 14, 146, 240, 355
paradox
  Aristotle's, 61
  Galileo's, 61
  Zeno's, 65
parameter, 88
Parameter theorem, 88
parametrization, 88–90
parametrizing, 88
parse tree, 151
partial function, 10, 357
partial recursive function, 35
Partition problem, 283, 291, 314, 328, 329, 338
path, 162
perfect number, 96
Péter, R
  picture, 49
Petersen graph, 166
pipe symbol, |, 150
planar graph, 167, 278
pointer, in C, 121
polynomial time, 299
polynomial time reducibility, 317
polytime, 299
polytime reduction, 316
power of a language, 147
power of a string, 355
powerset construction, 198
predecessor function, 6
prefix of a string, 355
present state, 5, 8
present tape symbol, 5
Prime Factorization problem, 284, 290, 337
primitive recursion, 21–30, 35
  arity, 23
primitive recursive functions, 24
private key, 345
problem, 289
  decision, 291
  function, 290
  Halting, 93, 94
  language decision, 291
  language recognition, 291
  optimization, 290
  search, 291
  unsolvable, 94
problem miscellany, 275–289
problem reduction, 325
problems
  tractable, 269
  unsolvable, 107
product construction, 214
production, 152
production in a grammar, 150
program, 290
projection function, 24, 35
propositional logic
  atom, 285
pseudopolynomial, 272, 274
public key, 345
Pumping lemma, 220
pumping length, 220
Pushdown automata, see Pushdown machine
Pushdown machine, 235–244
  halting, 237
  input alphabet, 236
  nondeterministic, 240–242
  stack alphabet, 236
  transition function, 236, 237

Quantum Computing, 301
Quantum Supremacy, 301
quine, 128
Quine's paradox, 376
r.e. set, see computably enumerable set
Radó, T
  picture, 133
RAM, see Random Access machine
Random Access machine, 270
range of a function, 357
RE, computably enumerable sets, 304
reachable vertex, 163, 279
read/write head, 4
recognize a language, 147
recognized
  by a nondeterministic Finite State machine, 193
recognized language, 291
  of a Finite State machine, 184
  of a nondeterministic Turing machine, 307
recursion, 21–37
Recursion theorem, 118
recursive function, 11, 35
recursive set, 11
recursively enumerable set, see computably enumerable set
reduces to, 112
reducibility
  Cook, 321
  Karp, 317
  polynomial time, 317
  polytime, 317
  polytime many-one, 317
  polytime mapping, 317
reduction
  from the Halting problem, 99
reduction function, 317
Reflections on Trusting Trust, 131
regex, 244
regular expression, 203–213
  extended, 244
  in practice, 244–252
  operator precedence, 205
  regex, 244
  semantics, 205
  syntax, 205
regular language, 213–219
reject an input, 184, 193, 196
rejecting state, 13, 306
rejects, 180
relation, computable, 11
replication of a string, 355
representation, of a problem, 293
restriction, 357
reversal of a language, 147
reversal of a string, 355
rewrite rule, 150, 152
Rice's theorem, 102–107
right inverse, 359
right linear, 203
Ritchie, D, 53
  picture, 53
Rivest, R
  picture, 345
RSA Encryption, 344–349

s-m-n theorem, 88
same behavior, functions with, 102
same order of growth, 265
SAT, see Satisfiability, 292, 309
Satisfiability problem, 279, 288, 309, 319, 320, 326, 340
  as a language recognition problem, 292
  on a nondeterministic Turing machine, 308
satisfiable, 279
Saxena, N
  picture, 284
schema of primitive recursion, 23
Science United, 290
search problem, 291
self reproducing program, 128
self reproduction, 127–131
self-reference, 376
semicomputable set, 107
semidecidable set, 107
semidecide a language, 147
semiprime, 284
Semiprime problem, 314
set
  c.e., 107
  cardinality, 63
  computable, 11, 107
  computably enumerable, 107–111
  countable, 64
  countably infinite, 64
  decidable, 107
  decider, 11
  equinumerous, 63
  finite, 64
  index, 103
  infinite, 64
  oracle, 111–117
  r.e., see computably enumerable set
  recursive, 11
  recursively enumerable, see computably enumerable set
  reduces to, 112
  semicomputable, 107
  semidecidable, 107
  T equivalent, 113
  Turing equivalent, 113
  uncountable, 77
  undecidable, 94
Set Cover problem, 322
Shamir, A
  picture, 345
Shannon, C
  picture, 43
Shortest Path problem, 277, 297, 298, 317
∼, asymptotically equivalent, 274
simple graph, 161
SPACE, 341
span a graph, 280
spanning subgraph, 280
st-Connectivity problem, see Vertex-to-Vertex Path problem
st-Path problem, see Vertex-to-Vertex Path problem
stack, 235
  alphabet, 236
  bottom, ⊥, 236
  LIFO, Last-In, First-Out, 236
  pop, 236
  push, 236
Start button, 5, 180
start state, 6, 179
  Pushdown machine, 236
start symbol, 152
state, 179
  accept, 13
  accepting, 179, 306
  final, 179
  halting, 12
  input, 8
  next, 5, 8
  present, 5
  reject, 13
  rejecting, 306
  start, 6
  unreachable, 106
  working, 12
state machine, 9
states, 4
  distinguishable, 226
  n-distinguishable, 225
  n-indistinguishable, 226
  set of, 8
Stator Square, 386
STCON problem, see Vertex-to-Vertex Path problem
step, 8, 184
store, of a machine, 4
string, 145, 354–355
  concatenation, 354
  decomposition, 355
  empty, 8, 354
  length, 354
  power, 355
  prefix, 355
  replication, 355
  reversal, 355
  substring, 355
  suffix, 355
string accepted
  by deterministic Finite State machine, 180, 184
  by nondeterministic Finite State machine, 193, 196
string rejected, 180
String Search problem, 300
subgraph, 163
  induced, 163
Subset Sum problem, 282, 291, 298, 315, 338
substring, 355
Substring problem, 321
successor function, 12, 21, 24, 35
suffix of a string, 355
surjection, 358
symbol, 8, 145, 354
  action, 8
  current, 8
syntactic category, 151

T equivalent, 113
T reducible, 112
table, transition, 7
tape, 4
tape alphabet, 8
tape symbol, 8
  blank, 5
terminal, 151
tetration, 32
Thompson, K
  picture, 131
Three Dimensional Matching problem, 314
Three-dimensional Matching problem, 282
time taken by a machine, 270
token, 145, 354
total function, 10, 357
Towers of Hanoi, 26
tractable, 268–269
trail, 162
transformation function, see reduction function
transition function, 8, 179, 306
  extended, 184, 196
  graph of, 7
  Pushdown machine, 236
  table of, 7
transition graph, 7
transition table, 7
Traveling Salesman problem, 189, 276, 308, 322, 324, 328, 337
tree, 162, 280
Triangle problem, 303
triangular numbers, 25
truth table, 41, 279
Turing equivalent, 113
Turing machine, 3–14
  accept a language, 106
  accepting a language, 304
  accepting state, 13
  action set, 8
  action symbol, 8
  computation, 9
  configuration, 8
  control, 5
  CPU, 5
  current symbol, 8
  decidable, 291
  decides a set, 11
  deciding a language, 304
  definition, 8
  description number, 73
  deterministic, 8
  for addition, 6
  function computed, 9
  Gödel number, 73
  index number, 73
  input symbol, 8
  instruction, 5, 8
  language accepted, 291
  language decided, 13, 291
  language recognized, 291
  multitape, 21
  next state, 5, 8
  next symbol, 5, 8
  next-state function, 8
  nondeterminism, 305
  numbering, 73
  palindrome, 14
  present state, 5, 8
  present symbol, 5
  rejecting state, 13
  simulator, 37–41
  tape alphabet, 8
  transition function, 8
  universal, 84–86
  with oracle, 112
Turing reducible, 112, 316
Turing, A
  picture, 3
Turnpike problem, 315
two-sided inverse, 359

unbounded minimization, 33
unbounded search, 34
uncountable, 77
undecidable, 94
Unicode, 181, 380
uniformity, 86–87
Universal Turing machine, 84–86
universality, 83–92
unpairing function, 69, 70
unreachable state, 106
unsolvability, 107
unsolvable problem, 94, 107
use-mention distinction, 120

walk, 162
walk length, 162
weight, 162
weighted graph, 162
well-defined, 356, 357
wire, 300
word, see string
working state, 12

⊢, yields
  Finite State machine, 184
  for Turing machines, 9
  nondeterministic Finite State machine, 192

Zeno's Paradox, 65
zero function, 24, 35
Zoo, Complexity, 343