
Theory of Computation

Making Connections

Jim Hefferon
https://hefferon.net/computation
Notation Description
P (S) power set, collection of all subsets of S
Sc complement of the set S
1S characteristic function of the set S
⟨a0, a1, ...⟩ sequence
N, Z, Q, R natural numbers { 0, 1, ... }, integers, rationals, reals
a, b, ... characters
Σ alphabet, set of characters
B alphabet of bits, B = { 0, 1 }
σ, τ strings (any lower-case Greek letter)
ε empty string
Σ∗ set of all strings over the alphabet
L language, subset of Σ∗
P Turing machine, either deterministic or nondeterministic
ϕ effective function, function computed by a Turing machine
ϕ(x)↓, ϕ(x)↑ function converges on that input, or diverges
UP universal Turing machine
G graph
M Finite State machine, either deterministic or nondeterministic
P complexity class of deterministic polynomial time problems
NP complexity class of nondeterministic polynomial time problems
V verifier for NP
SAT language for the Satisfiability problem

Greek letters with pronunciation


Character  Name     Pronunciation
α          alpha    AL-fuh
β          beta     BAY-tuh
γ, Γ       gamma    GAM-muh
δ, ∆       delta    DEL-tuh
ε          epsilon  EP-suh-lon
ζ          zeta     ZAY-tuh
η          eta      AY-tuh
θ, Θ       theta    THAY-tuh
ι          iota     eye-OH-tuh
κ          kappa    KAP-uh
λ, Λ       lambda   LAM-duh
µ          mu       MEW
ν          nu       NEW
ξ, Ξ       xi       KSIGH
o          omicron  OM-uh-CRON
π, Π       pi       PIE
ρ          rho      ROW
σ, Σ       sigma    SIG-muh
τ          tau      TOW as in cow
υ, ϒ       upsilon  OOP-suh-LON
ϕ, Φ       phi      FEE, or FI as in high
χ          chi      KI as in high
ψ, Ψ       psi      SIGH, or PSIGH
ω, Ω       omega    oh-MAY-guh

Capital letters shown are the ones that differ from Roman capitals.

Cover photo: Bonus Bureau, Computing Division, 1924-Nov-24. Calculating the bonus owed to each WWI US veteran. (Auto-generated cropping decoration added.)
Preface

The Theory of Computation is a wonderful thing. It is beautiful. It has deep connections with other areas of computer science and mathematics, as well as with the wider intellectual world. It is full of ideas, exciting and arresting ideas. And, looking forward into this century, clearly a theme will be the power of computation. So it is timely also.
It makes a delightful course. Its organizing question — what can be done? — is
both natural and compelling. Students see the contrast between computation’s
capabilities and limits. There are well-understood principles, and as-yet unknown areas lie within easy reach.
This text aims to reflect all of that: to be precise, topical, insightful, and perhaps
sometimes even delightful.
For students Have you wondered, while you were learning how to instruct
computers to do your bidding, what cannot be done? And what can be done in
principle but cannot be done feasibly? In this course you will see the signpost
results in the study of these questions, and you will learn to use the tools to address
these issues as they come up in your work.
We will consider the very nature of computation. This has been intensively
studied for a century, so you will not see all that is known, but you will see enough
to end with some key insights.
We do not stint on precision — why would we want to? — but we approach
the ideas liberally, in a way that, in addition to technical detail, also attends to a
breadth of knowledge. We will be eager to make connections with other fields, with
things that you have previously studied, and with other modes of thinking. People
learn best when the topic fits into a whole, as the first quote below expresses.
The presentation here encourages you to be an active learner, to explore and
reflect on the motivation, development, and future of those ideas. It gives you the
chance to follow up on things that you find interesting; the back of the book has
lots of notes to the main text, many of which contain links that will take you even
deeper. And, the Extra sections at the end of each chapter also help you explore
further. Whether or not your instructor covers them formally in class, these will
further your understanding of the material and of where it can lead.
The subject is big, and a challenge. It is also a great deal of fun, and it will
change the way that you see the world. Enjoy!
For instructors We cover the definition of computability, unsolvability, languages,
automata, and complexity. The audience is undergraduate majors in computer
science, mathematics, and nearby areas.
The prerequisite is Discrete Mathematics, including propositional logic, proof
methods with induction, graphs, some number theory, sets, functions, and rela-
tions. For graphs and big-O, there are review sections. The big-O section uses
derivatives, so students need a background in Calculus. They should also have
some programming experience.
A text does readers a disservice if it is not precise. The details matter. But
students can also fail to understand the subject because they have not had a chance
to reflect on the underlying ideas. The presentation here stresses motivation and
naturalness and, where practical, sets the results in a network of connections.
An example comes at the start, with taking the Turing machine as the computing
model. It is the historical model, it leads naturally to Finite State machines, and it
is standard in complexity. The downside is that it is awkward for extensive
computation. However, here we don’t do extensive computation. We immediately
discuss Church’s Thesis, and rely on the intuition that students have from their
programming experience. Besides motivating the formalities, this allows us to give
algorithms in an outlined form, which communicates the ideas better than code
written for any abstract computing model.
A second example is nondeterminism. We introduce it in the context of Finite
State machines and pair that introduction with a discussion giving students a
chance to reflect on this important but tricky concept. Another example comes
with the inclusion in the complexity material of a section on the kinds of problems
that drive the work today. Still another is the discussion of the current state of
P versus NP. All of these examples, and many more, taken together encourage
students to connect with the underlying ideas.
Exploration and Enrichment The Theory of Computation is a fascinating subject.
This book aims to show that to readers, to draw them in, to be absorbing. It is
written in lively language and contains many illustrations, including pictures of
some of the people who have developed the topics, and a few others just for fun.
One way to stimulate readers is to make the material explorable. Where practical, the references are clickable. For example, the pictures of the people who discovered the subject are links to their Wikipedia pages. Making them links makes them very much more likely to be seen than is the same content in a physical library. The
presentation also encourages connection-making through the many notes in the
back that fill out, and add a spark to, the core discussion.
We also make connections through the fact that many of the questions are
taken from online forums. We want learners to build a mental scaffold for the
material, and one resource that is available, and good, is to ask them to think
through questions that other students have actually asked.
Another way in which this development offers enrichment is the use — in the
context of discussions and background — of the words of experts who are speaking
informally, such as in a blog. Informality has the potential to be a problem, but
it also has great potential to be valuable. Who among us has not had an Ah-ha!
moment in a hand-wavy hallway conversation?
Finally, students can also explore the end of chapter topics. These are suitable as
one-day lectures, or for group work or extra credit, or just for reading for pleasure.
Schedule This is my most recent semester. Chapter I defines models of compu-
tation, Chapter II covers unsolvability, Chapter III reviews languages, Chapter IV
does automata, and Chapter V is computational complexity. I assign the readings
as homework and quiz on them.

Week  Sections      Reading  Notes
 1    I.1, I.3      I.2
 2    I.4, II.1     II.A
 3    II.2, II.3
 4    II.4, II.5    II.B
 5    II.6, II.7    II.C
 6    II.9          III.A    Exam
 7    III.1–III.3
 8    IV.1, IV.2
 9    IV.3, IV.3    IV.A
10    IV.4, IV.5
11    IV.7          IV.1.4   Exam
12    V.1, V.2      V.A
13    V.3, V.4      V.3.2
14    V.5, V.6      V.B
15    V.7

License This book is Free. You can use it without cost. You can also redistribute
it — an instructor can make copies and give it away or sell it through their bookstore
or their school’s intranet. You can also get the LaTeX source and modify it to suit
your class; see https://hefferon.net/computation.
One reason that the book is Free is that it is written in LaTeX, which is Free, as
is our Scheme implementation, as is Asymptote, which drew the illustrations, as is
Emacs and all of GNU software, and the entire Linux platform on which this book
was developed. And besides those, all of the research that this text presents was
made freely available by scholars.
I believe that the synthesis here adds value — I hope so, indeed — but the
masters have left a well-marked trail and following it seems only right.
Acknowledgments I owe a great debt to my wife, whose patience with this
project has gone beyond all reasonable bounds. Thank you, Lynne.
My students have made the book better in many ways. I greatly appreciate all
of the contributions.
And, I must honor my teachers. First among them is M Lerman. Thank you,
Manny.
My teachers also include H Abelson, G J Sussman, and J Sussman, who
had the courage with Structure and Interpretation of Computer Programs to show
students just how mind-blowing it all is. When I see a programming text where
the examples are about managing inventory in a used car dealership, I can only
say: Thank you, for believing in me.
Memory works far better when you learn networks of facts rather than facts in isolation.
– T Gowers, What maths A-level doesn’t necessarily give you

Teach concepts, not tricks.
– Gian-Carlo Rota, Ten lessons I wish I had learned before I started teaching differential equations

[W]hile many distinguished scholars have embraced [the Jane Austen Society] and its delights since the founding meeting, ready to don period dress, eager to explore antiquarian minutiae, and happy to stand up at the Saturday-evening ball, others, in their studies of Jane Austen’s works, . . . have described how, as professional scholars, they are rendered uneasy by this performance of pleasure at [the meetings]. . . . I am not going to be one of those scholars.
– Elaine Bander, Persuasions, 2017

The power of modern programming languages is that they are expressive, readable, concise, precise, and executable. That means we can eliminate middleman languages and use one language to explore, learn, teach, and think.
– A Downey, Programming as a Way of Thinking

Lisp has jokingly been called “the most intelligent way to misuse a
computer.” I think that description is a great compliment because it
transmits the full flavor of liberation: it has assisted a number of our
most gifted fellow humans in thinking previously impossible thoughts.
– E Dijkstra, CACM, 15:10

Of what use are computers? They can only give answers.
– P Picasso, The Paris Review, Summer–Fall 1964

Jim Hefferon
Saint Michael’s College
Colchester, VT USA
joshua.smcvt.edu/computing
Draft: version 0.99, 2020-Dec-27.
Contents

I Mechanical Computation  3
  1 Turing machines  3
    1 Definition  4
    2 Effective functions  9
  2 Church’s Thesis  14
    1 History  14
    2 Evidence  15
    3 What it does not say  17
    4 An empirical question?  18
    5 Using Church’s Thesis  18
  3 Recursion  21
    1 Primitive recursion  21
  4 General recursion  30
    1 Ackermann functions  30
    2 µ recursion  33
  A Turing machine simulator  37
  B Hardware  41
  C Game of Life  46
  D Ackermann’s function is not primitive recursive  49
  E LOOP programs  53

II Background  61
  1 Infinity  61
    1 Cardinality  61
  2 Cantor’s correspondence  68
  3 Diagonalization  76
    1 Diagonalization  76
  4 Universality  83
    1 Universal Turing machine  84
    2 Uniformity  86
    3 Parametrization  88
  5 The Halting problem  92
    1 Definition  93
    2 Discussion  94
    3 Significance  95
    4 General unsolvability  96
  6 Rice’s Theorem  102
  7 Computably enumerable sets  107
  8 Oracles  111
  9 Fixed point theorem  117
    1 When diagonalization fails  117
    2 Discussion  120
  A Hilbert’s Hotel  123
  B The Halting problem in Wider Culture  124
  C Self Reproduction  127
  D Busy Beaver  132
  E Cantor in Code  135

III Languages  145
  1 Languages  145
  2 Grammars  150
    1 Definition  150
  3 Graphs  161
    1 Definition  161
    2 Traversal  162
    3 Graph representation  163
    4 Colors  164
    5 Graph isomorphism  164
  A BNF  170

IV Automata  178
  1 Finite State Machines  178
    1 Definition  178
  2 Nondeterminism  189
    1 Motivation  189
    2 Definition  192
    3 ε transitions  194
    4 Equivalence of the machine types  197
  3 Regular expressions  203
    1 Definition  204
    2 Kleene’s Theorem  206
  4 Regular languages  213
    1 Definition  214
    2 Closure properties  214
  5 Languages that are not regular  219
  6 Minimization  225
  7 Pushdown machines  235
    1 Definition  236
    2 Nondeterministic Pushdown machines  240
    3 Context free languages  243
  A Regular expressions in the wild  244
  B The Myhill-Nerode Theorem  252

V Computational Complexity  259
  1 Big O  259
    1 Motivation  260
    2 Definition  263
    3 Tractable and intractable  268
    4 Discussion  269
  2 A problem miscellany  275
    1 Problems with stories  275
    2 More problems, omitting the stories  279
  3 Problems, algorithms, and programs  289
    1 Types of problems  290
    2 Statements and representations  293
  4 P  298
    1 Definition  299
    2 Effect of the model of computation  301
    3 Naturalness  301
  5 NP  305
    1 Nondeterministic Turing machines  306
    2 Speed  308
    3 Definition  308
  6 Polytime reduction  316
  7 NP completeness  325
    1 P = NP?  331
    2 Discussion  334
  8 Other classes  339
    1 EXP  340
    2 Time Complexity  340
    3 Space Complexity  341
    4 The Zoo  343
  A RSA Encryption  344
  B Tractability and good-enoughness  350

Appendix  353
  A Strings  354
  B Functions  356

Notes  363

Bibliography  396
Part One
Classical Computability

Chapter I. Mechanical Computation

What can be computed? For instance, the function that doubles its input, that
takes in x and puts out 2x , is intuitively mechanically computable. We shall call
such functions effective.
The question asks for the things that can be computed, more than it asks for
how to compute them. In this Part we will be more interested in the function, in
the input-output behavior, than in the details of implementing that behavior.

Section I.1 Turing machines
Despite this desire to downplay implementation, we follow the approach of
A Turing that the first step toward defining the set of computable
functions is to reflect on the details of what mechanisms can do.
The context of Turing’s thinking was the Entscheidungsproblem,†
proposed in 1928 by D Hilbert and W Ackermann, which asks for an
algorithm that decides, after taking as input a mathematical state-
ment, whether that statement is true or false.‡ So he considered the
kind of symbol-manipulating computation familiar in mathematics,
as when we factor a polynomial or verify a step in a plane geometry
proof.
After reflecting on it for a while, one day after a run,§ Turing lay down in the grass and imagined a clerk doing by-hand multiplication with a sheet of paper that gradually becomes covered with columns of numbers. With this image as a touchstone, Turing posited conditions for the computing agent.

[Photo: Alan Turing, 1912–1954]

First, it (or he, or she) has a memory facility, such as the clerk’s paper, to store and retrieve information.
Second, the computing agent must follow a definite procedure, a
precise set of instructions, with no room for creative leaps. Part of what makes the
procedure definite is that the instructions don’t involve random methods, such as
counting clicks from radioactive decay, to determine which of two possibilities to
perform.
The other thing making the procedure definite is that the agent does not use
continuous methods or analog devices. So there is no question about the precision
Image: copyright Kevin Twomey, kevintwomey.com/lowtech.html
† German for “decision problem.”
‡ When it finished computing it might turn on a light for ‘true’, or print the symbol 1.
§ He was a serious candidate for the 1948 British Olympic marathon team.

of operations as there might be, say, when reading results off of a slide rule or an
instrument dial. Instead, the agent works in a discrete fashion, step-by-step. For
instance, if needed they could pause between steps, note where they are (“about
to carry a 1”), and later pick up again. We say that at each moment the clerk is in
one of a finite set of possible states, which we denote q0, q1, . . .
Turing’s third condition arose because he wanted to investigate what is com-
putable in principle. He therefore imposed no upper bound on the amount of
available memory. More precisely, he imposed no finite upper bound — should
a calculation threaten to run out of storage space then more is provided. This
includes imposing no upper bound on the amount of memory available for inputs
or for outputs, and no bound on the amount of extra storage, scratch memory,
needed in addition to that for inputs and outputs.† He similarly put no upper
bound on the number of instructions. And, he left unbounded the number of steps
that a computation performs before it finishes.‡
The final question Turing faced is: how smart is the computing agent? For
instance, can it multiply? We don’t need to include a special facility for multi-
plication because we can in principle multiply via repeated addition. We don’t
even need addition because we can iterate the successor operation, the add-one
operation. In this way he pared the computing agent down until it was quite basic,
quite easy to understand, until the operations are so elementary that we cannot
easily imagine them further divided, while still keeping the agent powerful enough
to do anything that can, in principle, be done.

Definition Based on these reflections, Turing pictured a box containing a mechanism and fitted with a tape.

The tape is the memory, sometimes called the store. The box can read from
and write to it, one character at a time, as well as move a read/write head relative
to the tape in either direction. For instance, to multiply, the computing agent
can get the two input multiplicands from the tape (the drawing shows 74 and 72,
represented in binary and separated by a blank), can use the tape for scratch work,

† It is true that a physical computer such as your cell phone has memory space that is bounded (putting aside storing things in the Cloud). However, that space is extremely large. In this Part, when working with the model devices we find that imposing a bound on memory is irrelevant, or even a hindrance.
‡ Some authors describe the availability of resources such as the amount of memory as ‘infinite’. Turing himself does this. A reader may object that this violates the goal of the definition, to model physically-realizable computations, and so the development here instead says that the resources have no finite upper bound. But really, it doesn’t matter. If we show that something cannot be computed when there are no bounds then we have shown that it cannot be computed on any real-world device.

and can write the output to the tape.


The box is the computing agent, the CPU, sometimes called the control. The
Start button sets the computation going. When it is finished, the Halt light comes
on. The engineering inside the box is not important — perhaps it has integrated
circuits like the machines we are used to, or perhaps it has gears and levers, or
perhaps LEGO’s — what matters is that each of its finitely many parts can be in
only finitely many states. If it has chips then each register has a finite number of
possible values and if it is made with gears or bricks then then each settles in only
a finite number of possible positions. Thus, however it is made, in total the box
has only finitely many states.
While executing a calculation, the mechanism steps from state to state. For
instance, an agent doing multiplication may determine, because of what state it is
in now and what it is now reading on the tape, that they next need to carry a 1.
The agent transitions to a new state, one whose intuitive meaning is that carries
take place there.
Consequently, machine steps involve four pieces of information. We denote the present state as qp and the next state as qn. The other two, Tp and Tn, describe what tape symbol the read/write head is presently pointing to and what happens next with the tape. As to the set of characters that go on the tape, we will choose whatever is convenient, but except for finitely many places every tape is filled with blanks, so blank must be one of the symbols (we denote blank with B when leaving an empty space could cause confusion). The things that can happen next with the tape are: writing a symbol to the tape without moving the head, which we denote with that symbol, for instance with Tn = 1, or moving the tape head left or right without writing, which we denote with Tn = L or Tn = R.†
The four-tuple qp Tp Tn qn is an instruction. For example, the instruction q3 1 B q5 is executed only if the machine is now in state q3 and is reading a 1 on the tape. If so then the machine writes a blank to the tape, replacing the 1, and passes to state q5.
1.1 Example This Turing machine with the character set Σ = { B, 1 } has six instructions.

    Ppred = { q0 B L q1, q0 1 R q0, q1 B L q2, q1 1 B q1, q2 B R q3, q2 1 L q2 }

To trace its execution, below we’ve represented this machine in an initial configuration. This shows a stretch of the tape, including all its non-blank contents, along with the machine’s state and the position of its read/write head; here we mark the scanned cell with brackets.

    q0: [1]11


† Whether we move the tape or the head doesn’t matter; what matters is their relative motion. Thus Tn = L means that one or the other moves such that the head now points to the location one place to the left. In drawings we hold the tape steady and move the head because then comparing graphics step by step is easier.

We take the convention that when we press Start the machine is in state q0. The picture shows it reading 1 so instruction q0 1 R q0 applies. Thus the first step is that the machine moves its tape head right and stays in state q0. Below, the first line shows this and later lines show the machine’s configuration after later steps. Roughly, the computation slides to the right, blanks out the final 1, and slides back to the start.

Step  State  Tape
 1    q0     1[1]1
 2    q0     11[1]
 3    q0     111[B]
 4    q1     11[1]
 5    q1     11[B]
 6    q2     1[1]
 7    q2     [1]1
 8    q2     [B]11
 9    q3     [1]1

Next, because no instruction has present state q3, no instruction applies and the machine halts. We can think of this machine as computing the predecessor function

    pred(x) = x − 1 if x > 0, and pred(x) = 0 otherwise

because if we initialize the tape so that it contains only a string of n-many 1’s and the machine’s head points to the first, then at the end the tape will have (n − 1)-many 1’s (except for n = 0, where the tape will end with no 1’s).
1.2 Example This machine adds two natural numbers.

    Padd = { q0 B B q1, q0 1 R q0, q1 B 1 q1, q1 1 1 q2, q2 B B q3, q2 1 L q2,
             q3 B R q3, q3 1 B q4, q4 B R q5, q4 1 1 q5 }

The input numbers are represented by strings of 1’s that are separated with a blank. The read/write head starts under the first symbol in the first number. This shows the machine ready to compute 2 + 3.

    q0: [1]1B111

The machine scans right, looking for the blank separator. It changes that to a 1,
then scans left until it finds the start. Finally, it trims off a 1 and halts with the
Section 1. Turing machines 7

read/write head to the start of the string. Here are the steps.

Step  State  Tape
 1    q0     1[1]B111
 2    q0     11[B]111
 3    q1     11[B]111
 4    q1     11[1]111
 5    q2     11[1]111
 6    q2     1[1]1111
 7    q2     [1]11111
 8    q2     [B]111111
 9    q3     [B]111111
10    q3     [1]11111
11    q4     [B]11111
12    q5     [1]1111
Instead of giving a machine’s instructions as a list, we can use a table or a diagram. Here is the transition table for Ppred and its transition graph.

    ∆    B     1
    q0   Lq1   Rq0
    q1   Lq2   Bq1
    q2   Rq3   Lq2
    q3   –     –

    [Transition graph: q0 →(B,L) q1 →(B,L) q2 →(B,R) q3, with self-loops (1,R) at q0, (1,B) at q1, and (1,L) at q2.]
And here is the corresponding table and graph for Padd.

    ∆    B     1
    q0   Bq1   Rq0
    q1   1q1   1q2
    q2   Bq3   Lq2
    q3   Rq3   Bq4
    q4   Rq5   1q5
    q5   –     –

    [Transition graph: q0 →(B,B) q1 →(1,1) q2 →(B,B) q3 →(1,B) q4, then two edges q4 →(B,R) q5 and q4 →(1,1) q5, with self-loops (1,R) at q0, (B,1) at q1, (1,L) at q2, and (B,R) at q3.]

The graph is how we will most often present machines that are small, but if there are lots of states then it can be visually confusing.
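Continuing the dict-encoding sketch from Example 1.1, a transition table is only a reorganization of the instruction set, so printing one is mechanical (this print_table helper is ours).

    def print_table(P):
        """Print the transition table of a machine in the dict encoding."""
        states = sorted({q for q, _ in P} | {qn for _, qn in P.values()})
        symbols = [B] + sorted({s for _, s in P} - {B})
        print('Delta', *symbols)
        for q in states:
            row = (P.get((q, s), ('-', '')) for s in symbols)
            print(q, *(action + state for action, state in row))

    print_table(P_pred)   # the q3 row shows only dashes: no instruction applies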
Next, a crucial observation. Some Turing machines, for at least some starting
configurations, never halt.
1.3 Example The machine Pinfloop = { q0 B B q0, q0 1 1 q0 } never halts, regardless of the input.
    [Transition graph: the single state q0, with self-loops (B,B) and (1,1).]

The exercises ask for examples of Turing machines that halt on some inputs and
not on others.
It is high time for definitions. We take a symbol to be something that the device
can write and read, for storage and retrieval.†
1.4 Definition A Turing machine P is a finite set of four-tuple instructions qp Tp Tn qn. In an instruction, the present state qp and next state qn are elements of a set of states Q. The input symbol or current symbol Tp is an element of the tape alphabet set Σ, which contains at least two members, including one called blank (and does not contain L or R). The action symbol or next symbol Tn is an element of the action set Σ ∪ { L, R }.
The set P must be deterministic: different four-tuples cannot begin with the same qp Tp. Thus, over the set of instructions qp Tp Tn qn ∈ P, the association of present pair qp Tp with next pair Tn qn defines a function, the transition function or next-state function ∆ : Q × Σ → (Σ ∪ { L, R }) × Q.
We denote a Turing machine with P because the thing from our everyday
experience that a Turing machine is most like is a program.
Of course, the point of a machine is what it does. A Turing machine is a blueprint
for a computation — it is like a program — and so to finish the formalization started
by the definition we give a complete description of how these machines act.
We saw in tracing through Example 1.1 and Example 1.2 that a machine acts
by transitioning from one configuration to the next. A configuration of a Turing
machine is a four-tuple C = ⟨q, s, τL , τR ⟩ , where q is a state, a member of Q , s is a
character from the tape alphabet Σ, and τL and τR are strings of elements from
the tape alphabet, including possibly the empty string ε . These signify the current
state, the character under the read/write head, and the tape contents to the left
and right of the head. For instance, line 2 of the trace table of Example 1.2, where the state is q = q0, the character under the head is s = B, and to the left of the head is τL = 11 while to the right is τR = 111, graphically represents the configuration ⟨q, s, τL, τR⟩. That is, a configuration is a snapshot, an instant in a computation.
We write C (t) for the machine’s configuration after the t -th transition, and say
that this is the configuration at step t . We extend that to step 0, and say that the
initial configuration C (0) is the machine’s configuration before we press Start.
Suppose that at step t a machine P is in configuration C(t) = ⟨q, s, τL, τR⟩. To make the next transition, find an instruction qp Tp Tn qn ∈ P with qp = q and Tp = s. If there is no such instruction then at step t + 1 the machine P halts.

† How the device does this depends on its construction details. For instance, to have a machine with two symbols, blank and 1, we can either read and write marks on a paper tape, or align magnetic particles on a plastic tape, or bits on a chip, or we can push LEGO bricks to the left or right side of a slot. Discreteness ensures that the machine can cleanly distinguish between the symbols, in contrast with the trouble an instrument might have in distinguishing two values near its limit of resolution.

Otherwise there will be only one such instruction, by determinism. There are three possibilities. (1) If Tn is a symbol in the tape alphabet set Σ then the machine writes that symbol to the tape, so that the next configuration is C(t + 1) = ⟨qn, Tn, τL, τR⟩. (2) If Tn = L then the machine moves the tape head to the left. That is, the next configuration is C(t + 1) = ⟨qn, ŝ, τ̂L, τ̂R⟩, where ŝ is the rightmost character of the string τL (if τL = ε then ŝ is the blank character), where τ̂L is τL with its rightmost character omitted (if τL = ε then τ̂L = ε also), and where τ̂R is the concatenation of ⟨s⟩ and τR. (3) If Tn = R then the machine moves the tape head to the right. This is like (2) so we omit the details.
If two configurations are related by being a step apart then we write C(i) ⊢ C(i + 1).† A computation is a sequence C(0) ⊢ C(1) ⊢ C(2) ⊢ · · ·. We abbreviate such a sequence with ⊢*.‡ If the computation halts then the sequence has a final configuration C(h), so we may write a halting computation as C(0) ⊢* C(h).
1.5 Example In Example 1.1, the pictures that trace the machine’s execution show the successive configurations. So the computation is this.

    ⟨q0, 1, ε, 11⟩ ⊢ ⟨q0, 1, 1, 1⟩ ⊢ ⟨q0, 1, 11, ε⟩ ⊢ ⟨q0, B, 111, ε⟩ ⊢ ⟨q1, 1, 11, ε⟩
      ⊢ ⟨q1, B, 11, ε⟩ ⊢ ⟨q2, 1, 1, ε⟩ ⊢ ⟨q2, 1, ε, 1⟩ ⊢ ⟨q2, B, ε, 11⟩ ⊢ ⟨q3, 1, ε, 1⟩
That description of the action of a Turing machine emphasizes that it is a state
machine — Turing machine computation is about the transitions, the discrete steps
taking one configuration to another.
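The yields relation can be transcribed directly from the three cases above. Here is a sketch of ours that does so; it also trims blanks far from the head, so that its snapshots come out exactly as in Example 1.5 (it assumes the B and P_pred of the earlier simulator sketch).

    def yields(P, C):
        """One step of the yields relation on a configuration C = (q, s, tauL,
        tauR); returns the next configuration, or None if the machine halts."""
        q, s, tauL, tauR = C
        if (q, s) not in P:
            return None                        # no instruction applies, so halt
        Tn, qn = P[(q, s)]
        if Tn == 'L':                          # case (2): move the head left
            s, tauL, tauR = (tauL[-1] if tauL else B), tauL[:-1], s + tauR
        elif Tn == 'R':                        # case (3): move the head right
            s, tauL, tauR = (tauR[0] if tauR else B), tauL + s, tauR[1:]
        else:                                  # case (1): write the symbol Tn
            s = Tn
        # drop blanks at the far ends of the tape strings
        return (qn, s, tauL.lstrip(B), tauR.rstrip(B))

    C = ('q0', '1', '', '11')                  # Example 1.1's initial configuration
    while C is not None:
        print(C)
        C = yields(P_pred, C)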
Effective functions In this chapter’s opening we declared that our interest is not
so much in the machines as it is in the things that they compute. We close with a
definition of the set of functions that are mechanically computable.
A function is an association of inputs with outputs.§ The simplest candidate for
a definition of the function computed by a machine is the association of the string
on the tape when the machine starts with the string on the tape when it halts, if
indeed it does halt. (For the following definition, note that where X is a set of
characters, we use X* to denote the set of finite-length strings of those characters.)
1.6 Definition Let P be a Turing machine with tape alphabet Σ, and let Σ0 be Σ − { B }. The function ϕP : Σ0* → Σ0* computed by P is: for input σ ∈ Σ0*, the output is the string that results from placing σ on an otherwise blank tape, pointing the read/write head to its left-most symbol, and running the machine. If P halts, and the non-blank characters are consecutive, and the first character is under the head, then ϕP(σ) is that string.
This illustrates the computation of a function where ϕ(111) = 11111.|| (If there is only one machine under discussion then we may omit the subscript and just write ϕ.)

    [Diagram: the tape 111 with the head under the leftmost 1 in state q0, mapping to the tape 11111 with the head under the leftmost 1 in the halting state qh.]

† Read the turnstile symbol ⊢ aloud as “yields.” We could, where I is the applicable instruction, write ⊢I, but we will never need that construct. ‡ Read this aloud as “yields eventually.” § A review of functions is on page 356. || Mathematicians began the subject by studying the effective computation of mathematical functions, principally functions from number theory. They often worked with the simplest tape alphabet, Σ = { B, 1 }. This approach has proven fruitful so researchers still often discuss the subject in these terms.

That definition has two fine points, both needed to make the input-output association well-defined. One is that just specifying that the machine starts with σ on the tape is not enough, since the initial position of the head can change the output.† And, the definition omits blanks from the input and output strings since the machine would not be able to distinguish blanks at the end of those strings from blanks that are part of the unbounded tape.
The definition says “If P halts . . . ” What if it doesn’t?
1.7 Definition If for a Turing machine the value of a computation is not defined on some input σ ∈ Σ0* then we say that the function computed by the machine diverges, written ϕ(σ)↑ (or ϕ(σ) = ⊥). Where the machine does have an associated output value, we say that its function converges, written ϕ(σ)↓. If ϕ is defined for each input in Σ0* then it is a total function. If it diverges for at least one member of Σ0* then it is a partial function.
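As a sketch tying these two definitions to the earlier simulator (the name phi is ours), the conditions of Definition 1.6 can be checked mechanically, though a step bound must stand in for divergence, since no simulator can truly detect it.

    def phi(P, sigma, max_steps=10_000):
        """The function computed by P, per Definition 1.6; None marks 'undefined'."""
        try:
            tape, head, _ = run(P, sigma, max_steps=max_steps)
        except RuntimeError:
            return None                        # no halt within the bound
        cells = [i for i, c in sorted(tape.items()) if c != B]
        if not cells:
            return ''                          # all blank: the empty string
        ok = cells == list(range(cells[0], cells[-1] + 1)) and head == cells[0]
        return ''.join(tape[i] for i in cells) if ok else None

    print(phi(P_pred, '111'))   # prints: 11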
Very important: note the difference between a machine P and the function computed by that machine, ϕP.‡ For example, the machine Ppred is a set of four-tuples but the predecessor function is a set of input-output pairs, which we might write as x ↦ pred(x). Another example of the difference is that machines halt or fail to halt, while functions converge or diverge.
That definition appears to only allow functions with a single input and out-
put; what about functions with multiple inputs or outputs? For instance, the
function that takes in two natural numbers a and b and returns a^b is intuitively
mechanically computable but isn’t obviously covered.
The trick here is to consider the input string of 1’s to be an encoding, a repre-
sentation, of multiple inputs. For instance, we could set an exponentiation routine
up so that it inputs a string of x-many 1’s, then performs a prime factorization to get x = 2^a · 3^b · k for some k ∈ N divisible by neither 2 nor 3, and then returns a^b. In this way we can get a two-input function from a single input string.
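As a sketch of that trick (our code; the helper names are hypothetical), the two arguments can be unpacked from the single number x by counting prime factors.

    def decode(x):
        """Recover (a, b) from x = 2**a * 3**b * k, with k coprime to 6."""
        assert x > 0          # x counts 1's on the tape, so it is at least 1
        a = b = 0
        while x % 2 == 0:
            x, a = x // 2, a + 1
        while x % 3 == 0:
            x, b = x // 3, b + 1
        return a, b

    def power(x):
        """A two-input function, a**b, packed into a one-input function."""
        a, b = decode(x)
        return a ** b

    print(power(2**3 * 3**2))   # x = 72 encodes a = 3 and b = 2, so this prints 9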
OK then, what about computing with non-numbers? For instance, we may want
to find the shortest path through a graph. In an extension of the prior paragraph,
to compute with a graph we find a way to represent it with a string. Programs
that work with graphs first decode the input string, then compute the answer, and
finish by encoding the answer as a string.
These codings may seem awkward, and they are. (Of course, a programming language does conversions from decimal to binary and back again that are somewhat similar.) But a Turing machine simply manipulates characters and we externally provide an interpretation.

† Some authors don’t require that the first character in the output is under the head. But this way is neater. ‡ The introduction to this chapter says that we are most interested in effective functions, ϕP, and that we study machines P with an eye mostly to getting information about what they compute.
When we describe the function computed by a machine we typically omit the
part about interpreting the strings. We might say, “this shows ϕ(3) = 5” rather than,
“this shows ϕ taking a string representing 3 in unary to a string representing 5.”
1.8 Remark Early researchers, working before computing machines were widely
available, needed airtight arguments that there is a mechanical computation of,
say, the function that takes in a number n and returns the n -th prime. So they
worked through the details, demonstrating that their definitions and arguments
accorded with their intuition by building up a large body of evidence. But here
we take a different tack. Our everyday experience with the machines around us is
that they are able to use their alphabet, binary, to get a reasonable representation
of anything that our intuition says is computable. So our development will not
be to fix an encoding and work through the details of demonstrating that certain
functions, which we think are obviously computable, are indeed computable by the
definitions we’ve given. We omit this simply to get sooner to interesting material.
The next section says more.

1.9 Definition A computable function, or recursive function,† is a total or partial function that is computed by some Turing machine. A computable set, or recursive set, is one whose characteristic function is computable. A Turing machine decides for a set if it computes the characteristic function of that set. A relation is computable just if it is computable as a set.

† The term ‘recursive’ used to be universal but is now old-fashioned.
We close with a summary. We have given a characterization of mechanical
computation. We view it as a process whereby a physical system evolves through
a sequence of discrete steps that are local, meaning that all the action takes
place within one cell of the head. This has led to a precise definition of which
functions are mechanically computable. In the next subsection we will discuss this
characterization, including the evidence that leads to its widespread acceptance.

I.1 Exercises
Unless the exercise says otherwise, assume that Σ = { B, 1 }. Also assume that any
machine must start with its head under the leftmost input character and arrange for
it to end with the head under the leftmost output character.
1.10 How is a Turing machine like a program? How is it unlike a program? How
is it like the kind of computer we have on our desks? How is it unlike?
1.11 Why does the definition of a Turing machine, Definition 1.4, not include a
definition of the tape?
1.12 Your study partner asks you, “The opening paragraphs talk about the Entscheidungsproblem, to mechanically determine whether a mathematical statement is true or false. I write programs with bits like if (x>3) all the time. What’s the problem?” Help your friend out.
✓ 1.13 Trace each computation, as in Example 1.1.
(a) The machine Ppred from Example 1.1 when starting on a tape with two 1’s.
(b) The machine Padd from Example 1.2 when the addends are 2 and 2.
(c) Give the two computations as configuration sequences, as in Example 1.5.
✓ 1.14 For each of these false statements about Turing machines, briefly explain the
fallacy.
(a) Turing machines are not a complete model of computation because they
can’t do negative numbers.
(b) The problem with Example 1.3 is that the instructions don’t have any extra
states where the machine goes to halt.
(c) For a machine to reach state q50 it must run for at least fifty-one steps.
1.15 We often have some states that are halting states, where we send the machine solely to make it halt. In this case the others are working states. For instance, Example 1.1 uses q3 as a halting state and its working states are q0, q1, and q2. Name Example 1.2’s halting and working states.
✓ 1.16 Trace the execution of Pinfloop for ten steps, from a blank tape. Show the sequence of tapes.
1.17 Trace the execution on each input of this Turing machine with alphabet
Σ = { B, 0, 1 } for ten steps, or fewer if it halts.

{ q0 B B q4, q0 0 R q0, q0 1 R q1, q1 B B q4, q1 0 R q2, q1 1 R q0, q2 B B q4, q2 0 R q0, q2 1 R q3 }

(a) 11 (b) 1011 (c) 110 (d) 1101 (e) ε


✓ 1.18 Give the transition table for the machine in the prior exercise.
✓ 1.19 Write a Turing machine that, if it is started with the tape blank except for a
sequence of 1’s, will replace those with a blank and then halt.
✓ 1.20 Produce Turing machines to perform these Boolean operations, using Σ =
{ B, 0, 1 }. (a) Take the ‘not’ of a bit b ∈ Σ0 = Σ − { B }. That is, convert the input
b = 0 into the output 1, and convert 1 into 0. (b) Take as input two characters
drawn from Σ0 and give as output the single character that is their logical ‘and’.
That is, if the input is 01 then the output should be 0, while if the input is 11 then
the output should be 1. (c) Do the same for ‘or’.
1.21 Give a Turing machine that takes as input a bit string, using the alphabet
{ B, 0, 1 }, and adds 01 at the back.
1.22 Produce a Turing machine that computes the constant function ϕ(x) = 3. It
inputs a number written in unary, so that n is represented as n -many 1’s, and
outputs the number 3 in unary.
✓ 1.23 Produce a Turing machine that computes the successor function, that takes
as input a number n and gives as output the number n + 1 (in unary).

✓ 1.24 Produce a doubler, a Turing machine that computes f (x) = 2x .


(a) Assume that the input and output is in unary. Hint: you can erase the first 1,
move to the end of the 1’s, past a blank, and put down two 1’s. Then move
left until you are at the start of the first sequence of 1’s. Repeat.
(b) Instead assume that the alphabet is Σ = { B, 0, 1 } and the input is represented
in binary.
✓ 1.25 Produce a Turing machine that takes as input a number n written in unary,
represented as n -many 1’s, and if n is odd then it gives as output the number 1 in
unary, with the head under that 1, while if n is even it gives the number 0 (which
in a unary representation means the tape is blank).
1.26 Write a machine P with tape alphabet Σ that, in addition to blank B and stroke
1, also contains the comma ‘,’ character. Where Σ0 = Σ − { B }, if we interpret
the input σ ∈ Σ0 as a comma-separated list of natural numbers represented in
unary, then this machine should return the sum, also in unary. For instance, ϕP(1111,,111,1) = 11111111.
1.27 Is there a Turing machine configuration without any predecessor? Restated, is there a configuration C = ⟨q, s, τL, τR⟩ for which there does not exist any configuration Ĉ = ⟨q̂, ŝ, τ̂L, τ̂R⟩ and instruction I = q̂ ŝ Tn qn such that if a machine is in configuration Ĉ then instruction I applies and Ĉ ⊢ C?
1.28 One way to argue that Turing machines can do anything that a modern
CPU can do involves showing how to do all of the CPU’s operations on a Turing
machine. For each, describe a Turing machine that will perform that operation.
You need not produce the machine, just outline the steps. Use the alphabet
Σ = { 0, 1, B }.
(a) Take as input a 4-bit string and do a bitwise NOT, so that each 0 becomes a
1 and each 1 becomes a 0.
(b) Take as input a 4-bit string and do a bitwise circular left shift, so that from
b3b2b1b0 you end with b2b1b0b3 .
(c) Take as input two 4-bit strings and perform a bitwise AND.
✓ 1.29 For each, produce a machine meeting the condition. (a) It halts on exactly
one input. (b) It fails to halt on exactly one input. (c) It halts on infinitely many
inputs, and fails to halt on infinitely many.
1.30 A common alternative definition of Turing machine does not use what is on the tape when the machine halts. Rather, it designates one state as an accepting state and one as a rejecting state, and the language decided by the machine is the set of strings that it accepts. Write a Turing machine with alphabet { B, a, b } that will halt in state q3 if the input string contains two consecutive b’s, and will halt in state q4 otherwise.
1.31 Definition 1.9 talks about a relation being computable. Consider the ‘less
than or equal’ relation between two natural numbers, i.e., 3 is less than or equal
to 5, but 2 is not less than or equal to 1. Produce a Turing machine with tape
alphabet Σ = { 0, 1, B } that takes in two numbers represented in unary and outputs τ = 1 if the first number is less than or equal to the second, and τ = 0 if not.
1.32 Write a Turing machine that decides if its input is a palindrome, a string that
is the same backward as forward. Use Σ = { B, 0, 1 }. Have the machine end with
a single 1 on the tape if the input was a palindrome, and with a blank tape if not.
1.33 Turing machines tend to have many instructions and to be hard to understand.
So rather than exhibit a machine, people often give an overview. Do that for a
machine that replicates the input: if it is started with the tape blank except for a
contiguous sequence of n -many 1’s, then it will halt with the tape containing two
sequences of n -many 1’s separated by a single blank.
1.34 Show that if a Turing machine has the same configuration at two different
steps then it will never halt. Is that sufficient condition also necessary?
1.35 Show that the steps in the execution of a Turing machine are not necessarily
invertible. That is, produce a Turing machine and a configuration such that if
you are told the machine was brought to that configuration after some number of
steps, and you were asked what was the prior configuration, you couldn’t tell.

Section I.2 Church’s Thesis

History Algorithms have always played a central role in mathematics. The simplest example is a formula such as the one giving the height of a ball dropped from the Leaning Tower of Pisa, h(t) = −4.9t^2 + 56. This is a kind of program: get the height output by squaring the time input, multiplying by −4.9, and adding 56.
In the 1670’s Gottfried Wilhelm von Leibniz, the co-creator
of Calculus, constructed the first machine that could do addition,
subtraction, multiplication, division, and square roots as well. This
led him to speculate on the possibility of a machine that manipulates
not just numbers but symbols and could thereby determine the
truth of scientific statements. To settle any dispute, Leibniz wrote,
scholars could just say, “Calculemus!”† This is a version of the
Entscheidungsproblem.
The real push to understand computation arose in 1931 from the Incompleteness theorem of K Gödel. This says that for any (sufficiently powerful) axiom system there are statements that, while true, are not provable from those axioms. Gödel gave an algorithm that inputs the axioms and outputs the statement. This made evident the need to define what is ‘algorithmic’ or ‘intuitively mechanically computable’ or ‘effective’.

[Image: Leibniz’s Stepped Reckoner]

† Latin for “Let us calculate!”

A number of mathematicians proposed formalizations. One was A Church,† who proposed the λ-calculus. Church and his students used this system to derive many functions that are intuitively mechanically computable, including the polynomial functions and number-theoretic functions such as finding the remainder on division. They could not find any such function that the λ-calculus could not do. Church suggested to Gödel, the most prominent expert in the area, that the set of effective functions, the set of functions that are intuitively mechanically computable, which is not precisely given, is the same as the set of functions that are λ-computable, which is. But Gödel was unconvinced.

[Photo: Alonzo Church, 1903–1995]
That changed when Gödel read Turing’s masterful analysis, outlined in the
prior section. He subsequently wrote, “That this really is the correct definition of
mechanical computability was established beyond any doubt by Turing.”
2.1 Church’s Thesis The set of things that can be computed by a discrete and
deterministic mechanism is the same as the set of things that can be computed by
a Turing machine.‡
Church’s Thesis is central to the Theory of Computation. It says that our
technical results have a larger importance — they describe the devices that are
on our desks and in our pockets. So in this section we pause to expand on some
points, particularly ones that experience has shown can lead to misunderstandings.

Evidence We cannot prove Church’s Thesis. That is, we cannot give a mathematical
proof. The definition of a Turing machine, or of lambda calculus or other equivalent
schemes, formalizes the notion of ‘effective’ or ‘intuitively mechanically computable’.
When a researcher agrees that it correctly explicates ‘computable on a discrete
and deterministic mechanism’ and consents to work within that formalization,
they are then free to proceed with reasoning mathematically about these systems.
So in a sense, Church’s Thesis comes before the mathematics, or at any rate sits
outside the usual derivation and verification work of mathematics. Turing wrote,
“All arguments which can be given are bound to be, fundamentally, appeals to
intuition, and for this reason rather unsatisfactory mathematically.”
Despite not being the conclusion of a deductive system, Church’s Thesis
is very widely accepted. We will give four points in its favor that persuaded
Gödel, Church, and others at the time, and that still persuade researchers
today — coverage, convergence, consistency, and clarity.
First, coverage: everything that people have thought of as intuitively computable has proven to be computable by a Turing machine. This includes not just the number theoretic functions investigated by researchers in the 1930’s but also everything ever computed by every program written for every existing computer, because all of them can be compiled to run on a Turing machine.

[Photo: Kurt Gödel, 1906–1978]

† After producing his machine model, Turing became a PhD student of Church at Princeton.
‡ Some authors call this the Church-Turing Thesis. Here we figure that because Turing has the machine, we can give Church sole possession of the thesis.
Despite this weight of evidence, the argument by coverage would collapse if someone exhibited even one counterexample, one operation that can be done in finite time on a physically-realizable discrete and deterministic device but that cannot be done on a Turing machine. So this argument is strong but at least conceivably not decisive.
The second argument is convergence: in addition to Turing and Church, many
other researchers then and since have proposed models of computation. For
instance, the next section on General Recursive Functions will give us a taste of
another influential model. However, despite this variation, our experience is that
every model yields the same set of computable functions. For instance, Turing
showed that the set of functions computable with his machine model is equal to
the set of functions computable with Church’s λ -calculus.
Now, everyone could be wrong. There could be some systematic error in
thinking around this point. For centuries geometers seemed unable to imagine
the possibility that Euclid’s Parallel Postulate does not hold and perhaps a similar
cultural blindness is happening here. Nonetheless, if a number of very smart
people go off and work independently on a question, and when they come back
you find that while they have taken a wide variety of approaches, they all got the
same answer, then you may well suppose that it is the right answer. At the least,
convergence says that there is something natural and compelling about this set of
functions.
An argument not available to Turing, Church, Gödel, and others in the 1930’s,
since it depends on work done since, is consistency: the details of the definition of
a Turing machine are not essential to what can be computed. For example, we can
show that a one-tape machine can compute all of the functions that can be done
by a machine with two or more tapes. Thus, the fact that Definition 1.4’s machines
have only one tape is not an essential point.
Similarly, machines whose tape is unbounded in only one direction can compute
all the functions computable with a tape unbounded in both directions. And
machines with more than one read/write head compute the same functions as
those with only one. As to symbols, we can compute any intuitively computable function using just a single symbol beyond the blank that covers all but finitely many cells of the starting tape, that is, with Σ = { 1, B }. Likewise,
restricting to write-once machines, which cannot change marks once they are on the
tape, suffices to compute this set of functions. Also, although restricting to machines
having only one state does not suffice, two-state machines are equipowerful with
the unboundedly-many states machines given in Definition 1.4.
There is one more condition that does not change the set of computable
functions, determinism. Recall that the definition of Turing machine given above
does not allow, say, both of the instructions q 5 1Rq 6 and q 5 1Lq 4 in the same machine,
because they both begin with q 5 1. If we drop this restriction then the class of

machines that we get are called nondeterministic. We will have much more to say
on this later but the collection of nondeterministic Turing machines computes the
same set of functions as does the collection of deterministic machines.
Thus, for any way in which the Turing machine definition seems to make an
arbitrary choice, making a different choice still yields the same set of computable
functions. This is persuasive in that any proper definition of what is computable
should possess this property; for instance, if two-tape machines computed more
functions than one-tape machines and three-tape machines more than those, then
identifying the set of computable functions with those computable by single-tape
machines would be foolish. But as with the prior argument, while this means that
the class of Turing machine-computable functions is natural and wide-ranging, it
still leaves open a small crack of a possibility that the class does not exhaust the
list of functions that are mechanically computable.
The most persuasive single argument for Church’s Thesis — what caused Gödel
to change his mind and what convinces scholars still today — is clarity: Turing’s
analysis is compelling. Gödel noted this in the quote given above and Church felt
the same way, writing that Turing machines have, “the advantage of making the
identification with effectiveness . . . evident immediately.”

What it does not say Church’s Thesis does not say that in all circumstances
the best way to understand a discrete and deterministic computation is via the
Turing machine model. For example, a numerical analyst studying the in-practice
performance of a floating point algorithm should use a computer model that has
registers. Church’s Thesis says that the calculation could in principle be done by a
Turing machine but for this use registers are more felicitous.†
Church’s Thesis also does not say that Turing machines are all there is to any
computation in the sense that if, say, you are studying an automobile antilock
braking system then the Turing machine model accounts for the logical and
arithmetic computations but not the entire system, with sensor inputs and actuator
outputs. S Aaronson has made this point, “Suppose I . . . [argued] that . . .
[Church’s] Thesis fails to capture all of computation, because Turing machines
can’t toast bread. . . . No one ever claimed that a Turing machine could handle
every possible interaction with the external world, without first hooking it up to
suitable peripherals. If you want a Turing machine to toast bread, you need to
connect it to a toaster; then the TM can easily handle the toaster’s internal logic.”
In the same vein, we can get physical devices that supply a stream of random
bits. These are not pseudorandom bits that are computed by a method that is
deterministic but which passes statistical tests. Instead, well-established physics
tells us these bits are truly random. Its relevance here is that Church’s Thesis only
claims that Turing machines model the discrete and deterministic computations

Brain scientists also find Turing machines to be not the most suitable model. Note, though, that saying
that an interrupt-driven brain model is a better fit is not the same as saying that the brain operations
could not, in principle, be done using a Turing machine as the substrate.

that we can do after we are given input bits from such a device.

An empirical question? Church’s Thesis posits that Turing machines can do any
computation that is discrete and deterministic. That raises a big question: even
if we accept Church’s Thesis, can we do more by going beyond discrete and
deterministic? For instance, would analog methods — passing lasers through a gas,
say, or some kind of subatomic magic — allow us to compute things that no Turing
machine can compute? Or are these an ultimate in physically-possible machines?
Did Turing, on that day, lying on that grassy river bank, intuit everything that
experiments with reality would ever find to be possible?
For a taste of the conversation, we can prove that there is a case where the wave
equation† has initial conditions that are computable (for the initial real numbers x
there is a program that inputs i ∈ N and outputs the i -th decimal place of x ), but
the unique solution is not computable. So does the wave tank modeled by this
equation compute something that Turing machines cannot? Stated for rhetorical
effect: do the planets in their orbits compute a solution to the Three-Body Problem?
In this case we can object that an experimental apparatus is subject to noise
and measurement problems including a finite number of decimal places in the
instruments, etc. But even if careful analysis of the physics of a wave tank leads us
to discount it as reliably computing a function, we can still wonder whether there
are other apparatuses that would.
This big question remains open. As yet no analysis of a wider notion of physically-
possible mechanical computation in the non-discrete case has the support that
Turing’s analysis has garnered in its more narrow domain. In particular, no one
has yet produced a generally accepted example of a non-discrete mechanism that
computes a function that no Turing machine computes.
We will not pursue this any further, instead only observing that the community
of researchers has weighed in by taking Church’s Thesis as the basis for its work.
For us, ‘computation’ will refer to the kind of work that Turing analyzed. That’s
because we want to think about symbol-pushing, not numerical analysis and not
toast.

Using Church’s Thesis Church’s Thesis asserts that each of the models of com-
putation — for instance, Turing machines, λ calculus, and the general recursive
functions that we will see in the next section — are maximally capable. Here we
emphasize it because it imbues our results with a larger importance. When, for
instance, we will later describe a function for which we can prove that that no
Turing machine can compute it then, with the thesis in mind, we will take the
technical statement to mean that this function cannot be computed by any discrete
and deterministic device.
Another aspect of Church’s Thesis is that because they are each maximally
capable, these models, and others that we won’t describe, therefore all compute

A partial differential equation that describes the propagation of waves.

the same things. So we can fix one of them as our preferred formalization and get
on with the mathematical analysis. For this, we choose Turing machines.
Finally, we will also leverage Church’s Thesis to make life easier. As the exercises
in the prior section illustrate, while writing a few Turing machines gives some
insight, after a short while you may well find that doing more machines does not
give any more illumination. Worse, focusing too much on Turing machine details
(or on the low-level details of any computing model) can obscure larger points. So
if we can be clear and rigorous without actually having to handle a mass of detail
then we will be delighted.
Church’s Thesis helps with this. Often when we want to show that something
is computable by a Turing machine, we will first argue that it is intuitively
computable and then cite Church’s Thesis to assert that it is therefore Turing
machine computable. With that, our argument can proceed, “Let P be that
machine . . . ” without us ever having exhibited a set of four-tuple instructions. Of
course, there is some danger that we will get ‘intuitively computable’ wrong but
we all have so much more experience with this than people in the 1930’s that the
danger is minimal. The upside is that we can make rapid progress through the
material; we can get things done.
In many cases, to claim that something is intuitively computable we will produce
a program, or sketch a program, doing that thing. For these we like to use a
modern programming language, and our choice is a Scheme, specifically, Racket.
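For instance, to argue that the function that decides whether a natural number is a perfect square is intuitively computable, we might offer a sketch like this one (the name perfect-square? is ours, just for illustration).

(define (perfect-square? n)       ;; is n equal to k*k for some natural k?
  (let loop ((k 0))
    (cond [(= (* k k) n) #t]
          [(> (* k k) n) #f]
          [else (loop (+ k 1))])))

By Church's Thesis, because this program computes the function, some Turing machine computes it also; we need never write out that machine's four-tuple instructions.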

I.2 Exercises
2.2 Why is it Church’s Thesis instead of Church’s Theorem?
✓ 2.3 We’ve said that the thing from our everyday experience that Turing Machines
are most like is programs. What is the difference: (a) between a Turing
Machine and an algorithm? (b) between a Turing Machine and a computer?
(c) between a program and a computer? (d) between a Turing Machine and a
program?
✓ 2.4 Each of these is frequently voiced on the interwebs as a counterargument to
Church’s Thesis. Explain why each is bogus, said by clueless noobs. Plonk!
(a) Turing machines have an infinite tape so it is not a realistic model.
(b) The total size of the universe is finite, so there are in fact only finitely many
configurations possible for any computing device, whereas a Turing machine
can use more than that many configurations, so it is not a realistic model.
✓ 2.5 One of these is a correct statement of Church’s Thesis, and the others are
not. Which one is right? (a) Anything that can be computed by any mechanism
can be computed by a Turing machine. (b) No human computer, or machine
that mimics a human computer, can out-compute a Turing machine. (c) The
set of things that are computable by a discrete and deterministic mechanism
is the same as the set of things that are computable by a Turing machine.

(d) Every product of a person's mind, or product of a mechanism that mimics the
activity of a person’s mind, can be produced by some Turing machine.
2.6 List two benefits from adopting Church’s Thesis.
✓ 2.7 Refute this objection to Church’s Thesis: “Some computations have unbounded
extent. That is, sometimes we look for our programs to halt but some computations,
such as an operating system, are designed to never halt. The Turing machine is
an inadequate model for these.”
2.8 The computers we use every day are binary. Use Church’s Thesis to argue that
if they were ternary, where instead of bits with two values they used trits with
three, then they would compute exactly the same set of functions.
2.9 Use Church’s thesis to argue that the indicated function exists and is com-
putable.
(a) Suppose that f 0 , f 1 : N → N are computable partial functions. Show that
h : N → N is a computable partial function where h(x) = 1 if x is in the
intersection of the domain of f 0 and the domain of f 1 , and h(x)↑ otherwise.
(b) Do the same as in the prior item, but take the union of the two domains.
(c) Suppose that f : N → N is a computable function that is total. Show that
h : N → N is a computable partial function, where h(x) = 1 if x is in the
range of f and h(x)↑ otherwise.
(d) Suppose f 0 , f 1 : N → N are computable total functions. Show that their
composition h = f 1 ◦ f 0 is a computable function h : N → N.
(e) Suppose f 0 , f 1 : N → N are computable partial functions. Show that their
composition is a computable partial function f 1 ◦ f 0 : N → N.
✓ 2.10 Suppose that f : N → N is a total computable function. Use Church’s Thesis
to argue that this function is computable.
$$h(n) = \begin{cases} 0 &\text{if } n \text{ is in the range of } f \\ \uparrow &\text{otherwise} \end{cases}$$

2.11 Let f, g: N → N be computable functions that may be either total or partial
functions. Use Church's Thesis to argue that this function is computable.

$$h(n) = \begin{cases} 1 &\text{if both } f(n){\downarrow} \text{ and } g(n){\downarrow} \\ \uparrow &\text{otherwise} \end{cases}$$

✓ 2.12 If you allow processes to take infinitely many steps then you can have all
kinds of fun. Suppose that you have infinitely many dollars. Feeling flush you go
to a bar. The Devil is there. He proposes an infinite sequence of transactions, in
each of which he will hand you two dollars and take from you one dollar. (The
first will take 1/2 hour, the second 1/4 hour, etc.) You figure you can’t lose. But
he proves to be particular about the order in which you exchange bills. First he

numbers your bills as 1, 3, 5, . . . At each step he buys your lowest-numbered


bill and pays you with higher-numbered bills. Thus, on the first transaction he
accepts from you bill number 1 and pays you with his own bills, numbered 2
and 4. Next he buys from you bill number 2 and pays you with his bills numbered
6 and 8. How much do you end with?
The remaining exercises involve multitape Turing machines. A good way to define
a k-tape machine is to start with Definition 1.4's single tape transition function
∆: Q × Σ → (Σ ∪ { L, R }) × Q and extend it to ∆: Q × Σᵏ → (Σ ∪ { L, R })ᵏ × Q.
Thus, a typical four-tuple for a k = 2-tape machine with alphabet Σ = { 0, 1, B } is
q₄⟨1, B⟩ ⟨0, L⟩q₃. It means that if the machine is in state q₄ and the head on tape 0
is reading 1 while that on tape 1 is reading a blank, then the machine writes 0 to
tape 0, moves left on tape 1, and goes into state q₃.
2.13 Write the transition table of a two-tape machine to complement a bitstring.
The machine has alphabet { 0, 1, B }. It starts with a string σ of 0’s and 1’s on
tape 0 (the tape 0 head starts under the leftmost bit) and tape 1 is blank. When
it finishes, on tape 1 is the complement of σ, with input 0's changed to 1's and
input 1’s changed to 0’s, and with the tape 1 head under the leftmost bit.
2.14 Write a two-tape Turing machine to take the logical and of two bitstrings.
The machine starts with two same-length strings of 0’s and 1’s on the two tapes.
The tape 0 head starts under the leftmost bit, as does the tape 1 head. When the
machine halts, the tape 1 head is under the leftmost bit of the result (we don’t
care about the tape 0 head).

Section
I.3 Recursion
In the 1930’s researchers other than Turing also saw the need to make precise
the notion of mechanical computability. Here we will outline an approach that is
different than Turing’s, both to give a sense of another approach and because we
will find it useful.†
This approach has a classical mathematics flavor. It lists initial functions that
are intuitively mechanically computable, along with intuitively computable ways
to combine existing functions, to make new functions from old. An example is that
one effective initial function is successor S : N → N described by S (x) = x + 1,
and an effective combiner is function composition. Then the composition S ◦ S ,
the plus-two operation, is also intuitively mechanically computable.
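Rendered in Racket, that example is immediate (a sketch; the names S and plus-two are ours, and compose is Racket's built-in function composition).

(define (S x) (+ x 1))            ;; the effective initial function, successor
(define plus-two (compose S S))   ;; an effective combiner: composition

> (plus-two 5)
7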
We now introduce another combiner that is intuitively mechanically computable.

Primitive recursion Grade school students learn addition and multiplication
as mildly complicated algorithms ("carry the one"). H Grassmann

It also has the advantage of not needing the codings discussed for Turing machines since it works
directly with the functions.


produced a more elegant definition. Here is the formula for addition,
plus: N² → N, which takes as given the successor map, S(n) = n + 1.

$$\mathrm{plus}(x, y) = \begin{cases} x &\text{if } y = 0 \\ S(\mathrm{plus}(x, z)) &\text{if } y = S(z) \text{ for } z \in \mathbb{N} \end{cases}$$

This is definition by recursion, since ‘plus’ recurs in its definition.†


A common reaction on first seeing recursion is to wonder whether
it is logically problematic — isn’t defining something in terms of itself
a fallacy? The expansion below shows why the definition is not a
problem: plus(3, 2) is not defined in terms of itself, it is defined in
terms of plus(3, 1). And, plus(3, 1) is defined in terms of plus(3, 0), whose definition
is clearly not a problem.
plus(3, 2) = S(plus(3, 1))
           = S(S(plus(3, 0)))
           = S(S(3))
           = 5
The key idea here is to define the function on higher-numbered inputs using only
its values on lower-numbered ones.
One elegant thing about Grassmann's approach is that it extends naturally to
other operations. Multiplication has the same form.

$$\mathrm{product}(x, y) = \begin{cases} 0 &\text{if } y = 0 \\ \mathrm{plus}(\mathrm{product}(x, z), x) &\text{if } y = S(z) \end{cases}$$

3.1 Example The expansion of product(2, 3) reduces to a sum of three 2's.

product(2, 3) = plus(product(2, 2), 2)
              = plus(plus(product(2, 1), 2), 2)
              = plus(plus(plus(product(2, 0), 2), 2), 2)
              = plus(plus(plus(0, 2), 2), 2)
Exponentiation works the same way.

$$\mathrm{power}(x, y) = \begin{cases} 1 &\text{if } y = 0 \\ \mathrm{product}(\mathrm{power}(x, z), x) &\text{if } y = S(z) \end{cases}$$

We are interested in Grassmann’s definition because it is effective; it translates


immediately into a program. Here is code based on the definition of plus.‡ Starting
with a successor operation,
† That is, recursion is discrete feedback.
‡ Obviously Racket comes with an addition operator, as in (+ 3 2), and a multiplication operator, as in (* 3 2), as well as other common arithmetic operators.

(define (successor x)
  (+ x 1))

this code exactly fits the definition of plus.


(define (plus x y)
  (let ((z (- y 1)))
    (if (= y 0)
        x
        (successor (plus x z)))))

(The (let ..) creates the local variable z.) The same is true for product and
power.
(define (product x y)
  (let ((z (- y 1)))
    (if (= y 0)
        0
        (plus (product x z) x))))

(define (power x y)
  (let ((z (- y 1)))
    (if (= y 0)
        1
        (product (power x z) x))))
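A quick check in the REPL agrees with the hand expansions.

> (plus 3 2)
5
> (product 2 3)
6
> (power 2 3)
8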

3.2 Definition A function f is defined by the schema‡ of primitive recursion from
the functions g and h if it has this form.

$$f(x_0, \ldots, x_{k-1}, y) = \begin{cases} g(x_0, \ldots, x_{k-1}) &\text{if } y = 0 \\ h(f(x_0, \ldots, x_{k-1}, z), x_0, \ldots, x_{k-1}, z) &\text{if } y = S(z) \end{cases}$$

The bookkeeping is that the arity of f, the number of inputs, is one more than
the arity of g and one less than the arity of h. We sometimes abbreviate $x_0, \ldots, x_{k-1}$
as $\vec{x}$.
3.3 Example The function plus is defined by primitive recursion from $g(x_0) = x_0$
and $h(w, x_0, z) = S(w)$. The function product is defined by primitive recursion
from $g(x_0) = 0$ and $h(w, x_0, z) = \mathrm{plus}(w, x_0)$. The function power is defined by
primitive recursion from $g(x_0) = 1$ and $h(w, x_0, z) = \mathrm{product}(w, x_0)$.
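The schema itself is effective: from programs for g and h we can mechanically produce a program for f. Here is one way to render that in Racket for the k = 1 case (a sketch; the names prim-rec and plus2 are ours).

(define (prim-rec g h)   ;; from g and h, produce the f of Definition 3.2
  (define (f x y)
    (if (= y 0)
        (g x)
        (let ((z (- y 1)))
          (h (f x z) x z))))
  f)

;; Example 3.3's derivation of plus: g(x0) = x0 and h(w, x0, z) = S(w)
(define plus2 (prim-rec (lambda (x) x)
                        (lambda (w x z) (successor w))))

With that, (plus2 3 2) evaluates to 5, by exactly the expansion shown earlier.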
Primitive recursion, along with function composition, suffices to define many
familiar functions.
3.4 Example The predecessor function is like an inverse to successor. However, with
our restriction to the natural numbers we can’t give a predecessor of zero, so
instead consider pred : N → N described by: pred(y) equals y − 1 if y > 0 and
equals 0 if y = 0. This definition fits the primitive recursive schema.
$$\mathrm{pred}(y) = \begin{cases} 0 &\text{if } y = 0 \\ z &\text{if } y = S(z) \end{cases}$$


A schema is an underlying organizational pattern or structure.

The arity bookkeeping is that pred has no $x_i$'s so g is a function of zero-many
inputs, and is therefore constant, g() = 0, while h has two inputs, h(a, b) = b.
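As with plus, the definition transcribes directly into code (a sketch in the same style as the earlier transcriptions).

(define (pred y)
  (let ((z (- y 1)))
    (if (= y 0)
        0
        z)))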
3.5 Example We can't do subtractions that result in negative numbers so consider
proper subtraction, denoted x ∸ y, described by: if x ≥ y then x ∸ y equals x − y
and otherwise x ∸ y equals 0. This definition of that function fits the primitive
recursion scheme.

$$\mathrm{propersub}(x, y) = \begin{cases} x &\text{if } y = 0 \\ \mathrm{pred}(\mathrm{propersub}(x, z)) &\text{if } y = S(z) \end{cases}$$

In the terms of Definition 3.2, $g(x_0) = x_0$ and $h(w, x_0, z) = \mathrm{pred}(w)$; the bookkeeping
works since the arity of g is one less than the arity of f, and, because h has
dummy arguments, its arity is one more than the arity of f.
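This also transcribes directly (a sketch, using the pred code just above).

(define (propersub x y)
  (let ((z (- y 1)))
    (if (= y 0)
        x
        (pred (propersub x z)))))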
The computer code above makes clear that primitive recursion fits into the plan
of specifying combiners that preserve the property of effectiveness: if g and h are
effective then so is f.
3.6 Definition The set of primitive recursive functions consists of those that can be
derived from the initial operations of the zero function $Z(\vec{x}) = Z(x_0, \ldots, x_{n-1}) = 0$,
the successor function $S(x) = x + 1$, and the projection† functions $I_i(\vec{x}) = x_i$, by a
finite number of applications of the combining operations of function composition
and primitive recursion.
Function composition covers not just the simple case of two functions f and g
whose composition is defined by $f \circ g\,(\vec{x}) = f(g(\vec{x}))$. It also covers the case
of simultaneous substitution, where from $f(x_0, \ldots, x_n)$ and $h_0(y_1, \ldots, y_{m_0})$, . . .,
$h_n(y_1, \ldots, y_{m_n})$, we get $f(h_0(y_{0,0}, \ldots, y_{0,m_0}), \ldots, h_n(y_{n,0}, \ldots, y_{n,m_n}))$, which is a
function with $(m_0 + 1) + \cdots + (m_n + 1)$-many inputs.
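For a small concrete instance of simultaneous substitution, here is a sketch in code (the names f, h0, h1, and k are ours): take $f(x_0, x_1) = \mathrm{plus}(x_0, x_1)$, with $h_0$ of one input and $h_1$ of two, so the resulting k has 1 + 2 = 3 inputs.

(define (f x0 x1) (plus x0 x1))                ;; the outer function
(define (h0 y) (product y y))                  ;; one input:  m0 + 1 = 1
(define (h1 y0 y1) (successor (plus y0 y1)))   ;; two inputs: m1 + 1 = 2
(define (k a b c) (f (h0 a) (h1 b c)))         ;; (m0+1) + (m1+1) = 3 inputs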
Besides addition and proper subtraction, we commonly use many other primitive
recursive functions such as finding remainders and testing for less-than. See the
exercises for these. The list is so extensive that a person could wonder whether
every mechanically computed function is primitive recursive. The next section
shows that the answer is no, that there are intuitively mechanically computable
functions that are not primitive recursive.

I.3 Exercises
✓ 3.7 What is the difference between primitive recursion and primitive recursive?
3.8 What is the difference between total recursive and primitive recursive?
3.9 In defining 00 there is a conflict between the desire to have that every power
of 0 is 0 and the desire to have that every number to the 0 power is 1. What does
the definition of power given above do?

There are infinitely many projections, one for each pair of natural numbers n, i . Projection is a
generalization of the identity function, which is why we use the use the letter I .

✓ 3.10 As the section body describes, recursion doesn’t have to be logically problem-
atic. But some recursions are; consider this one.
$$f(n) = \begin{cases} 0 &\text{if } n = 0 \\ f(2n - 2) &\text{otherwise} \end{cases}$$

(a) Find f (0) and f (1). (b) Try to find f (2).


3.11 Consider this function.

$$F(y) = \begin{cases} 42 &\text{if } y = 0 \\ F(y - 1) &\text{otherwise} \end{cases}$$

(a) Find F (0), . . . F (10).


(b) Show that F is primitive recursive by describing it in the form given in
Definition 3.2, giving suitable functions g and h (Hint: g is a function of no
arguments, a constant). You can use functions already defined in this section.
3.12 The function plus_two : N → N adds two to its input. Show that it is a
primitive recursive function.
3.13 The Boolean function is_zero inputs natural numbers and returns T if the input
is zero, and F otherwise. Give a definition by primitive recursion, representing
T with 1 and F with 0. Hint: you only need a zero function, successor, and the
schema of primitive recursion.
✓ 3.14 These are the triangular numbers because if you make a square that has
n dots on a side and divide it down the diagonal, including the diagonal, then the
triangle that you get has t(n) dots.
$$t(y) = \begin{cases} 0 &\text{if } y = 0 \\ y + t(y - 1) &\text{otherwise} \end{cases}$$

(a) Find t(0), . . . t(10).


(b) Show that t is primitive recursive by describing it in the form given in
Definition 3.2, giving suitable functions g and h (Hint: g is a function of no
arguments, a constant). You can use functions already defined in this section.
✓ 3.15 This is the first sequence of numbers ever computed on an electronic computer.

$$s(y) = \begin{cases} 0 &\text{if } y = 0 \\ s(y - 1) + 2y - 1 &\text{otherwise} \end{cases}$$
(a) Find s(0), . . . s(10).
(b) Verify that s is primitive recursive by putting it in the form given in
Definition 3.2, giving suitable functions g and h (Hint: g is a function of no
arguments, a constant). You can use functions already defined in this section.

3.16 Consider this recurrence.

$$d(y) = \begin{cases} 0 &\text{if } y = 0 \\ d(y - 1) + 3y^2 + 3y + 1 &\text{otherwise} \end{cases}$$

(a) Find d(0), . . . d(5).


(b) Verify that d is primitive recursive by putting it in the form given in
Definition 3.2, giving suitable functions g and h (Hint: g is a function of no
arguments, a constant). You can use functions already defined in this section.
✓ 3.17 The Towers of Hanoi is a famous puzzle: In the great temple at Benares . . .
beneath the dome which marks the center of the world, rests a brass plate in which
are fixed three diamond needles, each a cubit high and as thick as the body of a
bee. On one of these needles, at the creation, God placed sixty-four discs of pure
gold, the largest disc resting on the brass plate, and the others getting smaller and
smaller up to the top one. This is the Tower of Brahma. Day and night unceasingly
the priests transfer the discs from one diamond needle to another according to the
fixed and immutable laws of Brahma, which require that the priest on duty must not
move more than one disc at a time and that he must place this disc on a needle so
that there is no smaller disc below it. When the sixty-four discs shall have been thus
transferred from the needle on which at the creation God placed them to one of the
other needles, tower, temple, and Brahmans alike will crumble into dust, and with a
thunderclap the world will vanish. It gives the recurrence below because to move
a pile of discs you first move to one side all but the bottom, which takes H (n − 1)
steps, then move that bottom one, which takes one step, then re-move the other
disks into place on top of it, taking another H (n − 1) steps.
$$H(n) = \begin{cases} 1 &\text{if } n = 1 \\ 2 \cdot H(n - 1) + 1 &\text{if } n > 1 \end{cases}$$

(a) Compute the values for n = 1, . . ., 10.


(b) Verify that H is primitive recursive by putting it in the form given in
Definition 3.2, giving suitable functions g and h (Hint: g is a function of no
arguments, a constant). You can use functions already defined in this section.
3.18 Define the factorial function fact(y) = y · (y − 1) · · · 1 by primitive recursion,
using product and a constant function.
✓ 3.19 Recall that the greatest common divisor of two positive integers is the largest
integer that divides them both. We can compute the greatest common divisor
using Euclid’s recursion
$$\gcd(n, m) = \begin{cases} n &\text{if } m = 0 \\ \gcd(m, \mathrm{rem}(n, m)) &\text{if } m > 0 \end{cases}$$

where rem(a, b) is the remainder when a is divided by b . Note that this fits

the schema of primitive recursion. Use Euclid’s method to compute these.


(a) gcd(28, 12) (b) gcd(104, 20) (c) gcd(309, 25)
✓ 3.20 Many familiar mathematical operations are primitive recursive. Show that
these functions and predicates are in the collection. (A predicate is a truth-valued
function; we take an output of 1 to mean ‘true’ while 0 is ‘false’.) For each you
may use functions already shown to be primitive recursive in the subsection body,
or in a prior item. As the definition states, you must use some combination of the
zero function, successor, projection, function composition, and primitive recursion
to define each function.
(a) Constant function: for k ∈ N, $C_k(\vec{x}) = C_k(x_0, \ldots, x_{n-1}) = k$. Hint: instead of
doing the general case with k, try $C_4(x_0, x_1)$, the function that returns 4 for
all input pairs. Also, for this you need only use the zero function, successor,
and function composition.
(b) Maximum and minimum of two numbers: max(x, y) and min(x, y). Hint: use
addition and proper subtraction.
(c) Absolute difference function: absdiff (x, y) = |x − y| .
(d) Sign predicate: sign(y), which gives 1 if y is greater than zero and 0
otherwise.
(e) Negation of the sign predicate: negsign(y), which gives 0 if y is greater than
zero and 1 otherwise.
(f) Less-than predicate: lessthan(x, y) = 1 if x is less than y , and 0 otherwise.
(The greater-than predicate is similar.)
✓ 3.21 Show that each of these is a primitive recursive function. You can use
functions from this section already shown to be primitive recursive, or functions
from the exercises prior to this one.
(a) Boolean functions: where x, y are inputs with values 0 or 1 there is the
standard one-input function

$$\mathrm{not}(x) = \begin{cases} 1 &\text{if } x = 0 \\ 0 &\text{otherwise} \end{cases}$$

and two-input functions.

$$\mathrm{and}(x, y) = \begin{cases} 1 &\text{if } x = y = 1 \\ 0 &\text{otherwise} \end{cases} \qquad \mathrm{or}(x, y) = \begin{cases} 0 &\text{if } x = y = 0 \\ 1 &\text{otherwise} \end{cases}$$

(b) Equality predicate: equal(x, y) = 1 if x = y and 0 otherwise.


(c) Inequality predicate: notequal(x, y) = 0 if x = y and 1 otherwise.
(d) Functions defined by a finite and fixed number of cases, as with these.

$$m(x) = \begin{cases} 7 &\text{if } x = 1 \\ 9 &\text{if } x = 5 \\ 0 &\text{otherwise} \end{cases} \qquad n(x, y) = \begin{cases} 7 &\text{if } x = 1 \text{ and } y = 2 \\ 9 &\text{if } x = 5 \text{ and } y = 5 \\ 0 &\text{otherwise} \end{cases}$$

✓ 3.22 We will show that the function rem(a, b) giving the remainder when a is
divided by b is primitive recursive.
(a) Fill in this table.
      a          0  1  2  3  4  5  6  7
      rem(a, 3)
(b) Observe that rem(a + 1, 3) = rem(a, 3) + 1 for many of the entries. When is
this relationship not true?
(c) Fill in the blanks.

$$\mathrm{rem}(a, 3) = \begin{cases} \underline{\hspace{3em}} &\text{if } a = 0 \\ \underline{\hspace{3em}} &\text{if } a = S(z) \text{ and } \mathrm{rem}(z, 3) + 1 = 3 \\ \underline{\hspace{3em}} &\text{if } a = S(z) \text{ and } \mathrm{rem}(z, 3) + 1 \neq 3 \end{cases}$$
(d) Show that rem(a, 3) is primitive recursive. You can use the prior item, along
with any functions shown to be primitive recursive in the section body,
Exercise 3.20 and Exercise 3.21. (Compared with Definition 3.2, here the two
arguments are switched, which is only a typographic difference.)
(e) Extend the prior item to show that rem(a, b) is primitive recursive.
3.23 The function div : N2 → N gives the integer part of the division of the first
argument by the second. Thus, div(5, 3) = 1 and div(10, 3) = 3.
(a) Fill in this table.
      a          0  1  2  3  4  5  6  7  8  9  10
      div(a, 3)
(b) Much of the time div(a + 1, 3) = div(a, 3). Under what circumstance does it
not happen?
(c) Show that div(a, 3) is primitive recursive. You can use the prior exercise,
along with any functions shown to be primitive recursive in the section body,
Exercise 3.20 and Exercise 3.21. (Compared with Definition 3.2, here the two
arguments are switched, which is only a difference of appearance.)
(d) Show that div(a, b) is primitive recursive.
3.24 Show that each of these is primitive recursive. You may use any function
shown to be primitive recursive in the section body, in the prior exercise, or in a
prior item.
(a) Bounded sum function: the partial sums of a series whose terms g(i) are
given by a primitive recursive function, $S_g(y) = \sum_{0 \le i < y} g(i) = g(0) + g(1) + \cdots + g(y-1)$
(the sum of zero-many terms is $S_g(0) = 0$). Contrast this with
the final item of the prior question; here the number of summands is finite
but not fixed.
(b) Bounded product function: the partial products of a series whose terms
g(i) are given by a primitive recursive function, $P_g(y) = \prod_{0 \le i < y} g(i) = g(0) \cdot g(1) \cdots g(y-1)$
(the product of zero-many terms is $P_g(0) = 1$).

(c) Bounded minimization: let m ∈ N and let $p(\vec{x}, i)$ be a predicate. Then the
minimization operator $M(\vec{x}, m)$, typically written $\mu i{\le}m\,[p(\vec{x}, i)]$, returns the
smallest i ≤ m such that $p(\vec{x}, i) = 0$, or else returns m. Hint: consider the
bounded sum of the bounded products of the predicates.
3.25 Show that each is a primitive recursive function. You can use functions from
this section or functions from the prior exercises.
(a) Bounded universal quantification: suppose that m ∈ N and that $p(\vec{x}, i)$ is
a predicate. Then $U(\vec{x}, m)$, typically written $\forall i{\le}m\; p(\vec{x}, i)$, has value 1 if
$p(\vec{x}, 0) = 1, \ldots, p(\vec{x}, m) = 1$ and value 0 otherwise. (The point of writing
the functional expression $U(\vec{x}, m)$ is to emphasize the required uniformity.
Stating one formula for the m = 1 case, $p(\vec{x}, 0) \cdot p(\vec{x}, 1)$, and another for the
m = 2 case, $p(\vec{x}, 0) \cdot p(\vec{x}, 1) \cdot p(\vec{x}, 2)$, etc., is not the best we can do. We can get
a single derivation, one that follows the rules in Definition 3.6, and that works
for all m.)
(b) Bounded existential quantification: let m ∈ N and let $p(\vec{x}, i)$ be a predicate.
Then $A(\vec{x}, m)$, typically written $\exists i{\le}m\; p(\vec{x}, i)$, has value 1 if it is not the case
that $p(\vec{x}, 0) = 0, \ldots, p(\vec{x}, m) = 0$, and has value 0 otherwise.
(c) Divides predicate: where x, y ∈ N we have divides(x, y) if there is some k ∈ N
with y = x · k .
(d) Primality predicate: prime(y) if y has no nontrivial divisor.
3.26 The floor function f (x/y) = ⌊x/y⌋ returns the largest natural number less
than or equal to x/y . Show that it is primitive recursive. Hint: you may use
any function defined in the section or stated in a prior exercise but bounded
minimization is the place to start.
3.27 In 1202 Fibonacci asked: A certain man put a pair of rabbits in a place
surrounded on all sides by a wall. How many pairs of rabbits can be produced from
that pair in a year if it is supposed that every month each pair begets a new pair
which from the second month on becomes productive? This leads to a recurrence.
$$F(n) = \begin{cases} 1 &\text{if } n = 0 \text{ or } n = 1 \\ F(n-1) + F(n-2) &\text{otherwise} \end{cases}$$

(a) Compute F (0) through F (10). (Note: this is not now in a form that matches
the primitive recursion schema, although we could rewrite it that way using
Exercise 3.20 and Exercise 3.24.)
(b) Show that F is primitive recursive. You may use the results from earlier,
including Exercise 3.20, 3.21, 3.24, and 3.25.
3.28 Let C(x, y) = 0 + 1 + 2 + · · · + (x + y) + y .
(a) Make a table of the values of C(x, y) for 0 ≤ x ≤ 4 and 0 ≤ y ≤ 4.
(b) Show that C(x, y) is primitive recursive. You can use the functions shown
to be primitive recursive in the section body, along with Exercise 3.20,
Exercise 3.21, Exercise 3.24, and Exercise 3.25.

3.29 Pascal’s Triangle gives the coefficients of the powers of x in the expansion
of (x + 1)n . For example, (x + 1)2 = x 2 + 2x + 1 and row two of the triangle is
⟨1, 2, 1⟩ . This recurrence gives the value at row n, entry m , where m, n ∈ N.

$$P(n, m) = \begin{cases} 0 &\text{if } m > n \\ 1 &\text{if } m = 0 \text{ or } m = n \\ P(n-1, m) + P(n-1, m-1) &\text{otherwise} \end{cases}$$
(a) Compute P(3, 2).
(b) Compute the other entries from row three: P(3, 0), P(3, 1), and P(3, 3).
(c) Compute the entries in row four.
(d) Show that this is primitive recursive. You may use the results from Exer-
cise 3.20 and Exercise 3.24.
✓ 3.30 This is McCarthy’s 91 function.
$$M(x) = \begin{cases} M(M(x + 11)) &\text{if } x \le 100 \\ x - 10 &\text{if } x > 100 \end{cases}$$

(a) What is the output for inputs x ∈ { 0, ... 101 }? For larger inputs? (You may
want to write a small script.)
(b) Use the prior item to show that this function is primitive recursive. You may
use the results from Exercise 3.20.
3.31 Show that every primitive recursive function is total.
3.32 Let g, h be natural number functions (that are total). Where f is defined by
primitive recursion from g and h, show that f is well-defined. That is, show that
if two functions both satisfy Definition 3.2 then they are equal, so that on the same
inputs they yield the same outputs.

Section
I.4 General recursion
Every primitive recursive function is intuitively mechanically computable. What
about the converse: is every intuitively mechanically computable function primitive
recursive? In this section we will answer ‘no’.†
Ackermann functions One reason to think that there are functions that are
intuitively mechanically computable but are not primitive recursive is that some
mechanically computable functions are partial, meaning that they do not have an
output for some inputs, but all primitive recursive functions are total.
We could try to patch this, perhaps with: for any f that is intuitively mechanically
computable consider the function $\hat{f}$ whose output is 0 if f(x) is not defined, and

That’s why the diminutive ‘primitive’ is in the name — while the class is interesting and important, it
isn’t big enough to contain every effective function.

whose output otherwise is $\hat{f}(x) = f(x) + 1$. Then $\hat{f}$ is a total function that in a
sense has the same computational content as f. Were we able to show that any
such $\hat{f}$ is primitive recursive then we would have simulated f with a primitive
recursive function. However, no such patch is possible. We will now give a function
that is intuitively mechanically computable and total but that is not primitive
recursive.
An important aspect of this function is that it arises naturally, so we will
develop it from familiar operations. Recall that the addition operation is repeated
successor, that multiplication is repeated addition, and that exponentiation is
repeated multiplication.

$$x + y = \underbrace{S(S(\cdots S(x)))}_{y\text{ many}} \qquad x \cdot y = \underbrace{x + x + \cdots + x}_{y\text{ many}} \qquad x^y = \underbrace{x \cdot x \cdots x}_{y\text{ many}}$$

This is a compelling pattern.


The pattern is especially compelling when we express these functions in the
form of the schema of primitive recursion. Start by letting $H_0$ be the successor
function, $H_0 = S$.

$$\mathrm{plus}(x, y) = H_1(x, y) = \begin{cases} x &\text{if } y = 0 \\ H_0(x, H_1(x, y-1)) &\text{otherwise} \end{cases}$$

$$\mathrm{product}(x, y) = H_2(x, y) = \begin{cases} 0 &\text{if } y = 0 \\ H_1(x, H_2(x, y-1)) &\text{otherwise} \end{cases}$$

$$\mathrm{power}(x, y) = H_3(x, y) = \begin{cases} 1 &\text{if } y = 0 \\ H_2(x, H_3(x, y-1)) &\text{otherwise} \end{cases}$$

The pattern shows in the 'otherwise' lines. Each one satisfies $H_n(x, y) = H_{n-1}(x, H_n(x, y-1))$.
Because of this pattern we call each $H_n$ the level n function,
so that addition is the level 1 operation, multiplication is the level 2 operation, and
exponentiation is level 3. These 'otherwise' lines step the function up from level to
level. The definition below takes n as a parameter, writing H(n, x, y) in place of
$H_n(x, y)$, to get all the levels into one formula.

4.1 Definition This is the hyperoperation H: N³ → N.

$$H(n, x, y) = \begin{cases} y + 1 &\text{if } n = 0 \\ x &\text{if } n = 1 \text{ and } y = 0 \\ 0 &\text{if } n = 2 \text{ and } y = 0 \\ 1 &\text{if } n > 2 \text{ and } y = 0 \\ H(n-1, x, H(n, x, y-1)) &\text{otherwise} \end{cases}$$
4.2 Lemma $H_0(x, y) = y + 1$, $H_1(x, y) = x + y$, $H_2(x, y) = x \cdot y$, $H_3(x, y) = x^y$.

Proof The level 0 statement $H_0(x, y) = y + 1$ is in the definition of H.
We prove the level 1 statement $H_1(x, y) = x + y$ by induction on y. For the
y = 0 base step, the definition is that H(1, x, 0) = x, which equals x + 0 = x + y.
For the inductive step, assume that the statement holds for y = 0, . . ., y = k and
consider the y = k + 1 case. The definition is $H_1(x, k+1) = H_0(x, H_1(x, k))$. Apply
the inductive hypothesis to get $H_0(x, x + k)$. By the prior paragraph this equals
x + k + 1 = x + y.
The other two, $H_2$ and $H_3$, are Exercise 4.13.

4.3 Remark Level 4, the level above exponentiation, is tetration. The first few values
are $H_4(x, 0) = 1$, and $H_4(x, 1) = H_3(x, H_4(x, 0)) = x^1 = x$, and
$H_4(x, 2) = H_3(x, H_4(x, 1)) = x^x$, as well as these two.

$$H_4(x, 3) = H_3(x, H_4(x, 2)) = x^{x^x} \qquad H_4(x, 4) = x^{x^{x^x}}$$

This is a power tower. To evaluate these, recall that in exponentiation the
parentheses are significant, so for instance these two are unequal: $(3^3)^3 = 27^3 = 3^9 = 19\,683$
and $3^{(3^3)} = 3^{27} = 7\,625\,597\,484\,987$. Tetration does it in the second,
larger, way. The rapid growth of the output values is a striking aspect of tetration,
and of the hyperoperation in general. For instance, $H_4(4, 4)$ is much greater than
the number of elementary particles in the universe.
Hyperoperation is mechanically computable. Its code is a transcription of the
definition.
(define (H n x y)
  (cond
    [(= n 0) (+ y 1)]
    [(and (= n 1) (= y 0)) x]
    [(and (= n 2) (= y 0)) 0]
    [(and (> n 2) (= y 0)) 1]
    [else (H (- n 1) x (H n x (- y 1)))]))
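As a quick check, the low levels agree with Lemma 4.2 when we try them in the REPL.

> (H 1 3 2)   ;; plus(3, 2)
5
> (H 2 3 4)   ;; product(3, 4)
12
> (H 3 2 3)   ;; power(2, 3)
8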

However, hyperoperation’s recursion line

H(n, x, y) = H(n − 1, x, H(n, x, y − 1))



does not fit the form of primitive recursion.

$$f(x_0, \ldots, x_{k-1}, y) = h(f(x_0, \ldots, x_{k-1}, y-1), x_0, \ldots, x_{k-1}, y-1)$$

The problem is not that the arguments are in a different order; that is cosmetic. The
reason H does not work as h is that the definition of primitive recursive function,
Definition 3.2, requires that h be a function for which we already have a primitive
recursive derivation.
Of course, just because one definition has the wrong form doesn't mean
that there is no definition with the right form. However, Ackermann†
proved that there isn't, that H is not primitive recursive. The proof is a
detour for us so it is in an Extra Section but in summary: H grows faster
than any primitive recursive function. That is, for any primitive recursive
function f of three inputs, there is a sufficiently large N ∈ N such that
for all n, x, y ∈ N, if n, x, y > N then H(n, x, y) > f(n, x, y). This proof is
about uniformity. At every level, the function $H_n$ is primitive recursive but
no primitive recursive function encompasses all levels at once — there is no
single, uniform, primitive recursive way to compute them all.
4.4 Theorem The hyperoperation H is not primitive recursive.
This relates to a point from the discussion of Church’s Thesis. We have
observed that if a function is primitive recursive then it is intuitively mechanically
computable. We have built a pile of natural and interesting functions that are
intuitively mechanically computable, and demonstrated that they are primitive
recursive. So ‘primitive recursive’ may seem to have many of the same characteristics
as ‘Turing machine computable’. The difference is that we now have an intuitively
mechanically computable function that is not primitive recursive. That is, ‘primitive
recursive’ fails the test that in the Church’s Thesis discussion we called coverage.
To cover all mechanically computable functions under a recursive rubric we need
to expand from primitive recursive functions to a larger set.

µ recursion The right direction is hinted at in Exercise 3.24 and Exercise 3.25.
Primitive recursion does bounded operations. We can show that a programming
language having only bounded loops computes all of the primitive recursive
functions; see the Extra section. To include every function that is intuitively
mechanically computable we must add unbounded operations.
4.5 Definition Suppose that g: N^{n+1} → N is total, so that for every input
tuple there is a defined output number. Then f: Nⁿ → N is defined from g by
minimization or µ-recursion, written $f(\vec{x}) = \mu y[g(\vec{x}, y) = 0]$,† if $f(\vec{x})$ is the
least number y such that $g(\vec{x}, y) = 0$.


† We have seen Ackermann already, as one of the people who stated the Entscheidungsproblem. Functions having the same recursion as H are Ackermann functions.
† Recall that $\vec{x}$ abbreviates $x_0, \ldots, x_{n-1}$.

This is unbounded search: we have in mind the case that g is mechanically
computable, perhaps even primitive recursive, and we find $g(\vec{x}, 0)$ and then $g(\vec{x}, 1)$,
etc., waiting until one of them gives the output 0. If that ever happens, so that
$g(\vec{x}, n) = 0$ for some least n, then $f(\vec{x}) = n$. If it never happens that the output is
zero then $f(\vec{x})$ is undefined.
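Rendered as code, minimization is an unbounded loop (a sketch for a one-parameter g; the name mu is ours). If g never outputs 0 on some x then the loop never returns, matching f(x)↑.

(define (mu g)   ;; from g, produce f where f(x) is the least y with (g x y) = 0
  (lambda (x)
    (let loop ((y 0))
      (if (zero? (g x y))
          y
          (loop (+ y 1))))))

For instance, with (define (g x y) (if (= (+ x y) 5) 0 1)) the function (mu g) returns 5 − x on inputs x ≤ 5 and diverges on larger inputs.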
4.6 Example The polynomial $p(y) = y^2 + y + 41$ looks interesting because it seems,
at least at the start, to output only primes.

      y      0   1   2   3   4   5   6   7   8    9
      p(y)   41  43  47  53  61  71  83  97  113  131

We could think to test this with a program that searches for non-primes by trying
p(0), p(1), . . . Start with a function that computes quadratic polynomials,
$p(\vec{x}, y) = p(x_0, x_1, x_2, y) = x_2 y^2 + x_1 y + x_0$, and consider a test for the primality of
the output.

$$g(\vec{x}, y) = \begin{cases} 0 &\text{if } p(\vec{x}, y) \text{ is prime} \\ 1 &\text{otherwise} \end{cases}$$

Now, do the search with $f(\vec{x}) = \mu y[g(\vec{x}, y) = 0]$.
Some code illustrates an important point. Start with a test for primality,
(define (prime? n)
  ;; gives 0 when no divisor is found, that is, when n is prime, and gives 1
  ;; otherwise; this 0-for-success convention matches g above
  (define (prime-helper n c)
    (cond [(< n (* c c)) 0]
          [(zero? (modulo n c)) 1]
          [else (prime-helper n (add1 c))]))
  (prime-helper n 2))

and a way to compute the output of $y \mapsto x_2 y^2 + x_1 y + x_0$.


(define (p x0 x1 x2 y)
  (+ (* x2 y y) (* x1 y) x0))

Now, this is g.
(define (g-sub-p x0 x1 x2 y)
  (prime? (p x0 x1 x2 y)))

It is called g-sub-p because p is hard-coded into the source. Likewise the search
routine has g-sub-p baked in. That is the point the definition makes with "f is
defined from g."
(define (f-sub-g x0 x1 x2)
  (define (f-sub-g-helper y)
    (if (= 0 (g-sub-p x0 x1 x2 y))
        y
        (f-sub-g-helper (add1 y))))
  (let ([y 0])
    (f-sub-g-helper y)))

With that, the search function finds that the polynomial above returns some
non-primes.
> (f-sub-g 1 1 41)
40

Unbounded search is a theme in the Theory of Computation. For instance, we


will later consider the question of which programs halt and a natural way for a
program to not halt is because it is looking for something that is not there.
Using the minimization operator we can get functions whose output value is
undefined for some inputs.
4.7 Example If g(x, y) = 1 for all x, y ∈ N then $f(x) = \mu y[g(x, y) = 0]$ is undefined
for all x.
4.8 Definition A function is general recursive or partial recursive, or µ-recursive,
or just recursive, if it can be derived from the initial operations of the zero
function $Z(\vec{x}) = 0$, the successor function S(x) = x + 1, and the projection
functions $I_i(x_0, \ldots, x_i, \ldots, x_{k-1}) = x_i$ by a finite number of applications of function
composition, the schema of primitive recursion, and minimization.
S Kleene showed that the set of functions satisfying this definition is the same
as the set given in Definition 1.9, of computable functions.

I.4 Exercises
Some of these have answers that are tedious to compute. You should use a computer,
for instance by writing a script or using Sage.
✓ 4.9 Find the value of $H_4(2, 0)$, $H_4(2, 1)$, $H_4(2, 2)$, $H_4(2, 3)$, and $H_4(2, 4)$.
4.10 Graph $H_1(2, y)$ up to y = 9. Also graph $H_2(2, y)$ and $H_3(2, y)$ over the same
range. Put all three plots on the same axes.
✓ 4.11 How many years is $H_4(3, 3)$ seconds?
4.12 What is the ratio $H_3(3, 3)/H_2(2, 2)$?
✓ 4.13 Finish the proof of Lemma 4.2 by verifying that $H_2(x, y) = x \cdot y$ and
$H_3(x, y) = x^y$.
4.14 This variant of H is often labeled "the" Ackermann function.

$$A(k, y) = \begin{cases} y + 1 &\text{if } k = 0 \\ A(k-1, 1) &\text{if } y = 0 \text{ and } k > 0 \\ A(k-1, A(k, y-1)) &\text{otherwise} \end{cases}$$

It has different boundary conditions but the same recursion, the same bottom line.
(In general, any function with that recursion is an Ackermann function. More
about this variant is in Extra D.) Compute A(k, y) for 0 ≤ k < 4 and 0 ≤ y < 6.
4.15 Prove that the computation of H(n, x, y) always terminates.

4.16 In defining general recursive functions, Definition 4.8, we get all computable
functions by starting with the primitive recursive functions and adding minimiza-
tion. What if instead of minimization we had added Ackermann’s function; would
we then have all computable functions?
✓ 4.17 Let g(x, y) = x + y and let $f(x) = \mu y[g(x, y) = 100]$. For each, find the
value or say that it is not defined. (a) f(0) (b) f(1) (c) f(50) (d) f(100)
(e) f(101) Give an expression for f that does not include µ-recursion.
4.18 Let $g(x, y) = \lceil (x + 1)/(y + 1) - 1 \rceil$ and let $f(x) = \mu y[g(x, y) = 0]$.
(a) Find f (x) for 0 ≤ x < 6.
(b) Give a description of f that does not use µ -recursion.
4.19 (a) Prove that the function remtwo : N → { 0, 1 } giving the remainder on
division by two is primitive recursive.
(b) Use that to prove that this function is µ -recursive: f (n) = 1 if n is even, and
f (n)↑ if n is odd.
✓ 4.20 Consider the Turing machine P = {q₀B1q₁, q₀1Rq₀, q₁BRq₂, q₁1Lq₁}. Define
g(x, y) = 0 if the machine P, when started on a tape that is blank except for
x-many consecutive 1's and with the head under the leftmost 1, has halted after
step y. Otherwise, g(x, y) = 1. Find $f(x) = \mu y[g(x, y) = 0]$ for x < 6.
✓ 4.21 Define g(x, y) by: start P = {q₀B1q₂, q₀1Lq₁, q₁B1q₂, q₁11q₂} on a tape
that is blank except for x-many consecutive 1's and with the head under the
leftmost 1. If P has halted after step y then g(x, y) = 0 and otherwise g(x, y) = 1.
Let $f(x) = \mu y[g(x, y) = 0]$. Find f(x) for x < 6. (This machine does the same
task as the one in the prior exercise, but faster.)
4.22 Consider this Turing machine.

{q₀BRq₁, q₀1Rq₁, q₁BRq₂, q₁1Rq₂, q₂BLq₃, q₂1Lq₃, q₃BLq₄, q₃1Lq₄}

Let g(x, y) = 0 if this machine, when started on a tape that is all blank except for
x-many consecutive 1's and with the head under the leftmost 1, has halted after
y steps. Otherwise, g(x, y) = 1. Let $f(x) = \mu y[g(x, y) = 0]$. Find: (a) f(0)
(b) f(1) (c) f(2) (d) f(x).
✓ 4.23 Define

$$h(n) = \begin{cases} n/2 &\text{if } n \text{ is even} \\ 3n + 1 &\text{else} \end{cases}$$

and let H(n, k) be the k-fold composition of h with itself, so H(n, 1) = h(n),
H(n, 2) = h ∘ h(n), H(n, 3) = h ∘ h ∘ h(n), etc. (We can take H(n, 0) = 0,
although its value isn't interesting.) Let $C(n) = \mu k[H(n, k) = 1]$.
(a) Compute H (4, 1), H (4, 2), and H (4, 3).
(b) Find C(4), if it is defined.
(c) Find C(5), if it is defined.
(d) Find C(11), if it is defined.

(e) Find C(n) for all n ∈ [1 .. 20), where defined.


The Collatz conjecture is that C(n) is defined for all n . No one knows if it is true.

Extra
I.A Turing machine simulator
Writing code to simulate a Turing Machine is a reasonable programming project.
Here we exhibit an implementation. It has three design goals. The main one is
to track closely the description of the action of a Turing machine in section 1.
Secondary goals are to output a picture of the configuration after each step, and to
be easy to understand for a reader new to Racket.
We earlier saw this Turing machine that computes the predecessor function.

Ppred = {q₀BLq₁, q₀1Rq₀, q₁BLq₂, q₁1Bq₁, q₂BRq₃, q₂1Lq₂}

To simulate it, the program will use this file.


0 B L 1
0 1 R 0
1 B L 2
1 1 B 1
2 B R 3
2 1 L 2

Thus the simulator for any particular Turing machine is really the pair consisting
of the code shown below along with this machine’s file description.
The data structure for a Turing machine is the simplest one, a list of instructions.
For the instructions, the program converts each of the above six lines into a list with
four members, a number, two characters, and a number. Thus, a Turing machine
is stored as a list of lists. The above machine is this (the line break is there only to
make it fit in the margins).
'((0 #\B #\L 1) (0 #\1 #\R 0) (1 #\B #\L 2) (1 #\1 #\B 1)
(2 #\B #\R 3) (2 #\1 #\L 2))

After some convenience constants


(define BLANK #\B)  ;; Easier to read than space
(define STROKE #\1) ;; The tally-mark symbol
(define LEFT #\L) ;; Move tape pointer left
(define RIGHT #\R) ;; Move tape pointer right

we define a configuration.
;; A configuration is a list of four things:
;; the current state, as a natural number
;; the symbol being read, a character
;; the contents of the tape to the left of the head, as a list of characters
;; the contents of the tape to the right of the head, as a list of characters
(define (make-config state char left-tape-list right-tape-list)
(list state char left-tape-list right-tape-list))

(define (get-current-state config) (first config))


(define (get-current-symbol config)
  (let ([cs (second config)]) ;; make horizontal whitespace like a B
    (if (char-blank? cs)
        #\B
        cs)))
(define (get-left-tape-list config) (third config))
(define (get-right-tape-list config) (fourth config))

Note that get-current-symbol translates any blank character to a B.


The heart of a Turing machine is its ∆ function, which inputs the current state
and current tape symbol and returns the action to be taken — either L, or R, or a
character from the tape alphabet — and the next state.
;; delta Find the applicable instruction
(define (delta tm current-state tape-symbol)
(define (delta-test inst)
(and (= current-state (first inst))
(equal? tape-symbol (second inst))))

(let ([inst (findf delta-test tm)])


(if (not inst)
(list #\X HALT-STATE) ;; X is arbitrary placeholder char
(list (third inst) (fourth inst)))))

(The Racket function findf searches through tm for a member on which delta-test
returns a value of true.)
Turing machines work discretely, step by step. If there is no relevant instruction
then the machine halts, and otherwise it moves one cell left, one cell right, or
writes one character.
;; step Do one step; from a config and the tm, yield the next config
(define (step config tm)
(let* ([current-state (get-current-state config)]
[left-tape-list (get-left-tape-list config)]
[current-symbol (get-current-symbol config)]
[right-tape-list (get-right-tape-list config)]
[action-next-state (delta tm current-state current-symbol)]
[action (first action-next-state)]
[next-state (second action-next-state)])
(cond
[(char=? LEFT action) (move-left config next-state)]
[(char=? RIGHT action) (move-right config next-state)]
[else (make-config next-state
action ;; not L or R so it is in tape alphabet
left-tape-list
right-tape-list)])))

Because moving left and right are more complicated, they are in separate routines.
;; tape-right-char Return the element nearest the head on the right side
(define (tape-right-char right-tape-list)
(if (empty? right-tape-list)
BLANK
(car right-tape-list)))

;; tape-left-char Return the element nearest the head on the left


(define (tape-left-char left-tape-list)
(tape-right-char (reverse left-tape-list)))

;; tape-right-pop Return the right tape list without char nearest the head
(define (tape-right-pop right-tape-list)
(if (empty? right-tape-list)
'()

(cdr right-tape-list)))

;; tape-left-pop Return the left tape list without char nearest the head
(define (tape-left-pop left-tape-list)
(reverse (tape-right-pop (reverse left-tape-list))))

;; move-left Respond to Left action


(define (move-left config next-state)
(let ([left-tape-list (get-left-tape-list config)]
[prior-current-symbol (get-current-symbol config)]
[right-tape-list (get-right-tape-list config)])
;; push old tape head symbol onto the right tape list
(make-config next-state
(tape-left-char left-tape-list) ;; new current symbol
(tape-left-pop left-tape-list) ;; strip symbol off left
(cons prior-current-symbol right-tape-list))))

;; move-right Respond to Right action


(define (move-right config next-state)
(let ([left-tape-list (get-left-tape-list config)]
[prior-current-symbol (get-current-symbol config)]
[right-tape-list (get-right-tape-list config)])
;; push old head symbol onto the left tape list
(make-config next-state
(tape-right-char right-tape-list) ;; new current symbol
(reverse (cons prior-current-symbol (reverse left-tape-list)))
(tape-right-pop right-tape-list)))) ;; strip symbol off right

Finally, the implementation executes the machine by iterating the operation of


a single step.
;; execute Run a turing machine step-by-step until it halts
(define (execute tm initial-config)
(define (execute-helper config s)
(if (= (get-current-state config) HALT-STATE)
(fprintf (current-output-port)
"step ~s: HALT\n"
s)
(begin
(fprintf (current-output-port)
"step ~s: ~a\n"
s
(configuration->string config))
(execute-helper (step config tm) (add1 s)))))

(execute-helper initial-config 0))

The execute routine calls the following one to give a simple picture of the
machine, showing the state number and the tape contents, with the current symbol
displayed between asterisks.
;; configuration->string Return a string showing the tape
(define (configuration->string config)
  (let* ([state-number (get-current-state config)]
         [state-string (string-append "q" (number->string state-number))]
         [left-tape (list->string (get-left-tape-list config))]
         [current (string #\* (get-current-symbol config) #\*)] ;; wrap *'s
         [right-tape (list->string (get-right-tape-list config))])
    ;; assemble the display string, e.g. "q0: *1*11"
    (string-append state-string ": " left-tape current right-tape)))

Besides the prior routine, the implementation has other code to do dull things
such as reading the lines from the file and converting them to instruction lists.

(define (current-state-string->number s)
(if (eq? #\( (string-ref s 0)) ;; allow instr to start with (
(string->number (substring s 1))
(string->number s)))
(define (current-symbol-string->char s)
(string-ref s 0))
(define (action-symbol-string->char s)
(string-ref s 0))
(define (next-state-string->number s)
(if (eq? #\) (string-ref s (- (string-length s) 1))) ;; ends with )?
(string->number (substring s 0 (- (string-length s) 1)))
(string->number s)))
(define (string->instruction s)
(let* ([instruction (string-split (string-trim s))]
[current-state (current-state-string->number (first instruction))]
[current-symbol (current-symbol-string->char (second instruction))]
[action (action-symbol-string->char (third instruction))]
[next-state (next-state-string->number (fourth instruction))])
(list current-state
current-symbol
action
next-state)))

And, there is a bit more code for getting the file name from the command line, etc.,
that does not bear at all on simulating a Turing machine so we will leave it aside.
Below is a run of the simulator, with its command line invocation. It takes
input from the file pred.tm shown earlier. When the machine starts, the input is
111: the current symbol is 1 and the tape to the right holds 11 (the tape to the left
is empty).
$ ./turing-machine.rkt -f machines/pred.tm -c "1" -r "11"
step 0: q0: *1*11
step 1: q0: 1*1*1
step 2: q0: 11*1*
step 3: q0: 111*B*
step 4: q1: 11*1*B
step 5: q1: 11*B*B
step 6: q2: 1*1*BB
step 7: q2: *1*1BB
step 8: q2: *B*11BB
step 9: q3: B*1*1BB
step 10: HALT

The output is crude but good enough for small scale experiments.

I.A Exercises
A.1 Run the simulator on the predecessor machine Ppred starting with five 1’s.
Also run it on an empty tape.
A.2 Run the simulator on Example 1.2’s Padd to add 1 + 2. Also simulate 0 + 2
and 0 + 0.
A.3 Write a Turing machine to perform the operation of adding 3, so that given
as input a tape containing only a string of n consecutive 1’s, it returns a tape with
a string of n + 3 consecutive 1’s. Follow our convention that when the program
starts and ends the head is under the first 1. Run it on the simulator, with an
input of 4 consecutive 1’s, and also with an empty tape.

A.4 Write a machine to decide if the input contains the substring 010. Fix
Σ = { 0, 1, B }. The machine starts with the tape blank except for a contiguous
string of 0’s and 1’s, and with the head under the first non-blank symbol. When
it finishes, the tape will have either just a 1 if the input contained the desired
substring, or otherwise just a 0. We will do this in stages, building a few of what
amount to subroutines.
(a) Write instructions, starting in state q 10 , so that if initially the machine’s head
is under the first of a sequence of non-blank entries then at the end the head
will be to the right of the final such entry.
(b) Write a sequence of instructions, starting in state q 20 , so that if initially the
head is just to the right of a sequence of non-blank entries, then at the end
all entries are blank.
(c) Write the full machine, including linking in the prior items.
A.5 Modify the simulator so that it can run for a limited number of steps.
(a) Modify it so that, given k ∈ N, if the Turing machine hasn’t halted after k
steps then the simulator stops.
(b) Do the same, but replace k with a function (k n) where n is the input
number (assume the machine’s input is a string of 1’s).

Extra
I.B Hardware

Following Turing, we’ve gone through a development that starts by considering
general physical computing devices and ends at transition tables. What about the
converse; given a finite transition table, is there a physical implementation with
that behavior?
Put another way, in programming languages there are built-in mathematical
operators that are constructed from other, simpler, mathematical operators. For
instance, sin(x) may be calculated via its Taylor polynomial from addition and
multiplication. But how do the simplest operators work?
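As an aside, here is the Taylor idea as a quick sketch in Scheme (the function name is ours; real math libraries are more careful, for instance reducing the argument's range first).

;; sin(x) ~ x - x^3/6 + x^5/120 - x^7/5040, in Horner form;
;; only addition and multiplication are used. Accurate near zero.
(define (sin-approx x)
  (let ((x2 (* x x)))
    (* x (+ 1 (* x2 (+ -1/6 (* x2 (+ 1/120 (* x2 -1/5040)))))))))

For instance, (sin-approx 0.5) gives about 0.47943, close to the true sin(0.5).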
We will show how to get any desired behavior. For this, we will work with
machines that take finite binary sequences, bitstrings, as inputs and outputs.
The easiest approach is via propositional logic. A proposition is a statement
that has a Boolean value, either T or F . For instance, ‘7 is odd’ and ‘8 is prime’
are propositions, with values T and F . (In contrast, ‘x is a perfect square’ is not a
proposition because for some x it is T while for others it is not.)
We often combine propositions. We might conjoin two by saying, ‘5 is prime
and 7 is prime’, or we might negate with ‘it is not the case that 8 is prime’.
These truth tables define the behavior of the logical operators not (sometimes
called negation), and (or conjunction), and or (or disjunction). The first is a unary,
or one input, operator while the others are binary operators. The tables write F
as 0 and T as 1, as is the convention in electrical engineering. In an electronic
computer, these would stand for different voltage levels. For both tables, inputs
are on the left while outputs are on the right.

not P               P and Q    P or Q
 P   ¬P             P   Q   P ∧ Q   P ∨ Q
 0    1             0   0     0       0
 1    0             0   1     0       1
                    1   0     0       1
                    1   1     1       1

Thus, where ‘7 is odd’ is P, and ‘8 is prime’ is Q, we get the value of the conjunction ‘7
is odd and 8 is prime’ from the third line of the right-hand table: 0. The value of
the disjunction ‘7 is odd or 8 is prime’ is 1.
Truth tables help us work out the behavior of complex propositional logic state-
ments, by building them up from their components. This shows the input/output
behavior of the statement (P ∨ Q) ∧ ¬(P ∨ (R ∧ Q)).

P  Q  R   P ∨ Q   R ∧ Q   P ∨ (R ∧ Q)   ¬(P ∨ (R ∧ Q))   statement

0  0  0     0       0          0               1             0
0  0  1     0       0          0               1             0
0  1  0     1       0          0               1             1
0  1  1     1       1          1               0             0
1  0  0     1       0          1               0             0
1  0  1     1       0          1               0             0
1  1  0     1       0          1               0             0
1  1  1     1       1          1               0             0

There are operators other than ‘not’, ‘and’, and ‘or’ but an advantage of the set of
these three is that with them we can reverse the activity of the prior paragraph: we
can go from a table to a statement with that table. That is, we can go from a
specified input-output behavior to a statement with that behavior.
Below are two examples. To make a statement with the behavior shown in the
table on the left, focus on the row with output 1. It is the row where P is false and
Q is false. Therefore the statement ¬P ∧ ¬Q makes this row take on value 1 and
every other row take on value 0.

P  Q            P  Q  R
0  0   1        0  0  0   0
0  1   0        0  0  1   1
1  0   0        0  1  0   1
1  1   0        0  1  1   0
                1  0  0   1
                1  0  1   0
                1  1  0   0
                1  1  1   0

Next consider the table on the right and again focus on the rows with 1’s. Target
the second row with ¬P ∧ ¬Q ∧ R . For the third row use ¬P ∧ Q ∧ ¬R and for the
fifth row use P ∧ ¬Q ∧ ¬R . To finish, put these parts together with ∨’s to get the
overall statement.

(¬P ∧ ¬Q ∧ R) ∨ (¬P ∧ Q ∧ ¬R) ∨ (P ∧ ¬Q ∧ ¬R) (∗)

Thus, we can produce statements with any desired behavior. Statements of this
form, clauses connected by ∨’s, where inside each clause the statement is built
from ∧’s, are in disjunctive normal form. (Also commonly used is conjunctive
normal form, where statements consist of clauses connected by ∧’s and each clause
uses only ∨’s as binary connectives.)
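The table-to-statement procedure is mechanical enough to program. Below is a sketch in Scheme; the representation and the names row->clause and table->dnf are ours, a minimal illustration rather than anything from this book's code. A table is a list of rows, each row a list of input bits followed by the output bit.

(import srfi-1)  ; take, filter, last  (CHICKEN Scheme; adjust elsewhere)

;; Build the clause for one row with output 1: one literal per variable.
(define (row->clause vars row)
  (cons 'and
        (map (lambda (var bit) (if (= bit 1) var (list 'not var)))
             vars
             (take row (length vars)))))

;; Keep the rows whose output is 1 and join their clauses with or.
(define (table->dnf vars table)
  (cons 'or
        (map (lambda (row) (row->clause vars row))
             (filter (lambda (row) (= (last row) 1)) table))))

Running table->dnf on the three-variable table above returns
(or (and (not P) (not Q) R) (and (not P) Q (not R)) (and P (not Q) (not R))),
which is statement (∗).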
Now we translate those statements into physical devices. We can use electronic
devices, called gates, that perform logical operations on signals (for this discussion
we will take a signal to be the presence of 5 volts). The observation that you can
use this form of a propositional logic statement to systematically design logic
circuits was made by C Shannon in his master’s thesis. On the left below is the
schematic symbol for an and gate with two input wires and one output wire, whose
behavior is that a signal only appears on the output if there is a signal on both
inputs. Symbolized in the middle is an or gate, where there is signal out if either
input has a signal. On the right is a not gate.

[Photo: Claude Shannon, 1916–2001]

[Schematic symbols: an and gate, an or gate, and a not gate]

A schematic of a circuit that implements statement (∗), given below, shows
three input signals on the three wires at left. For instance, to implement the first
clause, the top and gate is fed the not P, the not Q, and the R signals. The second
and third clauses are implemented in the other two and gates. Then the output of
the and gates goes through the or gate.

[Schematic: the three input wires P, Q, R at left, three and gates, and a final or gate]

Clearly by following this procedure we can in principle build a physical device
with any desired input/output behavior. In particular, we can build a Turing
machine in this way.
We will close with an aside. A person can wonder how these gates are
constructed internally, and in particular can wonder how a not gate is possible;
isn’t having voltage out when there is no voltage in creating energy out of nothing?
The answer is that the descriptions above abstract away that issue. Here is the
internal construction of a kind of not gate.

[Circuit diagram: input Vin, output Vout, transistor terminals G, D, and S, and a 5 volt supply]

On the right is a battery, which we shall see provides the extra voltage. On the top
left, shown as a wiggle, is a resistor. When current is flowing around the circuit,
this resistor regulates the power output from the battery.
On the bottom left, shown with the circle, is a transistor. This is a semiconductor,
with the property that if there is enough voltage between G and S then this
component allows current from the battery to flow through the D-S line. (Because
it is sometimes open and sometimes closed it is depicted as a switch, although
internally it has no moving parts.) This transistor is manufactured such that an
input voltage Vin of 5 volts will trigger this event.
To verify that this circuit inverts the signal, assume first that Vin = 0. Then
there is a gap between D and S so no current flows. With no current the resistor
provides no voltage drop. Consequently the output voltage Vout across the gap is
all of the voltage supplied by the battery, 5 volts. So Vin = 0 results in Vout = 5.
Conversely, now assume that Vin = 5. Then the gap disappears, the current
flows between D and S, the resistor drops the voltage, and the output is Vout = 0.
Thus, for this device the voltage out Vout is the opposite of the voltage in Vin .
And, when Vin = 0 the output voltage of 5 doesn’t come from nowhere; it is from
the battery.

I.B Exercises
B.1 Make a truth table for each of these propositions. (a) (P ∧ Q) ∧ R (b) P ∧ (Q ∧ R)
(c) P ∧ (Q ∨ R) (d) (P ∧ Q) ∨ (P ∧ R)
B.2 Make a truth table for these. (a) ¬(P ∨ Q) (b) ¬P ∧ ¬Q (c) ¬(P ∧ Q)
(d) ¬P ∨ ¬Q
B.3 (a) Make a three-input table for the behavior: the output is 1 if a majority of
the inputs are 1’s. (b) Draw the circuit.
B.4 For the table below, construct a disjunctive normal form propositional logic
statement and use that to make a circuit.

P  Q
0  0   0
0  1   1
1  0   1
1  1   0

B.5 For the tables below, construct a disjunctive normal form propositional
logic statement and use that to make a circuit. (a) the table on the left,
(b) the one on the right.

P  Q  R             P  Q  R
0  0  0   0         0  0  0   1
0  0  1   1         0  0  1   0
0  1  0   1         0  1  0   1
0  1  1   0         0  1  1   0
1  0  0   0         1  0  0   0
1  0  1   1         1  0  1   0
1  1  0   0         1  1  0   1
1  1  1   0         1  1  1   1

B.6 One propositional logic operator that was not covered in the description is
Exclusive Or, XOR. It is defined by: P XOR Q is T if P ≠ Q, and is F otherwise.
Make a truth table, from it construct a disjunctive normal form propositional logic
statement, and use that to make a circuit.
B.7 Construct a circuit with the behavior specified in the tables below: (a) the
table on the left, (b) the one on the right.

P  Q  R             P  Q  R
0  0  0   1         0  0  0   1
0  0  1   0         0  0  1   0
0  1  0   0         0  1  0   1
0  1  1   0         0  1  1   0
1  0  0   0         1  0  0   0
1  0  1   1         1  0  1   1
1  1  0   0         1  1  0   1
1  1  1   0         1  1  1   0

B.8 The most natural way to add two binary numbers works like the grade school
addition algorithm. Start at the right with the one’s column. Add those two and
possibly carry a 1 to the next column. Then add down the next column, including
any carry. Repeat this from right to left.
(a) Use this method to add the two binary numbers 1011 and 1101.
(b) Make a truth table giving the desired behavior in adding the numbers in a
column. It must have three inputs because of the possibility of a carry. It
must also have two output columns, one for the total and one for the carry.
(c) Draw the circuits.

Extra
I.C Game of Life
John von Neumann was one of the twentieth century’s most influential
mathematicians. One of the many things he studied was the problem of humans
on Mars. He thought that to colonize Mars we should first send robots. Mars is
red because it is full of iron oxide, rust. Robots could mine that rust, break it into
iron and oxygen, and release the oxygen into the atmosphere. With all the iron,
the robots could make more robots. So von Neumann was thinking about making
machines that could self-reproduce. (We will study more about self-reproduction
later.)
His thinking, along with a suggestion from S Ulam, who was studying crystal
growth, led him to use a cell-based approach. So von Neumann laid out some
computational devices in a grid of interconnections, making a cellular automaton.

[Photo: John von Neumann, 1903–1957]
Interest in cellular automata greatly increased with the appearance of the Game
of Life, by J Conway. It was featured in M Gardner’s October 1970 column in
Scientific American. The rules of the game are simple enough that a person could
immediately take out a pencil and start experimenting. Lots of people did. When
personal computers came out, Life became one of the earliest computer crazes,
since it is easy for a beginner to program.
To start, draw a two-dimensional grid of square cells, like graph paper or a
chess board. The game proceeds in stages, or generations. At each generation
each cell is either alive or dead. Each cell has eight neighbors, the ones that are
horizontally, vertically, or diagonally adjacent. The state of a cell in the next
generation is determined by: (i) a live cell with two or three live neighbors will
again be live at the next generation but any other live cell dies, (ii) a dead cell with
exactly three live neighbors becomes alive at the next generation but other dead
cells stay dead. (The backstory goes that live cells will die if they are either isolated
or overcrowded, while if the environment is just right then life can spread.) To
begin, we seed the board with some initial pattern.

[Photo: John Conway, 1937–2020]
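The rules translate directly into a program. Here is a minimal sketch in Scheme; the representation and the helper names are ours. A board is a list of live cells, each a (row . column) pair, and next-generation computes one step.

(import srfi-1)  ; count, filter, append-map, delete-duplicates  (CHICKEN Scheme)

(define (neighbors cell)
  (map (lambda (d) (cons (+ (car cell) (car d)) (+ (cdr cell) (cdr d))))
       '((-1 . -1) (-1 . 0) (-1 . 1) (0 . -1)
         (0 . 1) (1 . -1) (1 . 0) (1 . 1))))

(define (live-count cell board)
  (count (lambda (n) (member n board)) (neighbors cell)))

;; Only live cells and their neighbors can be alive in the next generation.
(define (next-generation board)
  (filter (lambda (cell)
            (let ((n (live-count cell board)))
              (if (member cell board)
                  (or (= n 2) (= n 3))   ; a live cell survives with 2 or 3
                  (= n 3))))             ; a dead cell is born with exactly 3
          (delete-duplicates (append board (append-map neighbors board)))))

For example, feeding it the horizontal blinker, (next-generation '((0 . 0) (0 . 1) (0 . 2))) returns the vertical blinker's three cells.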
As Gardner noted, the rules of the game balance tedious simplicity against
impenetrable complexity.
The basic idea is to start with a simple configuration of counters (organisms), one to a
cell, then observe how it changes as you apply Conway’s “genetic laws” for births, deaths,
and survivals. Conway chose his rules carefully, after a long period of experimentation,
to meet three desiderata:
1. There should be no initial pattern for which there is a simple proof that the
population can grow without limit.
2. There should be initial patterns that apparently do grow without limit.
3. There should be simple initial patterns that grow and change for a considerable
period of time before coming to an end in three possible ways: fading away completely
(from overcrowding or becoming too sparse), settling into a stable configuration
that remains unchanged thereafter, or entering an oscillating phase in which they
repeat an endless cycle of two or more periods.
In brief, the rules should be such as to make the behavior of the population unpredictable.
The result is, as Conway says, a “zero-player game.” It is a mathematical
recreation in which patterns evolve in fascinating ways.
Many starting patterns do not result in any interesting behavior at all. The
simplest nontrivial pattern, a single cell, immediately dies.

Generation 0 Generation 1

The pictures show the part of the game board containing the cells that are alive.
Two generations suffice to show everything that happens, which isn’t much.
Some other patterns don’t die, but don’t do much of anything, either. This is a
block. It is stable from generation to generation.

Generation 0 Generation 1

Because it doesn’t change, Conway calls this a “still life.” Another still life is the
beehive.

Generation 0 Generation 1

But many patterns are not still. This three-cell pattern, the blinker, does a
simple oscillation.

Generation 0 Generation 1 Generation 2

Other patterns move. This is a glider, the most famous pattern in Life.

It moves one cell vertically and one horizontally every four generations, crawling
across the screen.

C.1 Animation: Gliding, left and right.

When Conway came up with the Life rules, he was not sure whether there is a
pattern where the total number of live cells keeps on growing. Bill Gosper showed
that there is, by building the glider gun which produces a new glider every thirty
generations.
The glider pattern is an example of a spaceship, a pattern that reappears, displaced,
after a number of generations. Here is another, the medium weight spaceship.

It also crawls across the screen.

C.2 Animation: Moving across space.

Another important pattern is the eater, which eats gliders and other spaceships.

Here it eats a medium weight spaceship.



C.3 Animation: Eating a spaceship.

How powerful is the game, as a computational system? Although it is beyond
our scope, you can build Turing machines in the game and so it is able to compute
anything that can be mechanically computed (Rendell 2011).

I.C Exercises
C.4 A methuselah is a small pattern that stabilizes only after a long time. This
pattern is a rabbit. How long does it take to stabilize?

C.5 How many 3 × 3 blocks are there? 4 × 4? Write a program that inputs a
dimension n and returns the number of n × n blocks.
C.6 How many of the 3 × 3 patterns will result in any cells on the board that
survive into the next generation? That survive ten generations?
C.7 Write code that takes in a number of rows n , a number of columns m and a
number of generations i , and returns how many of the n × m patterns will result
in any surviving cells after i generations.

Extra
I.D Ackermann’s function is not primitive recursive
We have seen that the hyperoperation, whose definition is repeated below, is the
natural generalization of successor, addition, multiplication, etc.


H(n, x, y) = y + 1                          – if n = 0
           = x                              – if n = 1 and y = 0
           = 0                              – if n = 2 and y = 0
           = 1                              – if n > 2 and y = 0
           = H(n − 1, x, H(n, x, y − 1))    – otherwise
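The definition transcribes directly into Scheme; this sketch is practical only for small arguments, of course.

(define (H n x y)
  (cond ((= n 0) (+ y 1))
        ((and (= n 1) (= y 0)) x)
        ((and (= n 2) (= y 0)) 0)
        ((and (> n 2) (= y 0)) 1)
        (else (H (- n 1) x (H n x (- y 1))))))

So (H 1 3 4) is 7, (H 2 3 4) is 12, and (H 3 3 4) is 81: addition, multiplication, and exponentiation.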


We have quoted a result that H, while intuitively mechanically computable,
is not primitive recursive. The details of the proof are awkward. For technical
convenience we will instead show that a closely related function, which is also
intuitively mechanically computable, is not primitive recursive.

In H’s ‘otherwise’ line, while the level is n and the recursion is on y, the
variable x does not play an active role. R Péter noted this and got a function with
a simpler definition, lowering the number of variables by one, by considering
H(n, y, y). That, and tweaking the initial value of each level, gives this.

A(k, y) = y + 1                    – if k = 0
        = A(k − 1, 1)              – if y = 0 and k > 0
        = A(k − 1, A(k, y − 1))    – otherwise

[Photo: Rózsa Péter, 1905–1977]

Any function based on the recursion in the bottom line is called an
Ackermann function.† We will prove that A is not primitive recursive.
Since the new function has only two variables we can show a table.
        y=0    y=1    y=2    y=3    y=4    y=5
k=0      1      2      3      4      5      6     ...
k=1      2      3      4      5      6      7     ...
k=2      3      5      7      9     11     13     ...
k=3      5     13     29     61    125    253     ...
k=4     13  65533    ...

The next two entries give a sense of the growth rate of this function.

A(4, 2) = 2^65536 − 3        A(4, 3) = 2^(2^65536) − 3
Those are big numbers.
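The table's entries can be checked with a direct transcription of the definition into Scheme (a sketch; past the values shown, the numbers explode and the recursion is hopeless).

(define (A k y)
  (cond ((= k 0) (+ y 1))
        ((= y 0) (A (- k 1) 1))
        (else (A (- k 1) (A k (- y 1))))))

For instance, (A 2 3) returns 9 and (A 3 1) returns 13, matching the table.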
D.1 Lemma (a) A(k, y) > y
(b) A(k, y + 1) > A(k, y), and in general if ŷ > y then A(k, ŷ) > A(k, y)
(c) A(k + 1, y) ≥ A(k, y + 1)
(d) A(k, y) > k
(e) A(k + 1, y) > A(k, y) and in general if k̂ > k then A(k̂, y) > A(k, y)
(f) A(k + 2, y) > A(k, 2y)
Proof We will verify the first item here and leave the others as exercises. They
all proceed the same way, with an induction inside of an induction.
This is the first item. We will prove it by induction on k.

∀k ∀y [ A(k, y) > y ]    (∗)

The k = 0 base step is A(0, y) = y + 1 > y. For the inductive step, assume that
statement (∗) holds for k = 0, . . ., k = n and consider the k = n + 1 case.
We must verify this statement,

∀y [ A(n + 1, y) > y ]    (∗∗)
 


† There are many different Ackermann functions in the literature. A common one is the function of one
variable A(k, k).

which we will do by induction on y. In the y = 0 base step of this inside induction,
the definition gives A(n + 1, 0) = A(n, 1) and by the inductive hypothesis that
statement (∗) is true when k = n we have that A(n, 1) > 1 > y = 0.
Finally, in the inductive step of the inside induction, assume that statement (∗∗)
holds for y = 0, . . ., y = m and consider y = m + 1. The definition gives
A(n + 1, m + 1) = A(n, A(n + 1, m)). By (∗∗)’s inductive hypothesis, A(n + 1, m) > m.
By (∗)’s inductive hypothesis, since A(n, A(n + 1, m)) has a second argument greater
than m, its result is greater than m, as required.
We will abbreviate the function input list x0, ..., xn−1 by the vector x⃗. And
we will write the maximum of the vector, max(x⃗), to mean the maximum of its
components, max({ x0, ..., xn−1 }).
D.2 Definition A function s is level k, where k ∈ N, if A(k, max(x⃗)) > s(x⃗) for all x⃗.
By Lemma D.1.e, if a function is level k then it is also level k̂ for any k̂ > k.
D.3 Lemma If for some k ∈ N each function g0, ..., gm−1, h is level k, and if the
function f is obtained by composition as f(x⃗) = h(g0(x⃗), ..., gm−1(x⃗)), then f is
level k + 2.
Proof Apply Lemma D.1’s item c, and then the definition of A.

A(k + 2, max(x⃗)) ≥ A(k + 1, max(x⃗) + 1) = A(k, A(k + 1, max(x⃗)))    (∗)

Focusing on the second argument of the right-hand expression, use Lemma D.1.e
and the assumption that each function g0, ..., gm−1 is level k to get that for each
index i ∈ { 0, ..., m − 1 } we have A(k + 1, max(x⃗)) > A(k, max(x⃗)) > gi(x⃗). Thus
A(k + 1, max(x⃗)) > max({ g0(x⃗), ..., gm−1(x⃗) }).
Lemma D.1.b says that A is monotone in the second argument, so returning to
equation (∗) and swapping out A(k + 1, max(x⃗)) gives the first inequality here

A(k + 2, max(x⃗)) > A(k, max({ g0(x⃗), ..., gm−1(x⃗) }))
                 > h(g0(x⃗), ..., gm−1(x⃗)) = f(x⃗)

and the second holds because the function h is level k.
D.4 Lemma If for some k ∈ N the functions g and h are level k, and if the function f
is obtained by the schema of primitive recursion as

f(x⃗, y) = g(x⃗)                 – if y = 0
        = h(f(x⃗, z), x⃗, z)     – if y = S(z)

then f is level k + 3.

Proof Let n be such that f : Nⁿ⁺¹ → N, so that g : Nⁿ → N and h : Nⁿ⁺² → N.
The core of the argument is to show that this statement holds.

∀k [ A(k, max(x⃗) + y) > f(x⃗, y) ]    (∗)

We show this by induction on y. The y = 0 base step is that A(k, max(x⃗) + 0) =
A(k, max(x⃗)) is greater than f(x⃗, 0) = g(x⃗) because g is level k.
For the inductive step assume that (∗) holds for y = 0, . . ., y = z and
consider the y = z + 1 case. The definition is that A(k + 1, max(x⃗) + z + 1) =
A(k, A(k + 1, max(x⃗) + z)). The second argument A(k + 1, max(x⃗) + z) is larger
than max(x⃗) + z by Lemma D.1.a, and so is larger than any xi and larger than z,
and is larger than f(x⃗, z) by the inductive hypothesis.

A(k + 1, max(x⃗) + z) > max({ f(x⃗, z), x0, ..., xn−1, z })

Use Lemma D.1.b, monotonicity of A in the second argument, and the fact that h
is a level k function.

A(k + 1, max(x⃗) + z + 1) = A(k, A(k + 1, max(x⃗) + z))
                          > A(k, max({ f(x⃗, z), x0, ..., xn−1, z }))
                          > h(f(x⃗, z), x⃗, z) = f(x⃗, z + 1)

That finishes the inductive verification of statement (∗).
To finish the argument, Lemma D.1.f gives that for all x0, ..., xn−1, y

A(k + 3, max({ x0, ..., y })) > A(k + 1, 2 · max({ x0, ..., y }))
                              ≥ A(k + 1, max(x⃗) + y)

(the latter holds because 2 · max({ x0, ..., xn−1, y }) ≥ max(x⃗) + y and because of
Lemma D.1.b). In turn, by the first part of this proof, that is greater than f(x⃗, y).
D.5 Theorem (Ackermann, 1925) For each primitive recursive function f there is a
number k ∈ N such that f is level k.
Proof The definition of primitive recursive functions, Definition 3.6, specifies that
each f is built from a set of initial functions by the operations of composition and
primitive recursion. With Lemma D.3 and Lemma D.4 we need only show that
each initial function is of some level.
The zero function Z(x) = 0 is level 0 since A(0, x) = x + 1 > 0. The
successor function S(x) = x + 1 is level 1 since A(1, x) > A(0, x) = x + 1 by
Lemma D.1.e. Each projection function Ii(x0, ..., xi, ..., xn−1) = xi is level 0 since
A(0, max(x⃗)) = max(x⃗) + 1 is greater than max(x⃗), which is greater than or equal
to xi.
D.6 Corollary The function A is not primitive recursive.
Proof If A were primitive recursive then it would be of some level, k0, so
A(k0, max({x, y})) > A(x, y) for all x, y. Taking x and y to be k0 gives a
contradiction.

I.D Exercises
D.7 If expressed in base 10, how many digits are in A(4, 2) = 2^65536 − 3?

D.8 Show that for any k, y the evaluation of A(k, y) terminates.
D.9 Prove these parts of Lemma D.1. (a) Item b (b) Item c (c) Item d
(d) Item e (e) Item f
D.10 Verify each identity. (a) A(0, y) = y + 1 (b) A(1, y) = 2 + (y + 3) − 3
(c) A(2, y) = 2 · (y + 3) − 3 (d) A(3, y) = 2^(y+3) − 3 (e) A(4, y) = (2 ↑↑ (y + 3)) − 3
In the last one, the up-arrow notation (due to D Knuth) means that there is a
power tower containing y + 3 many 2’s. Recall that powers do not associate, so in
general a^(b^c) ≠ (a^b)^c; the notation means the first type of association, from
the top down.
D.11 The prior exercise shows that at least the initial levels of A are primitive
recursive. In fact, all levels are. But how does that work: all the parts of A are
primitive recursive but as a whole it is not?
D.12 Show that A(k + 1, x) = A(k, A(k, ... A(k, 1) ...)), where there are x + 1-many A’s.
D.13 Prove that A(k, y) = H(k, 2, y + 3) − 3. Conclude that H is not primitive
recursive.

Extra
I.E LOOP programs

Compared to general recursive functions, primitive recursive functions have the
advantage that their computational behavior is easy to analyze. We will support this
contention by giving a programming language that computes primitive recursive
functions.
The most familiar looping constructs are for and while. The difference is that
a while loop can go an unbounded number of times, but in a for loop you know
in advance the number of times that the code will pass through the loop.
E.1 Theorem (Meyer and Ritchie, 1967) A function is primitive recursive if and
only if it can be computed without using unbounded loops. More precisely, we
can compute in advance, using only primitive recursive functions, how many
iterations will occur.
We will show this by computing primitive recursive functions in a language
that lacks unbounded loops. Programs in this language execute on a machine
model that has registers r0, r1, . . . , which hold natural numbers.
A LOOP program is a sequence of instructions, of four kinds: (i) x = 0 sets
the contents of the register named x to zero, (ii) x = x + 1 increments the
contents of register x, (iii) x = y copies the contents of register y into register x,
leaving y unchanged, and (iv) loop x ... end.

[Photo: Dennis Ritchie, 1941–2011, inventor of C]

For the last, the dots in the middle are replaced by a sequence of any of the four
kinds of statements. In particular, it might contain a nested loop. The semantics
are that the instructions of the inside program are executed repeatedly, with the
number of repetitions given by the natural number in register x.


Running the program below results in the register r0 getting the value of 6
(the indenting is only for visual clarity).
r1 = 0
r1 = r1 + 1
r1 = r1 + 1
r2 = r1
r2 = r2 + 1
r0 = 0
loop r1
  loop r2
    r0 = r0 + 1
  end
end

Very important: in loop x ... end, changes in the contents of register x while
the inside code is run do not alter the number of times that the machine steps
through that loop. Thus, when this loop ends the value in r0 will be twice what it
was at the start.
loop r0
  r0 = r0 + 1
end

We want to interpret LOOP programs as computing functions so we need a
convention for input and output. Where the function takes n inputs, we will start
the program after loading the inputs into the registers numbered 0 through n − 1.
And where the function has m outputs, we take the values to be the integers that
remain in the registers numbered 0 through m − 1 when the program has finished.
For example, this LOOP program computes the two-input one-output function
proper subtraction, f(x, y) = x ∸ y. Each pass through the outer loop replaces the
value in r0 with its predecessor.
loop r1
  r1 = 0
  loop r0
    r0 = r1
    r1 = r1 + 1
  end
end

That is, if we load x into r0 and y into r1, and run the above routine, then the
output x ∸ y will be in r0.
To show that for each primitive recursive function there is a LOOP program,
we can show how to compute each initial function, and how to do the combining
operations of function composition and primitive recursion.
The zero function Z (x) = 0 is computed by the LOOP program whose single
line is r0 = 0. The successor function S (x) = x + 1 is computed by the one-line
r0 = r0 + 1. Projection I i (x 0 , ... x i , ... x n−1 ) = x i is computed by r0 = ri .
The composition of two functions is easy. Suppose that g(x0, ..., xn) and
f(y0, ..., ym) are computed by LOOP programs Pg and Pf, and that g produces as
many outputs as f takes inputs, so that the composition f ∘ g is defined. Then
concatenating, so that the instructions of Pg are followed by the instructions of Pf,
gives the LOOP program for f ∘ g, since it uses the output of g as input to compute
the action of f.
General composition starts with

f(x0, ..., xn), h0(y0,0, ..., y0,m0), ..., and hn(yn,0, ..., yn,mn)

and produces f(h0(y0,0, ..., y0,m0), ..., hn(yn,0, ..., yn,mn)). The issue is that were we
to load the sequence of inputs y0,0, . . . into r0, . . . and start computing h0 then,
for one thing, there is a danger that it could overwrite the inputs for h1. So we
must do some machine language-like register manipulations to shuttle data in and
out as needed.
Specifically, let Pf, Ph0, . . . , Phn compute the functions. Each uses a limited
number of registers so there is an index j large enough that no program uses
register j. By definition, the LOOP program P to compute the composition will
be given the sequence of inputs starting in register 0. The first step is to copy
these inputs to start in register j. Next, zero out the registers below register j,
copy h0’s arguments down to begin at r0 and run Ph0. When it finishes, copy its
output above the final register holding the inputs (that is, to the register numbered
(m0 + 1) + · · · + (mn + 1)). Repeat for the rest of the hi’s. Finish by copying the
outputs down to the initial registers, zeroing out the remaining registers, and
running Pf.
The other combiner operation is primitive recursion.

f(x0, ..., xk−1, y) = g(x0, ..., xk−1)                          – if y = 0
                    = h(f(x0, ..., xk−1, z), x0, ..., xk−1, z)  – if y = S(z)

Suppose that we have LOOP programs Pg and Ph. The register swapping needed is
similar to what happens for composition so we won’t discuss it. The program Pf
starts by running Pg. Then it sets a fresh register to 0; call that register t. Now it
enters a loop based on the register y (that is, successive times through the loop
count down as y, y − 1, etc.). The body of the loop computes f(x0, ..., xk−1, t + 1) =
h(f(x0, ..., xk−1, t), x0, ..., xk−1, t) by running Ph and then it increments t.
Thus if a function is primitive recursive then it is computed by a LOOP program.
The converse holds also, but proving it is beyond our scope.
We have an interpreter for the LOOP language with two interesting aspects.
The first is that we change the syntax, replacing the C-looking syntax above with a
LISP-ish one. For instance, we swap the syntax on the left for that on the right.

r1 = r1 + 1            ((incr r1) (loop r1 (incr r0)))
loop r1
  r0 = r0 + 1
end

The advantage of this switch is that the parentheses automatically match the
beginning of a loop with the matching end and so the interpreter that we write
will not need a stack to keep track of loop nesting.
This interpreter has registers r0, r1, . . . , that hold natural numbers. We keep
track of them in a list of pairs.
;; A register is a pair (name:symbol contents:natural number)
(define REGLIST '())
(define (show-regs) ; debugging
(write REGLIST) (newline))

(define (clear-regs!)
(set! REGLIST '()))

;; make-reg-name return the symbol giving the standard name of a register


(define (make-reg-name i)
(string->symbol (string-append "r" (number->string i))))

;; Getters and setters for the list of registers


;; set-reg-value! Set the value of an existing register or initialize a new one
(define (set-reg-value! r v)
(set! REGLIST (alist-update! r v REGLIST equal?)))
;; get-reg Return pair whose car is given r; if no such reg, return (r . 0)
(define (get-reg r)
(let ((val (assoc r REGLIST)))
(if val
val
(begin
(set-reg-value! r 0)
(cons r 0)))))
(define (get-reg-value r)
  (cdr (get-reg r)))

There are an unlimited number of registers; when set-reg-value! is asked to act
on a register that is not on the list, it puts it on the list.
Besides the initialization done by set-reg-value!, two of the remaining three
LOOP operations are straightforward.
;; increment-reg! Increment the register
(define (increment-reg! r)
(set-reg-value! r (+ 1 (get-reg-value r))))
;; copy-reg! Copy value from r0 to r1, leave r0 unchanged
(define (copy-reg! r0 r1)
(set-reg-value! r1 (get-reg-value r0)))

;; Implement each operation


(define (intr-zero pars)
(set-reg-value! (car pars) 0))
(define (intr-incr pars)
(increment-reg! (car pars)))
(define (intr-copy pars)
(set-reg-value! (car pars) (get-reg-value (cadr pars))))

The last LOOP operation is loop itself. Such an instruction can have inside it
the body of an entire LOOP program.
(define (intr-loop pars)
  (letrec ((reps (get-reg-value (car pars)))
           (body (cdr pars))
           (iter (lambda (rep)
                   (cond
                     ((equal? rep 0) '())
                     (else (intr-body body)
                           (iter (- rep 1)))))))
    (iter reps)))

;; intr-body Interpret the body of loop programs


(define (intr-body body)
(cond
((null? body) '())
(else (let ((next-inst (car body))
(tail (cdr body)))
(let ((key (car next-inst))
(pars (cdr next-inst)))
(cond
((eq? key 'zero) (intr-zero pars))
((eq? key 'incr) (intr-incr pars))
((eq? key 'copy) (intr-copy pars))
((eq? key 'loop) (intr-loop pars))))
(intr-body tail)))))

Finally, there is the code to interpret a program, including initializing the
registers so we can view the input-output behavior as computing a function.
;; The data is a list of the values to put in registers r0 r1 r2 ..
;; Value of a program is the value remaining in r0 at end.
(define (interpret progr data)
(init-regs data)
(intr-body progr)
(get-reg-value (make-reg-name 0)))

;; init-regs Initialize the registers r0, r1, r2, .. to the values in data
(define (init-regs data)
(define (init-regs-helper i data)
(if (null? data)
'()
(begin
(set-reg-value! (make-reg-name i) (car data))
(init-regs-helper (+ i 1) (cdr data)))))
(clear-regs!)
(set-reg-value! (make-reg-name 0) 0)
(init-regs-helper 0 data))

As given, this returns only the value of r0, which is all we shall need here.
Here is a sample usage. The LOOP program, in LISP syntax, is pe.
#;1> (load "loop.scm")
#;2> (define pe '((incr r0) (incr r0) (loop r0 (incr r0))))
#;3> (interpret pe '(5))
14

With an initial value of 5, after being incremented twice then r0 will have a value
of 7. So the loop runs seven times, each time incrementing r0, resulting in an
output value of 14.
We can now make an interpreter for the C-like syntax shown earlier. We first do
some bookkeeping such as splitting the program into lines and dropping comments.
Then we convert the instructions as a purely string operation. Thus r0 = 0 becomes
(zero r0). Similarly, r0 = r0 + 1 becomes (incr r0) and r0 = r1 becomes
(copy r0 r1). Finally, loop r0 becomes (loop r0 (note the missing closing
paren), and end becomes ).
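As a sketch of that purely textual conversion, here is one such instruction translator (this helper is ours, written for CHICKEN Scheme, not the interpreter's actual code).

(import (chicken string))  ; string-split, conc

;; Convert one C-like instruction to its LISP-ish text.
(define (instruction->lisp line)
  (let ((toks (string-split line)))
    (cond ((equal? toks '("end")) ")")
          ((equal? (car toks) "loop")
           (conc "(loop " (cadr toks)))
          ((equal? (cddr toks) '("0"))                       ; r0 = 0
           (conc "(zero " (car toks) ")"))
          ((= (length toks) 3)                               ; r0 = r1
           (conc "(copy " (car toks) " " (caddr toks) ")"))
          (else                                              ; r0 = r0 + 1
           (conc "(incr " (car toks) ")")))))

For example, (instruction->lisp "r0 = r1") returns "(copy r0 r1)".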
Here is the second interesting thing about the interpreter. Now that the C-like
syntax has been converted to a string in LISP-like syntax, we just need to interpret
the string as a list. We write it to a file and then load that file. That is, unlike in
many programming languages, in Scheme we can create code on the fly.
Here is an example of running the interpreter. The program in C-like syntax is
this.
r1 = r1 + 1
r1 = r1 + 1
loop r1
  r0 = r0 + 1
end

And here we run that in the Scheme interpreter.


#;4> (define p "r1 = r1 + 1\nr1 = r1 + 1\nloop r1\nr0 = r0 + 1\nend")
#;5> (loop-without-parens p '())
; loading fn.scm ...
2

I.E Exercises
E.2 Write a LOOP program that triples its input.
E.3 Write a LOOP program that adds two inputs.
E.4 Modify the interpreter to allow statements like r0 = r0 + 2.
E.5 Modify the interpreter to allow statements like r0 = 1.
E.6 Modify the definition of interpret so that it takes one more argument, a
natural number m , and returns the contents of the first m registers.
Chapter
II Background

The first chapter began by saying that we are more interested in the things
that can be computed than in the details of how they are computed. In particular,
we want to understand the set of functions that are effective, that are intuitively
mechanically computable, which we formally defined as computable by a Turing
machine. The major result of this chapter and the single most important result in
the book is that there are functions that are uncomputable — there is no Turing
machine to compute them. There are jobs that no machine can do.

Section
II.1 Infinity
We will show that there are more functions than Turing machines, and that
therefore there are some functions with no associated machine.
Cardinality The set of functions and the set of Turing machines are both
infinite. We will begin with two paradoxes that dramatize the challenge
to our intuition posed by comparing the sizes of infinite sets. We will then
produce the mathematics to resolve these puzzles and apply it to the sets
of functions and Turing machines.
The first is Galileo’s Paradox. It compares the size of the set of perfect
squares with the size of the set of natural numbers. The first is a proper
subset of the second. However, the figure below shows that the two sets
can be made to correspond, to match element-to-element, so in this sense
there are exactly as many squares as there are natural numbers.

[Photo: Galileo Galilei, 1564–1642]
0 1 2 3 4 5 6 7 8 9 10 11 ...

0 1 4 9 16 25 36 49 64 81 100 121 ...

1.1 Animation: Correspondence n ↔ n 2 between the natural numbers and the squares.

The second paradox of infinity is Aristotle’s Paradox. On the left below are two
circles, one with a smaller radius. If we roll them through one revolution then the
trail left by the smaller one is shorter. However, if we put the smaller inside the
larger and roll them, as in a train wheel, then they leave equal-length trails.
Image: This is the Hubble Deep Field image. It came from pointing the Hubble telescope to the darkest
part of the sky, the very background, and soaking up light for eleven days. It covers an area of the sky
about the same width as that of a dime viewed seventy-five feet away. Every speck is a galaxy. There
are thousands of them — there is a lot in the background. Robert Williams and the Hubble Deep Field
Team (STScI) and NASA. (Also see the Deep Field movie.)

1.2 Animation: Circles of different radii have different circumferences.
1.3 Animation: Embedded circles rolling together.

As with Galileo’s Paradox, the puzzle is that we might think of the set of points
on the circumference of the larger circle as being a bigger set. But the right idea is
that the two sets have the same number of elements in that they correspond —
point-for-point, the circumference of the smaller matches the circumference of the
larger.
The animations below illustrate matching the points in two ways. The first
shows them as nested circles, with points on the inside matching points on the
outside. The second straightens that out so that the circumferences make segments
and then for every point on the top there is a matching point on the bottom.

1.4 Animation: Corresponding points on the circumferences x · (2πr0) ↔ x · (2πr1).

Recall that a correspondence is a function that is both one-to-one and onto.
A function f : D → C is one-to-one if f(x0) = f(x1) implies that x0 = x1 for
x0, x1 ∈ D. It is onto if for any y ∈ C there is an x ∈ D such that y = f(x). Below,
the left map is one-to-one but not onto because there is a codomain element with
no associated domain element. The right map is onto but not one-to-one because
two domain elements map to the same codomain output.

1.5 Lemma For any function with a finite domain, the number of elements in that
domain is greater than or equal to the number of elements in the range. If such a
function is one-to-one then its domain has the same number of elements as its
range. If it is not one-to-one then its domain has more elements than its range.
Consequently, two finite sets have the same number of elements if and only if they
correspond, that is, if and only if there is a function from one to the other that is
a correspondence.

Proof Exercise 1.48.



1.6 Lemma The relation between two sets S 0 and S 1 of ‘there is a correspondence
f : S 0 → S 1 ’ is an equivalence relation.
Proof Reflexivity is clear since a set corresponds to itself via the identity function.
For symmetry assume that there is a correspondence f : S 0 → S 1 and recall that
its inverse f −1 : S 1 → S 0 exists and is a correspondence in the other direction. For
transitivity assume that there are correspondences f : S 0 → S 1 and д : S 1 → S 2
and recall also that the composition д ◦ f : S 0 → S 2 is a correspondence.
We now give that relation a name. This definition extends Lemma 1.5’s
observation about same-sized sets from the finite to the infinite.
1.7 Definition Two sets have the same cardinality or are equinumerous, written
|S 0 | = |S 1 | , if there is a correspondence between them.

1.8 Example Stated in terms of the definition, Galileo’s Paradox is that the set
of perfect squares S = { n² | n ∈ N } has the same cardinality as N because the
function f : N → S given by f(n) = n² is a correspondence. It is one-to-one
because if f(x0) = f(x1) then x0² = x1² and thus, since these are natural numbers,
x0 = x1. It is onto because any element of the codomain y ∈ S is the square of
some n from the domain N by the definition of S, and so y = f(n).
1.9 Example Aristotle’s Paradox is that for r0, r1 ∈ R⁺, the interval [0 .. 2πr0) has the
same cardinality as the interval [0 .. 2πr1). The map g(x) = x · (2πr1/2πr0) is a
correspondence; verification is Exercise 1.42.
1.10 Example The set of natural numbers greater than zero, N⁺ = { 1, 2, ... }, has the
same cardinality as N. A correspondence is f : N → N⁺ given by n ↦ n + 1.
Comparing the sizes of sets, even infinite sets, in this way was
proposed by G Cantor in the 1870’s. As the paradoxes above dramatize,
Definition 1.7 introduces a deep idea. We should convince ourselves that
it captures what we mean by sets having the ‘same number’ of elements.
One supporting argument is that it is the natural generalization of the
finite case, Lemma 1.5. A second is Lemma 1.6, that it partitions sets into
classes so that inside of a class all sets have the same cardinality. That
is, it gives the ‘equi’ in equinumerous. The most important supporting
argument is that, as with Turing’s definition of his machine, Cantor’s
definition is persuasive in itself. Gödel noted this, writing “Whatever
‘number’ as applied to infinite sets may mean, we certainly want it to
have the property that the number of objects belonging to some class
does not change if, leaving the objects the same, one changes in any way . . . e.g.,
their colors or their distribution in space . . . From this, however, it follows at once
that two sets will have the same [cardinality] if their elements can be brought into
one-to-one correspondence, which is Cantor’s definition.”

[Photo: Georg Cantor, 1845–1918]

1.11 Definition A set is finite if it is empty, or if it has the same cardinality as
{ 0, 1, ... n } for some n ∈ N. Otherwise the set is infinite.
For us, by far the most important infinite set is N.
1.12 Definition A set with the same cardinality as the natural numbers is countably
infinite. If a set is either finite or countably infinite then it is countable. A function
whose domain is the natural numbers enumerates, or is an enumeration of, its
range.
The idea behind the term ‘enumeration’ is that f : N → S lists the range
set: first f(0), then f(1), etc. The listing may have repeats, so that perhaps for
some n0, n1 we have f(n0) = f(n1). As always, our main interest is the case of
functions that are computable. The phrase ‘a function whose domain is the natural
numbers’ implies that the function is total, but in section 7 we will show how to use
computable partial functions in the place of computable total functions.

1.13 Example The set of multiples of three, 3N = { 3k | k ∈ N }, is countable. The
natural map g : N → 3N is g(n) = 3n. Of course, this function is effective.
1.14 Example The set N − { 2, 5 } = { 0, 1, 3, 4, 6, 7, ... } is countable. The function
below, both defined and illustrated with a table, closes up the gaps.

f(n) = n        – if n < 2
     = n + 1    – if n ∈ { 2, 3 }
     = n + 2    – if n ≥ 4

n     0  1  2  3  4  5  6  ...
f(n)  0  1  3  4  6  7  8  ...

This function is clearly one-to-one and onto. It is also computable; we could write
a program whose input/output behavior is f .
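Indeed, the transcription into Scheme is immediate (the function name is ours).

;; The gap-closing correspondence from N to N − { 2, 5 }.
(define (f n)
  (cond ((< n 2) n)
        ((<= n 3) (+ n 1))
        (else (+ n 2))))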
1.15 Example The set of prime numbers P is countable. There is a function p : N → P
where p(n) is the n -th prime, so that p(0) = 2, p(1) = 3, etc. We won’t produce a
formula for this function but obviously we can write a program whose input/output
behavior is p , so it is a correspondence that is effective.
1.16 Example Fix the set of symbols Σ = { a, ... , z }. Consider the set of strings made
of those symbols, such as az, xyz, and abba. The set of all such strings, denoted
Σ∗ , is countable. This table illustrates the correspondence that we get by taking
the strings in ascending order of length.

n∈N 0 1 2 ... 26 27 28 ...


f (n) ∈ Σ∗ ε a b ... z aa ab ...

(The first entry is the empty string, ε = ‘ ’.) This correspondence is also effective.
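Here too we could write such a program; this sketch (the name is ours) produces the n-th string by reading n as a numeral in bijective base 26.

;; f(0) = the empty string, f(1) = "a", ..., f(26) = "z", f(27) = "aa", ...
(define (nat->string n)
  (if (= n 0)
      ""
      (let ((q (quotient (- n 1) 26))
            (r (remainder (- n 1) 26)))
        (string-append (nat->string q)
                       (string (integer->char (+ r (char->integer #\a))))))))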
1.17 Example The set of integers Z = { ... , −2, −1, 0, 1, 2, ... } is countable. The
natural correspondence, alternating between positive and negative numbers, is
also effective.

n∈N 0 1 2 3 4 5 6 ...
f (n) ∈ Z 0 +1 −1 +2 −2 +3 −3 ...
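And the alternating correspondence is easily programmed (a sketch; the name is ours).

;; 0, 1, 2, 3, 4, ... ↦ 0, +1, −1, +2, −2, ...
(define (nat->int n)
  (if (even? n)
      (- (quotient n 2))
      (+ (quotient n 2) 1)))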
We have not given any non-computable functions because a goal of this chapter
is to show that such functions exist, and we are not there yet.
We close this section by circling back to the paradoxes of infinity that we began
with. In the prior example, the naive expectation is that the positives and the
negatives combined make Z twice as big as N. But this is the point of Galileo’s
Paradox; the right way to measure how many elements a set has is not through
superset and subset, the right way is through cardinality.
Finally, we will mention one more paradox, due to Zeno (circa 450 BC). He
imagines a tortoise challenging swift Achilles to a race, asking only for a head start.
Achilles laughs but the tortoise says that by the time Achilles reaches the spot x0 of
the head start, the tortoise will have moved on to x1. On reaching x1, Achilles finds
that the tortoise has moved ahead to x2. At any xi, Achilles will always be behind
and so, the tortoise reasons, Achilles can never win. The heart of this argument is
that while the distances xi+1 − xi shrink toward zero, there is always further to go
because of the open-endedness at the left of the interval (0 .. ∞).

1.18 Figure: Zeno of Elea shows Youths the Doors to Truth and Falsehood, by covering half
the distance to the door, and then half of that, etc. (By either B Carducci (1560–1608)
or P Tibaldi (1527–1596).)

In this book we shall often leverage open-endedness, usually the open-endedness
of N at infinity. We have already seen it in Galileo’s Paradox.

II.1 Exercises

✓ 1.19 Verify Example 1.13, that the function g : N → { 3k | k ∈ N } given by n ↦ 3n
is both one-to-one and onto.
1.20 A friend tells you, “The perfect squares and the perfect cubes have the same
number of elements because these sets are both one-to-one and onto.” Straighten
them out.
1.21 Let f, g : Z → Z be f(x) = 2x and g(x) = 2x − 1. Give a proof or a
counterexample for each. (a) Is f one-to-one? Is it onto? (b) Is g one-to-one?
Onto? (c) Are f and g inverse to each other?

✓ 1.22 Decide if each function is one-to-one, onto, both, or neither. You cannot
just answer ‘yes’ or ‘no’, you must justify the answer. (a) f : N → N given
by f (n) = n + 1 (b) f : Z → Z given by f (n) = n + 1 (c) f : N → N given
by f (n) = 2n (d) f : Z → Z given by f (n) = 2n (e) f : Z → N given by
f (n) = |n| .
1.23 Decide if each is a correspondence (you must also verify): (a) f : Q → Q
given by f (n) = n + 3 (b) f : Z → Q given by f (n) = n + 3 (c) f : Q → N
given by f (a/b) = |a · b| . Hint: this is a trick question.
1.24 Decide if each set finite or infinite and justify your answer. (a) { 1, 2, 3 }
(b) { 0, 1, 4, 9, 16, ... } (c) the set of prime numbers (d) the set of real roots of
x 5 − 5x 4 + 3x 2 + 7
1.25 Show that each pair of sets has the same cardinality by producing a one-to-one
and onto function from one to the other. You must verify that the function is a
correspondence. (a) { 0, 1, 2 }, { 3, 4, 5 } (b) Z, { i³ | i ∈ Z }

✓ 1.26 Show that each pair of sets has the same cardinality by producing a correspon-
dence (you must verify that the function is a correspondence): (a) { 0, 1, 3, 7 } and
{ π , π + 1, π + 2, π + 3 } (b) the even natural numbers and the perfect squares
(c) the real intervals (1 .. 4) and (−1 .. 1).
✓ 1.27 Verify that the function f (x) = 1/x is a correspondence between the subsets
(0 .. 1) and (1 .. ∞) of R.
1.28 Give a formula for a correspondence between the sets { 1, 2, 3, 4, ... } and
{ 7, 10, 13, 16 ... }.
✓ 1.29 Consider the set of characters C = { 0, 1, ... 9 } and the set of integers
A = { 48, 49, ... 57 }.
(a) Produce a correspondence f : C → A.
(b) Verify that the inverse f −1 : A → C is also a correspondence.
✓ 1.30 Show that each pair of sets have the same cardinality. You must give a
suitable function and also verify that it is one-to-one and onto.
(a) N and the set of even numbers
(b) N and the odd numbers
(c) the even numbers and the odd numbers
✓ 1.31 Although sometimes there is a correspondence that is natural, correspon-
dences need not be unique. Produce the natural correspondence from (0 .. 1) to
(0 .. 2), and then produce a different one, and then another different one.
1.32 Example 1.8 gives one correspondence between the natural numbers and the
perfect squares. Give another.
1.33 Fix c ∈ R such that c > 1. Show that f : R → (0 .. ∞) given by x ↦ cˣ is a
correspondence.
1.34 Show that the set of powers of two { 2ᵏ | k ∈ N } and the set of powers of
three { 3ᵏ | k ∈ N } have the same cardinality. Generalize.



1.35 For each give functions from N to itself. You must justify your claims. (a) Give
two examples of functions that are one-to-one but not onto. (b) Give two examples
of functions that are onto but not one-to-one. (c) Give two that are neither.
(d) Give two that are both.
1.36 Show that the intervals (3 .. 5) and (−1 .. 10) of real numbers have the same
cardinality by producing a correspondence. Then produce a second one.

1.37 Show that the sets have the same cardinality. (a) { 4k | k ∈ N }, { 5k | k ∈ N }
(b) { 0, 1, ... 99 }, { m ∈ N | m² < 10 000 } (c) { 0, 1, 3, 6, 10, 15, ... }, N
✓ 1.38 Produce a correspondence between each pair of open intervals of reals.
(a) (0 .. 1), (0 .. 2)
(b) (0 .. 1), (a .. b) for real numbers a < b
(c) (0 .. ∞), (a .. ∞) for the real number a
(d) This shows a correspondence x ↦ f(x) between a finite interval of reals
and an infinite one, f : (0 .. 1) → (0 .. ∞).

[Figure: the point P, the line y = 1, a point x in the interval, and the associated value f(x)]

The point P is at (−1, 1). Give a formula for f.
✓ 1.39 Not every set involving irrational numbers is uncountable. The set
S = { 2^(1/n) | n ∈ N and n ≥ 2 } contains only irrational numbers. Show that it is
countable.
1.40 Let B be the set of characters from which bit strings are made, B = { 0, 1 }.
(a) Let S be the set of finite bit strings where the initial bit is 1. Show that S is
countable.
(b) Let B∗ be the set of finite bit strings, without the restriction on the initial bit.
Show that it also is countable. Hint: use the prior item.
1.41 Use the arctangent function to prove that the sets (0 .. 1) and R have the
same cardinality.
1.42 Example 1.9 restates Aristotle’s Paradox as: the intervals I0 = [0 .. 2πr0) and
I1 = [0 .. 2πr1) have the same cardinality, for r0, r1 ∈ R⁺.
(a) Verify it by checking that g : I0 → I1 given by g(x) = x · (r1/r0) is a
correspondence.
(b) Show that where a < b, the cardinality of [0 .. 1) equals that of [a .. b).
(c) Generalize by showing that where a < b and c < d, the real intervals [a .. b)
and [c .. d) have the same cardinality.
1.43 Suppose that D ⊆ R. A function f : D → R is strictly increasing if x < x̂
implies that f (x) < f (x̂) for all x, x̂ ∈ D . Prove that any strictly increasing
function is one-to-one; it is therefore a correspondence between D and its range.
(The same applies if the function is strictly decreasing.) Does this hold for
D ⊆ N?

✓ 1.44 A paradoxical aspect of both Aristotle’s and Galileo’s examples is that they
gainsay Euclid’s “the whole is greater than the part,” because they name sets
that are equinumerous with a proper subset. Here, show that each pair
of a set and a proper subset has the same cardinality. (a) N, { 2n | n ∈ N }
(b) N, { n ∈ N | n > 4 }
1.45 Example 1.14 illustrates that we can take away a finite number of elements
from the set N without changing the cardinality. Prove that — prove that if S is a
finite subset of N then N − S is countable.
1.46 (a) Let D = { 0, 1, 2, 3 } and C = { Spades, Hearts, Clubs, Diamonds }, and
let f : D → C be given by f (0) = Spades, f (1) = Hearts, f (2) = Clubs,
f (3) = Diamonds. Find the inverse function f −1 : C → D and verify that it
is a correspondence.
(b) Let f : D → C be a correspondence. Show that the inverse function exists.
That is, show that associating each y ∈ C with the x ∈ D such that f (x) = y
gives a well-defined function f −1 : C → D .
(c) Show that the inverse of a correspondence is also a correspondence, that is,
that the function defined in the prior item is a correspondence.
1.47 Prove that a set S is infinite if and only if it has the same cardinality as a
proper subset of itself.
1.48 Prove Lemma 1.5 by proving each.
(a) For any function with a finite domain, the number of elements in that domain
is greater than or equal to the number of elements in the range. Hint: use
induction on the number of elements in the domain.
(b) If such a function is one-to-one then its domain has the same number of
elements as its range. Hint: again use induction on the size of the domain.
(c) If it is not one-to-one then its domain has more elements than its range.
(d) Two finite sets have the same number of elements if and only if there is a
correspondence from one to the other.

Section
II.2 Cantor’s correspondence
Countability is a property of sets so we naturally ask how it interacts with set
operations. Here we are interested in the cross product operation — after all,
Turing machines are sets of four-tuples.
2.1 Example The set S = { 0, 1 } × N consists of ordered pairs ⟨i, j⟩ where i ∈ { 0, 1 }
and j ∈ N. The diagram below shows two columns, each of which looks like
the natural numbers in that it is discrete and unbounded in one direction. So
informally, S is twice the natural numbers. As in Galileo’s Paradox this might lead
to a mistaken guess that it has more members than N. But S is countable.
To count it, the mistake to avoid is to go vertically up a column, which will
never get to the other column. Instead, alternate between the columns.
  ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩
⟨0, 2⟩  ⟨1, 2⟩
⟨0, 1⟩  ⟨1, 1⟩
⟨0, 0⟩  ⟨1, 0⟩

2.2 Animation: Enumerating { 0, 1 } × N.

This illustrates the correspondence as a table.

n∈N 0 1 2 3 4 5 ...
⟨i, j⟩ ∈ { 0, 1 } × N ⟨0, 0⟩ ⟨1, 0⟩ ⟨0, 1⟩ ⟨1, 1⟩ ⟨0, 2⟩ ⟨1, 2⟩ ...

The map from the top row to the bottom row is a pairing function because it
outputs pairs. Its inverse, from bottom to top, is an unpairing function. This
method extends to counting three copies { 0, 1, 2 } × N, four copies, etc.
2.3 Lemma The cross product of two finite sets is finite, and therefore countable.
The cross product of a finite set and a countably infinite set, or of a countably
infinite set and a finite set, is countably infinite.

Proof Exercise 2.35; use the above example as a model.

2.4 Example The natural next set has infinitely many copies: N × N.

.. .. .. ..
. . . .
⟨0, 3⟩ ⟨1, 3⟩ ⟨2, 3⟩ ⟨3, 3⟩ ···
⟨0, 2⟩ ⟨1, 2⟩ ⟨2, 2⟩ ⟨3, 2⟩ ···
⟨0, 1⟩ ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ···
⟨0, 0⟩ ⟨1, 0⟩ ⟨2, 0⟩ ⟨3, 0⟩ ···

Counting up the first column or out the first row won’t work; here also we need
to alternate. So instead do a breadth-first traversal: start in the lower left with
⟨0, 0⟩ , then take pairs that are one away, ⟨1, 0⟩ and ⟨0, 1⟩ , then those that are
two away, ⟨2, 0⟩ , ⟨1, 1⟩ and ⟨0, 2⟩ etc.

  ⋮       ⋮       ⋮       ⋮
⟨0, 3⟩  ⟨1, 3⟩  ⟨2, 3⟩  ⟨3, 3⟩
⟨0, 2⟩  ⟨1, 2⟩  ⟨2, 2⟩  ⟨3, 2⟩  ...
⟨0, 1⟩  ⟨1, 1⟩  ⟨2, 1⟩  ⟨3, 1⟩  ...
⟨0, 0⟩  ⟨1, 0⟩  ⟨2, 0⟩  ⟨3, 0⟩  ...

2.5 Animation: Counting N × N.

This presents the correspondence as a table.
Number 0 1 2 3 4 5 6 ...
Pair ⟨0, 0⟩ ⟨0, 1⟩ ⟨1, 0⟩ ⟨0, 2⟩ ⟨1, 1⟩ ⟨2, 0⟩ ⟨0, 3⟩ . . .
That this procedure gives a correspondence is perfectly evident. But the
formula for going from the bottom line to the top is amusing so we will develop
it. Animation 2.5 numbers the diagonals.
⋮        ⋮        ⋮
⟨0, 3⟩   ⟨1, 3⟩   ⟨2, 3⟩   ⟨3, 3⟩
⟨0, 2⟩   ⟨1, 2⟩   ⟨2, 2⟩   ⟨3, 2⟩   ...
⟨0, 1⟩   ⟨1, 1⟩   ⟨2, 1⟩   ⟨3, 1⟩   ...
⟨0, 0⟩   ⟨1, 0⟩   ⟨2, 0⟩   ⟨3, 0⟩   ...
Diagonal:  0        1        2        3
Consider for example the pair ⟨1, 2⟩ . It is on diagonal number 3 and, just as
3 = 1 + 2, in general the diagonal number of a pair is the sum of its entries.
Diagonal 0 has one entry, diagonal 1 has two entries, and diagonal 2 has three
entries, so before diagonal 3 come six pairs. Thus, on diagonal 3 the initial
pair ⟨0, 3⟩ gets enumerated as number 6. With that, the pair ⟨1, 2⟩ is number 7.
So to find the number corresponding to ⟨x, y⟩ , note first that it lies on diagonal
d = x + y . The number of entries prior to diagonal d is 1 + 2 + · · · + d . This
is an arithmetic series with total d(d + 1)/2. Thus on diagonal d the first pair,
⟨0, x + y⟩ , has number (x + y)(x + y + 1)/2. The next pair on that diagonal,
⟨1, x + y − 1⟩ , gets the number 1 + [(x + y)(x + y + 1)/2], etc.
2.6 Definition Cantor's correspondence cantor : N² → N, or unpairing function,
or diagonal enumeration† is cantor(x, y) = x + [(x + y)(x + y + 1)/2]. Its inverse
is the pairing function, pair : N → N².
2.7 Example Two early examples are cantor(1, 2) = 7 and cantor(2, 0) = 5. A later
one is cantor(0, 36) = 666.
† Some authors use diamond brackets, writing ⟨x, y⟩ where we write cantor(x, y).
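The formulas translate directly into code. A minimal sketch in Python, under the conventions above (math.isqrt is the integer square root; cantor3 is the three-input version that appears in the proof of Lemma 2.8 below):

import math

def cantor(x, y):
    # position of <x, y> in the diagonal enumeration: x's place on diagonal d = x + y
    return x + (x + y) * (x + y + 1) // 2

def pair(n):
    # the inverse: recover the diagonal d, then the position on it
    d = (math.isqrt(8 * n + 1) - 1) // 2
    x = n - d * (d + 1) // 2
    return (x, d - x)

def cantor3(x, y, z):
    return cantor(cantor(x, y), z)

print(cantor(1, 2), cantor(2, 0), cantor(0, 36))  # 7 5 666, as in Example 2.7
print(pair(7))                                    # (1, 2)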
2.8 Lemma Cantor’s correspondence is a correspondence, so the cross product N × N
is countable. Further, the sets N³ = N × N × N, and N⁴ , . . . are all countable.
Proof The function cantor : N × N → N is one-to-one and onto by construction.
That is, the construction ensures that each output natural number is associated
with one and only one input pair.
The prior paragraph forms the base step of an induction argument. For
example, to do N³ the idea is to consider a triple such as ⟨1, 2, 3⟩ to be a
pair whose first entry is a pair, ⟨⟨1, 2⟩, 3⟩ . That is, define cantor3 : N³ → N by
cantor3 (x, y, z) = cantor(cantor(x, y), z). Exercise 2.29 shows that this function
is a correspondence. The full induction step details are routine.
2.9 Corollary The cross product of finitely many countable sets is countable.
Proof Suppose that S 0 , ... S n−1 are countable and that each function fi : N → S i
is a correspondence. By the prior result, the function cantorn⁻¹ : N → Nⁿ is a
correspondence. Write cantorn⁻¹(k) = ⟨k 0 , k 1 , ... kn−1 ⟩ . Then the composition
k ↦ ⟨f 0 (k 0 ), f 1 (k 1 ), ... fn−1 (kn−1 )⟩ from N to S 0 × · · · × S n−1 is a correspondence,
and so S 0 × S 1 × · · · × S n−1 is countable.
2.10 Example The set of rational numbers Q is countable. We know how to alternate
between positives and negatives so we will be done showing this if we count
the nonnegative rationals, f : N → Q+ ∪ { 0 }. A nonnegative rational number
is a numerator-denominator pair ⟨n, d⟩ ∈ N × N+ , except for the complication
that pairs collapse, meaning for instance that when the numerator is 4 and the
denominator is 2 then we get the same rational as when n = 2 and d = 1.
We will count with a program instead of a formula. Given an input i , the
program finds f (i) by using prior values, f (0), f (1), . . . f (i − 1). It loops,
using the pairing function cantor⁻¹ to generate pairs: cantor⁻¹(0), cantor⁻¹(1),
cantor⁻¹(2), . . . For each generated pair ⟨a, b⟩ , if the second entry is 0 or if the
rational number a/b is in the list of prior values then the program rejects the pair,
going on to try the next one. The first pair that it does not reject is f (i).
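A sketch of that program in Python (our illustration; pair is the unpairing computation from Definition 2.6, repeated so the sketch is self-contained):

import math
from fractions import Fraction

def pair(n):
    d = (math.isqrt(8 * n + 1) - 1) // 2
    x = n - d * (d + 1) // 2
    return (x, d - x)

def rationals(count):
    # memoized enumeration f(0), f(1), ... f(count - 1)
    values, n = [], 0
    while len(values) < count:
        a, b = pair(n)
        n += 1
        if b != 0 and Fraction(a, b) not in values:  # reject collapsing pairs
            values.append(Fraction(a, b))
    return values

print(rationals(6))  # 0, 1, 1/2, 2, 1/3, 3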
The technique of that example is memoization or caching and it is widely used.
For example, when you visit a web site your browser saves any image to your disk.
If you visit the site again then your browser checks if the image has changed. If
not then it will use the prior copy, reducing download time.
The next result establishes that we can use memoization in general.
2.11 Lemma A set S is countable if and only if either S is empty or there is an onto
map f : N → S .
Proof Assume first that S is countable. If it is empty then we are done. If it is
finite but nonempty, S = {s 0 , ... sn−1 }, then this f : N → S map is onto.
f (i) = si – if i < n
        s 0 – otherwise
If S is infinite and countable then it has the same cardinality as N so there is a
correspondence f : N → S . A correspondence is onto.
For the converse assume that either S is empty or there is an onto map from
N to S . If S = ∅ then it is countable by Definition 1.12 so suppose that there is
an onto map f . If S is finite then it is countable so suppose that S is infinite.
Define f̂ : N → S by f̂(n) = f (k) where k is the least natural number such that
f (k) ∉ { f̂(0), ... f̂(n − 1) }. Such a k exists because S is infinite and f is onto.
Observe that f̂ is both one-to-one and onto, by construction.
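The f̂ construction is also easy to sketch in code. A Python illustration (assuming f is total and S is infinite, as in the proof):

def f_hat(n, f):
    # n-th value of the one-to-one enumeration built from the onto map f
    seen = []
    k = 0
    while True:
        v = f(k)
        if v not in seen:
            seen.append(v)
            if len(seen) == n + 1:
                return v
        k += 1

# example: this f hits each even number twice; f_hat lists each once
print([f_hat(n, lambda k: 2 * (k // 2)) for n in range(4)])  # [0, 2, 4, 6]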
This section started off by noting that it is natural to ask how countability
interacts with set operations. The next result collects some answers.
2.12 Corollary (1) Any subset of a countable set is countable. (2) The intersection
of two countable sets is countable. The intersection of countably many countable
sets is countable. (3) The union of two countable sets is countable. The union of
any finite number of countable sets is countable. The union of countably many
countable sets is countable.
Proof Suppose that S is countable and that Ŝ ⊆ S . If S is empty then so is Ŝ, and
thus it is countable. Otherwise by the prior lemma there is an onto f : N → S . If
Ŝ is empty then it is countable, and if not then fix some ŝ ∈ Ŝ so that this map
f̂ : N → Ŝ is onto.

f̂(n) = f (n) – if f (n) ∈ Ŝ
       ŝ – otherwise
Item (2) is immediate from (1) since the intersection is a subset.
For item (3) in the two-set case, suppose that S 0 and S 1 are countable. If
either set is empty, or both sets are empty, then the result is trivial because for
instance S 0 ∪ œ = S 0 . So instead suppose that f 0 : N → S 0 and f 1 : N → S 1 are
onto. Lemma 2.3 gives a correspondence taking N to { 0, 1 } × N, inputting natural
numbers and outputting pairs ⟨i, j⟩ where i is either 0 or 1. Call that function
g : N → { 0, 1 } × N. Then this is the desired function onto the set S 0 ∪ S 1 .

f̂(n) = f 0 (j) – if g(n) = ⟨0, j⟩
       f 1 (j) – if g(n) = ⟨1, j⟩
This approach extends to any finite number of countable sets.
Finally, we start with countably many countable sets, S i for i ∈ N, and show
that their union S 0 ∪ S 1 ∪ · · · is countable. If all but finitely many are empty then
we can fall back to the finite case so assume that infinitely many of the sets are
nonempty. Throw out the empty ones because they don’t affect the union, write
S j for the remaining sets, and assume that we have a family of correspondences
f j : N → S j . Then use Cantor’s pairing function: the desired map from N onto
S 0 ∪ S 1 ∪ · · · is f̂(n) = f j (k) where pair(n) = ⟨j, k⟩ .
Very important: Lemma 2.3 and Lemma 2.8 on the cross product of countable
sets are effectivizable. That is, if sets correspond to N via some effective numbering
then their cross product corresponds to N via an effective numbering. We finish
this section by applying that to Turing machines — we will give a way to effectively
number the Turing machines.
Turing machines are sets of instructions. Each instruction is a four-tuple,
a member of Q × Σ × (Σ ∪ { L, R }) × Q , where Q is the set of states and Σ is
the tape alphabet. So by the above numbering results, we can number the
instructions: there is an instruction whose number is 0, one with number 1, etc.
This is effective, meaning that there is a program that takes in a natural number
and outputs the corresponding instruction, as well as a program that takes in an
instruction and outputs the corresponding number (see Exercise 2.24).
With that, we can effectively number the Turing machines. One way is: starting
with a Turing machine P, use the prior paragraph to convert each of its instructions
to a number, giving a set {i 0 , i 1 , ... i n }, and then output the number associated
with that machine as e = g(P ) = 2^{i0} + 2^{i1} + · · · + 2^{in} .
The association in the other direction is much the same. Given a natural
number e , represent it in binary e = 2^{j0} + · · · + 2^{jk} , form the set of instructions
corresponding to the numbers j 0 , . . . jk , and that is the output Turing machine.
(Except that we must check that this set is deterministic, that no two of the
instructions begin with the same state and tape symbol, which we can do effectively,
and if it is not deterministic then let the output be the empty machine P = { }.)
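Here is a sketch of both directions in Python. The helper bijections instr_to_num and num_to_instr, between instructions and N, are assumed as parameters; they stand for the instruction numbering of the prior paragraphs.

def machine_to_index(machine, instr_to_num):
    # machine is a set of four-tuple instructions
    return sum(2 ** instr_to_num(inst) for inst in machine)

def index_to_machine(e, num_to_instr):
    insts = set()
    i = 0
    while e > 0:
        if e % 2 == 1:
            insts.add(num_to_instr(i))
        e, i = e // 2, i + 1
    # determinism check: no two instructions may share state and tape symbol
    starts = [inst[:2] for inst in insts]
    return insts if len(starts) == len(set(starts)) else set()  # else empty machine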
The exact numbering that we use doesn't matter much, as long as it has
certain properties, the ones in the following definition. For the rest of the book we
will just fix a numbering and cite its properties rather than mess with its details.
2.13 Definition A numbering is a function that assigns to each Turing machine
a natural number. For any Turing machine, the corresponding number is its
index number, or Gödel number, or description number. For the machine with
index e ∈ N we write Pe . For the function computed by Pe we write ϕ e .
A numbering is acceptable if it is effective: (1) there is a program that takes
as input the set of instructions and gives as output the associated number, (2) the
set of numbers for which there is an associated machine is computable, and
(3) there is an effective inverse that takes as input a natural number and gives as
output the associated machine.
Think of the machine’s index as its name. We will refer to it frequently, for
instance by saying “the e -th Turing machine.”
The takeaway point is that because the numbering is acceptable, the index is
source-equivalent — we can go effectively from the index to the machine source,
the set of four-tuple instructions, or from the source to the index.
2.14 Remark Here is an alternative scheme that is simple and is useful for thinking
about numbering, but that we won’t make precise. On a computer, the text of a
program is saved as a bit string, which we can interpret as a binary number, e .
In the other direction, given a binary e on the disk, we can disassemble it into
assembly language source code. So there is an association between binary
numbers and source code.
2.15 Lemma (Padding lemma) Every computable function has infinitely many in-
dices: if f is computable then there are infinitely many distinct ei ∈ N with
f = ϕ e0 = ϕ e1 = · · · . We can effectively produce a list of such indices.
Proof Let f = ϕ e . Let q j be the highest-numbered state in the set Pe . For each
k ∈ N+ consider the Turing machine obtained from Pe by adding the instruction
q j+k BBq j+k . This gives an effective sequence of Turing machines Pe1 , Pe2 , . . .
with distinct indices, all having the same behavior, ϕ ek = ϕ e = f .
2.16 Remark Stated in terms of everyday programming, we can get infinitely
many different source codes that have the same compiled behavior. One way is by starting
with one source code and adding to the bottom a comment line containing the
number k .
Now that we have counted the Turing machines we are close to this book’s most
important result. The next section shows that there are so many natural number
functions that they cannot be counted, they cannot be put in correspondence
with N. This will prove that there are functions not computed by any Turing
machine.
II.2 Exercises
✓ 2.17 Extend the table of Example 2.1 through n = 12. Where f (n) = ⟨x, y⟩ , give
formulas for x and y .
✓ 2.18 For each pair ⟨a, b⟩ find the pair before it and the pair after it in Cantor’s
correspondence. That is, where cantor(a, b) = n , find the pair associated
with n + 1 and the pair with n − 1. (a) ⟨50, 50⟩ (b) ⟨100, 4⟩ (c) ⟨4, 100⟩
(d) ⟨0, 200⟩ (e) ⟨200, 0⟩
✓ 2.19 Corollary 2.12 says that the union of two countable sets is countable.
(a) For each of the two sets T = { 2k | k ∈ N } and F = { 5m | m ∈ N } produce
a correspondence fT : N → T and f F : N → F . Give a table listing the
values of fT (0), . . . fT (9) and give another table listing f F (0), . . . f F (9).
(b) Give a table listing the first ten values for a correspondence f : N → T ∪ F .
2.20 Give an enumeration of N × { 0, 1 }. Find the pair matching 0, 10, 100, and
101. Find the number corresponding to ⟨2, 1⟩ , ⟨20, 1⟩ , and ⟨200, 1⟩ .
✓ 2.21 Example 2.1 says that the method for two columns extends to three. Give
an enumeration of { 0, 1, 2 } × N. That is, where g(n) = ⟨x, y⟩ give a formula for
x and y . Find the pair corresponding to 0, 10, 100, and 1 000. Find the number
corresponding to ⟨1, 2⟩ , ⟨1, 20⟩ , and ⟨1, 200⟩ .
2.22 Give an enumeration f of { 0, 1, 2, 3 } × N. That is, where f (n) = ⟨x, y⟩ ,
give a formula for x and y . Also give an enumeration f of { 0, 1, 2, ... k } × N.
✓ 2.23 Extend the table of Example 2.4 to cover correspondences up to 16.
✓ 2.24 Definition 2.6’s function cantor(x, y) = x + [(x + y)(x + y + 1)/2] is clearly
effective since it is given as a formula. Show that its inverse, pair : N → N² , is
also effective by sketching a way to compute it with a program.
2.25 Prove that if A and B are countable sets then their symmetric difference
A∆B = (A − B) ∪ (B − A) is countable.
2.26 Show that the subset S = { a + bi | a, b ∈ Z } of the complex numbers is
countable.
2.27 List the first dozen nonnegative rational numbers enumerated by the method
described in Example 2.10.
2.28 We will show that Z[x] = { an x^n + · · · + a 1 x + a 0 | n ∈ N and an , ... a 0 ∈ Z },
the set of polynomials in the variable x with integer coefficients, is countable.
(a) Fix a natural number n . Prove that the set of polynomials with n + 1-many
terms Zn [x] = { an x^n + · · · + a 0 | an , ... a 0 ∈ Z } is countable.
(b) Finish the argument.
✓ 2.29 The proof of Lemma 2.8 says that the function cantor3 : N³ → N given by
cantor3 (a, b, c) = cantor(cantor(a, b), c) is a correspondence. Verify that.
2.30 Define c 3 : N³ → N by ⟨x, y, z⟩ ↦ cantor(x, cantor(y, z)). (a) Compute
c 3 (0, 0, 0), c 3 (1, 2, 3), and c 3 (3, 3, 3). (b) Find the triples corresponding to 0, 1,
2, 3, and 4. (c) Give a formula.
2.31 Say that an entry in N × N is on the diagonal if it is ⟨i, i⟩ for some i . Show
that an entry on the diagonal has a Cantor number that is a multiple of four.
2.32 Corollary 2.12 says that the union of any finite number of countable sets is
countable. The base case is for two sets (and the inductive step covers larger
numbers of sets). Give a proof specific to the three set case.
2.33 Show that the set of all functions from { 0, 1 } to N is countable.
2.34 Show that the image under any function of a countable set is countable.
That is, show that if S is countable and there is a function f : S → T then the
range set f (S) = ran(f ) = { y | y = f (x) for some x ∈ S } is also countable.
2.35 Give a proof of Lemma 2.3.
✓ 2.36 Consider a programming language using the alphabet Σ consisting of the
twenty-six capital ASCII letters, the ten digits, the space character, open and
close parentheses, and the semicolon. Show each.
(a) The set of length-5 strings Σ5 is countable.
(b) The set of strings of length at most 5 over this alphabet is countable.
(c) The set of finite-length strings over this alphabet is countable.
(d) The set of programs in this language is countable.
2.37 There are other correspondences from N² to N besides Cantor's.
(a) Consider g : N² → N given by ⟨n, m⟩ ↦ 2^n (2m + 1) − 1. Find the number
corresponding to the pairs in { ⟨n, m⟩ ∈ N² | 0 ≤ n, m < 4 }.
(b) Show that g is a correspondence.
(c) The box enumeration goes: (0, 0), then (0, 1), (1, 1), (1, 0), then (0, 2),
(1, 2), (2, 2), (2, 1), (2, 0), etc. To what value does (3, 4) correspond?
2.38 The formula for Cantor's unpairing function cantor(x, y) = x + [(x + y)(x +
y + 1)/2] gives a correspondence for natural number input. What about for real
number input?
(a) Find cantor(2, 1).
(b) Fix x = 1 and find two different y ∈ R so that cantor(1, y) = cantor(2, 1).
2.39 It is fun to prove directly, rather than via the cross product, that the
countable union of countably many countable sets is countable.
(a) For the union of two countable sets, S 0 and S 1 , partition the natural numbers
into the odds and evens, C 0 and C 1 , that is, into the set of numbers whose
binary representation does not end in 0 and the set whose representation
does end in 0. Each is countably infinite so there are correspondences
g0 : N → C 0 and g1 : N → C 1 . Use the g's to produce an onto function
f̂ : N → S 0 ∪ S 1 .
(b) For the union of three countable sets S 0 , S 1 , and S 2 , instead split the natural
numbers into three parts: the set C 0 of numbers whose binary expansion
does not end in 0, the set C 1 of numbers whose expansion ends in one
but not two 0’s, and C 2 , those numbers ending in two 0’s. (Take 0 to be
an element of the second set.) There are correspondences g0 : N → C 0 ,
g1 : N → C 1 and g2 : N → C 2 . Produce an onto f̂ : N → S 0 ∪ S 1 ∪ S 2 .
(c) To show that the countable union of countable sets is countable start with
countably many countable sets, S i for i ∈ N. Assume that there are infinitely
many nonempty sets, throw out the empty ones because they don’t affect
the union, call the rest S j , and extend the prior item.
Section
II.3 Diagonalization
Cantor’s definition of cardinality led us to produce correspondences. But it
can also happen that no correspondence exists. We now introduce a powerful
technique to show that. It is central to the entire Theory of Computation.
Diagonalization There is a set so large that it is not countable, that is, a set for
which no correspondence exists with N or any subset of it. It is the set of reals, R.
3.1 Theorem There is no onto map f : N → R. Hence, the set of reals is not
countable.
This result is important but so is the technique of proof that we will use. We
will pause to develop the intuition behind it. The table below illustrates a function
f : N → R, listing some inputs and outputs, with the outputs aligned on the
decimal point.
n Decimal expansion of f (n)
0 42 . 3 1 2 7 7 0 4 ...
1 2.0 1 0 0 0 0 0 ...
2 1.4 1 4 1 5 9 2 ...
3 −20 . 9 1 9 5 9 1 9 ...
4 0.1 0 1 0 0 1 0 ...
5 −0 . 6 2 5 5 4 1 8 ...
.. ..
. .
We will show that this function is not onto. We will do this by producing a number
z ∈ R that does not equal any of the outputs, any of the f (n)’s.
Ignore what is to the left of the decimal point. To its right go down the
diagonal, taking the digits 3, 1, 4, 5, 0, 1 . . . Construct the desired z by making
its first decimal place something other than 3, making its second decimal place
something other than 1, etc. Specifically: if the diagonal digit is a 1 then z gets
a 2 in that decimal place and otherwise z gets a 1 there. Thus, in this example
z = 0.121112 ...
By this construction, z differs from the number in the first row, z ≠ f (0),
because they differ in the first decimal place. Similarly, z ≠ f (1) because they
differ in the second place. In this way z does not equal any of the f (n). Thus f is
not onto. This technique is diagonalization.
(In this argument we have skirted a technicality, that some real numbers have
two different decimal representations. For instance, 1.000 ... = 0.999 ... because
the two differ by less than 0.1, less than 0.01, etc. This is a potential snag because
it means that even though we have constructed a representation that is different
than all the representations on the list, it still might not be that the number is
different than all the numbers on the list. However, dual representation only
happens for decimals when one of the representations ends in 0’s while the other
ends in 9’s. That’s why we build z using 1’s and 2’s.)
Proof We will show that no map f : N → R is onto.
Denote the i -th decimal digit of f (n) as f (n)[i] (if f (n) is a number with two
decimal representations then use the one ending in 0's). Let g be the map on the
decimal digits { 0, ... , 9 } given by: g(j) = 2 if j is 1, and g(j) = 1 otherwise.
Now let z be the real number that has 0 to the left of its decimal point, and
whose i -th decimal digit is g(f (i)[i]). Then for all i , z ≠ f (i) because z[i] ≠ f (i)[i].
So f is not onto.
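A sketch of the construction in Python, where digit(n, i) stands for f (n)[i] and the table holds the decimal digits from the illustration above:

def diagonal_digits(digit, k):
    # the first k decimal digits of the diagonal number z
    return [2 if digit(i, i) == 1 else 1 for i in range(k)]

table = [[3, 1, 2, 7, 7, 0, 4],
         [0, 1, 0, 0, 0, 0, 0],
         [4, 1, 4, 1, 5, 9, 2]]
print(diagonal_digits(lambda n, i: table[n][i], 3))  # [1, 2, 1], so z = 0.121...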
3.2 Definition A set that is infinite but not countable is uncountable.
We next define when one set has fewer, or more, elements than another. Our
intuition comes from trying to make a correspondence between the two finite
sets { 0, 1, 2 } and { 0, 1, 2, 3 }. There are just too many elements in the codomain
for any map to cover them all. The best we can do is something like this, which is
one-to-one but not onto.
0 ↦ 0    1 ↦ 1    2 ↦ 2    (and 3 is not an output)
3.3 Definition The set S has cardinality less than or equal to that of the set T ,
denoted |S | ≤ |T | , if there is a one-to-one function from S to T .
3.4 Example There is a one-to-one function from N to R, namely the inclusion map
that sends n ∈ N to itself, n ∈ R. So | N | ≤ | R | . (By Theorem 3.1 above the
cardinality is actually strictly less.)
3.5 Remark We cannot emphasize too strongly that the work in this chapter,
including the prior example, is startling and profound. Some infinite sets have
more elements than others. And, in particular, the reals have more elements than
the naturals. As dramatized by Galileo's Paradox, this is not just that the naturals
are a subset of the reals. Instead it means that the set of naturals cannot be made
to correspond with the set of reals. This is like the children’s game Musical Chairs.
We have countably many chairs P 0 , P 1 , ..., chairs indexed by the natural numbers,
but there are so many children, so many real numbers, that some child is left
without a chair.
The wording of that definition suggests that if both |S | ≤ |T | and |T | ≤ |S | then
|S | = |T | . That is true but the proof is beyond our scope; see Exercise 3.31.
For the next result recall that a set’s characteristic function 1S is the Boolean
function determining membership: 1S (s) = 1 if s ∈ S and 1S (s) = 0 if s ∉ S .
Thus for the set of two letters S = { a, c }, the characteristic function with domain
Σ = { a, ... , z } is 1S (a) = 1, 1S (b) = 0, 1S (c) = 1, 1S (d) = 0, ... 1S (z) = 0.
Recall also that the power set P (S) is the collection of subsets of S . For instance,
if S = { a, c } then P (S) = { œ, { a }, { c }, { a, c } }.
3.6 Theorem (Cantor’s Theorem) A set’s cardinality is strictly less than that of its
power set.
Before stating the proof we first illustrate it. The easy half is starting with a
set S and producing a function to P (S) that is one-to-one: just map s ∈ S to {s }.
The harder half is showing that no map from S to P (S) is onto. As an example,
consider S = { a, b, c } and this function f : S → P (S).
f : a ↦ { b, c }    b ↦ { b }    c ↦ { a, b, c }    (∗)
In the table below, the first row lists the values of the characteristic function
1f (a) : S → { 0, 1 } on the inputs a, b, and c. The second row lists the input/output
values for 1f (b) . And, the third row lists 1f (c) .
s ∈ S    1f (s) (a)    1f (s) (b)    1f (s) (c)
a            0             1             1
b            0             1             0
c            1             1             1
We show that f is not onto by producing a subset of S that is not one of the three
sets in (∗). For that, diagonalize: go down the table’s diagonal 011 and flip the
bits from 0 to 1 or from 1 to 0. We get 100. That’s the characteristic function
of R = { a }. This set is not equal to f (a) because it differs on a, it is not f (b)
because it differs on b, and it is not f (c) because it differs on c.
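The flipped diagonal is mechanical to produce. A sketch in Python using the example function (∗):

def diagonal_set(S, f):
    # the set R from the proof below: members not in their own image set
    return { s for s in S if s not in f(s) }

f = { 'a': {'b', 'c'}, 'b': {'b'}, 'c': {'a', 'b', 'c'} }
print(diagonal_set({'a', 'b', 'c'}, lambda s: f[s]))  # {'a'}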
Proof One half is easy: consider the embedding map ι : S → P (S) given by
ι(s) = {s}. It is one-to-one so the cardinality of S is less than or equal to the
cardinality of P (S).
For the other half, to show that no map from a set to its power set is onto, fix
any f : S → P (S) and consider this element of P (S).
R = { s | s ∉ f (s) }
We will show that no member of the domain maps to R and thus f is not onto.
Suppose that there exists ŝ ∈ S such that f (ŝ) = R . Consider whether ŝ is an
element of R . We have that ŝ ∈ R if and only if ŝ ∈ { s | s ∉ f (s) }. By definition
of membership that holds if and only if ŝ ∉ f (ŝ), which holds if and only if ŝ ∉ R .
The contradiction means that no such ŝ exists.
3.7 Corollary Let F be the set of functions f : N → N. The cardinality of the
set N is strictly less than the cardinality of F .
Proof There is a one-to-one map from P (N) to F : associate each subset S ⊆ N
with its characteristic function 1S : N → N. Therefore | N | < | P (N)| ≤ |F | .
3.8 Corollary (Existence of uncomputable functions) There is a function
f : N → N that is not computable: f ≠ ϕ e for all e .
Proof There are more members of the set of functions from N to itself than
there are members of the set N. Lemma 2.8 shows that there are as many
Turing machines as there are members of N. So, the cardinality of the set of
functions f : N → N is greater than the cardinality of the set of Turing machines.
Consequently some natural number function is without an associated Turing
machine.
This is an epochal result. In the light of Church’s Thesis we understand it to
prove that there are jobs that no computer can do.
II.3 Exercises
3.9 Your friend is confused about the diagonal argument. “If you had an infinite
list of numbers, it would clearly contain every number, right? I mean, if you had
a list that was truly INFINITE, then you simply couldn’t find a number that is
not on the list!” Straighten them out.
3.10 Your classmate says, “Professor, you’ve made a mistake. The set of numbers
with one decimal place, such as 25.4 and 0.1, is clearly countable — just take
the integers and shift all the decimal places by one. The set with two decimal
places, such as 2.54 and 6.02 is likewise countable, etc. This is countably many
sets, each of which is countable, and so the union is countable. The union is the
whole reals, so the reals are countable.” Where is your friend’s mistake?
3.11 Verify Cantor’s Theorem, Theorem 3.6, for these finite sets. (a) { 0, 1, 2 }
(b) { 0, 1 } (c) { 0 } (d) { }
✓ 3.12 Use Definition 3.3 to prove that the first set has cardinality less than or
equal to the second set.
(a) S = { 1, 2, 3 } , Sˆ = { 11, 12, 13 }
(b) T = { 0, 1, 2 } , T̂ = { 11, 12, 13, 14 }
(c) U = { 0, 1, 2 } , the set of odd numbers
(d) the set of even numbers, the set of odds
3.13 One set is countable and the other is uncountable. Which is which?
(a) { n ∈ N | n + 3 < 5 }
(b) { x ∈ R | x + 3 < 5 }
✓ 3.14 Characterize each set as countable or uncountable. You need only give
a one-word answer. (a) [1 .. 4) ⊂ R (b) [1 .. 4) ⊂ N (c) [5 .. ∞) ⊂ R
(d) [5 .. ∞) ⊂ N
3.15 List all of the functions with domain A2 = { 0, 1 } and codomain P (A2 ).
How many functions are there for a set A3 with three elements? n elements?
3.16 List all of the functions from S to T . How many are one-to-one?
(a) S = { 0, 1 } , T = { 10, 11 }
(b) S = { 0, 1 } , T = { 10, 11, 12 }
✓ 3.17 Short answer: fill each blank by choosing from (i) uncountable, (ii) countable
or uncountable, (iii) finite, (iv) countable, (v) finite, countably infinite, or
uncountable (you might use an answer more than once, or not at all). Give the
sharpest conclusion possible. You needn’t give a proof.
(a) If A and B are finite then A ∪ B is .
(b) If A is countable and B is finite then A ∪ B is .
(c) If A is countable and B is uncountable then A ∪ B is .
(d) if A is countable and B is uncountable then A ∩ B is .
3.18 Short answer: suppose that S is countable and consider f : S → T . List all
of these that are possible: (i) S is finite, (ii) T is finite, (iii) S is countably infinite,
(iv) T is countably infinite, (v) T is uncountable, provided that (a) the map is onto,
(b) the map is one-to-one.
✓ 3.19 Name a set with a larger cardinality than R.
✓ 3.20 Recall that B = { 0, 1 }.
(a) Show that the set of finite bit strings, ⟨b0b1 ... bk −1 ⟩ where bi ∈ B and
k ∈ N, is countable.
(b) An infinite bit string f = ⟨b0 , b1 , ...⟩ is a function f : N → B. Show that
the set of infinite bit strings is uncountable, using diagonalization.
3.21 Prove that for two sets, S ⊆ T implies |S | ≤ |T | .
3.22 Use diagonalization to show that this statement is false: all functions f : N → N
with a finite range are computable.
3.23 You study with someone who says, “Yes, obviously there are different sizes
of infinity. The plane R² obviously has infinitely many more points than the
line R, so it is a larger infinity.” Convince them that, although there are indeed
different sizes of infinity, their argument is wrong because the cardinality of
the plane is the same as the cardinality of the line. Hint: consider the function
f : R² → R such that f (x, y) interleaves the digits of the two input numbers.
3.24 In mathematics classes we mostly work with rational numbers, perhaps
leaving the impression that irrational numbers are rare. Actually, there are more
irrational numbers than rationals. Prove that while the set of rational numbers
is countable, the set of irrational numbers is uncountable.
✓ 3.25 Example 2.10 shows that the rational numbers are countable. What happens
when the diagonal argument given in Theorem 3.1 is applied to a listing of the
rationals? Consider a sequence q 0 , q 1 , ... that contains all of the rationals. For
each of those numbers use a decimal expansion qi = di . di,0 di,1 di,2 ... (with
di ∈ Z and di,j ∈ { 0, ... 9 }) that does not end in all 9's, so that the decimal
expansion is determined.
(a) Let g be the map on the decimal digits 0, 1, ... 9 given by g(1) = 2, and
g(0) = g(2) = g(3) = · · · = 1. Define z = Σn∈N g(dn,n ) · 10^−(n+1) . Show
that z is irrational.
(b) Use the prior item to conclude that the diagonal number d = Σn∈N dn,n · 10^−(n+1)
is irrational. Hint: show that, unlike a rational number, it has no
repeating pattern in its decimal expansion.
(c) Why is the fact that the diagonal is not rational not a contradiction to the
fact that we can enumerate all of the rationals?
3.26 Verify Cantor’s Theorem in the finite case by showing that if S is finite then
the cardinality of its power set is | P (S)| = 2^|S| .
3.27 The definition R = { s | s ∉ f (s) } is the key to the proof of Cantor's Theorem,
Theorem 3.6. This story illustrates the idea: a high school yearbook asks each
graduating student si to make a list f (si ) of class members that they predict will
someday be famous. Define the set of humble students H to be those who are
not on their own list. Show that no student's list equals H .
3.28 The proof of Theorem 3.1 works around the fact that some numbers have
more than one base ten representation. Base two also has the property that
some numbers have more than one representation; an example is 0.01000 ...
and 0.00111 .... How could you make the argument work in base two?
3.29 The discussion after the statement of Theorem 3.1 includes that the real
number 1 has two different decimal representations, 1.000 ... = 0.999 ...
(a) Verify this equality using the formula for an infinite geometric series,
a + ar + ar² + ar³ + · · · = a · 1/(1 − r ).
(b) Show that if a number has two different decimal representations then in
the leftmost decimal place where they differ, they differ by 1. Hint: that is
the biggest difference that the remaining decimal places can make up.
(c) In addition show that, for the one with the larger digit in that first differing
place, all of the digits to its right are 0, while the other representation has
that all of the remaining digits are 9’s. Hint: this is similar to the prior item.
3.30 Show that there is no set of all sets. Hint: use Theorem 3.6.
3.31 Definition 3.3 extends the definition of equal cardinality to say that |A| ≤ |B|
if there is a one-to-one function from A to B . The Schröder–Bernstein theorem
is that if both |S | ≤ |T | and |T | ≤ |S | then |S | = |T | . We will walk through
the proof. It depends on finding chains of images: for any s ∈ S we form the
associated chain by iterating application of the two functions, both to the right
and the left, as here.
... f⁻¹(g⁻¹(s)), g⁻¹(s), s, f (s), g(f (s)), f (g(f (s))) ...

(Starting with s the chain to the right is s, f (s), g(f (s)), f (g(f (s))), ... while the
chain to the left is ... f⁻¹(g⁻¹(s)), g⁻¹(s), s .) For any t ∈ T define the associated
chain similarly.
An example is to take a set of integers S = { 0, 1, 2 } and a set of characters T =
{ a, b, c }, and consider the two one-to-one functions f : S → T and g : T → S
shown here.
s   f (s)        t   g(t)
0    b           a    0
1    c           b    1
2    a           c    2
Starting at 0 ∈ S gives a single chain that is cyclic, ... 0, b, 1, c, 2, a, 0 ...
(a) Consider S = { 0, 1, 2, 3 } and T = { a, b, c, d } . Let f associate 0 ↦ a,
1 ↦ b, 2 ↦ d and 3 ↦ c. Let g associate a ↦ 0, b ↦ 1, c ↦ 2 and
d ↦ 3. Check that these maps are one-to-one. List the chain associated
with each element of S and the chain associated with each element of T .
(b) For infinite sets a chain can have a first element, an element without any
preimage. Let S be the even numbers and let T be the odds. Let f : S → T
be f (x) = x + 1 and let g : T → S be g(x) = x + 1. Show each map is
one-to-one. Show there is a single chain and that it has a first element.
(c) Argue that we can assume without loss of generality that S and T are
disjoint sets.
(d) Assume that S and T are disjoint and that f : S → T and g : T → S are
one-to-one. Show that every element of either set is in a unique chain, and
that each chain is of one of four kinds: (i) those that repeat after some
number of terms (ii) those that continue infinitely in both directions without
repeating (iii) those that continue infinitely to the right but stop on the left
at some element of S , and (iv) those that continue infinitely to the right but
stop on the left at some element of T .
(e) Show that for any chain the function below is a correspondence between
the chain’s elements from S and its elements from T .
h(s) = f (s) – if s is in a sequence of type (i), (ii), or (iii)
       g⁻¹(s) – if s is in a sequence of type (iv)
Section
II.4 Universality
We have seen a number of Turing machines: one whose behavior is that its output
is the successor of its input, one that interprets its input as two numbers and adds
them, etc. These are single-purpose devices, where to get different input-output
behavior we needed to get a new machine, that is, new hardware. This was what
we meant by saying that a good first take on Turing machines is that they are more
like a modern computer program than a modern computer.
The picture below shows programmers of an early electronic computer. They
are changing its behavior by changing its circuits, using the patch cords.
ENIAC, reconfigured by rewiring.
Imagine having a phone where to change from running a browser to taking a
call you must pull one chip and replace it with another. The patch cords are an
improvement over a soldering iron but are not a final answer.
Universal Turing machine A pattern in technology is for jobs done in hardware
to migrate to software. The classic example is weaving.
Weaving by hand, as the loom operator on the left is doing, is intricate and slow.
We can make a machine to reproduce her pattern. But what if we want a different
pattern; do we need another machine? In 1801 J Jacquard built a loom like the
one on the right, controlled by paper cards. Getting a different pattern does not
require a new loom, it only requires swapping cards.
Turing introduced the analog for computing devices. He produced a single
Turing machine that we can give the instructions: “Consider the following Turing
machine. Have the same output behavior as this machine would on receiving the
following input.” We don’t need infinitely many different machines, we just need
this one, and it can be made to have any desired computable behavior.
“Have the same output behavior” means that if the specified machine halts on
that input then the universal machine halts and gives the same output, and if the
specified machine does not halt on that input then the universal machine also does
not halt.
Before we state the theorem, we will first address a question. This
machine may seem to present a chicken and egg problem: how can
we give a Turing machine as input to a Turing machine? In particular,
since the universal machine is itself a Turing machine, the theorem
seems to allow the possibility of giving it to itself — won’t feeding a
machine to itself lead to infinite regress?
(Marginal figure: an ouroboros, a snake swallowing its own tail.)
We run Turing machines by loading symbols on the tape and pressing Start. So
we don't feed a machine to itself — instead, it inputs symbols. True, we can feed
a universal machine a pair e, x where e is the index of the universal machine,
and thus is computationally equivalent to that machine's source. But even so, the universe won't
collapse — we can absolutely use a text editor to edit the bits that are
its own source, or give a compiler a source code listing for itself. Similarly, we
can feed a universal machine its own number. Certainly, lots of interesting things
happen as a result, but the point is that there is no inherent impossibility.
4.1 Theorem (Turing, 1936) There is a Turing machine that, when given the input
e, x , has the same output behavior as does Pe on input x .
This is a Universal Turing Machine.† Observe that, in some ways, universal
machines are familiar from our everyday computing.

(Marginal flowchart: Start → Read e, x → Simulate Pe on input x → Print result → End.)

Consider your computer and its operating system, along with some program that
will run on it.‡ Think of the program as like some Turing machine, Pe . It may
take input, a string of 0's and 1's, that we can interpret as the number x . When
asked, the operating system loads the program into memory and then feeds the
input to that program. That is, when asked, the operating system arranges that
your computer will behave like the stored program, machine e with input x .
Universal Turing machines change their behavior in software, as does an
operating system. No patch cords.
Another everyday computer experience that is like a universal machine is an
interpreter. Below is a session with a Scheme interpreter. After the interpreter’s
command line invocation, in line 1 the system gets the source of a program that
takes in i and sums the first i numbers. In line 2 the interpreter runs that source
with the input i = 4 and returns the output of 10.
$ csi
CHICKEN
(c) 2008-2013, The Chicken Team
(c) 2000-2007, Felix L. Winkelmann
Version 4.8.0.5 (stability/4.8.0) (rev 5bd53ac)
linux-unix-gnu-x86-64 [ 64bit manyargs dload ptables ]
compiled 2013-10-03 on aeryn.xorinia.dim (Darwin)
#;1> (define sum
(lambda (i tot)
(if (= i 0)
tot
(sum (- i 1) (+ tot i)))))
#;2> (sum 4 0)
10
#;3> ((lambda (i) (if (= i 0) 1 0)) 1)
0
Illustrating this even more directly, in line 3 the interpreter gets as a single
expression both the source of a routine (it is shown highlighted) and the input,
and it returns the result of applying the source to the input. That is, like the loom’s
punched cards, our mechanism allows us to swap behaviors in and out, at will.
The most direct example of our everyday experience with computing systems
that act as universal machines is a programming language’s eval statement.
#;1> (define (utm s)
(eval s (scheme-report-environment 5)))
#;2> (define TEST '(lambda (i) (if (= i 0) 1 0)))
#;3> TEST
(lambda (i) (if (= i 0) 1 0))
† Another often-used way to define a Universal Turing machine is to have it take the single-number
input cantor(e, x ). ‡ The figure is a flow chart, which gives a high level outline of a routine, here of an
operating system or of a Universal Turing machine. We use three types of boxes. Rectangles are for the
ordinary flow of control. Round corner boxes are for Start and End. Diamond boxes, which appear in
later flow charts, are for decisions, if statements.
#;4> ((utm TEST) 5)
0
#;5> ((utm TEST) 0)
1
In line 1 the utm is defined to evaluate its input (the scheme-report-environment
ensures that the routines defined in the specification for Scheme 5 are available to
eval).† In line 2 the (lambda (i) (if (= i 0) 1 0)) is quoted so that Scheme
will keep it as an unevaluated expression and not execute it. Lines 4 and 5 have that expression fed to utm,
which runs eval on it, so the first entry in the list is now a routine. This routine
acts on the argument, here 5 or 0, producing the output.
To finish, to justify Theorem 4.1, we have already exhibited what amounts to a
Universal Turing machine. In the Turing Machine simulator section on page 37 we
gave code that reads an arbitrary Turing machine from a file, and then simulates
it. The code is in Scheme but by Church’s Thesis we could write this as a Turing
machine. (For a Universal Turing machine that meets the criteria, we must not
input the arbitrary Turing machine from a file, but instead we input its index,
which we then convert to a Turing machine. But we’ve already argued that we can
write a routine to go from the index to the Turing machine.)
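To make that concrete, here is a minimal simulator sketch in Python. It assumes the book's four-tuple instruction format, where the third entry is either a symbol to write or L or R, and the machine halts when no instruction applies; the unary successor machine at the end is our own toy example.

def run(machine, tape, state=0, pos=0, blank='B'):
    table = { (q, s): (act, q2) for (q, s, act, q2) in machine }
    cells = dict(enumerate(tape))  # sparse tape
    while (state, cells.get(pos, blank)) in table:
        act, state = table[(state, cells.get(pos, blank))]
        if act == 'L':
            pos -= 1
        elif act == 'R':
            pos += 1
        else:
            cells[pos] = act  # write a symbol
    return cells

succ = { (0, '1', 'R', 0), (0, 'B', '1', 1) }  # unary successor
print(run(succ, '111'))  # {0: '1', 1: '1', 2: '1', 3: '1'}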
Uniformity Consider this job: given a real number r ∈ R, write a program to
produce its digits. More precisely, the job is to produce a family of machines, a Pr
for each r ∈ R, such that when given n ∈ N as input, Pr returns the n -th decimal
place of r (for n = 0, it returns the integer to the left of the decimal point).
We know that there is no such family because there are countably many Turing
machines but uncountably many real numbers. But why can’t we do it? One of the
enjoyable things about coding is the feeling of being able to get the machine to
do anything that we like — what’s stopping us from producing whatever digits we
like?
There are indeed some real numbers for which there is such a program. One
is r = 1/4. For a more generic number, say, some r = 0.703 ... , we might
momentarily imagine brute-forcing it.
read n
if n==0:
print 0
elif n==1:
print 7
elif n==2:
print 0
...
But that’s silly. We can have if .. elif .. branches for a few cases but because
programs have finite length, code must handle all but finitely many n ’s uniformly.
There must be a branch that handles infinitely many inputs (there may be a finite

Writing a program that allows general users to evaluate arbitrary code is powerful but not safe,
especially if these users just surf in from the Internet. Restricting which commands the user can
evaluate, known as sandboxing, forms part of being careful with that power. For us, however, the
software engineering issues are not relevant.
Section 4. Universality 87

number of such branches), and all except for finitely many inputs must be handled
on such a branch.
Thus, the fact that Turing machines have only finitely many instructions imposes
a requirement of uniformity. What this machine does on 1 is unconnected to what
it does on other inputs
read n
if n==1:
print 42
else:
print 2*n
but in any program there are only a finite number of different cases.
4.2 Example Associating in this way the idea that ‘something is computable’ with ‘it is
uniformly computable’ has some surprising consequences. Consider the problem of
producing a program that inputs a number n and decides whether somewhere in the
decimal expansion of π = 3.14159 ... there is a length n sequence of consecutive
nines.
The answer: there are two possibilities. Either for all n such a sequence exists
or else there is some number n 0 where a sequence of 9’s exists for lengths less
than n 0 and no such sequence exists when n ≥ n 0 . Therefore the problem is solved
by one of these two programs. However, we don’t know which one.
read n                     read n
print 1                    if n < n0:
                               print 1
                           else:
                               print 0
One aspect that is surprising is that neither of the two have anything to do with π .
Also surprising, and perhaps unsettling, is that we have shown that the problem is
solvable without showing how to solve it. That is, there is a difference between
showing that this function is computable
f (n) = 1 – if there are n consecutive 9's in π
        0 – otherwise
and possessing an algorithm to compute it. This observation shows that the idea
"something is computable if you can write a program for it" is naive or, at least,
doesn’t go into enough detail to make the subtleties clear.
In contrast, consider a subroutine that inputs i ∈ N and outputs π ’s i -th decimal
place. With it, we can write a program that takes in n and looks through π for n
consecutive 9’s by searching the digits. This approach is constructive in that we
are constructing the answer, not just saying that it exists. It is also uniform in the
sense that we could modify it to take other subroutines as input and thus look for
strings of 9’s in other numbers. However, this approach has the disadvantage that
if n 0 is such that for n ≥ n 0 never does π have n consecutive 9’s then this program
will just search forever, without printing 0.
Parametrization Universality says that there is a Turing machine that takes in
inputs e and x and returns the same value as we would get by running Pe on
input x (including not halting, if that machine does not halt). That is, there is a
computable function ϕ : N² → N such that ϕ(e, x) = ϕ e (x) if ϕ e (x)↓ and ϕ(e, x)↑
if ϕ e (x)↑.
There, the letter e travels from the function’s argument to an index. We now
generalize.
Start with a program that takes two inputs such as this one.
(define (P x y)
(+ x y))

Freeze the first argument, that is, lock x = a for some a ∈ N. The result is a
one-input program. This shows what happens when we freeze x at a = 7.
(define (P_7 y)
(P 7 y))

And here is the result for a = 8.
(define (P_8 y)
(P 8 y))

This is partial application because we are not freezing all of the input variables.
Instead, we are parametrizing the variable x to get a family of functions P 0 , P 1 , . . .
Obviously the programs in the family are related to the starting one. Denoting
the function computed by the starting program P as ψ (x, y) = x + y , partial
application gives a family of programs and functions: ψ 0 (y) = y , ψ 1 (y) = 1 + y ,
ψ 2 (y) = 2 + y , . . . The next result is that from the index of the starting program
or function, and from the values that are frozen, we can effectively compute the
family members.
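In a language with first-class functions, freezing arguments is a library call. A sketch in Python with functools.partial (our illustration; the theorem below is about doing the same thing effectively with Turing machine indices):

from functools import partial

def P(x, y):
    return x + y

P_7 = partial(P, 7)  # freeze x = 7
print(P_7(3))        # 10

family = [partial(P, a) for a in range(3)]  # psi_0, psi_1, psi_2
print([psi(5) for psi in family])           # [5, 6, 7]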
4.3 Theorem (s-m-n theorem, or Parameter theorem) For every m, n ∈ N there is
a computable total function sm,n : N¹⁺ᵐ → N such that for the (m + n)-ary function
ϕ e (x 0 , ... xm−1 , xm , ... xm+n−1 ), freezing the initial m variables at a 0 , ... am−1 ∈ N
gives an n -ary function equal to ϕ s(e,a0, ...am−1 ) (xm , ... xm+n−1 ).
The function ϕ e (x 0 , ... xm−1 , xm , ... xm+n−1 ) could be partial, that is, it could be
that the Turing machine Pe fails to halt on some inputs x 0 , ... xm−1 , xm , ... xm+n−1 .
Proof We will produce the function s to satisfy three requirements: it must be
effective, it must input an index e and an m -tuple a 0 , ... am−1 , and it must output
the index of a machine P̂ that, when given the input xm , ... xm+n−1 , will return the
value ϕ e (a 0 , ... am−1 , xm , ... xm+n−1 ), or diverge if that function diverges.
The idea is that the machine that computes s will construct the instructions
for P̂ . We can get from the instruction set to the index using Cantor’s encoding, so
with that we will be done.
Below are the flowcharts, first for the machine that computes s and then for P̂ .

Machine computing s:
Start → Read e, a0 , ... , am−1 → Create instructions for P̂ → Return index of that instruction set → End

Machine P̂ :
Start → Move left a0 + · · · + am−1 + m cells → Put a0 , ... , am−1 on tape, separated by blanks → Move I/O head to start of a0 → Simulate Pe → End
Recall that we are being flexible about the convention for input and output
representations for Turing machines but to be precise in this argument we assume
that input is encoded in unary, that multiple inputs are separated with a single
blank, and that when the machine is started the head should be under the input’s
left-most 1.
With that, we construct the machine P̂ so that the first thing it does is not read
its inputs xm , ... xm+n−1 . Instead, P̂ first moves left and puts a 0 , ... am−1 on the
tape, in unary and separated by blanks, and with a blank between am−1 and xm .
Then, using universality, P̂ simulates Turing machine Pe , and lets it run on that
input list.
In the notation sm,n , the subscript m is the number of inputs being frozen
while n is the number of inputs left free. As the prior example suggests, they can
sometimes be a bother and we usually omit them.
4.4 Example Consider the two-input routine sketched by this flowchart.
Start → Read x , y → Print x · y → End    (∗)
By Church’s Thesis there is a Turing machine that fills in the sketch, and computes
the function ψ (x, y) = x · y . let that machine have index e . We can use the s-m-n
theorem to freeze the value of x to 0. On the left below is the flowchart sketching
the machine Ps1, 1 (e, 0) . It computes the function ϕ s1, 1 (e, 0) (y) = 0; for example,
ϕ s1, 1 (e, 0) (5) = 0.
Start → Read y → Print 0 · y → End
Start → Read y → Print 1 · y → End
Start → Read y → Print 2 · y → End
Similarly the other two flowcharts summarize Ps1, 1 (e, 1) and Ps1, 1 (e, 2) , which
freeze the value of x at 1 and 2. The second machine computes
ϕ s1, 1 (e, 1) (y) = y , so for instance ϕ s1, 1 (e, 1) (5) = 5. The third machine
computes ϕ s1, 1 (e, 2) (y) = 2y , and an example is ϕ s1, 1 (e, 2) (5) = 10.
In general, this is the flowchart for Ps1, 1 (e,x ) .
Start → Read y → Print x · y → End    (∗∗)
Compare this to the flowchart in (∗) above. The difference is that this machine
does not read x . Rather, as in the three charts above, x is hard-coded into the
program body. That is, Ps1, 1 (e,x ) is a family of Turing machines, the first three of
which are in the prior paragraph. This family is parametrized by x , and the indices
are uniformly computable from e and x , using the function s .
The s-m-n Theorem says that we can hard code the values of parameters into
the machine’s source. But it says more. It also says that the resulting family of
functions is uniformly computable; there is a single computable function, s , going
from the index e and the parameter value x to the index of the result in (∗∗). So,
the s-m-n Theorem is about uniformity.
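In everyday programming terms, s builds new source with the parameter hard-coded. A toy sketch in Python, treating source text as the index (the names s_1_1 and P_hat are our own):

def s_1_1(source, a):
    # from source defining P(x, y), produce source for P_hat(y) = P(a, y)
    return source + f"\ndef P_hat(y):\n    return P({a}, y)\n"

two_input = "def P(x, y):\n    return x * y\n"
env = {}
exec(s_1_1(two_input, 2), env)  # hard-code x = 2
print(env['P_hat'](5))          # 10, matching Example 4.4's phi_s(e,2)(5) = 10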
II.4 Exercises
4.5 Your friend asks, “What can a Universal Turing machine do that a regular
Turing machine cannot?” Help them out.
✓ 4.6 Has anyone ever built a Universal Turing machine, or a machine equivalent to
one?
4.7 Can a Universal Turing machine simulate another Universal Turing machine,
or for that matter can it simulate itself?
✓ 4.8 Your class has a jerk who keeps throwing out pronouncements that the prof
must patiently correct. This time it's, "Universal Turing machines make no sense.
How could a machine simulate another machine that has more states? Obviously
it can’t.” Clue this chucklehead in.
4.9 Is there more than one Universal Turing machine?
4.10 What happens if we feed a Universal Turing machine to itself? For instance,
where the index e 0 is such that ϕ e0 (e, x) = ϕ e (x) for all x , what is the value of
ϕ e0 (e 0 , 5)?
4.11 Consider the function f (x 0 , x 1 ) = 3x 0 + x 0 · x 1 .
(a) Freeze x 0 to have the value 4. What is the resulting one-variable function?
(b) Freeze x 0 at 5. What is the resulting one-variable function?
(c) Freeze x 1 to be 0. What is the resulting function?
4.12 Consider f (x 0 , x 1 , x 2 ) = x 0 + 2x 1 + 3x 2 .
(a) Freeze x 0 to have the value 1. What is the resulting two-variable function?
(b) What two-variable function results from fixing x 0 to be 2?
(c) Let a be a natural number. What two-variable function results from fixing x 0
to be a ?
(d) Freeze x 0 at 5 and x 1 at 3. What is the resulting one-variable function?
(e) What one-variable function results from fixing x 0 to be a and x 1 to be b , for
a, b ∈ N?
✓ 4.13 Suppose that the Turing machine sketched by this flowchart has index e .
Start → Read x 0 , x 1 → Print x 0 + x 1 → End
(a) Describe the function ϕ s1, 1 (e, 1) .
(b) What are the values of ϕ s1, 1 (e, 1) (0), ϕ s1, 1 (e, 1) (1), and ϕ s1, 1 (e, 1) (2)?
(c) Describe the function ϕ s1, 1 (e, 0) .
(d) What are the values of ϕ s1, 1 (e, 0) (0), ϕ s1, 1 (e, 0) (1), and ϕ s1, 1 (e, 0) (2)?
4.14 Let the Turing machine sketched by this flowchart have index e .
Start → Read x 0 , x 1 , x 2 → Print x 0 + x 1 · x 2 → End
(a) Describe the function ϕ s1, 2 (e, 1) .
(b) Find ϕ s1, 2 (e, 1) (0, 1), ϕ s1, 2 (e, 1) (1, 0), and ϕ s1, 2 (e, 1) (2, 3)
(c) Describe the function ϕ s2, 1 (e, 1, 2) .
(d) Find ϕ s2, 1 (e, 1, 2) (0), ϕ s2, 1 (e, 1, 2) (1), and ϕ s2, 1 (e, 1, 2) (2).
✓ 4.15 Suppose that the Turing machine sketched by this flowchart has index e .
Start → Read x 0 , x 1 → x 0 > 1? If N: Print x 1 , then End. If Y: Infinite loop.

(a) Describe ϕ s1, 1 (e, 0) . (b) What is ϕ s1, 1 (e, 0) (5)? (c) Describe ϕ s1, 1 (e, 1) . (d) What
is ϕ s1, 1 (e, 1) (5)? (e) Describe ϕ s1, 1 (e, 2) . (f) What is ϕ s1, 1 (e, 2) (5)?
✓ 4.16 Suppose that the Turing machine sketched by this flowchart has index e .
Start → Read x 0 , x 1 , y → x 0 even? If Y: Print x 1 · y . If N: Print x 1 + y . Then End.
We will describe the family of functions parametrized by the arguments x 0 and x 1 .
(a) Use Theorem 4.3, the s-m-n theorem, to fix x 0 = 0 and x 1 = 3. Describe
ϕ s(e, 0, 3) . What is ϕ s(e, 0, 3) (5)?
(b) Use the s-m-n theorem to fix x 0 = 1. Describe ϕ s(e, 1, 3) . What is ϕ s(e, 1, 3) (5)?
(c) Describe ϕ s(e,a,b) .
✓ 4.17 (a) Argue that the function ψ : N² → N given by ψ (x, y) = 3x + y is
computable. (b) Show that there is a family of functions ψn parameterized by n
such that ψn (y) = 3n + y . Hint: take e ∈ N such that ψ (x, y) = ϕ e (x, y), and
apply the s-m-n theorem.
✓ 4.18 Show that there is a total computable function g : N → N such that Turing
machine Pg(n) computes the function y ↦ y + n² .
✓ 4.19 Show that there is a total computable function g : N² → N such that Turing
machine Pg(m,b) computes x ↦ mx + b .
✓ 4.20 Suppose that e 0 is such that ϕ e0 is a Universal Turing machine, in that if
given the input cantor(e, x) then it returns the same value as ϕ e (x). Suppose also
that e 1 is such that ϕ e1 (x) = 4x for all x ∈ N. Determine, if possible, the value
of these. If it is not possible, briefly describe why not. (a) ϕ e0 (cantor(e 1 , 5))
(b) ϕ e1 (cantor(e 0 , 5)) (c) ϕ e0 (cantor(e 0 , cantor(e 1 , 5)))
4.21 Suppose that e 0 is such that ϕ e0 (cantor(e, x)) returns the same value
as ϕ e (x) (or does not converge if that function does not converge). Sup-
pose also that ϕ e1 (x) = x + 2 and that ϕ e2 (x) = x 2 , for all x ∈ N. If
possible determine the value of these (if it is not possible, say why not).
(a) ϕ e0 (cantor(e 1 , 4)) (b) ϕ e0 (cantor(4, e 1 )) (c) ϕ e1 (cantor(e 0 , cantor(e 2 , 3)))
(d) ϕ e0 (cantor(e 0 , cantor(e 0 , 4)))
Section
II.5 The Halting problem
We’ve showed that there are functions that are not mechanically computable. We
gave a counting argument, that there are countably many Turing machines but
uncountably many functions and so there are functions with no associated machine.
While knowing what’s true is great, even better is to exhibit a specific function that
is unsolvable. We will now do that.
Definition The natural approach to producing such a function is to go through
Cantor’s Theorem and effectivize it, to turn the proof into a construction.
Here is an illustrative table adapted from the discussion of Cantor’s Theorem
on page 77. Imagine that this table’s rows are the computable functions and its
columns are the inputs. For instance, this table lists ϕ 2 (3) = 5.
Input x:   0  1  2  3  4  5  6  ...
ϕ0         3  1  2  7  7  0  4  ...
ϕ1         0  5  0  0  0  0  0  ...
ϕ2         1  4  1  5  9  2  6  ...
ϕ3         9  1  9  1  9  1  9  ...
ϕ4         1  0  1  0  0  1  0  ...
ϕ5         6  2  5  5  4  1  8  ...
⋮

Start → Read e → Compute table entry for index e , input e → Print result + 1 → End
Diagonalizing means considering the machine sketched in the flowchart. It moves down the
array’s diagonal, changing the 3, changing the 5, etc., so that when the input is 0
then the output is 4, when the input is 1 then the output is 6, etc. It appears that
in the usual diagonalization way, this machine’s output does not equal any of the
table’s rows.
However, that’s a puzzle, an apparent contradiction. The flowchart outlines an
effective procedure — we can implement this by using a Universal Turing machine,
so its output should be one of the rows.
What’s the puzzle’s resolution? The program’s first, second, fourth, and fifth
boxes are trivial, so the issue must involve getting through the box in the middle.
The answer is that there must be an e ∈ N so that ϕ e (e)↑, and for that index the
Turing machine sketched in the flowchart never gets through the middle box and
never prints the apparently contradictory output. That is, to avoid a contradiction
the above table must contain ↑’s.
So we have an important insight: the fact that some computations fail to halt
on some inputs is central to the nature of computation.

5.1 Definition K = { e ∈ N | ϕ e (e)↓, that is, Turing machine Pe halts on input e }

5.2 Problem (Halting problem) Given e ∈ N, determine whether ϕ e (e)↓, that is,
whether Turing machine Pe halts on input e .
For any e ∈ N, obviously either ϕ e (e) ↓ or ϕ e (e) ↑. The Halting problem is
whether we can mechanically settle which numbers are members of the set K .

5.3 Theorem (Unsolvability of the Halting problem) The Halting problem is
mechanically unsolvable.
Proof Assume otherwise, that there is a Turing machine whose behavior is this.

    K(e) = halt_decider(e) = 1 – if ϕ e (e)↓
                             0 – if ϕ e (e)↑

Then the function below is also mechanically computable. The flowchart illustrates
how f is constructed; it uses the above function in its decision box.

    f (e) = 42 – if ϕ e (e)↑
            ↑  – if ϕ e (e)↓

[Flowchart: Start → Read e → K (e) = 0? Yes: Print 42; No: Infinite loop → End]

(In f ’s top case the output value doesn’t matter; all that matters is that f converges.)
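As a sketch in Scheme: if the hypothetical halt-decider procedure of the assumption existed then so would this f.

    ;; Sketch only: halt-decider is the assumed procedure from the proof.
    (define (f e)
      (if (= (halt-decider e) 0)
          42                      ; phi_e(e) diverges, so converge
          (let loop () (loop))))  ; phi_e(e) converges, so diverge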
Since this function is mechanically computable, it has an index. Let that index
be ê , so that f (x) = ϕ ê (x) for all inputs x .
Consider f (ê) = ϕ ê (ê), that is, feed the machine the input ê . If it diverges then
the first clause in the definition of f means that f (ê)↓, which contradicts divergence.
If it converges then f ’s second clause means that f (ê)↑, also impossible. Since
assuming that halt_decider is mechanically computable leads to a contradiction,
that function is not mechanically computable.
With Church’s Thesis in mind we will say that a problem is unsolvable if it is
mechanically unsolvable, that is, if no Turing machine computes that task. If the
problem is to answer ‘yes’ or ‘no’ questions, so that it is the problem of determining
membership in a set, then we will say that the set is undecidable.

Discussion The fact that the Halting Problem is unsolvable does not mean that
we can never tell whether a program halts. This program obviously adds 1 to its
input and then halts, for every input.
#;1> (define (prompt/read prompt)
---> (display prompt)
---> (read-line))
#;2>
#;2> (+ 1
---> (string->number (prompt/read "Enter n")))
Enter n---> 4
5

Nor does the unsolvability of the Halting problem mean that we cannot tell if a
program does not halt. Consider this one.
#;1> (define (f x)
---> (+ 1 (f x)))

This obviously does not halt; once started, it just keeps going.
#;2> (f 0)
^C
Call history:

<eval> [f] (+ 1 (f x))


<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
<eval> [f] (f x)
<eval> [f] (+ 1 (f x))
readline.scm:392: print-call-chain <--

*** user interrupt ***

Instead, the unsolvability of the Halting Problem says that there is no single
program that, for all e , correctly decides in a finite time whether Pe halts on
input e .
That has the qualifier ‘finite time’ because we could perfectly well write source
code to read an input e , simulate Pe on input e , and then print some nominal
output such as 42, but if Pe on input e fails to halt then we would not get the
output in a finite time.
The ‘single program’ qualifier is there because for any index e , either Pe halts
on e or else it does not. That is, for any e one of these two programs gives the right
answer.

read e read e
print 0 print 1

Of course, guessing which one applies is not what we had in mind. We had in mind
a program, an effective and uniform procedure, that inputs e and outputs the right
answer.
Thus, the unsolvability of the Halting Problem is about the non-existence of a
single program that works across all indices. It speaks to uniformity, or rather, the
impossibility of uniformity.

Significance A beginning programming class could leave the impression that if
a program doesn’t halt then it just has a bug, something fixable. So the Halting
problem could seem to be not very interesting. That impression is wrong.
Imagine a utility for programmers, always_halt, to patch non-halting pro-
grams. After writing a program P we run it through the utility, which modifies
the source so that for any input where P fails to halt, the modified program will

halt (and output 0), but the utility does not change any outputs where P does
halt. That would give rise to a list of total functions like the one on page 93, and
diagonalization gives a contradiction.
Thus, halting, or rather failure to halt, is inherent in the nature of computation.
In any general computational scheme there must be some computations that halt
on all inputs, some that halt on no inputs, and some that halt on some inputs but
not on others.
That alone is enough to justify study of the Halting problem but we will give a
second reason. If halt_decider were a computable function then we could solve
many problems that we currently don’t know how to solve.
For instance, a natural number is perfect if it is the sum of its proper positive
divisors. Thus 6 is perfect because 6 = 1 + 2 + 3. Similarly, 28 = 1 + 2 + 4 + 7 + 14
is perfect. These have been studied since Euclid and today we understand the
form of all even perfect numbers. But no one knows if there are any odd perfect
numbers.
With a solution to the Halting Problem we could settle this question. The program
sketched below searches for an odd perfect number.† If it finds one then it halts.
If not then it does not halt. So if we had a halt_decider and we gave it the index
of this program, then that would settle whether there exist any odd perfect
numbers. There are many open questions involving an unbounded search that
would fall to this approach. (Just to name one more: no one knows if there is any
n > 4 such that 2^(2^n) + 1 is prime. We could answer the question by writing P
to search for such an n , and give the index of P to halt_decider.)

[Flowchart: Start → Read x → i = 0 → 2i + 1 perfect? No: i = i + 1 and repeat; Yes: Print 1 → End]

† This program takes an input x but ignores it; in this book we prefer to have the machines that we use
take an input and give an output.
Before moving on, note that unbounded search is a theme in our studies. We
have seen it earlier, in defining general recursion. And, it is at the heart of the
Halting problem since the natural way to test whether ϕ e (e)↓ is to run a brute force
computation, an unbounded search for a stage at which the computation halts.
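Here is a sketch of that odd-perfect searcher in Scheme, the language of the earlier transcripts; it is only an illustration of such an unbounded search, not part of any library.

    ;; Sum of the proper positive divisors of n.
    (define (proper-divisor-sum n)
      (let loop ((d 1) (sum 0))
        (cond ((= d n) sum)
              ((zero? (modulo n d)) (loop (+ d 1) (+ sum d)))
              (else (loop (+ d 1) sum)))))

    ;; Halts, returning 1, exactly if some odd number 2i+1 is perfect.
    (define (search i)
      (let ((n (+ (* 2 i) 1)))
        (if (= (proper-divisor-sum n) n)
            1
            (search (+ i 1)))))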
General unsolvability We have named one job, the Halting problem, that no
mechanical computer can do. With that one in hand, we are able to show that a
wide class of jobs cannot be done. That is, the Halting problem is part of a larger
unsolvability phenomenon.
5.4 Example Consider the following problem: we want to know if a given Turing
machine halts on the input 3. That is, given x , does ϕ x (3)↓? Of course, the nature
of the material we are studying is that we want to answer this question with a
computation.
    halts_on_three_decider(x) = 1 – if ϕ x (3)↓
                                0 – otherwise

We will show that if halts_on_three_decider were a computable function then
we could compute the solution of the Halting problem. That’s impossible, so we
will then know that halts_on_three_decider is also not effectively computable.
The plan is to create a scheme where being able to determine whether a
machine halts on 3 allows us to settle Halting problem questions. Consider the
machine sketched on the right below. It reads the input y , ignores it, and gives
a nominal output. The action is in the middle box, where it simulates running
Px on input x . If that simulation halts then the machine as a whole halts. If that
simulation does not halt then the machine as a whole does not halt. Thus, the
machine on the right halts on input y = 3 if and only if Px halts on x . So, using
this flowchart, we can leverage being able to answer questions about halting on 3
to answer questions about whether Px halts on x .

[Flowcharts, left: Start → Read x, y → Run Px on x → Print 42 → End; right: Start → Read y → Run Px on x → Print 42 → End]

With that motivation we are ready for the argument. For contradiction,
assume that halts_on_three_decider is mechanically computable. Consider this
function.

    ψ (x, y) = 42 – if ϕ x (x)↓
               ↑  – otherwise

Observe that ψ is mechanically computable, because it is computed by the flowchart
above on the left. So by Church’s Thesis there is a Turing machine whose input-
output behavior is ψ . That Turing machine has some index, e , meaning that
ψ = ϕ e .
Use the s-m-n theorem to parametrize x , giving ϕ s(e,x ) . This is a family of
functions, one for x = 0, one for x = 1, etc. Below is the family of associated
machines. Notice in particular the one on the right. It is a repeat of the one on the
right above. Notice also that it has a ‘Read y ’ but no ‘Read x ’. For each of these
machines, the value used in the middle box is hard-coded into its source.

[Flowcharts, one per x: Start → Read y → Run P0 on 0 → Print 42 → End; Start → Read y → Run P1 on 1 → Print 42 → End; ... ; Start → Read y → Run Px on x → Print 42 → End; ...]

As planned, for all x ∈ N we have this.

ϕ x (x)↓ if and only if halts_on_three_decider( s(e, x) ) = 1 (*)

The function s is computable so the supposition that halts_on_three_decider
is also computable gives that the right side is effectively computable, which
in turn implies that the Halting problem is effectively solvable, which it isn’t.
This contradiction means that halts_on_three_decider is not mechanically
computable.
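A sketch of the reduction in Scheme may help. It relies on two hypothetical helpers: (simulate e x), which runs Pe on input x, and (index-of src), which converts program source to an index, playing the role of the computable s from the s-m-n theorem.

    ;; From x, build the source of a machine that ignores its input y,
    ;; simulates P_x on x, and then prints a nominal 42.
    (define (reduce x)
      (index-of `(lambda (y) (begin (simulate ,x ,x) 42))))
    ;; Then phi_x(x) converges iff (halts_on_three_decider (reduce x)) = 1.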
5.5 Remark We emphasize that s(e, x) gives a family of infinitely many machines and
computable functions to make the point that while e is constant (it is the index of
the machine that computes ψ ), x varies. We need that for (∗).
5.6 Example We will show that this function is not mechanically computable: given x ,
determine whether Px outputs a 7 for any input.
    outputs_seven_decider(x) = 1 – if ϕ x (y) = 7 for some y
                               0 – otherwise

Assume otherwise, that outputs_seven_decider is computable. Consider this.

    ψ (x, y) = 7 – if ϕ x (x)↓
               ↑ – otherwise

The flowchart on the left below outlines how to compute ψ . Because it is intuitively
mechanically computable, Church’s Thesis says that there is a Turing machine
whose input-output behavior is ψ . That Turing machine has an index, e , so
that ψ = ϕ e .

[Flowcharts, left: Start → Read x, y → Run Px on x → Print 7 → End; right: Start → Read y → Run Px on x → Print 7 → End]

The s-m-n theorem gives a family of functions ϕ s(e,x ) parametrized by x . On the
right is a flowchart for an associated machine. As in the prior example, note that
this is a family of infinitely many different machines, one with x = 0, one with
x = 1, etc. Each machine in the family has its x hard-coded in its source.
Then, ϕ x (x)↓ if and only if outputs_seven_decider(s(e, x)) = 1. If, as we sup-
posed, outputs_seven_decider is computable then the composition of two com-
putable functions, outputs_seven_decider ◦ s , is computable, so the Halting prob-
lem is computably solvable, which is not right. Therefore outputs_seven_decider
is not computable.
5.7 Example We next show that this problem is unsolvable: given x , determine
whether ϕ x doubles its input, that is, whether ϕ x (y) = 2y for all y .
We want to show that this function is not mechanically computable.
    doubler_decider(e) = 1 – if ϕ e (y) = 2y for all y
                         0 – otherwise

Assume that it is computable. This function

    ψ (x, y) = 2y – if ϕ x (x)↓
               ↑  – otherwise

is intuitively mechanically computable by the flowchart on the left below. So by
Church’s Thesis there is a Turing machine that computes it. It has some index, e .

[Flowcharts, left: Start → Read x, y → Run Px on x → Print 2y → End; right: Start → Read y → Run Px on x → Print 2y → End]

Apply the s-m-n theorem to get a family of functions ϕ s(e,x ) parametrized by x . The
machine Ps(e,x ) is sketched by the flowchart on the right. Then ϕ x (x)↓ if and only if
doubler_decider(s(e, x)) = 1. So the supposition that doubler_decider
is computable gives that the Halting problem is computable, which is wrong.
These examples show the Halting problem serving as a touchstone for unsolv-
ability. Often we show something is unsolvable by showing that if we could solve it
then we could solve the Halting problem. We say that the Halting problem reduces
to the given problem.†
† We use ‘reduces to’ in the same sense that we would in saying, “finding the roots of a polynomial
reduces to factoring that polynomial,” meaning that if we could factor then we could find the roots.

Before the next subsection, three comments. First, to reiterate, saying that a
problem is unsolvable means that it is unsolvable by a mechanism, that there is
no Turing machine that can compute the solution to the problem. There can be
functions that solve it but no computable function does.
The second is that Turing and Church, independently, used the
reasoning of this section to settle the Entscheidungsproblem. They showed that it is
an unsolvable problem.
The final point is that there are solvable problems that start with “Given an
index.” One is: given e , decide if one instruction in Pe is q 0 BLq 1 . But the unsolvable
problems above are about the behavior of the computed function — each is
about ϕ e rather than Pe . This echoes the opening of the first chapter, that we are
most interested in the input-output behavior of the machines.

II.5 Exercises
5.8 Someone in your class says, “I don’t get the point of the Halting problem; it
just seems totally not relevant. If you want programs to halt then just watch them
and when they exceed a set number of cycles, send a kill signal.” Give them a
clue.
5.9 True or false: there is no function that solves the Halting Problem; there is no
f such that f (e) = 1 if ϕ e (e)↓ and f (e) = 0 if ϕ e (e)↑.
✓ 5.10 Your study partner asks you, “The Turing machine P = {q 0 BBq 0 , q 0 11q 0 }
fails to halt for all inputs, that’s obvious. But these unsolvability results say that I
cannot know that. Why not?” Explain what they are missing.
5.11 You have a person in class who a lot of the time talks before thinking. They say,
“Hey, I can solve the Halting problem. For any given Turing machine there are a
finite number of states, right? And the tape alphabet is finite, right? So there
are only finitely many state and character pairs that can happen. As the machine
runs, just monitor it for a repeat pair. If we see one then declare that the machine
is looping.” What’s missing?
5.12 This is the Hailstone function.

    h(n) = 42        – if n = 0 or n = 1
           h(n/2)    – if n is even
           h(3n + 1) – else

The Collatz conjecture is that h halts on all n ∈ N. No one knows if it is true. Is it
an unsolvable problem to determine whether h halts on all input?
✓ 5.13 True or false?
(a) The problem of determining, given e , whether ϕ e (3)↓ is unsolvable because
no function halts_on_three_decider exists.
(b) The existence of unsolvable problems indicates weaknesses in the models of
computation, and we need stronger models.

5.14 A set is computable if its characteristic function is a computable function.
Consider the set whose single element is 1 if Mallory reached the summit of Everest,
and whose single element is 0 otherwise. Is that set computable?
✓ 5.15 Describe the family of computable functions that you get by using the s-m-n
Theorem to parametrize x in each function. Also give flowcharts sketching
the associated machines for x = 0, x = 1, and x = 2.
(a) f (x, y) = 3x + y
(b) f (x, y) = xy²
(c) f (x, y) = x – if x is odd; 0 – otherwise
5.16 Show that each of these is a solvable problem.
(a) Given an index x , determine whether Turing machine Px runs for at least 42
steps on input 3.
(b) Given an index x , determine whether Turing machine Px runs for at least 42
steps on input x .
(c) Given an index x , determine whether Turing machine Px runs for at least x
steps on input x .
The answer to all three is the same. Running a Turing machine for a fixed number
of steps is not a problem (and in the third item, the number of steps is fixed for a
given input).
For each of the problems from Exercise 5.17 to Exercise 5.23, show that it is unsolvable
by reducing the Halting problem to it.
✓ 5.17 Given an index x , determine if ϕ x is total, that is, if it converges on every
input.
✓ 5.18 Given an index x , decide if the Turing machine Px squares its input. That is,
decide if ϕ x maps y ↦ y².
5.19 Given x , determine if the function ϕ x returns the same value on two
consecutive inputs, so that ϕ x (y) = ϕ x (y + 1) for some y ∈ N.
5.20 Given an index x , determine whether ϕ x fails to converge on input 5.
5.21 Given an index, determine if the computable function with that index fails to
converge on all odd numbers.
5.22 Given an index e , decide if the function ϕ e computed by machine Pe is the
function x ↦ x + 1.
5.23 Given an index e , decide if the function ϕ e fails to converge on both inputs x
and 2x , for some x .
5.24 Fix integers a, b, c ∈ Z. Consider the problem of determining, given
cantor(x, y), whether ax + by = c . Is that problem solvable or unsolvable?

5.25 In some ways a more natural set than K = { x ∈ N | ϕ x (x)↓ } is K 0 =
{ ⟨e, x⟩ ∈ N² | ϕ e (x)↓ }. Use the fact that K is not computable to prove that K 0 is
not computable.
5.26 As stated, the Halting problem of determining membership in the set K =
{ x | ϕ x (x)↓ } cuts across all Turing machines.

(a) Produce a single Turing machine, P , such that the question of determining
membership in { y | ϕ(y)↓ } is undecidable.
(b) Fix a number y . Show that the question of whether P halts on y is decidable.
✓ 5.27 For each, if it is mechanically solvable then sketch a program to solve it. If it
is unsolvable then show that.
(a) Given e , determine the number of states in Pe .
(b) Given e , determine whether Pe halts when the input is the empty string.
(c) Given e , determine if Pe halts on input e within one hundred steps.
5.28 Is K infinite?
5.29 Show that for any Turing machine, the problem of determining whether it
halts on all input is solvable.
5.30 Goldbach’s conjecture is that every even natural number greater than
two is the sum of two prime numbers. It is one of the oldest and best-known
unsolved problems in mathematics. Show that if we could solve the Halting
problem then we could settle Goldbach’s conjecture.
5.31 If we could solve the Halting problem, then could we solve all problems?
5.32 Show that most problems are unsolvable by showing that there are uncount-
ably many functions f : N → N that are not computed by any Turing machine,
while the number of functions that are computable is countable.
5.33 Give an example of a computable function that is total, meaning that it
converges on all inputs, but whose range is not computable.
5.34 A set of bit strings is a decidable language if its characteristic function is
computable. Show that the collection of decidable languages is closed under
these operations.
(a) The union of two decidable languages is a decidable language.
(b) The intersection of two decidable languages is a decidable language.
(c) The complement of a decidable language is a decidable language.

Section
II.6 Rice’s Theorem
The intuition from the unsolvability examples is that we cannot mechanically
analyze the behavior of a mechanism. These two definitions make precise the word
‘behavior’.
6.1 Definition Two computable functions have the same behavior, ϕ e ≃ ϕ ê , if they
converge on the same inputs x ∈ N and, when they do converge, they have the
same outputs.†


† Strictly speaking we don’t need the symbol ≃. A function is a set of ordered pairs. So if ϕ e (0)↓
while ϕ e (1)↑, then the set ϕ e contains a pair starting with 0 but no pair with first entry 1. Thus for
partial functions, if they converge on the same inputs and when they do converge they have the same
outputs, then we can simply say that the two are equal, ϕ e = ϕ ê , as sets. We use ≃ as a reminder that
the functions may be partial.
‡ It is called an index set because it is a set of indices.

6.2 Definition A set I of natural numbers is an index set‡ when for all indices e, ê ∈
N, if e ∈ I and ϕ e ≃ ϕ ê then also ê ∈ I .

6.3 Example The set I = { e ∈ N | ϕ e (x) = 2x for all x } is an index set. Suppose that
e ∈ I and that ê ∈ N is such that ϕ e ≃ ϕ ê . Then the behavior of ϕ ê is also to double
its input: ϕ ê (x) = 2x for all x . Thus ê ∈ I also.

6.4 Example The set J = { e ∈ N | ϕ e (x) = 3x for all x , or ϕ e (x) = x³ for all x } is
an index set. For, suppose that e ∈ J and that ϕ e ≃ ϕ ê where ê ∈ N. Because
e ∈ J , either ϕ e (x) = 3x for all x or ϕ e (x) = x³ for all x . Because ϕ e ≃ ϕ ê we
know that either ϕ ê (x) = 3x for all x or ϕ ê (x) = x³ for all x , and thus ê ∈ J .

6.5 Example The set { e ∈ N | Pe contains an instruction starting with q 10 } is not an
index set. We can easily produce two Turing machines having the same behavior
where one machine contains such an instruction while the other does not.
6.6 Theorem (Rice’s theorem) Every index set that is not trivial (that is, not empty
and not all of N) is not computable.
Proof Let I be an index set. Choose an e 0 ∈ N so that ϕ e0 (y)↑ for all y . Then
either e 0 ∈ I or e 0 ∉ I . We shall show that in the second case I is not computable.
The first case is similar, and is Exercise 6.29.
So assume e 0 ∉ I . Since I is not empty there is an index e 1 ∈ I . Because I is
an index set, ϕ e0 ≄ ϕ e1 . Thus there is an input y such that ϕ e1 (y)↓.
Consider the flowchart on the left below. Note that e 1 is not an input; it is
hard-coded into the source. By Church’s Thesis there is a Turing machine with that
behavior, let it be Pe . Apply the s-m-n theorem to parametrize x , resulting in the
uniformly computable family of functions ϕ s(e,x ) , whose computation is outlined
on the right.

[Flowcharts, left: Start → Read x, y → Run Px on x → Run Pe1 on y → End; right: Start → Read y → Run Px on x → Run Pe1 on y → End]

We’ve constructed the machine sketched on the right so that if ϕ x (x)↑ then
ϕ s(e,x ) ≃ ϕ e0 and thus s(e, x) ∉ I . Further, if ϕ x (x)↓ then ϕ s(e,x ) ≃ ϕ e1 and thus
s(e, x) ∈ I . Therefore if I were mechanically computable, so that we could
effectively check whether s(e, x) ∈ I , then we could solve the Halting problem.
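The same sketch style as before shows the family of machines in this proof, again with the hypothetical Scheme helpers (simulate e x) and (index-of src); here e1 is the fixed index from the proof with ϕ e1 (y)↓ for some y.

    ;; If phi_x(x) diverges, the built machine computes the everywhere-divergent
    ;; function (same behavior as phi_e0); if it converges, it computes phi_e1.
    (define (rice-reduce x e1)
      (index-of `(lambda (y) (begin (simulate ,x ,x) (simulate ,e1 y)))))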


6.7 Example We use Rice’s Theorem to show that this problem is unsolvable: given e ,
decide if ϕ e (3)↓.
Consider the set I = { e ∈ N | ϕ e (3)↓ }. To apply Rice’s Theorem we must show
that this set is not empty, that it is not all of N, and that it is an index set. The
set I is not empty because we can write a Turing machine that acts as the identity
function ϕ(x) = x , and if e 0 is the index of that Turing machine then e 0 ∈ I . The
set I is not equal to N because, where e 1 is the index of a Turing machine that
never halts, we have that e 1 ∉ I .
To finish we will verify that I is an index set. Assume that e ∈ I and let ê ∈ N
be such that ϕ e ≃ ϕ ê . Then e ∈ I gives that ϕ e (3)↓ and ϕ e ≃ ϕ ê gives that ϕ ê (3)↓
also. Hence ê ∈ I , and I is an index set.
6.8 Example We can use Rice’s Theorem to show that this problem is unsolvable:
given e , decide if ϕ e (x) = 7 for some x .
We will show that I = { e ∈ N | ϕ e (x) = 7 for some x } is a nontrivial index set.
This set is not empty because, where e 0 is the index of a Turing machine that
acts as the identity function ϕ e0 (x) = x , we have that e 0 ∈ I . It is not all of N
because, where e 1 is the index of a Turing machine that never halts, e 1 ∉ I . So I
is nontrivial.
To show that I is an index set, assume that e ∈ I and let ê ∈ N be such that
ϕ e ≃ ϕ ê . By the first assumption, ϕ e (x 0 ) = 7 for some input x 0 . By the second, the
same input gives ϕ ê (x 0 ) = 7. Consequently, ê ∈ I .
6.9 Example This problem is unsolvable: determine, given an index e , whether ϕ e is
this.

    f (x) = 4     – if x is prime
            x + 1 – otherwise

Let I = { j ∈ N | ϕ j = f }. The set I is not empty because we can write a program
with this behavior, and so by Church’s Thesis there is a Turing machine with this
behavior, and its index is a member of I . Also, I ≠ N because there is a Turing
machine that fails to halt on any input, and its index is not a member of I .
To finish, we argue that I is an index set. So suppose that e ∈ I and that
ϕ e ≃ ϕ ê . Because e ∈ I we have that ϕ e (x) = f (x) for all inputs x . Because ϕ e ≃ ϕ ê
we have that ϕ e (x) = ϕ ê (x) for all x , and so ê is also a member of I . Hence, I is
an index set.
We close by reflecting on the significance of Rice’s Theorem.
This result addresses the properties of computable functions. It does not speak
to properties of machines that aren’t about input-output behaviors. For example,
the set of functions computed by C programs whose first character is ‘k’ is not an
index set. This brings us back to the declaration in the first paragraph of the first
chapter that we are more interested in what the machines do than in the details of
their internal construction.
At this chapter’s start we saw that unsolvable problems exist, although we used

a counting argument that did not give us natural examples. With the Halting
problem we saw that there are interesting unsolvable problems. Here the definition
of index set gave us a natural way to encapsulate a behavior of interest, and Rice’s
Theorem says that every nontrivial index set is unsolvable. So we’ve gone from
taking unsolvable problems as exotic, to taking them as things that genuinely do
come up, to taking them as occurring everywhere.
Of course, that’s an overstatement; we’ve all seen and written real-world
programs with interesting behaviors. Nonetheless, Rice’s Theorem is especially
significant for understanding what can be done mechanically.

II.6 Exercises
6.10 Your friend says, “According to Rice’s Theorem, everything is impossible.
Every property of a computer program is non-computable. But I do this supposedly
impossible stuff all the time!” Set them straight.

6.11 Is I = { e | Pe runs for at least 100 steps on input 5 } an index set?
For each of the problems from Exercise 6.12 to Exercise 6.18, show that it is unsolvable
by applying Rice’s theorem. (These repeat the problems from Exercise 5.17 to
Exercise 5.23.)
✓ 6.12 Given an index x , determine if ϕ x is total, that is, if it converges on every
input.
✓ 6.13 Given an index x , decide if the Turing machine Px squares its input. That is,
decide if ϕ x maps y ↦ y².
6.14 Given x , determine if the function ϕ x returns the same value on two
consecutive inputs, so that ϕ x (y) = ϕ x (y + 1) for some y ∈ N.
6.15 Given an index x , determine whether ϕ x fails to converge on input 5.
6.16 Given an index, determine if the computable function with that index fails
to converge on all odd numbers.
6.17 Given an index e , decide if the function ϕ e computed by machine Pe is
x ↦ x + 1.
6.18 Given an index e , decide if the function ϕ e fails to converge on both inputs x
and 2x , for some x .
✓ 6.19 Show that each of these is an unsolvable problem by applying Rice’s Theorem.
(a) The problem of determining if a function is total, that is, converges on every
input.
(b) The problem of determining if a function is partial, that is, fails to converge
on some input.
✓ 6.20 For each problem, fill in the blanks to prove that it is unsolvable.

We will show that I = { e ∈ N | (1) } is a nontrivial index set. Then Rice’s theorem
will give that the problem of determining membership in I is algorithmically unsolvable.
First we argue that I ≠ ∅. The sketch (2) is intuitively computable, so by Church’s
Thesis there is such a Turing machine. That machine’s index is an element of I .

Next we argue that I ≠ N. The sketch (3) is intuitively computable, so by Church’s
Thesis there is such a Turing machine. Its index is not an element of I .
To finish, we show that I is an index set. Suppose that e ∈ I and that ê is such that
ϕ e ≃ ϕ ê . Because e ∈ I , (4) . Because ϕ e ≃ ϕ ê , (5) . Thus, ê ∈ I . Consequently,
I is an index set.
(a) Given e , determine if Turing machine e halts on all inputs x that are multiples
of five.
(b) Given e , decide if Turing machine e ever outputs a seven.
6.21 Define that a Turing machine accepts a set of bit strings L ⊆ B∗ if that machine
inputs bit strings, and it halts on all inputs, and it outputs 1 if and only if the input
is a member of L. Show that each problem is unsolvable, using Rice’s Theorem.
(a) The problem of deciding, given x ∈ N, whether Px accepts an infinite language.
(b) The problem of deciding, given e ∈ N, whether Pe accepts the string 101.
6.22 Show that this problem is mechanically unsolvable: given e , determine if there
is an input x so that ϕ e (x)↓.
6.23 We say that a Turing machine has an unreachable state if for all inputs,
during the course of the computation the machine never enters that state. Show
that J = { e | Pe has an unreachable state } is not an index set.

6.24 Your classmate says, “Here is a problem that is about the behavior of machines
but is also solvable: given x , determine whether Px only halts on an empty input
tape. To solve this problem, give machine Px an empty input and see whether it
goes on.” Where are they mistaken?

6.25 Give a trivial index set: fill in the blank I = { e | Pe ______ } so that
the set I is empty.
6.26 Show that each of these is an index set.
(a) { e ∈ N | machine Pe halts on at least five inputs }
(b) { e ∈ N | the function ϕ e is one-to-one }
(c) { e ∈ N | the function ϕ e is either total or else ϕ e (3)↑ }
6.27 Index sets can seem abstract. Here is an alternate characterization. The
Padding Lemma on page 74 says that every computable function has infinitely
many indices. Thus, there are infinitely many indices for the doubling function
f (x) = 2x , infinitely many for the function that diverges on all inputs, etc. In the
rectangle below imagine the set of all integers and group them together when
they are indices of equal computable functions. The picture below shows such a
partition. Select a few parts, such as the ones shown shaded. Take their union.
That’s an index set.

[Picture: the natural numbers partitioned into parts of indices of equal computable functions, with a few parts shaded]

More formally stated, consider the relation ≃ between natural numbers given by
e ≃ ê if ϕ e ≃ ϕ ê . (a) Show that this is an equivalence relation. (b) Describe the
parts, the equivalence classes. (c) Show that each index set is the union of some
of the equivalence classes. Hint: show that if an index set contains one element
of a class then it contains them all.
6.28 Because being an index set is a property of a set, we naturally consider how
it interacts with set operations. (a) Show that the complement of an index set is
also an index set. (b) Show that the collection of index sets is closed under union.
(c) Is it closed under intersection? If so prove that and if not then give a
counterexample.
6.29 Do the e 0 ∈ I case in the proof of Rice’s Theorem, Theorem 6.6.

Section
II.7 Computably enumerable sets
To attack the Halting problem the natural thing is to start by simulating P0 on
input 0 for a single step. Then simulate P0 on input 0 for a second step and
also simulate P1 on input 1 for one step. After that, run P0 on 0 for a third step,
followed by P1 on 1 for a second step, and then P2 on 2 for one step. This process
cycles among the Pe on e simulations, running each for a step. Eventually you will
see some of these halt and the elements of K will fill in. On computer systems this
interleaving is called time-slicing but in theory discussions it is called dovetailing.
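Here is a sketch of the dovetailing in Scheme, assuming a hypothetical (halts-within? e x n) that simulates Pe on input x for n steps and reports whether it has halted by then.

    ;; Stage n runs each of P_0 on 0, ..., P_n on n for n steps.
    ;; Elements of K appear over time; repeats are possible; it never stops.
    (define (dovetail stage)
      (do ((e 0 (+ e 1)))
          ((> e stage))
        (when (halts-within? e e stage)
          (display e)
          (newline)))
      (dovetail (+ stage 1)))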
We are listing the elements of K : first f (0), then f (1), . . . (the computable
function f is such that, for instance, f (0) = e where it happens that Pe on input e
is the first of these to halt). Definition 1.12 gives the terminology that a function f
with domain N enumerates its range.
Why won’t this process of gradual enumeration solve the Halting problem? If
e ∈ K then it will tell us that eventually, but if e < K then it will not.
7.1 Definition A set of natural numbers is computable or decidable if its characteris-
tic function is computable, so that we can effectively determine both membership
and non-membership.

7.2 Definition A set of natural numbers is computably enumerable (or recursively
enumerable or c.e. or r.e.) if it is effectively listable, that is, if it is the range of a
total computable function, or it is the empty set.
So a set S is computable if there is a Turing machine that decides membership;
this machine inputs a number x and decides either ‘yes’ or ‘no’ whether x ∈ S . With
computably enumerable sets there is a machine that decides ‘yes’ but that machine
need not address ‘no’. Computably enumerable sets are also called semicomputable
or semidecidable.
This is the natural way to computably produce sets — picture a stream of
numbers ϕ e (0), ϕ e (1), ϕ e (2), . . . gradually filling out the set. (This list may

contain repeats, and the numbers could appear in jumbled up order, that is, not
necessarily in ascending order.)
7.3 Lemma The following are equivalent for a set of natural numbers.
(a) It is computably enumerable, that is, either it is empty or it is the range of a
total computable function.
(b) It is the domain of a partial computable function.
(c) It is the range of a partial computable function.
Proof We will show that the first two are equivalent. That the second and third
are equivalent is Exercise 7.29.
Assume first that S is computably enumerable. If S is empty then it is the domain
of the partial computable function that diverges on all inputs. So instead assume
that S is the range of a total computable f , and we will describe a computable g
with domain S . Given the input x ∈ N, to compute g(x) enumerate f (0), f (1),
. . . and wait for x to appear as one of the values. If x does appear then halt
the computation (and return some nominal value). If x never appears then the
computation never halts.
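In Scheme the search is short; a sketch, taking the total enumerating function f as given.

    ;; g halts exactly when x appears among f(0), f(1), ..., that is, when x ∈ S.
    (define (g x)
      (let loop ((n 0))
        (if (= (f n) x)
            0                 ; converge, with a nominal value
            (loop (+ n 1)))))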
For the other direction, assume that S is the domain of a partial computable
function g , to show that it is computably enumerable. If S is empty then it
is computably enumerable by definition. Otherwise we must produce a total
computable f whose range is S . If S is finite but not empty, S = { s 0 , ... sm }, then
such a function is given by 0 ↦ s 0 , . . . m ↦ sm , and n ↦ s 0 for n > m .
Finally assume that S is infinite. Fix some s 0 ∈ S . Given n ∈ N, run the
computations of each of g(0), g(1), . . . g(n) for n -many steps. Possibly some of
these computations halt. Define f (n) to be the least k where g(k) halts within n
steps and such that k ∉ { f (0), f (1), ... f (n − 1) }. If no such k exists then define
f (n) = s 0 ; this makes f a total function.
If t ∉ S then g(t) never converges and so t is never enumerated by f . If s ∈ S
then eventually g(s) must converge, in some number of steps, ns . The number s
is then queued for output by f in the sense that it will be enumerated by f as, at
most, f (ns + s).
Many authors define computably enumerable sets using the second or third
items. Definition 7.2 is more natural but also more technically awkward.

7.4 Definition We = { y | ϕ e (y)↓ }

7.5 Lemma (a) If a set is computable then it is computably enumerable.
(b) A set is computable if and only if both it and its complement are computably
enumerable.
Proof First, let S ⊆ N be computable. We will produce an effective enumeration f .
If S is finite, take f (0) = s 0 , f (1) = s 1 , . . . f (n − 1) = sn−1 , and set f (m)↑ for
m ≥ n . (By Lemma 7.3, enumeration by a partial computable function suffices.)
The other case is that S is infinite. For f (0), find the smallest element
of S by testing whether 0 ∈ S , then whether 1 ∈ S , . . . This search is effective

because S is computable, and it must halt because S is infinite. Similarly, f (k) will
be the k -th smallest element in S .
As to the second item, first suppose that S is computable. The prior item shows
that it is computably enumerable. The complement of S is also computable because
its characteristic function is 1S c = 1 − 1S . So the prior item shows that S c is also
computably enumerable.
Finally, suppose that both S and S c are computably enumerable. Let S be
enumerated by f and let S c be enumerated by f¯. We must give an effective
procedure to determine whether a given x ∈ N is an element of S . We will
dovetail the two enumerations: first run the computation of f (0) for a step and
the computation of f¯(0) for a step, then run the computations of f (0) and f¯(0)
for a second step, etc. Eventually x will be enumerated into one or the other.
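A sketch of that decision procedure in Scheme, taking the total enumerations f of S and fbar of S c as given. Because both functions are total we can simply query them in alternation; every x appears in exactly one of the two lists, so the search halts.

    (define (member-of-S? x)
      (let loop ((n 0))
        (cond ((= (f n) x) 1)     ; x was enumerated into S
              ((= (fbar n) x) 0)  ; x was enumerated into the complement
              (else (loop (+ n 1))))))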

7.6 Corollary The Halting problem set K is computably enumerable. Its complement
K c is not.
Proof The set K is the domain of the function f (x) = ϕ x (x), which is mechanically
computable by Church’s Thesis. If the complement K c were computably enumerable
then Lemma 7.5 would imply that K is computable, but it isn’t.
That result gives one reason to be interested in computably enumerable sets,
namely that the Halting problem set K falls into the class of computably enumerable
sets, as do sets such as { e | ϕ e (3)↓ } and { e | there is an x so that ϕ e (x) = 7 }. So
this collection of sets contains lots of interesting members.
Another reason that these sets are interesting is philosophical: with Church’s
Thesis we can think that, in a sense, computable sets are the only sets that we will
ever know, and semidecidable sets are ones that we at least half know.

II.7 Exercises
✓ 7.7 You got a quiz question to define computably enumerable. A friend of yours
says they answered, “A set that can be enumerated by a Turing machine but that
is not computable.” Is that right?
✓ 7.8 Produce a function that enumerates each set, whose range is the given set.
(a) N (b) the even numbers (c) the perfect squares (d) the set { 5, 7, 11 }.
7.9 Produce a function that enumerates each set (a) the prime numbers (b) the
natural numbers whose digits are in non-increasing order (e.g., 531 or 5331 but
not 513).
7.10 One of these two is computable and the other is computably enumerable but
not computable. Which is which?
(a) { e | Pe halts on input 4 in less than twenty steps }
(b) { e | Pe halts on input 4 in more than twenty steps }
7.11 Short answer: for each set state whether it is computable, computably
enumerable but not computable, or neither. (a) The set of indices e of Turing

machines that contain an instruction using state q 4 . (b) The set of indices of
Turing machines that halt on input 3. (c) The set of indices of Turing machines
that halt on input 3 in fewer than 100 steps.
✓ 7.12 You read someone online who says, “every countable set S is computably
enumerable because if f : N → N has range S then you have the enumeration S
as f (0), f (1), . . .” Explain why this is wrong.

✓ 7.13 The set A5 = { e | ϕ e (5)↓ } is clearly not computable. Show that it is
computably enumerable.

7.14 Show that the set { e | ϕ e (2) = 4 } is computably enumerable.
7.15 Name a set that has an enumeration but not a computable enumeration.
7.16 Name three sets that are computably enumerable but not computable.

✓ 7.17 Let K 0 = { ⟨e, x⟩ | Pe halts on input x }.
(a) Show that it is computably enumerable.
(b) Show that the columns of K 0 , the sets C e = { x | Pe halts on input x },
make up all the computably enumerable sets.
7.18 We know that there are subsets of N that are not computable. Are the
computably enumerable sets the rest of the subsets?

✓ 7.19 Show that the set Tot = { e | ϕ e (y)↓ for all y } is not computable and not
computably enumerable. Hint: if this collection is computably enumerable then
we can get a table like the one that starts ??, on Unsolvability.
7.20 Can there be a set such that the problem of determining membership in that
set is unsolvable, and also the set is computably enumerable?
7.21 (a) Prove that every finite set is computably enumerable. (b) Sketch a
program that takes as input a finite set and returns a function that enumerates
the set.
7.22 Prove that every infinite computably enumerable set has an infinite com-
putable subset.
7.23 Let f be a partial computable function that enumerates the infinite set R ⊆ N.
Produce a total computable function that enumerates R .
7.24 A set is enumerable in increasing order if there is a computable function f
that is increasing: n < m implies f (n) < f (m), and whose range is the set. Prove
that an infinite set S is computable if and only if it is computably enumerable in
increasing order.
7.25 A set is computably enumerable without repetition if it is the range of a
computable function that is one-to-one. Prove that a set is computably enumerable
and infinite if and only if it is computably enumerable without repetition.
7.26 A set is co-computably enumerable if its complement is computably enumer-
able. Produce a set that is neither computably enumerable nor co-computably
enumerable.

7.27 Computability is a property of sets so we can consider its interaction with set
operations. (a) Must a subset of a computable set be computable? (b) Must the
union of two computable sets be computable? (c) The intersection? (d) The
complement?
7.28 Computable enumerability is a property of sets so we can consider its
interaction with set operations. (a) Must the union of two computably enumerable
sets be computably enumerable? (b) The intersection? (c) The complement?
7.29 Finish the proof of Lemma 7.3 by showing that the second and third items
are equivalent.

Section
II.8 Oracles

The problem of deciding whether a machine halts is so hard that it is
unsolvable. Is this the absolutely hardest problem or are there ones that
are even harder?
What does it mean to say that one problem is harder than another?
What does it mean to say that one problem is harder than another?
We have compared problem hardness already, for instance when we
considered the problem of whether a Turing machine halts on input 3.
There we proved that if we could solve the halts-on-3 problem then we
could solve the Halting problem. That is, we proved that halts-on-3 is
at least as hard as the Halting problem. So, the idea is that one problem
is harder than a second if solving the first would also give us a solution
to the second.†
Under Church’s Thesis we interpret the unsolvability of the Halting
problem to say that no mechanism can answer all questions about
membership in K . So if we want to answer questions about sets that
are harder than K then we need the answers to be supplied in some
way that won’t be a physically-realizable discrete and deterministic mechanism.

[Image: Priestess of Delphi (Collier 1891)]
Consequently, we posit a device, an oracle, that we attach to the Turing machine
box and that acts as the characteristic function of a set. For example, to see what
could be computed if we could solve the Halting problem we can attach a K -oracle
that answers, “Is x ∈ K ?” This oracle is a black box, meaning that we can’t open it
to see how it works.‡


† We can instead think that the first problem is more general than the second. For instance, the problem
of inputting a natural number and outputting its prime factors is harder than the problem of inputting
a natural and determining if it is divisible by seven. Clearly if we could solve the first then we could
solve the second. ‡ Opening it would let out the magic smoke.

We could formally define computation with an oracle X ⊆ N by extending the
definition of Turing machines. But we will instead use Church’s Thesis. Imagine
enhancing a programming language by introducing a Boolean function oracle.

(if (oracle x)
    ... )

When x ∈ X then this takes one branch, while if x ∉ X then it takes the other.
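To illustrate, here is a sketch in Scheme. The program text is fixed while the oracle is a parameter, so swapping oracle boxes changes the outcomes but not the program.

    (define (make-relative-program oracle)
      (lambda (x)
        (if (oracle x)
            (* 2 x)       ; the branch when x ∈ X
            (+ x 1))))    ; the branch when x ∉ X
    ;; With a computable stand-in oracle: ((make-relative-program even?) 3)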
Most of what we have already developed about machines carries over. For
instance, programs are strings, each program has an index, and the index is source-
equivalent, so that from an index we can compute the program source and from a
source we can find the index.
In the setup above, the program code does not change if we change the oracle —
if we unplug the X oracle box and replace it with a Y oracle box then the white box
is unchanged. Of course, the values returned by oracle(x) may change, resulting
in changes to the outcome of running the program with the oracle, the two-box
system. But the enhanced Turing machine stays the same. Thus, to specify a
relative computation, in addition to specifying which program we are using and
which inputs, we must also specify the oracle set. This explains the notations for
the oracle Turing machine, PeX , and for the outcome of the function computed
relative to an oracle, ϕ eX (x).
8.1 Definition If a function computed from X is the characteristic function of the
set S then we say that S is X -computable, or that S is Turing reducible to X or
that S reduces to X , denoted S ≤T X .
That is, S ≤T X if and only if ϕ eX = 1S for some e ∈ N. Think of the set S as
being an easier problem, or at least one no harder, than X . For instance, where E is
the set of even numbers then E ≤T K .
8.2 Remark The terminology ‘S reduces to X ’ can at first seem reversed. The idea is
that we can solve problem S by using a solution to X . This phrase also appears in
other areas of Mathematics. For instance, in Calculus we may say that finding the
area under a polynomial curve reduces to the problem of antidifferentiation.
8.3 Theorem (a) A set is computable if and only if it is computable relative to the
empty set, or relative to any computable set.
(b) (Reflexivity) Every set is computable from itself, A ≤T A.

(c) (Transitivity) If A ≤T B and B ≤T C then A ≤T C .
Proof For the first, a set is computable if its characteristic function is computable.
If the characteristic function is computable without reference to an oracle then
it can be computed by an oracle machine, by ignoring the oracle. For the other
direction, suppose that a characteristic function can be computed by reference to
the empty set or any other computable oracle. Then it can be computed without
reference to an oracle by replacing the oracle calls with computations.
The second item is clear. For the third, suppose that PeB computes the
characteristic function of A and that PêC computes the characteristic function
of B . Then in the computation of A from B we can replace the B -oracle calls with
calls to PêC . That computes the characteristic function of A directly from C .

8.4 Example Recall the problem of determining, given e , whether Pe halts on
input 3. It asks for a program that acts as the characteristic function of the set
A = { e | Pe halts on 3 }. We will show that K ≤T A.
This is a reprise of Example 5.4. There, we considered the function ψ : N2 → N
that returns 42 if ϕ x (x) ↓ and that diverges otherwise. Church’s Thesis gave a
Turing machine whose input-output behavior is ψ . We let that machine have
index e , and applied the s-m-n theorem to parametrize x , giving ϕ s(e,x ) . We
finished by observing that ϕ x (x)↓ if and only if 1A (s(e, x)) = 1, where 1A is the
characteristic function. That computes K from an A oracle.
8.5 Lemma Any computable set, including ∅ or N, is Turing reducible to any other
set.
Proof Let C ⊆ N be computable. Then there is a Turing machine that computes
the characteristic function of C . Think of this as an oracle machine that never
queries its oracle.
The Halting problem is to decide whether Pe halts on input e . A person may
perceive that a more natural problem is to decide whether Pe halts on input x .

8.6 Definition K 0 = { ⟨e, x⟩ | Pe halts on input x }
However, we will argue that the two are equivalent, that they are inter-solvable,
meaning that if you can solve the one then you can solve the other. Thus your
choice is a matter of convenience and convention.†
8.7 Definition Two sets A, B are Turing equivalent or T -equivalent, denoted A ≡T B ,
if A ≤T B and B ≤T A.
Showing that two sets are T -equivalent shows that two seemingly-different
problems are actually versions of the same problem. We will greatly expand on
this approach in the chapter on Complexity.

For the Halting problem definition we use K because it is the standard and because it has some
technical advantages, including that it falls out of the diagonalization development done at the start of
this subsection.

8.8 Theorem K ≡T K 0 .
Proof For K ≤T K 0 suppose that we have access to a K 0 -oracle. Since it can say
whether Pe halts on x for any input x , it can clearly say whether Pe halts on e .
For the K 0 ≤T K half, consider the flowchart on the left; obviously this machine
halts for all input triples exactly if ⟨e, x⟩ ∈ K 0 . By Church’s Thesis there is a Turing
machine implementing it; let it be machine Pê .

[Flowcharts, left: Start → Read e, x, y → Simulate Pe on input x → Print 42 → End; right: Start → Read y → Simulate Pe on input x → Print 42 → End]

Get the flowchart on the right by applying the s-m-n theorem to parametrize e
and x (that is, this is a sketch of machine Ps(ê,e,x ) ). That flowchart represents a
family of machines, one for each pair ⟨e, x⟩ .
Now suppose that we are given a pair ⟨e, x⟩ and consider the right-side flowchart
for that pair. It either halts on all inputs y or fails to halt on all inputs, depending
on whether ϕ e (x)↓. In particular, taking the input to be the number of this machine
y = s(ê, e, x), we have that Ps(ê,e,x ) halts on input s(ê, e, x) if and only if ϕ e (x)↓.
So given a question about membership in K 0 , about whether ⟨e, x⟩ ∈ K 0 , we can
answer it by determining whether s(ê, e, x) ∈ K .
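A sketch of the K 0 ≤T K direction in Scheme, with the hypothetical (simulate e x) and (index-of src) helpers from before, and with the K oracle as a predicate.

    ;; The built machine halts on every input, or on none, according to
    ;; whether P_e halts on x; so querying K at its own index settles K_0.
    (define (k0-query e x k-oracle)
      (let ((i (index-of `(lambda (y) (begin (simulate ,e ,x) 42)))))
        (if (k-oracle i) 1 0)))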

8.9 Corollary The Halting problem is at least as hard as any computably enumerable
problem: We ≤T K for all e ∈ N.
Proof By Lemma 7.3 the computably enumerable sets are the columns of K 0 .

    We = { y | ϕ e (y)↓ } = { y | ⟨e, y⟩ ∈ K 0 }

So We ≤T K 0 ≡T K .
Because the Halting problem is in this sense the hardest of the computably
enumerable problems, we say that it is complete among the c.e. sets.
8.10 Theorem There is no e ∈ N such that ϕ eK is the characteristic function of
K K = { x | ϕ xK (x)↓ }. That is, where the Relativized Halting problem is the
problem of determining membership in K K , its solution is not computable from
a K oracle.
Proof This is an adaptation of the proof that the Halting problem is unsolvable.
Assume otherwise, that there is a mechanical computation relative to a K oracle

that acts as the characteristic function of K K .

    ϕ eK (x) = 1 – if ϕ xK (x)↓        (∗)
               0 – otherwise

Then the function below is also computable relative to a K oracle. The flowchart
illustrates its construction; it uses the above function for the branch.

    f K (x) = 42 – if ϕ xK (x)↑
              ↑  – if ϕ xK (x)↓

[Flowchart: Start → Read x → ϕ eK (x) = 1? Yes: Infinite loop; No: Print 42 → End]

Since f is computable relative to K , it has an index. Let that index be ê , so that
f K = ϕ êK .
Now feed f its own index — consider f K (ê) = ϕ êK (ê). If that diverges then the
first clause in the definition of f gives that f K (ê)↓, which is a contradiction. If it
converges then f ’s second clause gives f K (ê)↑, which is also impossible. Either
way, assuming that the function in (∗) can be computed relative to a K oracle leads
to a contradiction.

8.11 Theorem Any set S is reducible to its relativized Halting problem, S ≤T K S .
Proof The flowchart on the left sketches a function that is intuitively mechanically
computable relative to an oracle X . So Church’s Thesis gives that it is PeX for
some index e . Apply the s-m-n theorem to parametrize x , giving the uniformly
computable family of machines Ps(e,x)X charted on the right.

[Flowcharts, left: Start → Read x, y → x ∈ X? Yes: Print 42; No: Loop → End; right: Start → Read y → x ∈ X? Yes: Print 42; No: Loop → End]

On the right, ϕ s(e,x)X (y)↓ if and only if x ∈ X . Taking the oracle to be S and the
input to be s(e, x) gives that x ∈ S if and only if ϕ s(e,x)S (s(e, x))↓, which holds if
and only if s(e, x) ∈ K S .

8.12 Corollary K ≤T K K , but K K ≰T K

Proof This follows from the prior two results.



That answers the question posed at the start of this section. One problem
strictly harder than the Halting problem is to compute the characteristic function
of K K .

II.8 Exercises
Recall from page 11 that a Turing machine is a decider for a set if it computes the
characteristic function of that set.
✓ 8.13 Suppose that the set A is Turing-reducible to the set B . Which of these are
true?
(a) A decider for A can be used to decide B .
(b) If A is computable then B is computable also.
(c) If A is uncomputable then B is uncomputable too.
✓ 8.14 Both oracles and deciders take in a number and return 0 or 1 according to
whether that number is in the set. What’s the difference?
✓ 8.15 Your friend says, “Oracle machines are not real, so why talk about them?”
What do you say?
8.16 Your classmate says they answered a quiz question to define an oracle with,
“A set to solve unsolvable problems.” Give them a gentle critique.
8.17 Is there an oracle for every problem? For every problem is there an oracle?
8.18 There is this person in your class who keeps mouthing off and your professor
has to keep gently setting them straight. This time it’s, “Oracles can solve
unsolvable problems, right? And K K is unsolvable, right? So an oracle like the K
oracle should solve it.” Help your prof out here.
8.19 Is the number of oracles countable or uncountable?
✓ 8.20 Prove that A ≤T Ac for all A ⊆ N.
8.21 Your study partner confesses, “I don’t understand relative computation. Any
computation using an oracle must make only finitely many oracle calls if it halts.
But a finite oracle is computable, and so by Lemma 8.5 it is reducible to any set.”
Help them out.
8.22 Let A and B be sets. Show that if A(q) = B(q) for all q ∈ N used in the oracle
computation ϕ eA (x) then ϕ eA (x) = ϕ eB (x).
✓ 8.23 Show that K ≰T ∅.
✓ 8.24 Show that the Halting problem set K reduces to each.
(a) { x | Px outputs a 7 for some input }
(b) { x | ϕ x (y) = 2y for all input }
8.25 Let A and B be sets. Produce a set C so that A ≤T C and B ≤T C .
8.26 Fix an oracle. Prove that the collection of sets computable from that oracle
is countable.

8.27 The relation ≤T involves sets so we naturally ask how it interacts with set
operations.
(a) Does A ⊆ B imply A ≤T B ?
(b) Is A ≤T A ∪ B ?
(c) Is A ≤T A ∩ B ?
(d) Is A ≤T Ac ?
8.28 Let A ⊆ N. (a) Define when a set is computably enumerable in an oracle.
(b) Show that N is computably enumerable in A for all sets A. (c) Show that K A
is computably enumerable in A.

Section
II.9 Fixed point theorem
Recall our first example of diagonalization, the proof that the set of real numbers
is not countable, on page 76. We assume that there is an f : N → R and consider
its inputs and outputs, as illustrated in this table.

n    f (n)’s decimal expansion
0    42.3127704 ...
1    2.0100000 ...
2    1.4141592 ...
3    −20.9195919 ...
...

Let a decimal representation of the number on row n be d n = d̂.d n,0 d n,1 d n,2 ... Go
down the diagonal to the right of the decimal point to get the sequence of digits
⟨d 0,0 , d 1,1 , d 2,2 , ...⟩ . With that sequence, construct a number z = 0.z 0 z 1 z 2 ... by
making its n -th decimal place be something other than d n,n . In our example we
took a transformation t of digits given by t(d n,n ) = 2 if d n,n = 1, and t(d n,n ) = 1
otherwise, so that the table above gives z = 0.1211 ... Then the diagonalization
argument culminates in verifying that z is not any of the rows.

When diagonalization fails But what if the transformed diagonal is a row,
z = f (n 0 )? Then the member of the array where the diagonal crosses that row is
unchanged by the transformation, d n0,n0 = t(d n0,n0 ). Conclusion: if diagonalization
fails then the transformation has a fixed point.
We will apply this to sequences of computable functions, ϕ i 0 , ϕ i 1 , ϕ i 2 , ... We are
interested in effectiveness so we don’t consider arbitrary sequences of indices but
instead take the indices to be computable, i 0 , i 1 , i 2 ... = ϕ e (0), ϕ e (1), ϕ e (2) ... for
some e . So a sequence of computable functions has this form.

ϕ ϕe (0) , ϕ ϕe (1) , ϕ ϕe (2) ...



Below is a table with all such sequences, that is, all effective sequences of effective
functions, ϕ ϕe (n) .

Sequence term
n=0 n=1 n=2 n=3 ...
e =0 ϕ ϕ0 (0) ϕ ϕ0 (1) ϕ ϕ0 (2) ϕ ϕ0 (3) ...
e =1 ϕ ϕ1 (0) ϕ ϕ1 (1) ϕ ϕ1 (2) ϕ ϕ1 (3) ...
Sequence e =2 ϕ ϕ2 (0) ϕ ϕ2 (1) ϕ ϕ2 (2) ϕ ϕ2 (3) ...
e =3 ϕ ϕ3 (0) ϕ ϕ3 (1) ϕ ϕ3 (2) ϕ ϕ3 (3)
.. .. ..
. . .

Each entry ϕ ϕe (n) is a computable function. If ϕ e (n) diverges then the function as a whole diverges.
The natural transformation is to use a computable function f .

t f : ϕ x ↦ ϕ f (x )

The next result shows that under this transformation, diagonalization fails. Thus,
the transformation t f has a fixed point.

9.1 Theorem (Fixed Point Theorem, Kleene 1938)† For any total computable
function f there is a number k such that ϕ k = ϕ f (k ) .

Proof The array diagonal is ϕ ϕ0 (0) , ϕ ϕ1 (1) , ϕ ϕ2 (2) , ... The flowchart on the left below is a sketch of the two-variable function (n, x) ↦ ϕ ϕn (n) (x). Church’s Thesis says that some Turing
machine computes this function; let that machine have index e 0 . Apply the s-m-n
theorem to parametrize n , giving the right chart, which describes the family of
machines that compute ϕ s(e0,n) , the n -th function on the diagonal.

Left flowchart: Start → Read n, x → Run Pn on n → With the result w , run Pw on input x → End
Right flowchart: Start → Read x → Run Pn on n → With the result w , run Pw on input x → End

ϕ s(e0,e) (x) = ϕ ϕe (e) (x) if ϕ e (e)↓, and ↑ otherwise

The index e 0 is fixed, so s(e 0 , n) is a function of one variable. Let g(n) = s(e 0 , n), so that the diagonal functions are ϕ g(n) . This function g is computable and total.


† This is also known as the Recursion Theorem but there is another widely used result of that name.
This name is more descriptive so we’ll go with it.

Under t f those functions are transformed to ϕ f ◦g(0) , ϕ f ◦g(1) , ϕ f ◦g(2) , ... The composition f ◦ g is computable and total, since g is total and f is specified as total.

ϕ f ◦g(n) (x) = ϕ f (ϕn (n)) (x) if ϕ n (n)↓, and ↑ otherwise

Flowchart: Start → Read x → Run Pn on n → With the result w , run Pf (w) on x → End

As the flowchart underlines, this is a computable sequence of computable functions. Hence it is one of the table’s rows. Let it be row v , so that ϕ f ◦g(m) = ϕ ϕv (m) for all m . Consider where the diagonal sequence ϕ g(n) intersects that row: ϕ g(v) = ϕ ϕv (v) = ϕ f ◦g(v) . The desired fixed point for f is k = g(v).
So when we try to diagonalize out of the partial computable functions, we fail. That is, the notion of partial computable function seems to have an in-built defense against diagonalization.
The Fixed Point Theorem applies to any total computable function. Conse-
quently, it leads to many surprising results.
9.2 Corollary There is an index e so that ϕ e = ϕ e+1 .
Proof The function f (x) = x + 1 is computable and total. So there is an e ∈ N
such that ϕ e = ϕ f (e) .

9.3 Corollary There is an index e such that Pe halts only on e .


Proof Consider the program described by the flowchart on the left. By Church’s
Thesis it can be done with a Turing machine, Pe0 . Parametrize to get the program
on the right, Ps(e0,m) .

Left flowchart: Start → Read x, m → x = m? → if Y, Print 42; if N, Loop → End
Right flowchart: Start → Read x → x = m? → if Y, Print 42; if N, Loop → End

ϕ s(e0,m) (x) = 42 if x = m, and ↑ otherwise

Since e 0 is fixed (it is the index of the machine sketched on the left), s(e 0 , x) is a
total computable function of one variable, f (m) = s(e 0 , m), where the associated
Turing machine halts only on input m . The Fixed Point Theorem gives a fixed
point, ϕ f (e) = ϕ e , and the associated Turing machine Pe halts only on e .

This says that there is a Turing machine that halts only on one input, its index.
Rephrased for rhetorical effect, this machine’s name is its behavior.†
9.4 Corollary There is an m ∈ N such that ϕm (x) = m for all inputs x .
Proof Consider the function ψ (x, y) = x . As the flowchart on the left illustrates, it
is computable.‡ So by Church’s Thesis there is a Turing machine that computes it.
Let that machine have index e , so that ψ (x, y) = ϕ e (x, y) = x .

Left flowchart: Start → Read x , y → Print x → End
Right flowchart: Start → Read y → Print x → End

Apply the s-m-n theorem to get a family of uniformly computable functions parametrized by x , given by ϕ s(e,x ) (y) = x . Because e is fixed, as it is the number of the Turing machine that computes ψ , define g : N → N by g(x) = s(e, x). This function is total. The Fixed Point Theorem says that there is an m ∈ N with ϕm (y) = ϕ g(m) (y) = ϕ s(e,m) (y) = m for all y .

9.5 Remark Every Turing machine has some index number but here the index is
related to its machine’s behavior. Imagine finding that in our numbering scheme,
machine P7 outputs 7 on all inputs. This may seem to be an accident of the choice
of scheme. But it isn’t an accident; the corollary says that something like this must
happen for any acceptable numbering.
The Fixed Point Theorem is deep, showing surprising and interesting behaviors
that occur in any sufficiently powerful computation system. For instance, since a
Turing machine’s index is in effect interchangeable with its source, the prior result raises the question
of whether there is a program that prints its own source, that self-reproduces. In
addition to the discussion below, Extra C has more.

Discussion The Fixed Point Theorem and its proof are often considered mysterious,
or at any rate obscure. Here we will expand on a few points.
One aspect that bears explication is how it employs the use-mention distinction.
Compare the sentence Atlantis is a mythical city to There are two a’s in “Atlantis”. In
the first, we say that ‘Atlantis’ is used because it has a value, it points to something.
In the second sentence ‘Atlantis’ is not referring to something — its value is itself —
so we say that it is mentioned.§

† Here, ‘name’ is used as an equivalent of ‘index’ that is meant to be evocative. ‡ In this argument
perhaps the flowchart is overkill since the function is obviously computable. But when it is not obvious,
as in the prior result and in some of the exercises, we need an outline of how to compute the function.
§ A version of this comes up in programming books. If such a book has the sentence, “The number
of players is players” then the first ‘players’ refers to people while the second is a variable from the
program. There the computer code is in a typewriter font, as is a standard practice, because quoting
would be awkward and ugly.

A version of the use-mention distinction happens in computer programming, with pointers. The C language program below illustrates. The second line’s asterisks mean that x
below illustrates. The second line’s asterisk means that x
and y are pointers. While the compiler associates x and y with
memory cells, we are not so much interested in the contents
of these cells as in the contents of the memory cells that they
name. The first diagram imagines that the compiler happens
to associate x with memory address 123 and y with 124. It
further imagines that the contents of cell 123 is the number 901
and the contents of cell 124 is 902. We say that x points to 901 and y points
to 902.
The second diagram in the sequence shows the code running. Because of the
*x = 42, the system puts 42 where x points: it does not put 42 in location 123,
rather it puts 42 in the location referred to by the contents of 123, namely, cell 901.†
Then the code sets y to point to the same address as x, address 901. Finally, it puts
13 where y points, which is at this moment the same cell to which x points.
#include <stdlib.h>

void main() {
    int *x, *y;
    x = malloc(sizeof(int));
    y = malloc(sizeof(int));
    *x = 42;
    y = x;
    *y = 13;
}

Address    start    *x = 42    y = x    *y = 13
123 (x)    901      901        901      901
124 (y)    902      902        901      901
901                 42         42       13
902

9.6 Animation: Pointers in a C program.

The x and y variables are being considered at different levels of meaning than
ordinary variables. On one level, x refers to the contents of 123, while on another
level it is about the contents of those contents, what’s in address 901.
As to the role played by the use-mention distinction in the Fixed Point Theorem,
the proof starts by taking g(e) to be the name of this procedure.

ϕ g(e) (x) = ϕ s(e0,e) (x) = ϕ ϕe (e) (x) if ϕ e (e)↓, and ↑ otherwise

Don’t be fooled by the notation; it is not the case that g(e) equals ϕ e (e) but instead g(e) is an index of the flowchart on the right in the proof, describing the procedure that computes the function above. Regardless of whether ϕ e (e)↓, we can nonetheless compute the index g(e) and from it the instructions for the function. There is an analogy here with Atlantis — despite that the referred-to city doesn’t exist we can still sensibly assert things about its name.

† Using the * operator to access the value stored at a pointer is called dereferencing that pointer. There
is a matching referencing operator, &, that gives the address of an existing variable.

Informally, what g(e) names is, “Given input x , run Pe on input e and if it halts with output w then run Pw on input x .” Shorter: “Produce ϕ e (e) and then do ϕ e (e).”
Next, from f we consider the composition and give it a name, f ◦ g = ϕv . Substituting v into the prior paragraph gives that g(v) names, “Compute ϕv (v) and then do ϕv (v).” That’s the same as “Compute f ◦ g (v) and then do f ◦ g (v).”
Note the self-reference; it may naively appear that to compute g(v) we need to compute g(v), that the instructions for g(v) paradoxically contain themselves as a subpart.
Then g(v) first computes the name of f ◦ g (v) and after that runs the machine numbered f ◦ g (v). So g(v) and f ◦ g (v) are names for machines that compute the same function. Thus g(v) does not contain itself; more precisely, the set of instructions for computing g(v) does not contain itself. Instead, it contains a name for the instructions for computing itself.

II.9 Exercises
✓ 9.7 Your friend asks you about the proof of the Fixed Point Theorem, Theorem 9.1.
“The last line says ϕ g(v) = ϕ ϕv (v) ; isn’t this just saying that g(v) = ϕv (v)? Why the circumlocution?” Help them out.
✓ 9.8 Show each. (a) There is an index e such that ϕ e = ϕ e+7 . (b) There is an e
such that ϕ e = ϕ 2e .
9.9 What conclusion can you draw about acceptable enumerations of Turing
machines by applying the Fixed Point Theorem to each of these? (a) the tripling
function x 7→ 3x (b) the squaring function x 7→ x 2 (c) the function that gives
0 except for x = 5, when it gives 1 (d) the constant function x 7→ 42
9.10 We will prove that there is an m such that Wm = {x | ϕm (x)↓} = {m 2 }.
(a) You want to show that there is a uniformly computable family of functions like this.

ϕ s(e,x ) (y) = 42 if y = x 2 , and ↑ otherwise

Define a suitable ψ : N2 → N, argue that it is intuitively mechanically computable, and apply the s-m-n Theorem to get the family of ϕ s(e,x ) .
(b) Observe that e is fixed so that s(e, x) is a function of one variable only, and name that function g : N → N.
(c) Apply the Fixed Point Theorem to get the desired m .

✓ 9.11 We will show there is an index m so that Wm = {y | ϕm (y)↓} is the set consisting of one element, the m -th prime number.
(a) Argue that p : N → N such that p(x) is the x -th prime is computable.
(b) Use it and the s-m-n Theorem to get that this family of functions is uniformly
computable: ϕ s(e,x ) (y) = 42 if y = p(x) and diverges otherwise.
(c) Draw the desired conclusion.

✓ 9.12 Prove that there exists m ∈ N such that Wm = {y | ϕm (y)↓} = { 10m }.


9.13 Show there is an index e so that We = {x | ϕ e (x)↓} = { 0, 1, ... , e }.


9.14 The Fixed Point Theorem says that for all f (which are computable and
total) there is an n so that ϕ n = ϕ f (n) . What about the statement in which
we flip the quantifiers: for all n ∈ N, does there exist a total and computable
function f : N → N so that ϕ n = ϕ f (n) ?
9.15 Prove or disprove the existence of the set. (a) Wm = {y | ϕm (y)↓} = N − {m } (b) Wm = {x | ϕm (x) diverges }
9.16 Corollary 9.3 shows that there is a computable function ϕ n with domain {n }.
(a) Show that there is a computable function ϕm with range {m } .
(b) Is there a computable function ϕm with range { 2m } ?
9.17 Prove that K is not an index set. Hint: use Corollary 9.3 and the Padding
Lemma, Lemma 2.15.

Extra
II.A Hilbert’s Hotel
Once upon a time there was an infinite hotel. The rooms were numbered 0, 1,
. . . , naturally. One day every room was occupied when someone new came to the
front desk; could the hotel accommodate? The clerk hit on the idea of moving
each guest up a room, that is, moving the guest in room n to room n + 1. With
that, room 0 was empty. So this hotel always has space for a new guest, or a finite
number of new guests.
Next a bus rolls in with infinitely many people p0 , p1 , ... The clerk has
the idea to move each guest to a room with twice the number, putting
the guest from room n into room 2n . Now the odd-numbered rooms are
empty, so pi can go in room 2i + 1, and everyone has a room.
Then in rolls a convoy of buses, infinitely many of them, each with
infinitely many people: B 0 = {p0, 0 , p0, 1 , ... }, and B 1 = {p1, 0 , p1, 1 , ... },
etc. By now the spirit is clear: move each current guest to a new room
with twice the number and the new people go into the odd-numbered
rooms, in the breadth-first order that we use to count N × N.
After this experience the clerk may well suppose that there is always room in the infinite hotel, that it can fit any set of guests at all, with a sufficiently clever method. Restated, this story makes natural the guess that all infinite sets have the same cardinality. That guess is wrong. There are sets so large that their members could not all fit in the hotel. One such set is R.†

† Alas, the infinite hotel does not now exist. The guest in room 0 said that the guest from room 1 would
cover both of their bills. The guest from room 1 said yes, but in addition the guest from room 2 had
agreed to pay for all three rooms. Room 2 said that room 3 would pay, etc. So Hilbert’s Hotel made no
money despite having infinitely many rooms, or perhaps because of it.

II.A Exercises
A.1 Imagine the hotel is empty. A hundred buses arrive, where bus Bi contains
passengers bi, 0 , bi, 1 , etc. Give a scheme for putting them in rooms.
A.2 Give a formula assigning a room to each person from the infinite bus convoy.
A.3 The hotel builds a parking lot. Each floor Fi has infinitely many spaces fi, 0 ,
fi, 1 , . . . And, no surprise, there are infinitely many floors F 0 , F 1 , . . . One day
the hotel is empty and buses arrive, one per parking space, each with infinitely
many people. Give a way to accommodate all these people.
A.4 The management is irked that this hotel cannot fit all of the real numbers. So
they announce plans for a new hotel, with a room for each r ∈ R. Can they now
cover every possible set of guests?

Extra
II.B The Halting problem in Wider Culture
The Halting problem and the related results are about limits. In the light of
Church’s Thesis, they say that there are things that we can never do. To help
understand their impact on the intellectual world outside mathematics as well as
inside we can place them in a historical setting.
With Napoleon’s downfall in the early 1800’s, many people in Europe felt a swing back to a sense of order and optimism, fueled by progress.† For example, in the history of Turing’s native England, Queen Victoria’s reign from 1837 to 1901 seemed to many English commentators to be an extended period of prosperity and peace. Across wider Europe, people perceived that the natural world was being tamed with science and engineering — witness the introduction of steam railways in 1825, the opening of the Suez Canal in 1869, and the invention of the electric light in 1879.‡
Queen Victoria opens the Great Exhibition of the Works of Industry of All Nations, 1851
In science this optimism was expressed by A A Michelson, who wrote in 1899, “The more important fundamental laws and facts of
physical science have all been discovered, and these are now so firmly established
that the possibility of their ever being supplanted in consequence of new discoveries
is exceedingly remote.”
The twentieth century physicist R Feynman has likened science to working out
the rules of a game by watching it being played, “to try to understand nature is to

† These statements are in the context of European intellectual culture, the context in which early Theory
of Computation results appeared. A broader view is outside our scope. ‡ This is not to say that this
perception is justified. Disease and poverty were rampant, colonialism and imperialism ruined the lives
of millions, for much of the time the horrors of industrial slavery in the US south went unchecked, and
Europe was hardly an oasis of calm, with for instance the revolutions of 1848. Nonetheless the zeitgeist
included a sense of progress, of winning.

imagine that the gods are playing some great game like chess. . . . And you don’t
know the rules of the game, but you’re allowed to look at the board from time to
time, in a little corner, perhaps. And from these observations, you try to figure out
what the rules are of the game.” Around the year 1900 many observers thought
that we basically had got the rules and that although there might remain a couple
of obscure things like castling, those would be worked out soon enough.
In Mathematics, this view was most famously voiced in an address given by Hilbert in 1930, “We must not believe those, who today, with philosophical bearing and deliberative tone, prophesy the fall of culture and accept the ignorabimus. For us there is no ignorabimus, and in my opinion none whatever in natural science. In opposition to the foolish ignorabimus our slogan shall be: We must know — we will know.” (‘Ignorabimus’ means ‘that which we must be forever ignorant of’ or ‘that thing we will never fully penetrate’.) There was of course a range of opinion but the zeitgeist was that we could expect that any question would be settled, and perhaps soon.
D Hilbert 1862–1943
But starting in the early 1900’s, that changed. Exhibit A is the picture to the right. That the modern mastery of mechanisms can have terrible effect became apparent to everyone during World War I, 1914–1918. Ten million military men died. Overall, seventeen million people died. With universal conscription, probably the men in this picture did not want to be here. They were killed by a man who probably also did not want to be here, who never knew that he killed them, and who simply entered coordinates into a firing mechanism. If you were at those coordinates, it didn’t matter how brave you were, or how strong, or how right was your cause — you died. All that stuff about your people and honor and god, that was all bullshit. The zeitgeist shifted: Pandora’s box had opened and the world is not at all ordered, reasoned, or sensible.
World War I German dead in a trench
At something like the same time in science, Michelson’s assertion that physics was a solved problem was destroyed by the discovery of radiation. This brought in quantum theory, which has at its heart that some events are completely random, which includes the uncertainty principle, and which led to the atom bomb.
With Einstein we see most directly the shift in wider intellectual culture away
from a sense of unlimited progress. After experiments during a solar eclipse in 1919
provided strong support for his theories, Einstein became an overnight celebrity
(“Einstein Theory Triumphs” was the headline in The New York Times). He was


† Below we will cite some things as turning points that occur before 1930; how can that be? For one
thing, cultural shifts always involve muddled timelines. For another, this is Hilbert’s retirement address
so we can reasonably take his as a lagging view. Finally, in Mathematics the shift occurred later than in
the general culture. We mark that shift with the announcement of Gödel’s Incompleteness Theorem.
That announcement came at the same meeting as Hilbert’s speech, on the day before. Gödel was in the
audience for Hilbert’s address and whispered to O Taussky-Todd, “He doesn’t get it.”

seen by the public as having changed our view of the universe from Newtonian
clockwork to one where “everything is relative.” His work showed that the universe
has limits and that everyday perceptions break down: nothing can travel faster
than light, time bends, and even the commonsense idea of two things happening
at the same instant falls apart.
In the general culture there were many reflections of this loss of certainty. For example, the generation of writers and artists who came of age in World War I — including Eliot, Fitzgerald, Hemingway, Pound, and Stein — became known as the Lost Generation. They expressed their experience through themes of alienation, isolation, and dismay at the corruption they saw around them. In music, composers such as Debussy, Mahler, and Strauss broke with the traditional expressive forms, in ways that were often hard for listeners to understand — Stravinsky’s Rite of Spring caused a near riot at its premiere in 1913. As for art, the painting here dramatically shows that visual artists also picked up on these themes.
Salvador Dali’s 1931 Persistence of Memory. Depicts relativity’s warping of a pillar of reality, time itself.
In mathematics, much the same inversion of the standing order happened in 1930 with K Gödel’s announcement of the Incompleteness Theorem. This says that if we fix a (sufficiently strong) formal system such as the elementary number theory of N with addition and multiplication then there are statements that, while true in the system, cannot be proved in that system. The theorem is clearly about what cannot be done — there are things that are true that we shall never prove. This statement of hard limits seemed to many observers to be especially striking in mathematics, which had held a traditional place as the most solid of knowledge. For example, I Kant said, “I assert that in any particular natural science, one encounters genuine scientific substance only to the extent that mathematics is present.”
Gödel and his best friend
Gödel’s Theorem is closely related to the Halting problem. In a mathematical
proof, each step must be verifiable as either an axiom or as a deduction that is valid
from the prior steps. So proving a mathematical theorem is a kind of computation.†
Thus, Gödel’s Theorem and other uncomputability results are in the same vein.
To people at the time these results were deeply shocking, revolutionary. And
while we work in an intellectual culture that has absorbed this shock, we must still
recognize them as bedrock.


† This implies that you could start with all of the axioms and apply all of the logic rules to get a set of
theorems. Then application of all of the logic rules to those will give all the second-rank theorems, etc.
In this way, by dovetailing from the axioms you can in principle computably enumerate the theorems.

Extra
II.C Self Reproduction

Where do babies come from?


Some early investigators, working without a microscope, thought that
the development of a fetus is that it basically just expands, while retaining
its essential features (one head, two arms, etc.). Projecting backwards, they
posited a homunculus, a small human-like figure that, when given the breath
of life, swells to become a person.
One awkwardness with this hypothesis is that this person may one day become a parent. So inside each homunculus are its children? And inside them the grandchildren? That is, one problem is the potential infinite regress. Of course today we know that sperm and egg don’t contain bodies, they contain DNA, the code to create bodies.
Sperm drawing, 1695

Paley’s watch In 1802, W Paley famously argued for the existence of a god from a
perception of unexplained order in the natural world.
In crossing a heath, . . . suppose I had found a watch upon the ground . . . [W]hen
we come to inspect the watch we perceive . . . that its several parts are framed and put
together for a purpose, e.g., that they are so formed and adjusted as to produce motion,
and that motion so regulated as to point out the hour of the day . . . the inference we
think is inevitable, that the watch must have a maker — that there must have existed,
at some time and at some place or other, an artificer or artificers who formed it for the
purpose which we find it actually to answer, who comprehended its construction and
designed its use.
The marks of design are too strong to be got over. Design must have had a designer.
That designer must have been a person. That person is GOD.
Paley then gives his strongest argument, that the most incredible
thing in the natural world, that which distinguishes living things from
stones or machines, is that they can, if given a chance, self-reproduce.
Suppose, in the next place, that the person, who found the watch,
would, after some time, discover, that, in addition to all the properties
which he had hitherto observed in it, it possessed the unexpected property
of producing, in the course of its movement, another watch like itself . . . If
that construction without this property, or which is the same thing, before
this property had been noticed, proved intention and art to have been
William Paley
employed about it; still more strong would the proof appear, when he
1743–1805
came to the knowledge of this further property, the crown and perfection
of all the rest.
This argument was very influential before the discovery by Darwin and
Wallace of descent with modification through natural selection. It shows that from
among all the things in the natural world to marvel at — the graceful shell of a
nautilus, the precision of an eagle’s eye, or consciousness — the greatest wonder
for many observers was self-reproduction.
Many people contended that self-reproduction had a special position, that mechanisms cannot self-reproduce. Picture a robot that assembles cars; it seemed
plausible that this is possible only because the car is in some way less complex
than the robot. In this line of reasoning, machines are only able to produce things
that are less complex than themselves.
That contention is wrong. The Fixed Point Theorem gives self-reproducing
mechanisms.
Quines The Fixed Point Theorem shows that there is a number m such that
ϕm (x) = m for all inputs. Because it is the index, think of m as the function’s
name. Then this function outputs its name. So this machine names itself; this is
self-reference. Said another way, Pm ’s name is its behavior.
Since we can go effectively from the index m to the machine source, in a sense
this machine knows its source. A quine is a program that outputs its own source
code. To make this more concrete we will step through the nitty-gritty of making a
quine.† We will use the C language since it is low-level and so the details are not
hidden.
The first thing a person might think is to include the source
as a string within itself. Here is try0.c.‡
main() {
printf("main(){\n ... }");
}

But this is obviously naive. The string would have to contain another string, etc. Like the homunculus theory, this leads to an infinite regress. Instead, we need a program that somehow contains instructions for computing a part of itself.
A more sophisticated approach leverages our discussion of
the Fixed Point Theorem in that it mentions the code before
using it. This is try1.c.§
char *e="main(){printf(e);}";
main(){printf(e);}

Here is the printout.


main(){printf(e);}

Ratcheting up this approach gives try2.c.


char *e="main(){printf("char*e="");printf(e); printf("";\n");printf(e);"
main(){printf("char *e="");printf(e); printf("";\n");printf(e);}

This is close. Just escape some newlines and quotation marks.|| This program,
try3.c, works.

† The easiest such program finds its source file on the disk and prints it. That is cheating. ‡ The
backslash-n gives a newline character. § The char *e="..." construct gives a string. In the C
language printf(...) command the first argument is a string. In that string double quotes expand to
single quotes, %c takes a character substitution from any following arguments, and %s takes a string
substitution. || The 10 is the ASCII encoding for newline and 34 is ASCII for a double quotation mark.

char *e="char *e=%c%s%c;%cmain(){printf(e,34,e,34,10,10);}%c";
main(){printf(e,34,e,34,10,10);}
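A reader who wants to check a candidate quine can compare its output against its source; for instance, with a suitably permissive C compiler, running cc try3.c && ./a.out | diff - try3.c prints nothing exactly when the output and the source agree.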

Quines are possible in any complete model of computation; the exercises ask
for them in a few languages.

Know thyself A program that prints itself can seem to be a parlor trick. But
for routines to have access to their code is useful. For example, to write a
toString(obj) method you probably want your method to ask obj for its source.
Another example, more nefarious, is a computer virus that transmits copies of its
code to other machines.
We will show how a routine can know its source. We will start with an alternate
presentation of a machine that prints itself.
First, two technical points. One is that given two programs we can combine them
into one, so that we run the first and then run the second. The other point is that we
have fixed a numbering of Turing machines that is ‘acceptable’, meaning that there
is a computable function from indices to machines and another computable function
back. Write T for the set of Turing machines and let the function str : T → B∗
input a Turing machine and return a standard bitstring representation of that
machine (i.e., its source), let machine : N → T input an index e and return the
machine Pe , and let idx : B∗ → N input the string representation of Pe and return
the index of that machine, e (if the input string doesn’t represent a Turing machine
then it doesn’t matter what this function does). Do this in such a way that idx is
the inverse of the function str ◦ machine.
Consider the machines sketched below. The first computes the function
echo(σ ) = σ . Let it have index e 0 and apply the s-m-n Theorem to get the family
of machines sketched in the middle, Ps(e0,σ ) , each of which ignores its input and
just prints σ .

Echo machine Pe0 : Start → Read σ → Print σ → End
Middle machine Ps(e0,σ ) : Start → Read the input → Erase it → Print σ → End
Right machine: Start → Read σ → Erase σ → Print str ◦ machine (s(e0, σ)) → End

On the right, s(e 0 , σ ) is the index of the middle machine so str ◦ machine(s(e 0 , σ ))
is the standard string representation of the middle machine for σ . Thus, if σ is
the standard representation of a Turing machine P then when the machine on
the right is done, the tape will contain only the standard representation of the
middle machine for σ , the machine that ignores its input and prints out σ . Call the
machine on the right Q and call the function that it computes q : Σ∗ → Σ∗ .
The machine that prints itself is a combination of two machines, A and B.
Here’s B.
B: Start → Read β → Compute α = q(β ) → Print α ⌢ β → End

The other machine is A = Ps(e0, str(B)) , which ignores anything on the tape and
prints out the string representation of B.
The action of the combination on an empty tape is that first the A part prints
out the standard string representation str(B). Then B reads it in as β , computes
α = str(A), concatenates the two string representations α ⌢ β , and prints it. This is
the string representation of itself, of the combination of A with B.
To get a machine that computes with its own source we will extend this
approach. The idea is to start with a machine C that takes two inputs, a string
representation of a machine and a string, and then get the desired machine D that
uses its own representation.
C.1 Theorem For any Turing machine C that computes a two-input function
c : Σ∗ × Σ∗ → Σ∗ there is a machine D that computes a one-input function
d : Σ∗ → Σ∗ where d(ω) = c(str(D), ω).
The machine D is the combination of three machines, A, B, and C . First, as
shown on the left, modify Q to write its output after a string already on the tape,
because we need to leave the input ω on the tape.

Modified Q : Start → Read σ → Move to end of σ → Move right, past one blank → Print str ◦ machine (s(e0, σ)) → End
Modified B : Start → Read ω , τ → Compute α = q(τ ) → Print α ⌢ τ , then a blank, then ω → Run C → End

Second, modify A to be Ps(e0, str(B)⌢str(C )) , which ignores anything on the tape and
prints out the string representation of the combination of machines B and C .
Next, as shown on the right, modify B to input two strings, ω and τ . Apply q to
the second to compute A’s standard representation α . Print out the concatenation
of α and τ , then a blank, and then the input ω . That has the form of two
inputs; finish by running the machine C on them.
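As a sanity check on the theorem, consider the simplest choice of C , a machine computing c(σ, ω) = σ . Then the resulting D computes d(ω) = c(str(D), ω) = str(D), so D ignores its input and prints its own source; we recover the self-printing combination of A and B from above.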

Verbing In English you can accomplish a self-reference with, “This sentence has
32 characters.” But formal languages such as programming languages usually don’t
have a self-reference operator like the ‘this’ in that sentence. The above discussion
shows that no such operator is necessary. We can also use those techniques in
English, as here.
Print out two copies of the following, the second in quotes: “Print out two copies of
the following, the second in quotes:”
The verb ‘to quine’ means “to write a sentence fragment a first time, and then to write it a second time, but with quotation marks around it.” For example, from ‘say’ we get “say ‘say’.” And, quining ‘quine’ gives “quine ‘quine’.”
In this linguistic analogy of the self-reproducing programs, the word plays the
role of the data, the part played by the machine A or the part played by try3.c’s
string char *e. In the slogan “Produce the machine, and then do the machine,”
they are the ‘produce’ part. The machine B plays the role of the verb ‘quine’, and is
the ‘do’ part.
Reflections on Trusting Trust K Thompson is one of the two main cre-
ators of the UNIX operating system. For this and other accomplishments
he won the Turing Award, the highest honor in computer science. He
began his acceptance address with this.
In college, before video games, we would amuse ourselves by posing
programming exercises. One of the favorites was to write the shortest
self-reproducing program. . . .
More precisely stated, the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on your own. The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about “shortest” was just an incentive to demonstrate skill and determine a winner.
Ken Thompson b 1943
This celebrated essay develops a quine and goes on to show how the existence
of such code poses a security threat that is very subtle and just about undetectable.
The entire address (Thompson 1984) is widely available; everyone should read it.

II.C Exercises
C.2 Produce a Scheme quine.
C.3 Produce a Python quine.
C.4 Consider a Scheme function diag that is given a string σ and returns a
string with each instance of x in σ replaced with a quoted version of σ . Thus
diag("hello x world") returns hello ’hello x world’ world. Show that
print(diag('print(diag(x))')) is a quine.
C.5 Write a program that defines a function f taking a string as input, and
produces its output by applying f to its source code. For example, if f reverses
the given string, then the program should output its source code backwards.
C.6 Write a two-level polyglot quine, a program in one language that outputs a
program in a second language, which outputs the original program.

Extra
II.D Busy Beaver
Here is a try at solving the Halting problem: “For any n ∈ N the set of Turing
machines having n many tuples or fewer is finite. For some members of this set
Pe (e) halts and for some members it does not, but because the set is finite the list of
which Turing machines halt must also be finite. Finite sets are computable. So to
solve the Halting problem, given a Turing Machine P , find how many instructions
it has and just compute the associated finite halting information set.” The problem
with this plan is uniformity, or rather lack of it — there is no single computable
function that accepts inputs of the form ⟨n, e⟩ and that outputs 1 if the n -instruction
machine Pe (e) halts, or 0 otherwise.
The natural adjustment of that plan, the uniform attack, is to start all of the
machines having n or fewer instructions and dovetail their computations until no
more of them will ever converge.
That is, consider D : N → N, where D(n) is the minimal number of steps after
which all of the n -instruction machines that will ever converge have done so. We
can prove that D is not computable. For, assume otherwise. Then to compute
whether Pe halts on input e , find how many instructions n are in the machine Pe ,
compute D(n), and run Pe (e) for D(n)-many steps. If Pe (e) has not halted by then,
it never will. Of course, this contradicts the unsolvability of the Halting problem.
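To make the contradiction concrete, here is a sketch in Scheme, in the style of the code in Extra E below. Everything in it is hypothetical: D is the routine that cannot exist, instruction-count is assumed to give the number of instructions of the machine with a given index, and halted-within? is an assumed simulator that runs a machine on an input for a given number of steps and reports whether it has halted by then.

;; halts?  Hypothetical decider for the Halting problem.  It cannot
;; exist, because it calls the uncomputable function D.
(define (halts? e)
  (let* ((n (instruction-count e))   ; how many instructions P_e has
         (bound (D n)))              ; all halting n-instruction machines
                                     ;   halt within this many steps
    (halted-within? e e bound)))     ; did P_e, run on input e, halt by then?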
The function D may seem like just another uncomputable function; why is it
especially enlightening? Observe that if a function D̂ has values larger than D , if
D̂(n) ≥ D(n) for all sufficiently large n, then D̂ is also not computable. This gives
us an insight into one way that functions can fail to be computable: they can grow
too fast.†
So, which n -line program is the most productive? The Busy Beaver problem is: which n -state Turing Machine leaves the most 1’s after halting, when started on an empty tape?
Think of this as a competition — who can write the busiest machine? To have a competition we need precise rules, which differ in unimportant ways from the conventions we have adopted in this book. So we fix a definition of Turing Machines where there is a single tape that is unbounded at one end, there are two tape symbols 1 and B, and where transitions are of the form ∆(state, tape symbol) = ⟨state, tape symbol, head shift⟩ .
Busy Beaver is unsolvable Write Σ(n) for the largest number of 1’s that any n -state machine, when started on a blank tape, leaves on the tape after halting.
Write S(n) for the most moves, that is, transitions.
Why isn’t Σ computable? The obvious thing is to do a breadth-first search: there
are finitely many n -state machines, start them all on a blank tape, and await

† Note the connection with the Ackermann function: we showed that it is not primitive recursive because
it grows faster than any primitive recursive function.

developments.
That won’t work because some of the machines won’t halt. At any
moment you have some machines that have halted and you can see
how many 1’s are on each such tape, so you know the longest so far.
But as to the not-yet-halted ones, who knows? You can see by hand that this one or that one will never halt and so you can figure out the
answer for n = 1 or n = 2. But there is no algorithm to decide the
question for an arbitrary number of states.
D.1 Theorem (Radó, 1962) The function Σ is not computable.
Tibor Radó 1895–1965
Proof Let f : N → N be computable. We will show that Σ ≠ f by showing that Σ(n) > f (n) for infinitely many n .
First note that there is a Turing Machine Mj having j many states that writes j -many 1’s to a blank tape. For instance, here is M4 .
q0 → q1 → q2 → q3 → Halt
Each state has a B,1 self-loop, so that on reading a blank it writes a 1, and a 1,R arrow to the next state, so that on reading a 1 it moves right and moves on.

Also note that we can compose two Turing machines, by combining the final states of the first machine with the start state of the second, so that where the first machine would halt, the second takes over.

Let F : N → N be this function.

F (m) = (f (0) + 02 ) + (f (1) + 12 ) + (f (2) + 22 ) + · · · + (f (m) + m 2 )

It has the properties: if 0 < m then f (m) < F (m), and m 2 ≤ F (m), and F (m) <
F (m + 1). It is intuitively computable so Church’s Thesis says there is a Turing
machine MF that computes it. Let that machine have n F many states.
Now consider the Turing machine P that performs Mj and follows that with
the machine MF , and then follows that with another copy of the machine MF . If
started on a blank tape this machine will first produce j -many 1’s, then produce
F (j)-many 1’s, and finally will leave the tape with F (F (j))-many 1’s. Thus its
productivity is F (F (j)). It has j + 2n F many states.
Compare that with the j + 2n F -state Busy Beaver machine. By definition
F (F (j)) ≤ Σ(j + 2n F ). Because n F is constant (it is the number of states in the
machine MF ), the relation j + 2n F ≤ j 2 < F (j) holds for sufficiently large j .
Since F is strictly increasing, F (j + 2n F ) < F (F (j)). Combining gives f (j + 2n F ) <
F (j + 2n F ) < F (F (j)) ≤ Σ(j + 2n F ), as required.

What is known That Σ(0) = 0 and Σ(1) = 1 follow straight from the definition.
(The convention is to not count the halt state, so Σ(0) refers to a machine consisting
only of a halting state.) Radó noted in his 1962 paper that Σ(2) = 4. In 1964 Radó
and Lin showed that Σ(3) = 6.
D.2 Example This is the three state Busy Beaver machine; here q 3 is the halting state.

∆     B            1
q0    q 1 , 1, R   q 3 , 1, R
q1    q 2 , B, R   q 1 , 1, R
q2    q 2 , 1, L   q 0 , 1, L
In 1983 A Brady showed that Σ(4) = 13. As to Σ(5), even today no one knows.
Here are the current world records.

n       1   2   3    4     5               6
Σ(n)    1   4   6    13    ≥ 4 098         ≥ 1.29 × 10^865
S(n)    1   6   21   107   ≥ 47 176 870    ≥ 3 × 10^1730

Not only are Busy Beaver numbers very hard to compute, at some point they become impossible. In 2016, A Yedidia and S Aaronson obtained an n for which Σ(n) is unknowable. To do that, they created a programming language where programs compile down to Turing machines. With this, they constructed a 7918-state Turing machine that halts if there is a contradiction within the standard axioms for Mathematics, and never halts if those axioms are consistent. We believe that these axioms are consistent, so we believe that this machine doesn’t halt. However, Gödel’s Second Incompleteness Theorem shows that there is no way to prove the axioms are consistent using the axioms themselves, so Σ(n) is unknowable in that even if we were given the number Σ(n), we could not use our axioms to prove that it is right, because doing so would require proving that this machine never halts.
So one way for a function to fail to be computable is if it grows faster than
any computable function. Note, however, that this is not the only way. There are
functions that grow slower than some computable function but are nonetheless not
computable.

II.D Exercises
✓ D.3 Give the computation history, the sequence of configurations, that come from
running the three-state Busy Beaver machine. Hint: you can run it on the Turing
machine simulator.
✓ D.4 (a) How many Turing machines with tape alphabet { B, 1 } are there having
one state? (b) Two? (c) How many with n states?
D.5 How many Turing machines are there, with a tape alphabet Σ of n characters
and having n states?

D.6 Show that there are uncomputable functions that grow slower than some
computable function. Hint: There are uncountably many functions with output in
the set B.
D.7 Give a diagonal construction of a function that is greater than any computable
function.

Extra
II.E Cantor in Code

The definitions of cardinality and countability do not require that the functions be effective. In this section we effectivize, counting sets such as { 0, 1 } × N and N × N using functions that are mechanically computable. The most straightforward way to show that these functions can be computed is to exhibit code, so here it is.
Scheme’s let creates a local variable.
(use numbers)

;; triangle-num return 1+2+3+..+n


(define (triangle-num n)
(/ (* (+ n 1)
n)
2))

;; cantor Cantor number of the pair (x,y) of integers


(define (cantor x y)
(let ((d (+ x y)))
(+ (triangle-num d)
x)))

Use this code in the natural way.


#;1> (include "godelnumbering.scm")
; including godelnumbering.scm ...
#;2> (cantor 1 2)
7

We will need both the map and its inverse, which goes from the number to
the pair. Here is the routine that inverts cantor. The let* variant allows us to
compute the local variable t by using the local variable d computed before it, in
the prior line.
;; xy given the cantor number, return (x y)
(define (xy c)
(let* ((d (diag-num c))
(t (triangle-num d)))
(list (- c t)
(- d (- c t)))))

This is a sample use.


#;1> (include "godelnumbering.scm")
; including godelnumbering.scm ...
#;2> (xy 7)
(1 2)

The xy routine depends on a diag-num to compute the number of the diagonal.


For that, where the diagonal is d(x, y) = x + y , let the associated triangle number be t(x, y) = d(d + 1)/2 = (d^2 + d)/2. Then 0 = d^2 + d − 2t . Applying the familiar formula (−b ± √(b^2 − 4ac))/(2a) gives

d = (−1 + √(1 − 4 · 1 · (−2t)))/(2 · 1) = (−1 + √(1 + 8t))/2

(of the ‘±’, we keep only the ‘+’ because the other root is negative). Not every pair corresponds to a triangle number so to find the number of the diagonal lying before the pair ⟨x, y⟩ with cantor(x, y) = c , take the floor d = ⌊(−1 + √(1 + 8c))/2⌋ .†
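For example, we saw that the pair ⟨1, 2⟩ has Cantor number c = 7. Here 1 + 8c = 57 and √57 ≈ 7.55, so d = ⌊(−1 + 7.55)/2⌋ = 3. The associated triangle number is t = 6, giving x = c − t = 1 and y = d − x = 2, recovering the pair.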
(define (diag-num c)
(let ((s (exact-integer-sqrt (+ 1 (* 8 c)))))
(floor-quotient (- s 1)
2)))

Extending to triples is straightforward.


;; cantor-3 number triples
(define (cantor-3 x0 x1 x2)
(cantor x0 (cantor x1 x2)))

; xy-3 Return the triple that gave (cantor-3 x0 x1 x2) => c


(define (xy-3 c)
(cons (car (xy c))
(xy (cadr (xy c)))))

Using those routines is also straightforward.


#;2> (cantor-3 1 2 3)
172
#;3> (xy-3 172)
(1 2 3)

Turing machine instructions are four-tuples so we are interested in those.


;; cantor-4 Number quads
(define (cantor-4 x0 x1 x2 x3)
(cantor x0 (cantor-3 x1 x2 x3)))

; xy-4 Un-number quads: give (x0 x1 x2 x3) so that (cantor-4 x0 x1 x2 x3) => c
(define (xy-4 c)
(let ((pr (xy c)))
(cons (car pr)
(xy-3 (cadr pr)))))
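As with the pair and triple routines, these invert each other. A quick check:

#;2> (cantor-4 0 0 0 1)
1
#;3> (xy-4 1)
(0 0 0 1)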

What the heck, let’s extend to tuples of any size. We don’t need these but they
are fun. The cantor-n routine takes a tuple of any length and outputs the Cantor
number of that tuple. Also there is xy-arity that takes two inputs, the length of a
tuple and its Cantor number, and produces the tuple.

† The code for diag-num has two implementation details of interest. One is that in Scheme the floor
function returns a floating point number. We want xy to be the inverse of cantor, which inputs
integers, so we want diag-num to return an integer. That explains the inexact->exact conversion.
The second detail is that the code leads to numbers large enough to give floating point overflows.
For instance, (cantor-n 1 2 3 4 5 6 7) returns 1.05590697087673e+55. So the code shown has
the naive version of diag-num commented out and instead uses a library for bignums, integers of
unbounded size.

;; These routines generalize: number any tuple, or find the tuple corresponding
;; to a number.
;; The only ugliness is that the empty tuple is unique, so there is only
;; one tuple of that arity.

;; cantor-n number any-sized tuple


(define (cantor-n . args)
(cond ((null? args) 0)
((= 1 (length args)) (car args))
((= 2 (length args)) (cantor (car args) (cadr args)))
(else
(cantor (car args) (apply cantor-n (cdr args))))))

;; xy-arity return the list of the given arity making the cantor number c
;; If arity=0 then only c=0 is valid (others return #f)
(define (xy-arity arity c)
(cond ((= 0 arity)
(if (= 0 c )
'()
(begin
(display "ERROR: xy-arity with arity=0 requires c=0") (newline)
#f)))
((= 1 arity) (list c))
(else (cons (car (xy c))
(xy-arity (- arity 1) (cadr (xy c)))))))
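A quick check that these agree with the fixed-arity routines above:

#;2> (cantor-n 1 2 3)
172
#;3> (xy-arity 3 172)
(1 2 3)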

The xy-arity routine is not uniform in that it covers only one arity at a time.
Said another way, xy-arity is not the inverse of cantor-n in that we have to tell
it the tuple’s arity.
To cover tuples of all lengths we define two matched routines, cantor-omega and xy-omega, that communicate using a simple data structure, a pair where the first element is the length of the tuple and the second is the tuple’s Cantor number. These two are correspondences between the natural numbers and the set of sequences of natural numbers. They are inverses of each other.
;; cantor-omega encode the arity in the first component
(define (cantor-omega . tuple)
(let ((arity (length tuple)))
(cond ((= arity 0) (cantor 0 0))
((= arity 1) (cantor 0 (+ 1 (car tuple))))
(else
(let ((newtuple (list (- arity 1)
(apply cantor-n tuple))))
(apply cantor newtuple))))))

;; xy-omega Inverse of cantor-omega


(define (xy-omega c)
(let* ((pr (xy c))
(a (car pr))
(cantor-number (cadr pr)))
(cond
((and (= a 0)
(= cantor-number 0)) '())
((= a 0) (list (- cantor-number 1)))
(else (xy-arity (+ 1 a) cantor-number)))))

This shows their use.


#;4> (xy-omega 0)
()
#;5> (xy-omega 1)
(0)
#;6> (xy-omega 2)
(0 0)
#;7> (xy-omega 3)
(1)
#;8> (xy-omega 4)
(0 1)
#;9> (xy-omega 5)
(0 0 0)
#;10> (cantor-omega 1 2 3 4)
12693900784
#;11> (xy (cantor-omega 1 2 3 4))
(4 159331)
#;12> (xy-omega (cantor-omega 1 2 3 4))
(1 2 3 4)

Numbering Turing machines We will define a correspondence between natural numbers and Turing machines via Scheme code.
We represent an instruction with a four-tuple of natural numbers. In the code
below this is a quad. A Turing machine is then represented as a list, a quadlist.
To go from a number to a quadlist we first apply xy-omega. This turns the
number into a sequence of numbers, below called a numlist. We convert each of
its numbers into a quad to get a quadlist.
;; natural->quad Return the quad corresponding to the natural number
;; quad->natural Return the natural matching the quad
(define (natural->quad n)
(xy-4 n))
(define (quad->natural q)
(apply cantor-4 q))

;; numlist->quadlist Convert list of naturals to list of corresponding quads


;; quadlist->numlist Convert list of quads to list of corresponding naturals
(define (numlist->quadlist nlist)
(map natural->quad nlist))
(define (quadlist->numlist qlist)
(map quad->natural qlist))

;; get-nth-quadlist Get the quadlist with the given cantor number


(define (get-nth-quadlist n)
(numlist->quadlist (xy-omega n)))

This illustrates. The last line associates the number 2558 with the three-element
quadlist.
#;1> (natural->quad 1)
(0 0 0 1)
#;2> (natural->quad 2)
(1 0 0 0)
#;3> (natural->quad 3)
(0 1 0 0)
#;4> (cantor-omega 3 2 1)
2558
#;5> (get-nth-quadlist 2558)
((0 1 0 0) (1 0 0 0) (0 0 0 1))

A Turing machine is a set, so to check if a quadlist is a Turing machine we must check that no two quads are the same. In addition, a Turing machine is deterministic so we must check that different quads begin with different first two numbers. The second condition implies the first so we check here only the second.
This routine checks for determinism by sorting the quadlist alphabetically, so that if there are two quads beginning with the same pair they will then be adjacent. Checking for adjacent quads with the same first two elements only requires walking once down the list.
;; quadlist-is-deterministic? Is the list of quads deterministic?
;; qlist list of length 4 lists of numbers
(define (quadlist-is-deterministic? qlist)
(let ((sorted-qlist (sort qlist quad-less?)))
(quadlist-is-deterministic-helper sorted-qlist)))
;; quadlist-is-deterministic-helper look for adjacent quads with the same first pair
;; sq sorted list of quads
(define (quadlist-is-deterministic-helper sq)
(cond
((null? sq) #t)
((= 1 (length sq)) #t)
((first-two-equal? (car sq) (cadr sq)) #f)
(else (quadlist-is-deterministic-helper (cdr sq)))))

;; quadlist-is-tm Decide if a quadlist is a Turing machine


;; Deterministic implies being a set, so we check only determinism.
(define (quadlist-is-tm? qlist)
(quadlist-is-deterministic? qlist))
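For instance, a quadlist with two quads that agree in their first two entries is not deterministic, so it is not a Turing machine. (This assumes the sorting and comparison helpers quad-less? and first-two-equal?, not shown here, compare quads entry by entry.)

#;2> (quadlist-is-tm? '((0 0 0 1) (0 0 1 0)))
#f
#;3> (quadlist-is-tm? '((0 0 0 1) (0 1 0 0)))
#t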

We count Turing machines by brute force: we get the numlists in ascending


order — the one whose Cantor number is 0, the one whose number is 1, etc. — and
convert each to a quadlist. We test each to see if it is a Turing machine, and if
so assign it the next index. This routine picks out the quadlist that is a Turing
machine, and that corresponds to a numlist greater than or equal to the input
argument.
;; tm-next Return the TM whose numlist is the first greater than or equal to n
(define (tm-next n)
(do ((c n (+ c 1)))
((quadlist-is-tm? (get-nth-quadlist c)) (list (get-nth-quadlist c) c))))
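For instance, the empty quadlist is vacuously deterministic, so it is the very first Turing machine:

#;2> (tm-next 0)
(() 0)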

With that, here is the function that takes in a Turing machine as a list of quads
and finds a natural number index for that Turing machine, along with the inverse
function, taking the machine to an index for its machine.
;; godel Return the index number of Turing machine tm
(define (godel tm)
(let ((c 0))
(do ((dex 0 (+ 1 dex)))
((equal? tm (car (tm-next c))) dex)
(set! c (+ 1 (cadr (tm-next c)))))))
; Here to reuse if a bug appears
; (display "godel tm-next=") (write (tm-next dex)) (newline)

;; machine Return the Turing machine with index g


(define (machine g)
(let ((c 0))
(do ((dex 0 (+ 1 dex)))
((= dex g) (car (tm-next c)))
(set! c (+ 1 (cadr (tm-next c)))))))

Use is straightforward. The last one takes a few seconds.



#;2> (machine 0)
()
#;3> (machine 1)
((0 0 0 0))
#;4> (machine 2)
((0 0 0 1))
#;5> (machine 3)
((1 0 0 0))
#;6> (godel '((0 0 0 0)))
1
#;7> (godel '((0 1 1 1)))
298

Earlier we saw a Turing machine simulator. We can translate the machines written here to the earlier format. That is, we can interpret each of the quads above (a0 a1 a2 a3) as an instruction ⟨qp , Tp , Tn , qn ⟩ .
;; quad->tminstruction Convert a quad to an instruction for a TM
;; tminstruction->quad Inverse to the prior
(define (quad->tminstruction q)
(let ((zero (car q))
(one (cadr q))
(two (caddr q))
(three (cadddr q)))
(list (nat->inst-zero zero)
(nat->inst-one one)
(nat->inst-two two)
(nat->inst-three three))))
(define (tminstruction->quad i)
(let ((zero (car i))
(one (cadr i))
(two (caddr i))
(three (cadddr i)))
(list (inst->nat-zero zero)
(inst->nat-one one)
(inst->nat-two two)
(inst->nat-three three))))

;; quadlist->instructionlist ql convert a quadlist to a list of instructions


(define (quadlist->instructionlist ql)
(map quad->tminstruction ql))
(define (instructionlist->quadlist tm)
(map tminstruction->quad tm))

These rely on helper routines. Handling the states is trivial: for instance,
a0 = 0 and a3 = 0 translate to the state q 0 .
(define (nat->inst-zero i)
i)
(define (inst->nat-zero i)
i)

(define (nat->inst-three i)
i)
(define (inst->nat-three i)
i)

The present tape character Tp has three possibilities. It can be a blank, which
we associate with a1 = 0. Second, for readability we allow lower case letters a–z,
which we associate with a1 = 1 through a1 = 26. Finally, for higher-numbered

a1’s we just punt and write them as natural numbers. For instance, a1 = 27 is
associated with Tp = 0.
(define ASCII-a (char->integer #\a))

(define (nat->inst-one i)
(cond
((= i 0) #\B)
((and (> i 0) (<= i 26))
(integer->char (+ (- i 1) ASCII-a)))
(else (- i 27))))
(define (inst->nat-one i)
(cond
((equal? i #\B) 0)
((char? i) (+ 1 (- (char->integer i) ASCII-a)))
(else (+ i 27))))
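For instance, these reflect the conventions just described:

#;2> (nat->inst-one 0)
#\B
#;3> (nat->inst-one 1)
#\a
#;4> (nat->inst-one 27)
0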

Note Scheme’s notation for characters, for instance #\a and #\B represent the
characters a and B.
The tape-next description Tn is much the same, except that it also can be L or R.
(define (nat->inst-two i)
(cond
((= i 0) #\L)
((= i 1) #\R)
((= i 2) #\B)
((and (> i 2) (<= i 28))
(integer->char (+ (- i 3) ASCII-a)))
(else (- i 29))))
(define (inst->nat-two i)
(cond
((equal? i #\L) 0)
((equal? i #\R) 1)
((equal? i #\B) 2)
((char? i) (+ 3 (- (char->integer i) ASCII-a)))
(else (+ i 29))))

The machine here is simple; if started on a blank tape it writes an a and then
halts in the next step.
#;1> (instructionlist->quadlist '((0 #\B #\a 0) (0 #\a #\a 1)))
((0 0 3 0) (0 1 3 1))

We could find its index number like this.


#;2> (godel (instructionlist->quadlist '((0 #\B #\a 0) (0 #\a #\a 1))))

The list (machine 0), (machine 1), . . . contains all the Turing machines.

II.E Exercises
E.1 The code for machine, the routine that inputs a natural number and produces
the Turing machine corresponding to that number, is slow. Find how long it
takes to produce Pn for the numbers n = 0, 100, ... , 700. You can use, e.g.,
(time (machine 100)). Graph n against the time.
E.2 What does Turing machine number 666 do? Does it halt on input 0? On
input 666?

E.3 The set of Turing machines can be numbered in ways other than the one
given here. One is to use the same coding of states and tape symbols but instead
of leveraging Cantor’s correspondence, it uses the powers of primes to get the
final index. For instance, the Turing machine P = {q 0 B1q 0 , q 0 11q 1 } has the
two quad’s (0 0 4 0) and (0 3 4 1). We can take the index of P to be the
natural number 21 31 55 71 111 134 175 192 (we add 1 to the exponents because if
we did not then we could not tell whether the four-tuple ⟨0, 0, 0, 0⟩ is one of
the instructions). (a) What are some advantages and disadvantages of the two
encodings? (b) Compute the index of the example P under this encoding.
Part Two
Automata

Chapter III. Languages
Turing machines input strings and output strings, sequences of tape symbols. So a
natural way to work is to represent a problem as a string, put it on the tape, run a
computation, and end with the solution as a string.
Everyday computers work the same way. Consider a program that finds the shortest driving distance between cities. Probably we work by inputting the map distances as strings of symbols and inputting the desired two cities as two strings, and after running the program we have the output directions as a string. So strings, and collections of strings, are essential.

Section III.1 Languages
Our machines input and output strings of symbols. We take a symbol (sometimes
called a token) to be an atomic unit that a machine can read and write.† On
everyday binary computers the symbols are the bits, 0 and 1. An alphabet is a
nonempty and finite set of symbols. We usually denote an alphabet with the upper
case Greek letter Σ, although an exception is the alphabet of bits, B = { 0, 1 }. A
string over an alphabet is a sequence of symbols from that alphabet. We use lower
case Greek letters such as σ and τ to denote strings. We use ε to denote the empty
string, the length zero sequence of symbols. The set of all strings over Σ is Σ∗ .‡
1.1 Definition A language L over an alphabet Σ is a set of strings drawn from that
alphabet. That is, L ⊆ Σ∗ .

1.2 Example The set of bitstrings that begin with 1 is L = { 1, 10, 11, 100, ... }.
1.3 Example Another language over B is the finite set { 1000001, 1100001 }.
1.4 Example Let Σ = { a, b }. The language consisting of strings where the number of
a’s is twice the number of b’s is L = {ε, aab, aba, baa, aaaabb, ... }.
1.5 Example Let Σ = { a, b, c }. The language of length-two strings over that alphabet is L = Σ² = { aa, ab, ba, ... , cc }. Over the same alphabet this is the language of strings of length three, each of which is sorted in ascending order.

{ aaa, bbb, ccc, aab, aac, abb, abc, acc, bbc, bcc }

(It is not that the set is sorted in ascending order, since sets don't have an order. Instead, each string has its characters come in ascending order.)

[Chapter image: The Tower of Babel, by Pieter Bruegel the Elder (1563)]
† We can imagine Turing's clerk calculating without reading and writing symbols, for instance by keeping track of information by having elephants move to the left side of a road or to the right. But we could translate any such procedure into one using marks that our mechanism's read/write head can handle. So readability and writeability are not essential but we require them in the definition of symbols as a convenience; after all, elephants are inconvenient. ‡ For more on strings see the Appendix on page 354.
1.6 Definition A palindrome is a string that reads the same forwards as backwards. Some words from English that are palindromes are ‘kayak’, ‘noon’, and ‘racecar’.
1.7 Example The language of palindromes over Σ = { a, b } is L = {σ ∈ Σ∗ | σ = σ^R }. A few members are abba, aaabaaa, and a.
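In code, we can test the palindrome property by comparing a string with its reversal. This is a sketch; the name palindrome? is invented here.

;; palindrome?  Does the string read the same forwards as backwards?
(define (palindrome? s)
  (let ((chars (string->list s)))
    (equal? chars (reverse chars))))

For instance, (palindrome? "abba") evaluates to #t while (palindrome? "ab") evaluates to #f.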


1.8 Example Let Σ = { a, b, c }. Pythagorean triples ⟨i, j, k⟩ ∈ N³ are those where i² + j² = k². A few such triples are ⟨3, 4, 5⟩, ⟨5, 12, 13⟩, and ⟨8, 15, 17⟩. One way to describe Pythagorean triples is with this language.

L = { aⁱbʲcᵏ ∈ Σ∗ | i, j, k ∈ N and i² + j² = k² }
  = { aaabbbbccccc = a³b⁴c⁵, a⁵b¹²c¹³, a⁸b¹⁵c¹⁷, ... }


1.9 Example The empty set is a language L = { } over any alphabet. So is the set
whose single element is the empty string L̂ = {ε }. These two languages are
different, because the first has no members.
We can think that a natural language such as English consists of sentences, which are strings of words from a dictionary. Here Σ is the set of dictionary words and σ is a sentence. This explains the definition of “language” as a set of strings. Of course, our definition allows a language to be any set of strings at all, while in English you can't form a sentence by just taking any crazy sequence of words; a sentence must be constructed according to rules. We will study sets of rules, grammars, later in this chapter.
1.10 Definition A collection of languages is a class.

1.11 Example Fix an alphabet Σ. The collection of all finite languages over that
alphabet is a class.
1.12 Example Let Pe be a Turing machine, using the input alphabet Σ = { B, 1 }. The set of strings Le = {σ ∈ Σ∗ | Pe halts on input σ } is a language. The collection of all such languages, of the Le for all e ∈ N, is the class of computably enumerable languages over Σ.
We next consider operations on languages. They are sets so the operations
of union, intersection, etc., apply. However, for instance the union of a language
over { a }∗ with a language over { b }∗ is an awkward marriage, a combination of
strings of a’s with strings of b’s. That is, the union of a language over Σ0 with
a language over Σ1 is a language over Σ0 ∪ Σ1 . The same thing happens for
intersection.

Other operations on languages are extensions of operations on strings. We define the concatenation of languages to be the language of concatenations, etc.
1.13 Definition (Operations on languages) The concatenation of languages, L0⌢L1 or L0L1, is the set of concatenations, {σ0⌢σ1 | σ0 ∈ L0 and σ1 ∈ L1 }. For any language, the power, Lᵏ, is the language consisting of the concatenation of k-many members, Lᵏ = {σ0⌢ ··· ⌢σk−1 | σi ∈ L } when k > 0. In particular, L¹ = L. For k = 0, we take L⁰ = {ε }.† The Kleene star of a language, L∗, is the language consisting of the concatenation of any number of strings.

L∗ = {σ0⌢ ··· ⌢σk−1 | k ∈ N and σ0, ... , σk−1 ∈ L }

This includes the concatenation of 0-many strings, so that ε ∈ L∗ if L ≠ ∅.


The reversal, L^R, of a language L is the language of reversals, L^R = {σ^R | σ ∈ L }.

1.14 Example Where the language is the set of bitstrings L = { 1000001, 1100001 } then the reversal is L^R = { 1000001, 1000011 }.
1.15 Example If the language L consists of two strings { a, bc } then the second power of that language is L² = { aa, abc, bca, bcbc }. Its Kleene star is this.

L∗ = {ε, a, bc, aa, abc, bca, bcbc, aaa, ... }
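For finite languages these operations are computable directly. This sketch represents a language as a list of strings; the names are invented here, and a fuller version would remove duplicate strings from the results.

;; lang-concat  The concatenation of two finite languages.
(define (lang-concat l0 l1)
  (apply append
         (map (lambda (s0)
                (map (lambda (s1) (string-append s0 s1)) l1))
              l0)))

;; lang-power  The k-th power; the 0-th power is the language holding
;; only the empty string.
(define (lang-power l k)
  (if (= k 0)
      '("")
      (lang-concat l (lang-power l (- k 1)))))

;; lang-reverse  The language of reversals.
(define (lang-reverse l)
  (map (lambda (s) (list->string (reverse (string->list s)))) l))

With the language of Example 1.15, (lang-power '("a" "bc") 2) gives ("aa" "abc" "bca" "bcbc"). No such routine can list that language's Kleene star in full, since it is infinite.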


1.16 Remark Here are two points about Kleene star. Earlier, for an alphabet Σ we defined Σ∗ to be the set of strings over that alphabet, of any length. The two definitions agree if we take each character in the alphabet to be a length-one string.
Also, we could define the operation of repeatedly choosing strings from the language in two ways. We could choose a string σ from the language and then replicate, getting the set of σᵏ's. Or, we could repeat choosing strings from the language, getting σ0⌢σ1⌢ ··· ⌢σk−1's. The second case is more useful and that's the definition of L∗.
We close with a comment that bears on how we will use languages in later
chapters. We will say that a machine decides a language if that machine computes
whether or not its input is a member of the language. However, we have seen
the distinction between computable and computably enumerable, that for some
sets there is a machine that determines in a finite time, for all inputs, if the input
is a member of that set, but no machine can determine in a finite time for all
inputs that the input is not in the set. We will say that a machine recognizes (or
accepts, or semidecides) a language when, given an input, if the input is in the
language then that machine is able to compute that fact, and if the input is not
in the language then the machine will never incorrectly report that it is. (The
machine may either explicitly determine that it is not, or simply fail to report a
conclusion, perhaps by failing to halt.) In short, deciding a language means that on any input the machine correctly computes all ‘yes’ and all ‘no’ answers, while recognizing a language requires only that it correctly computes all ‘yes’ answers.

† For technical convenience we take L⁰ = {ε } even when L = ∅; see Exercise 1.36.

III.1 Exercises
1.17 List five of the shortest strings in each language, if there are five.
(a) {σ ∈ B∗ | the number of 0's plus the number of 1's equals 3 }
(b) {σ ∈ B∗ | σ's first and last characters are equal }

✓ 1.18 Is the set of decimal representations of real numbers a language?


1.19 Which of these is a palindrome: ()() or )(()? (a) Only the first (b) Only
the second (c) Both (d) Neither
✓ 1.20 Show that if β is a string then β ⌢ β R is a palindrome. Do all palindromes
have that form?
✓ 1.21 Let L0 = {ε, a, aa, aaa } and L1 = {ε, b, bb, bbb }. (a) List all the members of L0⌢L1. (b) List all the members of L1⌢L0. (c) List all the members of L0². (d) List ten members, if there are ten, of L0∗.
✓ 1.22 List five members of each language, if there are five, and if not list them all.
(a) {σ ∈ { a, b }∗ | σ = aⁿb for n ∈ N } (b) {σ ∈ { a, b }∗ | σ = aⁿbⁿ for n ∈ N }
(c) { 1ⁿ0ⁿ⁺¹ ∈ B∗ | n ∈ N } (d) { 1ⁿ0²ⁿ1 ∈ B∗ | n ∈ N }

✓ 1.23 Where L = { a, ab }, list each. (a) L² (b) L³ (c) L¹ (d) L⁰
1.24 Where L0 = { a, ab } and L1 = { b, bb } find each. (a) L0⌢L1 (b) L1⌢L0 (c) L0² (d) L1² (e) L0²⌢L1²
1.25 Suppose that the language L0 has three elements and L1 has two. Knowing only that information, for each of these, what is the least number of elements possible and what is the greatest number possible? (a) L0 ∪ L1 (b) L0 ∩ L1 (c) L0⌢L1 (d) L1² (e) L1^R (f) L0∗ ∩ L1∗
1.26 Let L = { a, b }. Why is L⁰ defined to be {ε }? Why not ∅?
1.27 What is the language that is the Kleene star of the empty set, ∅∗?
✓ 1.28 Is the k -th power of a language the same as the language of k -th powers?
1.29 Does L∗ differ from (L ∪ {ε })∗ ?
1.30 We can ask how many elements are in the set L².
(a) Prove that if two strings are unequal then their squares are also unequal. Conclude that if L has k-many elements then L² has at least k-many elements.
(b) Provide an example of a nonempty language that achieves this lower bound.
(c) Prove that where L has k-many elements, L² has at most k²-many.
(d) Provide an example, for each k ∈ N, of a language that achieves this upper bound.
1.31 Prove that L∗ = L⁰ ∪ L¹ ∪ L² ∪ ···.
1.32 Consider the empty language L0 = ∅. For any language L1, describe L1⌢L0.

1.33 A language L over some Σ is finite if |L| < ∞.
(a) If the language is finite must the alphabet be finite?
(b) Show that there is some bound B ∈ N where |σ| ≤ B for all σ ∈ L.
(c) Show that the class of finite languages is closed under finite union. That is, show that if L0, ... , Lk are finite languages over a shared alphabet for some k ∈ N then their union is also finite.
(d) Show also that the class of finite languages is closed under finite intersection and finite concatenation.
(e) Show that the class of finite languages is not closed under complementation or Kleene star.
1.34 What is the difference between the languages L = {σ ∈ Σ∗ | σ = σ^R } and L̂ = {σ⌢σ^R | σ ∈ Σ∗ }?
1.35 For any language L ⊆ Σ∗ we can form the set of prefixes.

Pref(L) = {τ ∈ Σ∗ | σ ∈ L and τ is a prefix of σ }

Where Σ = { a, b } and L = { abaaba, bba }, find Pref(L).


1.36 This explains why we define L⁰ = {ε } even when L = ∅.
(a) Show that Lᵐ⌢Lⁿ = Lᵐ⁺ⁿ for any m, n ∈ N⁺.
(b) Show that if L0 = ∅ then L0⌢L1 = L1⌢L0 = ∅.
(c) Argue that if L ≠ ∅ then the only sensible definition for L⁰ is {ε }.
(d) Why would L = ∅ throw a monkey wrench in the works unless we define L⁰ = {ε }?
1.37 Prove these for any alphabet Σ. (a) For any natural number n the language Σⁿ is countable. (b) The language Σ∗ is countable.
1.38 Another way of defining the powers of a language is: L⁰ = {ε }, and Lᵏ⁺¹ = Lᵏ⌢L. Show this is equivalent to the one given in Definition 1.13.
1.39 True or false: if L⌢L = L then either L = ∅ or ε ∈ L? If it is true then prove it and if it is false give a counterexample.
1.40 Prove that no language contains a representation for each real number.
1.41 The operations of languages form an algebraic system. Assume these languages are over the same alphabet. Show each.
(a) Language union and intersection are commutative, L0 ∪ L1 = L1 ∪ L0 and L0 ∩ L1 = L1 ∩ L0.
(b) The language consisting of the empty string is the identity element with respect to language concatenation, so L⌢{ε } = L and {ε }⌢L = L.
(c) Language concatenation need not be commutative; there are languages such that L0⌢L1 ≠ L1⌢L0.
(d) Language concatenation is associative, (L0⌢L1)⌢L2 = L0⌢(L1⌢L2).
(e) (L0⌢L1)^R = L1^R⌢L0^R.
(f) Concatenation is left distributive over union, (L0 ∪ L1)⌢L2 = (L0⌢L2) ∪ (L1⌢L2), and also right distributive.
(g) The empty language is an annihilator for concatenation, ∅⌢L = L⌢∅ = ∅.
(h) The Kleene star operation is idempotent, (L∗)∗ = L∗.

Section III.2 Grammars
We have defined that a language is a set of strings. But this allows for any willy-nilly
set. In practice a language is usually given by rules.
Here is an example. Native English speakers will say that the noun phrase
“the big red barn” sounds fine but that “the red big barn” sounds wrong. That is,
sentences in natural languages are constructed in patterns and the second of those
does not follow the English pattern. Artificial languages such as programming
languages also have syntax rules, usually very strict rules.
A grammar is a set of rules for the formation of strings in a language; that is, it is an analysis of the structure of a language. In an aphorism, grammars are the language of languages.

Definition Before the formal definition we’ll first see an example.


2.1 Example This is a subset of the rules for English: (1) a sentence can be made
from a noun phrase followed by a verb phrase, (2) a noun phrase can be made
from an article followed by a noun, (3) a noun phrase can also be made from an
article then an adjective then a noun, (4) a verb phrase can be made with a verb
followed by a noun phrase, (5) one article is ‘the’, (6) one adjective is ‘young’,
(7) one verb is ‘caught’, (8) two nouns are ‘man’ and ‘ball’.
This is a convenient notation for the rules just listed.
⟨sentence⟩ → ⟨noun phrase⟩ ⟨verb phrase⟩
⟨noun phrase⟩ → ⟨article⟩ ⟨noun⟩
⟨noun phrase⟩ → ⟨article⟩ ⟨adjective⟩ ⟨noun⟩
⟨verb phrase⟩ → ⟨verb⟩ ⟨noun phrase⟩
⟨article⟩ → the
⟨adjective⟩ → young
⟨verb⟩ → caught
⟨noun⟩ → man | ball
Each line is a production or rewrite rule. Each has one arrow, →.† To the left of
each arrow is a head and to the right is a body or expansion. Sometimes two rules
have the same head, as with ⟨noun phrase⟩ . There are also two rules for ⟨noun⟩
but we have abbreviated by combining the bodies using the ‘ | ’ pipe symbol.‡
† Read the arrow aloud as “may produce,” or “may expand to,” or “may be constructed as.” ‡ Read ‘|’ aloud as “or.”

The rules use two different components. The ones written in typewriter type,
such as young, are from the alphabet Σ of the language. These are terminals. The
ones written with angle brackets and in italics, such as ⟨article⟩ , are nonterminals.
These are like variables, and are used for intermediate steps.
The two symbols ‘→’ and ‘|’ are neither terminals nor nonterminals. They are
metacharacters, part of the syntax of the rules themselves.
These rewrite rules govern the derivation of strings in the language. Under
the English grammar every derivation starts with ⟨sentence⟩ . Along the way,
intermediate strings contain a mix of nonterminals and terminals. The rules
all have a head with a single nonterminal. So to derive the next string, pick a
nonterminal in the present string and substitute an associated rule body.
⟨sentence⟩ ⇒ ⟨noun phrase⟩ ⟨verb phrase⟩
⇒ ⟨article⟩ ⟨adjective⟩ ⟨noun⟩ ⟨verb phrase⟩
⇒ the ⟨adjective⟩ ⟨noun⟩ ⟨verb phrase⟩
⇒ the young ⟨noun⟩ ⟨verb phrase⟩
⇒ the young man ⟨verb phrase⟩
⇒ the young man ⟨verb⟩ ⟨noun phrase⟩
⇒ the young man caught ⟨noun phrase⟩
⇒ the young man caught ⟨article⟩ ⟨noun⟩
⇒ the young man caught the ⟨noun⟩
⇒ the young man caught the ball
Note that the single line arrow → is for rules, while the double line arrow ⇒ is for derivations.†
The derivation above always substitutes for the leftmost nonterminal, so it is a leftmost derivation. However, in general we could substitute for any nonterminal. The derivation tree or parse tree is an alternative representation.‡

⟨sentence⟩
  ⟨noun phrase⟩
    ⟨article⟩
      the
    ⟨adjective⟩
      young
    ⟨noun⟩
      man
  ⟨verb phrase⟩
    ⟨verb⟩
      caught
    ⟨noun phrase⟩
      ⟨article⟩
        the
      ⟨noun⟩
        ball

† Read ‘⇒’ aloud as “derives” or “expands to.” ‡ The terms ‘terminal’ and ‘nonterminal’ come from where the components lie in this tree.

2.2 Definition A context-free grammar is a four-tuple G = ⟨Σ, N, S, P⟩. First, Σ is an alphabet, whose elements are the terminal symbols. Second, N is a set of nonterminals or syntactic categories. (We assume that Σ and N are disjoint and that neither contains metacharacters.) Third, S ∈ N is the start symbol. Fourth, P is a set of productions or rewrite rules.
We will take the start symbol to be the head of the first rule.
2.3 Example This context free grammar describes algebraic expressions that involve
only addition, multiplication, and parentheses.
⟨expr⟩ → ⟨term⟩ + ⟨expr⟩ | ⟨term⟩
⟨term⟩ → ⟨term⟩ * ⟨factor⟩ | ⟨factor⟩
⟨factor⟩ → ( ⟨expr⟩ ) | a | b | . . . | z
Here is a derivation of the string x*(y+z), along with the parse tree.

⟨expr⟩ ⇒ ⟨term⟩
  ⇒ ⟨term⟩ * ⟨factor⟩
  ⇒ ⟨factor⟩ * ⟨factor⟩
  ⇒ x * ⟨factor⟩
  ⇒ x * ( ⟨expr⟩ )
  ⇒ x * ( ⟨term⟩ + ⟨expr⟩ )
  ⇒ x * ( ⟨term⟩ + ⟨term⟩ )
  ⇒ x * ( ⟨factor⟩ + ⟨term⟩ )
  ⇒ x * ( ⟨factor⟩ + ⟨factor⟩ )
  ⇒ x * ( y + ⟨factor⟩ )
  ⇒ x*(y+z)

⟨expr⟩
  ⟨term⟩
    ⟨term⟩
      ⟨factor⟩
        x
    *
    ⟨factor⟩
      (
      ⟨expr⟩
        ⟨term⟩
          ⟨factor⟩
            y
        +
        ⟨expr⟩
          ⟨term⟩
            ⟨factor⟩
              z
      )

In that example the rules for ⟨expr⟩ and ⟨term⟩ are recursive. But we don’t
get stuck in an infinite regress because the question is not whether you could
perversely keep expanding ⟨expr⟩ forever; the question is whether, given a string
such as x*(y+z), you can find a terminating derivation.
In the prior example the nonterminals such as ⟨expr⟩ or ⟨term⟩ describe the
role of those components in the language, as did the English grammar fragment’s
⟨noun phrase⟩ and ⟨article⟩ . But in the examples and exercises below we often use
small grammars whose terminals and nonterminals do not have any particular
meaning. For these cases, we often move from the verbose notation like ‘ ⟨sentence⟩
→ ⟨noun phrase⟩ ⟨verb phrase⟩ ’ to writing single letters, with nonterminals in
upper case and terminals in lower case.
2.4 Example This two-rule grammar has one nonterminal, S.
S → aSb | ε
Here is a derivation of the string a²b².

S ⇒ aSb ⇒ aaSbb ⇒ aaεbb ⇒ aabb

Similarly, S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaεbbb ⇒ aaabbb is a derivation of a³b³. For this grammar, derivable strings have the form aⁿbⁿ for n ∈ N.
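We can mirror the two rules in a routine that produces the derivable strings. This is a sketch; the name a^nb^n is invented here.

;; a^nb^n  Build the string derived by applying S -> aSb n-many times
;; and then finishing with S -> epsilon.
(define (a^nb^n n)
  (if (= n 0)
      ""
      (string-append "a" (a^nb^n (- n 1)) "b")))

For instance, (a^nb^n 3) returns "aaabbb".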
We next give a complete description of how the production rules govern the
derivations. Each rule in a context free grammar has the form ‘head → body’
where the head consists of a single nonterminal. The body is a sequence of
terminals and nonterminals. Each step of a derivation has the form below, where
τ0 and τ1 are sequences of terminals and non-terminals.
τ0 ⌢ head ⌢τ1 ⇒ τ0 ⌢ body ⌢τ1
That is, if there is a match for the rule’s head then we can replace it with the body.
Where σ0 , σ1 are sequences of terminals and nonterminals, if they are related
by a sequence of derivation steps then we may write σ0 ⇒∗ σ1 . Where σ0 = S is
the start symbol, if there is a derivation σ0 ⇒∗ σ1 that finishes with a string of
terminals σ1 ∈ Σ∗ then we say that σ1 has a derivation from the grammar.†
This description is like the one on page 8 detailing how a Turing machine’s
instructions determine the evolution of the sequence of configurations that is a
computation. That is, production rules are like a program, directing a derivation.
However, one difference from that page’s description is that there Turing machines
are deterministic, so that from a given input string there is a determined sequence
of configurations. Here, from a given start symbol a derivation can branch out to
go to many different ending strings.
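We can express a single derivation step in code. In this sketch, whose names are invented here, a sentential form is a list mixing terminals (Scheme characters) and nonterminals (Scheme symbols), and a rule is a list whose first member is the head and whose remaining members are the body. The routine substitutes at the leftmost occurrence of the head, and returns the form unchanged if the head does not occur.

;; derive-step  Replace the leftmost occurrence of the rule's head with
;; the rule's body.  A rule such as S -> aSb is the list (S #\a S #\b).
(define (derive-step form rule)
  (let ((head (car rule))
        (body (cdr rule)))
    (cond ((null? form) '())
          ((equal? (car form) head)
           (append body (cdr form)))
          (else (cons (car form)
                      (derive-step (cdr form) rule))))))

For instance, with the grammar of Example 2.4, (derive-step '(#\a S #\b) '(S #\a S #\b)) gives (#\a #\a S #\b #\b), and the erasing rule gives (derive-step '(#\a S #\b) '(S)) equal to (#\a #\b).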
2.5 Definition The language derived from a grammar is the set of strings of
terminals having derivations that begin with the start symbol.

2.6 Example This grammar’s language is the set of representations of natural numbers.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
This is a derivation for the string 321, along with its parse tree.

⟨natural⟩ ⇒ ⟨digit⟩⟨natural⟩
  ⇒ 3 ⟨natural⟩
  ⇒ 3 ⟨digit⟩⟨natural⟩
  ⇒ 32 ⟨natural⟩
  ⇒ 32 ⟨digit⟩
  ⇒ 321

⟨natural⟩
  ⟨digit⟩
    3
  ⟨natural⟩
    ⟨digit⟩
      2
    ⟨natural⟩
      ⟨digit⟩
        1

† This definition of rules, grammars, and derivations suffices for us but it is not the most general one. One more general definition allows heads of the form σ0 X σ1, where σ0 and σ1 are strings of terminals. (The σi's can be empty.) For example, consider this grammar: (i) S → aBSc | abc, (ii) Ba → aB, (iii) Bb → bb. Rule (ii) says that if you see B followed by a then you can replace that substring with a followed by B. Grammars with heads of the form σ0 X σ1 are context sensitive because we can only substitute for X in the context of σ0 and σ1. These grammars describe more languages than the context free ones that we are using. But our definition satisfies our needs and is the class of grammars that you will see in practice.

2.7 Example This grammar’s language is the set of strings representing natural
numbers in unary.
⟨natural⟩ → ε | 1 ⟨natural⟩
2.8 Example Any finite language is derived from a grammar. This one gives the
language of all length 2 bitstrings, using the brute force approach of just listing all
the member strings.
S → 00 | 01 | 10 | 11
This gives the length 3 bitstrings by using the nonterminals to keep count.
A → 0B | 1B
B → 0C | 1C
C → 0|1
2.9 Example For this grammar
S → aSb | T | U
T → aS | a
U → Sb | b
an alternative is to replace T and U by their expansions to get this.
S → aSb | aS | a | Sb | b
It generates the language L = { aⁱbʲ ∈ { a, b }∗ | i ≠ 0 or j ≠ 0 }.

The prior example is the first one where the generated language is not clear, so we will do a formal verification. We will show mutual containment, that the generated language is a subset of L and that it is also a superset. The alternative form of the grammar, the one that eliminates T and U, shows that any derivation step τ0⌢head⌢τ1 ⇒ τ0⌢body⌢τ1 only adds a's on the left and b's on the right, so every string in the language has the form aⁱbʲ. That same form shows that in any terminating derivation S must eventually be replaced by either a or b. Together these two give that the generated language is a subset of L.
For containment the other way, we will prove that every σ ∈ L has a derivation. We will use induction on the length |σ|. By the definition of L the base case is |σ| = 1. In this case either σ = a or σ = b, each of which obviously has a derivation.
For the inductive step, suppose that every string from L of length k = 1, ..., k = n has a derivation, for n ≥ 1, and let σ have length n + 1. Write σ = aⁱbʲ. There are three cases: either i > 1, or j > 1, or i = j = 1. If i > 1 then σ̂ = aⁱ⁻¹bʲ is a string of length n, so by the inductive hypothesis it has a derivation S ⇒ ··· ⇒ σ̂. Prefixing that derivation with the step S ⇒ aS will put an additional a on the left. The j > 1 case works the same way, and σ = a¹b¹ is easy.

2.10 Example The fact that derivations can go more than one way leads to an important
issue with grammars, that they can be ambiguous. Consider this fragment of a
grammar for if statements in a C-like language
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩
⟨stmt⟩ → if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
and this code string.
if enrolled(s) if studied(s) grade='P' else grade='F'

Here are the first two lines of one derivation


⟨stmt⟩ ⇒ if ⟨bool⟩ ⟨stmt⟩
⇒ if ⟨bool⟩ if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
and here are the first two of another.
⟨stmt⟩ ⇒ if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
⇒ if ⟨bool⟩ if ⟨bool⟩ ⟨stmt⟩ else ⟨stmt⟩
That is, we cannot tell whether the else in the code line is associated with the first if or the second. The resulting parse trees for the full code line dramatize the difference, as do these copies of the code string indented to show the association.

if enrolled(s)
    if studied(s)
        grade='P'
    else
        grade='F'

if enrolled(s)
    if studied(s)
        grade='P'
else
    grade='F'

Obviously, those programs behave differently. This is known as a dangling else.

2.11 Example This grammar for elementary algebra expressions

⟨expr⟩ → ⟨expr⟩ + ⟨expr⟩
  | ⟨expr⟩ * ⟨expr⟩
  | ( ⟨expr⟩ ) | a | b | . . . | z

is ambiguous because a+b*c has two leftmost derivations.

⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ ⇒ a + ⟨expr⟩
  ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c

⟨expr⟩ ⇒ ⟨expr⟩ * ⟨expr⟩ ⇒ ⟨expr⟩ + ⟨expr⟩ * ⟨expr⟩
  ⇒ a + ⟨expr⟩ * ⟨expr⟩ ⇒ a + b * ⟨expr⟩ ⇒ a + b * c
The two give different parse trees.

⟨expr⟩
  ⟨expr⟩
    a
  +
  ⟨expr⟩
    ⟨expr⟩
      b
    *
    ⟨expr⟩
      c

⟨expr⟩
  ⟨expr⟩
    ⟨expr⟩
      a
    +
    ⟨expr⟩
      b
  *
  ⟨expr⟩
    c

Again, the issue is that we get two different behaviors. For instance, substitute 1 for a, and 2 for b, and 3 for c. The first tree gives 1 + (2 · 3) = 7 while the second tree gives (1 + 2) · 3 = 9.
In contrast, this grammar for elementary algebra expressions is unambiguous.
⟨expr⟩ → ⟨expr⟩ + ⟨term⟩
| ⟨term⟩
⟨term⟩ → ⟨term⟩ * ⟨factor⟩
| ⟨factor⟩
⟨factor⟩ → ( ⟨expr⟩ )
| a | b | ... | z
Choosing grammars that are not ambiguous is important in practice.

III.2 Exercises
✓ 2.12 Use the grammar of Example 2.3. (a) What is the start symbol? (b) What
are the terminals? (c) What are the nonterminals? (d) How many rewrite rules
does it have? (e) Give three strings derived from the grammar, besides the string
in the example. (f) Give three strings in the language { +, *, ), (, a ... , z }∗ that
cannot be derived.
2.13 Use the grammar of Exercise 2.15. (a) What is the start symbol? (b) What are the terminals? (c) What are the nonterminals? (d) How many rewrite rules does it have? (e) Give three strings derived from the grammar besides the ones in the exercise, or show that there are not three such strings. (f) Give three strings in the language L = (Σ ∪ { space })∗, where Σ is the set of terminals, that cannot be derived from this grammar, or show there are not three such strings.
2.14 Use this grammar.
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩⟨natural⟩
⟨digit⟩ → 0 | 1 | . . . | 9
(a) What is the alphabet? What are the terminals? The nonterminals? What
is the start symbol? (b) For each production, name the head and the body.
(c) Which are the metacharacters that are used? (d) Derive 42. Also give its
parse tree. (e) Derive 993 and give the associated parse tree. (f) How can
⟨natural⟩ be defined in terms of ⟨natural⟩ ? Doesn’t that lead to infinite regress?
(g) Extend this grammar to cover the integers. (h) With this grammar, can you
derive +0? -0?
✓ 2.15 From this grammar
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct object⟩
⟨direct object⟩ → ⟨article⟩ ⟨noun⟩
⟨article⟩ → the | a
⟨noun⟩ → car | wall
⟨verb⟩ → hit
derive each of these: (a) the car hit a wall (b) the car hit the wall
(c) the wall hit a car.
2.16 In the language generated by this grammar.
⟨sentence⟩ → ⟨subject⟩ ⟨predicate⟩
⟨subject⟩ → ⟨article⟩ ⟨noun1⟩
⟨predicate⟩ → ⟨verb⟩ ⟨direct-object⟩
⟨direct-object⟩ → ⟨article⟩ ⟨noun2⟩
⟨article⟩ → the | a | ε
⟨noun1⟩ → dog | flea
⟨noun2⟩ → man | dog
⟨verb⟩ → bites | licks
(a) Give a derivation for dog bites man.
(b) Show that there is no derivation for man bites dog.

✓ 2.17 Your friend tries the prior exercise and you see their work so far.
⟨sentence⟩ ⇒ ⟨subject⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun1⟩ ⟨verb⟩ ⟨direct object⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨noun2⟩
⇒ ⟨article⟩ ⟨dog|flea⟩ ⟨verb⟩ ⟨article⟩ ⟨man|dog⟩
Stop them and explain what they are doing wrong.
2.18 With the grammar of Example 2.3, derive (a+b)*c.
✓ 2.19 Use this grammar
S → TbU
T → aT | ε
U → aU | bU | ε
for each part. (a) Give both a leftmost derivation and rightmost derivation of
aabab. (b) Do the same for baab. (c) Show that there is no derivation of aa.
2.20 Use this grammar.
S → aABb
A → aA | a
B → Bb | b
(a) Derive three strings.
(b) Name three strings over Σ = { a, b } that are not derivable.
(c) Describe the language generated by this grammar.
2.21 Give a grammar for the language { aⁿbⁿ⁺ᵐaᵐ | n, m ∈ N }.

✓ 2.22 Give the parse tree for the derivation of aabb in Example 2.4.
2.23 Verify that the language derived from the grammar in Example 2.4 is L = { aⁿbⁿ | n ∈ N }.
2.24 What is the language generated by this grammar?
A → aA | B
B → bB | cA
✓ 2.25 In many programming languages identifier names consist of a string of letters
or digits, with the restriction that the first character must be a letter. Create a
grammar for this, using ASCII letters.
2.26 Early programming languages had strong restrictions on what could be a
variable name. Create a grammar for a language that consists of strings of at
most four characters, upper case ASCII letters or digits, where the first character
must be a letter.
2.27 What is the language generated by a grammar with a set of production rules
that is empty?
2.28 Create a grammar for each of these languages.
(a) the language of all character strings over { a, ... , z }, L = { a, ... , z }∗
(b) the language of strings of at least one digit, {σ ∈ { 0, ... , 9 }∗ | |σ| ≥ 1 }
✓ 2.29 This is a grammar for postal addresses. Note the use of the empty string ε to
make ⟨opt suffix⟩ optional.
⟨postal address⟩ → ⟨name⟩ ⟨EOL⟩ ⟨street address⟩ ⟨EOL⟩ ⟨town⟩
⟨name⟩ → ⟨personal part⟩ ⟨last name⟩ ⟨opt suffix⟩
⟨street address⟩ → ⟨house num⟩ ⟨street name⟩ ⟨apt num⟩
⟨town⟩ → ⟨town name⟩ , ⟨state or region⟩
⟨personal part⟩ → ⟨initial⟩ . | ⟨first name⟩
⟨last name⟩ → ⟨char string⟩
⟨opt suffix⟩ → Sr. | Jr. | ε
⟨house num⟩ → ⟨digit string⟩
⟨street name⟩ → ⟨char string⟩
⟨apt num⟩ → ⟨char string⟩ | ε
⟨town name⟩ → ⟨char string⟩
⟨state or region⟩ → ⟨char string⟩
⟨initial⟩ → ⟨char⟩
⟨first name⟩ → ⟨char string⟩
⟨char string⟩ → ⟨char⟩ | ⟨char⟩ ⟨char string⟩ | ε
⟨char⟩ → A | B | . . . z | 0 | . . . 9 | (space)
⟨digit string⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨digit string⟩ | ε
⟨digit⟩ → 0 | . . . 9
The nonterminal ⟨EOL⟩ expands to whatever marks an end of line, while (space) signifies a space character, ASCII 32.
(a) Give a derivation for this address.
President
1600 Pennsylvania Avenue
Washington, DC
(b) Why is there no derivation for this address?
Sherlock Holmes
221B Baker Street
London, UK
Suggest a modification of the grammar to include this address.
(c) Give three reasons why this grammar is inadequate for general use. (Perhaps no grammar would suffice that is less general than one that just accepts any character string; the other obvious possibility is the grammar that lists as separate rules every valid address in the world, which is just silly.)
2.30 Recall Turing’s prototype computer, a clerk doing the symbolic manipulations
to multiply two large numbers. Deriving a string from a grammar has a similar
feel and we can write grammars to do computations. Fix the alphabet Σ = { 1 },
so that we can interpret derived strings as numbers represented in unary.
(a) Produce a grammar whose language is the even numbers, { 1²ⁿ | n ∈ N }.
(b) Do the same for the multiples of three, { 1³ⁿ | n ∈ N }.
✓ 2.31 Here is a grammar notable for being small.


⟨sentence⟩ → buffalo ⟨sentence⟩ | ε
(a) Derive a sentence of length one, one of length two, and one of length three.
(b) Give those sentences semantics, that is, make sense of them as English
sentences.
2.32 Here is a grammar for LISP.
⟨s expression⟩ → ⟨atomic symbol⟩
| ( ⟨s expression⟩ . ⟨s expression⟩ )
| ⟨list⟩
⟨list⟩ → ( ⟨list-entries⟩ )
⟨list-entries⟩ → ⟨s expression⟩
| ⟨s expression⟩ ⟨list-entries⟩
⟨atomic symbol⟩ → ⟨letter⟩ ⟨atom part⟩
⟨atom part⟩ → ε
| ⟨letter⟩ ⟨atom part⟩
| ⟨number⟩ ⟨atom part⟩
⟨letter⟩ → a | . . . z
⟨number⟩ → 0 | . . . 9
Give a derivation for each string. (a) (a . b) (b) (a . (b . c))
(c) ((a . b) . c)
2.33 Using the Example 2.11’s unambiguous grammar, produce a derivation for
a+(b*c).
2.34 The simplest example of an ambiguous grammar is
S → S | ε
(a) What is the language generated by this grammar?
(b) Produce two different derivations of the empty string.
2.35 This is a grammar for the language of bitstrings L = B∗ .
⟨bit-string⟩ → 0 | 1 | ⟨bit-string⟩ ⟨bit-string⟩
Show that it is ambiguous.
2.36 (a) Show that this grammar is ambiguous by producing two different
leftmost derivations for a-b-a.
E → E-E | a | b
(b) Derive a-b-a from this grammar, which is unambiguous.
E → E-T | T
T → a | b
2.37 Use the context sensitive grammar from the footnote following Example 2.6 to derive aaabbbccc.

Section III.3 Graphs
Researchers in the Theory of Computation often state their problems, and the
solution of those problems, in the language of Graph Theory. Here are two examples
we have already seen. Both have vertices connected by edges that represent a
relationship between the vertices.

[Two earlier examples, drawn as graphs: the state diagram of a Turing machine, with states q0, q1, q2, q3 and transitions labeled with instructions such as ‘B, L’ and ‘1, R’; and the parse tree for x*(y+z) from Example 2.3.]

Definition We start with the basics.


3.1 Definition A simple graph is an ordered pair G = ⟨N , E ⟩ where N is a finite set
of vertices† or nodes and E is a set of edges. Each edge is a set of two distinct
vertices.

3.2 Example This simple graph G has five vertices N = {v0, ... , v4 } and eight edges.

E = { {v0, v1 }, {v0, v2 }, ... {v3, v4 } }

[Picture of G]

Important: a graph is not its picture. Both of these pictures show the same graph as above because they show the same vertices and the same connections.

[Two more pictures of G, with the vertices placed differently]

† Graphs can have infinitely many vertices but we won't ever need them. For convenience of notation we will stick to finite ones.

Instead of writing e = {v, v̂ } we often write e = vv̂. Since sets are unordered we could write the same edge as e = v̂v.
3.3 Definition Two graph edges are adjacent if they share a vertex, so that they are uv and vw. A walk is a sequence of adjacent edges ⟨v0v1, v1v2, ... , vn−1vn⟩. Its length is the number of edges, n. If the initial vertex v0 equals the final vertex vn then it is a closed walk, otherwise it is open. If no edge occurs twice then it is a trail. If a trail's vertices are distinct, except possibly that the initial vertex equals the final vertex, then it is a path. A closed path with at least one edge is a cycle. A graph is connected if between any two vertices there is a path.

3.4 Example On the left is a path from u0 to u3; it is also a trail and a walk. On the right is a cycle.

[Two pictures: on the left a graph on vertices u0, u1, u2, u3 with a highlighted path from u0 to u3; on the right a cycle through vertices v0, ... , v7.]

There are many variations of Definition 3.1, used for modeling circumstances that a simple graph cannot model. One variant allows some vertices to connect to themselves, forming a loop. Another is a multigraph, which allows two vertices to have more than one edge between them. Still another is a weighted graph, which gives each edge a real number weight, perhaps signifying the distance or the cost in money or in time to traverse that edge.
A very common variation is a directed graph or digraph, where edges have a direction, as in a road map that includes one-way streets. In a digraph, if an edge is directed from v to v̂ then we can write it as vv̂ but not in the other order. The Turing machine at the start of this section is a digraph and also has loops.
Some important graph variations involve the nature of the connections. A tree is an undirected connected graph with no cycles. At the start of this section is a syntax tree. A directed acyclic graph or DAG is a directed graph with no directed cycles.

Traversal Many problems involve moving through a graph.



3.5 Definition From one vertex v 0 , another vertex v 1 is reachable if there is a path
from the first to the second.

3.6 Definition In a graph, a circuit is a closed walk that either contains all of the
edges, making it an Euler circuit, or all of the vertices, making it a Hamiltonian
circuit.

3.7 Example The graph on the right of Example 3.4 is a Hamiltonian circuit but not
an Euler circuit.
3.8 Definition Where G = ⟨N, E⟩ is a graph, a subgraph Ĝ = ⟨N̂, Ê⟩ satisfies N̂ ⊆ N and Ê ⊆ E. A subgraph with every possible edge, that is, with the property that for every e = vivj ∈ E with vi, vj ∈ N̂ we have e ∈ Ê, is an induced subgraph.

3.9 Example In the graph G on the left of Example 3.4, consider the highlighted path with edge set Ê = {u0u1, u1u3 }. Taking those edges along with the vertices that they contain, N̂ = {u0, u1, u3 }, gives a subgraph Ĝ.
Also in G, the induced subgraph involving the set of vertices {u0, u2, u3 } is the outer triangle.

Graph representation A common way to represent a graph in a computer is with a matrix. This example represents Example 3.2's graph: it has a 1 in the i, j entry if the graph has an edge from vi to vj and a 0 otherwise.

            v0 v1 v2 v3 v4
       v0 (  0  1  1  0  0 )
       v1 (  1  0  1  1  1 )
M(G) = v2 (  1  1  0  1  1 )                                    (∗)
       v3 (  0  1  1  0  1 )
       v4 (  0  1  1  1  0 )

3.10 Definition For a graph G , the adjacency matrix M(G ) representing the graph
has i, j entries equal to the number of edges from vi to v j .
This definition covers graph variants that were listed earlier. For instance, the
graph represented in (∗) is a simple graph because the matrix has only 0 and 1
entries, because all the diagonal entries are 0, and because the matrix is symmetric,
meaning that the i, j entry has a 1 if and only if the j, i entry is also 1. If a graph has
a loop then the matrix has a diagonal entry that is a positive integer. If the graph
is directed and has a one-way edge from vi to v j then the i, j entry records that
edge but the j, i entry does not. And for a multigraph, where there are multiple
edges from one vertex to another, the associated entry will be larger than 1.

3.11 Lemma Let the matrix M(G) represent the graph G. Then in its matrix multiplicative n-th power the i, j entry is the number of walks of length n from vertex vi to vertex vj.

Proof Exercise 3.33.
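Before turning to the proof we can watch the lemma at work. This sketch represents a matrix as a list of rows; the names M, mat*, and mat-power are invented here.

;; The adjacency matrix of Example 3.2's graph, from (∗).
(define M
  '((0 1 1 0 0)
    (1 0 1 1 1)
    (1 1 0 1 1)
    (0 1 1 0 1)
    (0 1 1 1 0)))

;; Multiplication and powers of square matrices given as lists of rows.
(define (transpose m) (apply map list m))
(define (dot u v) (apply + (map * u v)))
(define (mat* a b)
  (let ((cols (transpose b)))
    (map (lambda (row) (map (lambda (col) (dot row col)) cols)) a)))
(define (mat-power m n)
  (if (= n 1) m (mat* m (mat-power m (- n 1)))))

The 0, 0 entry of (mat-power M 2) is 2, matching the two length-two walks from v0 back to itself, one through v1 and one through v2.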

Colors We sometimes partition a graph’s vertices.

3.12 Definition A k-coloring of a graph, for k ∈ N, is a partition of the vertices into k-many classes such that adjacent vertices do not come from the same class.
On the left is a graph that is 3-colored.

On the right the graph has no 3-coloring. The argument goes: the four vertices are
completely connected to each other. If two get the same color then they will be
adjacent same-colored vertices. So a coloring requires four colors.

3.13 Example This shows five committees, where some committees share some members. How many time slots do we need in order to schedule all committees so no members must be in two places at once?

A: Armis, Jones, Smith
B: Crump, Edwards, Robinson
C: Burke, Frank, Ke
D: India, Harris, Smith
E: Burke, Jones, Robinson

Model this with a graph by taking each vertex to be a committee and if committees are related by sharing a member then put an edge between them.

[Picture: the committees A, ... , E as vertices, with an edge between committees that share a member, 3-colored.]

The picture shows that three colors are enough, that is, three time slots suffice.
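Checking that a proposed assignment is a k-coloring is mechanical: no edge may join two vertices from the same class. In this sketch, whose names are invented here, a graph is a list of edges, each a two-element list of vertices, and a coloring is a list of (vertex . color) pairs.

;; proper-coloring?  Does no edge join two same-colored vertices?
(define (proper-coloring? edges coloring)
  (define (color-of v) (cdr (assoc v coloring)))
  (cond ((null? edges) #t)
        ((equal? (color-of (car (car edges)))
                 (color-of (cadr (car edges))))
         #f)
        (else (proper-coloring? (cdr edges) coloring))))

A triangle illustrates: (proper-coloring? '((a b) (b c) (a c)) '((a . 0) (b . 1) (c . 2))) gives #t, while any assignment of only two colors to those three vertices gives #f.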

Graph isomorphism We sometimes want to know when two graphs are essentially identical. Consider these two.

[Two pictures: on the left a graph with vertices v0, v1, v2 in the bottom row and v3, v4, v5 in the top row; on the right a graph with vertices w0, ... , w5 drawn in a hexagon.]

They have the same number of vertices and the same number of edges. Further, on
the right as well as on the left there are two classes of vertices where all the vertices
in the first class connect to all the vertices in the second class (on the left the two
classes are the top and bottom rows while on the right they are {w 0 , w 2 , w 4 } and
{w 1 , w 3 , w 5 }). A person may suspect that as in Example 3.2 these are two ways to
draw the same graph, with the vertex names changed for further obfuscation.
That's true: if we make a correspondence between the vertices in this way

Vertex on left:  v0 v1 v2 v3 v4 v5
Vertex on right: w0 w2 w4 w1 w3 w5

then as a consequence the edges also correspond.

Edge on left:  {v0, v3} {v0, v4} {v0, v5} {v1, v3} {v1, v4} {v1, v5} {v2, v3} {v2, v4} {v2, v5}
Edge on right: {w0, w1} {w0, w3} {w0, w5} {w2, w1} {w2, w3} {w2, w5} {w4, w1} {w4, w3} {w4, w5}

3.14 Definition Two graphs G and Ĝ are isomorphic if there is a one-to-one and onto
map f : N → N̂ such that G has an edge {vi , v j } ∈ E if and only if Ĝ has the
associated edge { f (vi ), f (v j ) } ∈ Ê .
To verify that two graphs are isomorphic the most natural thing is to produce
the map f and then verify that in consequence the edges also correspond. The
exercises have examples.
Showing that graphs are not isomorphic usually entails finding some graph-theoretic way in which they differ. A common and useful such property is to consider the degree of a vertex, the total number of edges touching that vertex, with the proviso that a loop from the vertex to itself counts as two. The degree sequence of a graph is the non-increasing sequence of its vertex degrees. Thus, the graph in Example 3.13 has degree sequence ⟨3, 2, 1, 1, 1⟩. Exercise 3.32 shows that if graphs are isomorphic then associated vertices have the same degree, and thus graphs with different degree sequences are not isomorphic. Also, if the degree sequences are equal then they help us construct an isomorphism, if there is one; examples of this are in the exercises. (Note, though, that there are graphs with the same degree sequence that are not isomorphic.)
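Computing a degree sequence is routine. This sketch, whose names are invented here, takes a graph as a list of vertices and a list of edges, each edge a two-element list; it does not handle loops.

;; degree  The number of edges touching the vertex.
(define (degree v edges)
  (let loop ((es edges) (d 0))
    (cond ((null? es) d)
          ((member v (car es)) (loop (cdr es) (+ d 1)))
          (else (loop (cdr es) d)))))

;; insert-desc, sort-desc  Arrange numbers in non-increasing order.
(define (insert-desc x xs)
  (cond ((null? xs) (list x))
        ((>= x (car xs)) (cons x xs))
        (else (cons (car xs) (insert-desc x (cdr xs))))))
(define (sort-desc xs)
  (if (null? xs) '() (insert-desc (car xs) (sort-desc (cdr xs)))))

;; degree-sequence  The non-increasing list of vertex degrees.
(define (degree-sequence vertices edges)
  (sort-desc (map (lambda (v) (degree v edges)) vertices)))

Taking Example 3.13's committees as vertices, with an edge for each pair sharing a member, (degree-sequence '(a b c d e) '((a d) (a e) (b e) (c e))) returns (3 2 1 1 1), the degree sequence given above.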

III.3 Exercises
✓ 3.15 Draw a picture of a graph illustrating each relationship. Some graphs will be
digraphs, or may have loops or multiple edges between some pairs of vertices.

(a) Maine is adjacent to Massachusetts and New Hampshire. Massachusetts is adjacent to every other state. New Hampshire is adjacent to Maine, Massachusetts, and Vermont. Rhode Island is adjacent to Connecticut and Massachusetts. Vermont is adjacent to Massachusetts and New Hampshire. Give the graph describing the adjacency relation.
(b) In the game of Rock-Paper-Scissors, Rock beats Scissors, Paper beats Rock,
and Scissors beats Paper. Give the graph of the ‘beats’ relation; note that this
is a directed relation.
(c) The number m ∈ N is related to the number n ∈ N by being its divisor if
there is a k ∈ N with m · k = n . Give the divisor relation graph among
positive natural numbers less than or equal to 12.
(d) The river Pregel cut the town of Königsberg into four land masses. There
were two bridges from mass 0 to mass 1 and one bridge from mass 0 to
mass 2. There was one bridge from mass 1 to mass 2, and two bridges from
mass 1 to mass 3. Finally, there was one bridge from mass 2 to 3. Consider
masses related by bridges. Give the graph (it is a multigraph).
(e) In our Mathematics program you must take Calculus II before you take
Calculus III, and you must take Calculus I before II. You must take Calculus II
before Linear Algebra, and to take Real Analysis you must have both Linear
Algebra and Calculus III.
3.16 The complete graph on n vertices, Kn, is the simple graph with all possible edges.
(a) Draw K 4 , K 3 , K 2 , and K 1 .
(b) Draw K 5 .
(c) How many edges does Kn have?
3.17 This is the Petersen graph, often used for examples in Graph Theory.

[Picture of the Petersen graph, drawn with an outer cycle on v0, ... , v4 and inner vertices v5, ... , v9.]

(a) List the vertices and edges.


(b) Give two walks from v 0 to v 7 . What is the length of each?
(c) List both a closed walk and an open walk of length five, starting at v 4 .
(d) Give a cycle starting at v 5 .
(e) Is this graph connected?
3.18 Let a graph G have vertices {v0, ... , v5 } and the edges v0v1, v0v3, v0v5, v1v4, v3v4, and v4v5. (a) Draw G. (b) Give its adjacency matrix. (c) Find all subgraphs with four nodes and four edges. (d) Find all induced subgraphs with four nodes and four edges.
3.19 A graph is a collection of vertices and edges, not a drawing. So a single
graph may have quite different pictures. Consider a graph G with the vertices
N = {A, ... H } and these edges.

E = {AB, AC, AG, AH , BC, BD, BF , CD, CE, DE, DF , EF , EG, F H , GH }

(a) Connect the dots below to get one drawing.

[Picture: eight dots labeled A, ... , H, to be connected.]

(b) A planar graph is one that can be drawn in the plane so that its edges do not
cross. Show that G is planar.
3.20 Fill in the table's blanks.

             Closed or open?   Vertices can repeat?   Edges can repeat?
    Walk
    Trail
    Circuit
    Path
    Cycle

✓ 3.21 Morse code represents text with a combination of a short sound, written ‘.’
and pronounced “dit,” and a long sound, written ‘-’ and pronounced “dah.” Here
are the representations of the twenty six English letters.

A .- F ..-. K -.- O --- S ... W .--


B -... G --. L .-.. P .--. T - X -..-
C -.-. H .... M -- Q --.- U ..- Y -.--
D -.. I .. N -. R .-. V ...- Z --..
E . J .---

Some representations are prefixes of others. Give the graph for the prefix relation.
3.22 Show that every tree has a 2-coloring.
3.23 A person keeps six species of fish as pets. Species A cannot be in a tank with
species B or C . Species B cannot be with A, C , or E . Species C cannot be with A,
B , D , or E . Species D cannot be with C or F . Species E cannot be together with
B , C , or F . Finally, species F cannot be in with D or E . (a) Draw the graph where
the nodes are species and the edges represent the relation ‘cannot be together’.
(b) Find the chromatic number. (c) Interpret it.

✓ 3.24 If two cell towers are within line of sight of each other then they must get different frequencies. Here each tower is a vertex and an edge between towers denotes that they can see each other.

[Picture: towers v0, ... , v10 as vertices, with edges between towers that can see each other.]

What is the minimal number of frequencies? Give an assignment of frequencies to towers.
✓ 3.25 For a blood transfusion, unless the recipient is compatible with the donor’s
blood type they can have a severe reaction. Compatibility depends on the presence
or absence of two antigens, called A and B, on the red blood cells. This creates
four major groups: A, B, O (the cells have neither antigen), and AB (the cells have
both). There is also a protein called the Rh factor that can be either present (+)
or absent (–). Thus there are eight common blood types, A+, A-, B+, B-, O+,
O-, AB+, and AB-. If the donor has the A antigen then the recipient must also
have it, and the B antigen and Rh factor work the same way. Draw a directed
graph where the nodes are blood types and there is an edge from the donor to
the recipient if transfusion is safe. Produce the adjacency matrix.
3.26 Find the degree sequence of the graph in Example 3.2 and of the two graphs
of Example 3.4.
3.27 Give the array representation, like that in equation (∗), for the graphs of
Example 3.4.
3.28 Draw a graph for this adjacency matrix.

    ( 0 1 1 0 )
    ( 1 0 0 1 )
    ( 1 0 0 1 )
    ( 0 1 1 0 )

✓ 3.29 These two graphs are isomorphic.

[Two pictures: on the left a graph on vertices a, b, c, x, y, z; on the right a graph on vertices A, B, C, X, Y, Z.]

(a) Define the function giving the correspondence.
(b) Verify that under that function the edges then also correspond.
✓ 3.30 Consider this tree.

[Picture of a tree: root A with children B and C; B has children D and E, and C has children F and G.]

(a) Verify that ⟨BA, AC⟩ is a path from B to C.
(b) Why is ⟨BD, DB, BA, BC⟩ not also a path from B to C?
(c) Show that in any tree, for any two vertices there is a unique path from one to the other.
3.31 Consider building a simple graph by starting with n vertices. (a) How many potential edges are there? (b) How many such graphs are there?
3.32 We can use degrees and degree sequences to help find isomorphisms, or to
show that graphs are not isomorphic. (Here we allow graphs to have loops and
to have multiple edges between vertices, but we do not make the extension to
directed edges or edges with weights.)
(a) Show that if two graphs are isomorphic then they have the same number of
vertices.
(b) Show that if two graphs are isomorphic then they have the same number of
edges.
(c) Show that if two graphs are isomorphic and one has a vertex of degree k
then so does the other.
(d) Show that if two graphs are isomorphic then for each degree k , the number of
vertices of the first graph having that degree equals the number of vertices of
the second graph having that degree. Thus, isomorphic graphs have degree
sequences that are equal.
(e) Verify that while these two graphs have the same degree sequence, they are not isomorphic. Hint: consider the paths starting at the degree 3 vertex.

[Two pictures: one graph on vertices v0, ... , v5, the other on vertices w0, ... , w5, both with the same degree sequence.]

(f) Use the prior result to show that the two graphs of Example 3.4 are not isomorphic.
As in the final item, in arguments we often use the contrapositive of these
statements. For instance, the first item implies that if they do not have the same
number of vertices then they are not isomorphic.
3.33 Prove Lemma 3.11.
(a) An edge as a length-1 walk. Show that in the product of the matrix with
2
itself M(G ) the entry i, j is the number of length-two
 n walks.
(b) Show that for n > 2, the i, j entry of the power M(G ) equals the number
of length n walks from vi to v j .
3.34 Consider these two graphs, G0 and G1.

[Two pictures: G0 on vertices v0, ... , v7 and G1 on vertices n0, ... , n7.]

(a) List the vertices and edges of G0 .


(b) Do the same for G1 .
(c) Give the degree sequences of G0 and G1 .
(d) Consider this correspondence between the vertices.
vertex of G0 v0 v1 v2 v3 v4 v5 v6 v7
vertex of G1 n6 n2 n7 n3 n5 n1 n0 n4

Find the image, under the correspondence, of the edges of G0 . Do they match
the edges of G1 ?
(e) Of course, failure of any one proposed map does not imply that the two
cannot be isomorphic. Nonetheless, argue that they are not isomorphic.
3.35 In a graph, for a node q 0 there may be some nodes qi that are unreachable,
so there is no path from q 0 to qi .
(a) Devise an algorithm that inputs a directed graph and a start node q 0 , and
finds the set of nodes that are unreachable from q 0 .
(b) Apply your algorithm to these two, starting with w0.

[Two pictures of directed graphs, one on vertices w0, ... , w4 and one on vertices w0, ... , w3.]

Extra III.A BNF

We shall introduce some grammar notation conveniences that are widely used. Together they are called Backus-Naur form, BNF.
The study of grammar, the rules for phrase structure and forming sentences, has a long history, dating back as early as the fifth century BC. Mathematicians, including A Thue and E Post, began systematizing it as rewriting rules in the early 1900's. The BNF variant was produced by J Backus in the late 1950's as part of the design of the early computer language ALGOL60. Since then these rules have become a standard way to express grammars.

[Photo: John Backus, 1924–2007]

One difference from the prior subsection is a minor typographical change. Originally the metacharacters were not typeable with a standard keyboard. The advantage of having metacharacters not on a keyboard is that most likely all of the language characters are typeable. So there is no need to distinguish, say, between the pipe character | when used as a part of a language and when used as a metacharacter. But the disadvantage lies in having to type the untypeable. In the end the convenience of having typeable characters won over the technical gain of typographically distinguishing metacharacters. For instance, for a long time there were no arrows on a standard keyboard so in place of the arrow symbol ‘→’, BNF uses ‘::=’. (These adjustments were made by P Naur, as editor of the ALGOL60 report.)†

[Photo: Peter Naur, 1928–2016]

BNF is both clear and concise, it can express the range of languages that we ordinarily want to express, and it smoothly translates to a parser.‡ That is, BNF is an impedance match — it fits with what we typically want to do. Here we will incorporate some extensions for grouping and replication that are like what you will see in the wild.

† There are other typographical issues that arise with grammars. While many authors write nonterminals with diamond brackets, as we do, others use other conventions such as a separate type style or color. ‡ BNF is only loosely defined. Several variants do have standards but what you see often does not conform to any published standard.
A.1 Example This is a BNF grammar for real numbers with a finite decimal part. Take
the rules for ⟨natural⟩ from Example 2.6.
⟨start⟩ ::= - ⟨fraction⟩ | + ⟨fraction⟩ | ⟨fraction⟩
⟨fraction⟩ ::= ⟨natural⟩ | ⟨natural⟩ . ⟨natural⟩
This derivation for 2.718 is rightmost.

⟨start⟩ ⇒ ⟨fraction⟩ ⇒ ⟨natural⟩ . ⟨natural⟩


⇒ ⟨natural⟩ . ⟨digit⟩⟨natural⟩ ⇒ ⟨natural⟩ . ⟨digit⟩⟨digit⟩⟨natural⟩
⇒ ⟨natural⟩ . ⟨digit⟩⟨digit⟩⟨digit⟩ ⇒ ⟨natural⟩ . ⟨digit⟩⟨digit⟩ 8
⇒ ⟨natural⟩ . ⟨digit⟩ 18 ⇒ ⟨natural⟩ .718 ⇒ 2.718
Here is a derivation for 0.577 that is neither leftmost nor rightmost.

⟨start⟩ ⇒ ⟨fraction⟩ ⇒ ⟨natural⟩ . ⟨natural⟩


⇒ ⟨natural⟩ . ⟨digit⟩⟨natural⟩ ⇒ ⟨natural⟩ .5 ⟨natural⟩
⇒ ⟨natural⟩ .5 ⟨digit⟩⟨natural⟩ ⇒ ⟨digit⟩ .5 ⟨digit⟩⟨natural⟩
⇒ ⟨digit⟩ .5 ⟨digit⟩⟨digit⟩ ⇒ ⟨digit⟩ .5 ⟨digit⟩ 7 ⇒ 0.5 ⟨digit⟩ 7
⇒ 0.577
A.2 Example Time is a difficult engineering problem. One issue is representing
times and one solution in that area is RFC 3339, Date and Time on the Internet:
Timestamps. It uses strings such as 1958-10-12T23:20:50.52Z. Here is a BNF
grammar. (See Exercise 2.28 for some nonterminals.) This grammar includes some
metacharacter extensions discussed below.


⟨date-fullyear⟩ ::= ⟨4-digits⟩


⟨date-month⟩ ::= ⟨2-digits⟩
⟨date-mday⟩ ::= ⟨2-digits⟩
⟨time-hour⟩ ::= ⟨2-digits⟩
⟨time-minute⟩ ::= ⟨2-digits⟩
⟨time-second⟩ ::= ⟨2-digits⟩
⟨time-secfrac⟩ ::= . ⟨1-or-more-digits⟩
⟨time-numoffset⟩ ::= (+ | -) ⟨time-hour⟩ : ⟨time-minute⟩
⟨time-offset⟩ ::= Z | ⟨time-numoffset⟩
⟨partial-time⟩ ::= ⟨time-hour⟩ : ⟨time-minute⟩ : ⟨time-second⟩
[ ⟨time-secfrac⟩ ]
⟨full-date⟩ ::= ⟨date-fullyear⟩ - ⟨date-month⟩ - ⟨date-mday⟩
⟨full-time⟩ ::= ⟨partial-time⟩ ⟨time-offset⟩
⟨date-time⟩ ::= ⟨full-date⟩ T ⟨full-time⟩
That example shows one BNF extension in the ⟨time-numoffset⟩ rule, where the parentheses are used as metacharacters to group a choice between the terminals + and -. It shows another extension in the ⟨partial-time⟩ rule, which includes square brackets as metacharacters. These denote that the ⟨time-secfrac⟩ is optional.
The square brackets are a very common construct: another example is this syntax for if ... then ... with an optional else ...
⟨if-stmt⟩ ::= if ⟨boolean-expr⟩ then ⟨stmt-sequence⟩
[ else ⟨stmt-sequence⟩ ] ⟨end if ⟩ ;
To show repetition, BNF may use a superscript Kleene star ∗ to mean ‘zero or
more’ or a + to mean ‘one or more’. This shows parentheses and repetition.
⟨identifier⟩ ::= ⟨letter⟩ ( ⟨letter⟩ | ⟨digit⟩ )*
None of these extension constructs is necessary, since we can express them in plain BNF, without the extensions. For instance, we could replace the prior rule with this.
⟨identifier⟩ ::= ⟨letter⟩ | ⟨letter⟩ ⟨atoms⟩
⟨atoms⟩ ::= ⟨letter⟩ ⟨atoms⟩ | ⟨digit⟩ ⟨atoms⟩ | ε
But these constructs come up often enough that adopting an abbreviation is
convenient.
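One reason for BNF's popularity, noted above, is that its rules translate smoothly into parsers. This sketch is a recognizer for the ⟨identifier⟩ rule; the names are invented here.

;; identifier?  Does the string match <letter> ( <letter> | <digit> )* ?
(define (letter? c) (char-alphabetic? c))
(define (digit? c) (char-numeric? c))
(define (identifier? s)
  (let ((cs (string->list s)))
    (and (pair? cs)
         (letter? (car cs))                     ; leading <letter>
         (let loop ((rest (cdr cs)))            ; ( <letter> | <digit> )*
           (or (null? rest)
               (and (or (letter? (car rest)) (digit? (car rest)))
                    (loop (cdr rest))))))))

For instance, (identifier? "x2") gives #t while (identifier? "2x") gives #f.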
A.3 Example This grammar for Python floating point numbers shows all three
abbreviations.
⟨floatnumber⟩ ::= ⟨pointfloat⟩ | ⟨exponentfloat⟩
⟨pointfloat⟩ ::= [ ⟨intpart⟩ ] ⟨fraction⟩ | ⟨intpart⟩ .
⟨exponentfloat⟩ ::= ( ⟨intpart⟩ | ⟨pointfloat⟩ ) ⟨exponent⟩
⟨intpart⟩ ::= ⟨digit⟩+
⟨fraction⟩ ::= . ⟨digit⟩+
⟨exponent⟩ ::= (e | E) [+ | -] ⟨digit⟩+
As part of the ⟨pointfloat⟩ rule, the first ⟨intpart⟩ is optional. An ⟨intpart⟩ consists
of one or more digits. And an expansion of ⟨exponent⟩ must start with a choice
between e or E.
A.4 Remark Passing from the grammar to a parser for that grammar is mechanical. We
write a program that takes as input a grammar (for example in BNF) and gives as
output the source code of a program that will parse files following that grammar’s
format. This is a parser-generator, sometimes called a compiler-compiler (while
that term is zingy, it is also misleading because a parser is only part of a compiler).

III.A Exercises
✓ A.5 US ZIP codes have five digits, and may have a dash and four more digits at
the end. Give a BNF grammar.
A.6 Write a grammar in BNF for the language of palindromes.
✓ A.7 At a college, course designations have a form like ‘MA 208’ or ‘PSY 101’, where
the department is two or three capital letters and the course is three digits. Give
a BNF grammar.
✓ A.8 Example A.3 uses some BNF convenience abbreviations.
(a) Give a grammar equivalent to ⟨pointfloat⟩ that doesn’t use square brackets.
(b) Do the same for the repetition operator in ⟨intpart⟩ ’s rule, and for the
grouping in ⟨exponent⟩ ’s rule (you can use ⟨intpart⟩ here).
✓ A.9 In Roman numerals the letters I, V, X, L, C, D, and M stand for the values 1, 5,
10, 50, 100, 500, and 1 000. We write the letters from left to right in descending
order of value, so that XVI represents the number that we would ordinarily write
as 16, and MDCCCCLVIII represents 1958. We always write the shortest possible
string, so we do not write IIIII because we can instead write V. However, as we
don’t have a symbol whose value is larger than 1 000 we must represent large
numbers with lots of M’s.
(a) Give a grammar for the strings that make sense as Roman numerals.
(b) Often Roman numerals are written in subtractive notation: for instance, 4 is
represented as IV, because four I’s are hard to distinguish from three of them
in a setting such as a watch face. In this notation 9 is IX, 40 is XL, 90 is XC,
400 is CD, and 900 is CM. Give a grammar for the strings that can appear in
this notation.
A.10 This grammar is for a small C-like programming language.
⟨program⟩ ::= { ⟨statement-list⟩ }
⟨statement-list⟩ ::= [ ⟨statement⟩ ; ]*
⟨statement⟩ ::= ⟨data-type⟩ ⟨identifier⟩
| ⟨identifier⟩ = ⟨expression⟩
| print ⟨identifier⟩
| while ⟨expression⟩ { ⟨statement-list⟩ }
⟨data-type⟩ ::= int | boolean
⟨expression⟩ ::= ⟨identifier⟩ | ⟨number⟩ | ( ⟨expression⟩ ⟨operator⟩
⟨expression⟩ )
⟨identifier⟩ ::= ⟨letter⟩ [ ⟨letter⟩ ]*
⟨number⟩ ::= ⟨digit⟩ [ ⟨digit⟩ ]*
⟨operator⟩ ::= + | ==
⟨letter⟩ ::= A | B | . . . | Z
⟨digit⟩ ::= 0 | 1 | . . . | 9
(a) Give a derivation and parse tree for this program.
{ int A ;
A = 1 ;
print A ;
}

(b) Must all programs be surrounded by curly braces?
A.11 Here is a grammar for LISP.
⟨s-expression⟩ ::= ⟨atomic-symbol⟩
| ( ⟨s-expression⟩ . ⟨s-expression⟩ )
| ⟨list⟩
⟨list⟩ ::= ( ⟨s-expression⟩ * )
⟨atomic-symbol⟩ ::= ⟨letter⟩ ⟨atom-part⟩
⟨atom-part⟩ ::= ⟨empty⟩
| ⟨letter⟩ ⟨atom-part⟩
| ⟨number⟩ ⟨atom-part⟩
⟨letter⟩ ::= a | b | . . . | z
⟨number⟩ ::= 1 | 2 | . . . | 9
Derive the s-expression (cons (car x) y).
A.12 Python 3’s Format Specification Mini-Language is used to describe string
substitution.
⟨format-spec⟩ ::=
[[ ⟨fill⟩ ] ⟨align⟩ ][ ⟨sign⟩ ][#][0][ ⟨width⟩ ][ ⟨gr⟩ ][. ⟨precision⟩ ][ ⟨type⟩ ]
⟨fill⟩ ::= ⟨any character⟩
⟨align⟩ ::= < | > | = | ˆ
⟨sign⟩ ::= + | - |
⟨width⟩ ::= ⟨integer⟩
⟨gr⟩ ::= - | ,
⟨precision⟩ ::= ⟨integer⟩
⟨type⟩ ::= b | c | d | e | E | f | F | g | G | n | o | s | x | X | %
Take ⟨integer⟩ to produce ⟨digit⟩ ⟨integer⟩ or ⟨digit⟩ . Give a derivation of these
strings: (a) 03f (b) +#02X.
Chapter
IV Automata

Our touchstone model of mechanical computation is the Turing machine. A Turing machine has only two components, a CPU and a tape. We will now take the tape away and study the CPU alone.
Alternatively stated, while a Turing Machine has unbounded memory, the
devices that we use every day do not. We can ask what jobs can be done by a
machine with bounded memory.

Section
IV.1 Finite State Machines
We produce a new model of computation by modifying the definition of Turing
Machine. We will strip out the capability to write, changing the tape head from
read/write to read-only. This gives us insight into what can be done with states
alone. It will turn out that this type of machine can do many things, but not as
many as a Turing machine.

Definition We will use the same type of transition tables and transition graphs as
with Turing machines.
1.1 Example A power switch has two states, q off and q on and its input alphabet has
one symbol, toggle.
[Transition graph: states qoff and qon, with an arrow labeled toggle from each state to the other.]
1.2 Example Operate this turnstile by putting in two tokens and then pushing through.
It has three states and its input alphabet is Σ = { token, push }.
[Transition graph: token arrows qinit → qone → qready with a token loop at qready; push loops at qinit and at qone, and a push arrow returns from qready to qinit.]
As we saw with Turing machines, the states are a limited form of memory. For
instance, q one is how the turnstile “remembers” that it has so far received one
token.
Image: The astronomical clock in Notre-Dame-de-Strasbourg Cathedral, for computing the date of
Easter. Easter falls on the first Sunday after the full moon on or after the spring equinox. Calculation of
this date was a great challenge for mechanisms of that time, 1843.
1.3 Example This vending machine dispenses items that cost 30 cents.† The picture
is complex so we will show it in three layers. First are the arrows for nickels and
pushing the dispense button.
[Transition graph: states q0, q5, q10, q15, q20, q25, q30; nickel arrows n run q0 → q5 → · · · → q30, with an n loop at q30; a push loop sits at each state.]
After receiving 30 cents and getting another nickel, this machine does something
not very sensible: it stays in q 30 . In practice a machine would have further states
to keep track of overages so that we could give change, but here we ignore that.
Next come the arrows for dimes
[d arrows advance the state by ten cents, q0 → q10 through q20 → q30, while from q25 and from q30 a dime leads to q30]
and for quarters.
[q arrows advance the state by twenty five cents, capping at q30: q0 → q25, q5 → q30, and from q10 through q30 a quarter leads to q30.]
1.4 Example This machine, when started in state q0 and fed bit strings, will keep track of the remainder modulo 4 of the number of 1's.
[Transition graph: a cycle of 1 arrows q0 → q1 → q2 → q3 → q0, with a 0 loop at each state.]
1.5 Definition A Finite State machine, or Finite State automata, is composed of five
things, M = ⟨Q, q start , F , Σ, ∆⟩ . They are a finite set of states Q , one of which is
the start state q start , a subset F ⊆ Q of accepting states or final states, a finite input
alphabet set Σ, and a next-state function or transition function ∆ : Q × Σ → Q .
This may not immediately appear to be like our definition of a Turing Machine.
Some of that is because we have already defined the terms ‘alphabet’ and ‘transition
function’. The other differences follow from the fact that Finite State machines cannot write. For one thing, because Finite State machines cannot write they don’t need to move the tape for scratch work, so we’ve dropped the tape action symbols L and R.
† US coins are: 1 cent coins that are not used here, nickels are 5 cents, dimes are 10 cents, and quarters are 25 cents.
The other difference between Finite State machines and Turing machines is the
presence of the accepting states. Consider, in the vending machine of Example 1.3,
the state q 30 . It is an accepting state, meaning that the machine has seen in the
input what it is looking for. The same goes for Example 1.2’s turnstile state q ready
and Example 1.1’s power switch state q on . While we can design a Turing Machine
to indicate a choice by arranging so that for each input the machine will halt and
the only thing on the tape will be either a 1 or 0, a Finite State machine gives a
decision by ending in one of these designated states. We can picture the accepting states as wired to a red light, so that we know when a computation succeeds. In the transition graphs we denote the final states with double circles and in the transition function tables we mark them with a ‘+’.
To work a Finite State machine device, put the finite-length input on the tape and press Start. The machine consumes the input, at each step deleting the prior tape character and then reading the next one. We can trace through the steps when Example 1.4's modulo 4 machine gets the input 10110.
Step   Configuration
0      ⟨q0, 10110⟩
1      ⟨q1, 0110⟩
2      ⟨q1, 110⟩
3      ⟨q2, 10⟩
4      ⟨q3, 0⟩
5      ⟨q3, ε⟩
Consequently there is no Halting problem for Finite State machines — they always
halt after a number of steps equal to the length of the input. At the end, either the
Accept light is on or it isn’t. If it is on then we say that the machine accepts the
input string, otherwise it rejects the string.
1.6 Example This machine accepts a string if and only if it contains at least two 0’s as
well as an even number of 1’s. (The + next to q 2 marks it as an accepting state.)
∆      0    1
q0     q1   q3
q1     q2   q4
+ q2   q2   q5
q3     q4   q0
q4     q5   q1
q5     q5   q2

[Transition graph: the states form a grid, top row q0, q1, q2 and bottom row q3, q4, q5; the 0 arrows run left to right along each row, with 0 loops at q2 and q5, and the 1 arrows run up and down between the rows.]
This machine illustrates the key to designing Finite State machines, that each state
has an intuitive meaning. The state q 4 means “so far the machine has seen one 0
and an odd number of 1’s.” And q 5 means “so far the machine has seen two 0’s
but an odd number of 1’s.” The drawing brings out this principle. Its first row has
states that have so far seen an even number of 1’s, while the second row’s states
have seen an odd number. Its first column holds states that have seen no 0’s, the second column holds states that have seen one, and the third column has states that have seen
two 0’s.
1.7 Example This machine accepts strings that are valid as decimal representations
of integers. Thus, it accepts ‘21’ and ‘-707’ but does not accept ‘501-’. Both the
transition graph and the table group some inputs together when they result in
the same action. For instance, when in state q 0 this machine does the same thing
whether the input is + or -, namely it passes into q 1 .
[Transition graph: q0 passes to q1 on + or -; q0 and q1 pass to q2 on any digit; q2 loops on digits; any other character sends each state to the sink e, which loops on every character.]

∆      +,-   0,...,9   else
q0     q1    q2        e
q1     e     q2        e
+ q2   e     q2        e
e      e     e         e
Any bad input character sends the machine to the error state e, which is a sink state, meaning that the machine never leaves it.
Our Finite State machine descriptions will usually assume that the alphabet is
clear from the context. For instance, the prior example just says ‘else’. In practice
we take the alphabet to be the set of characters that someone could conceivably
enter, including letters such as a and A or characters such as exclamation point or
open parenthesis. Thus, design of a Finite State machine up to a modern standard
might use all of Unicode. But for the examples and exercises here, we will use
small alphabets.
1.8 Example This machine accepts strings that are members of the set { jpg, pdf, png }
of filename extensions. Notice that it has more than one final state.
[Transition graph: q0 −j→ q1 −p→ q2 −g→ q3; q0 −p→ q4; q4 −d→ q5 −f→ q6; q4 −n→ q7 −g→ q8; the accepting states are q3, q6, and q8.]
That drawing omits many edges, the ones involving the error state e . For instance,
from state q 0 any input character other than j or p is an error. (Putting in all the
edges would make a mess. Cases such as this are where the transition table is
better than the graph picture. But most of our machines are small so we typically
prefer the picture.)
This example illustrates that for any finite language there is a Finite State
machine that accepts a string if and only if it is a member of the language. The idea
is: for strings that have common prefixes, the machine steps through the shared
parts together, as here with pdf and png. Exercise 1.46 asks for a proof.
1.9 Example Although they have no scratch memory, Finite State machines can
accomplish useful work such as some kinds of arithmetic. This machine accepts
strings representing a natural number that is a multiple of three, such as 15
and 5013.
[Transition graph: each of q0, q1, q2 loops on 0, 3, 6, and 9; the arrows labeled 1,4,7 run q0 → q1 → q2 → q0; the arrows labeled 2,5,8 run q0 → q2 → q1 → q0. The start state q0 is accepting.]
Because q 0 is an accepting state, this machine accepts the empty string. Exercise 1.23
asks for a modification of this machine to accept only non-empty strings.
1.10 Example Finite State machines are easy to translate to code. Here is a Scheme
version of the multiple of three machine.†
;; Decide if the input represents a multiple of three
(define (multiple-of-three-fsm input-string)
  (let ((state 0))
    (if (= 0 (multiple-of-three-fsm-helper state input-string))
        (display "accepted")
        (display "rejected"))
    (newline)))

;; tail-recursive helper fcn
(define (multiple-of-three-fsm-helper state tau)
  (let ((tau-list (string->list tau)))
    (if (null? tau-list)
        state
        (multiple-of-three-fsm-helper (delta state (car tau-list))
                                      (list->string (cdr tau-list))))))
(define (delta state ch)
  (cond
    ((= state 0)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 0)
           ((memv ch '(#\1 #\4 #\7)) 1)
           (else 2)))
    ((= state 1)
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 1)
           ((memv ch '(#\1 #\4 #\7)) 2)
           (else 0)))
    (else
     (cond ((memv ch '(#\0 #\3 #\6 #\9)) 2)
           ((memv ch '(#\1 #\4 #\7)) 0)
           (else 1)))))
1.11 Example This is a simplified version of how phone numbers used to be handled in
North America. Consider the number 1-802-555-0101. The initial 1 signifies that
the call should leave the local exchange office to go to the long lines. The 802 is an
area code; the system can tell this is so because its second digit is either 0 or 1 so
it is not a same-area local exchange. Next the system processes the local exchange
number of 555, routing the call to a particular physical local switching office. That
office processes the line number of 0101, and makes the connection.

[Transition graph for routing a dialed number, with states for the operator (Op), long lines (LL), area code and exchange digits (X), line number (L), information (Info), repair (Rep), and emergency (Emr) calls. Legend: x stands for 0, . . . 9; n for 2, . . . 9; p for 0, 1.]
† One of the great things about the Scheme programming language is that, because the last thing called in multiple-of-three-fsm-helper is itself, the compiler converts the recursion to iteration. So we get the expressiveness of recursion with the space conservation of iteration.
Today the picture is much more complicated. For example, no longer are area codes
required to have a middle digit of 0 or 1. This additional complication is possible
because instead of switching with physical devices, we now do it in software.
After the definition of Turing machine we gave a formal description of the
action of those machines. We next do the same here.
A configuration of a Finite State machine is a pair C = ⟨q, τ ⟩ , where q is a state,
q ∈ Q , and τ is a (possibly empty) string, τ ∈ Σ∗ . We start a machine with some
input string τ0 and say that the initial configuration is C0 = ⟨q 0 , τ0 ⟩ .
A Finite State machine acts by a sequence of transitions from one configuration
to another. For s ∈ N+ the machine’s configuration after the s -th transition is its
configuration at step s , Cs .
Here is the rule for making a transition (we sometimes say it is an allowed or
legal transition, for emphasis). Suppose that the machine is in the configuration
Cs = ⟨q, τs ⟩ . In the case that τs is not empty, pop the string’s leading symbol c .
That is, where c = τs [0], take τs+1 = ⟨τs [1], ... τs [k]⟩ for k = |τs | − 1. Then the
machine’s next state is q̂ = ∆(q, c) and its next configuration is Cs+1 = ⟨q̂, τs+1 ⟩ .
Denote this before-after relationship between configurations by Cs ⊢ Cs+1 .†
The other case is that the string τs is empty. This is the halting configuration
Ch . No transitions follow a halting configuration.
At each transition the length of the tape string drops by one so every computation
eventually reaches a halting configuration Ch = ⟨q, ε⟩ . A Finite State machine
computation is a sequence C0 ⊢ C1 ⊢ C2 ⊢ · · · ⊢ Ch. We can abbreviate such a sequence with ⊢∗, as in C0 ⊢∗ Ch.‡
If the ending state is a final state, q ∈ F , then the machine accepts the input τ0 ,
otherwise it rejects τ0 .
Notice that, as with the formalism for Turing machines, the heart of the
definitions is the transition function ∆. It makes the machine move step-by-step,
from configuration to configuration, in response to the input.
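This formalism translates directly into code. Here is a minimal sketch in Scheme, assuming that delta is a procedure taking a state and a character to the next state; the names step and run are ours, and a configuration is a list whose head is the state and whose tail is the remaining input.

;; One transition: pop the leading character and apply delta, as in
;; the rule Cs ⊢ Cs+1 above. A configuration is (state c1 c2 ...).
(define (step delta config)
  (cons (delta (car config) (cadr config)) (cddr config)))

;; Run transitions until the halting configuration, with empty string.
(define (run delta config)
  (if (null? (cdr config))
      config
      (run delta (step delta config))))

;; With Example 1.10's delta: (run delta (cons 0 (string->list "5013")))
;; gives (0), the halting configuration ⟨q0, ε⟩.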
1.12 Example On input 5013, the multiple of three machine of Example 1.9 gives the computation ⟨q0, 5013⟩ ⊢ ⟨q2, 013⟩ ⊢ ⟨q2, 13⟩ ⊢ ⟨q0, 3⟩ ⊢ ⟨q0, ε⟩. Since q0 is an accepting state, the machine accepts 5013.
1.13 Definition The set of strings accepted by a Finite State machine M is the
language of that machine, L(M), or the language recognized, or decided, (or
accepted), by the machine.
(For Finite State machines, deciding a language is equivalent to recognizing it, because the machine must halt. ‘Recognized’ is the more common term.)
1.14 Definition For any Finite State machine with transition function ∆ : Q × Σ → Q ,
the extended transition function ∆ ˆ : Σ∗ → Q gives the state in which the machine
ends, after starting in the start state and consuming the given string.
† As with Turing machines, read the symbol ⊢ aloud as “yields in one step.” ‡ Read the symbol ⊢∗ as “yields eventually” or simply “yields.”
Here is an equivalent constructive definition of ∆̂. Fix a Finite State machine M with transition function ∆ : Q × Σ → Q. To begin, set ∆̂(ε) = q0. Then for τ ∈ Σ∗, define ∆̂(τ⌢t) = ∆(∆̂(τ), t) for any t ∈ Σ. Finally, observe that a string σ ∈ Σ∗ is accepted by the machine if ∆̂(σ) is a final state.
1.15 Example This machine's extended transition function ∆̂
[Transition graph: q0 loops on b and passes to q1 on a; q1 loops on a and passes to q2 on b; q2 loops on b and passes back to q1 on a; q1 is accepting.]

∆      a    b
q0     q1   q0
+ q1   q1   q2
q2     q1   q2

extends its ordinary transition function ∆ in that it repeats the first row of ∆'s table.

∆̂(a) = q1        ∆̂(b) = q0

(We disregard the difference between ∆'s input characters and ∆̂'s input length one strings.) Here is ∆̂'s effect on the length two strings.

∆̂(aa) = q1    ∆̂(ab) = q2    ∆̂(ba) = q1    ∆̂(bb) = q0
This brings us back to determinism, because ∆̂ would not be well-defined without it; ∆ has one next state for each input configuration and so, by induction, ∆̂ has one ending state for each input string.
Finally, note the similarity between ∆̂ and ϕe, the function computed by the Turing machine Pe. Both take as input the contents of their machine's start tape, and both give as output their machine's result.
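In code, the constructive definition is just a fold of ∆ along the string; it computes the same ending state as stepping through configurations. A minimal sketch (the name delta-hat is ours):

;; The extended transition function: feed the characters to delta one
;; at a time, starting from the given state.
(define (delta-hat delta state chars)
  (if (null? chars)
      state
      (delta-hat delta (delta state (car chars)) (cdr chars))))

;; With Example 1.10's delta: (delta-hat delta 0 (string->list "5013"))
;; gives 0, an accepting state.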

IV.1 Exercises
✓ 1.16 Using this machine, trace through the computation when the input is (a) abba
(b) bab (c) bbaabbaa.
[Transition graph: q0 loops on b and passes to q1 on a; q1 loops on a and passes to q2 on b; q2 loops on b and passes back to q1 on a; q1 is accepting.]
1.17 True or false: because a Finite State machine is finite, its language must be finite.
1.18 Your classmate says, “I have a language L that recognizes the empty string ε .”
Straighten them out.
✓ 1.19 How many transitions does an input string of length n cause a Finite State
machine to undergo? n ? n + 1? n − 1? How many (not necessarily distinct)
states will the machine have visited after consuming the string?
1.20 Rebut “no Finite State machine can recognize the language { aⁿb | n ∈ N } because n is infinite.”
✓ 1.21 For each of these formal descriptions of a language, give a one or two sentence
English-language description. Also list five strings that are elements as well as
five that are not, if there are five.
(a) L = { α ∈ { a, b }∗ | α = aⁿbaⁿ for n ∈ N }
(b) { β ∈ { a, b }∗ | β = aⁿbaᵐ for n, m ∈ N }
(c) { baⁿ ∈ { a, b }∗ | n ∈ N }
(d) { aⁿbaⁿ⁺² ∈ { a, b }∗ | n ∈ N }
(e) { γ ∈ { a, b }∗ | γ has the form γ = α⌢α for α ∈ { a, b }∗ }
✓ 1.22 For the machines of Example 1.6, Example 1.7, Example 1.8, and Example 1.9,
answer these. (a) What are the accepting states? (b) Does it recognize the
empty string ε ? (c) What is the shortest string that each accepts? (d) Is the
language of accepted strings infinite?
1.23 Modify the machine of Example 1.9 so that it accepts only non-empty strings.
✓ 1.24 The best way to develop a Finite State machine is to think about what each
state is doing, what it means. For each language, name five strings that are in the
language and five that are not (the alphabet is Σ = { a, b }). Then design a Finite
State machine that will recognize the language by first giving a one-sentence
English description of each state that you use. Then give both a circle diagram
and a table for the transition function.
(a) L1 = { σ ∈ Σ∗ | σ has at least one a and at least one b }
(b) L2 = { σ ∈ Σ∗ | σ has fewer than three a’s }
(c) L3 = { σ ∈ Σ∗ | σ ends in ab }
(d) L4 = { aⁿbᵐ ∈ Σ∗ | n, m ≥ 2 }
(e) L5 = { aⁿbᵐaᵖ ∈ Σ∗ | m = 2 }

1.25 Produce the transition graph picturing this transition function. What is the
language of this machine?
∆ a b
q0 q2 q1
+ q1 q0 q2
q2 q2 q2

✓ 1.26 What language is recognized by this machine?
[Transition graph: q0 −b→ q1 −b→ q2, with a looping at q0 and at q1, and both a and b looping at q2.]
✓ 1.27 Give a Finite State machine over Σ = { a, b, c } that accepts any string
containing the substring abc. As in Example 1.6, give a brief description of each
state’s intuitive meaning in the machine.
1.28 Consider the language of strings over Σ = { a, b } containing at least two a’s
and at least two b’s. Name five elements of the language, and five non-elements.
Give a Finite State machine recognizing this language. As in Example 1.6, give a
brief description of the intuitive meaning of each state.
✓ 1.29 For each language, name five strings in the language and five that are not (if there are not five, name as many as there are). Then give a transition graph and table for a Finite State machine recognizing the language. Use Σ = { a, b }.
(a) { σ ∈ Σ∗ | σ has at least two a’s }
(b) { σ ∈ Σ∗ | σ has exactly two a’s }
(c) { σ ∈ Σ∗ | σ has less than three a’s }
(d) { σ ∈ Σ∗ | σ has at least one a followed by at least one b }
1.30 Produce a Finite State machine over the alphabet Σ = { A, ... Z, 0, ... 9 } that
accepts only the string 911, and a machine that accepts any string but that one.
1.31 Using Example 1.15, apply the extended transition function to all of the length
three and length four string inputs.
✓ 1.32 Consider a language of comments, which begin with the two-character string
/#, end with #/, and have no #/ substrings in the middle. Give a Finite State
machine to recognize that language. (Just producing the transition graph is
enough.)
1.33 For each language, give five strings from that language and five that are not
(if there are not that many then list all of the strings that are possible). Also give
a Finite State machine that recognizes the language. Use Σ = { a, b }.
(a) L = { σ ∈ { a, b }∗ | σ ends in aa }
(b) { σ ∈ { a, b }∗ | σ = ε }
(c) { σ ∈ { a, b }∗ | σ = a³b or σ = ba³ }
(d) { σ ∈ { a, b }∗ | σ = aⁿ or σ = bⁿ for n ∈ N }
1.34 What happens when the input to an extended transition function is the empty
string?
✓ 1.35 Produce a Finite State machine that recognizes each.
(a) { σ ∈ { 0, ... 9 }∗ | σ has either no 0’s or no 2’s }
(b) { σ ∈ { 0, ... 9 }∗ | σ is the decimal representation of a multiple of 5 }
✓ 1.36 Give a Finite State machine over the alphabet Σ = { A, ... Z } that accepts
only strings in which the vowels occur in ascending order. (The traditional vowels,
in ascending order, are A, E, I, O, and U.)
✓ 1.37 Consider this grammar.
⟨real⟩ → ⟨posreal⟩ | + ⟨posreal⟩ | - ⟨posreal⟩
⟨posreal⟩ → ⟨natural⟩ | ⟨natural⟩ . | ⟨natural⟩ . ⟨natural⟩
⟨natural⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨natural⟩
⟨digit⟩ → 0 | . . . | 9
(a) Give five strings that are in its language and five that are not. (b) Is the string .12 in the language? (c) Briefly describe the language. (d) Give a Finite State machine that recognizes the language.
1.38 Produce five strings in each language and five that are not. Also produce a Finite State machine to recognize it.
(a) { σ ∈ B∗ | every 1 in σ has a 0 just before it and just after }
(b) { σ ∈ B∗ | σ represents a number divisible by 4 in binary }
(c) { σ ∈ { 0, ... 9 }∗ | σ represents an even number in decimal }
(d) { σ ∈ { 0, ... 9 }∗ | σ represents a multiple of 100 in decimal }
1.39 Consider { σ ∈ { 0, ... 9 }∗ | σ represents a multiple of 4 in base ten }. Briefly describe a Finite State machine; you need not give the full graph or table.
1.40 As in Example 1.12, find the computation for the multiple of three machine
with the initial string 2332.
1.41 We will work through the formal definition of the extended transition function that follows Definition 1.14 by applying it to the machine in Example 1.6. (a) Use the definition to find ∆̂(0) and ∆̂(1). (b) Use the definition to find ∆̂’s output on inputs 00, 01, 10, and 11. (c) Find its action on all length three strings.
✓ 1.42 Produce a Finite State machine that recognizes the language over Σ = { a, b }
containing no more than one occurrence of the substring aa. That is, it may
contain zero-many such substrings or one, but not two. Note that the string aaa
contains two occurrences of that substring.
1.43 Let Σ = B. (a) List all of the different Finite State machines over Σ with a single state, Q = {q0}. (Ignore whether a state is final or not; we will do that below.) (b) List all of the ones with two states, Q = {q0, q1}. (c) How many machines are there with n states? (d) What if we distinguish between machines with different sets of final states?
✓ 1.44 Propositiones ad acuendos iuvenes (problems to sharpen the young) is the
oldest collection of mathematical problems in Latin. It is by Alcuin of York
(735–804), royal advisor to Charlemagne and head of the Frankish court school.
One problem, Propositio de lupo et capra et fasciculo cauli, is particularly famous: A
man had to transport to the far side of a river a wolf, a goat, and a bundle of
cabbages. The only boat he could find was one that could carry only two of them.
For that reason, he sought a plan which would enable them all to get to the far side
unhurt. Let him, who is able, say how it could be possible to transport them safely.
A wolf cannot be left alone with a goat nor can a goat be left alone with cabbages.
Construct the relevant Finite State machine and use it to solve the problem.
1.45 Consider a variant on a Finite State machine, where the set of input strings
is bounded.
(a) In Rock-Paper-Scissors, two players simultaneously produce one of R, P,
or S. A player with R beats a player with S, and S beats P, and P beats R
(a repeat is a do-over). Encode a game with the two-character string
σ = ⟨player one’s play, player two’s play⟩ . Make a machine that recognizes
a win for player one.
(b) Make a machine that accepts a Tic-Tac-Toe game that is a win for the X’s. A
board has nine squares so encode a game instance with length nine strings.
1.46 Show, as suggested by Example 1.8, that for any finite language there is a
Finite State machine recognizing that language.
1.47 There are languages not recognized by any Finite State machine. Fix an
alphabet Σ with at least two members. (a) Show that the number of Finite
State machines with that alphabet is infinite. (b) Show that it is countable.
(c) Show that the number of languages over that alphabet is uncountable.

Section
IV.2 Nondeterminism
Turing machines and Finite State machines both have the property that the next
state is completely determined by the current state and current character. Once
you lay out an initial tape and push Start then you just walk through the steps like,
well . . . , like an automaton. We now consider machines that are nondeterministic,
where from any configuration the machine could move to more than one next state,
or to just one, or even to no state at all.

Motivation Imagine a grammar with some rules and start symbol. We are given a string and asked if it has a derivation. The challenge in these problems is that you sometimes have to guess which path the derivation should take. For instance, if you have S → aS | bA then from S you can do two different things; which one will work?
In the grammar section’s derivation exercises, we expect that an intelligent person will have the insight to guess the right way. However, if instead you were writing a program then you might have it try every case; you might do a breadth-first traversal of the tree of all derivations, until you found a success.
The American philosopher and Hall of Fame baseball catcher Y Berra (1925–2015) said, “When you come to a fork in the road, take it.” That’s a natural way to attack this problem: when you come up against multiple possibilities, fork a child for each. Thus, the routine might begin with the start state S and for each rule that could apply it spawns a child process, deriving a string one removed from the start. After that, each child finds each rule that could apply to its string and spawns its own children, each of which now has a string that is two removed from the start. Continue until the desired string σ appears, if it ever does.
The prototypical example is the celebrated Traveling Salesman problem, that of
finding the shortest circuit of every city in a list. Start with a map of the roads in
the US lower forty eight states. We want to know if there is a trip that visits each
state capital and returns back to where it began in, say, less than 16 000 kilometers.
We’ll start at Montpelier, the capital of Vermont. From there, for each potential next
capital we could fork a process, making forty seven new processes. The process
that is assigned Concord, New Hampshire, for instance, would know that the trip
so far is 188 kilometers. In the next round, each child would fork its own child
processes, forty six of them. For instance the process that after Montpelier was
assigned Concord would have a child assigned to Augusta, Maine and would know
that so far the trip is 452 kilometers. At the end, if there is a trip of less than
16 000 kilometers then some process knows it. There will be lots of processes and
many of them will have failed to find a short trip, but if even one succeeds then we
consider the overall search a success.
This computation is nondeterministic in that while it is happening the machine is simultaneously in many different states. It imagines an unboundedly-parallel machine, where any time you have a job for an additional computing agent, a CPU, you can allocate one.† Think of such a machine as angelic in that whenever it wants more computational resources, such as being able to allocate new children, those resources just appear. [Image: Persian angel, 1555]
This section considers nondeterministic Finite State machines. (Nondeterministic Turing machines appear in the fifth chapter.) We will have two ways to think about nondeterminism, two mental models. The first was introduced above: when such a machine is presented with multiple possible next states then it forks, so that it is in all of them simultaneously. The next example illustrates.
2.1 Example The Finite State machine below is nondeterministic because leaving q 0
are two arrows labeled 0. It also has states with a deficit of edges; e.g., no 1 arrow
leaves q 1 , so if it is in that state and reads that input then it passes to no state at all.
0,1

q0 q1 q2 q3
0 0 1

The animation shows what happens with input 00001. We take the computation history as a tree. For example, on the first 0 the computation splits in two.
and watching a video in another. The machine appears to be in multiple states simultaneously although
it may in fact be time-slicing, dovetailing by running each process in succession for a few ticks. ‡ While
these models are helpful in learning and thinking about nondeterminism, they are not part of the
formal definitions and proofs.
Section 2. Nondeterminism 191

[Figure: the computation tree for input 00001, stepping left to right through the input characters. At each step the q0 branch both stays in q0 and forks to q1; the branch through q1 and q2 dies when no transition applies, until the final 1 carries one branch to q3 at step 5.]
2.2 Animation: Steps in the nondeterministic computation.
When we considered the forking approach to string derivations or to the Traveling
Salesman, we observed that if a solution exists then some child process would find
it. The same happens here; there is a branch of the computation tree that accepts
the input string. There are also branches that are not successful. The one at the
bottom dies out after step 2 because when the present state is q2 and the input is 0 this machine passes to no-state.† Another is the branch at the top, which never
dies but also does not accept the input. However we don’t care about unsuccessful
branches, we only care that there is a successful one. So we will define that a
nondeterministic machine accepts an input if there is at least one branch in the
computation tree that accepts the input.
The machine in the above example accepts a string if it ends in two 0’s and a 1. When we feed it the input 00001 the problem the machine faces is: when should it stop going around q0’s loop and start moving to the right? Our definition has the machine accepting this input, so the machine has solved this problem — viewed from the outside we could say, perhaps a bit fancifully, that the machine has correctly guessed. This is our second model for nondeterminism. We will imagine programming by calling a function, some amb(S, R0, R1 ... Rn−1), and having the computer somehow guess a successful sequence.
Saying that the machine is guessing is jarring. Based on programming classes, a person’s intuition may be that “guessing” is not mechanically accomplishable. We can instead imagine that the machine is furnished with the answer (“go around twice, then off to the right”) and only has to check it. This mental model of nondeterminism is demonic because the furnisher seems to be a supernatural being, a demon, who somehow knows answers that cannot otherwise be found, but you are suspicious and must check that the answer is not a trick. Under this model, a nondeterministic computation accepts the input if there exists a branch of the computation tree that a deterministic machine, if told what branch to take, could verify. [Image: the demon Flauros, Duke of Hell]
† No-state cannot be an accepting state, since it isn’t a state at all.
Below we shall describe nondeterminism using both paradigms: as a machine
being in multiple states at once, and as a machine guessing. As mentioned above,
here we will do that for Finite State machines but in the fifth chapter we will return
to it in the context of Turing machines.

Definition A nondeterministic Finite State machine’s next-state function does not output single states; it outputs sets of states.
2.3 Definition A nondeterministic Finite State machine M = ⟨Q, q start , F , Σ, ∆⟩
consists of a finite set of states Q , one of which is the start state q start , a subset
F ⊆ Q of accepting states or final states, a finite input alphabet set Σ, and a
next-state function ∆ : Q × Σ → P (Q).
We will use these machines in three ways. First, with them we encounter
nondeterminism, which is critical for the book’s final part. Second, they are useful
in practice; both below and in the exercises are examples of jobs that are more
easily solved in this way. Finally, we will use them to prove Kleene’s Theorem,
Theorem 3.10.
2.4 Example This is Example 2.1’s nondeterministic Finite State machine, along with its transition function.
[Transition graph: q0 loops on 0 and 1; q0 −0→ q1 −0→ q2 −1→ q3; q3 is accepting.]

∆      0           1
q0     {q0, q1}    {q0}
q1     {q2}        {}
q2     {}          {q3}
+ q3   {}          {}

In this nondeterministic machine the entries of the array are not states, they are sets of states.
Nondeterministic machines may seem conceptually fuzzy so the formalities are
a help. Contrast these definitions with the ones for deterministic machines.
A configuration is a pair C = ⟨q, τ ⟩ , where q ∈ Q and τ ∈ Σ∗ . A machine starts
with an initial configuration C0 = ⟨q 0 , τ0 ⟩ . The string τ0 is the input.
Following the initial configuration there may be one or more sequences of
transitions. Suppose that there is a machine configuration Cs = ⟨q, τs ⟩ . For
s ∈ N+ , in the case where τs is not the empty string, a transition pops the string’s
leading symbol c to get τs+1 , takes the machine’s next state to be a member q̂ of
the set ∆(q, c) and then takes a subsequent configuration to be Cs+1 = ⟨q̂, τs+1 ⟩ .
Denote that two configurations are connected by a transition with Cs ⊢ Cs+1 .
The other case is that τs is the empty string. This is a halting configuration, Ch .
After Ch , no transitions follow.
A nondeterministic Finite State machine computation is a sequence of transitions
that ends in a halting configuration, C0 = ⟨q0, τ0⟩ ⊢ C1 ⊢ C2 ⊢ · · · ⊢ Ch = ⟨q, ε⟩. From an initial configuration there may be many such sequences. If at least one ends
with a halting state, with q ∈ F , then the machine accepts the input τ0 , otherwise
it rejects τ0 .
2.5 Example For the nondeterministic machine of Example 2.1, the graphic shows this allowed sequence of transitions.

⟨q0, 00001⟩ ⊢ ⟨q0, 0001⟩ ⊢ ⟨q0, 001⟩ ⊢ ⟨q1, 01⟩ ⊢ ⟨q2, 1⟩ ⊢ ⟨q3, ε⟩

Because it ends in an accepting state, the machine accepts the initial string, 00001.

2.6 Definition For a nondeterministic Finite State machine M, the set of accepted
strings is the language of the machine L(M), or the language recognized, (or
accepted), by that machine.†
We will also adapt the definition of the extended transition function, now ∆̂ : Σ∗ → P(Q). Fix a nondeterministic M with transition function ∆ : Q × Σ → P(Q). Start with ∆̂(ε) = {q0}. Where ∆̂(τ) = {qi0, qi1, ... qik} for τ ∈ Σ∗, define ∆̂(τ⌢t) = ∆(qi0, t) ∪ ∆(qi1, t) ∪ · · · ∪ ∆(qik, t) for any t ∈ Σ. Then the machine accepts σ ∈ Σ∗ if and only if any element of ∆̂(σ) is a final state.
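This definition is directly executable: keep the set of states that the machine could currently be in and update it for each character. A minimal sketch in Scheme, assuming that delta-n is a procedure taking a state and a character to a list of next states; all the names are ours.

;; Sets as lists without repeats.
(define (set-insert x s) (if (memv x s) s (cons x s)))
(define (set-union a b)
  (if (null? a) b (set-union (cdr a) (set-insert (car a) b))))

;; The union of delta-n over every state in the set s.
(define (next-states delta-n s ch)
  (if (null? s)
      '()
      (set-union (delta-n (car s) ch)
                 (next-states delta-n (cdr s) ch))))

;; The extended transition function: start from {q0} and update the
;; set of possible states on each character.
(define (delta-hat-n delta-n states chars)
  (if (null? chars)
      states
      (delta-hat-n delta-n
                   (next-states delta-n states (car chars))
                   (cdr chars))))

The machine accepts the input when the resulting set contains a final state.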

2.7 Example The language recognized by this nondeterministic machine
[Transition graph: q0 loops on a and b; q0 −a→ q1 with q1 −a→ q2, and q0 −b→ q3 with q3 −b→ q2; the accepting state q2 loops on a and b.]
is the set of strings containing the substring aa or bb. For instance, the machine
accepts abaaba because there is a sequence of allowed transitions ending in an
accepting state, namely this one.

⟨q0, abaaba⟩ ⊢ ⟨q0, baaba⟩ ⊢ ⟨q0, aaba⟩ ⊢ ⟨q1, aba⟩ ⊢ ⟨q2, ba⟩ ⊢ ⟨q2, a⟩ ⊢ ⟨q2, ε⟩

2.8 Example With Σ = { a, b, c }, this nondeterministic machine
[Transition graph: q0 −a→ q1 and q1 −c→ q0, with the start state q0 accepting.]
recognizes the language { (ac)ⁿ | n ∈ N } = { ε, ac, acac, ... }. The symbol b isn’t attached to any arrow so it won’t play a part in any accepting string.
Often a nondeterministic Finite State machine is easier to write than a deterministic machine that does the same job.
† Below we will define something called ε transitions that make ‘recognized’ the right idea here, instead of ‘decided’.
2.9 Example This nondeterministic machine that accepts any string whose next to last character is a
[Transition graph: q0 loops on a and b; q0 −a→ q1; q1 passes to q2 on either a or b; q2 is accepting.]
is simpler than the deterministic machine.
[Transition graph of the equivalent deterministic machine, which must track whether each of the last two characters read was an a.]
2.10 Example This machine accepts { σ ∈ B∗ | σ = 0⌢τ⌢1 where τ ∈ B∗ }.
[Transition graph: q0 −0→ q1; q1 loops on 0 and 1; q1 −1→ q2; q2 is accepting.]
2.11 Example This is a garage door opener listener that waits to hear the re-
mote control send the signal 0101110. That is, it recognizes the language
{ σ⌢0101110 | σ ∈ B∗ }.
[Transition graph: a loop labeled 0,1 at q0, then q0 −0→ q1 −1→ q2 −0→ q3 −1→ q4 −1→ q5 −1→ q6 −0→ q7, with q7 accepting.]
2.12 Remark Having seen a couple of examples we pause to again acknowledge, as we did when we discussed the angel and the demon, that something about nondeterminism is unsettling. If we feed τ = 010101110 to the prior example’s listener then it accepts.
⟨q0, 010101110⟩ ⊢ ⟨q0, 10101110⟩ ⊢ ⟨q0, 0101110⟩ ⊢ ⟨q1, 101110⟩ ⊢ ⟨q2, 01110⟩ ⊢ ⟨q3, 1110⟩ ⊢ ⟨q4, 110⟩ ⊢ ⟨q5, 10⟩ ⊢ ⟨q6, 0⟩ ⊢ ⟨q7, ε⟩

But the machine’s chain of states is set up for a string, 0101110, that begins with
two sets of 01’s, while τ begins with three. How can it guess that it should ignore
the first 01 but act on the second? Of course, in mathematics we can consider
whatever we can define precisely. However we have so far studied what can be
done by devices that are in principle physically realizable so this may seem to be a
shift.
However, we will next show how to convert any nondeterministic Finite State machine into a deterministic one that does the same job. So we can think of a nondeterministic Finite State machine as an abbreviation, a convenience. This obviates at least some of the paradox of guessing, at least for Finite State machines.

ε transitions Another extension, beyond nondeterminism, is to allow ε transitions, or ε moves. We alter the definition of a nondeterministic Finite State machine,
Definition 2.3, so that instead of ∆ : Q × Σ → P (Q), the transition function’s
signature is ∆ : Q × (Σ ∪ {ε}) → P(Q).† The associated behavior is that the machine can transition spontaneously, without consuming any input.‡
2.13 Example This machine recognizes valid integer representations. Note the ε between q0 and q1.
[Transition graph: q0 passes to q1 on +, -, or ε; q1 passes to q2 on any of 1, . . . , 9; the accepting state q2 loops on 0, . . . , 9.]
Because of the ε it can accept strings that do not start with a + or - sign. For
instance, with input 123 the machine can begin by following the ε transition to
state q 1 , then read the 1 and transition to q 2 , and stay there while processing the
2 and 3. This is a branch of the computation tree accepting the input, and so the
string 123 is in the machine’s language.

2.14 Example A machine may follow two or more ε transitions. From q0 this machine may stay in that state, or transition to q2, or q3, or q5, all without consuming any input.
[Transition graph: the consuming edges are q0 −a→ q1, q1 −b→ q2, q3 −c→ q4, and q5 −d→ q6, with q4 and q6 accepting; the ε edges run from q0 to q2, from q2 to q3, and from q2 to q5.]
That is, the language of this machine is the four element set L = { abc, abd, c, d }.
We can give a precise definition of the action of a nondeterministic Finite State
machine with ε transitions.
First we define the collection of states reachable by ε moves from a given state. For that we use E : Q × N → P(Q), where E(q, i) is the set of states reachable from q within at most i-many ε transitions. That is, set E(q, 0) = {q} and, where E(q, i) = {qi0, ... qik}, set E(q, i + 1) = E(q, i) ∪ ∆(qi0, ε) ∪ · · · ∪ ∆(qik, ε). Observe that these are nested, E(q, 0) ⊆ E(q, 1) ⊆ · · ·, and that each is a subset of Q. But Q has only finitely many states so there must be an î ∈ N where the sequence of sets stops growing, E(q, î) = E(q, î + 1) = · · ·. Define the ε closure function Ê : Q → P(Q) by Ê(q) = E(q, î).
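This fixed point is easy to compute. A sketch, reusing set-union and next-states from the earlier simulation sketch and assuming that delta-n accepts the symbol epsilon as one of its characters:

;; The ε closure of a set of states: keep adding the targets of
;; ε moves until the set stops growing, mirroring E(q, 0), E(q, 1), ...
(define (e-closure delta-n s)
  (let ((bigger (set-union (next-states delta-n s 'epsilon) s)))
    (if (= (length bigger) (length s))
        s
        (e-closure delta-n bigger))))

;; Ê(q) for a single state is (e-closure delta-n (list q)).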
With that, we are ready to describe the machine’s action. As before, a configuration is a pair C = ⟨q, τ⟩, where q ∈ Q and τ ∈ Σ∗. A machine starts with some initial configuration C0 = ⟨q0, τ0⟩, where the string τ0 is the input.
The key description is that of a transition. Consider a configuration Cs = ⟨q, τs ⟩
for s ∈ N and suppose that τs is not the empty string. We will describe a
configuration Cs+1 = ⟨q̂, τs+1 ⟩ that is related to the given one by Cs ⊢ Cs+1 . (As
with the earlier description of nondeterministic machines without ε transitions,
there may be more than one configuration related in this way to Cs .)
† Assume ε ∉ Σ. ‡ Or, think of it as transitioning on consuming the empty string ε.
The string is easy; just pop the leading character to get τs = t ⌢ τs+1 where
t ∈ Σ. To get a legal state q̂ : (i) find the ε closure Ê(q) = {qs0 , ... qsk }, (ii) let q̄
be an element of the set ∆(qs0 , t) ∪ ∆(qs1 , t) ∪ · · · ∪ ∆(qsk , t), and (iii) take q̂ to be
an element of the ε closure Ê(q̄).
If τs is the empty string then this is a halting configuration, Ch . No transitions
follow Ch .
A nondeterministic Finite State machine computation is a sequence of transitions ending in a halting configuration, C0 = ⟨q0, τ0⟩ ⊢ C1 ⊢ C2 ⊢ · · · ⊢ Ch = ⟨q, ε⟩. From a given C0 there may be many such sequences. If at least one ends with a halting state, having q ∈ F, then the machine accepts the input τ0, otherwise it rejects τ0.
With that, we will modify the definition of the extended transition function ∆̂ : Σ∗ → P(Q) from earlier in this section. Begin by defining ∆̂(ε) = Ê(q0). Then the rule for going from a string to its extension is that for τ ∈ Σ∗ and where ∆̂(τ) = {qi0, qi1, ... qik},

∆̂(τ⌢t) = Ê(∆(qi0, t)) ∪ · · · ∪ Ê(∆(qik, t))    for t ∈ Σ

Observe that this nondeterministic machine with ε transitions accepts a string σ ∈ Σ∗ if any one of the states in ∆̂(σ) is a final state.
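In code the only change from the earlier set-tracking sketch is to close under ε moves before the first character and after each one; again this reuses next-states and e-closure from the sketches above.

;; Extended transition function in the presence of ε transitions.
(define (delta-hat-eps delta-n states chars)
  (let ((states (e-closure delta-n states)))
    (if (null? chars)
        states
        (delta-hat-eps delta-n
                       (next-states delta-n states (car chars))
                       (cdr chars)))))

;; Start it as (delta-hat-eps delta-n (list q0) (string->list input)).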
2.15 Remark Certainly this is an intricate set of definitions, but it is here to demonstrate something. In the examples and the homework we often use informal terms such as “guess” and “demon.” However, don’t take this use of evocative and informal language for an inability to follow mathematical orthodoxy. We can perfectly well give definitions and results with full precision.
2.16 Example For the machine of Example 2.14, this sequence shows that it accepts abc

⟨q 0 , abc⟩ ⊢ ⟨q 1 , bc⟩ ⊢ ⟨q 2 , c⟩ ⊢ ⟨q 3 , c⟩ ⊢ ⟨q 4 , ε⟩

(note the ε transition between q 2 and q 3 ). This sequence shows it also accepts the
input string d.
⟨q 0 , d⟩ ⊢ ⟨q 5 , d⟩ ⊢ ⟨q 6 , ε⟩
As with nondeterministic machines, one reason that we use ε transitions is that
they can make solving a complex job much easier.
2.17 Example An ε transition can put two machines together with a parallel connection.
This shows a machine whose states are named with q ’s combined with one whose
states are named with r ’s.
[Transition graph: a new start state s0 has ε edges to q0 and to r0. Above, q0 loops on a and b, with q0 −a→ q1 −b→ q2 and q2 accepting. Below, r0 −a→ r1 and r1 −c→ r0, with r0 accepting.]
The top nondeterministic machine’s language is { σ ∈ Σ∗ | σ ends in ab } and the bottom machine’s language is { σ ∈ Σ∗ | σ = (ac)ⁿ for some n ∈ N }, where Σ = { a, b, c }. The language for the entire machine is the union.

{ σ ∈ Σ∗ | either σ ends in ab or σ = (ac)ⁿ for n ∈ N }
2.18 Example An ε transition can also make a serial connection between machines.
The machine on the left below recognizes the language { (aab)ᵐ | m ∈ N } and the machine on the right recognizes { (a | aba)ⁿ | n ∈ N }.
[Transition graphs: on the left, a cycle q0 −a→ q1 −a→ q2 −b→ q0, with q0 both the start state and accepting; on the right, an a loop at q3 plus a cycle q3 −a→ q4 −b→ q5 −a→ q3, with q3 both the start state and accepting.]
If we insert an ε bridge from each of the left side’s final states (in this example there happens to be only one such state) to the right side’s initial state
[The same two machines, now joined by an ε edge from q0 to q3.]
then the combined machine accepts strings in the concatenation of those languages.

L(M) = { σ ∈ { a, b }∗ | σ = (aab)ᵐ(a | aba)ⁿ for m, n ∈ N }

For example, it accepts aabaababa and aabaabaaa.
2.19 Example An ε transition edge can also produce the Kleene star of a nondeterministic machine. For instance, without the ε edge this machine’s language is { ε, ab }, while with it the language is { (ab)ⁿ | n ∈ N }.
[Transition graph: q0 −a→ q1 −b→ q2, with q0 and q2 accepting and an ε edge from q2 back to q0.]
Equivalence of the machine types We next prove that nondeterminism does not
change what we can do with Finite State machines.
2.20 Theorem The class of languages recognized by nondeterministic Finite State
machines equals the class of languages recognized by deterministic Finite State
machines. This remains true if we allow the nondeterministic machines to have ε transitions.
We can show that the two classes are equal by showing that they are subsets
of each other. One direction is easy; any deterministic machine is, basically, a
nondeterministic machine. That is, in a deterministic machine the next-state
function outputs single states and to make it a nondeterministic machine just
convert those states into singleton sets. Thus the set of languages recognized
by deterministic machines is a subset of the set recognized by nondeterministic
machines.
We will demonstrate inclusion in the other direction constructively. We will
show how to start with a nondeterministic machine with ε transitions and construct
a deterministic machine that recognizes the same language. We won’t give a
complete proof (although certainly one is possible) simply because a proof is
messy and the examples below are entirely convincing. They give the powerset
construction.
2.21 Example Consider this nondeterministic machine, MN, with no ε transitions.
[Transition graph: a loop labeled a at q0, with q0 −a→ q1 and q1 −b→ q2; the states q0 and q2 are accepting.]
What differentiates a nondeterministic machine is that it can be in multiple states at once. So for the associated deterministic machine MD, each line of the transition function’s table below is a set si = {qi1, ... qik} of MN’s states.
As an illustration of constructing the table, suppose that the above machine
is in s 5 = {q 0 , q 2 } and is reading a. Combine the next states due to q 0 , the set
∆ N (q 0 , a) = {q 0 , q 1 }, with the next states due to q 2 , the set ∆ N (q 2 , a) = { }. The
union of those two sets gives ∆D (s 5 , a) = {q 0 , q 1 }, which below is the state s 4 .

∆D a b
s0 = { } s0 s0
+ s 1 = {q 0 } s4 s0
s 2 = {q 1 } s0 s3
+ s 3 = {q 2 } s0 s0
+ s 4 = {q 0 , q 1 } s4 s3
+ s 5 = {q 0 , q 2 } s4 s0
+ s 6 = {q 1 , q 2 } s0 s3
+ s 7 = {q 0 , q 1 , q 2 } s4 s3

The start state of MD is s1 = {q0}. A state of the deterministic machine MD is accepting if any of its elements are accepting states in MN.
In general, compute the transition function of the deterministic machine with
∆D (si , x) = ∆ N (qi 0 , x) ∪ · · · ∪ ∆ N (qi k , x), where si = {qi 0 , ... qi k } and x ∈ Σ.
An example is ∆D(s5, a) = ∆N(q0, a) ∪ ∆N(q2, a), which equals {q0, q1} ∪ {} = {q0, q1} = s4.
Besides the notational convenience, naming the sets of states as si ’s makes clear
that MD is a deterministic Finite State machine. So does its transition graph.
[Transition graph of MD, drawn from the table above; for instance s1 −a→ s4 −b→ s3, while every arrow from s0 returns to s0.]
If the nondeterministic machine has k-many states then under this construction the deterministic machine has 2ᵏ-many states. Typically many of them can be eliminated. For instance, in the above machine the state s6 is clearly unreachable since there are no arrows into it. The next section covers minimizing the number of states in a machine.
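In code the powerset construction needs no table at all if we build the deterministic machine on demand: its states are sets of MN’s states, its transition function is just the next-states function from the earlier sketch, and a set-state is accepting when it contains an accepting state of MN. A sketch, with the same assumed helpers:

;; The deterministic machine's transition function on set-states.
(define (make-delta-d delta-n)
  (lambda (set-state ch) (next-states delta-n set-state ch)))

;; A set-state is accepting when it holds an accepting state of MN.
(define (accepting-set? finals set-state)
  (cond ((null? set-state) #f)
        ((memv (car set-state) finals) #t)
        (else (accepting-set? finals (cdr set-state)))))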
We next expand that construction to cover ε transitions. Basically, we follow
those transitions. For example, the start state of the deterministic machine is the
ε closure of {q 0 }, the set of the states of MN that are reachable by a sequence
of ε transitions from q 0 . In addition, suppose that we have a nondeterministic
machine and we are constructing the associated deterministic machine’s next-state
function ∆D , that the current configuration is si = {qi 1 , qi 2 , ... } and that the
machine is reading a. If there is an ε transition from qij to some q then to the set of next states add ∆N(q, a), and in fact add the entire ε closure.
2.22 Example Consider this nondeterministic machine.
[Transition graph: states q0, q1, q2, q3, with ε transitions from q0 to q3 and from q1 to q0, an a edge from q0 to q2, an a loop at q3, and b edges among the states.]
To find the set of next states, follow the ε transitions. For instance, suppose that
this machine is in q 0 and the next tape character is a. The arrow on the left takes
the machine from q 0 to q 2 . Alternatively, following the ε transition from q 0 to q 3
and then reading the a gives q 3 . So the machine is next in the two states q 0 and q 3 .
These are the ε closures.

state q           q0          q1          q2      q3
ε closure Ê(q)    {q0, q3}    {q0, q1}    {q2}    {q3}
The full deterministic machine is below. The start state is the ε closure of {q0}, the state s7 = {q0, q3}. A state is accepting if it contains any element of the ε closure of q1.
In general, for a state s and tape character x, to compute ∆D(s, x): (i) find the ε closure of all of the q’s in the state s, giving ∪q∈s Ê(q) = {qi0, ... qik}, then (ii) take the union of the next states of the qij to get T = ∆N(qi0, x) ∪ · · · ∪ ∆N(qik, x), and finally (iii) find the ε closure of each element of T, and take the union of all those, ∪t∈T Ê(t).
As an example take s = s 5 = {q 0 , q 1 } and x = a. Then (i) gives {q 0 , q 3 } ∪
{q 0 , q 1 } = {q 0 , q 1 , q 3 }. The next states for (ii) are ∆ N (q 0 , a) = {q 2 , q 3 },
∆ N (q 1 , a) = {q 2 , q 3 }, and ∆ N (q 3 , a) = {q 3 }, so T = {q 2 , q 3 }. Finally, for (iii) the
union of the ε closures gives {q 2 } ∪ {q 3 } = {q 2 , q 3 } = s 10 .

∆D a b
s0 = { } s0 s0
+ s 1 = {q 0 } s 10 s2
+ s 2 = {q 1 } s 10 s2
s 3 = {q 2 } s0 s7
s 4 = {q 3 } s4 s0
+ s 5 = {q 0 , q 1 } s 10 s 12
+ s 6 = {q 0 , q 2 } s 10 s 12
+ s 7 = {q 0 , q 3 } s 10 s 12
+ s 8 = {q 1 , q 2 } s 10 s 12
+ s 9 = {q 1 , q 3 } s 10 s2
s 10 = {q 2 , q 3 } s4 s7
+ s 11 = {q 0 , q 1 , q 2 } s 10 s 12
+ s 12 = {q 0 , q 1 , q 3 } s 10 s 12
+ s 13 = {q 0 , q 2 , q 3 } s 10 s 12
+ s 14 = {q 1 , q 2 , q 3 } s 10 s 12
+ s 15 = {q 0 , q 1 , q 2 , q 3 } s 10 s 12

IV.2 Exercises
2.23 Give the transition function for the machine of Example 2.7, and of Exam-
ple 2.8.
✓ 2.24 Consider this machine.
[Transition graph: q0 loops on 0 and 1; q0 −1→ q1 −1→ q2, with a 1 loop at q2; q2 is accepting.]

(a) Does it accept the empty string? (b) The string 0? (c) 011? (d) 010?
(e) List all length five accepted strings.
2.25 Your class has a jerk who has adopted a world-weary pose and who interjects,
“Isn’t all this machine-guessing stuff just mathematical abstractions that are not
real?” How should the prof respond?
Section 2. Nondeterminism 201

2.26 Your friend objects, “Epsilon transitions don’t make any sense because the
machine below will never get its first step done; it just endlessly follows the
epsilons.” Correct their mistake.
[Transition graph: q0 −ε→ q1, q1 −b→ q2, and an edge labeled ε, a from q1 back to q0.]

✓ 2.27 Give the transition graph of a nondeterministic Finite State machine that
accepts valid North American local phone numbers, strings of the form d 3 -d 4 ,
with three digits, followed by a hyphen character, and then four digits.
2.28 Draw the transition graph of a nondeterministic machine that recognizes the language { σ = τ0τ1τ2 ∈ B∗ | τ0 = 1, τ2 = 1, and τ1 = (00)ᵏ for some k ∈ N }.
2.29 This machine has Σ = { a, b }.
[Transition graph: an a loop at q0; q0 −b→ q1; q1 −ε→ q2; q2 loops on a and b and is accepting.]
(a) What is the ε closure of q 0 ? Of q 1 ? q 2 ? (b) Does it accept the empty string?
(c) a? b? (d) Show that it accepts aab by producing a suitable sequence of ⊢
relations. (e) List five strings of minimal length that it accepts. (f) List five of
minimal length that it does not accept.
2.30 Produce the table description of the next-state function ∆ for the machine in
the prior exercise. It should have three columns, for a, b, and ε .
2.31 Consider this machine.
[Transition graph: a 0 loop at q0; q0 −0→ q1; q1 −1→ q2; a 1 loop at q2; q2 is accepting.]
(a) Show that it accepts 011 by producing a suitable sequence of ⊢ relations. (b) Show that the machine accepts 00011 by producing a suitable sequence of ⊢ relations. (c) Does it accept the empty string? (d) 0? 1? (e) List five strings of minimal length that it accepts. (f) List five of minimal length that it does not accept. (g) What is the language of this machine?
✓ 2.32 Give diagrams for nondeterministic Finite State machines that recognize the given language and that have the given number of states. Use Σ = B.
(a) L0 = { σ | σ ends in 00 }, having three states
(b) L1 = { σ | σ has the substring 0110 }, with five states
(c) L2 = { σ | σ contains an even number of 0’s or exactly two 1’s }, with six states
(d) L3 = { 0 }∗, with one state
2.33 This table

∆ a b
q0 {q 0 } {q 1 , q 2 }
q1 {q 3 } {q 3 }
q2 {q 1 } {q 3 }
+ q3 {q 3 } {q 3 }
gives the next-state function for a nondeterministic Finite State machine. (a) Draw
the transition graph. (b) What is the recognized language? (c) Give the
next-state table for a deterministic machine that recognizes the same language.
✓ 2.34 Draw the graph of a nondeterministic Finite State machine over B that
accepts strings with the suffix 111000111.
✓ 2.35 For each draw the transition graph for a Finite State machine, which may be
nondeterministic, that accepts the given strings from { a, b }∗ .
(a) Accepted strings have a second character of a and next to last character of b.
(b) Accepted strings have second character a and the next to last character is
also a.
✓ 2.36 Make a table giving the ε closure function Ê for the machine in Example 2.14.
✓ 2.37 Find the nondeterministic Finite State machine that accepts all bitstrings that
begin with 10. Use the algorithm given above to produce the transition function
table of a deterministic machine that does the same.
2.38 Find a nondeterministic Finite State machine that recognizes this language
of three words: L = { cat, cap, carumba }.
2.39 Give a nondeterministic Finite State machine over Σ = { a, b, c } that rec-
ognizes the language of strings that omit at least one of the characters in the
alphabet.
✓ 2.40 What is the language of this nondeterministic machine with ε transitions?
[Transition graph: states q0, q1, q2, with loops and edges labeled a and b, plus an ε transition.]
2.41 Find a deterministic machine and a nondeterministic machine that recognize the set of bitstrings containing the substring 11. You need not construct the deterministic machine from the other; you can just construct it using any native wit that you may possess.
✓ 2.42 For each, follow the construction above to make a deterministic machine
with the same language.
[Two transition graphs: on the left, states q0, q1, q2 with edges labeled 0 and 1; on the right, states q0, q1, q2 with an ε edge and edges labeled 0, 1, and 0,1.]
Section 3. Regular expressions 203

✓ 2.43 For each give a nondeterministic Finite State machine over Σ = { 0, 1, 2 }.


(a) The machine recognizes the language of strings whose final character appears
exactly twice in the string.
(b) The machine recognizes the language of strings whose final character appears
exactly twice in the string, but in between those two occurrences is no higher
digit.
✓ 2.44 For each, give a nondeterministic Finite State machine with ε transitions that recognizes the given language over B. (The deterministic machines for some of these are much harder.)
(a) In each string, every 0 is followed immediately by a 1.
(b) Each string contains 000 followed, possibly with some intermediate charac-
ters, by 001.
(c) In each string the first two characters equals the final two characters, in
order. (Hint: what about 000?)
(d) There is either an even number of 0’s or an odd number of 1’s.
2.45 Give a minimal-sized nondeterministic Finite State machine over Σ =
{ a, b, c } that accepts only the empty string. Also give one that accepts any
string except the empty string. For both, produce the transition graph and table.
2.46 A grammar is right linear if every production rule has the form ⟨n1⟩ → x ⟨n2⟩ ,
where the right side has a single terminal followed by a single nonterminal. With
this right linear grammar we can associate this nondeterministic Finite State
machine.
S → aA
A → aA | bB
B → bB | b
[transition diagram: S -a→ A -b→ B -b→ F, with a loop a on A and a loop b on B; F is accepting]
(a) Give three strings from the language of the grammar and show that they are
accepted by the machine.
(b) Describe the language of the grammar and the machine.
2.47 Decide whether each problem is solvable or unsolvable by a Turing machine.
(a) LDFA = { ⟨M, σ⟩ | the deterministic Finite State machine M accepts σ }
(b) LNFA = { ⟨M, σ⟩ | the nondeterministic machine M accepts σ }
(Assume that the machine M is described in some reasonable way, say, by using
bit strings to encode the transition function.)
2.48 (a) For the machine of Example 2.22, for each q ∈ Q produce E(q, 0), E(q, 1),
E(q, 2), and E(q, 3). List Ê(q) for each q ∈ Q .
(b) Do the same for Exercise 2.29’s machine.

Section
IV.3 Regular expressions
In 1951, S. Kleene† was studying a mathematical model of neurons. These
are like Finite State machines in that they do not have scratch memory. He
noted patterns in the languages that are recognized by such devices.
For instance, this Finite State machine

[transition diagram: states q0, q1, q2 with edges labeled a and b. Margin photo: Stephen Kleene, 1909–1994]
accepts strings that have some number of b’s (perhaps zero many) followed
by at least one a, possibly then followed by at least one b and then at least
one a. Kleene introduced a convenient way, called regular expressions, to
denote constructs such as “any number of” and “followed by.” He gave the
definition in the first subsection below, and supported it with the theorem
in the second subsection.

Definition A regular expression is a string that describes a language. We will
introduce these with a few examples. These use the alphabet Σ = { a, ... z }.
3.1 Example The string h(a|e|i|o|u)t is a regular expression describing strings
that start with h, have a vowel in the middle, and end with t. That is, this regular
expression describes the language consisting of five words of three letters each,
L = { hat, het, hit, hot, hut }.
The pipe ‘|’ operator, which is a kind of ‘or’, and the parentheses, which provide
grouping, are not part of the strings being described; they are metacharacters.
Besides the pipe operator and parentheses, the regular expression also uses
concatenation since the initial h is concatenated with (a|e|i|o|u), which in turn
is concatenated with t.
3.2 Example The regular expression ab*c describes the language whose words begin
with an a, followed by any number of b’s (including possibly zero-many b’s), and
ending with a c. So ‘∗’ means ‘repeat the prior thing any number of times’. This
regular expression describes the language L = { ac, abc, abbc, ... }.
3.3 Example There is an interaction between pipe and star. Consider the regular
expression (b|c)*. It could mean either ‘any number of repetitions of picking a b
or c’ or ‘pick a b or c and repeat that character any number of times’.
The definition has it mean the first. Thus the language described by a(b|c)*
consists of words starting with a and ending with any mixture of b’s and c’s, so
that L = { a, ab, ac, abb, abc, acb, acc, ... }.
In contrast, to describe the language whose members begin with a and end
with any number of b’s or any number of c’s, L̂ = { a, ab, abb, ... , ac, acc, ... },
use the regular expression a(b*|c*).

† Pronounced KLAY-nee. He was a student of Church.
That is, the rules for operator precedence are: star binds most tightly, then
concatenation, then the pipe alternation operator, |. To get another order, use
parentheses.
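
These conventions agree with the regular expression notation of most programming languages, so the examples above can be checked mechanically. Here is a quick sketch using Python's re module (re.fullmatch requires the entire string to match, which corresponds to our notion of a regular expression describing a string).

import re

# Example 3.1: a vowel between h and t.
assert re.fullmatch(r'h(a|e|i|o|u)t', 'hot')
assert not re.fullmatch(r'h(a|e|i|o|u)t', 'heat')

# Example 3.2: ab*c allows any number of b's, including zero-many.
assert re.fullmatch(r'ab*c', 'ac') and re.fullmatch(r'ab*c', 'abbc')

# Example 3.3: star applies to the parenthesized group as a whole.
assert re.fullmatch(r'a(b|c)*', 'abcbc')
assert not re.fullmatch(r'a(b*|c*)', 'abcbc')   # all b's or all c's only
assert re.fullmatch(r'a(b*|c*)', 'abb')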
3.4 Definition Let Σ be an alphabet not containing any of the metacharacters ), (,
|, or *. A regular expression over Σ is a string that can be derived from this
grammar
⟨regex⟩ → ⟨concat⟩
| ⟨regex⟩ ‘|’ ⟨concat⟩
⟨concat⟩ → ⟨simple⟩
| ⟨concat⟩ ⟨simple⟩
⟨simple⟩ → ⟨char⟩
| ⟨simple⟩ *
| ( ⟨regex⟩ )
⟨char⟩ → œ | ε | x 0 | x 1 | . . .
where the x i characters are members of the alphabet Σ.†
† As we have done with other grammars, here we use the pipe symbol | as a metacharacter, to collapse rules with the same left side. But pipe also appears in regular expressions. For that usage we wrap it in single quotes, as ‘|’.
As to their semantics, what regular expressions mean, we will define that
recursively. We start with the bottom line, the single-character regular expressions,
and give the language that each describes. We will then do the forms on the other
lines, for each interpreting it as the description of a language.
The language described by the single-character regular expression œ is the
empty set, L(œ) = œ. The language described by the regular expression consisting
of only the character ε is the one-element language consisting of only the empty
string, L(ε) = {ε }. If the regular expression consists of just one character from the
alphabet Σ then the language that it describes contains only one string and that
string has only that single character, as in L(a) = { a }.
We finish by defining the semantics of the operations. Start with regular
expressions R 0 and R 1 describing languages L(R 0 ) and L(R 1 ). Then the pipe
symbol describes the union of the languages, so that L(R 0 |R 1 ) = L(R 0 ) ∪ L(R 1 ).
Concatenation of the regular expressions describes concatenation of the languages,
L(R 0⌢R 1 ) = L(R 0 )⌢L(R 1 ). And, the Kleene star of the regular expression describes
the star of the language, L(R 0 ∗ ) = L(R 0 )∗.
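
Because this definition is recursive, it translates directly into a short program. The following Python sketch is ours, not part of the formal development: it represents a regular expression as a nested tuple and computes the strings of length at most maxlen in the language that the expression describes, one clause per case.

# A regular expression as a parse tree of nested tuples:
#   ('empty',), ('eps',), ('char', 'a'),
#   ('alt', R0, R1), ('cat', R0, R1), ('star', R0)

def lang(r, maxlen):
    """The strings of length <= maxlen described by the expression r."""
    kind = r[0]
    if kind == 'empty':                  # L(œ) = { }
        return set()
    if kind == 'eps':                    # L(ε) = { ε }
        return {''}
    if kind == 'char':                   # L(a) = { a }
        return {r[1]}
    if kind == 'alt':                    # L(R0|R1) = L(R0) ∪ L(R1)
        return lang(r[1], maxlen) | lang(r[2], maxlen)
    if kind == 'cat':                    # L(R0 R1) = L(R0) ⌢ L(R1)
        return {s + t for s in lang(r[1], maxlen)
                for t in lang(r[2], maxlen) if len(s + t) <= maxlen}
    if kind == 'star':                   # L(R0*) = L(R0)*
        result, frontier = {''}, {''}
        base = lang(r[1], maxlen)
        while frontier:
            frontier = {s + t for s in frontier for t in base
                        if len(s + t) <= maxlen} - result
            result |= frontier
        return result

# Example 3.5's aba* describes { ab, aba, abaa, ... }.
aba_star = ('cat', ('char', 'a'),
            ('cat', ('char', 'b'), ('star', ('char', 'a'))))
assert lang(aba_star, 4) == {'ab', 'aba', 'abaa'}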
3.5 Example Consider the regular expression aba* over Σ = { a, b }. It is the
concatenation of a, b, and a*. The first describes the single-element language
L(a) = { a }. Likewise, the second describes L(b) = { b }. Thus, the string ab


describes the concatenation of the two, another one-element language.

L(ab) = L(a) ⌢ L(b) = {σ ∈ Σ∗ | σ = σ0 ⌢ σ1 where σ0 ∈ L(a) and σ1 ∈ L(b) }
       = { ab }

The regular expression a* describes the star of the language L(a), namely
L(a*) = { a^n | n ∈ N }. Concatenating it with L(ab) gives this.

L(aba*) = {σ ∈ Σ∗ | σ = σ0 ⌢ σ1 where σ0 ∈ L(ab) and σ1 ∈ L(a*) }
        = { ab, aba, abaa, abaaa, ... }
        = { ab ⌢ a^n | n ∈ N }

We finish this subsection with some constructs that appear often. These
examples use Σ = { a, b, c }.
3.6 Example Describe the language consisting of strings of a's whose length is a
multiple of three, L = { a^{3k} | k ∈ N } = { ε, aaa, aaaaaa, ... }, with the regular
expression (aaa)*.
Note that the empty string is a member of that language. A common gotcha is
to forget that star is for any number of repetitions, including zero-many.
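
As a quick check of that gotcha, using Python's re module:

import re
assert re.fullmatch(r'(aaa)*', '')          # zero repetitions: ε matches
assert re.fullmatch(r'(aaa)*', 'aaaaaa')    # two repetitions
assert not re.fullmatch(r'(aaa)*', 'aaaa')  # not a multiple of three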
3.7 Example To match any character we can list them all. The language consisting
of three-letter words ending in bc is { abc, bbc, cbc }. The regular expression
(a|b|c)bc describes it. (In practice the alphabet can be very large so that listing
all of the characters is impractical; see Extra A.)
3.8 Example The regular expression a*(ε |b) describes the language of strings that
have any number of a’s and optionally end in one b, L = {ε, b, a, ab, aa, aab, ... }.
Similarly, to describe the language consisting of words with between three and
five a’s, L = { aaa, aaaa, aaaaa } we can use aaa(ε |a|aa).

3.9 Example The language { b, bc, bcc, ab, abc, abcc, aab, ... } has words starting
with any number of a’s (including zero-many a’s), followed by a single b, and then
ending in fewer than three c’s. To describe it we can use a*b(ε |c|cc).

Kleene’s Theorem The next result justifies our study of regular expressions
because it shows that they describe the languages of interest.
3.10 Theorem (Kleene’s theorem) A language is recognized by a Finite State
machine if and only if that language is described by a regular expression.
We will prove this in separate halves. The proofs use nondeterministic machines
but since we can convert those to deterministic machines, the result holds there
also.
3.11 Lemma If a language is described by a regular expression then there is a Finite
State machine that recognizes that language.
Proof We will show that for any regular expression R there is a machine that
accepts strings matching that expression. We use induction on the structure of
regular expressions.
Start with regular expressions consisting of a single character. If R = œ then
L(R) = { } and the machine on the left below recognizes L(R). If R = ε then
L(R) = {ε } and the machine in the middle recognizes this language. If the regular
expression is a character from the alphabet, such as R = a, then the machine on
the right works.

[three diagrams: for œ, a single non-accepting state; for ε, a single accepting state; for the character a, a start state with an a edge to an accepting state]

We finish by handling the three operations. Let R 0 and R 1 be regular expressions;


the inductive hypothesis gives a machine M0 whose language is described by R 0
and a machine M1 whose language is described by R 1 .
First consider alternation, R = R 0 |R 1 . Create the machine recognizing the
language described by R by joining those two machines in parallel: introduce a
new state s and use ε transitions to connect s to the start states of M0 and M1 .
See Example 2.17.
Next consider concatenation, R = R 0 ⌢ R 1 . Join the two machines serially: for
each accepting state in M0 , make an ε transition to the start state of M1 and
then convert all those accepting states of M0 to be non-accepting states. See
Example 2.18.
Finally consider Kleene star, R = (R 0 )*. For each accepting state in the
machine M0 that is not the start state make an ε transition to the start state, and
then make the start state an accepting state. See Example 2.19.

3.12 Example Building a machine for the regular expression ab(c|d)(ef)* starts with
machines for the single characters.

[single-character machines: q0 -a→ q1, q2 -b→ q3, ..., q10 -f→ q11]

Put these atomic components together

[diagram: q0 -a→ q1 -ε→ q2 -b→ q3, the branches q4 -c→ q5 and q6 -d→ q7 joined by ε transitions at q12, and q8 -e→ q9 -ε→ q10 -f→ q11]

to get the complete machine.

[diagram: the fragments above joined with ε transitions into a single machine for ab(c|d)(ef)*]
This machine is nondeterministic. For a deterministic one use the conversion
process that we saw in the prior section.
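
The construction in this proof is essentially what is often called Thompson's construction, and it is short enough to sketch in code. In this Python sketch the representation is ours and it is a mild variant that keeps one accepting state per fragment; edges carrying the empty string play the role of ε transitions.

import itertools

counter = itertools.count()

def new_state():
    return next(counter)

def atom(c):
    """Machine for the single character c (use c = '' for ε)."""
    s, f = new_state(), new_state()
    return {'start': s, 'accept': f, 'edges': {(s, c, f)}}

def alt(m0, m1):
    """R0|R1: join the machines in parallel via new ε transitions."""
    s, f = new_state(), new_state()
    edges = m0['edges'] | m1['edges'] | {
        (s, '', m0['start']), (s, '', m1['start']),
        (m0['accept'], '', f), (m1['accept'], '', f)}
    return {'start': s, 'accept': f, 'edges': edges}

def cat(m0, m1):
    """R0 ⌢ R1: join serially; M0's accepting state feeds M1's start."""
    return {'start': m0['start'], 'accept': m1['accept'],
            'edges': m0['edges'] | m1['edges']
            | {(m0['accept'], '', m1['start'])}}

def star(m0):
    """R0*: a new state that is both start and accepting closes the loop."""
    s = new_state()
    return {'start': s, 'accept': s,
            'edges': m0['edges'] | {(s, '', m0['start']),
                                    (m0['accept'], '', s)}}

def accepts(m, string):
    """Simulate the nondeterministic machine, taking ε transitions freely."""
    def eclose(states):
        while True:
            more = {t for (s, c, t) in m['edges'] if s in states and c == ''}
            if more <= states:
                return states
            states |= more
    current = eclose({m['start']})
    for ch in string:
        current = eclose({t for (s, c, t) in m['edges']
                          if s in current and c == ch})
    return m['accept'] in current

# The ab(c|d) part of Example 3.12.
m = cat(atom('a'), cat(atom('b'), alt(atom('c'), atom('d'))))
assert accepts(m, 'abc') and accepts(m, 'abd') and not accepts(m, 'ab')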
3.13 Lemma Any language recognized by a Finite State machine is described by a
regular expression.
Our strategy starts with a Finite State machine and eliminates its states one
at a time. Below is an illustration, before and after pictures of part of a larger
machine, where we eliminate the state q .

[before and after diagrams: the path qi -a→ q -b→ qo becomes the single edge qi -ab→ qo]

In the after picture the edge is labeled ab, with more than just one character. For
the proof we will generalize transition graphs to allow edge labels that are regular
expressions. We will eliminate states, keeping the recognized language the same.
We will be done when there remain only two states, with one edge between them.
That edge’s label is the desired regular expression.
Before the proof, an example. Consider the machine on the left below.
[diagrams: on the left, a machine with edges q0 -a→ q1, a loop b on q1, q1 -c→ q2, and q0 -d→ q2; on the right, the same machine with a new start state e -ε→ q0 and a new accepting state f receiving q2 -ε→ f]

The proof starts as above on the right by introducing a new start state guaranteed
to have no incoming edges, e , and a new final state guaranteed to be unique, f .
Then the proof eliminates q 1 as below.

[diagram: e -ε→ q0, q0 -(d|(ab*c))→ q2, q2 -ε→ f]

Clearly this machine recognizes the same language as the starting machine.
Proof Call the machine M. If it has no accepting states then the regular expression
is œ and we are done. Otherwise, we will transform M to a new machine, M̂,
with the same language, on which we can execute the state-elimination strategy.
First we arrange that M̂ has a single accepting state. Create a new state f and
for each of M’s accepting states make a ε transition to f (by the prior paragraph
there is at least one such accepting state). Change all the accepting states to
non-accepting ones and then make f accepting.
Next introduce a new start state, e. Make an ε transition from it to q0.
(Ensuring that M̂ has at least two states allows us to handle machines of all sizes
uniformly.)
Because the edge labels are regular expressions, we can arrange that from
any qi to any qj there is at most one edge: if M has more than one edge then
in M̂ we use the pipe, |, to combine the labels, as here.
[before and after: two parallel edges labeled a and b from qi to qj combine into a single edge labeled a|b]

Do the same with loops, that is, cases where i = j . Like the prior transformations,
clearly this does not change the language of accepted strings.
The last part of transforming to M̂ is to drop any useless states. If a state
node other than f has no outgoing edges then drop it along with the edges into it.
The language of the machine will not change because this state cannot lead to an
accepting state, since it doesn’t lead anywhere, and this state is not itself accepting
as only f is accepting.
Along the same lines, if a state node q is not reachable from the start e then
we can drop that node along with its incoming and outgoing edges. (The idea of
unreachable is clear but for a formal definition see Exercise 3.34.)
With that, M̂ is ready for state elimination. Below are before and after pictures.
The before picture shows a state q to be eliminated. There are states qi 0 , . . . qi j
with an edge leading into q , and states qo0 , . . . qok that receive an edge leading
out of q . (By the setup work above, q has at least one incoming and at least one
outgoing edge.) In addition, q may have a loop.
[before and after diagrams. Before: each incoming state qi has an edge labeled Ri into q, the state q has a loop labeled Rℓ, each outgoing state qo receives an edge labeled Ro out of q, and there may also be a direct edge labeled Ri,o from qi to qo. After: q is gone, and from each incoming qi to each outgoing qo there is the single edge labeled Ri,o|(Ri Rℓ* Ro).]

(Here is a subtle point: possibly some of the states shown on the left of each of the
two pictures equal some shown on the right. For example, possibly qi 0 equals qo0 .
If so then the shown edge R i 0,o0 is a loop.)
Eliminate q and the associated edges by making the replacements shown in the
after picture. Observe that the set of strings taking the machine from any incoming
state qi to any outgoing state qo is unchanged. So the language recognized by the
machine is unchanged.
Repeat this elimination until all that remains are e and f , and the edge between
them. (The machine has finitely many states so this procedure must eventually
stop.) The desired regular expression is the edge's label.

3.14 Example Consider M on the left. Introduce e and f to get M̂ on the right.
[diagrams: on the left M, with states q0, q1, q2, edges q0 -a→ q1, q1 -b→ q0, q1 -a→ q2, q2 -b→ q0, a loop b on q2, and with q0 and q1 accepting; on the right M̂, the same machine plus new states e and f, with e -ε→ q0, q0 -ε→ f, and q1 -ε→ f]

Start by eliminating q2. In terms of the proof's key step, q1 = qi0 and q0 = qo0.
The regular expressions are Ri0 = a, Ro0 = b, Ri0,o0 = b, and Rℓ = b. That gives
this machine.
[diagram: e -ε→ q0, q0 -a→ q1, q1 -(b|(ab*b))→ q0, q1 -ε→ f, and q0 -ε→ f]

Next eliminate q 1 . There is one incoming node q 0 = qi 0 and two outgoing nodes
q 0 = qo0 and f = qo1 . (Note that q 0 is both an incoming and outgoing node; this
is the subtle point mentioned in the proof.) The regular expressions are R i 0 = a,
Ro0 = b|(ab*b), and Ro1 = ε .
[diagram: e -ε→ q0, a loop labeled ε|a(b|ab*b) on q0, and q0 -(ε|aε)→ f]

All that remains is to eliminate q0. The sole incoming node is e = qi0 and the sole
outgoing node is f = qo0, and so Ri0 = ε, Ro0 = ε|aε, and Rℓ = ε|a(b|ab*b).

[diagram: e -ε(ε|(a(b|ab*b)))*(ε|aε)→ f]

This regular expression simplifies. For instance, aε = a.
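
The elimination step itself is mechanical, and here is a sketch of it in code; the dictionary representation is ours. Edge labels are regular expressions kept as strings, and the function applies the proof's Ri,o|(Ri Rℓ* Ro) replacement; the check at the end uses the machine from the example just before the proof.

def eliminate(edges, q):
    """Remove state q, keeping the recognized language the same.

    edges maps a pair (source, target) to an edge label, a regular
    expression kept as a string; a missing pair means no edge.
    """
    loop = edges.pop((q, q), None)
    middle = f'({loop})*' if loop is not None else ''
    ins = {s: r for (s, t), r in edges.items() if t == q and s != q}
    outs = {t: r for (s, t), r in edges.items() if s == q and t != q}
    for s in ins:
        del edges[s, q]
    for t in outs:
        del edges[q, t]
    for s, r_in in ins.items():
        for t, r_out in outs.items():
            path = f'({r_in}){middle}({r_out})'
            if (s, t) in edges:           # parallel edges combine with pipe
                edges[s, t] = f'{edges[s, t]}|{path}'
            else:
                edges[s, t] = path

# The machine from the example before the proof: eliminating q1 puts
# d|(ab*c), up to redundant parentheses, on the edge from q0 to q2.
edges = {('q0', 'q1'): 'a', ('q1', 'q1'): 'b',
         ('q1', 'q2'): 'c', ('q0', 'q2'): 'd'}
eliminate(edges, 'q1')
assert edges == {('q0', 'q2'): 'd|(a)(b)*(c)'}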

IV.3 Exercises
3.15 Decide if the string σ matches the regular expression R . (a) σ = 0010,
R = 0*10 (b) σ = 101, R = 1*01 (c) σ = 101, R = 1*(0|1) (d) σ = 101,
R = 1*(0|1)* (e) σ = 01, R = 1*01*
✓ 3.16 For each regular expression produce five bitstrings that match and five that do
not, or as many as there are if there are not five. (a) 01* (b) (01)* (c) 1(0|1)1
(d) (0|1)(ε |1)0* (e) œ
3.17 Give a brief plain English description of the language for each regular
expression. (a) a*cb* (b) aa* (c) a(a|b)*bb
✓ 3.18 For these regular expressions and for each element of { a, b }∗ that is of length
less than or equal to 3, decide if it is a match. (a) a*b (b) a* (c) œ (d) ε
(e) b(a|b)a (f) (a|b)(ε |a)a
3.19 For these regular expressions, decide if each element of B∗ of length at most 3
is a match. (a) 0*1 (b) 1*0 (c) œ (d) ε (e) 0(0|1)* (f) (100)(ε |1)0*
✓ 3.20 A friend says to you, “The point of parentheses is that you first do inside
the parentheses and then do what’s outside. So Kleene star means ‘match the
inside and repeat’, and the regular expression (0*1)* matches the strings 001001
and 010101 but not 01001 and 00000101, where the substrings are unequal.”
Straighten them out.
3.21 Produce a regular expression for the language of bitstrings with a substring
consisting of at least three consecutive 1’s.
3.22 Someone who sits behind you in class says, “I don’t get it. I got a regular
expression that I am sure is right. But the book got a different one.” Explain
what is up.
3.23 For each language, give five strings that are in the language and five that
are not. Then give a regular expression describing the language. Finally, give a
Finite State machine that accepts the language. (a) L0 = { a^n b^{2m} | m, n ≥ 1 }
(b) L1 = { a^n b^{3m} | m, n ≥ 1 }

3.24 Give a regular expression for the language over Σ = { a, b, c } whose strings
are missing at least one letter, that is, the strings that are either without any a’s,
or without any b’s, or without any c’s.
3.25 Give a regular expression for each language. Use Σ = { a, b }. (a) The set of
strings starting with b. (b) The set of strings whose second-to-last character is a.
(c) The set of strings containing at least one of each character. (d) The strings
where the number of a’s is divisible by three.
3.26 Give a regular expression to describe each language over the alphabet
Σ = { a, b, c }. (a) The set of strings starting with aba. (b) The set of strings
ending with aba. (c) The set of strings containing the substring aba.
✓ 3.27 Give a regular expression to describe each language over B. (a) The set
of strings of odd parity, where the number of 1’s is odd. (b) The set of strings
where no two adjacent characters are equal. (c) The set of strings representing
multiples of eight in binary.
✓ 3.28 Give a regular expression to describe each language over the alphabet
Σ = { a, b }. (a) Every a is both immediately preceded and immediately followed
by a b character. (b) Each string has at least two b’s that are not followed by an
a.
3.29 Give a regular expression for each language of bitstrings. (a) The number of
0’s is even. (b) There are more than two 1’s. (c) The number of 0’s is even and
there are more than two 1’s.
3.30 Give a regular expression to describe each language.
(a) {σ ∈ { a, b }∗ | σ ends with the same symbol it began with, and σ ≠ ε }
(b) { a^i b a^j | i and j leave the same remainder on division by three }
✓ 3.31 Give a regular expression for each language over B∗ .


(a) The strings representing a binary number that is a multiple of eight.
(b) The bitstrings where the first character differs from the final one.
(c) The bitstrings where no two adjacent characters are equal.
✓ 3.32 Produce a Finite State machine whose language equals the language de-
scribed by each regular expression. (a) a*ba (b) ab*(a|b)*a
3.33 Fix a Finite State machine M. Kleene’s Theorem shows that the set of strings
taking M from the start state to the set of final states is regular.
(a) Show that for any set of states S ⊆ Q M the set of strings taking M from the
start state to one of the states in S is regular.
(b) Show that the set of strings taking M from any single state to any other
single state is regular.
3.34 Part of the proof of Lemma 3.13 involves unreachable states. Here is a
definition. Given a state q , construct the set of states reachable from it by first
setting S 0 = {q } ∪ Ê(q), where Ê(q) is the ε closure. Then iterate: starting with
the set S i of states that are reachable in i -many steps, for each q̃ ∈ S i follow each
outbound edge for a single step and also include the elements of the ε closure.
The union of S i with the collection of the states reached in this way is the set S i+1 .
Stop when S i = S i+1 , at which point it is the set of ever-reachable states. The
unreachable states are the others.
For each machine use that definition to find the set of unreachable states.
(a) [transition diagram: states q0, q1, q2, q3 over { a, b }]
(b) [transition diagram: states q0, q1, q2, q3, q4 over { a, b }]

✓ 3.35 Show that the set of languages over Σ that are described by a regular
expression is countable. Conclude that there are languages not recognized by
any Finite State machine.
3.36 Construct the parse tree for these regular expressions over Σ = { a, b, c }.
(a) a(b|c) (b) ab*(a|c)
3.37 Construct the parse tree for Example 3.3’s a(b|c)* and a(b*|c*).
✓ 3.38 Get a regular expression by applying the method of Lemma 3.13’s proof to
this machine.
[transition diagram: states q0 and q1, with q0 -b→ q1, q1 -a→ q0, and a loop labeled a,b]

(a) Get M̂ by introducing e and f . (b) Where q = q 0 , describe which state from
the machine is playing the diagram’s before picture role of qi 0 , which edge is R i 0 ,
etc. (c) Eliminate q 0 .
3.39 Apply the method of Lemma 3.13's proof to this machine. At each step describe
which state from the machine is playing the role of qi0, which edge is Ri0, etc.
[transition diagram: q0 -1→ q1 -1→ q2, with a loop 0,1 on q1]

(a) Eliminate q0. (b) Eliminate q1. (c) Eliminate q2. (d) Give the regular expression.
3.40 Apply the state elimination method of Lemma 3.13's proof to eliminate q1.
Note that each of the states q0 and q2 is of the kind described in the proof's
comment on the subtle point.
[transition diagram: q0 -A→ q1, q1 -E→ q0, q1 -C→ q2, q2 -D→ q1, and a loop B on q1]

3.41 An alternative proof of Lemma 3.11 reverses the steps of Lemma 3.13. This is
the subset method. Start by labeling the single edge on a two-state machine with
the given regular expression.

[diagram: e -R→ f]

Then instead of eliminating nodes, introduce them.


qi qo =⇒ qi q qo
R0 R1 R0 R1

R1
qi qo =⇒ qi qo
R0 |R1 R0

qi ε
q ε
qo
qi qo =⇒
R*
R

Use this approach to get a machine that recognizes the language described by the
following regular expressions. (a) a|b (b) ca* (c) (a|b)c* (d) (a|b)(b*|a*)

Section
IV.4 Regular languages
We have seen that deterministic Finite State machines, nondeterministic Finite
State machines, and regular expressions all describe the same set of languages.
The fact that we can describe these languages in so many different ways suggests
that there is something natural and important about them.†

† This is just like the fact that the equivalence of Turing machines, general recursive functions, and
all kinds of other models suggests that the computable sets form a natural and important collection.
Neither collection is just a historical artifact of what happened to be first explored.
Definition We will isolate and study these languages.
4.1 Definition A regular language is one that is recognized by some Finite State
machine or, equivalently, described by a regular expression.

4.2 Lemma The number of regular languages over an alphabet is countably infinite.
The collection of languages over that alphabet is uncountable, and consequently
there are languages that are not regular.
Proof Fix an alphabet Σ. Recall that, as defined in Appendix A, any alphabet is
nonempty and finite. Thus there are infinitely many regular languages over that
alphabet, because every finite language is regular — just list all the cases as in
Example 1.8 — and there are infinitely many finite languages.
Next we argue that the number of regular languages is countable. This holds
because the number of regular expressions over Σ is countable: clearly there are
finitely many regular expressions of length 1, of length 2, etc. The union of those
is a countable union of countable sets, and so is countable.
We finish by showing that the set of languages over Σ, the set of all L ⊆ Σ∗, is
uncountable. The set Σ∗ is countably infinite by the argument of the prior two
paragraphs. The set of all L ⊆ Σ∗ is the power set of Σ∗ , and so has cardinality
greater than the cardinality of Σ∗ , which makes it uncountable.
Closure properties In proving Lemma 3.11, the first half of Kleene’s Theorem,
we showed that if two languages L0 , L1 are regular then their union L0 ∪ L1 is
regular, their concatenation L0 ⌢ L1 is regular, and the Kleene star L0 ∗ is regular
also. Briefly, where R 0 is a regular expression describing the language L0 and
R 1 describes L1 then the regular expression R 0 |R 1 describes L0 ∪ L1 , and R 0R 1
describes the concatenation L0 ⌢ L1 , and R 0 * describes L0 ∗.
Recall that a structure is closed under an operation if performing that operation
on its members always yields another member. The next result restates the above
paragraph in this language.
4.3 Lemma The collection of regular languages is closed under union, concatenation,
and Kleene star.
We can ask about the closure of regular languages under other operations. We
will use the product construction.
4.4 Example The machine on the left, M0 , accepts strings with fewer than two a’s.
The one on the right, M1 , accepts strings with an odd number of b’s.
[two transition diagrams: M0 has states q0, q1, q2 with q0 and q1 accepting, edges q0 -a→ q1 -a→ q2, loops b on q0 and q1, and a loop a,b on q2; M1 has states s0 and s1 with s1 accepting, a loop a on each, and edges labeled b from s0 to s1 and from s1 to s0]

The transition tables contain the same information.


∆0 a b ∆1 a b
+ q0 q1 q0 s0 s0 s1
+ q1 q2 q1 + s1 s1 s0
q2 q2 q2
Consider a machine M whose states are

Q 0 × Q 1 = { (q 0 , s 0 ), (q 0 , s 1 ), (q 1 , s 0 ), (q 1 , s 1 ), (q 2 , s 0 ), (q 2 , s 1 ) }

and whose transitions are given by ∆( (qi , r j ) ) = ( ∆0 (qi ), ∆1 (r j ) ), as here.

∆ a b
(q 0 , s 0 ) (q 1 , s 0 ) (q 0 , s 1 )
(q 0 , s 1 ) (q 1 , s 1 ) (q 0 , s 0 )
(q 1 , s 0 ) (q 2 , s 0 ) (q 1 , s 1 )
(q 1 , s 1 ) (q 2 , s 1 ) (q 1 , s 0 )
(q 2 , s 0 ) (q 2 , s 0 ) (q 2 , s 1 )
(q 2 , s 1 ) (q 2 , s 1 ) (q 2 , s 0 )
This machine runs M0 and M1 in parallel. For instance, if we feed the string
aba to M, then the machine’s states go from (q 0 , s 0 ) to (q 1 , s 0 ), then to (q 1 , s 1 ),
and then to (q 2 , s 1 ). This is simply because M0 passes from q 0 to q 1 , then to q 1 ,
and then q 2 , while M1 does s 0 to s 0 , then to s 1 , and finally to s 1 .
The above table does not specify which states are accepting. Suppose that we
say that accepting states (qi , s j ) are the ones where both qi and s j are accepting.
Then by the prior paragraph, M accepts a string σ if both M0 and M1 accept it.
That is, this specification of accepting states causes M to accept the intersection of
the language of M0 and the language of M1 .
4.5 Theorem The collection of regular languages is closed under set intersection,
set difference, and set complement.
Proof Start with two Finite State machines, M0 and M1 , which accept languages
L0 and L1 over some Σ. Perform the product construction to get M. If the
accepting states of M are those pairs where both the first and second component
states are accepting, then M accepts the intersection of the languages, L0 ∩ L1 .
If the accepting states of M are those pairs where the first component state is
accepting but the second is not, then M accepts the set difference of the languages,
L0 − L1 . A special case of that is when L0 is the set of all strings, Σ∗ , whereby M
accepts the complement of L1 .
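
Here is the product construction as a sketch in code; the triple representation is ours. A deterministic machine is a transition table, a start state, and a set of accepting states, and the keep parameter makes the choice of accepting pairs that this proof varies to get intersection, difference, or complement.

def product_machine(m0, m1, alphabet, keep):
    """Build the machine that runs m0 and m1 in parallel.

    Each machine is (delta, start, accepting), where delta maps a
    (state, character) pair to the next state.  keep(a0, a1) says
    whether a product state is accepting, given whether each of its
    component states is.
    """
    d0, start0, f0 = m0
    d1, start1, f1 = m1
    states0 = {q for (q, _) in d0}
    states1 = {s for (s, _) in d1}
    delta = {((q, s), ch): (d0[q, ch], d1[s, ch])
             for q in states0 for s in states1 for ch in alphabet}
    accepting = {(q, s) for q in states0 for s in states1
                 if keep(q in f0, s in f1)}
    return delta, (start0, start1), accepting

def run(machine, string):
    delta, state, accepting = machine
    for ch in string:
        state = delta[state, ch]
    return state in accepting

# Example 4.4: M0 accepts strings with fewer than two a's, M1 accepts
# strings with an odd number of b's; this choice gets the intersection.
d0 = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0', ('q1', 'a'): 'q2',
      ('q1', 'b'): 'q1', ('q2', 'a'): 'q2', ('q2', 'b'): 'q2'}
d1 = {('s0', 'a'): 's0', ('s0', 'b'): 's1',
      ('s1', 'a'): 's1', ('s1', 'b'): 's0'}
m = product_machine((d0, 'q0', {'q0', 'q1'}), (d1, 's0', {'s1'}),
                    'ab', lambda a0, a1: a0 and a1)
assert run(m, 'ab') and not run(m, 'aab') and not run(m, 'abb')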
These closure properties often make it easier to show that a language is regular.
4.6 Example To show that the language

L = {σ ∈ B∗ | σ has an even number of 0's and more than two 1's }



is regular, we could produce a machine that recognizes it, or give a regular


expression. Or, we can instead note that L is this intersection,

{σ ∈ B∗ | σ has an even number of 0's } ∩ {σ ∈ B∗ | σ has more than two 1's }


and producing machines for those two is easy.

IV.4 Exercises
✓ 4.7 True or false? Obviously you must justify each answer.
(a) Every regular language is finite.
(b) Over B, the empty language is not regular.
(c) The intersection of two languages is regular.
(d) Over B, the language of all strings, B∗, is not regular.
(e) Every Finite State machine accepts at least one string.
(f) For every Finite State machine there is one that has fewer states but recognizes
the same language.
4.8 One of these is true and one is false. Which is which? (a) Any finite language
is regular. (b) Any regular language is finite.
4.9 Is {σ ∈ B∗ | σ represents a power of 2 in binary } a regular language?

4.10 Is English a regular language?


✓ 4.11 Show that each language over Σ = { a, b } is regular.
(a) {σ ∈ Σ∗ | σ starts and ends with a }
(b) {σ ∈ Σ∗ | the number of a's is even }

✓ 4.12 True or false? Justify your answer.


(a) If L0 is a regular language and L1 ⊆ L0 then L1 is also a regular language.
(b) If L0 is not regular and L0 ⊆ L1 then L1 is also not regular.
(c) If L0 ∩ L1 is regular then each of the two is regular.
✓ 4.13 Suppose that the language L over B is regular. Show that the language
L̂ = { 1 ⌢ σ | σ ∈ L }, also over B, is also regular.
4.14 If machines have n 0 states and n 1 states, then how many states does the
product have?
4.15 For these two machines,
[two transition diagrams: the first with states q0, q1, q2, the second with states s0, s1, both over { a, b }]
give the transition table for the cross product. Specify the accepting states so that
the result will accept (a) the intersection of the languages of the two machines, and
(b) the union of the languages.
4.16 Find the machine that is the cross product of the second machine, M1 , from
Example 4.4, with itself.
Section 4. Regular languages 217
a a

b
s0 s1
b

with itself. Set the accepting states so that it accepts the same language, L1 .
4.17 One of our first examples of Finite State machines, Example 1.6, accepts a
string when it contains at least two 0’s as well as an even number of 1’s. Make
such a machine as a product of two simple machines.
4.18 For each, state True or False and give a justification.
(a) Every language is the subset of a regular language.
(b) The union of a regular language and a language that is not regular must be
not regular.
(c) Every language has a subset that is not regular.
(d) The union of two regular languages is regular, without exception.
4.19 Fill in the blank (with justification): The concatenation of a regular language
with a not-regular language ___ regular. (a) must be (b) might be, or might
not be (c) cannot be
4.20 Where L is a language, define L+ as the language L ⌢ L∗ . Show that if L is
regular then so is L+.
4.21 True or false: all finite languages are regular, and there are countably many
finite languages, and there are countably many regular sets, so therefore all
regular languages are finite.
4.22 Use closure properties to show that if L is regular then the set of even-length
strings in L is also regular.
4.23 Example 4.6 shows that closure properties can make easier some arguments
that a language is regular. It can do the same for arguments that a language
is not regular. The next section shows that { a^n b^n ∈ { a, b }∗ | n ∈ N } is not
regular (this is a restatement of Example 5.2). Use that and closure properties to
show that {σ ∈ { a, b }∗ | σ contains the same number of a's as b's } is not regular.
Hint: one way is to use closure under intersection.


4.24 Prove that the collection of regular languages over Σ is closed under each of
the operations.
(a) pref(L) contains those strings that are a prefix of some string in the language,
that is, pref(L) = {σ ∈ Σ∗ | there is a τ ∈ Σ∗ such that σ ⌢ τ ∈ L }
(b) suff(L) contains the strings that are a suffix of some string in the language,
that is, suff(L) = {σ ∈ Σ∗ | there is a τ ∈ Σ∗ such that τ ⌢ σ ∈ L }
(c) allprefs(L) contains the strings such that all of the prefixes are in the language,
that is, allprefs(L) = {σ ∈ L | for every τ ∈ Σ∗ that is a prefix of σ, τ ∈ L }
4.25 Lemma 4.2 gives a counting argument, a pure existence proof, that there
are languages that are not regular. But we can also exhibit one. Prove that
L = { 1^k | k ∈ K } is not regular, where K is the Halting problem set,
K = { e ∈ N | ϕ_e(e)↓ }.
4.26 Lemma 4.2 shows that the collection of regular languages over B is countable.
Show that not every individual language in it is countable.
✓ 4.27 An alternative definition of a regular language is one generated by a regular
grammar, where rewrite rules have three forms: X → tY, or X → t, or X → ε .
That is, the rule head has one nonterminal and rule body has a terminal followed
by a nonterminal, or possibly a single nonterminal or the empty string. This is an
example, with the language that it generates.
S → aS | bS
S → aA
A → aB
B → ε | b

L = {σ ∈ { a, b }∗ | σ = τ ⌢ aa or σ = τ ⌢ aab }
Here we outline an algorithm that inputs a regular grammar and produces a
Finite State machine that recognizes the same language. Apply these steps to the
above grammar. (a) For each nonterminal X make a machine state q X , where the
start state is the one for the start symbol. (b) For each X → ε rule make state q X
accepting. (c) For each X → tY rule put a transition from q X to qY labeled t.
(d) If there are any X → t rules then make an accepting state q̄ , and for each
such rule put a transition from q X to q̄ labeled t.
4.28 We can give an alternative proof of Theorem 4.5, that the collection of regular
languages is closed under set intersection, set difference, and set complement,
that does not rely on a somewhat mysterious “by construction.”
(a) Observe that the identity S ∩ T = (S^c ∪ T^c)^c gives intersection in terms of
union and complement. Use Lemma 4.3 to argue that if regular languages
are closed under complement then they are also closed under intersection.
(b) Use the identity S − T = S ∩ T^c to make a similar observation about set
difference.
(c) Show that the complement of a regular language is also a regular language.
4.29 Prove that the language recognized by a Finite State machine with n states
is infinite if and only if the machine accepts at least one string of length k , where
n ≤ k < 2n.
4.30 Fix two alphabets Σ0 , Σ1 . A function h : Σ0 → Σ1 ∗ induces a homomorphism
on Σ0 ∗ via the operation h(σ ⌢τ ) = h(σ ) ⌢h(τ ) and h(ε) = ε .
(a) Take Σ0 = B and Σ1 = { a, b }. Fix a homomorphism ĥ(0) = a and ĥ(1) = ba.
Find ĥ(01), ĥ(10), and ĥ(101).
(b) Define h(L) = { h(σ) | σ ∈ L }. Let L̂ = {σ ⌢ 1 | σ ∈ B∗ }; describe it with a
regular expression. Using the homomorphism ĥ from the prior item, describe
ĥ(L̂) with a regular expression.
(c) Prove that the collection of regular languages is closed under homomorphism,
that if L is regular then so is h(L).
4.31 The proofs here work with deterministic Finite State machines. Find a
nondeterministic Finite State machine M so that producing another machine M̂
by taking the complement of the accepting states, F_M̂ = (F_M)^c, will not result in
the language of the second machine being the complement of the language of the
first.
4.32 We will show that the class of regular languages is closed under reversal.
Recall that the reversal of the language is defined to be the set of reversals of the
strings in the language, L^R = { σ^R | σ ∈ L }.

(a) Show that the reversal of a concatenation is the concatenation, in the opposite
order, of the reversals: (σ0 ⌢ σ1)^R = σ1^R ⌢ σ0^R.
Hint: do induction on the length of σ1.
(b) We will prove the result by showing that for any regular expression R, the
reversal L(R)^R is described by a regular expression. We will construct this
expression by defining a reversal operation on regular expressions. Fix an
alphabet Σ and let (i) œ^R = œ, (ii) ε^R = ε, (iii) x^R = x for any x ∈ Σ,
(iv) (R0 ⌢ R1)^R = R1^R ⌢ R0^R, (v) (R0|R1)^R = R0^R|R1^R, and (vi) (R*)^R =
(R^R)*. (Note the relationship between (iv) and the prior exercise item.)
Now show that R^R describes L(R)^R. Hint: use induction on the length of the
regular expression R .

Section
IV.5 Languages that are not regular
The prior section gave a counting argument to show that there are languages that
are not regular. Now we produce a technique to show that specific languages are
not regular.
The idea is that, although Finite State machines are finite, they can handle
arbitrarily long inputs. This chapter’s first example, the power switch from
Example 1.1, has only two states but even if we toggle it hundreds of times, it still
keeps track of whether the switch is on or off. To handle these long inputs with
only a small number of states, a machine must revisit states, that is, it must loop.
Loops cause a pattern in what a machine accepts. The diagram shows a machine
that accepts aabbbc (it only shows some of the states, those that the machine
traverses in processing this input).
[transition diagram: q0 -a→ qi1, the loop qi1 -a→ qi2 -b→ qi3 -b→ qi1, and then qi1 -b→ qi4 -c→ qi5, with qi5 accepting]

Besides aabbbc, this machine must also accept a(abb)^2bc because that string takes
the machine through the loop twice, and then to the accepting state. Likewise,
this machine accepts a(abb)^3bc, and looping more times pumps out more accepted
strings.
5.1 Theorem (Pumping Lemma) Let L be a regular language. Then there is a


constant p ∈ N, the pumping length for the language, such that every string σ ∈ L
with |σ | ≥ p decomposes into three substrings σ = α ⌢ β ⌢γ satisfying: (1) the
first two components are short, |α β | ≤ p , (2) β is not empty, and (3) all of the
strings αγ , α β 2γ , α β 3γ , . . . are also members of the language L.
Proof Suppose that L is recognized by the Finite State machine M. Denote the
number of states in M by p . Consider a string σ with |σ | ≥ p .
Finite State machines perform one transition per character so the number of
characters in an input string equals the number of transitions. The number of states
that the machine visits is one more than the number of transitions; for instance,
with a one-character input a machine visits two states (not necessarily distinct).
Thus, in processing the input string σ , the machine must visit some state more
than once. It must loop.
Fix a repeated state, q. Also fix the first two substrings, ⟨s0, ... si⟩ and
⟨s0, ... si, ... sj⟩, of σ that take the machine to state q. That is, j is minimal such
that i ≠ j and such that the extended transition function gives ∆̂(⟨s0, ... si⟩) =
∆̂(⟨s0, ... sj⟩) = q. Then let α = ⟨s0, ... , si⟩ be the string that brings the machine
up to the loop, let β = ⟨si+1, ... sj⟩ be the string that brings the machine around
the loop, and let γ = ⟨sj+1, ... sk⟩ be the rest of σ. (Possibly one or both of α
and γ is empty.) These strings satisfy conditions (1) and (2). (Choosing q to be a
state that is repeated within the initial segment of σ, and choosing i and j to be
minimal, guarantees that, for instance, if the string σ brings the machine around a loop
a hundred times then we don't pick an α that includes the first ninety-nine loops,
and that therefore is longer than p.)
For condition (3), this string

α ⌢ γ = ⟨s0, ... si, sj+1, ... sk⟩

brings the machine from the start state q0 to q, and then to the same ending state
as did σ. That is, ∆̂(αγ) = ∆̂(αβγ), and so it is an accepting state. The other strings
in (3) work the same way. For instance, for

α ⌢ β² ⌢ γ = αββγ = ⟨s0, ... si, si+1, ... sj, si+1, ... sj, sj+1, ... sk⟩

the substring α brings the machine from q0 to the state q, the first β brings it
around to q again, then the second β makes the machine loop to q yet again, and
finally γ brings it to the same ending state as did σ.
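
This proof is constructive, and the construction fits in a few lines of code. Given a deterministic machine's transition table and an accepted string at least as long as the number of states, the sketch below (the names are ours) records the visited states and splits the input at the first repeated state; the split is the α, β, γ of the statement.

def pump_decomposition(delta, start, string):
    """Split a long input into (alpha, beta, gamma) as in Theorem 5.1.

    delta maps a (state, character) pair to the next state.  The first
    repeated state marks the loop; beta is the substring that drives the
    machine around that loop, so |alpha beta| <= the number of states.
    """
    seen = {start: 0}            # state -> input position where first seen
    state = start
    for j, ch in enumerate(string, start=1):
        state = delta[state, ch]
        if state in seen:        # the machine has looped
            i = seen[state]
            return string[:i], string[i:j], string[j:]
        seen[state] = j
    raise ValueError('no state repeats: the string is shorter than p')

# A three-state machine tracking the number of a's modulo three.
delta = {('r0', 'a'): 'r1', ('r1', 'a'): 'r2', ('r2', 'a'): 'r0'}
assert pump_decomposition(delta, 'r0', 'aaaaaa') == ('', 'aaa', 'aaa')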
Typically we use the Pumping Lemma to show that a language is not regular
through an argument by contradiction.
5.2 Example The classic example is to show that this language of matched parentheses
is not regular. The alphabet is the set of the two parentheses Σ = { ), ( }.

L = { (^n )^n ∈ Σ∗ | n ∈ N } = { ε, (), (()), ((())), (^4 )^4, ... }



For contradiction, assume that it is regular. Then the Pumping Lemma says that L
has a pumping length, p .
Consider the string σ = (^p )^p. It is an element of L and its length is greater
than or equal to p so the Pumping Lemma applies. So σ decomposes into three
substrings σ = α ⌢β ⌢γ satisfying the conditions. Condition (1) is that the length of
the prefix α ⌢ β is less than or equal to p . Because of this condition we know that
both α and β are composed exclusively of open parentheses, (’s. Condition (2) is
that β is not the empty string, so it contains at least one (.
Condition (3) is that all of the strings αγ , α β 2γ , α β 3γ , . . . are members of L.
To get the desired contradiction, consider α β 2γ . Compared with σ = α βγ , this
string has an extra β , which adds at least one open parenthesis without adding any
balancing closed parentheses. In short, α β 2γ has more (’s than )’s. It is therefore
not a member of L. But the Pumping Lemma says it must be a member of L, and
therefore the assumption that L is regular is incorrect.
We have seen many examples of things that regular expressions and Finite State
machines can do. Here we see something that they cannot. Matching parentheses,
and other types of matching, is something that we often want to do, for instance,
in a compiler. So the Pumping Lemma helps us show that for some common
computing tasks, regular languages are not enough.
5.3 Example Recall that a palindrome is a string that reads the same backwards
as forwards, such as bab, abbaabba, or a^5 b a^5. We will prove that the language
L = {σ ∈ Σ∗ | σ = σ^R } of all palindromes over Σ = { a, b } is not regular.

For contradiction assume that this language is regular. The Pumping Lemma
says that L has a pumping length. Call it p. Consider σ = a^p b a^p, which is an
element of L and has more than p characters. Thus it decomposes as σ = α βγ ,
subject to the three conditions. Condition (1) is that |α β | ≤ p and so both substrings
α and β are composed entirely of a’s. Condition (2) is that β is not the empty
string and so β consists of at least one a.
Consider the list from condition (3), αγ , α β 2γ , α β 3γ , ... We will get the desired
contradiction from the first element, αγ (the other list members also lead to a
contradiction but we only need one).
Compared to σ = α βγ , in αγ the β is gone. Because α and β consist entirely
of a’s, the substring γ got σ ’s b, and must also have the ap that follows it. So in
passing from σ = α βγ to αγ we’ve omitted at least one a before the b but none of
the a’s after it, and therefore αγ is not a palindrome. This contradicts the Pumping
Lemma’s third condition, that the strings in the list are all members of L.
5.4 Remark In that example σ has three parts, σ = a^p ⌢ b ⌢ a^p, and it decomposes
into three parts, σ = α ⌢ β ⌢ γ. Don't make the mistake of thinking that the two
decompositions match. The Pumping Lemma does not say that α = a^p, β = b,
and γ = a^p — indeed, as the example says, the Pumping Lemma gives that β is
not the b part. Instead, the Pumping Lemma only says that the first two strings
together, α ⌢ β, consist exclusively of a's. So it could be that αβ = a^p, or it could
instead be that γ starts with some a's that are then followed by ba^p.

5.5 Example Consider L = { 0^m 1^n ∈ B∗ | m = n + 1 } = { 0, 001, 00011, ... }. Its
members have a number of 0's that is one more than the number of 1's. We will
prove that it is not regular.
For contradiction assume otherwise, that L is regular, and set p as its pumping
length. Consider σ = 0^{p+1} 1^p ∈ L. Because |σ| ≥ p, the Pumping Lemma gives a
decomposition σ = αβγ satisfying the three conditions. Condition (1) says that
|αβ| ≤ p, so that the substrings α and β have only 0's. Condition (2) says that β
has at least one character, necessarily a 0. Consider the list from Condition (3): αγ,
αβ²γ, αβ³γ, . . . Compare its first entry, αγ, to σ (other entries would also yield
a contradiction). The string αγ has fewer 0's than does σ but the same number
of 1's. So the number of 0's in αγ is not one more than its number of 1's. Thus
αγ ∉ L, which contradicts the Pumping Lemma.
We can interpret that example to say that Finite State machines cannot correctly
recognize a predecessor-successor relationship. We can also use the Pumping
Lemma to show Finite State machines cannot recognize other arithmetic relations.
5.6 Example The language L = { a^n | n is a perfect square } = { ε, a, a^4, a^9, a^16, ... }
is not regular. For, suppose otherwise. Fix a pumping length p and consider
σ = a^{p²}, so that |σ| = p².
By the Pumping Lemma, σ decomposes into αβγ, subject to the three conditions.
Condition (1) is that |αβ| ≤ p, which implies that |β| ≤ p. Condition (2) is that
0 < |β|. Now consider the strings αγ, αβ²γ, . . .
The gap between the length |σ| = |αβγ| and the length |αβ²γ| is at most p,
because 0 < |β| ≤ p. But by the definition of the language, the member that is
next longest after σ has length (p + 1)² = p² + 2p + 1, a gap of 2p + 1, which is
strictly greater than p. Thus the length of αβ²γ is not a perfect square, which
contradicts the Pumping Lemma's assertion that αβ²γ ∈ L.
Sometimes we can solve problems by using the Pumping Lemma in conjunction
with the closure properties of regular languages.
5.7 Example The language L = {σ ∈ { a, b }∗ | σ has as many a's as b's } is not
regular. To prove that, observe that the language L̂ = { a^m b^n ∈ { a, b }∗ | m, n ∈ N }
is regular, described by the regular expression a*b*. Recall that the intersection
of two regular languages is regular. But L ∩ L̂ is the set { a^n b^n | n ∈ N } and
Example 5.2 shows that this language isn't regular, after we substitute a and b for
the parentheses.
In previous sections we saw how to show that a language is regular, either
by producing a Finite State machine that recognizes it or by producing a regular
expression that describes it. Being able to show that a language is not regular
nicely balances that.
But our interest is motivated by more than symmetry. A Turing machine can
solve the problem of Example 5.2, of recognizing strings of balanced parentheses,


but we now know that a Finite State machine cannot. Therefore we now know that
to solve this problem we need scratch memory. So the results in this section speak
to the resources needed to solve the problems.

IV.5 Exercises
A useful technique when you are stuck on a language description is to try listing
five strings that are in the language and five that are not. Another is to describe
the language in prose, as though over a telephone. Both help you think through the
formalities.
✓ 5.8 Example 5.5 shows that { 0^m 1^n ∈ B∗ | m = n + 1 } is not regular but your
friend doesn't get it and asks you, “What's wrong with the regular expression
0^{n+1} 1^n?” Explain it to them.
5.9 Example 5.2 uses α β 2γ to show that the language of balanced parentheses is
not regular. Instead get the contradiction by showing that αγ is not a member of
the language.
5.10 Your friend has been thinking. They say, “Hey, the diagram just before
Theorem 5.1 doesn’t apply unless the language is infinite. Sometimes languages
are regular because they only have like three or four strings. So the Pumping
Lemma is wrong.” In what way do they need to further refine their thinking?
5.11 Someone in the class emails you, “If a language has a string with length greater
than the number of states, which is the pumping length, then it cannot be a
regular language.” Correct?
✓ 5.12 For each, give five strings that are elements of the language and five that are
not, and then show that the language is not regular. (a) L0 = { a^n b^m | n + 2 = m }
(b) L1 = { a^n b^m c^n | n, m ∈ N } (c) L2 = { a^n b^m | n < m }
✓ 5.13 Your study partner has read Remark 5.4 but it is still sinking in. About
the matched parentheses example, Example 5.2, they say, “So σ = (^p )^p, and
σ = αβγ. We know that αβ consists only of ('s, so it must be that γ consists of
)'s.” Give them a goose.
5.14 In class someone asks, “Isn’t it true that languages don’t have a unique
pumping length? That if a length of p = 5 will do then p = 6 will also do?”
Before the prof answers, what do you think?
5.15 Show that the language over { a, b } consisting of strings having more a’s
than b’s is not regular.
✓ 5.16 For each language over Σ = { a, b } produce five strings that are members.
Then decide if that language is regular. You must prove each assertion
by either producing a regular expression or using the Pumping Lemma.
(a) { a^n b^m ∈ Σ∗ | n = 3 } (b) { a^n b^m ∈ Σ∗ | n + 3 = m } (c) { α ⌢ α | α ∈ Σ∗ }
(d) { a^n b^m ∈ Σ∗ | n, m ∈ N } (e) { a^n b^m ∈ Σ∗ | m − n > 12 }


5.17 One of these is regular and one is not. Which is which? You must prove your
assertions. (a) { a^n b^m ∈ { a, b }∗ | n = m² } (b) { a^n b^m ∈ { a, b }∗ | 3 < m, n }
✓ 5.18 Use the Pumping Lemma to prove that L = { a^{m−1} c b^m | m ∈ N+ } is not
regular. It may help to first produce five strings from the language.
5.19 Is {σ ∈ B∗ | σ = α β α^R for α, β ∈ B∗ } regular? Either way, prove it.

5.20 Prove that L = {σ ∈ { 1 }∗ | |σ| = n! for some n ∈ N } is not regular. Hint: the


differences (n + 1)! − n ! grow without bound.


5.21 One of these is regular, one is not: { 0^m 1 0^n | m, n ∈ N } and { 0^n 1 0^n | n ∈ N }.

Which is which? Of course, you must prove your assertions.


✓ 5.22 Show that there is a Finite State machine that recognizes this language of all
sums totaling less than four, L4 = { a^i b^j c^k | i, j, k ∈ N and i + j = k and k < 4 }.
Use the Pumping Lemma to show that no Finite State machine recognizes the
language of all sums, L = { a^i b^j c^k | i, j, k ∈ N and i + j = k }.
5.23 Decide if each is a regular language of bitstrings: (a) the number of 0’s plus
the number of 1’s equals five, (b) the number of 0’s minus the number of 1’s
equals five.
✓ 5.24 Show that { 0^m 1^n ∈ B∗ | m ≠ n } is not regular. Hint: use the closure
properties of regular languages.
erties of regular languages.


5.25 Example 5.7 shows that {σ ∈ { a, b }∗ | σ has as many a's as b's } is not
regular. In contrast, show that L = {σ ∈ { a, b }∗ | σ has as many ab's as ba's } is


regular. Hint: think of ab and ba as marking a transition from a block of one
character to a block of another.
✓ 5.26 Rebut someone who says to you, “Sure, for the machine before Theorem 5.1,
on page 219, a single loop will cause σ = α ⌢ β ⌢γ . But if the machine had a
double loop like below then you’d need a longer decomposition.”
[transition diagram: a machine like the one before Theorem 5.1 but with two loops along the path from q0 to the accepting state qi8]

5.27 Show that {σ ∈ B∗ | σ = 1^n where n is prime } is not a regular language.

Hint: the third condition’s sequence has a constant positive length difference.
5.28 Consider { a^i b^j c^{i·j} | i, j ∈ N }. (a) Give five strings from this language.

(b) Show that it is not regular.


5.29 The language L described by the regular expression a*bbbb* is a regular
language. We can apply the Pumping Lemma to it. The proof of the Pumping
Lemma says that for the pumping length we can use the number of states in
a machine that recognizes the language. Here that gives p = 4. (a) Consider
σ = abbb. Give a decomposition σ = αβγ that satisfies the three conditions.
(b) Do the same for σ = b^15.
5.30 For a regular language, a pumping length p is a number with the property
that every word of length p or more can be pumped, that is, can be decomposed
so that it satisfies the three properties of Theorem 5.1. The proof of that theorem
shows that where a Finite State machine recognizes the language, the number of
states in the machine suffices as a pumping length. But p can be smaller.
(a) Consider the language L described by (01)*. Construct a deterministic
Finite State machine with three states that recognizes this language.
(b) Show that the minimal pumping length for L is 1.
5.31 Nondeterministic Finite State machines can always be made to have a single
accepting state. For deterministic machines that is not so.
(a) Show that any deterministic Finite State machine that recognizes the finite
language L1 = {ε, a } must have at least two accepting states.
(b) Show that any deterministic Finite State machine that recognizes L2 =
{ε, a, aa } must have at least three accepting states.
(c) Show that for any n ∈ N there is a regular language that is not recognized
by any deterministic Finite State machine with at most n accepting states.

Section
IV.6 Minimization
Contrast these two Finite State machines. For each, the language of accepted
strings is {σ ∈ B∗ | σ has at least one 0 and at least one 1 }.

[two transition diagrams, labeled (∗): on the left a machine with states q0 through q3; on the right a machine with states q0 through q5; both over B]

Our experience from making machines is that in a properly designed machine the
states have a well-defined meaning. For instance, on the left q 2 means something
like, “have seen at least one 1 but still waiting for a 0.”
The machine on the right doesn’t satisfy this design principle because the
meaning of q 4 is the same as that of q 2 , and q 3 ’s meaning is the same as q 5 ’s. That
is, the two pairs of states have the same future. This machine has redundant states.
We will give an algorithm that starts with a Finite State machine and from
it finds the smallest machine that recognizes the same language. The algorithm
collapses together redundant states.
6.1 Definition In a Finite State machine over Σ, where n ∈ N we say that two
states q, q̂ are n -distinguishable if there is a string σ ∈ Σ∗ with |σ | ≤ n such that
starting the machine in state q and giving it input σ ends in an accepting state
while starting it in q̂ and giving it σ does not, or vice versa. Otherwise the states
are n -indistinguishable, q ∼n q̂ .
Two states q, q̂ are distinguishable if there is an n for which they are n -
distinguishable. Otherwise they are indistinguishable, q ∼ q̂ .

6.2 Example Consider the machine on the left above. Starting it in state q 0 and feeding
it σ = 0 ends in the non-accepting state q 1 , while starting it in q 2 and processing
the same input ends in the accepting state q 3 . So q 0 and q 2 are 1-distinguishable,
and therefore are distinguishable.
Another example is that q 2 and q 3 are 0-distinguishable, via σ = ε . That is, a
state that is not accepting is 0-distinguishable from a state that is accepting.
6.3 Example More happens with the machine on the right. This table gives the result
of starting in each state and feeding the machine each length 0, length 1, and
length 2 string. As called for in the definition, the table doesn’t give the resulting
state but instead records whether it is accepting, F , or nonaccepting, Q − F .

       ε     0     1     00    01    10    11
q0     Q−F   Q−F   Q−F   Q−F   F     F     Q−F
q1     Q−F   Q−F   F     Q−F   F     F     F
q2     Q−F   F     Q−F   F     F     F     Q−F
q3     F     F     F     F     F     F     F
q4     Q−F   F     Q−F   F     F     F     Q−F
q5     F     F     F     F     F     F     F
The effect of the length 0 string is that there are two kinds of states: members of
{q 0 , q 1 , q 2 , q 4 } are taken to nonaccepting resulting states and members of {q 3 , q 5 }
result in accepting states.
The length 1 strings split the machine's states into four groups. For instance,
q0 is 1-distinguishable from q1 because the 0 and 1 result columns say Q−F, Q−F
for q0 but say Q−F, F for q1. In total there are four ∼1 -equivalence classes of states,
{q0}, {q1}, {q2, q4}, and {q3, q5}.
The length 2 strings do not further divide the states; the relation of
2-distinguishability gives the same four classes of states.
6.4 Lemma The ∼ relation and the ∼n relations are equivalences.

Proof Exercise 6.23.


Our algorithm† first finds all states that are distinguishable by the length zero
string, next finds all states distinguishable by length zero or one strings, etc. At the
end the machine's states are broken into classes where inside each class the states
are indistinguishable by strings of any length. Those classes serve as the states of
the minimal machine. We first outline the steps, then we will work through two
complete examples.

This is Moore’s algorithm. It is easy and suitable for small calculations but if you are writing code then
be aware that another algorithm, Hopcroft’s algorithm, is more efficient, but also more complex.
So consider again the machine with redundant states that we saw in (∗) above.
We use the following notation for the equivalence classes, here for the two classes of
the ∼0 relation, the four of the ∼1 relation, and the four of the ∼2 relation.

n ∼n classes
0 E0, 0 = {q 0 , q 1 , q 2 , q 4 } E0, 1 = {q 3 , q 5 }
1 E1, 0 = {q 0 } E1, 1 = {q 1 } E1, 2 = {q 2 , q 4 } E1, 3 = {q 3 , q 5 }
2 E2, 0 = {q 0 } E2, 1 = {q 1 } E2, 2 = {q 2 , q 4 } E2, 3 = {q 3 , q 5 }
The states that we spotted by eye as redundant, q 2 , q 4 and q 3 , q 5 continue to be
together in the same class.
For the algorithm, consider how states q and q̂ could be (n+1)-distinguishable but
not n-distinguishable. Let the length n+1 string σ = ⟨s0, s1, ... sn−1, sn⟩ = τ ⌢ sn
distinguish them. Because the states are not n-distinguishable, if the prefix
τ brings the machine from q to a state r in some class En,i, then τ must bring the
machine from q̂ to some r̂ in the same class, En,i. So distinguishing between these
states must involve σ's final character sn taking r to a state in one class, En,j, and
taking r̂ to a state in another, En,ĵ.
Therefore, at each step we don’t need to test whole strings, we need only test
single characters, to see whether they split the equivalence classes, the En,i ’s.
For instance, consider again the machine on the right above, along with its
∼1 classes E1,0, E1,1, E1,2, and E1,3. To see if there is any additional splitting in
going to the ∼2 classes, instead of checking all the length 2 strings we see if the
members of E1,2 and of E1,3 are sent to different ∼1 classes on being fed single
characters. (We need only test classes with more than one member because the
singleton classes cannot split.)

E1,2   0      1          E1,3   0      1
q2     E1,3   E1,2       q3     E1,3   E1,3
q4     E1,3   E1,2       q5     E1,3   E1,3

In both tables there is no split, because the right sides of the rows are the same
for all of the class's members. So we can stop.
The examples of this algorithm below show how to translate this into a minimal
machine, and add a table notation that simplifies the computation.
6.5 Example We will find a machine that recognizes the same language as this one
but that has a minimum number of states.
[State diagram. In table form, with start state q0 and accepting states q1, q2,
and q5, the machine is:]

∆     a    b
q0    q1   q2
q1    q3   q4
q2    q4   q3
q3    q5   q5
q4    q5   q5
q5    q5   q5

To do bookkeeping we will use triangular tables like the one below. They have an
entry for every two-element set {i, j} where i and j are indices of states and i ≠ j.
Start by checkmarking the i, j entries where one of qi and qj is accepting while
the other is not.
     0   1   2   3   4
1    ✓
2    ✓
3        ✓   ✓
4        ✓   ✓
5    ✓           ✓   ✓

These mark states that are 0-distinguishable and the blanks denote pairs of states
that are 0-indistinguishable. In short, here are the two ∼0 -equivalence classes.

E0, 0 = {q 0 , q 3 , q 4 } E0, 1 = {q 1 , q 2 , q 5 }

Next investigate the table's blanks, the 0-indistinguishable pairs of states, to
see if they can be 1-distinguished. For each ∼0 class, look at all pairs of states. The
table below lists them on the left. For each single character input, see where that
character sends the two states in the pair; that’s listed in the middle. On the right
are the associated ∼0 classes. If any entry on the right has two different classes
then put a checkmark on that row because the two states are 1-distinguishable.
For instance, q 0 and q 3 are in the same ∼0 class, E0, 0 . On being fed an a, q 0
goes to q 1 and q 3 goes to q 5 . The two, q 1 and q 5 , are together in E0, 1 .
In contrast, q 2 and q 5 are together in E0, 1 . The character b takes them to q 3
and q 5 . Those are in different ∼0 classes, so their row gets a checkmark. (Put a
mark if at least one entry on the right has two different classes.)

        a        b        a            b
  q0,q3   q1,q5    q2,q5    E0,1, E0,1   E0,1, E0,1
  q0,q4   q1,q5    q2,q5    E0,1, E0,1   E0,1, E0,1
  q3,q4   q5,q5    q5,q5    E0,1, E0,1   E0,1, E0,1
  q1,q2   q3,q4    q4,q3    E0,0, E0,0   E0,0, E0,0
✓ q1,q5   q3,q5    q4,q5    E0,0, E0,1   E0,0, E0,1
✓ q2,q5   q4,q5    q3,q5    E0,0, E0,1   E0,0, E0,1

     0   1   2   3   4
1    ✓
2    ✓
3        ✓   ✓
4        ✓   ✓
5    ✓   ✓   ✓   ✓   ✓

We have found that the states q 1 and q 2 are not 1-distinguishable, but that q 5 can
be 1-distinguished from q 1 and q 2 . In short, E0, 1 = {q 1 , q 2 , q 5 } splits into two
∼1 classes.
E1, 0 = {q 0 , q 3 , q 4 } E1, 1 = {q 1 , q 2 } E1, 2 = {q 5 }
We’ve updated the triangular table with marks at 1, 5 and 2, 5.
Iterate. The next iteration subdivides the ∼1 -equivalence classes, the E1,i ’s, to
compute the ∼2 -equivalence classes.

          a          b          a            b
✓ {q0,q3}   {q1,q5}    {q2,q5}    E1,1, E1,2   E1,1, E1,2
✓ {q0,q4}   {q1,q5}    {q2,q5}    E1,1, E1,2   E1,1, E1,2
  {q3,q4}   {q5,q5}    {q5,q5}    E1,2, E1,2   E1,2, E1,2
  {q1,q2}   {q3,q4}    {q4,q3}    E1,0, E1,0   E1,0, E1,0

     0   1   2   3   4
1    ✓
2    ✓
3    ✓   ✓   ✓
4    ✓   ✓   ✓
5    ✓   ✓   ✓   ✓   ✓

We have found that q3 and q4 are not 2-distinguishable, but that each is
2-distinguishable from q0. The ∼1 class E1,0 splits into two ∼2 classes.

E2, 0 = {q 0 } E2, 1 = {q 1 , q 2 } E2, 2 = {q 3 , q 4 } E2, 3 = {q 5 }

The updated triangular table contains the same information since its only blanks
are at entries 1, 2 and 3, 4.
Once more through the iteration gives this.
          a          b          a            b
  {q1,q2}   {q3,q4}    {q4,q3}    E2,2, E2,2   E2,2, E2,2
  {q3,q4}   {q5,q5}    {q5,q5}    E2,3, E2,3   E2,3, E2,3

     0   1   2   3   4
1    ✓
2    ✓
3    ✓   ✓   ✓
4    ✓   ✓   ✓
5    ✓   ✓   ✓   ✓   ✓

There is no more splitting. The algorithm stops with these classes.

E2, 0 = {q 0 } E2, 1 = {q 1 , q 2 } E2, 2 = {q 3 , q 4 } E2, 3 = {q 5 }

This shows the minimized machine, with r 0 as a name for E2, 0 and r 1 for E2, 1 , etc.
Its start state r 0 is the one containing q 0 . Its final states are the ones containing
final states of the original machine.

[State diagram: a chain in which r0 moves to r1, r1 to r2, and r2 to r3, each on
both a and b, and r3 loops to itself on both; r1 and r3 are accepting.]

As to the connections between states, for instance consider r0 = {q0}. In the
original machine, q0 under input a goes to q1. Since q1 is an element of E2,1 = r1,
the a arrow out of r 0 goes to r 1 .
The algorithm has one more step, which was not needed in the prior example.
If there are states that are unreachable from q 0 then we omit those at the start.
6.6 Example Minimize this machine.
[State diagram: states q0 through q5 over Σ = { 0, 1 }, with start state q0 and
accepting states q3 and q4. In table form the transitions among the states
reachable from q0 are:]

∆     0    1
q0    q1   q2
q1    q2   q3
q2    q2   q4
q3    q3   q3
q4    q4   q4

First, q 5 cannot be reached from the start state. Drop it. That leaves this initial
triangular table.
     0   1   2   3
1
2
3    ✓   ✓   ✓
4    ✓   ✓   ✓

It gives these ∼0 classes, the non-final states and the final states.

E0, 0 = {q 0 , q 1 , q 2 } E0, 1 = {q 3 , q 4 }

Next we see if the ∼0 classes split.

        0        1        0            1
✓ q0,q1   q1,q2    q2,q3    E0,0, E0,0   E0,0, E0,1
✓ q0,q2   q1,q2    q2,q4    E0,0, E0,0   E0,0, E0,1
  q1,q2   q2,q2    q3,q4    E0,0, E0,0   E0,1, E0,1
  q3,q4   q3,q4    q3,q4    E0,1, E0,1   E0,1, E0,1

     0   1   2   3
1    ✓
2    ✓
3    ✓   ✓   ✓
4    ✓   ✓   ✓

The first row gets a check mark because on being fed a 1 the states q 0 and q 1 go
to resulting states, q 2 and q 3 , that are in different ∼0 classes. The same is true
for the second row. So q 0 is 1-distinguishable from q 1 and q 2 but they are not
1-distinguishable from each other. That is, E0, 0 = {q 0 , q 1 , q 2 } splits in two.

E1, 0 = {q 0 } E1, 1 = {q 1 , q 2 } E1, 2 = {q 3 , q 4 }

As earlier, the updated triangular table contains the same information, since it has
only two blank entries, 1, 2 and 3, 4.
On the next iteration no more splitting happens. The minimized machine has
three states.
[State diagram: s0 moves to s1 on both 0 and 1; s1 loops on 0 and moves to s2
on 1; s2 loops on both 0 and 1 and is accepting.]
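For readers who want to try this in code, here is a sketch of Moore's algorithm in
Python. It assumes the machine is given as a transition dictionary with no
unreachable states; the names minimize, delta, and accepting are ours, not fixed
by anything above.

# A sketch of Moore's algorithm: returns the classes of
# indistinguishable states of a machine with no unreachable states.
def minimize(states, alphabet, delta, accepting):
    # the ~0 classes: accepting versus non-accepting states
    partition = [p for p in (states & accepting, states - accepting) if p]
    while True:
        index = {q: i for i, part in enumerate(partition) for q in part}
        refined = []
        for part in partition:
            # split the class by where single characters send its states
            groups = {}
            for q in part:
                key = tuple(index[delta[q, x]] for x in alphabet)
                groups.setdefault(key, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):    # no class split, so stop
            return refined
        partition = refined

# The machine of Example 6.5
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q3', ('q1', 'b'): 'q4',
         ('q2', 'a'): 'q4', ('q2', 'b'): 'q3',
         ('q3', 'a'): 'q5', ('q3', 'b'): 'q5',
         ('q4', 'a'): 'q5', ('q4', 'b'): 'q5',
         ('q5', 'a'): 'q5', ('q5', 'b'): 'q5'}
print(minimize({'q0', 'q1', 'q2', 'q3', 'q4', 'q5'}, 'ab',
               delta, {'q1', 'q2', 'q5'}))
# prints the four classes {q0}, {q1, q2}, {q3, q4}, {q5}, in some order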

We will close this section with a proof that this algorithm returns a minimal
machine. For that, consider the drawing below. It has the input machine above the
output machine so we can imagine that its states project down onto the output
machine’s states with p(q 0 ) = r 0 , p(q 1 ) = p(q 2 ) = r 1 , p(q 3 ) = p(q 4 ) = r 2 , and
p(q 5 ) = r 3 .

[Figure: the algorithm's input machine MI, the machine with states q0 through q5
shown above, drawn over its output machine MO, the chain r0 through r3, with
each state of MI placed above the state of MO that it projects to.]

The point is that the arrows work — the algorithm groups together MI ’s states to
make MO ’s states in a way that respects the starting machine’s transitions.
The tables below make the same point. The left table is the transition function
of the starting machine, ∆MI . The right table groups the q ’s into r ’s, so it shows
∆MO . The states are grouped in a way that allows the transitions in MO to be
derived from the transitions in MI . For instance, q 1 and q 2 project to r 1 , and when
presented with an input a they each transition to a state (q 3 and q 4 respectively)
that projects to r 2 .

∆MI    a    b        ∆MO    a    b
q0     q1   q2       r0     r1   r1
q1     q3   q4       r1     r2   r2
q2     q4   q3       r2     r3   r3
q3     q5   q5       r3     r3   r3
q4     q5   q5
q5     q5   q5
More precisely, the algorithm allows us to define ∆MO(p(q), x) = p(∆MI(q, x)) for
all states q of MI and all x ∈ Σ.
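As a concrete check of that equation, here is a sketch in Python using the two
tables above; the dictionary names are ours.

# Verifying that the two ways around the square agree:
# delta_O[p[q], x] equals p[delta_I[q, x]] for every q and x.
delta_I = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
           ('q1', 'a'): 'q3', ('q1', 'b'): 'q4',
           ('q2', 'a'): 'q4', ('q2', 'b'): 'q3',
           ('q3', 'a'): 'q5', ('q3', 'b'): 'q5',
           ('q4', 'a'): 'q5', ('q4', 'b'): 'q5',
           ('q5', 'a'): 'q5', ('q5', 'b'): 'q5'}
delta_O = {('r0', 'a'): 'r1', ('r0', 'b'): 'r1',
           ('r1', 'a'): 'r2', ('r1', 'b'): 'r2',
           ('r2', 'a'): 'r3', ('r2', 'b'): 'r3',
           ('r3', 'a'): 'r3', ('r3', 'b'): 'r3'}
p = {'q0': 'r0', 'q1': 'r1', 'q2': 'r1', 'q3': 'r2', 'q4': 'r2', 'q5': 'r3'}
assert all(delta_O[p[q], x] == p[delta_I[q, x]] for q in p for x in 'ab')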
6.7 Lemma The algorithm above returns a machine that recognizes the same language
as the input machine, L(MO) = L(MI), and that from among all of the machines
recognizing that language has the minimal number of states.
Proof We will argue that the algorithm halts for all input machines, that the
returned machine recognizes the same language, and that it has a minimal number
of states. The first is easy: the algorithm halts after a step where no class splits,
and such a step must appear since each split strictly increases the number of
classes while there can be at most as many classes as the machine has states.
The second holds because the transition function of the output machine respects
the transition function of the input. Start both machines on the same string, σ ∈ Σ∗ .
The machine MI starts in q 0 while MO starts in a state E0 that contains q 0 . The
first character of σ moves MI to a state q̂ and moves MO to a state that contains q̂ .
The processing proceeds in this way until the string runs out. Then MI is in a final
state if and only if MO is in a state that contains that final state, which is itself a
final state of MO . Thus the two machines accept the same set of strings.
For the third, let M̂ be a machine that recognizes the same language as MO.
We will show that it has at least as many states by giving an association where
each state in MO is associated with at least one state in M̂, and where no two
different states in MO are associated with the same state in M̂.
Consider the union of the sets of states of the two machines (assume that they
have no states in common). We will follow the process above to find when two
states in this union are indistinguishable. As above, start by saying that two states
in the union are 0-indistinguishable if either both are final in their own machine or
neither is final. Step n + 1 of this process, also as above, begins with ∼n classes
En, 0 , ... En,k of states from the union that are n -indistinguishable. For each such

class, see if it splits. That is, see if there are two states in that class that are sent by
a character x ∈ Σ to different ∼n classes. This gives the ∼n+1 classes. When we
reach a step with no splitting then we know which states are indistinguishable and
they form the ∼ classes.
Notice that the start states in the two machines are indistinguishable, are
in the same ∼ class, because L(MO ) = L(M̂). In addition, if two states are
indistinguishable then their successor states on any one input symbol x ∈ Σ are
also indistinguishable from each other, simply because if a string σ distinguishes
between the successors then x ⌢ σ distinguishes between the original two states.
In turn, the successors of these successors are indistinguishable, etc.
Now, say that states in MO and M̂ are associated if they are indistinguishable,
that is, if they are in the same ∼ class. We first show that every state q of MO
is associated with at least one state of M̂. Because MO is the output of the
minimization process, it has no inaccessible state. So there is a string that takes
the start state of MO to q . This string takes the start state of M̂ to some q̂ , and
the prior paragraph applies to give that q ∼ q̂ .
We finish by showing that there cannot be two different states of MO that are
both associated with the same state of M̂. If there were two such states then by
Lemma 6.4 they would be indistinguishable from each other. But that’s impossible
because MO is the output of the minimization process, which ensures all of its
states are distinguishable.

IV.6 Exercises
6.8 From the triangular table find the ∼i classes.
0
1
2
✓ ✓ ✓ 3
✓ ✓ ✓ ✓ 4
✓ ✓ ✓ ✓ 5
6.9 From the ∼i classes find the associated triangular table. (a) Ei, 0 = {q 0 , q 1 },
Ei, 1 = {q 2 }, and Ei, 2 = {q 3 , q 4 }, (b) Ei, 0 = {q 0 }, Ei, 1 = {q 1 , q 2 , q 4 }, and
Ei, 2 = {q 3 }, (c) Ei, 0 = {q 0 , q 1 , q 5 }, Ei, 1 = {q 2 , q 3 }, and Ei, 2 = {q 4 },
✓ 6.10 Suppose that E0, 0 = {q 0 , q 1 , q 2 , q 5 } and E0, 1 = {q 3 , q 4 }, and from the
machine you compute this table.
a b a b
q0, q1 q1, q1 q2, q3 E0, 0 , E0, 0 E0, 0 , E0, 1
q0, q2 q1, q2 q2, q4 E0, 0 , E0, 0 E0, 0 , E0, 1
q0, q5 q1, q5 q2, q5 E0, 0 , E0, 0 E0, 0 , E0, 0
q1, q2 q1, q2 q3, q4 E0, 0 , E0, 0 E0, 1 , E0, 1
q1, q5 q1, q5 q3, q5 E0, 0 , E0, 0 E0, 1 , E0, 0
q2, q5 q2, q5 q4, q5 E0, 0 , E0, 0 E0, 1 , E0, 0
q3, q4 q3, q4 q5, q5 E0, 1 , E0, 1 E0, 0 , E0, 0

(a) Which lines of the table do you checkmark? (b) Give the resulting ∼1 classes.

✓ 6.11 This machine accepts strings with an odd parity, with an odd number of 1’s.
Minimize it, using the algorithm described in this section. Show your work.

[State diagram: states q0, q1, q2 over { 0, 1 }, each with a loop on 0 and with
transitions among the states on 1.]

✓ 6.12 For many machines we can find the unreachable states by eye, but there is
an algorithm. It inputs a machine M and initializes the set of reachable states
to R 0 = {q 0 }. For n > 0, step n of the algorithm is: for each q ∈ R n find all
states q̂ reachable from q in one transition and add those to make Rn+1 . That
is, Rn+1 = Rn ∪ { q̂ = ∆M(q, x) | q ∈ Rn and x ∈ Σ }. The algorithm stops when
Rk = Rk +1 and the set of reachable states is R = Rk . The unreachable states are
the others, Q − R .
For each machine, perform this algorithm. Show the steps.
(a) [State diagram: a machine over { a, b } with states q0, q1, q2, q3, q5.]
(b) [State diagram: a machine over { a, b } with states q0, q1, q2, q3, q4.]
✓ 6.13 Perform the minimization algorithm on the machine with redundant states
at the start of this section, the one on the right in (∗) on page 225.

6.14 What happens when you minimize a machine that is already minimal?

✓ 6.15 This machine accepts strings described by (ab|ba)*. Minimize it, using the
algorithm of this section and showing the work.

[State diagram: a machine with states q0 through q7 over { a, b }.]

6.16 If a machine’s start state is accepting, must the minimized machine’s start
state be accepting? If you think “yes” then prove it, and if you think “no” then
give an example machine where it is false.

6.17 Minimize this machine.


[State diagram: a machine with states q0 through q4 over { 0, 1 }.]

6.18 Minimize this. Show the work, including producing the diagram of the
minimized machine.
[State diagram: a machine with states q0 through q5 over { a, b }.]

6.19 This machine has no accepting states. Minimize it.

[State diagram: a machine with states q0 through q3 over { 0, 1 }.]

What happens to a machine where all states are accepting?


6.20 Minimize this machine.
[State diagram: a machine with states q0 through q5 over { a, b }.]

6.21 What happens if you perform the minimization procedure in Example 6.6
without first omitting the unreachable state?
✓ 6.22 Minimize.
[State diagram: a machine with states q0 through q4 over { a, b }.]

Note that the algorithm takes, roughly, a number of steps equal to the
number of states in the machine.

6.23 Verify Lemma 6.4.


(a) Verify that each ∼n is an equivalence relation between states.
(b) Verify that ∼ is an equivalence.
6.24 There are ways to minimize Finite State machines other than the one given
in this section. One is Brzozowski’s algorithm, which has the advantage of being
surprising and fun in that you perform some steps that seem a bit wacky and
unrelated to elimination of states and then at the end it has worked. (However,
it has the disadvantage of taking worst-case exponential time.) We will walk
through the algorithm using this Finite State machine, M.

[State diagram: a machine M with states q0, q1, q2 over { a, b }.]

(a) Use the algorithm in this section to minimize it.


(b) Instead, get a new machine by taking M, changing the state names to be
ti instead of qi , and reversing all the arrows. This gives a nondeterministic
machine. Mark what used to be the initial state as an accepting state, and
mark what used to be the accepting state as an initial state. (In general, this
may result in a machine with more than one initial state.)
(c) Use the method described in an earlier section to convert this into a deter-
ministic machine, whose states are named ui . Omit unreachable states.
(d) Repeat the second item by changing the state names to vi instead of ui ,
and reversing all the arrows. Mark what used to be the initial state as an
accepting state and mark what used to be the accepting state as an initial
state.
(e) Convert to a deterministic machine and compare with the one in the first
item.
6.25 For each language L recognized by some Finite State machine, let rank(L)
be the smallest number n ∈ N such that L is accepted by a Finite State machine
with n states. Prove that for every n there is a language with that rank.

Section
IV.7 Pushdown machines
No Finite State machine can recognize the language of balanced parentheses.
So this machine model is not powerful enough to use, for instance, if you want
to decide whether input strings are valid programs in a modern programming
language. To handle nested parentheses the natural data structure is a stack. We
will next see a machine type consisting of a Finite State machine with access to a
pushdown stack.

Like a Turing machine tape, a stack is unbounded storage. But it has restrictions
that the tape does not. A stack doesn’t give random access to the elements. It is
like the restaurant dish dispenser below. When you pop a dish off, a spring pushes
the remaining dishes up so you can reach the next one. When you push a new dish
on, its weight compresses the spring so all the old dishes move down and the latest
dish is the only one that you can reach. We say that this stack is LIFO, Last-In,
First-Out.
Below on the right is a sequence of views of a stack data structure. First the
stack has two characters, g3 and g2. We push g1 on the stack, and then g0. Now,
although g1 is on the stack, we don’t have immediate access to it. To get at g1 we
must first pop off g0, as in the last stack shown.

g3      g1      g0      g1
g2      g3      g1      g3
        g2      g3      g2
                g2

Once something is popped, it is gone. We could include in the machine a state
whose intuitive meaning is that we have just popped g0, but because there are
finitely many states that strategy has limits.
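In code, one natural way to get this behavior is a list used at only one end.
Here is a sketch in Python mirroring the snapshots above.

# The stack snapshots above, with the top of the stack at the list's end.
stack = ['g2', 'g3']
stack.append('g1')        # push g1
stack.append('g0')        # push g0
top = stack.pop()         # pop; returns 'g0', and 'g1' is again on top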
Definition To define these machines we will extend the definition of Finite State
machines, starting with deterministic machines.
7.1 Definition A Pushdown machine has a finite set of states Q = {q 0 , ... qn−1 }
including a start state q 0 and a subset F ⊆ Q of accepting states, a nonempty
input alphabet Σ and a nonempty stack alphabet Γ , as well as a transition function
∆ : Q × (Σ ∪ { B, ε }) × Γ → Q × Γ ∗ .

We assume that the stack alphabet Γ contains the character that we use to mark
the stack bottom, ⊥.† The rest of Γ is g0, g1, etc. We also assume that the tape
alphabet Σ does not contain the blank, B, or the character ε .‡

† Read that character aloud as “bottom.” ‡ The definition allows ε to appear in two separate places, as
the second component of ∆’s inputs and also as the empty string, from Γ ∗ . However, one of those is in
the inputs and the other is in the outputs so it isn’t ambiguous.

The transition function describes how these machines act. For the input
⟨qi, s, gj⟩ ∈ Q × (Σ ∪ { B, ε }) × Γ there are two cases. When the character s is
an element of Σ ∪ { B } then an instruction ∆(qi, s, gj) = ⟨qk, γ⟩ applies when the
machine is in state qi with the tape head reading s and with the character gj on
top of the stack. If there is no such instruction then the computation halts, with
the input string not accepted. If there is such an instruction then the machine does
this: (i) the read head moves one cell to the right, (ii) the machine pops gj off
the stack and pushes the characters of the string γ = ⟨gi0, ... gim⟩ onto the stack
in the order from gim first to gi0 last, and (iii) the machine enters state qk. The
other case for the input ⟨qi, s, gj⟩ is when the character s is ε. Everything is the
same except that the tape head does not move. (We use this case to manipulate
the stack without consuming any input.)
As with Finite State machines, Pushdown machines don’t write to the tape but
only consume the tape characters. However, unlike Finite State machines they can
fail to halt; see Exercise 7.6.
The starting configuration has the machine in state q 0 , reading the first character
of σ ∈ Σ∗ , and with the stack containing only ⊥. A machine accepts its input σ if,
after starting in its starting configuration and after scanning all of σ , it eventually
enters an accepting state q ∈ F .
Notice that at each step the machine pops a character off the stack. If we want
to leave the stack unchanged then as part of the instruction we must push that
character back on. In addition, if the machine reaches a configuration where the
stack is empty then it will lock up and be unable to perform any more instructions.†

† An alternative to the final state definition of acceptance we are using is to define that a machine
accepts its input if after consuming that input, it empties the stack. The definitions are equivalent in
that a string is accepted by either type of machine if it is accepted by the other.
7.2 Example This grammar generates the language of balanced parentheses.

S → [ S ] | SS | ε LBAL = {ε, [], [[]], [][], [[][]], [[[]]], ... }

The Pumping Lemma shows that no Finite State machine recognizes this language.
But it is recognized by a Pushdown machine. This machine has states Q =
{q0, q1, q2} with accepting states F = {q1}, and alphabets Σ = { [, ] } and
Γ = { g0, ⊥ }. The table gives its transition function ∆, with the instructions
numbered for ease of reference.
Instr no   Input        Output
0          q0, [, ⊥     q0, ‘g0⊥’
1          q0, [, g0    q0, ‘g0g0’
2          q0, ], g0    q0, ε
3          q0, ], ⊥     q2, ε
4          q0, B, ⊥     q1, ε
It keeps a running tally of the number of [’s minus the number of ]’s, as the number
of g0’s on the stack. This computation starts with the input [[]][] and ends in an
accepting state.

Step   State   Unread input   Stack
0      q0      [[]][]         ⊥
1      q0      []][]          g0⊥
2      q0      ]][]           g0g0⊥
3      q0      ][]            g0⊥
4      q0      []             ⊥
5      q0      ]              g0⊥
6      q0      (end)          ⊥
7      q1      (end)          (empty)
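To make the action of the transition function concrete, here is a sketch in Python
of running a deterministic Pushdown machine such as this one. It assumes that the
machine has no ε instructions, as is the case here, and that B is not an input
character; the names run and delta are ours.

# Simulate a deterministic Pushdown machine with no ε instructions.
# delta maps (state, character, stack top) to (new state, push string).
def run(delta, accepting, sigma):
    state, stack = 'q0', ['⊥']
    for s in sigma + 'B':                    # B marks the end of the input
        if not stack or (state, s, stack[-1]) not in delta:
            return False                     # the machine halts; reject
        state, gamma = delta[state, s, stack[-1]]
        stack.pop()                          # every step pops the top ...
        stack.extend(reversed(gamma))        # ... and pushes gamma
    return state in accepting

# The balanced parentheses machine of Example 7.2, writing g for g0
delta = {('q0', '[', '⊥'): ('q0', 'g⊥'),    # instruction 0
         ('q0', '[', 'g'): ('q0', 'gg'),    # instruction 1
         ('q0', ']', 'g'): ('q0', ''),      # instruction 2
         ('q0', ']', '⊥'): ('q2', ''),      # instruction 3
         ('q0', 'B', '⊥'): ('q1', '')}      # instruction 4
print(run(delta, {'q1'}, '[[]][]'))    # True
print(run(delta, {'q1'}, '[[]'))       # False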

7.3 Example Recall that a palindrome is a string that reads the same forwards and
backwards, σ = σ R . This language of palindromes uses a c character as a middle
marker.

LMM = {σ ∈ { a, b, c }∗ | σ = τ⌢c⌢τ^R for some τ ∈ { a, b }∗ }


When the Pushdown machine below is reading τ it pushes characters onto the
stack; g0 when it reads a and g1 when it reads b. That’s state q 0 . When the
machine hits the middle c, it reverses. It enters q 1 and starts popping; when
reading a it checks that the popped character is g0, and when reading b it checks
that what popped is g1. If the machine hits the stack bottom at the same moment
that the input runs out, then it goes into the accepting state q 3 .

Instr no   Input        Output          Instr no   Input        Output
0          q0, a, ⊥     q0, ‘g0⊥’       9          q1, a, g0    q1, ε
1          q0, b, ⊥     q0, ‘g1⊥’       10         q1, a, g1    q2, ε
2          q0, ε, ⊥     q3, ε           11         q1, a, ⊥     q2, ε
3          q0, a, g0    q0, ‘g0g0’      12         q1, b, g0    q2, ε
4          q0, a, g1    q0, ‘g0g1’      13         q1, b, g1    q1, ε
5          q0, b, g0    q0, ‘g1g0’      14         q1, b, ⊥     q2, ε
6          q0, b, g1    q0, ‘g1g1’      15         q1, B, g0    q2, ε
7          q0, c, g0    q1, ‘g0’        16         q1, B, g1    q2, ε
8          q0, c, g1    q1, ‘g1’        17         q1, B, ⊥     q3, ε
This computation has the machine accept bacab.

Step   State   Unread input   Stack
0      q0      bacab          ⊥
1      q0      acab           g1⊥
2      q0      cab            g0g1⊥
3      q1      ab             g0g1⊥
4      q1      b              g1⊥
5      q1      (end)          ⊥
6      q3      (end)          (empty)

7.4 Remark Stack machines are often used in practice, particularly for running
hardware. Here is a ‘Hello World’ program in the PostScript printer language.
/Courier % name the font
20 selectfont % font size in points, 1/72 of an inch
72 500 moveto % position the cursor
(Hello world!) show % stroke the text
showpage % print the page

The interpreter pushes Courier on the stack, and then on the second line pushes
20 on the stack. It then executes selectfont, which pops two things off the stack
to set the font name and size. After that it moves the current point, and places the
text on the page. Finally, it draws that page to paper.
This language is quite efficient. But it is more suited to situations where the
code is written by a program, such as with a word processor or LATEX, than to
situations where a person is writing it.


Nondeterministic Pushdown machines To get nondeterminism we alter the
definition in two ways. The first is minor: we don’t need the input character
blank, B, as a nondeterministic machine can guess when the input string ends.
The second alteration changes the transition function ∆. We now allow the
same input to give different outputs, ∆ : Q × (Σ ∪ {ε }) × Γ → P (Q × Γ ∗ ). (If
the set of outputs is empty then we take the machine to freeze, resulting in a
computation that does not accept the input.) As always with nondeterminism, we
can conceptualize this either as the computation evolving as a tree or as the
machine choosing one of the outputs.
7.5 Example This grammar generates the language of all palindromes over B.

P → ε | 0 | 1 | 0P0 | 1P1 LPAL = {ε, 0, 1, 00, 11, 000, 010, 101, 111, ... }

This language is not recognized by any Finite State machine, but it is recognized
by a Pushdown machine.
This machine has Q = {q0, q1, q2} with accepting states F = {q2}, and
alphabets Σ = B and Γ = { g0, g1, ⊥ }.
During its first phase it puts g0 on the stack when it reads the input 0 and
puts g1 on the stack when it reads 1. During the second phase, if it reads 0 then
it only proceeds if the popped stack character is g0 and if it reads 1 then it only
proceeds if it popped g1.

Instr no   Input        Output          Instr no   Input        Output
0          q0, 0, ⊥     q0, ‘g0⊥’       10         q0, 1, g1    q1, ‘g1g1’
1          q0, 1, ⊥     q0, ‘g1⊥’       11         q0, 0, g0    q1, ‘g0’
2          q0, ε, ⊥     q2, ε           12         q0, 0, g1    q1, ‘g1’
3          q0, 0, g0    q0, ‘g0g0’      13         q0, 1, g0    q1, ‘g0’
4          q0, 1, g0    q0, ‘g1g0’      14         q0, 1, g1    q1, ‘g1’
5          q0, 0, g1    q0, ‘g0g1’      15         q1, 0, g0    q1, ε
6          q0, 1, g1    q0, ‘g1g1’      16         q1, 1, g1    q1, ε
7          q0, 0, g0    q1, ‘g0g0’      17         q1, 0, g0    q2, ε
8          q0, 0, g1    q1, ‘g0g1’      18         q1, 1, g1    q2, ε
9          q0, 1, g0    q1, ‘g1g0’
How does the machine know when to change from phase one to two? It is
nondeterministic — it guesses. For instance, compare instructions 3 and 7, which
show the same input associated with two different outputs.
Here the machine accepts the string 0110. In the calculation it uses instructions
0, 9, 16, and 17.

Step   State   Unread input   Stack
0      q0      0110           ⊥
1      q0      110            g0⊥
2      q1      10             g1g0⊥
3      q1      0              g0⊥
4      q2      (end)          ⊥

Here is the machine accepting 01010 using instructions 0, 4, 12, 16, and 17.

Step   State   Unread input   Stack
0      q0      01010          ⊥
1      q0      1010           g0⊥
2      q0      010            g1g0⊥
3      q1      10             g1g0⊥
4      q1      0              g0⊥
5      q2      (end)          ⊥

The nondeterminism is crucial. In the first example, after step 1 the machine is
in state q 0 , is reading a 1, and the character that will be popped off the stack is g0.
Both instructions 3 and 9 apply to that configuration. But, applying instruction 3
would not lead to the machine accepting the input string. The computation shown
instead applies instruction 9, going to state q 1 , whose intuitive meaning is that the
machine switches from pushing to popping.
We have given two mental models of nondeterminism. One is that the machine
guesses when to switch, and that for this even-length string making that switch
halfway through is the right guess. We say the string is accepted because there
exists a guess that is correct, that ends in acceptance. (That there exist incorrect
guesses is not relevant.)

Taking the other view of nondeterminism omits guessing and instead sees
the computation as a tree. In one branch the machine applies instruction 3 and
in another it applies instruction 9. By definition, for this machine the string is
accepted because there is at least one accepting branch (the above table of the
sequence of configurations shows the tree’s accepting branch).
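This tree view translates directly into code. Below is a sketch in Python that
explores every branch, accepting when at least one branch does. Here delta maps
each input triple to a list of outputs; the name accepts, and the encoding of g0
and g1 as the letters a and b, are ours. It handles ε instructions by branching
without consuming input.

# The tree view of nondeterminism: search all branches of the computation.
def accepts(delta, accepting, state, sigma, stack):
    if not sigma and state in accepting:
        return True                          # this branch accepts
    if not stack:
        return False                         # this branch is frozen
    top, below = stack[-1], stack[:-1]
    branches = []
    if sigma:                                # consume one input character
        for new_state, push in delta.get((state, sigma[0], top), []):
            branches.append((new_state, sigma[1:],
                             below + list(reversed(push))))
    for new_state, push in delta.get((state, '', top), []):     # ε-moves
        branches.append((new_state, sigma, below + list(reversed(push))))
    return any(accepts(delta, accepting, *b) for b in branches)

# Example 7.5's instructions, writing a for g0 and b for g1; for instance
# the three outputs for ('q0', '0', 'a') are instructions 3, 7, and 11.
delta = {('q0', '0', '⊥'): [('q0', 'a⊥')],
         ('q0', '1', '⊥'): [('q0', 'b⊥')],
         ('q0', '', '⊥'): [('q2', '')],
         ('q0', '0', 'a'): [('q0', 'aa'), ('q1', 'aa'), ('q1', 'a')],
         ('q0', '0', 'b'): [('q0', 'ab'), ('q1', 'ab'), ('q1', 'b')],
         ('q0', '1', 'a'): [('q0', 'ba'), ('q1', 'ba'), ('q1', 'a')],
         ('q0', '1', 'b'): [('q0', 'bb'), ('q1', 'bb'), ('q1', 'b')],
         ('q1', '0', 'a'): [('q1', ''), ('q2', '')],
         ('q1', '1', 'b'): [('q1', ''), ('q2', '')]}
print(accepts(delta, {'q2'}, 'q0', '0110', ['⊥']))     # True
print(accepts(delta, {'q2'}, 'q0', '01010', ['⊥']))    # True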
Input strings with odd length are different. In the language of guessing, the
machine needs to guess that it must switch from pushing to popping at the middle
character, but it must not push anything onto the stack since that thing would
never get popped off. Instead, when instruction 12 pops the top character g1 off
the stack, as all instructions do when they are executed, it immediately pushes it
back on. The net effect is that in this turn around from pushing to popping the
stack is unchanged.
Recall that deterministic Finite State machines can do any jobs that nondeter-
ministic ones can do. The palindrome result shows that for Pushdown machines the
situation is different. While nondeterministic Pushdown machines can recognize
the language of palindromes, that job cannot be done by deterministic Pushdown
machines. So for Pushdown machines, nondeterminism changes what can be done.
Intuitively, Pushdown machines are between Turing machines and Finite State
machines in that they have a kind of unbounded read/write memory, but it is
limited. We’ve proved that they are more powerful than Finite State machines
because they can recognize the language of balanced parentheses.
There is a relevant result that we will mention but not prove: there are jobs that
Turing machines can do but that no Pushdown machine can do. One is the decision
problem for the language {σ⌢σ | σ ∈ B∗ }. The intuition is that this language
contains strings such as 1010, 10101010, etc. A Pushdown machine can push the
characters onto the stack, as it does for the language of balanced parentheses, but
then to check that the second half matches the first it would need to pop them off
in reverse order.†
The diagram below summarizes. The box encloses all languages of bitstrings,
all subsets of B∗ . The nested sets enclose those languages recognized by some
Finite State machine, or some Pushdown machine, etc.

Class   Machine type
A       Finite State, including nondeterministic
B       Pushdown
C       nondeterministic Pushdown
D       Turing

[Figure: the four classes drawn as nested sets, A inside B inside C inside D,
within the box of all languages.]


† Another way to tell that the set of languages recognized by a nondeterministic Pushdown machine
is a strict subset of the set of languages recognized by a Turing machine is to note that there is no
Halting Problem for Pushdown machines. We can write a program that inputs a string and a Pushdown
machine, and decides whether it is accepted. But of course we cannot write such a program for Turing
machines. Since the languages differ and since anything computed by a Pushdown machine can be
computed by a Turing machine, the languages of Pushdown machines must be a strict subset.

Context free languages In the section on Grammars we restricted our attention to
production rules where the head consists of a single nonterminal, such as S → cS.
An example of a rule where the head is not of that form is cSb → aS. With
this rule we can substitute for S only if it is preceded by c and followed by b. A
grammar with rules of this type is called context sensitive because substitutions
can only be done in a context.
If a language has a grammar in which all the rules are of the first type, of the
type we described in Chapter III’s Section 2, then it is a context free language.
Most modern programming languages are context free, including C, Java, Python,
and Scheme. So grammars that are context sensitive, without the restriction of
being required to be context free, are much less common in practice.
We will state, but not prove, the connection with this section: a language is
recognized by some nondeterministic Pushdown machine if and only if it has a
context free grammar.

IV.7 Exercises
✓ 7.6 Produce a Pushdown machine that does not halt.
✓ 7.7 Produce a Pushdown machine to recognize each language over Σ = { a, b, c }.
(a) { a^n c b^{2n} | n ∈ N }
(b) { a^n c b^{n−1} | n > 0 }

7.8 Give a Pushdown machine that recognizes { 0⌢τ⌢1 | τ ∈ B∗ }.


7.9 Consider the Pushdown machine in Example 7.2.


(a) It has an asymmetry in the definition. In line 3 it specifies that if there are
too many ]’s then the machine should go to the error state q 2 . But there is
no line specifying what to do if there are too many [’s. Why is it not needed?
(b) Prove that this machine recognizes the language of balanced parentheses
defined by the grammar.
7.10 Give a Pushdown machine that recognizes { a^{2n} b^n | n ∈ N }.

✓ 7.11 Example 7.5 discusses the view of a nondeterministic computation as a tree.
Draw the tree for that machine on these inputs. (a) 0110 (b) 01010
✓ 7.12 The grammar Q → 0Q0 | 1Q1 | ε generates a different language of palin-
dromes than the grammar in Example 7.5. What is this language?
7.13 Write a context-free grammar for { a^n b c^n ∈ { a, b, c }∗ | n ∈ N }, the language
where the number of a’s before the b is the same as the number of c’s after it.
7.14 Find a grammar that generates the language {σ⌢b⌢σ^R | σ ∈ { a, b }∗ }.

7.15 The grammar Q → 0Q0 | 1Q1 | ε generates a different language of palin-


dromes than the grammar in Example 7.5. What is this language?
7.16 Find a grammar for the language over Σ = { a, b, c } consisting of palindromes
that contain at most three c’s. Hint: Use two nonterminals, with one for the case
of not adjoining c’s.

7.17 Show that the language of all palindromes from Example 7.5 is not recognized
by any Finite State machine. Hint: you can use the Pumping Lemma.
7.18 Show that a string σ ∈ B∗ is a palindrome σ = σ R if and only if it is generated
by the grammar given in Example 7.5. Hint: Use induction in both directions.
7.19 Show that the set of pushdown automata is countable.
7.20 Show that any language recognized by a Pushdown machine is recognized
by some Turing machine.
7.21 There is a Pumping Lemma for Context Free languages: if L is Context Free
then it has a pumping length p ≥ 1 such that any σ ∈ L with |σ| ≥ p decomposes
into five parts σ = α⌢β⌢γ⌢δ⌢ζ subject to the conditions (i) |βγδ| ≤ p,
(ii) |βδ| ≥ 1, and (iii) αβ^nγδ^nζ ∈ L for all n ∈ N.
(a) Use it to show that { an bn cn n ∈ N } is not Context Free.
(b) Show that {σ⌢σ | σ ∈ B∗ } is not Context Free.

7.22 For both Turing machines and Finite State machines, after we gave an
informal description of how they act we supplemented that with a formal one.
Supply that for Pushdown machines.
(a) Define a configuration.
(b) Define the meaning of the yields symbol ⊢ and a transition step.
(c) Define when a machine accepts a string.

Extra
IV.A Regular expressions in the wild

Regular expressions are often used in practice. For instance, imagine that you
need to search a web server log for the names of all the PDF’s downloaded from a
subdirectory. A user on a Unix-derived system might type this.

grep "/linearalgebra/.*\.pdf" /var/log/apache2/access.log

The grep utility looks through the file line by line, and if a line matches the
pattern then grep prints that line. That pattern, starting with the subdirectory
/linearalgebra/, is an extended regular expression.
That is, in practice we often need text operations, and regular expressions are
an important tool. Modern programming languages such as Python and Scheme
include capabilities for extended regular expressions, sometimes called regexes,
that go beyond the small-scale theory examples we saw earlier. These extensions
fall into two categories. The first is convenience constructs that make easier
something that would otherwise be doable, but awkward. The second is that some
of the extensions to regular expressions in modern programming languages go
beyond mere abbreviations. More on this later.

First, the convenience extensions. Many of them are about sheer scale: our earlier
alphabets had two or three characters but in practice an alphabet must include at
least ASCII’s printable characters: a – z, A – Z, 0 – 9, space, tab, period, dash,
exclamation point, percent sign, dollar sign, open and closed parenthesis, open
and closed curly braces, etc. It may even contain all of Unicode’s more than one
hundred thousand characters. We need manageable ways to describe such large sets
of characters.

Consider matching a digit. The regular expression (0|1|2|3|4|5|6|7|8|9) is too
verbose for an often-needed list. One abbreviation that modern languages allow is
[0123456789], omitting the pipe characters and using square brackets, which in
extended regular expressions are metacharacters. Or, because the digit characters
are contiguous in the character set,† we can shorten it further to [0-9]. Along
the same lines, [A-Za-z] matches a single English letter.

[Comic courtesy xkcd.com]
English letter.
To invert the set of matched characters, put a caret ‘^’ as the first thing inside
the bracket (and note that it is a metacharacter). Thus, [^0-9] matches a non-digit
and [^A-Za-z] matches a character that is not an ASCII letter.
The most common lists have short abbreviations. Another abbreviation for the
digits is \d. Use \D for the ASCII non-digits, \s for the whitespace characters
(space, tab, newline, formfeed, and line return) and \S for ASCII characters that are
non-whitespace. Cover the alphanumeric characters (upper and lower case ASCII
letters, digits, and underscore) with \w and cover the ASCII non-alphanumeric
characters with \W. And — the big kahuna — the dot ‘.’ is a metacharacter that
matches any member of the alphabet at all.‡ We saw the dot in the grep example
that began this discussion.
A.1 Example Canadian postal codes have seven characters: the fourth is a space, the
first, third, and sixth are letters, and the others are digits. The regular expression
[a-zA-Z]\d[a-zA-Z] \d[a-zA-Z]\d describes them.
A.2 Example Dates are often given in the ‘dd/mm/yy’ format. This matches:
\d\d/\d\d/\d\d.
A.3 Example In the twelve hour time format some typical time strings are ‘8:05 am’
or ‘10:15 pm’. You could use this (note the empty string at the start).

(|0|1)\d:\d\d\s(am|pm)
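As a sketch of how these read in practice, here is a check of the last two
patterns with Python’s re module (which has no ε character, so the empty
alternative is written as an empty branch).

import re
print(bool(re.fullmatch(r'\d\d/\d\d/\d\d', '25/12/99')))          # True
print(bool(re.fullmatch(r'(|0|1)\d:\d\d\s(am|pm)', '8:05 am')))   # True
print(bool(re.fullmatch(r'(|0|1)\d:\d\d\s(am|pm)', '10:15 pm')))  # True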
Recall that in the regular expression a(b|c)d the parentheses and the pipe

† The digits are contiguous in ASCII and their descendants are contiguous in Unicode. ‡ Programming
languages in practice by default have the dot match any character except newline. In addition, they
have a way to make it also match newline.

are not there to be matched. They are metacharacters, part of the syntax of the
regular expression. Once we expand the alphabet Σ to include all characters, we
run into the problem that we are already using some of the additional characters
as metacharacters.
To match a metacharacter prefix it with a backslash, ‘\’. Thus, to look for the
string ‘(Note’ put a backslash before the open parentheses: \(Note. Similarly,
\| matches a pipe and \[ matches an open square bracket. Match backslash itself
with \\. This is called escaping the metacharacter. The scheme described above
for representing lists with \d, \D, etc is an extension of escaping.

[Comic courtesy xkcd.com]
Operator precedence is: repetition binds most strongly, then concatenation,
and then alternation (force different meanings with parentheses). Thus, ab* is
equivalent to a(b*), and ab|cd is equivalent to (ab)|(cd).

Quantifiers In the theoretical cases we saw earlier, to match ‘at most one a’ we
used ε |a. In practice we can write something like (|a), as we did above for the
twelve hour times. But depicting the empty string by just putting nothing there
can be confusing. Modern languages make question mark a metacharacter and
allow you to write a? for ‘at most one a’.
For ‘at least one a’ modern languages use a+, so the plus sign is another
metacharacter. More generally, we often want to specify quantities. For instance, to
match five a’s extended regular expressions use the curly braces as metacharacters,
with a{5}. Match between two and five of them with a{2,5} and match at least
two with a{2,}. Thus, a+ is shorthand for a{1,}.
As earlier, to match any of these metacharacters you must escape them. For
instance, To be or not to be\? matches the famous question.
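Here is a sketch of these quantifiers in Python’s re module.

import re
print(bool(re.fullmatch(r'ab?', 'a')))          # at most one b
print(bool(re.fullmatch(r'ab+', 'abbb')))       # at least one b
print(bool(re.fullmatch(r'a{2,5}', 'aaa')))     # between two and five a's
print(bool(re.fullmatch(r'To be or not to be\?', 'To be or not to be?')))
# all four lines print True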

Cookbook All of the extensions to regular expressions that we are seeing are
driven by the desires of working programmers. Here is a pile of examples showing
them accomplishing practical work, matching things you’d want to match.
A.4 Example US postal codes, called ZIP codes, are five digits. We can match them
with \d{5}.
A.5 Example North American phone numbers match \d{3} \d{3}-\d{4}.

A.6 Example The regular expression (-|\+)?\d+ matches an integer, positive or


negative. The question mark makes the sign optional. The plus sign makes sure
there is at least one digit; it is escaped because + is a metacharacter.

A.7 Example The expression [a-fA-F0-9]+ matches a natural number representation


in hexadecimal. Programmers often prefix such a representation with 0x so the
expression becomes (0x)?[a-fA-F0-9]+.

A.8 Example A C language identifier begins with an ASCII letter or underscore and
then can have arbitrarily many more letters, digits, or underscores: [a-zA-Z_]\w*.
A.9 Example Match a user name of between three and twelve letters, digits, under-
scores, or periods with [\w\.]{3,12}. Use .{8,} to match a password that is at
least eight characters long.
A.10 Example Match a valid username on Reddit: [\w-]{3,20}. The hyphen, because
it comes last in the square brackets, matches itself. And no, Reddit does not allow
a period in a username.

A.11 Example For email addresses, \S+@\S+ is a commonly used extended expression.†
A.12 Example Match the text inside a single set of parentheses with \([^()]*\).
A.13 Example This matches a URL, a web address such as http://joshua.smcvt.edu/computing.
This regex is more intricate than prior ones so it deserves some
explanation. It is based on breaking URL’s into three parts: a scheme such as http
followed by a colon and two forward slashes, a host such as joshua.smcvt.edu,
and a path such as /computing (the standard also allows a query string that follows
a question mark but this regex does not handle those).

(https?|ftp)://([^\s/?\.#]+\.?){1,4}(/[^\s]*)?

Notice the https?, so the scheme can be http or https, as well as ftp. After
a colon and two forward slashes comes the host part, consisting of some fields
separated by periods. We allow almost any character in those fields, except for
a space, a question mark, a period or a hash. At the end comes a path. The
specification allows paths to be case sensitive but the regex here has only lower
case.
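A sketch of trying this expression in Python’s re module:

import re
url = re.compile(r'(https?|ftp)://([^\s/?\.#]+\.?){1,4}(/[^\s]*)?')
print(bool(url.fullmatch('http://joshua.smcvt.edu/computing')))   # True
print(bool(url.fullmatch('not a url')))                           # False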

But wait! there’s more! You can also match the start of a line and end of line
with the metacharacters caret ‘^’ and dollar sign ‘$’.
A.14 Example Match lines starting with ‘Theorem’ using ^Theorem. Match lines ending
with end{equation*} using end{equation\*}$.
The regex engines in modern languages let you specify that the match is case
insensitive (although they differ in the syntax).
A.15 Example An HTML document tag for an image, such as <img src="logo.jpg">,
uses either of the keys src or img to give the name of the file containing the
image that will be served. Those strings can be in upper case or lower case, or
any mix. Racket uses a ‘?i:’ syntax to mark part of the regex as insensitive:
\\s+(?i:(img|src))= (note also the double backslash, which is how Racket
escapes the ‘s’).

† This is naive in that there are elaborate rules for the syntax of email addresses (see below). But it is a
reasonable sanity check.

Beyond convenience The regular expression engines that come with recent
programming languages have capabilities beyond matching only those languages
that are recognized by Finite State machines.
A.16 Example The web document language HTML uses tags such as <b>boldface
text</b> and <i>italicized text</i>. Matching any one is straightforward,
for instance <b>[^<]*</b>. But for a single expression that matches them all you
would seem to have to do each as a separate case and then combine cases with a
pipe. However, instead we can have the system remember what it finds at the start
and look for that again at the end. Thus, Racket’s regex <([^>]+)>[^<]*</\\1>
matches HTML tags like the ones given. Its second character is an open parenthesis,
and the \\1 refers to everything between that open parenthesis and the matching
close parenthesis. (As you might guess from the 1, you can also have a second
match with \\2, etc.)
That is a back reference. It is very convenient. However, it gives extended
regular expressions more power than the theoretical regular expressions that we
studied earlier.
A.17 Example This is the language of squares over Σ = { a, b }.

L = {σ ∈ Σ∗ | σ = τ ⌢τ for some τ ∈ Σ∗ }

Some members are aabaab, baaabaaa, and aa. The Pumping Lemma shows that
the language of squares is not regular; see Exercise A.35. Describe this language
with the regex (.+)\1; note the back-reference.
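Here is that regex in a Python sketch; in a raw string the back reference is
written \1 with a single backslash.

import re
square = re.compile(r'(.+)\1')
print(bool(square.fullmatch('aabaab')))   # True, since aabaab = aab⌢aab
print(bool(square.fullmatch('aba')))      # False; it is not a square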

Downsides Regular expressions are powerful tools, and this goes double for
enhanced regexes. As illustrated by the examples above, some of their uses are: to
validate usernames, to search text files, and to filter results. But they can come
with costs also.
For instance, the regular expression for twelve hour time from Example A.3,
(ε |0|1)\d:\d\d\s(am|pm), does indeed match ‘8:05 am’ and ‘10:15 pm’ but it falls
short in some respects. One is that it requires am or pm at the end, but times are
often given without them. We could change the ending to (ε |\s am|\s pm),
which is a bit more complex but does solve the issue.

[Comic courtesy xkcd.com]
Another issue is that it also matches some strings that you don’t want, such as
13:00 am or 9:61 pm. We can solve this as with the prior paragraph, by listing the
cases.†

† Some substrings are elided so that it fits in the margins.

(01|02|...|11|12):(01|02|...|59|60)(\s am|\s pm)


This is like the prior fix-up, in that it does indeed fix the issue but it does so at a
cost of complexity, since it amounts to a list of the allowed substrings.
Another example is that not every string matching the Canadian postal expression
in Example A.1 has a corresponding post office — for one thing, no valid codes
begin with Z. And ZIP codes work the same way; there are fewer than 50 000
assigned ZIP codes so many five digit strings are not in use. Changing the regular
expressions to cover only those codes actually in use would make them little more
than lists of strings (which would change frequently).
The canonical extreme example is the regex for valid email addresses. We
show here just five lines out of its 81 but that’s enough to make the point about its
complexity.
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0

And, even if you do have an address that fits the standard, you don’t know if there
is an email server listening at that address.
At this point regular expressions may be starting to seem a little less like a
fast and neat problem-solver and a little more like a potential development and
maintenance problem. The full story is that sometimes a regular expression is just
what you need for a quick job, and sometimes they are good for more complex
tasks also. But some of the time the cost of complexity outweighs the gain in
expressiveness. This power/complexity tradeoff is often referred to online by citing
this quote from J Zawinski.

The notion that regexps are the solution to all problems is . . . braindead. . . .
Some people, when confronted with a problem, think “I know, I’ll use regular
expressions.” Now they have two problems.

[Comic courtesy xkcd.com]

IV.A Exercises
✓ A.18 Which of the strings matches the regex ab+c? (a) abc (b) ac (c) abbb
(d) bbc
A.19 Which of the strings matches the regex [a-z]+[\.\? !]? (a) battle!
(b) Hot (c) green (d) swamping. (e) jump up. (f) undulate? (g) is.?
✓ A.20 Give an extended regular expression for each. (a) Match a string that
has ab followed by zero or more c’s, (b) ab followed by one or more c’s,
(c) ab followed by zero or one c, (d) ab followed by two c’s, (e) ab followed
by between two and five c’s, (f) ab followed by two or more c’s, (g) a followed
by either b or c.

✓ A.21 Give an extended regular expression to accept a string for each description.
(a) Containing the substring abe.
(b) Containing only upper and lower case ASCII letters and digits.
(c) Containing a string of between one and three digits.
A.22 Give an extended regular expression to accept a string for each description.
Take the English vowels to be a, e, i, o, and u.
(a) Starting with a vowel and containing the substring bc.
(b) Starting with a vowel and containing the substring abc.
(c) Containing the five vowels in ascending order.
(d) Containing the five vowels.
A.23 Give an extended regular expression matching strings that contain an open
square bracket and an open curly brace.
✓ A.24 Every lot of land in New York City is denoted by a string of digits called BBL,
for Borough (one digit), Block (five digits), and Lot (four digits). Give a regex.
✓ A.25 Example A.5 gives a regex for North American phone numbers.
(a) They are sometimes written with parentheses around the area code. Extend
the regex to cover this case.
(b) Sometimes phone numbers do not include the area code. Extend to cover
this also.
A.26 Most operating systems come with a file that has a list of words, which
can be used for spell-checking, etc. For instance, on Linux it may be at
/usr/share/dict/words but in any event you can find it by running locate
words | grep dict. Use that file to find how many words fit the criteria.
(a) contains the letter a (b) starts with A (c) contains a or A (d) contains X
(e) contains x or X (f) contains the string st (g) contains the string ing
(h) contains an a, and later a b (i) contains none of the usual vowels a, e, i, o or u
(j) contains all the usual vowels (k) contains all the usual vowels, in ascending
order
✓ A.27 Give a regex to accept time in a 24 hour format. It should match times of the
form ‘hh:mm:ss.sss’ or ‘hh:mm:ss’ or ‘hh:mm’ or ‘hh’.
A.28 Give a regex describing a floating point number.
✓ A.29 Give a suitable extended regular expression.
(a) All Visa card numbers start with a 4. New cards have 16 digits. Old cards
have 13.
(b) MasterCard numbers either start with 51 through 55, or with the numbers
2221 through 2720. All have 16 digits.
(c) American Express card numbers start with 34 or 37 and have 15 digits.
✓ A.30 Postal codes in the United Kingdom have six possible formats. They are:
(i) A11 1AA, (ii) A1 1AA, (iii) A1A 1AA, (iv) AA11 1AA, (v) AA1 1AA, and (vi) AA1A
1AA, where A stands for a capital ASCII letter and 1 stands for a digit.
(a) Give a regex.

(b) Shorten it.

✓ A.31 You are stuck on a crossword puzzle. You know that the first letter (of eight)
is a g, the third is an n and the seventh is an i. You have access to a file that
contains all English words, each on its own line. Give a suitable regex.

A.32 In the Downsides discussion of Example A.3, we change the ending to (ε |\s
am|\s pm). Why not \s(ε |am|pm), which factors out the whitespace?
A.33 Give an extended regular expression that matches no string.

✓ A.34 The Roman numerals taught in grade school use the letters I, V, X, L, C,
D, and M to represent 1, 5, 10, 50, 100, 500, and 1000. They are written in
descending order of magnitude, from M to I, and are written greedily so that we
don’t write six I’s but rather VI. Thus, the date written on the book held by the
Statue of Liberty is MDCCLXXVI, for 1776. Further, we replace IIII with IV, and
replace VIIII with IX. Give a regular expression for valid Roman numerals less
than 5000.

A.35 Example A.17 says that the language of squares over Σ = { a, b }

L = {σ ∈ Σ∗ | σ = τ⌢τ for some τ ∈ Σ∗ }

is not regular. Verify that.

A.36 Consider L = { 0^n 1 0^n | n > 0 }. (a) Show that it is not regular. (b) Find a
regex.

A.37 In regex golf you are given two lists and must produce a regex that matches
all the words in the first list but none of the words in the second. The ‘golf’ aspect
is that the person who finds the shortest regex, the one with the fewest characters,
wins. Try these: accept the words in the first list and not the words in the second.
(a) Accept: Arthur, Ester, le Seur, Silverter
Do not accept: Bruble, Jones, Pappas, Trent, Zikle
(b) Accept: alight, bright, kite, mite, tickle
Do not accept: buffing, curt, penny, tart
(c) Accept: afoot, catfoot, dogfoot, fanfoot, foody, foolery, foolish, fooster,
footage, foothot, footle, footpad, footway, hotfoot, jawfoot, mafoo, nonfood,
padfoot, prefool, sfoot, unfool
Do not accept: Atlas, Aymoro, Iberic, Mahran, Ormazd, Silipan, altared,
chandoo, crenel, crooked, fardo, folksy, forest, hebamic, idgah, manlike,
marly, palazzi, sixfold, tarrock, unfold

A.38 In a regex crossword each row and column has a regular expression. You
have to find strings for those rows and columns that meet the constraints.

(AB|OE|SK)
[^SPEAK]+

(A|B|C)\1
EP|IP|EF
(a) (b)
HE|LL|O+ .*M?O.*
[PLEASE]+ (AN|FE|BE)

Extra
IV.B The Myhill-Nerode Theorem
We defined regular languages in terms of Finite State machines. Here we will give
a characterization that does not depend on that.
This Finite State machine accepts strings that end in ab.
[State diagram: the start state q0 loops on b and moves to q1 on a; q1 loops on a
and moves to q2 on b; q2 moves to q1 on a and to q0 on b; q2 is accepting.]

Consider other strings over Σ = { a, b }, not just the accepted ones, and see where
they bring the machine.

Input string σ       ε    a    b    aa   ab   ba   bb   aaa  aab  aba  abb
Ending state ∆̂(σ)    q0   q1   q0   q1   q2   q1   q0   q1   q2   q1   q0

The collection of all strings Σ∗ , pictured below, breaks into three sets, those that
bring the machine to q 0 , those that bring the machine to q 1 , and those that bring
the machine to q 2 .

EM,0 = {ε, b, bb, abb, ... }
EM,1 = { a, aa, ba, aba, ... }
EM,2 = { ab, aab, bab, ... }

[Figure: Σ∗ drawn as a box split into the three parts EM,0, EM,1, and EM,2.]
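Computing which part a string falls into is mechanical; here is a sketch in
Python with the machine’s transition function as a dictionary (the name
ending_state is ours).

delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q2',
         ('q2', 'a'): 'q1', ('q2', 'b'): 'q0'}

def ending_state(sigma):
    q = 'q0'
    for x in sigma:
        q = delta[q, x]
    return q

print([ending_state(s) for s in ['', 'b', 'a', 'ba', 'ab', 'aab']])
# prints ['q0', 'q0', 'q1', 'q1', 'q2', 'q2']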

B.1 Definition Let M be a Finite State machine with alphabet Σ. Two strings
σ0 , σ1 ∈ Σ∗ are M-related if starting the machine with input σ0 ends with it in
the same state as does starting the machine with input σ1 .

B.2 Lemma The binary relation of M-related is an equivalence, and so partitions the
collection of all strings Σ∗ into equivalence classes.
Proof We must show that the relation is reflexive, symmetric, and transitive.
Reflexivity, that any input string σ brings the machine to the same state as itself, is
obvious. So is symmetry, that if σ0 brings the machine to the same state as σ1 then
σ1 brings it to the same state as σ0. Transitivity is straightforward: if σ0 brings M
to the same state as σ1, and σ1 brings it to the same state as σ2, then σ0 brings it
to the same state as σ2.
So a machine gives rise to a partition. Does it go the other way?
B.3 Definition Suppose that L is a language over Σ. Two strings σ , σ̂ ∈ Σ∗ are
L-related (or L-indistinguishable), denoted σ ∼L σ̂ , when for every suffix τ ∈ Σ∗
we have σ ⌢ τ ∈ L if and only if σ̂ ⌢ τ ∈ L. Otherwise, the two strings are
L-distinguishable.
Said another way, the two strings σ and σ̂ can be L-distinguished when there
is a suffix τ that separates them: of the two σ ⌢τ and σ̂ ⌢τ , one is an element of L
while the other is not.
B.4 Lemma For any language L, the binary relation ∼L is an equivalence, and thus
gives rise to a partition of all strings.
Proof Reflexivity, that σ ∼L σ, is trivial. So is symmetry, that σ0 ∼L σ1 implies
σ1 ∼L σ0. For transitivity suppose σ0 ∼L σ1 and σ1 ∼L σ2. If σ0⌢τ ∈ L then by
the first supposition σ1⌢τ ∈ L, and the second supposition in turn gives σ2⌢τ ∈ L.
Similarly σ0⌢τ ∉ L implies that σ2⌢τ ∉ L. Thus σ0 ∼L σ2.

B.5 Example Let L be the set {σ ∈ B∗ | σ has an even number of 1’s }. We can find
the parts of the partition. If two strings σ0, σ1 both have an even number of 1’s
then they are L-related. That’s because for any τ ∈ B∗, if τ has an even number of
1’s then σ0⌢τ ∈ L and σ1⌢τ ∈ L, while if τ has an odd number of 1’s then the
concatenations will not be members of L. Similarly, if two strings both have an odd
number of 1’s then they are L-related. So the relationship ∼L gives rise to this
partition of B∗.

EL,0 = {ε, 0, 00, 11, 000, 011, 101, 110, ... }    EL,1 = { 1, 01, 10, 001, 010, ... }
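We can gather evidence for such a partition mechanically. This Python sketch
tests all suffixes up to a fixed length, so a True answer is only evidence, not a
proof; the function names are ours.

from itertools import product

def in_L(sigma):                     # the language of this example
    return sigma.count('1') % 2 == 0

def seem_related(sigma0, sigma1, max_suffix=6):
    return all(in_L(sigma0 + tau) == in_L(sigma1 + tau)
               for n in range(max_suffix + 1)
               for tau in map(''.join, product('01', repeat=n)))

print(seem_related('', '11'))    # True: both have an even number of 1's
print(seem_related('', '1'))     # False: the empty suffix separates them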

B.6 Example Let L be {σ ∈ { a, b }∗ | σ has the same number of a’s as b’s }. Then two
members of L, two strings σ0, σ1 ∈ Σ∗ with the same number of a’s as b’s, are
L-related. This is because for any suffix τ, the string σ0⌢τ is an element of L if
and only if σ1⌢τ is an element of L, which happens if and only if τ has the same
number of a’s as b’s.
Similarly, two strings σ0, σ1 such that the number of a’s is one more than the
number of b’s are L-related because for any suffix τ, the string σ0⌢τ is an element
of L if and only if σ1⌢τ is an element of L, namely if and only if τ has one fewer a
than b.
Following this reasoning, ∼L partitions { a, b }∗ into the infinitely many parts
EL,i = {σ ∈ { a, b }∗ | the number of a’s minus the number of b’s equals i }, where
i ∈ Z.

B.7 Example This machine M recognizes L = {σ ∈ { a, b }∗ | σ has even length }.

[State diagram: q0 moves to q1 on a and to q2 on b; q1 and q3 trade places on
every character, as do q2 and q4; the accepting states are q0, q3, and q4.]

We will compare the partitions induced by the two relations introduced above.
The M-related relation breaks { a, b }∗ into five parts, one for each state (since
each state in M is reachable).

EM, 0 = {ε }
EM, 1 = { a, aaa, aab, aba, abb, aaaaa, aaaab, ... }
EM, 2 = { b, baa, bab, bba, bbb, baaaa, baaab, ... }
EM, 3 = { aa, ab, aaaa, aaab, aaba, aabb, abaa, abab, abba, abbb, aaaaaa, ... }
EM, 4 = { ba, bb, baaa, baab, baba, babb, bbaa, bbab, bbba, bbbb, baaaaa, ... }

The L-related relation breaks { a, b }∗ into two parts.



EL, 0 = {σ | σ has even length }    EL, 1 = {σ | σ has odd length }

Verify this by noting that if two strings are in EL, 0 then adding a suffix τ will result
in a string that is a member of L if and only if the length of τ is even, and the same
reasoning holds for EL, 1 and odd-length τ ’s.
The sketch below shows the universe of strings { a, b }∗ , partitioned in two ways.
There are two L-related parts, the left and right halves. The five M-related parts
are subsets of the L-related parts.

[Sketch: the universe { a, b }∗ split into a left half EL, 0 , containing EM, 0 , EM, 3 , and EM, 4 , and a right half EL, 1 , containing EM, 1 and EM, 2 .]

That is, the M-related partition is finer than the L-related partition (‘fine’ in the
sense that sand is finer than gravel).

B.8 Lemma Let M be a Finite State machine that recognizes L. If two strings are
M-related then they are L-related.
Proof Assume that σ0 and σ1 are M-related, so that starting M with input σ0
causes it to end in the same state as starting it with input σ1 . Thus for any suffix τ ,
giving M the input σ0 ⌢τ causes it to end in the same state as does the input σ1 ⌢τ .
In particular, σ0 ⌢τ takes M to a final state if and only if σ1 ⌢τ does. So the two
strings are L-related.

B.9 Lemma Let L be a language. (1) If two strings σ0 , σ1 are L-related, σ0 ∼L σ1 ,


then adjoining a common extension β gives strings that are also L-related,
σ0 ⌢ β ∼L σ1 ⌢ β . (2) If one member of a part σ0 ∈ EL,i is an element of L then
every member of that part σ1 ∈ EL,i is also an element of L.
Proof For (1), start with two strings σ0 , σ1 that are L-related. By definition, no
extension τ will L-distinguish the two — it is not the case that one of σ0 ⌢τ , σ1 ⌢τ
is in L while the other is not. Taking β ⌢τ̂ for τ gives that for the two strings σ0 ⌢ β
and σ1 ⌢ β , no extension τ̂ will L-distinguish the two. So they are L-related.
Item (2) is even easier: if σ0 ∼L σ1 and σ0 ∈ L but σ1 ∉ L then they are
distinguished by the empty string, which contradicts that they are L-related.

B.10 Example We will milk Example B.7 for another observation. Take a string σ from
EM, 1 and append an a. The result σ ⌢ a is a member of EM, 3 , simply because if the
machine is in state q 1 and it receives an a then it moves to state q 3 . Likewise, if
σ ∈ EM, 4 , then σ ⌢ b is a member of EM, 2 . If adding the alphabet character x ∈ Σ
to one string σ from EL,i results in a string σ ⌢ x from EL, j then the same will
happen for any string from EL,i .
In this example we see that’s true because the EM ’s are contained in the EL ’s.
The key step of the next result is to find it even in a context where there is no
machine.
B.11 Theorem (Myhill-Nerode) A language L is regular if and only if the relation
∼L has only finitely many equivalence classes.
Proof One direction is easy. Suppose that L is a regular language. Then it is
recognized by a Finite State machine M. By Lemma B.8 the number of elements
in the partition induced by ∼L is finite because the number of elements in the
partition associated with being M-related is finite, as there is one part for each of
M’s reachable states.
For the other direction suppose that the number of elements in the partition
associated with being L-related is finite. We will show that L is regular by
producing a Finite State machine that recognizes L.
The machine’s states are the partition’s elements, the EL,i ’s. That is, si is EL,i .
The start state is the part containing the empty string ε . A state is final if that part
contains strings from the language L (Lemma B.9 (2) says that each part contains
either no strings from L or consists entirely of strings from L).
The transition function is: for any state si = EL,i and alphabet element x ,
compute the next state ∆(si , x) by starting with any string in that part σ ∈ EL,i ,
appending the character to get a new string σ̂ = σ ⌢x , and then finding the part
containing that string, the EL, j such that σ̂ ∈ EL, j . Then ∆(si , x) = s j .
We must verify that this transition function is well-defined. That is, the
definition of ∆(si , x) as given potentially depends on which string σ you choose
from si = EL,i , and we must check that choosing a different string cannot lead to
a different resulting part. This follows from (1) in Lemma B.9: take two starting
strings from the same part σ0 , σ1 ∈ EL,i and make a common extension by the
one-element string β = ⟨x⟩, so that the results σ0 ⌢ β and σ1 ⌢ β are in the same part.
Here is an equivalent way to describe the next-state function that is illuminating.
Recall that we write the part containing σ as ⟦σ⟧. Then the definition of the
transition function for the machine under construction is ∆(⟦σ⟧, x) = ⟦σ ⌢x⟧. With
that, a simple induction shows that the extended transition function in the new
machine is ∆̂(α) = ⟦α⟧.
Finally, we must verify that the language recognized by this machine is L. For
any string σ ∈ Σ∗ , starting this machine with σ as input will cause it to end in
the state ⟦σ⟧, the part containing σ ; this is what the prior paragraph says. That
state is final exactly when its part contains strings of L, so this machine accepts σ
if and only if σ ∈ L.
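To see the construction in miniature, here is a Python sketch of the machine that the proof builds for the language of bit strings with an even number of 1’s; the two ∼L classes are hard-coded as the states (an illustration of the idea, not part of the proof).

    # States are the two equivalence classes; EVEN is the class of the empty string.
    EVEN, ODD = 0, 1
    start, final = EVEN, {EVEN}

    def delta(state, ch):
        # ∆(⟦σ⟧, x) = ⟦σ⌢x⟧: reading a 1 moves between the classes, a 0 does not.
        return state if ch == '0' else 1 - state

    def accepts(s):
        state = start
        for ch in s:
            state = delta(state, ch)
        return state in final

    assert all(accepts(s) == (s.count('1') % 2 == 0)
               for s in ['', '0', '1', '0110', '10101'])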

IV.B Exercises
✓ B.12 Find the L equivalence classes for each regular set. The alphabet is Σ =
{ a, b }.
(a) L0 = { a^n b | n ∈ N }
(b) L1 = { a^2 b^n | n ∈ N }

✓ B.13 For each language describe the L equivalence classes. The alphabet is B.
(a) The set of strings ending in 01
(b) The set of strings where every 0 is immediately followed by two 1’s
(c) The set of string with the substring 0110
(d) The set of strings without the substring 0110
✓ B.14 The language of palindromes L = {σ ∈ { a, b }∗ | σ^R = σ } is not regular. Find
infinitely many L equivalence classes.


✓ B.15 Use the Myhill-Nerode Theorem to show that the language L = { a^n b^n | n ∈ N }
is not regular.
Part Three

Computational Complexity
Chapter V. Computational Complexity

In the first part of this book we asked what can be done with a mechanism at
all. This mirrors the history: when the Theory of Computing began there were
no physical computers. Researchers were driven by considerations such as the
Entscheidungsproblem. The subject was interesting, the questions compelling, and
there were plenty of problems, but the initial phase had a theory-only feel.
A natural next step is to look to do jobs efficiently. When physical computers
became widely available, that’s exactly what happened. Today, the Theory of
Computing has incorporated many questions that at least originate in applied fields,
and that need answers that are feasible.
We start by reviewing how we measure the practicality of algorithms, the orders
of growth of functions. Then we will see a collection of the kinds of problems
that drive the field today. By the end of this chapter we will be at the research
frontier, and we will state some things without proof as well as discuss some things
about which we are not sure. In particular, we will consider the celebrated
question of P versus NP.

Section V.1 Big O
We begin by reviewing the definition of the order of growth of functions. We will
study this because of its relationship with how algorithms consume computational
resources.
First, we illustrate with an anecdote. Here is a grade school multiplication.

678
× 42
1356
2712
28476

The algorithm combines each digit of the multiplier 42 with each digit of the
multiplicand 678, in a nested loop. A person could think that this is the right
way to compute multiplication — indeed, the only way — and that in general to
multiply two n -digit numbers requires about n^2 -many operations.
Image: Striders can walk on water because they are five orders of magnitude smaller than us. This
change of scale changes the world — bugs see surface tension as more important than gravity. Similarly,
changing an algorithm from taking n 2 time to taking time that is n · lg n can make some things easy
that were previously simply not practical.
In 1960, A Kolmogorov organized a seminar at Moscow State
University aimed at proving this. But before the seminar’s second
meeting one of the students, A Karatsuba, discovered that it is false. He
produced a clever algorithm that used only n^lg(3) ≈ n^1.585 ticks. At the
next meeting, Kolmogorov explained the result and closed the seminar.
And this phenomenon continues: every day, smart researchers
produce results saying, “for this job, here is a way to do it in less time,
or less space, etc.” † People are good at finding clever algorithms that
solve a problem using less of some computational resource. But we
are not as good at finding lower bounds, at proving something like
“no algorithm, no matter how clever, can do the job faster than this.”
This is one reason that we will typically compare the growth rates of
functions by using a measure, Big O, that is like ‘less than or equal to’.
[Photo: Andrey Kolmogorov, 1903–1987]
Motivation To compare the performance of algorithms, we need a way to measure
that performance. Typically, an algorithm takes longer on larger input. So we
describe the time performance of an algorithm with a function whose argument is
the input size and whose value is the maximum time that the algorithm takes on
all inputs of that size.
Next we develop the criteria for the definition of Big O, the tool that we use to
compare those functions. Suppose first that we have two algorithms. When the
input is size n ∈ N, one takes √n many ticks while the other takes 10 · lg(n).
Initially, it looks like √n is better. For instance, √1000 ≈ 31.62 and 10 lg(1000) ≈ 99.66.

[Graph: √n and 10 lg(n) for n up to 1000; on this range √n stays below 10 lg(n).]

However, for large n the value √n is much bigger than 10 lg(n). For instance,
√1 000 000 = 1 000 while 10 lg(1 000 000) ≈ 199.32.

[Graph: √n and 10 lg(n) for n up to 1 000 000; here √n is far above 10 lg(n).]

So the first criterion is that Big O’s definition must focus on what happens in the
long run.

See the Theory of Computing blog feed at http://cstheory-feed.org/ (Various authors 2017).

Recall that lg(n) = log2 (n). That is, compute lg(n) by starting with n and then finding the power of 2
that produces it, so if n = 8 then lg(n) = 3 and if n = 10 then lg(n) ≈ 3.32.
The second criterion is more subtle. Consider these three examples.


1.1 Example These graphs compare f (n) = n 2 + 5n + 6 with д(n) = n 2 . The graph
on the right compares them in ratio, f /д.†

[Graphs: on the left, f and д plotted for n up to 20; on the right, the ratio f /д, which falls toward 1.]

On the left a person’s eye is struck that n 2 + 5n + 6 is ahead of n 2 . But on the right
the ratios show that this is misleading. For large inputs, f ’s 5n and 6 are swamped
by the highest order term, the n 2 . Consequently these two track together — by far
the biggest factor in the behavior of these two is that they are both quadratic —
and their long run behavior is basically the same.
1.2 Example Next compare the quadratic f (n) = n 2 + 5n + 6 with the cubic д(n) =
n 3 + 2n + 3. In contrast to the prior example, these two don’t track together.
Initially f is larger, with f (0) = 6 > д(0) = 3 and f (1) = 12 > д(1) = 6. But soon
enough the cubic accelerates ahead of the quadratic, so much that at the scale of
the graph, the values of f don’t rise much above the axis.
[Graphs: on the left, f and д for n up to 20, where д dwarfs f ; on the right, the ratio д/f , which grows without bound.]

On the right side, the graph underlines that д races ahead of f because the ratios


These graphs are discrete; they picture functions of natural numbers, not of real numbers. This is
because the performance functions take inputs that are natural numbers. The earlier graphs of √n and
10 lg n are also discrete but they have so many dots that they appear to be continuous.
grow without bound. So д is a faster-growing function than f . In the long run they
both go to infinity, but in a sense, д goes there faster.

1.3 Example Finally, compare the quadratics f (n) = 2n 2 + 3n + 4 and д(n) = n 2 + 5n + 6.
We’ve already seen that the function comparison definition needs to discount the
initial behavior that f (0) = 4 < д(0) = 6 and f (1) = 9 < д(1) = 12, and instead
focus on the long run.

[Graphs: on the left, f and д for n up to 20, with f ahead; on the right, the ratio f /д, which is bounded, tending to 2.]

This example differs from Example 1.1 in that in the long run, f stays ahead
of д, and gains in an absolute sense, because of the 2 in f ’s dominant term 2n 2 ,
compared with д’s n 2. So it may appear that we should count д as less than f .
However, unlike in Example 1.2, f does not accelerate away. Instead, the ratio
between the two is bounded. For O, we will consider that д’s growth rate is
equivalent to f ’s.
1.4 Example We close the motivation with a very important example. Let the function
bits : N → N give the number of bits needed to represent its input in binary. The
bottom line of this table gives lg(n), the power of 2 that equals n .

Input n 0 1 2 3 4 5 6 7 8 9
Binary 0 1 10 11 100 101 110 111 1000 1001
bits(n) 1 1 2 2 3 3 3 3 4 4
lg(n) – 0 1 1.58 2 2.32 2.58 2.81 3 3.17

This is bits(n) for n ∈ { 1, ... 30 }.

[Graph: the step function bits(n), increasing by 1 as n passes each power of 2.]

The formula is bits(n) = 1 + ⌊ lg(n)⌋ , except that if n = 0 then bits(n) = 1. The
graph below compares bits(n) with f (n) = lg(n). (Note the change in the horizontal
and vertical scales, and that the domain of f does not include 0.)
[Graph: bits(n) together with lg(n), for n up to 100; the two track each other.]

Over the long run the ‘+1’ does not matter much and the floor does not matter much
(and is algebraically awkward). A reasonable summary is that the function giving
the number of bits required to express a number n is the base 2 logarithm, lg n .
Example 1.3 notes that the function comparison definition given below disregards
constant multiplicative factors. The formula for converting among logarithmic
functions with different bases, logc (x) = (1/logb (c))· logb (x), shows that they differ
only by a constant factor, 1/logb (c). So even the base does not matter; another
reasonable summary is that the number of bits is “a” logarithmic function.
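A quick script confirms the formula; this sketch uses Python’s math.log2 and checks it against the built-in bit_length (our own choice of cross-check).

    import math

    def bits(n: int) -> int:
        return 1 if n == 0 else 1 + math.floor(math.log2(n))

    for n in range(0, 100):
        assert bits(n) == max(1, n.bit_length())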

Definition Machine resource sizes, such as the number of bits of the input and of
memory, are natural numbers and so to describe the performance of algorithms we
may think to focus on functions that input and output natural numbers. However,
above we have already found useful a function, lg, that inputs and outputs reals.
So instead we will consider a subset of the real functions.†
1.5 Definition A complexity function f is one that inputs real number arguments
and outputs real number values, and (1) it has an unbounded domain in that there
is a number N ∈ R+ such that x ≥ N implies that f (x) is defined, and (2) it is
eventually nonnegative in that there is a number M ∈ R+ so that x ≥ M implies
that f (x) ≥ 0.

1.6 Definition Let д be a complexity function. Then O(д) is the collection of


complexity functions f satisfying: there are constants N , C ∈ R+ so that if x ≥ N
then both д(x) and f (x) are defined, and C · д(x) ≥ f (x). We say that f ∈ O(д),
or that f is O(д), or that f is of order at most д, or that f = O(д).

1.7 Remarks (1) Read O(д) aloud as “Big-O of д.” We use ‘O’ because this is about
the order of growth. (2) The term ‘complexity function’ is not standard. We will
find it convenient for stating the results below. (3) We may say ‘x 2 + 5x + 6 is
O(x 2 )’ instead of ‘ f is O(д) where f (x) = x 2 + 5x + 6 and д(x) = x 2 ’. (4) The
‘ f = O(д)’ notation is very common, but is awkward in that it does not follow the
usual rules of equality. For instance f = O(д) does not allow us to write ‘O(д) = f ’.
Another awkwardness is that x = O(x 2 ) and x 2 = O(x 2 ) together do not imply
that x = x 2 . (5) Some authors allow negative real outputs and write the inequality
with absolute values, f (x) ≤ C · |д(x)| . (6) Sometimes you see ‘ f is O(д)’ stated
as ‘ f (x) is O(д(x))’. Speaking strictly, this is wrong because f (x) and д(x) are
numbers, not functions.

Using real functions has the disadvantage that natural number functions such as n ! can seem to be
left out. One way to deal with this is to extend these to take real number arguments, for instance,
extending the factorial to ⌊x ⌋ !, whose domain is the set of nonnegative reals. In addition, we shall be
pragmatic, so that when working with natural number functions is easiest then we shall do that.
Think of ‘ f is O(д)’ as meaning that f ’s growth rate is less than or equal to


д’s rate. The sketches below illustrate. On the left д appears to accelerate away
from f , so that д’s growth rate is greater than f ’s. On the right the two seem to
track together, so that f is O(д) and also д is O(f ).

[Sketches: on the left д accelerates away from f ; on the right, past the cutoff N , the two track together.]

To apply the definition, we must produce suitable N and C , and verify that
they work.
1.8 Example Let f (x) = x 2 and д(x) = x 3 . Then f is O(д), as witnessed by N = 2
and C = 1. The verification is: x > N = 2 implies that д(x) = x 3 = x · x 2 is greater
than 2 · x 2 , which in turn is greater than x 2 = C · f (x) = 1 · f (x).
If f (x) = 5x 2 and д(x) = x 4 then to show f is O(д) take N = 2 and C = 2.
The verification is that x > N = 2 implies that C · x 4 = 2 · x 2 · x 2 ≥ 8x 2 > 5x 2 .
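Such witnesses are easy to spot-check numerically. This sketch samples points past N ; it gives evidence for a choice of constants, though of course not a proof.

    def witnesses(g, f, C, N, xs):
        # Check that C·g(x) ≥ f(x) at every sample point x ≥ N.
        return all(C * g(x) >= f(x) for x in xs if x >= N)

    print(witnesses(lambda x: x**3, lambda x: x**2, C=1, N=2, xs=range(1000)))
    print(witnesses(lambda x: x**4, lambda x: 5 * x**2, C=2, N=2, xs=range(1000)))
    # both print True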
1.9 Example Don’t confuse a function having values that are smaller than another’s
with its growth rate being smaller. Let д(x) = x 2 and f (x) = x 2 + 1, so that
д(x) < f (x). But д’s growth rate is not smaller, because f is O(д). To verify, take
N = 2 and C = 2, so that for x ≥ N = 2 we have C · д(x) = 2x 2 = x 2 + x 2 >
x 2 + 1 = f (x).
1.10 Example Let Z : R → R be the zero function Z (n) = 0. Then Z is O(д) for every
complexity function д. Verify that with N = 1 and C = 1.
1.11 Example Some pairs of functions aren’t comparable, that is, neither f ∈ O(д) nor
д ∈ O(f ). Let д(x) = x 3 and consider this function.
f (x) = x^2 if ⌊x⌋ is even, and f (x) = x^4 if ⌊x⌋ is odd

To see that f is not O(д), consider inputs where ⌊x⌋ is odd. There is no constant C
that gives C · x 3 ≥ x 4 ; for instance, C = 3 will not do because when ⌊x⌋ is greater
than 3 and odd then 3 · д(x) = 3x 3 < x 4 = f (x). Likewise, д is not O(f ) because
of f ’s behavior when ⌊x⌋ is even.
1.12 Lemma (Algebraic properties) Let these be complexity functions.


(a) If f is O(д) then for any positive constant a ∈ R+, the function a · f is O(д).
(b) If f 0 is O(д0 ) and f 1 is O(д1 ) then the sum f 0 + f 1 is O(д), where д(n) =
max(д0 (n), д1 (n)). In particular, if there is a function д where both f 0 and f 1
are O(д) then f 0 + f 1 is also O(д).
(c) If f 0 is O(д0 ) and f 1 is O(д1 ) then the product f 0 f 1 is O(д0д1 ).
This result gives us two principles for simplifying Big O expressions. First, if a
function is a sum of terms of which one has the largest growth rate then we can
drop the other terms. Second, if a function is a product of factors then we can
drop constants, factors that do not depend on the input.
1.13 Example Consider f (x) = 5x 3 + 3x 2 + 12x . Of the three summand terms, the
one with the largest growth rate is 5x 3 , because it has the largest exponent (this
is intuitively clear and it will follow from Theorem 1.17 below). By the second
item in the prior result, because all of the summands are O(5x 3 ), the first principle
lets us simplify to say that f is O(5x 3 ). Similarly, because one of 5x 3 ’s factors is
constant, applying the first item in the prior result with a = 1/5 lets us cite the
second principle and simplify further to get that f is O(x 3 ).
1.14 Definition Two complexity functions have equivalent growth rates, or the same
order of growth, if f is O(д) and also д is O(f ). We say that f is Θ(д), or, what is
the same thing, that д is Θ(f ).

1.15 Lemma The Big-O relation is reflexive, so f is O(f ). It is also transitive, so if


f is O(д) and д is O(h) then f is O(h). Having equivalent growth rates is an
equivalence relation between functions.

1.16 Figure: Each bean contains the complexity functions. Faster growing functions are
higher, so that if they were shown then f 0 (x) = x 5 would be above f 1 (x) = x 4 . On
the left is sketched the cone O(д) for some д, containing all of the functions with
growth rate less than or equal to д’s. The ellipse at the top is Θ(д), holding functions
with growth rate equivalent to д’s. The sketch on the right adds the cone O(f ) for
some f in O(д).

For most of the functions that we work with, such as polynomial and logarithmic
functions, the next result often makes calculations involving Big O easier.
1.17 Theorem Let f , д be complexity functions. Suppose that limx →∞ f (x)/д(x)


exists and equals L ∈ R ∪ { ∞ }.
(a) If L = 0 then д grows faster than f , that is, f is O(д) but д is not O(f ).†
(b) If L = ∞ then f grows faster than д, that is, д is O(f ) but f is not O(д).‡
(c) If L is between 0 and ∞ then the two functions have equivalent growth rates,
so that f is Θ(д) and д is Θ(f ).§
It pairs well with the following, from Calculus.
1.18 Theorem (L’Hôpital’s Rule) Let f and д be complexity functions such that both
f (x) → ∞ and д(x) → ∞ as x → ∞, and such that both are differentiable for
large enough inputs. If limx →∞ f ′(x)/д ′(x) exists and equals L ∈ R ∪ {±∞} then
limx →∞ f (x)/д(x) also exists and also equals L.

1.19 Example Where f (x) = x 2 + 5x + 6 and д(x) = x 3 + 2x + 3.

lim_{x→∞} f (x)/д(x) = lim_{x→∞} (x^2 + 5x + 6)/(x^3 + 2x + 3) = lim_{x→∞} (2x + 5)/(3x^2 + 2) = lim_{x→∞} 2/(6x) = 0

Then Theorem 1.17 says that f is O(д) but д is not O(f ). That is, f ’s growth rate is
less than д’s.
Next consider f (x) = 3x 2 + 4x + 5 and д(x) = x 2 .

lim_{x→∞} (3x^2 + 4x + 5)/x^2 = lim_{x→∞} (6x + 4)/(2x) = lim_{x→∞} 6/2 = 3
So the growth rates of the two are equivalent. That is, f is Θ(д).
For f (x) = 5x 4 + 15 and д(x) = x 2 − 3x , this

lim_{x→∞} (5x^4 + 15)/(x^2 − 3x) = lim_{x→∞} 20x^3/(2x − 3) = lim_{x→∞} 60x^2/2 = ∞

shows that f ’s growth rate is strictly greater than д’s rate — д is O(f ) but f is
not O(д).
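These limits are also easy to check mechanically; a sketch using the sympy library:

    from sympy import symbols, limit, oo

    x = symbols('x')
    print(limit((x**2 + 5*x + 6) / (x**3 + 2*x + 3), x, oo))  # 0
    print(limit((3*x**2 + 4*x + 5) / x**2, x, oo))            # 3
    print(limit((5*x**4 + 15) / (x**2 - 3*x), x, oo))         # oo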
1.20 Example The logarithmic function f (x) = logb (x) grows very slowly: logb (x)
is O(x), and logb (x) is O(x^0.1 ), and is O(x^0.01 ), and in fact logb (x) is O(x^d ) for
any d > 0, no matter how small, by this equation.

lim_{x→∞} logb (x)/x^d = lim_{x→∞} (1/(x ln(b)))/(d x^(d−1)) = (1/(d ln(b))) · lim_{x→∞} 1/x^d = 0

By Theorem 1.17 that calculation also shows that x d is not O(logb (x)).


† This case is denoted f is o(д). ‡ The ‘д is O(f )’ is denoted f is Ω(д). § If L = 1 then f and д are
asymptotically equivalent, denoted f ∼ д .
The difference in growth rates is even stronger than that. L’Hôpital’s Rule,
along with the Chain Rule, gives that (logb (x))2 is O(x) because this is 0.

lim_{x→∞} (logb (x))^2/x = lim_{x→∞} (2 ln(x) · (1/(x ln(b))))/1 = (2/ln(b)) · lim_{x→∞} ln(x)/x = (2/ln(b)) · lim_{x→∞} (1/x)/1 = 0

Further, Exercise 1.46 shows that for every power k the function (logb (x))^k is O(x^d )
for any positive d , no matter how small.
The log-linear function x · lg(x) has a similar relationship to the polynomials
x^d , where d > 1.

lim_{x→∞} x lg(x)/x^d = lim_{x→∞} (x · (1/(x ln(2))) + 1 · ln(x))/(d x^(d−1)) = (1/d) · lim_{x→∞} ((1/ln(2)) + ln(x))/x^(d−1)
    = (1/(d(d − 1))) · lim_{x→∞} (1/x)/x^(d−2) = (1/(d(d − 1))) · lim_{x→∞} 1/x^(d−1) = 0

1.21 Example We can compare the polynomial f (x) = x^2 to the exponential д(x) = 2^x .

lim_{x→∞} 2^x/x^2 = lim_{x→∞} (2^x · ln(2))/(2x) = lim_{x→∞} (2^x · (ln(2))^2)/2 = ∞
Thus, f is in O(д) but д is not in O(f ).
An easy induction argument gives that lim_{x→∞} 2^x/x^k = ∞
for any k , and so x^k is in O(2^x ) but 2^x is not in O(x^k ).

1.22 Lemma Logarithmic functions grow more slowly than polynomial functions: if
f (x) = logb (x) for some base b and д(x) = a_m x^m + · · · + a_0 then f is O(д)
but д is not O(f ). Polynomial functions grow more slowly than exponential
functions: where h(x) = b^x for some base b > 1, then д is O(h) but h is not
O(д).

As we discussed, for thinking about the resources used by mechanical compu-


tations, discrete functions, from N to N, are the most natural. But we’ve defined
complexity functions as mapping reals to reals. One motivation is that some
functions, such as logarithms, are easiest to work with when they map reals to
reals. Now we’ve seen another, that L’Hôpital’s Rule, which uses the derivative and
so needs reals, is a big convenience. The next result says that our conclusions in the
continuous context carry over to the discrete context. (As given, this lemma does
not cover functions that are only defined for inputs larger than some value N ,
but this version is easier to state and makes the same point.)
1.23 Lemma Let f 0 , f 1 : R → R, and consider the restrictions to a discrete domain


д0 = f 0 ↾N and д1 = f 1 ↾N . Where L, a ∈ R,
(a) if L = limx →∞ (a f 0 ) (x) then L = limn→∞ (aд0 ) (n),
(b) if L = limx →∞ (f 0 + f 1 ) (x) then L = limn→∞ (д0 + д1 ) (n),
(c) if L = limx →∞ (f 0 · f 1 ) (x) then L = limn→∞ (д0 · д1 ) (n), and
(d) where the expressions are defined, if L = limx →∞ (f 0 /f 1 ) (x) then L =
limn→∞ (д0 /д1 ) (n).

Tractable and intractable This table lists orders of growth that appear often in
practice. They are listed with faster-growing functions later in the table.
Order                      Name                     Examples
O(1)                       Bounded                  f0 (n) = 1, f1 (n) = 15
O(lg(lg(n)))               Double logarithmic       f (n) = ln(ln(n))
O(lg(n))                   Logarithmic              f0 (n) = ln(n), f1 (n) = lg(n^3 )
O((lg(n))^c )              Polylogarithmic          f (n) = (lg(n))^3
O(n)                       Linear                   f0 (n) = n , f1 (n) = 3n + 4
O(n lg(n)) = O(lg(n !))    Log-linear
O(n^2 )                    Polynomial (quadratic)   f0 (n) = 5n^2 + 2n + 12
O(n^3 )                    Polynomial (cubic)       f (n) = 2n^3 + 12n^2 + 5
...
O(2^n )                    Exponential              f (n) = 10 · 2^n
O(3^n )                    Exponential              f (n) = 6 · 3^n + n^2
...
O(n !)                     Factorial
O(n^n )                    –No standard name–

1.24 Table: The Hardy order of growth hierarchy.

We often split that hierarchy between the polynomial and exponential func-
tions; the table below shows why. It lists how long a job will take if we use an
algorithm that runs in time lg n , time n , etc. (A modern computer runs at 10 GHz,
10 000 million ticks per second, and there are 3.16 × 10^7 seconds in a year.)
        n = 1           n = 10          n = 50          n = 100
lg n    –               1.05 × 10^−17   1.79 × 10^−17   2.11 × 10^−17
n       3.17 × 10^−18   3.17 × 10^−17   1.58 × 10^−16   3.17 × 10^−16
n lg n  –               1.05 × 10^−16   8.94 × 10^−16   2.11 × 10^−15
n^2     3.17 × 10^−18   3.17 × 10^−16   7.92 × 10^−15   3.17 × 10^−14
n^3     3.17 × 10^−18   3.17 × 10^−15   3.96 × 10^−13   3.17 × 10^−12
2^n     6.34 × 10^−18   3.24 × 10^−15   3.57 × 10^−3    4.02 × 10^12

1.25 Table: Comparison of the times, in years, taken by algorithms whose behavior is
given by some functions from the order of growth hierarchy, on some input sizes n .
Consider the final column, n = 100. Between the initial rows the relative
change is an order of magnitude, which is a lot, but the absolute times are small.
Then we get to the final two rows. It is not a typo, the bottom really is 10^12 years.
This is a huge change, both relatively and absolutely. The universe is 14 × 10^9
years old so this computation, even with input size of only 100, would take longer
than the age of the universe. Exponential growth is very, very much larger than
polynomial growth.
Another way to understand this point is to consider the effect of adding
one more bit to an algorithm’s input, such as by passing from the length ten
σ0 = 11 0100 1010 to the length eleven σ1 = 110 1001 0101. An algorithm that
loops through the bits will just do one more loop, so it takes ten percent more time.
But an algorithm that takes 2^|σ| time will take double the time.
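The table’s entries are one-line computations. This sketch reproduces the bottom row, assuming the stated 10 GHz machine and 3.16 × 10^7 seconds per year.

    def years(ticks: float) -> float:
        return ticks / 1e10 / 3.16e7   # seconds at 10 GHz, then years

    print(years(2 ** 50))    # about 3.6e-03 years
    print(years(2 ** 100))   # about 4.0e+12 years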
Cobham’s thesis is that the tractable problems — those that are at least con-
ceivably solvable in practice — are those for which there is an algorithm whose
resource consumption is polynomial.† For instance, if a problem’s best available
algorithm runs in exponential time then we may say that the problem is, or at least
appears, intractable.

Discussion Big O is about relative scalability: an algorithm whose runtime behav-


ior is O(n 2 ) scales worse than one whose behavior is O(n lg n), but better than one
whose behavior is O(n 3 ). Is there more to say?
True, that is the essence. Nonetheless, experience shows that there are points
about Big O that can puzzle learners. Here we will elaborate on those.
The first is that Big O is for algorithms, not programs. Contrast these.

    for i in range(0,10):              x=4
        x=4                            for i in range(0,10):
        find_clique(G,x,i)                 find_clique(G,x,i)

The code snippet on the left has x=4 inside the loop. That makes it slower by nine
extra assignments. But Big O disregards this constant time difference. That is,
Big O is not the right tool for characterizing fine coding details. Big O works at a
higher level, such as for comparing runtimes among algorithms.
That fits with our second point about Big O. We use it to help pick the best
algorithm, to rank them according to how much they use of some computing
resources. But algorithms are tied to an underlying computing model.‡ So for the
comparison we need a definition of the time used on a particular machine model.

Cobham’s Thesis is widely accepted, but not universally accepted. Some researchers object that if an
algorithm runs in time n 100 or if it runs in time Cn 2 but with an enormous C then the solution is not
actually practical. A rejoinder to that objection notes that when someone announces an algorithm with
a large exponent or large constant then typically over time the approach gets refined, shrinking those
two. In any event, polynomial time is markedly better than exponential time. In this book we accept
the thesis because it gives technical meaning to the informal ‘acceptably fast’. ‡ More discussion of the
relationship between algorithms and machine models is in Section 3.
1.26 Definition A machine M with input alphabet Σ takes time t M : Σ∗ → N ∪ { ∞ }


if that function gives the number of steps that the machine takes to halt on
input σ ∈ Σ∗ . If M does not halt then t M (σ ) = ∞. The machine runs in input
length time t̂M : N → N ∪ { ∞ } if t̂M (n) is the maximum of the t M (σ ) over all
inputs σ ∈ Σ∗ of length n . The machine runs in time O(f ) if t̂M is O(f ).
Another model that is widely used in the Theory of Computing, besides the
Turing machine, is the Random Access machine (RAM). Whereas a Turing machine
cell stores only a single symbol, so that to store an integer you may need multiple
cells, a RAM model machine has registers that each store an entire integer. And
whereas to get to a cell a Turing machine may spend a lot of steps moving through
the tape, the RAM model gets to each register contents in one step.
Close analysis shows that if we start with an algorithm intended for a RAM
model machine and run it on a Turing machine then this may multiply the runtime
by a factor of as much as n^3 , so that if the algorithm is O(n^2 ) on the RAM then on
the Turing machine it can be O(n^5 ).†
we must first settle on a model, and only then discuss the Big O.
We have already brought up our third issue about Big O, in relation to Cobham’s
Thesis. The definition of Big O ignores constant factors; does that greatly reduce
its value for comparing algorithms? If for inputs n > N , our algorithm takes time
given by C · n 2 then don’t we need to know C and N ? After all, if one algorithm
has runtime C 0n for an enormous C 0 while another is C 1n 2 for tiny C 1 , could that
not make the second algorithm more useful? Similarly, could a huge N mean that
we need to analyze what happens before that point?
Part of the answer is that finding these constants is hard.‡ Machines
vary widely in their low-level details such as the memory addressing
and paging, cache hits, and whether the CPU can do some operations in
parallel, and these details can make a tremendous difference in constants
such as C and N . Imagine doing the analysis on a commercially available
machine and then the vendor releases a new model, so it is all to do again.
That would be discouraging. And what’s more, experience shows that
doing the work to find the exact numbers usually does not change the
algorithm that gets picked. As Table 1.25 illustrates, knowing at a Big O
level how the algorithm grows is much more influential than knowing the exact
constant values.
[Photo: Donald Knuth, b 1938]
Instead of analyzing commercial machines, we could agree on specifications for
a reference architecture — this is the approach taken by D Knuth in the monumental
Art of Computer Programming series — but again there is the risk that we might
have to update that standard. Then, published results from some time ago may no
longer apply, because they refer to an old standard. Again, discouraging. So the


A more extreme example of a model-based difference is that addition of two n × n matrices on a
RAM model takes time that is O(n 2 ), but on an unboundedly parallel machine model takes constant
time, O(1). ‡ People do sometimes note the order of magnitude of these constants.
analysis that we do to find the Big O behavior of an algorithm usually refers to


an abstract computer model, such as a Turing machine or a RAM model, and it
usually does not go to the extent of finding the constants. That is, being reference
independent is, in a quickly changing field, an advantage.
This ties back to the paragraph at the start of this
discussion. Not taking into account the precise differ-
ence between, say, the cost of a division and the cost
of a memory write (as long as those costs lie between
reasonable limits) implies that constant factors are
meaningless, and we focus on relative comparisons.
Here is an analogy: absolute measurement of distance
involves units such as miles or kilometers, but
being able to make statements irrespective of the unit
constants requires making relative statements such as, “from here, New York City
is twice as far as Boston.”
[Comic courtesy xkcd.com]
That is, the entire answer is that we say that an
algorithm that on input of size n takes 3n ticks
is O(n) in order to express that, roughly, doubling the
input size will no more than double the number of steps
taken. Similarly, if an algorithm is O(n 2 ) then doubling
the input size will at most quadruple the number of steps.
The Big O notation ignores constants because that is
inherent in being a unit free, relative measurement.
However, for a person with the sense that leaving
out the constants makes this measure approximate, then
certainly, Big O is only a rough comparison. It cannot
say with precision which of two algorithms will be
absolutely better when they are cast into code and run
on a particular platform, for input sizes in a given range.†
This leads to our fourth point about Big O. Under-
standing how an algorithm performs as the input size
grows requires that we define the input size.
Consider a naive algorithm for factoring numbers
that, given an input natural number n , tests each k ∈
{ 2, ... n − 1 } to see if it divides n . If n is prime then it
tests all of those k ’s, which is roughly n -many divisions.
[Comic courtesy xkcd.com]
We can take the size of n to be the number of bits needed to represent n in binary,
approximately lg n . That is, for this algorithm the input is of size lg n and the
number of operations is about n . That’s exponential growth — passing from lg n
to n requires exponentiating — so this algorithm is O(2^b ), where b is the number
of input bits.
However, in a programming class this algorithm would likely be described as

For that, use benchmarks.
linear, as O(n), with the reasoning that for the input n there are about n -many
divisions. How to explain the difference between these two Big O estimates?
This is another example of the relationship between an algorithm and an
underlying computing model. A programmer may make the engineering judgment
that for every use of their program the input will fit into a 64 bit word. They are
selecting a computation model, like the RAM model, where larger numbers take
the same time to read as smaller numbers. With this model, the prior paragraph
applies and the algorithm is linear.
So this difference in Big O estimates is in part an application versus theory
thing. In the common programming application setting, where the bit size of the
inputs is bounded, the runtime behavior is O(n). In a theoretical setting, accepting
input that is arbitrarily large and so the runtime is a function of the bit size of the
inputs, the algorithm is O(2^b ). An algorithm whose behavior as a function of the
input is polynomial, but whose behavior as a function of the bit size of the input is
exponential, is said to be pseudopolynomial.
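A sketch makes the gap vivid: trial division is linear in the value n but exponential in the bit count b (the specific prime below is our own example, not the text’s).

    def is_prime_naive(n: int):
        divisions = 0
        for k in range(2, n):            # roughly n-many divisions
            divisions += 1
            if n % k == 0:
                return False, divisions
        return True, divisions

    # The prime 10007 is only 14 bits long, yet this performs 10005 divisions:
    # as a function of the bit size b, the runtime is about 2^b.
    print(is_prime_naive(10007))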
A fifth and final point about Big O. When we are analyzing an algorithm we
can consider the behavior that is the worst case for any input of that size (as in
Definition 1.26), or the behavior that is the average over all inputs of that size.
For instance, the quicksort algorithm takes quadratic time O(n 2 ) at worst, but on
average is O(n lg n). Note, though, that worst-case analysis is the most common.

V.1 Exercises
1.27 True or false: if a function is O(n 2 ) then it is O(n 3 ).
✓ 1.28 Your classmate emails you, “I have an algorithm with running time that is
O(n 2 ). So with input n = 5 it will take 25 ticks.” Tell them two things that they
have wrong.
1.29 Suppose that someone posts to a group that you are in, “I’m working on a
problem that is O(n 3 ).” Explain to them, gently, how their sentence is mistaken.
✓ 1.30 How many bits does it take to express each number in binary? (a) 5 (b) 50
(c) 500 (d) 5 000
✓ 1.31 One is true, the other one is not. Which is which? (a) If f is O(д) then f is
Θ(д). (b) If f is Θ(д) then f is O(д).
✓ 1.32 For each, find the function on the Hardy hierarchy, Table 1.24, that has
the same rate of growth. (a) n 2 + 5n − 2 (b) 2n + n 3 (c) 3n 4 − lg lg n
(d) lg n + 5
1.33 For each, give the function on the Hardy hierarchy, Table 1.24, that has the
same rate of growth. That is, find д in that table where f is Θ(д).
(a) f (n) = n if n < 100, and f (n) = 0 otherwise
(b) f (n) = 1 000 000 · n if n < 10 000, and f (n) = n^2 otherwise
(c) f (n) = 1 000 000 · n^2 if n < 100 000, and f (n) = lg n otherwise
✓ 1.34 For each pair of functions decide if f is O(д), or д is O(f ), or both, or
neither. (a) f (n) = 4n^2 + 3, д(n) = (1/2)n^2 − n (b) f (n) = 53n^3 , д(n) = ln n
(c) f (n) = 2n^2 , д(n) = √n (d) f (n) = n^1.2 + lg n , д(n) = n^2 + 2n (e) f (n) = n^6 ,
д(n) = 2^(n/6) (f) f (n) = 3^n , д(n) = 3 · 2^n (g) f (n) = lg(3n), д(n) = lg(n)
1.35 Which of these are O(n^2 )? (a) lg n (b) 3 + 2n + n^2 (c) 3 + 2n + n^3
(d) 10 + 4n^2 + ⌊ cos(n^3 )⌋ (e) lg(5^n )
✓ 1.36 For each, state true or false. (a) 5n^2 + 2n is O(n^3 ) (b) 2 + 4n^3 is O(lg n)
(c) ln n is O(lg n) (d) n^3 + n^2 + n is O(n^3 ) (e) n^3 + n^2 + n is O(2^n )
1.37 For each find the smallest k ∈ N so that the given function is O(n^k ).
(a) n^3 + (n^4 /10 000 000) (b) (n + 2)(n + 3)(n^2 − lg n) (c) 5n^3 + 25 + ⌈cos(n)⌉
(d) 9 · (n^2 + n^3 )^4 (e) ⌊√(5n^7 + 2n^2 )⌋
1.38 Consider Table 1.25. (a) Add a column for n = 200. (b) Add a row for 3n .
✓ 1.39 On a computer that performs at 10 GHz, at 10 000 million instructions per
second, what is the longest input that can be done in a year under an algorithm
with each time performance function? (a) lg n (b) √n (c) n (d) n^2 (e) n^3
(f) 2^n
1.40 Sometimes in practice we must choose between two algorithms where the
performance of one is better than the performance of the other in a big-O sense,
but where the first has a long initial segment of poorer performance. What is the
least input number such that f (n) = 100 000 · n 2 is less than д(n) = n 3 ?
1.41 What is the order of growth of the run time of a deterministic Finite State
machine?
✓ 1.42 (a) Verify that the function f (x) = 7 is O(1).
(b) Verify that f (x) = 7 + sin(x) is O(1). So if a function is in O(1), that does
not mean that it is a constant function.
(c) Verify that f (x) = 7 + (1/x) is also O(1).
(d) Show that a complexity function f is O(1) if and only if it is bounded above
by a constant, that is, if and only if there exists L ∈ R so that f (x) ≤ L for all
inputs x ∈ R.
1.43 Where does д(x) ≤ x^O(1) place the function д in the Hardy hierarchy?
Hint: see the prior question.
1.44 Let f (x) = 2x and д(x) = x 2 . Prove directly from Definition 1.6 that f
is O(д), but that д is not O(f ).
1.45 Prove that 2^n is O(n !). Hint: because of the factorial, consider these natural
number functions and find suitable N , C ∈ N.
1.46 Use L’Hôpital’s Rule as in Example 1.20 to verify these for any d ∈ R+:
(a) (logb (x))^3 is O(x^d ) (b) for any k ∈ N+ , (logb (x))^k is O(x^d ).
1.47 What is the running time of the empty Turing machine?
1.48 Assume that д : R → R is increasing, so that x 1 ≥ x 0 implies that д(x 1 ) ≥
д(x 0 ). Assume also that f : R → R is a constant function. Show that f is O(д).
1.49 (a) Show that there is a computable function whose output values grow at a
rate that is O(1), one whose values grow at a rate that is O(n), one for O(n 2 ), etc.
(b) The Halting problem function K is uncomputable. Place its rate of growth in
the Hardy hierarchy, Table 1.24. (c) Produce a function that is not computable
because its output values are larger than those of any computable function. (You
need not show that the rate of growth is strictly larger, only that the output values
are larger.)
1.50 An algorithm that inputs natural numbers runs in pseudopolynomial time if
its runtime is polynomial in the numeric value of the input n , but exponential in
the number of bits required to represent n . Show that the naive algorithm to test
if the input is prime, which just checks whether it is divisible by any number that
is at least 2 and smaller than it, is pseudopolynomial. (Hint: we can check whether
one number divides another in quadratic time.)
✓ 1.51 Show that O(2^x ) ⊆ O(3^x ) but O(2^x ) ≠ O(3^x ).
1.52 Table 1.24 states that n ! grows slower than n^n . (a) Verify this. Hint: although
n ! is a natural number function, Theorem 1.17 still applies. (b) Stirling’s formula
is that n ! ≈ √(2πn) · (n^n /e^n ). Doesn’t this imply that n ! is Θ(n^n )?
✓ 1.53 Two complexity functions f , д are asymptotically equivalent, f ∼ д, if
limx →∞ (f (x)/д(x)) = 1. Show that each pair is asymptotically equivalent:
(a) f (x) = x 2 + 5x + 1 and д(x) = x 2 , (b) lg(x + 1) and lg(x).
1.54 Is there an f so that O(f ) is the set of all polynomials?
1.55 There are orders of growth between polynomial and exponential. Specifically,
f (x) = x^(lg x) is one.
(a) Show that lg(x) ∈ O((lg(x))^2 ) but (lg(x))^2 ∉ O(lg(x)).
(b) Argue that for any power k , we have x^k ∈ O(x^(lg x) ) but x^(lg x) ∉ O(x^k ).
Hint: take the ratio, rewrite using a = 2^lg(a) , and consider the limit of the
exponent.
(c) Show that x^(lg x) = 2^((lg x)^2) . Hint: take the logarithm of both halves.
(d) Show that x^(lg x) is in O(2^x ). Hint: form the ratio using the prior item.
The remaining exercises verify results that were presented without proof.
1.56 Verify the clauses of Lemma 1.12. (a) If a ∈ R+ then a f is also O(д).
(b) The function f 0 + f 1 is O(д), where д is defined by д(n) = max(д0 (n), д1 (n)).
(c) The product f 0 f 1 is O(д0д1 ).
1.57 Verify these clauses of Lemma 1.15. (a) The big-O relation is reflexive.
(b) It is also transitive.
1.58 Theorem 1.17 says that if the limit of the ratio of two functions exists then
we can determine the O relationship between the two. Assume that f and д are
complexity functions.
(a) Suppose that limx →∞ f (x)/д(x) exists and equals 0. Show that f is O(д).
(Hint: this requires a rigorous definition of the limit.)
(b) We can give an example where f is O(д) even though limx →∞ f (x)/д(x)
does not exist. Verify that, where д(x) = x and where f (x) = x when ⌊x⌋ is
odd and f (x) = 2x when ⌊x⌋ is even.
1.59 Prove Lemma 1.22.

Section V.2 A problem miscellany
Much of today’s work in the Theory of Computation is driven by problems that
come from a field outside the subject. We will describe some problems to get a
sense of the ones that appear in the subject, and also to use for examples and
exercises. These are all well known.

Problems with stories We start with a few problems that come with stories.
Besides being fun, and an important part of the field’s culture, these also give a
sense of where problems come from.
W R Hamilton was a polymath whose genius was recognized early
and he was given a sinecure as Astronomer Royal of Ireland. He made
important contributions to classical mechanics, where his reformulation
of Newtonian mechanics is now called Hamiltonian mechanics. Other
work of his in physics helped develop classical field theories such as
electromagnetism and laid the ground work for the development of
quantum mechanics. In mathematics, he is best known as the inventor
of the quaternion number system.
[Portrait: William Rowan Hamilton, 1805–1865]
One of his ventures was a game, Around the World. The vertices
in the graph below were holes labeled with the names of world cities.
Players put pegs in the holes, looking for a circuit that visits each city
once and only once.

2.1 Animation: Hamilton’s Around the World game



It did not make Hamilton rich. But it did get him associated with a great problem.
2.2 Problem (Hamiltonian Circuit) Given a graph, decide if it contains a Hamiltonian
circuit, a cyclic path that includes each vertex once and only once.
This is stated as a type of problem called a decision problem, because it asks for
a ‘yes’ or ‘no’ answer. In this section we will see a variety of types of problems.
The next section will say more about problem types.
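A brute-force decider is easy to write but slow, as it tries every ordering of the vertices. A Python sketch (the graph representation and the example square are our own choices):

    from itertools import permutations

    def has_hamiltonian_circuit(vertices, edges):
        # edges: a set of frozensets {u, v}, one per undirected edge.
        vs = list(vertices)
        for order in permutations(vs):
            if all(frozenset((order[i], order[(i + 1) % len(vs)])) in edges
                   for i in range(len(vs))):
                return True
        return False

    square = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
    print(has_hamiltonian_circuit({0, 1, 2, 3}, square))   # True

On n vertices this examines n !-many orderings; whether anything fundamentally faster is possible is the kind of question this chapter builds toward.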
A special case of the Hamiltonian Circuit problem is the Knight’s Tour problem,
to use a chess knight to make a circuit of the squares on the board. (Recall
that a knight moves three squares at a time, two in one direction and then one
perpendicular to that direction.)

This is the solution given by L Euler. In graph terms, there are sixty four vertices,
representing the board squares. An edge goes between two vertices if they are
connected by a single knight move. Knight’s Tour asks for a Hamiltonian circuit of
that graph.
Hamiltonian Circuit has another famous variant.
2.3 Problem (Traveling Salesman) Given a weighted graph, where we call the vertices
S = {c 0 , ... c k −1 } ‘cities’ and we call the edge weight d(c i , c j ) ∈ N+ for all c i , c j
the ‘distance’ between the cities, find the shortest-distance circuit that visits every
city and returns back to the start.
We can start with a map of the state capitals of
the forty eight contiguous US states and the distances
between them: Montpelier VT to Albany NY is 254
kilometers, etc. From among all trips that visit each
city and return back to the start, such as Montpelier →
Albany → Harrisburg → · · · → Montpelier, we want
the shortest one.
[Comic courtesy xkcd.com]
As stated, this is an optimization problem. However,
we can recast it as a decision problem. Introduce a bound B ∈ N and change the
problem statement to ‘decide if there is a circuit of total distance less than B ’. If
we had an algorithm to quickly solve this decision problem then we could also
quickly solve the optimization problem: ask whether there is a trip bounded by
length B = 1, then ask if there is a trip of length B = 2, etc. When we eventually
get a ‘yes’, we know the length of the shortest trip.
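In outline (a sketch, where decide stands for the hypothesized fast decision procedure and distances are positive integers):

    def shortest_trip_length(decide):
        # decide(B): is there a circuit of total distance less than B?
        B = 1
        while not decide(B):
            B += 1
        return B - 1   # the first 'yes' at bound B pins the optimum at B − 1

Since each circuit’s length is an integer, the first bound that gets a ‘yes’ locates the optimum exactly; a binary search over B would need fewer queries still.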
The next problem sounds much like Hamiltonian Circuit, in that it
involves exhaustively traversing a graph. But it proves to act very
differently.
Today the city of Kaliningrad is in a Russian enclave between Poland
and Lithuania. But in 1727 it was in Prussia and was called Königsberg.
The Pregel river divides the city into four areas, connected by seven
bridges. The citizens used to promenade, to take leisurely walks or drives
where they could see and be seen. Among these citizens the question
arose: can a person cross each bridge once and only once, and arrive
back at the start? No one could think of a way but no one could think of
a reason that there was no way. A local mayor wrote to Euler, who proved that no
circuit is possible. This paper founded Graph Theory.
[Portrait: Leonhard Euler, 1707–1783]

[Figure: the Königsberg bridges. Euler’s summary sketch is in the middle and the graph is on the right.]
2.4 Problem (Euler Circuit) Given a graph, find a circuit that traverses each edge
once and only once, or declare that no such circuit exists.
Next is a problem that sounds hard. But all of us see it solved every day, for
instance when we ask a phone for the shortest driving directions to someplace.
2.5 Problem (Shortest Path) Given a weighted graph and two vertices, find the
shortest path between them, or report that no path exists.
There is an algorithm that solves this problem quickly.† For instance, with this
graph we could look for the cheapest path from A to F .
[Figure: a weighted graph on six vertices A, ..., F , with edge weights between 2 and 15.]

The next problem was discovered in 1852 by a young mathematician, F Guthrie,


who was drawing a map of the counties of England. He wanted to color the
counties, and naturally wanted different colors for counties that share a border.
His map required only four colors and he conjectured that for any map at all, four
colors were enough.

Dijkstra’s algorithm is at worst quadratic in the number of vertices.
Guthrie imposed the condition that the countries must be contiguous,
and he defined ‘sharing a border’ to mean sharing an interval, not just
a point (see Exercise 2.46). Below is a map and a graph version of the
same problem. In the graph, counties are vertices and edges connect
ones that are adjacent. A crucial point is that the graph is planar —
we can draw it in the plane so that its edges do not cross.
The Four Color problem inputs a planar graph and outputs the
vertices partitioned into no more than four sets, called colors, such
that adjacent vertices are in different colors. (It is a special case of the
problem given next.)
[Portrait: Augustus De Morgan, 1806–1871]

2.6 Animation: Counties of England and the derived planar graph

Guthrie consulted his former professor, A De Morgan, who
was also unable to either prove or disprove the conjecture. But he
did make the problem famous by promoting it among his friends.
It remained unsolved until 1976, when K Appel and W Haken
reduced the proof to 1 936 cases and got a computer to check
those cases.
[Image: Appel and Haken’s post office, celebrating]
This was the first major proof that was done on a computer and it was
controversial. Many mathematicians felt that the purpose of the subject was to
understand the things studied, and not just be satisfied when a computer program
that perhaps seems to be bug-free assures us that theorems are verified.† In the
interim, though, a new generation has appeared that is more comfortable with the
process and now computer proofs are routine, or at least not as controversial.
2.7 Problem (Graph Colorability) Given a graph and a number k ∈ N, decide whether
the graph is k -colorable, whether we can partition its vertices into k -many sets,
N = C0 ∪ · · · ∪ Ck −1 , such that no two same-set vertices are connected.

2.8 Problem (Chromatic Number) Given a graph, find the smallest number k ∈ N
such that the graph is k -colorable.


This is in contrast to the goal of the Entscheidungsproblem.
Our final story introduces a problem that will serve as a benchmark to
which we compare others. In 1847, G Boole outlined what we today call
Boolean algebra in An Investigation of the Laws of Thought on Which are
Founded the Mathematical Theories of Logic and Probabilities.
A variable is Boolean if it takes only the values T or F . A Boolean
function inputs and outputs tuples of those. Boolean expressions connect
variables using the binary and operator, ∧, the or operator, ∨, or the unary
not operator, ¬. This Boolean function is given by an expression with three
variables.
[Portrait: George Boole, 1815–1864]

f (P, Q, R) = (P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ Q) ∧ (¬P ∨ ¬Q ∨ ¬R)


We will take expressions to be in conjunctive normal form, so they consist of clauses
of ∨’s connected with ∧’s. A Boolean expression is satisfiable if there is some
combination of input T ’s and F ’s so that the formula evaluates to T . This truth
table shows the input-output behavior of the function defined by that formula.

P Q R P ∨Q P ∨ ¬Q ¬P ∨ Q ¬P ∨ ¬Q ∨ ¬R f (P, Q, R)
F F F F T T T F
F F T F T T T F
F T F T F T T F
F T T T F T T F
T F F T T F T F
T F T T T F T F
T T F T T T T T
T T T T T T F F
That T in the final column witnesses that this formula is satisfiable.
2.9 Problem (Satisfiability, SAT ) Decide if a given Boolean expression is satisfiable.

2.10 Problem (3-Satisfiability, 3-SAT ) Given a propositional logic formula in conjunc-


tive normal form in which each clause has at most three variables, decide if it is
satisfiable.
Observe that if the number of input variables is v then the number of rows in
the truth table is 2^v . So solving SAT appears to require exponential time. Whether
that is right is a very important question, as we will see in later sections.
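That truth table is exactly what a brute-force solver computes. A sketch for the example formula, trying all 2^v assignments:

    from itertools import product

    def f(P, Q, R):   # the example formula from above
        return ((P or Q) and (P or not Q) and (not P or Q)
                and (not P or not Q or not R))

    satisfying = [a for a in product([False, True], repeat=3) if f(*a)]
    print(satisfying)   # [(True, True, False)]: the formula is satisfiable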
More problems, omitting the stories We will list more examples. All of these
are widely known, part of the culture of the field.
2.11 Problem (Vertex-to-Vertex Path) Given a graph and two vertices, find if the
second is reachable from the first, that is, if there is a path to the second from the
first.†

There are lots of problems about paths, so calling this just the Path problem is confusing. Some
authors call this st -Path, st -Connectivity, or STCON.
2.12 Example These are two Western-tradition constellations, Ursa Minor and Draco.
[Figure: the stars of Ursa Minor and of Draco, drawn as two graphs.]
Here we can solve the Vertex-to-Vertex Path problem by eye. For any two vertices
in Ursa Minor there is a path and for any two vertices in Draco there is a path. But
if the two are in different constellations then there is no path.
2.13 Problem (Minimum Spanning Tree) Given a weighted undirected graph, find a
minimum spanning tree, a subgraph containing all the vertices of the original
graph such that its edges have a minimum total.
This is an undirected graph with weights on the edges.
[Figure: an undirected weighted graph, with a spanning subgraph highlighted.]

The highlighted subgraph includes all of the vertices, that is, it spans the graph. In
addition, its weights total to a minimum from among all of the spanning subgraphs.
From that it follows that this subgraph is a tree, meaning that it has no cycles, or
else we could eliminate an edge from the cycle and thereby lower the edge weight
total without dropping any vertices.
This problem looks like the Hamiltonian Circuit problem in requiring that the sub-
graph contain all the vertices. One difference is that for the Minimum Spanning Tree
problem we know algorithms that are quick, that are O(n lg n).
2.14 Problem (Vertex Cover) Given a graph and a bound B ∈ N, decide if the graph
has a size B vertex cover, a set of vertices, C , such that for any edge, at least one
of its ends is a member of C .

2.15 Example A museum posts guards to watch the exhibits. They have eight halls, laid
out as below, and they will post guards at corners. What is the smallest number of
guards that will suffice to watch all of the hallways?
[Figure: the museum’s eight halls, with corners labeled w0 , ..., w5 .]
Obviously, one guard will not do. A two-element list that covers is C = {w 0 , w 4 }.
2.16 Problem (Clique) Given a graph and a bound B ∈ N, decide if the graph has a
size B clique, a set of B -many vertices such that any two are connected.
If the graph nodes represent people and the edges connect friends then we
want to know if there are B -many mutual friends. Any graph with a 4-clique has
a subgraph like the one below on the left and any graph with a 5-clique has a
subgraph like the one on the right.
[Figure: the complete graphs on four and on five vertices.]
2.17 Example Decide if this graph has a 4-clique.


[Figure: a graph on vertices v0 , ..., v5 .]

2.18 Animation: Instance of the Clique problem
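Brute force works here too: test every size-B set of vertices. A sketch (the edge set below is an assumed instance, since the figure’s edges are not reproduced here):

    from itertools import combinations

    def has_clique(vertices, edges, B):
        # edges: a set of frozensets {u, v}.
        return any(all(frozenset(pair) in edges
                       for pair in combinations(group, 2))
                   for group in combinations(vertices, B))

    # A 4-clique on {0, 1, 2, 3} plus a pendant vertex 4:
    edges = {frozenset(p) for p in combinations(range(4), 2)} | {frozenset((3, 4))}
    print(has_clique(range(5), edges, 4))   # True
    print(has_clique(range(5), edges, 5))   # False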

2.19 Problem (Broadcast) Given a graph with initial vertex v 0 , and a bound B ∈ N,
decide if a message can spread from v 0 to every other vertex within B steps. At
each step, any node that has heard the message can transmit it to at most one
adjacent node.

[Figure: a graph on vertices v0 , ..., v10 , with initial vertex v0 .]

2.20 Animation: Instance of the Broadcast problem



2.21 Example In the graph no vertex is more than three edges away from the initial
one. The animation shows it taking four steps to broadcast.

2.22 Problem (Three-dimensional Matching) Given as input a set M ⊆ X × Y × Z ,


where the sets X , Y , Z all have the same number of elements, n , decide if there is
a matching, a set M̂ ⊆ M containing n elements such that no two of the triples in
M̂ agree on any of their coordinates.

2.23 Example Let X = { a, b }, Y = { b, c }, and Z = { a, d }, so that n = 2. Then


M = X × Y × Z contains these eight elements.

M = { ⟨a, b, a⟩, ⟨a, c, a⟩, ⟨b, b, a⟩, ⟨b, c, a⟩, ⟨a, b, d⟩, ⟨a, c, d⟩, ⟨b, b, d⟩, ⟨b, c, d⟩ }

The set M̂ = { ⟨a, b, a⟩, ⟨b, c, d⟩ } has 2 elements and they disagree in the first
coordinate, as well as on the second and third.

2.24 Example Fix n = 4 and consider X = { 1, 2, 3, 4 }, Y = { 10, 20, 30, 40 }, and


Z = { 100, 200, 300, 400 }, all four-element sets. Also fix this subset of X × Y × Z .

M = { ⟨1, 10, 200⟩, ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 10, 400⟩,
⟨3, 40, 100⟩, ⟨3, 40, 200⟩, ⟨4, 10, 200⟩, ⟨4, 20, 300⟩ }

A matching is M̂ = { ⟨1, 20, 300⟩, ⟨2, 30, 400⟩, ⟨3, 40, 100⟩, ⟨4, 10, 200⟩ }.

2.25 Problem (Subset Sum) Given a multiset of natural numbers S = {n_0, ..., n_{k−1}}
and a target T ∈ N, decide if a subset of S sums to the target.
Recall that a multiset is like a set in that the order of the elements is not
significant but is different than a set in that repeats do not collapse: the multiset
{ 1, 2, 2, 3 } is different than the multiset { 1, 2, 3 }.
2.26 Example Decide if some of the numbers { 911, 22, 821, 563, 405, 986, 165, 732 }
add to T = 1173. One such collection is { 165, 986, 22 }.

2.27 Example No sum of the numbers { 831, 357, 63, 987, 117, 81, 6785, 606 } adds to
T = 2105. The number 6785 is larger than the target, and all of the other numbers
are multiples of three, while the target T is not.
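A brute-force sketch in Python makes the search space explicit: with k numbers there are 2^k subsets to try. The list encoding is ours, for illustration.

    from itertools import combinations

    def subset_sum(numbers, target):
        # Check all 2**len(numbers) subsets for one with the right sum.
        return any(sum(subset) == target
                   for k in range(len(numbers) + 1)
                   for subset in combinations(numbers, k))

    print(subset_sum([911, 22, 821, 563, 405, 986, 165, 732], 1173))  # True
    print(subset_sum([831, 357, 63, 987, 117, 81, 6785, 606], 2105))  # False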
2.28 Problem (Knapsack) Given a finite set S whose elements s have a weight w(s) ∈
N+ and a value v(s) ∈ N+ , along with a weight bound B ∈ N+ and a value
target T ∈ N+ , find a subset Sˆ ⊆ S whose elements have a total weight less than
or equal to the bound and total value greater than or equal to the target.
Imagine that we have items to pack in a knapsack and we can carry at most ten
pounds. Can we pack a value of T = 100 or more?

Item a b c d
Weight 3 4 5 6
Value 50 40 10 30

We pack the most value while keeping to the weight limit by taking items (a)
and (b). So we cannot meet the value target.
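The same brute-force idea applies here. This Python sketch encodes each item as a (weight, value) pair, an encoding chosen for the sketch, and checks every subset against both the bound and the target.

    from itertools import combinations

    def knapsack(items, weight_bound, value_target):
        # Brute force over all subsets of the items.
        for k in range(len(items) + 1):
            for subset in combinations(items, k):
                weight = sum(w for w, v in subset)
                value = sum(v for w, v in subset)
                if weight <= weight_bound and value >= value_target:
                    return True
        return False

    items = [(3, 50), (4, 40), (5, 10), (6, 30)]   # the table above
    print(knapsack(items, 10, 100))  # False: the best packing is worth 90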
2.29 Problem (Partition) Given a finite multiset A that has for each of its elements an
associated positive number size s(a) ∈ N+, decide if there is a division of the set
into two halves, Â and A − Â, so that the total of the sizes is the same in both
halves, ∑_{a∈Â} s(a) = ∑_{a∉Â} s(a).

2.30 Example The set A = { I, a, my, go, rivers, cat, hotel, comb } has eight words.
The size of a word, s(σ), is the number of letters. Then Â = { rivers, cat, my, a }
gives ∑_{a∈Â} s(a) = ∑_{a∉Â} s(a) = 12.

2.31 Example The US President is elected by having states send representatives to


the Electoral College. The number of representatives depends in part on the
state’s population. Below are the numbers for the 2020 election; all of a state’s
representatives vote for the same person (we will ignore some fine points). The
Partition Problem asks if a tie is possible.

Reps  No. states  States
 55       1       CA
 38       1       TX
 29       2       FL, NY
 20       2       IL, PA
 18       1       OH
 16       2       GA, MI
 15       1       NC
 14       1       NJ
 13       1       VA
 12       1       WA
 11       4       AZ, IN, MA, TN
 10       4       MD, MN, MO, WI
  9       3       AL, CO, SC
  8       2       KY, LA
  7       3       CT, OK, OR
  6       6       AR, IA, KS, MS, NV, UT
  5       3       NE, NM, WV
  4       5       HI, ID, ME, NH, RI
  3       8       AK, DE, DC, MT, ND, SD, VT, WY

2.32 Problem (Crossword) Given an n × n grid, and a set of 2n -many strings, each of
length n , decide if the words can be packed into the grid.

2.33 Example Can we pack the words AGE, AGO, BEG, CAB, CAD, and DOG into a 3 × 3
grid?
C A B
A G E
D O G

2.34 Animation: Instance of the Crossword problem

2.35 Problem (15 Game) Given an n × n grid holding tiles numbered 1, ..., n^2 − 1,
and a blank, find the minimum number of moves that will put the tile numbers
into ascending order. A move consists of switching a tile with an adjacent blank.
This game was popularized as a toy.

The final three problems may seem inextricably linked, but as we understand
them today, they seem to be different in the big-O behavior of the algorithms to
solve them.
2.36 Problem (Divisor) Given a number n ∈ N, find a nontrivial divisor.
When the numbers are sufficiently large, we know of no efficient algorithm to
find divisors.† However, as is so often the case, at this time we also have no proof
that no efficient algorithm exists.‡ Not all numbers of a given length are equally
hard to factor. The hardest numbers to factor, using the best currently known
techniques, are semiprimes, the product of two prime numbers.
2.37 Problem (Prime Factorization) Given a number n ∈ N, produce its decomposition
into a product of primes.
Factoring seems, as far as we know today, to be hard. What about if you only
want to know whether a number is prime or composite, and don’t care about its
factors?
2.38 Problem (Composite) Given a number n ∈ N, determine if it has any nontrivial
factors; that is, decide if there is a number a that divides n and such that 1 < a < n .
For many years the consensus among experts was that Composite was probably
quite hard.§ One reasonable justification was that, for centuries, many of the
smartest people in the world had worked on composites and primes, and none
of them had produced a fast test. But in 2002, M Agrawal, N Kayal, and N Saxena
proved that primality testing can be done in time polynomial in the number of
digits of the number. This is the AKS primality test.|| Today, refinements of their
technique run in O(n^6).

[Photo: Nitin Saxena (b 1981), Neeraj Kayal (b 1979), Manindra Agrawal (b 1966)]

This dramatically illustrates that, although experts are expert and their opinions
have value, nonetheless they can be wrong. People producing a result that gainsays
established orthodoxy has happened before and will happen again.
In short, one correct proof outweighs any number of expert opinions.

No efficient algorithm is known on a non-quantum computer. ‡ There is no proof despite centuries
of ingenious attacks on the problems by many of the brightest minds of the past, and of today. The
presumed difficulty of this problem is at the heart of widely used algorithms in cryptography. § There
are a number of probabilistic algorithms that are often used in practice that can test primality very
quickly, with an extremely small chance of error. || At the time that they did most of the work, Kayal
and Saxena were undergraduates.

V.2 Exercises

2.39 Name the prime numbers less than one hundred.


2.40 Decide if each is prime. (a) 5 477 (b) 6 165 (c) 6 863 (d) 4 207
(e) 7 689
✓ 2.41 Find a proper divisor of each. (a) 31 221 (b) 52 424 (c) 9 600 (d) 4 331
(e) 877
2.42 We can specify a propositional logic behavior in a truth table and then
produce such a statement in conjunctive normal form.

P Q R | output
F F F |   T
F F T |   T
F T F |   F
F T T |   F
T F F |   T
T F T |   F
T T F |   T
T T T |   F

(a) The two terms P and ¬P are atoms. So are Q , ¬Q , R , and ¬R . Produce a
three-atom clause that evaluates to F only on the F -T -F line.
(b) Produce three-atom clauses for each of the other truth table lines having the
value F on the right.
(c) Take the conjunction of those four clauses and verify that it has the given
behavior.
✓ 2.43 Decide if each formula is satisfiable.
(a) (P ∧ Q) ∨ (¬Q ∧ R)
(b) (P → Q) ∧ ¬((P ∧ Q) ∨ ¬P)

✓ 2.44 Each of the five Platonic solids has a Hamiltonian circuit, as shown.

Hamilton used the fourth, the dodecahedron, for his game. Find a Hamiltonian
circuit for the third and the fifth, the octahedron and the icosahedron. To make
the connections easier to see, below we have grabbed a face in the back of each
solid, and expanded it until we could squash the entire shape down into the plane
without any edge crossings.

[Figure: planar drawings of the octahedron, with vertices 0, ..., 5, and the icosahedron, with vertices 0, ..., 11]
2.45 Give a planar map that requires four colors.


2.46 (a) The Four Color problem requires that the countries be contiguous, that
they not consist of separated regions (that is, components). Give a planar map
that consists of separated regions that requires five colors. (b) We also define
adjacent to mean sharing a border that is an interval, not just a point. Give a
planar map that, without that restriction, would require five colors.
✓ 2.47 Solve Example 2.26.
✓ 2.48 This shows interlocking corporate directorships. The vertices are corporations
and they are connected if they share a member of their Board of Directors (the
data is from 2004).
[Figure: a graph on the companies JP Morgan, Caterpillar, AT&T, Texas Instruments, Ford Motor, Citigroup, AIG, Georgia Pacific, and Haliburton]

(a) Is there a path from AT&T to Ford Motor? (b) Can you get from Haliburton
to Ford Motor? (c) Can you get from Caterpillar to Ford Motor? (d) JP Morgan
to Ford Motor?
✓ 2.49 A popular game extends the Vertex-to-Vertex Path problem by counting the
degrees of separation. Below is a portion of the movie connection graph, where
actors are connected if they have ever been together in a movie.
[Figure: a graph of actors (Elvis Presley, Ed Asner, Kevin Bacon, Meryl Streep, Alec Baldwin, Jay O Sanders, Tara O'Reilley, Andie MacDowell, John Kennedy, Jim Hefferon) connected by movies (Change of Habit, JFK, The River Wild, She's Having a Baby, Northern Borders, Beauty Shop, Cats & Dogs, Shout It Out!)]

A person’s Bacon number is the number of edges connecting them to Bacon, or


infinity if they are not connected. The game Six Degrees of Kevin Bacon asks: is
everyone connected to Kevin Bacon by at most six movies?
(a) What is Elvis’s Bacon number?
(b) John Kennedy’s (no, it is not that John Kennedy)?

(c) Bacon’s?
(d) How many movies separate me from Meryl Streep?
✓ 2.50 This Knapsack instance has no solution when the weight bound is B = 73
and the value target is T = 140.
Item a b c d e
Weight 21 33 49 42 19
Value 50 48 34 44 40

Verify that by brute force, by checking every possible packing attempt.


2.51 Using the data in Example 2.31, decide if there could be a tie in the 2020
Electoral College.
2.52 Find the shortest path in this graph
[Figure: a weighted graph on the vertices q0, ..., q8]

(a) from q 2 to q 7 , (b) from q 0 to q 8 , (c) from q 8 to q 0 .


2.53 The Subset Sum instance with S = { 21, 33, 49, 42, 19 } and target T =
114 has no solution. Verify that by brute force, by checking every possible
combination.
✓ 2.54 What shape is a 3-clique? A 2-clique?
2.55 How many edges does a k -clique have?
✓ 2.56 The Course Scheduling problem starts with a list of students and the classes
that they wish to take, and then finds how many time slots are needed to schedule
the classes. If there is a student taking two classes then those two will not be
scheduled to meet at the same time. Here is an instance: a school has classes
in Astronomy, Biology, Computing, Drama, English, French, Geography, History,
and Italian. After students sign up, the graph below shows which classes have
an overlap. For instance Astronomy and Biology share at least one student while
Biology and Drama do not.

[Figure: the class overlap graph on the vertices A, B, C, D, E, F, G, H, I]

What is the minimum number of class times that we must use? In coloring terms,
we define that classes meeting at the same time are the same color and we ask for
the minimum number of colors needed so that no two same-colored vertices share
an edge. (a) Show that no three-coloring suffices. (b) Produce a four-coloring.
2.57 Some authors define the Satisfiability problem as: given a finite set of propo-
sitional logic statements, find if there is a single input tuple b0 , ... b j−1 , where
each bi is either T or F , that satisfies them all. Show that this is equivalent to the
definition given in Problem 2.9.
✓ 2.58 Find all 3-cliques in this graph.
[Figure: a graph on the vertices v0, ..., v6]

2.59 Is there a 3-clique in this graph? A 4-clique? A 5-clique?

[Figure: a graph on the vertices v0, ..., v6]

2.60 Recall that Vertex Cover inputs a graph G = ⟨N , E ⟩ and a number k ∈ N,


and asks if there is a subset S of at most k vertices such that for each edge at least
one endpoint is an element of S . The Independent Set problem inputs a graph
and a number k̂ ∈ N and asks if there is a subset Sˆ with at least k̂ vertices such
that for each edge at most one endpoint is in Sˆ. The two are obviously related.
(a) In this graph find a vertex cover S with k = 2 elements. Find an independent
set with k̂ = 4 elements.
[Figure: a graph on the vertices v0, ..., v5]

(b) In this graph find a vertex cover with k = 3 elements, and an independent
set with k̂ = 3 elements.
[Figure: a graph on the vertices v0, ..., v5]

(c) In this graph find a vertex cover S with k = 4 elements. Find an independent
set Sˆ with k̂ = 6 elements.

[Figure: a graph on the vertices v0, ..., v9]

(d) Prove that S is a vertex cover if and only if its complement Ŝ = N − S is an
independent set.
✓ 2.61 A college department has instructors A, B , C , D , and E . They need placing
into courses 0, 1, 2, 3, and 4. The available time slots are α , β , γ , δ , and ε . This
shows which instructors can teach which courses, and which courses can run in
which slots.
[Figure: a three-layer graph linking the instructors A–E to the courses 0–4, and the courses to the time slots α–ε]

For example, instructor A can only teach courses 1, 2, and 3. And, course 0
can only run at time α or time δ . Verify that this is an instance of the
Three-dimensional Matching problem and find a match.
2.62 Consider Three Dimensional Matching, Problem 2.22. Let X = { a, b, c },
Y = { b, c, d }, and Z = { a, d, e }.
(a) List all the elements of M = X × Y × Z .
(b) Is there a three element subset M̂ whose triples have the property that no
two of them agree on any coordinate?
2.63 In Example 2.21 the broadcast takes four steps. Can it be done in fewer?

Section V.3 Problems, algorithms, and programs
Now, with many examples in hand, we will briefly reflect on problems and solutions.
We will keep this discussion on an intuitive level only — indeed, many of these
things have no widely accepted precise definition.
A problem is a job, a task. Somewhat more precisely, it is a uniform family of
tasks, typically with an unbounded number of instances. For a sense of ‘family’,
contrast the general Shortest Path problem with that of finding the shortest path
between Los Angeles and New York. The first is a family while the second is an
instance. We are more likely to talk about the family, both because the second is a
special case so that any conclusions about the first subsumes the second, and also
because the first feels more natural.† We are most focused on problems that can be

There are interesting problems with only one task, such as computing the digits of π .

solved with a mechanism, although we continue to be interested to learn that a


problem cannot be solved mechanically at all.
An algorithm describes at a high level an effective way to solve a problem.† An
algorithm is not an implementation, although it should be described in a way that
is detailed enough that implementing it is routine for an experienced professional.
One subtle point about algorithms is that while they are abstractions, they are
nonetheless based on an underlying computing model. An algorithm that is based
on a Turing machine model for adding one to an input would be very different
than an algorithm to do the same task on a model that is like a desktop computer
with registers.
An example of a very different computing model that an algo-
rithm could target is distributed computation. For instance, Science
United is a way for anyone with a computer and an Internet con-
nection to help scientific projects, by donating computing time. These projects
do research in astronomy, physics, biomedicine, mathematics, and environmental
science. Contributors install a free program that runs jobs in the background. This
is massively parallel computation.‡
A program differs from an algorithm in that it is an implementation of an
algorithm, typically expressed in a formal computer language, and often designed
to be executed on a specific computing platform.
To illustrate the differences between the problems, algorithms, and programs,
consider the problem of Prime Factorization. One algorithm is to use brute force,
that is, given an input n > 1, try every number k ∈ (1 .. n) to see if k divides n . We
could implement that algorithm with a program written in Scheme.
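For illustration, here is that brute-force algorithm sketched in Python rather than Scheme. It uses the fact that the smallest k > 1 that divides n is necessarily prime, so collecting successive smallest divisors gives the prime decomposition.

    def prime_factorization(n):
        # Try every candidate divisor k; whenever k divides n it must be
        # prime, so record it and continue with n/k.
        factors = []
        k = 2
        while n > 1:
            if n % k == 0:
                factors.append(k)
                n //= k
            else:
                k += 1
        return factors

    print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]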

Types of problems There are patterns to the problems that we see in the Theory
of Computation. As a first example, a problem type that we have already seen
is a function problem. These ask for an algorithm that has a single output
for each input. An example is the Prime Factorization problem, which takes in a
natural number and returns its prime decomposition, perhaps as a sequence of
pairs, ⟨prime, exponent⟩ . Another example is the problem of finding the greatest
common divisor of two natural numbers, where the input is a pair of natural
numbers and the output is a natural number.
A second common problem type is the optimization problem. These call for a
solution that is best, according to some metric. The Shortest Path problem is one
of these, as is the Minimum Spanning Tree problem.

There is no widely-accepted formal definition of ‘algorithm’. Whatever it is, it fits between ‘mathematical
function’ and ‘computer program’. For example, a ‘sort’ function takes in a set of items and returns
the sorted sequence. This behavior could be implemented using different algorithms: merge sort,
heap sort, etc. In turn, each algorithm could be implemented by many programs, written in different
languages and for different platforms. So the best handle that we have is informal — an ‘algorithm’ is
an equivalence class of programs (i.e., Turing machines), where two programs are equivalent if they do
essentially the same thing. ‡ There are now coming up on a million volunteers offering computing
time. To join them, visit https://scienceunited.org/.

A perhaps less familiar problem type is a search problem. For one of these,
while there may be many solutions in the search space, the algorithm can stop when
it has found any one. A natural example inputs a Propositional Logic statement and
outputs a line in the truth table that witnesses that the statement is satisfiable (or
signals that there is no such line). There may be many such lines but it only needs
to find one. Another example is the problem, “Given a weighted graph, and two
vertices, and a bound B ∈ R, find a path between the vertices that costs less than
the bound.” Still another example is that of finding a B -coloring for a graph, from
among possibly many such colorings. Another example is the Knapsack problem.
This problem type also appeared in the discussion on nondeterminism on page 191,
where we defined that a string is accepted if in the computation tree we could find
at least one accepting branch.
A decision problem is one with a ‘Yes’ or ‘No’ answer.† The first problem
that we saw, the Entscheidungsproblem, is one of these.‡ We have also seen
decision problems in conjunction with the Halting problem, such as the problem of
determining, given an index e , whether ϕ e will output a seven for any input. In this
chapter we saw the problem of deciding whether a given natural number is prime,
the Composite problem, as well as the Clique problem, the Partition problem, and
the Subset Sum problem.
Often a decision problem is expressed as a language decision problem, or
language recognition problem, where we are given some language and asked for an
algorithm to decide if its input is a member of that language. We did lots of these
in the Automata chapter, such as producing a Finite State machine that decides if
an input string is a member of L = {σ ∈ {a, b}* | σ contains at least two b's}, or
proving that no such machine can determine membership in {a^n b^n | n ∈ N}.
This relates to the discussion from the Languages section, on page 147, about
the distinction between deciding a language and recognizing it.
3.1 Definition A language L is decided by a Turing machine, or Turing machine
decidable, if the function computed by that machine is the characteristic function
of the language. The language is recognized, or accepted, by a machine when for
each input σ ∈ B*, if σ ∈ L then the machine returns 1, while if σ ∉ L then either
the machine does not halt or it does not return 1.
Restated, P decides the language L if the machine has this input-output behavior.

    ϕ_P(σ) = 1_L(σ) = { 1  if σ ∈ L
                        0  otherwise

Note in particular that in this case the machine halts for all inputs. Note also that
if a machine recognizes a language then when σ ∉ L, possibly the machine just
does not halt.

Although a decision problem calls for producing a function of a kind, a Boolean function, they are
common enough to be a separate category. ‡ Recall that the word is German for “decision problem”
and that it asks for an algorithm to decide, given a mathematical statement, whether that statement is
true or false.

3.2 Remark One reason that we are interested in language membership decisions
comes from practice. A language compiler must recognize whether a given source
file is a member of the language. Another reason is that Finite State machines can
only do one thing, decide languages, and so to compare these with other machines
we must do so by comparing which languages they can decide. Still another reason
is that in many contexts stating a problem in this way is natural, as we saw with
the Halting problem.
Distinctions between problem types are fuzzy and often we can describe a task
with more than one type. For the task of determining the evenness of a number,
for instance, we could consider the function problem ‘given n , return its remainder
on division by 2’, or the language decision problem of determining membership in
L = {2k | k ∈ N}.
There, the different types are essentially the same. However, sometimes
selecting the problem type that best captures the complexities involved in a task
requires judgment. Consider the task of finding roots of a polynomial. We may
express it as a function problem with ‘given a polynomial p , return the set of its
rational number roots’, or as a language decision problem with ‘decide if a given
⟨p, r ⟩ , belongs to the set of all sequences consisting of a polynomial and one of its
rational roots'. The second option, for which the algorithm just plugs r into p, does
not seem to involve some of the essential difficulty in finding a root, for instance
the problem of distinguishing between a single number that is a double root
and two close numbers that are each single roots.
When we have a choice of problem types, we prefer language decision problems.
It is our default interpretation of ‘problem’ and we will focus on them in the rest of
the book. In addition, we will be sloppy about the distinction between the decision
problem for a language and the language itself; for instance, we will write L for a
problem.
3.3 Example The Satisfiability problem, as stated, is a decision problem. We can
recast it as the problem of determining membership in the language SAT =
{F | F is a satisfiable propositional logic statement}. This recasting is trivial, sug-
gesting that the language recognition problem form is a natural way to describe
the underlying task.
Recasting optimization problems as language decision problems often involves
using a parametrized sequence of languages.
3.4 Example The Chromatic Number problem inputs a graph and returns a minimal
number B ∈ N such that the graph is B -colorable. Recast it by considering the family
of languages, L_B = {G | G has a B-coloring}. If we could solve the decision
problem for those languages then we could compute the minimal chromatic number
by testing B = 1, B = 2, etc., until we find the smallest B for which G ∈ LB .
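In Python the recasting looks like this. Here decides_coloring stands for an assumed decision procedure for the languages L_B; the loop terminates because any graph is B-colorable once B reaches its number of vertices.

    def chromatic_number(graph, decides_coloring):
        # Solve the optimization problem with a decision procedure: test
        # membership in L_1, L_2, ... and return the first B that works.
        b = 1
        while not decides_coloring(graph, b):
            b += 1
        return b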
3.5 Example The Traveling Salesman problem is an optimization problem. Recast it
as a sequence of language decision problems as above: consider a parameter B ∈ N

and define TS_B = {G | the graph G has a circuit of length no more than B}.
For a task, we want to state it as a problem in a way that captures the essential
difficulty. In particular, these recastings of optimization problems preserve
polytime solvability. For instance, if there were a power k ∈ N such that for each B
we could solve TS_B in time O(n^k) then looping through B = 1, B = 2, etc., will
solve the Traveling Salesman problem in polytime, namely time O(n^(k+1)).
Statements and representations To be complete, the description of a problem
must include the form of the inputs and outputs. For instance, if we state a problem
as: ‘input two numbers and output their midpoint’ then we have not fully specified
what needs to be done. The input or output might use strings representing decimal
numbers, or might be in binary, or even might be in unary, where the number n is
represented with n -many 1’s.†
The representation of the input and output data both matters and doesn’t
matter. One sense in which it matters is that the input’s form can change the
algorithm that we choose, or its runtime behavior. Suppose for instance that
we must decide whether a number is divisible by four. If the input is a bitstring
representing the number in binary then the algorithm is immediate: a number
is divisible by four if and only if its final two bits are 00.‡ In contrast, if the
number is represented in unary then we may scan the 1's, keeping track of the
current remainder modulo 4.
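Both algorithms are short enough to sketch in Python; the string encodings are the two just described.

    def divisible_by_four_binary(bits):
        # In binary, a number is divisible by four exactly when its final
        # two bits are 00 (taking the string '0', zero itself, as divisible).
        return bits == '0' or bits.endswith('00')

    def divisible_by_four_unary(ones):
        # In unary, scan the 1's while tracking the remainder modulo 4.
        remainder = 0
        for _ in ones:
            remainder = (remainder + 1) % 4
        return remainder == 0

    print(divisible_by_four_binary('10100'))   # 20 in binary: True
    print(divisible_by_four_unary('1' * 20))   # 20 in unary: True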
However, the representation doesn’t matter in the sense that if we have an
algorithm for one representation in hand then we can solve the problem for other
representations by translating to what the algorithm expects, and then applying
that algorithm. For example, for the divisible-by-four problem we could handle
unary inputs by converting them to binary and then applying the binary algorithm.§
The same applies to output representations.
Another way in which the representation doesn’t matter is that typically we
find that the costs of different representations don’t change the Big O runtime
behavior. For example we might have a graph algorithm whose run time is not
large at all, O(n lg n). Even for this minimal time, we can find a representation
for the input graphs, such as where inputting takes O(n) time, that leaves the
algorithm analysis conclusion unchanged at O(n lg n).
With this in mind, we will adopt the point of view, which we shall call Lipton’s
Thesis, that everything of interest can be represented with reasonable efficiency by
bitstrings.|| This applies to all of the mathematical problems stated earlier. But it

An experienced programmer may have the reaction that unary is wasteful — if the input is one
thousand then this representation would require the machine to take a thousand steps just to read
the input, while in binary notation the input is 1111101000, which takes only ten steps to read. But
unary is not completely useless; we have found that it suited our purpose when we simply wanted to
illustrate Turing machines. In any event, it certainly is possible. ‡ Thus, on a Turing machine, if when
the machine starts the head is under the final character, then the machine does not even need to read
the entire input to decide the question. The algorithm runs in time independent of the input length.
§
That is, the unary case reduces to the binary one. || ‘Reasonable’ means that it is not so inefficient as
to greatly change the big-O behavior.

also applies to cases that may seem less natural, such as that we can use bitstrings
to faithfully represent Beethoven’s 9th Symphony, or an exquisite Old Master.†

3.6 Figure: Basket of Fruit by Caravaggio (1571–1610)

Consequently, in practice researchers often do not mention representations.


We may describe the Shortest Path problem as, “Given a weighted graph and two
vertices . . .” in place of the more complete, “Given the following reasonably efficient
bitstring representation of a weighted graph G and vertices v 0 and v 1 , . . .” Outside
of this discussion we also do this,‡ leaving implementation details to a programmer.
(When we do discuss representations, we write str(x) for a convenient, reasonably
efficient, bitstring representation of x .§ ) Basically, the representation details do
not affect the outcome of our analysis, much.

3.7 Remark There is a caveat. We have seen that conflating {n ∈ N | n is prime} with
{σ ∈ B* | σ represents a prime number} can cause confusion.
between thinking of an algorithm as inputting a number and thinking of it as
inputting the string representation of a number is the basis for describing the
Big O behavior of that algorithm as pseudopolynomial. This is because the binary
representation of a number n takes O(lg n) bits and so inputting it takes O(lg n)
ticks.


This is in some ways like Church’s Thesis. We cannot prove it, but our experience with digital
reproduction of music, movies, etc., finds that it is so. ‡ Naturally some exercises in this section cover
representations. § Many authors use diamond brackets to stand for a representation, as in ‘ ⟨G, v 0, v 1 ⟩ ’.
Here, we reserve diamond brackets for sequences.

V.3 Exercises
✓ 3.8 What is the difference — speaking informally, since some of these do not have
formal definitions — between an algorithm and: (a) a heuristic, (b) pseudocode,
(c) a Turing machine, (d) a flowchart, and (e) a process?
3.9 So, if a problem is essentially a set of strings, what constitutes a solution?
3.10 What is the difference between a decision problem and a language decision
problem?
3.11 As an illustration of the thesis that even surprising things can be represented
reasonably efficiently and with reasonable fidelity in binary, we can do a simple
calculation. (a) At 30 cm, the resolution of the human eye is about 0.01 cm.
How many such pixels are there in a photograph that is 21 cm by 30 cm?
(b) We can see about a million colors. How many bits per pixel is that?
(c) How many bits for the photo, in total?
3.12 Name something important that cannot be represented in binary.
✓ 3.13 True or false: any two programs that implement the same algorithm must
compute the same function. What about the converse?
3.14 Some tasks are hard to express as a language decision problem. Consider
sorting the characters of a string into ascending order. Briefly describe why each
of these language decision problems fails to capture the task’s essential difficulty.
(a) {σ ∈ Σ* | σ is sorted}  (b) {⟨σ, p⟩ | p is a permutation that orders σ}

✓ 3.15 Sketch an algorithm for each language decision problem.


(a) L0 = {⟨n, m⟩ ∈ N^2 | n + m is a square and one greater than a prime}
(b) L1 = {σ ∈ {0, ..., 9}* | σ represents in decimal a multiple of 100}
(c) L2 = {σ ∈ B* | σ has more 1's than 0's}
(d) L3 = {σ ∈ B* | σ^R = σ}

3.16 Solve the language decision problem for (a) the empty language, (b) the
language B, and (c) the language B∗.
3.17 For each language, sketch an algorithm that solves the language decision
problem.
(a) {σ ∈ B* | σ matches the regular expression a*ba*}

(b) The language defined by this grammar


S → AB
A → aA | ε
B → bB | ε
3.18 Solve each decision problem about Finite State machines, M, by producing
an algorithm.
(a) Given M, decide if the language accepted by M is empty.
(b) Decide if the language accepted by M is infinite.
(c) Decide if L(M) is the set of all strings, Σ∗ .

3.19 For each language decision problem, give an algorithm that runs in O(1).
(a) The language of minimal-length binary representations of numbers that are
nonzero.
(b) The binary representations of numbers that exceed 1000.
3.20 In a graph, a bridge edge is one whose removal disconnects the graph. That
is, there are two vertices that before the bridge is removed are connected by a
path, but are not connected after it is removed. (More precisely, a connected
component of a graph is a set of vertices that can be reached from each other by
a path. A bridge edge is one whose removal increases the number of connected
components.) The problem is: given a graph, find a bridge. Is this a function
problem, a decision problem, a language decision problem, a search problem, or
an optimization problem?
✓ 3.21 For each, give the categorization that best applies: a function problem,
a decision problem, a language decision problem, a search problem, or an
optimization problem. (a) The Graph Connectedness problem, which inputs
a graph and decides whether for any two vertices there is a path between
them. (b) The problem that inputs two natural numbers and returns their
least common multiple. (c) The Graph Isomorphism problem that inputs two
graphs and determines whether they are isomorphic. (d) The problem that
takes in a propositional logic statement and returns an assignment of truth
values to its inputs that makes the statement true, if there is such an assignment.
(e) The Nearest Neighbor problem that inputs a weighted graph and a vertex,
and returns a vertex nearest the given one that does not equal the given one.
(f) The Discrete Logarithm problem: given a prime number p and two numbers
a, b ∈ N, determine if there is a power k ∈ N so that ak ≡ b (mod p). (g) The
problem that inputs a bitstring and decides if the number that it represents in
binary will, when converted to decimal, contain only odd digits.
✓ 3.22 For each, give the characterization that best applies: a function problem,
a decision problem, a language decision problem, a search problem, or an
optimization problem.
(a) The 3-Satisfiability problem, Problem 2.10
(b) The Divisor problem, Problem 2.36
(c) The Prime Factorization problem, Problem 2.37
(d) The F − SAT problem, where the input is a propositional logic expression
and the output is either an assignment of T and F to the expression’s variables
that makes it evaluate to T , or the string None.
(e) The Composite problem, Problem 2.38
3.23 Express each task as a language decision problem. Include in the description
explicit mention of the string representation.
(a) Decide whether a number is a perfect square.
(b) Decide whether a triple ⟨x, y, z⟩ ∈ N3 is a Pythagorean triple, that is, whether
x 2 + y2 = z2.

(c) Decide whether a graph has an even number of edges.


(d) Decide whether a path in a graph has any repeated vertices.
✓ 3.24 Describe how to answer each as a language decision problem. Include explicit
mention of the string representation.
(a) Given a natural number, do its factors add to more than twice the number?
(b) Given a Turing machine and input, does the machine halt on the input in
less than ten steps?
(c) Given a propositional logic statement, are there three different assignments
that evaluate to T ? That is, are there more than three lines in the truth table
that end in T ?
(d) Given a weighted graph and a bound B ∈ R, for any two vertices is there a
path from one to the other with total cost less than the bound?
3.25 Recast each in language decision terms. Include explicit mention of the
string representation. (a) Graph Colorability, Problem 2.7, (b) Euler Circuit,
Problem 2.4, (c) Shortest Path, Problem 2.5.
3.26 Restate the Halting problem as a language decision problem.
✓ 3.27 As stated, the Shortest Path problem, Problem 2.5, is an optimization problem.
Convert it into a parametrized family of decision problems. Hint: use the technique
outlined following the Traveling Salesman problem, Problem 2.3.
✓ 3.28 Express each optimization problem as a parametrized family of language
decision problems.
(a) Given a 15 Game board, find the least number of slides that will solve it.
(b) Given a Rubik’s cube configuration, find the least number of moves to solve
it.
(c) Given a list of jobs that must be accomplished to assemble a car, along with
how long each job takes and which jobs must be done before other jobs, find
the shortest time to finish the entire car.
3.29 As stated, the Hamiltonian Circuit problem is a decision problem. Give a
function version of this problem. Also give an optimization version.
3.30 The different problem types are related. Each of these inputs a square
matrix M with more than 3 rows, and relates to a 3 × 3 submatrix (form the
submatrix by picking three rows and three columns, which need not be adjacent).
Characterize each as a function problem, a decision problem, a search problem,
or an optimization problem.
(a) Find a submatrix that is invertible.
(b) Decide if there is an invertible submatrix.
(c) Return a submatrix that is invertible, or the string ‘None’.
(d) Return a submatrix whose determinant has the largest absolute value.
Also give a language for an associated language decision problem.
3.31 Convert each function problem to a matching decision problem.
(a) The problem that inputs two natural numbers and returns their product.

(b) The Nearest Neighbor problem, that inputs a weighted graph and a vertex
and returns the vertex nearest the given one, but not equal to it.
3.32 The Linear Programming problem starts with a list of linear inequalities
a_{i,0}·x_0 + · · · + a_{i,n−1}·x_{n−1} ≤ b_i for a_{i,0}, ..., a_{i,n−1}, b_i ∈ Q, and it looks for a sequence
⟨s_0, ..., s_{n−1}⟩ ∈ Q^n that is feasible, in that substituting the number s_j for the variable
x_j makes each inequality true. Give a version that is a (a) language decision
problem, (b) search problem, (c) function problem, and (d) optimization
problem. (For some parts there is more than one sensible answer.)
3.33 An independent set in a graph is a collection of vertices such that no
two are connected by an edge. Give a version of the problem of finding an
independent set that is a (a) a decision problem, (b) language decision problem,
(c) search problem, (d) function problem, and (e) optimization problem. (For
some parts there is more than one reasonable answer.)
3.34 Give an example of a problem where the decision variant is solvable quickly,
but the search variant is not.
3.35 Let L_F = {⟨n, B⟩ ∈ N^2 | there is an m ∈ {1, ..., B} that divides n} and consider
its language decision problem.


(a) Show that ⟨d, B⟩ ∈ LF if and only if B is greater than or equal to the least
prime factor of d .
(b) Conclude that you can use a solution to the language recognition problem to
solve the search problem of, given a number, returning a prime factor of that
number.
✓ 3.36 Show how to use an algorithm that solves the Shortest Path problem to
solve the Vertex-to-Vertex Path problem. How to use it on graphs that are not
weighted?
✓ 3.37 Show that with an algorithm that quickly solves the Subset Sum problem,
Problem 2.25, we can quickly solve the associated function problem of finding the
subset.
3.38 Show how to use an algorithm that solves Vertex-to-Vertex Path problem
to solve the Graph Connectedness problem, which inputs a graph and decides
whether that graph is connected, so that for any two vertices there is a path
between them.

Section V.4 P
Recall that we usually are not careful to distinguish between a language L and the
problem of deciding which strings are in that language.
4.1 Definition A complexity class is a collection of languages.
The term ‘complexity’ is there because these collections are often associated
with some resource specification, so that a class consists of the languages that

are accepted by a mechanical computer whose use of some resource fits the
specification.†
4.2 Example One complexity class is the collection of languages for which there is a
deciding Turing machine that runs in time O(n^2). That is, C = {L_0, L_1, ...}, where
each L_j is decided by some machine P_{i_j} which, when given input σ, finishes
within |σ|^2 steps.
4.3 Example Another is the collection of languages accepted by Turing machines that
use only logarithmic space. So, when the accepting machine gets input σ, it
halts after visiting a number of tape squares less than or equal to lg(|σ|).
Some points bear development. As to the computing machine, researchers
study not just Turing machines but other types of machines as well, including
nondeterministic Turing machines, and Turing machines with access to an oracle
for random numbers. The resource specification often involves bounds on the time
and space behavior. But it could instead be, for instance, the complement of
O(n^2), so it isn't always a bound.‡

Definition The complexity class that we introduce now is the most important one.
It is the collection of problems that under Cobham’s Thesis we take to be tractable.
4.4 Definition A language decision problem is a member of the class P if there is an
algorithm for it that on a deterministic Turing machine runs in polynomial time.

4.5 Example One problem that is a member of P is that of deciding whether a given
graph is connected.

{G | for any two vertices v0, v1 ∈ G, there is a path from one to the other}

To verify this, we must produce an algorithm that decides membership in this


language, and that runs in polynomial time. The natural approach is to do a
breadth first search or depth first search of the graph. The runtime of both is
bounded by O(|N|^3).
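Here is a breadth first search sketch in Python; the adjacency-list encoding of the graph is an assumption of the sketch.

    from collections import deque

    def is_connected(vertices, neighbors):
        # Search outward from an arbitrary start vertex; the graph is
        # connected exactly when the search reaches every vertex.
        vertices = set(vertices)
        if not vertices:
            return True
        start = next(iter(vertices))
        reached = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in reached:
                    reached.add(w)
                    queue.append(w)
        return reached == vertices

    neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
    print(is_connected([0, 1, 2, 3], neighbors))  # False: vertex 3 is isolated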
4.6 Example Another is the problem of deciding whether two natural numbers are
relatively prime.

{⟨n0, n1⟩ ∈ N^2 | the greatest common divisor is 1}


Again, to verify that this language is a member of P we produce an algorithm that


determines membership, and that runs in polytime. Euclid's algorithm solves this
problem, with runtime O(lg(max(n0, n1))).
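Euclid's algorithm is short enough to give in full; this Python sketch decides membership directly.

    def relatively_prime(n0, n1):
        # Euclid's algorithm: repeatedly replace the pair by the smaller
        # number and the remainder. The loop runs in time polynomial in
        # the number of digits of the inputs.
        while n1 != 0:
            n0, n1 = n1, n0 % n1
        return n0 == 1

    print(relatively_prime(35, 64))  # True
    print(relatively_prime(35, 63))  # False: the gcd is 7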

There are other definitions of complexity class. Some authors make it a requirement that in a class
the languages can be computed under some resource specification. This has implications — if all of the
members of a class must be computable by Turing machines then each class is countable. Here, we only
say that it is a collection, so our definition is maximally general. ‡ At this writing there are 545 studied
classes but the number changes frequently; see the Complexity Zoo, Section 4.

4.7 Example Still another problem in P is the String Search problem, to decide
substring-ness.
{⟨σ, τ⟩ ∈ Σ* × Σ* | σ is a substring of τ}

Often τ is very long and is called the haystack while σ is short and is the needle. The
algorithm that first tests σ at the initial character of τ , then at the next character,
etc., has a runtime of O(|σ| · |τ|), which is O(max(|σ|, |τ|)^2).
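Here is that algorithm as a Python sketch.

    def is_substring(needle, haystack):
        # Test the needle at each starting position of the haystack: at
        # most |haystack|-many tests of at most |needle| characters each.
        for start in range(len(haystack) - len(needle) + 1):
            if haystack[start:start + len(needle)] == needle:
                return True
        return False

    print(is_substring('ana', 'bananas'))  # True
    print(is_substring('anas', 'banana'))  # False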

4.8 Example A circuit is a directed acyclic graph. Each vertex, called a gate, is labeled
with a two input/one output Boolean function. The only exception is that the
vertices on the left are the input gates that provide source bits, b0 , b1 , b2 , b3 ∈ B.
Edges are called wires. Below, ∧ is the boolean function 'and', ∨ is 'or', ⊕ is
'exclusive or', and ≡ is the negation of 'exclusive or', which returns 1 if and only if
the two input bits are the same.

[Figure: a circuit with input gates b0, b1, b2, b3, internal ⊕, ∧, and ≡ gates, and output f(b0, b1, b2, b3)]

This circuit returns 1 if the sum of the input bits is a multiple of 3. The
Circuit Evaluation problem inputs a circuit like this one and computes the out-
put, f (b0 , b1 , b2 , b3 ). This problem is a member of P.
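A topological pass over the gates evaluates a circuit in time linear in its size, which is why the problem is in P. In this Python sketch the encoding of a circuit, a list of (name, op, left, right) gates in topological order, is our assumption, and the small example wired up at the end is made up for illustration; it is not the circuit in the figure.

    from operator import and_, xor

    def evaluate_circuit(inputs, gates):
        # inputs maps input-gate names to bits; each gate applies a
        # two-input Boolean function to earlier values. Return the value
        # of the last gate listed.
        value = dict(inputs)
        for name, op, left, right in gates:
            value[name] = int(op(value[left], value[right]))
        return value[gates[-1][0]]

    def eqv(x, y):   # the ≡ gate: 1 exactly when its input bits agree
        return 1 - (x ^ y)

    gates = [('g0', xor, 'b0', 'b1'),
             ('g1', and_, 'b2', 'b3'),
             ('g2', eqv, 'g0', 'g1')]
    print(evaluate_circuit({'b0': 1, 'b1': 0, 'b2': 1, 'b3': 1}, gates))  # 1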

4.9 Example Although polytime is a restriction, nonetheless P is a very large collection.


More example members: (1) matrix multiplication, taken as a language decision
problem for {⟨σ0, σ1, σ2⟩ | they represent matrices with M0 · M1 = M2}, (2) minimum
spanning tree, {⟨G, T⟩ | T is a minimum spanning tree in G}, and (3) edit distance,
the number of single-character removals, insertions, or substitutions needed to
transform between strings, {⟨σ0, σ1, n⟩ | σ0 transforms to σ1 in at most n edits}.

[Figure: nested layers labeled from Slow at the top to Fast at the bottom]

4.10 Figure: This shows all language decision problems, all L ⊆ B∗ . Shaded is P. The
shading is in layers to depict that P contains problems with a solution algorithm that
is O(1), problems with an algorithm that is O(n), ones with a O(n 2 ) algorithm, etc.

Two final observations. First, if a problem is solved by an algorithm that is



O(lg n) then that problem is in P. Second, the members of P are problems, so it is


wrong to say that an algorithm is in P.

Effect of the model of computation A problem is in P if it has an algorithm that


is polytime. But algorithms are based on an underlying computing model. Is
membership in P dependent on the model that we use?
In particular, our experience with Turing machines gives the sense that they
involve a lot of tape moving. So we may expect that algorithms directed at
Turing machine hardware are slow. However, close analysis with a wide range of
alternative computational models proposed over the years shows that while Turing
machine algorithms are often slower than related algorithms for other natural
models, it is only by a factor of between n^2 and n^4.† That is, if we have a problem
for which there is an O(n) algorithm on another model then we may find that on a
Turing machine model it is O(n^3), or O(n^4), or O(n^5). So it is still in P.
A variation of Church’s thesis, the Extended Church’s Thesis, posits that not
only are all reasonable models of mechanical computation of equal power, but in
addition that they are of equivalent speed in that we can simulate any reasonable
model of computation‡ in polytime on a probabilistic Turing machine.§ Under the
extended thesis, a problem that falls in the class P using Turing machines also falls
in that class using any other natural models. (Note, however, that this thesis does
not enjoy anything like the support of the original Church’s Thesis. Also, we know
of several problems, including the familiar Prime Factorization problem, that under
the Quantum Computing model have algorithms with polytime solutions, but for
which we do not know of any polytime solution in a non-quantum model. So
the Quantum Computing model would provide a counterexample to the extended
thesis, if we can produce physical devices matching that model.)
4.11 Remark Breaking news! Recently engineers at Google claimed to have achieved
Quantum Supremacy, to have solved a problem using an algorithm running on a
physical quantum computer that is not known to be solvable on a Turing machine
or RAM machine in less than centuries.
Now, there are reservations. For one, the claim is the subject of scholarly
controversy. For another, on its face, this is not general purpose computing; the
problem solved is exotic. Whether quantum computers will ever be practical
physical devices used for everyday problems is not at this moment clear, although
scientists and engineers are making great progress. For our purposes we put this
aside, but we will watch events with great interest.

Naturalness We will give the class P a lot of attention because there are reasons
to think that it is the collection that best captures the notion of problems that have
a feasible solution.

We take a model to be ‘natural’ if it was not invented in order to be a counterexample to this. ‡ One
definition of ‘reasonable’ is “in principle physically realizable” (Bernstein and Vazirani 1997). § A Turing
machine with a random oracle.

The first reason echoes the prior subsection. There are many models of compu-
tation, including Turing machines, RAM machines, and Racket programs. All of
them compute the same set of functions as Turing machines, by Church’s Thesis,
but they do so at different speeds. However, while the speeds differ, all these
models run within polytime of each other.† That makes P invariant under the
choice of computing model: if a problem is in P for any model then it is in P for all
of these models. The fact that Turing machines are our standard is in some ways a
historical accident, but differences between the runtime behavior of any of these
models is lost in the general polynomial sloppiness.
Another reason that P is a natural class is that we'd like that if two things, f
and g, are easy to compute then a simple combination of the two is also easy. More
precisely, fix total functions f, g : N → N and consider these.

    L_f = {str(⟨n, f(n)⟩) ∈ B* | n ∈ N}      L_g = {str(⟨n, g(n)⟩) ∈ B* | n ∈ N}


(Recall that str(...) means that we represent the argument reasonably efficiently
as a bitstring). With that recasting, P is closed under function addition, scalar
multiplication by an integer, subtraction, multiplication, and composition. It is
also closed under language concatenation, and the Kleene star operator. It is the
smallest nontrivial class with these appealing closure properties.
But the main reason that P is our candidate is Cobham’s Thesis, the contention
that the formalization of ‘tractable problem’ should be that it has a solution
algorithm that runs in polynomial time. We discussed this on page 269; a person
may object that polytime is too broad a class to capture this idea because a problem
whose solution algorithm cannot be improved below a runtime of O(n^1 000 000) is
really not feasible or tractable. Further, using diagonalization we can produce such
problems. However, the problems produced in that way are artificial, and empirical
experience over close to a century of computing is that problems with solution
algorithms of very large degree polynomial time complexity do not seem to arise
in practice. We see plenty of problems with solution algorithms that are O(n lg n),
or O(n 3 ), and we see plenty of problems that are exponential, but we just do not
see O(n^1 000 000).
Moreover, often in the past when a researcher has produced an algorithm for a
problem with a runtime that is even a moderately large degree polynomial, then,
with this foot in the door, over the next few years the community brings to bear an
array of mathematical and algorithmic techniques that bring the runtime degree
down to reasonable size.
Even if the objection to Cobham’s Thesis is right and P is too broad, it would
nonetheless still be useful because if we could show that a problem is not in P
then we would have shown that it has no solution algorithm that is practical.‡

All of the non-quantum natural models. ‡ This argument has lost some of its force in recent years
with the rise of SAT solvers. These algorithms attack problems believed to not be in P, and can solve
instances of the problems of moderately large size, using only moderately large computing resources.
See Extra B.

(This is like in the first and second chapter where we considered Turing machine
computations that are unbounded. Showing that something is not solvable even
for an unbounded computation also shows that it is not solvable within bounds.)
So Cobham’s Thesis, to this point, has held up. Insofar as theory should be a
guide for practice, this is a compelling reason to use P as a benchmark for other
complexity classes.

V.4 Exercises
✓ 4.12 True or False: if the language is finite then the language decision problem is
in P.
✓ 4.13 Your coworker says something mistaken, “I’ve got a problem whose algorithm
is in P.” They are being a little sloppy with terms; how?
✓ 4.14 What is the difference between an order of growth and a complexity class?
✓ 4.15 Your friend says to you, “I think that the Circuit Evaluation problem takes
exponential time. There is a final vertex. It takes two inputs, which come from
two vertices, and each of those take two inputs, etc., so that a five-deep circuit
can have thirty two vertices.” Help them see where they are wrong.
4.16 In class, someone says to the professor, “Why aren’t all languages in P
according to your definition? I’ll design a Turing machine P so that no matter
what the input is, it outputs 1. It only needs one step, so it is polytime for sure.”
Perhaps you are not able rightly to apprehend the kind of confusion of ideas that
could provoke such a question, but give a calm explanation of how it is mistaken.
4.17 True or false: if a language is accepted by a machine then its complement is
also accepted by a machine.
✓ 4.18 Show that the decision problem for {σ ∈ B* | σ = τ^3 for some τ ∈ B*} is
in P.
✓ 4.19 Show that the language of palindromes, {σ ∈ B* | σ = σ^R}, is in P.

4.20 Sketch a proof that each problem is in P.


(a) The τ^3 problem: given a bitstring σ, decide if it has the form σ = τ⌢τ⌢τ.
(b) The problem of deciding which Turing machines halt within ten steps.
✓ 4.21 Consider the problem of Triangle: given an undirected graph, decide if it has
a 3-clique, three vertices that are mutually connected.
(a) Why is this not the Clique problem, from page 281?
(b) Sketch a proof that this problem is in P.
✓ 4.22 Prove that each problem is in P by citing the runtime of an algorithm that
suits.


(a) Deciding the language {σ ∈ {a, ..., z}* | σ is in alphabetical order}.
(b) Deciding the language of correct sums, {⟨a, b, c⟩ ∈ N^3 | a + b = c}.
(c) Analogous to the prior item, deciding this language of triples of matrices
that give correct products, {⟨A, B, C⟩ | the matrices are such that AB = C}.
(d) Deciding the language of primes, {1^k | k is prime}.
(e) Reachable nodes: {⟨G, v0, v1⟩ | the graph G has a path from v0 to v1}.
4.23 Find which of these are currently known to be in P and which are not.
Hint: you may need to look up what is the fastest known algorithm. (a) Shortest Path
(b) Knapsack (c) Euler Path (d) Hamiltonian Circuit
4.24 The problem of Graph Connectedness is: given a finite graph, decide if there
is a path from any vertex to any other. Sketch an argument that this problem is
in P.
4.25 Following the definition of complexity class, Definition 4.1, is a discussion of
the additional condition of being computed by some machine under a resource
specification, such as a Big O constraint on time or space.
(a) Show that the set of regular languages forms a complexity class, and that it
meets this additional constraint.
(b) The definition of P uses Turing machines. We can view a Finite State machine
as a kind of Turing machine, one that consumes its input one character at a
time, never writes to the tape, and, depending on the state that the machine
is in when the input is finished, prints 0 or 1. With that, argue that any
regular language is an element of P.
4.26 We have already studied the collection RE of languages that are computably
enumerable.
(a) Recast RE as a class of language decision problems.
(b) Following Definition 4.1 is a discussion of the additional condition of being
computed by a machine under a resource specification. Show that RE also
satisfies this condition.
4.27 Is P countable or uncountable?
4.28 If L0 , L1 ∈ P and L0 ⊆ L ⊆ L1 , must L be in P?
✓ 4.29 Is the Halting problem in P?
4.30 A common modification of the definition of Turing machine designates one
state as an accepting state. Then the machine decides the language L if it halts
on all input strings, and L is the set of strings that such that the machine ends
in the accepting state. A language is decidable if it is decided by some machine.
Prove that every language in P is decidable.
✓ 4.31 Draw a circuit that inputs three bits, b0 , b1 , b2 ∈ B, and outputs the value of
b0 + b1 + b2 (mod 2).
4.32 Prove that the union of two complexity classes is also a complexity class.
What about the intersection? Complement?
✓ 4.33 Prove that P is closed under the union of two languages. That is, prove that
if two languages are both in P then so is their union. Prove the same for the
union of finitely many languages.

4.34 Prove that P is closed under complement. That is, prove that if a language is
in P then so is its set complement.
4.35 Prove that the class of languages P is closed under reversal. That is, prove
that if a language is an element of P then so is the reversal of that language
(which is the language of string reversals).
4.36 Show that P is closed under the concatenation of two languages.
4.37 Show that P is closed under Kleene star, meaning that if L ∈ P then L∗ ∈ P.
(Hint: σ ∈ L∗ if σ = ε , or σ ∈ L, or σ = α ⌢ β for some α, β ∈ L∗ )
4.38 Show that this problem is unsolvable: give a Turing machine P , decide
whether it runs in polytime on the empty input. Hint: if you could solve this
problem then you could solve the Halting problem.
4.39 There are studied complexity classes besides those associated with language
decision problems. The class FP consists of the binary relations R ⊆ N² where
there is a Turing machine that, given input x ∈ N, can in polytime find a y ∈ N
where ⟨x, y⟩ ∈ R.
(a) Prove that this class is closed under function addition, multiplication by a
scalar r ∈ N, subtraction, multiplication, and function composition.
(b) Where f : N → N is computable, consider this decision problem associated
with the function, Lf = { str(⟨n, f (n)⟩) ∈ B∗ | n ∈ N } (where the numbers
are represented in binary). Assume that we have two functions f 0 , f 1 : N → N
such that Lf0 , Lf1 ∈ P. Show that the natural algorithm to check for closure
under function addition is pseudopolynomial.
4.40 Where L0 , L1 ⊆ B∗ are languages, we say that L1 ≤p L0 if there is a function
f : B∗ → B∗ that is computable, total, that runs in polytime, and so that σ ∈ L1
if and only if f (σ ) ∈ L0 . Prove that if L0 ∈ P and L1 ≤p L0 then L1 ∈ P.

Section
V.5 NP

Recall that a machine is nondeterministic if from a present configuration it may
pass to zero, one, or more than one next configuration. Here is a nondeterministic
Turing machine.

P = {q 0 01q 2 , q 0 0Rq 1 , q 1 0Bq 1 , q 1 01q 3 , q 2 11q 2 , q 3 10q 3 }

For these machines, an input triggers a computational history that is a tree.



[Figure: the computation tree for P on an input, with configurations joined by ⊢ steps and branching where more than one instruction applies; one branch is highlighted.]

The highlighted edges show one branch of that tree.


Recall also that we have two mental models of how these devices operate. The
first is that the machine is unboundedly parallel, so it simultaneously computes all
of the tree’s branches. The second is that the machine guesses which branch in
the tree to follow — or is told by some demon — and then deterministically verifies
that branch.
With the first model, an input string is accepted if at least one of the triggered
branches ends in an accepting state. With the second model, the input is accepted if
there is a sequence of guesses that the machine could make, consistent with the
input, that ends in an accepting state.

Nondeterministic Turing machines This modifies the definition of a Turing


machine by changing the transition function so that it outputs sets.
5.1 Definition A nondeterministic Turing machine P is a finite set of instructions
qₚTₚTₙqₙ ∈ Q × Σ × (Σ ∪ { L, R }) × Q, where Q is a finite set of states and Σ is
a finite set of tape alphabet characters, which contains at least two members,
including blank, and does not contain the characters L or R. Some of the states,
A ⊆ Q , are accepting states, while others, R ⊆ Q − A, are rejecting states. The
association of the present state and tape character with what happens next is
given by the transition function, ∆ : Q × Σ → P ((Σ ∪ { L, R }) × Q).
After the definition of Turing machine, we gave a description of how these
machines act, as a sequence of ‘⊢’ steps from an initial configuration that involves
the input string. Exercise 5.37 asks for a similar description for these machines.
When the nondeterministic machine is a Turing machine rather than a Finite
State machine, some wrinkles appear. The computation tree might have
branches that don’t halt, or that compute differing outputs. The simplest approach
is to not describe a function computed by a nondeterministic machine, but instead
do what we did with Finite State machines and describe when the input is accepted.
5.2 Definition A nondeterministic Turing machine accepts an input string if there
is at least one branch in the computation tree, one sequence of valid transitions
from the starting configuration, that ends in an accepting state. The machine
rejects the input if every branch ends in a reject state.
Note the asymmetry, that acceptance requires only one accepting branch while

rejecting requires that every branch rejects.†


5.3 Definition A nondeterministic Turing machine P decides the language L if
for every input string, when the string is a member of L then P accepts it, and
when the string is not a member of L then P rejects it. A nondeterministic Turing
machine recognizes the language L if for every input string, when the string is a
member of L then P accepts it, and when the string is not a member of L then P
does not accept it.
For Finite State machines, nondeterminism does not make any difference, in
that a language is recognized by a nondeterministic Finite State machine if and
only if it is recognized by some deterministic Finite State machine. But Pushdown
machines are different: there are jobs that a nondeterministic Pushdown machine
can do but that cannot be done by any deterministic machine.
5.4 Lemma Deterministic and nondeterministic Turing machines decide the same
languages. They also recognize the same languages.
Proof A deterministic Turing machine is a special case of a nondeterministic one.
So if there is a deterministic machine that decides a language, or recognizes a
language, then there is a nondeterministic one that does also.
Suppose that a nondeterministic Turing machine P decides a language L.
We will define a deterministic machine Q that decides the same language. This
machine does a breadth-first search of P ’s computation tree. Fix an input σ . If
P accepts σ then the search done by Q will eventually find that, and then Q
accepts σ . If P does not accept σ then every branch in its computation tree halts,
in a rejecting state. We claim that there is a longest such branch. Otherwise,
the computation tree would have an infinite branch (by König’s Lemma, which is
included in Exercise 5.36), which would contradict that P halts on every branch.
So, the breadth-first search done by Q will eventually find that all branches reject,
and then Q rejects σ .
We will not use the ‘recognizes’ part below so it is left as part of Exercise 5.36.
That proof is, basically, time-slicing. With the machines that are on our desks
and in our pockets, we simulate an unboundedly-parallel computer by having the
CPU switch among processes, giving each enough time to make some progress
without starving other processes. This is a kind of dovetailing. The user perceives
that many things are happening at once, although actually there is only one, or at
least a limited number of,‡ simultaneous physical processes.
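To make the simulation concrete, here is a minimal Python sketch of the breadth-first search from the proof, playing the role of the deterministic machine Q. The encoding of a machine as a dictionary, the blank character 'B', and the configuration bound are our own illustrative conventions, not anything fixed by the text.

    from collections import deque

    def bfs_accepts(delta, accepting, rejecting, start, tape, max_configs=10_000):
        # delta maps (state, symbol) to a set of (action, next_state) pairs,
        # where an action is a symbol to write, or 'L', or 'R'.
        # A configuration is (state, tape contents, head position).
        frontier = deque([(start, tuple(tape) or ('B',), 0)])
        seen = set()
        for _ in range(max_configs):
            if not frontier:
                return False                  # every branch halted and rejected
            state, t, pos = frontier.popleft()
            if state in accepting:
                return True                   # one accepting branch suffices
            if state in rejecting or (state, t, pos) in seen:
                continue
            seen.add((state, t, pos))
            for action, nxt in delta.get((state, t[pos]), set()):
                cells, p = list(t), pos
                if action == 'L':
                    p = max(p - 1, 0)
                elif action == 'R':
                    p += 1
                    if p == len(cells):
                        cells.append('B')     # extend the tape with a blank
                else:
                    cells[p] = action         # write a character
                frontier.append((nxt, tuple(cells), p))
        return None                           # undecided within the bound

Because the search is breadth-first, an accepting branch at depth d is found after exploring only configurations of depth at most d, matching the proof's argument.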
So nondeterminism doesn’t add to what can be computed in principle. But
that doesn’t mean that these machines are worthless. For one thing, we saw that
nondeterministic Finite State machines can be a good impedance match for the
problems that we want to solve. Turing machines are similar. Nondeterministic
Turing machines can be good for solving some problems that on a serial device

† Of course, if rejecting the input only required at least one rejecting branch then we could find machines
both accepting and rejecting an input. ‡ The number is limited by how many CPU’s the device has.

are hard to conceptualize. The Traveling Salesman problem is an example. The


nondeterministic machine finds the salesman’s best circuit by making a sequence
of where-next guesses, or is given a circuit by some oracular demon, and then
checks whether this circuit is shorter than a given bound. So for some problems,
nondeterminism simplifies going from the problem to a solution.

Speed The real excitement is that a nondeterministic Turing machine, if we had


one, might be much faster than a deterministic one.
5.5 Example Consider Satisfiability. Is this propositional logic formula satisfiable?

(P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ Q) ∧ (¬P ∨ ¬Q ∨ ¬R) ∧ (Q ∨ R) (∗)

The natural approach is to compute a truth table and see whether the final column
has any T ’s. Here, the formula is satisfiable because the TT F row ends in a T .

P Q R P ∨Q P ∨ ¬Q ¬P ∨ Q ¬P ∨ ¬Q ∨ ¬R Q ∨R (∗)
F F F F T T T F F
F F T F T T T T F
F T F T F T T T F
F T T T F T T T F
T F F T T F T F F
T F T T T F T T F
T T F T T T T T T
T T T T T T F T F
As to runtime, the number of table rows grows exponentially. Specifically, it is 2
raised to the number of input variables. So this approach is very slow on a serial
model of computation.
Each line of the truth table is easy; the issue is that there are a lot of lines. So
this problem is perfectly suited for unbounded parallelism. For each line we could
fork a child process. These children are done quickly, certainly in polytime. If at
the end any child is holding a ‘T ’ then we declare that the expression is satisfiable.
In total then, a nondeterministic machine does this job in polytime while a serial
machine appears to require exponential time.
So while adding nondeterminism to Turing machines doesn’t allow them to
compute any entirely new functions, a person could sensibly conjecture that it does
allow them to compute some of those functions more quickly.
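For contrast, here is a serial brute-force satisfiability check in Python, a sketch under our own conventions (a formula is any function from an assignment to a Boolean; the names are ours). It visits all 2ⁿ rows of the truth table, which is exactly the exponential cost discussed above.

    from itertools import product

    def satisfiable(formula, variables):
        # Serial brute force: try every row of the truth table,
        # all 2^n assignments for n variables.
        return any(formula(dict(zip(variables, row)))
                   for row in product([False, True], repeat=len(variables)))

    # The formula (*) above, with T as True and F as False.
    f = lambda a: ((a['P'] or a['Q']) and (a['P'] or not a['Q'])
                   and (not a['P'] or a['Q'])
                   and (not a['P'] or not a['Q'] or not a['R'])
                   and (a['Q'] or a['R']))
    print(satisfiable(f, ['P', 'Q', 'R']))    # True, witnessed by the TTF row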

Definition Next we give a class of language decision problems associated with


nondeterministic Turing machines. (Recall that we do not distinguish much
between a language decision problem, and the language itself.)
5.6 Definition The complexity class NP is the set of languages for which there is
a nondeterministic Turing machine decider that runs in polytime, meaning that
there is a polynomial p such that any accepting branch halts in time p .

The next follows immediately because a deterministic Turing machine is a


special case of a nondeterministic one.
5.7 Lemma P ⊆ NP †
A pattern in mathematical presentations is to have a definition that is conceptu-
ally clear, followed by a result that is what we use in practice to determine whether
the definition applies.
This is where we use the mental model of the machine guessing or being told an
answer. Consider the Satisfiability example above. Imagine the demon whispering,
“Psst! Check out ω = TTF.” With that hint, we can deterministically verify, quickly,
that the formula is satisfiable.
5.8 Definition A verifier for a language L is a deterministic Turing machine V that
halts on all inputs, and such that for every string σ we have σ ∈ L if and only if
there is a witness or certificate ω ∈ B∗ so that V accepts ⟨σ , ω⟩ .

5.9 Lemma A language is in NP if and only if it has a polytime verifier. That is,
L ∈ NP exactly when there is a deterministic Turing machine V that halts on all
inputs and so that σ ∈ L if and only if there is a witness ω where V accepts ⟨σ , ω⟩ ,
in time polynomial in |σ | .
So to show that a language L is in NP, we will produce a verifier V. It takes as
input a pair containing a candidate for language membership σ and a hint ω . If
σ ∈ L then the verifier will confirm it, using the hint, in polytime. If σ ∉ L then
no hint will cause the verifier to falsely report that σ is in the language.
The lemma’s proof is below, on page 311. Before that, we will clarify some
aspects of the definition and lemma with a few examples and comments.
5.10 Example The Satisfiability problem is to decide membership in this language.

SAT = { σ | σ represents a propositional logic formula that is satisfiable }

Lemma 5.9 requires that we produce a deterministic Turing machine verifier, V . It


must input a pair, ⟨σ , ω⟩ , where σ represents a propositional logic formula. It must
have the property that if σ ∈ SAT then there is a witness ω so that the verifier
accepts the input pair ⟨σ, ω⟩, while if σ ∉ SAT then there is no witness that will
result in the verifier accepting the input. Consider V below. For witnesses, we use
strings ω that V takes as pointing to a line of the truth table.


† Very important: no one knows whether P is a strict subset, that is, whether P ≠ NP or P = NP. This is
the biggest open problem in the Theory of Computing. We will say more in a later section.

[Flowchart for V: Start → Read σ, ω → Compute line ω of σ’s truth table → If it gives T then Accept, else Reject.]

Given a satisfiable candidate σ , such as the one from (∗), for this verifier there is a
suitable witness, namely ω = TTF, so that V can check in polytime that on that
line, the candidate evaluates to T . But, if given a candidate that is not satisfiable,
such as σ = P ∧ ¬P , then there does not exist a hint that will cause V to accept,
because no line that is pointed-to will give T .
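In code, the contrast with the brute-force check is stark. Here is a sketch of such a verifier in Python, under the same formula convention as in the earlier sketch; it evaluates only the single line that the witness names.

    def verify_sat(formula, variables, witness):
        # The witness names one line of the truth table; V merely
        # evaluates the formula there, which takes polytime.
        return formula(dict(zip(variables, witness)))

    # With f as in the earlier sketch, the demon's hint TTF checks out:
    # verify_sat(f, ['P', 'Q', 'R'], [True, True, False]) evaluates to True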
Before the next example, a few points. First, the most striking thing about
Definition 5.8 is that it says that “there exists” a witness ω . It does not say where
that witness comes from. A person with a computational habit of thinking may
well ask, “but how will we find the ω ’s?” The question is not how to find them, the
question is whether there is a Turing machine that leverages the hint ω’s to verify
that the σ’s are in L. In short, we don’t compute the ω’s, we just use them.
The second point is that if σ ∈ L then the definition requires that there exists a
witness ω. But if σ ∉ L then the definition does not require a witness to that.
Instead, the opposite is true: the definition requires that from among all possible
witnesses ω ∈ B∗, there is none such that the verifier accepts ⟨σ , ω⟩ .†
One consequence of this asymmetry in the verifier definition is that if a problem L
is in NP then it is not clear whether its complement, Lc = B∗− L, is in NP. Consider
again the Satisfiability problem. If a propositional logic expression σ is satisfiable
then a suitable witness for that is a single line of the truth table. But there is no
obvious suitable witness to non-satisfiability; the natural thing to do is to check all
lines, which appears to take more than polytime. In short, where the complexity
class co-NP is the collection of complements of problems from NP, we don’t know
whether NP = co-NP.
Third, a comment on the runtime of the verifier. Consider the problem of chess.
Imagine that a demon hands you some papers and tells you that they contain a
chess strategy where you cannot lose. Checking that by having a computer step
through the responses to each move and responses to those responses, etc., at least
appears to take exponential time. So it appears that this perfect strategy is, in a
sense, useless. The definition requires that the verifier runs in polytime in order to
make the verification tractable.
Our final point is related to that. Observe that because the verifier runs in time
polynomial in |σ | , the witness ω must have length that is polynomial in |σ | . If the

† Because of this, perhaps we should refer to ω with terms like ‘potential witness’ or ‘candidate
certificate’, but no one does.

witness were too long then the machine would not even be able to read it in the
allotted time.† So if we think of ω as giving a proof that σ ∈ L, and of the
verifier as checking that proof, then the definition requires that the proof is tractable.
5.11 Example The Hamiltonian Path problem is like the Hamiltonian Circuit problem
except that, instead of requiring that the starting vertex equal the ending one, it
inputs two vertices.

{ ⟨G, v, v̂⟩ | some path in G between v and v̂ visits every vertex exactly once }

We will show that this problem is in the class NP. Lemma 5.9 requires that we
produce a deterministic Turing machine verifier V . Here is our verifier, which takes
inputs ⟨σ , ω⟩ , where the candidate is σ = ⟨G , v, v̂⟩ . We take each witness to be a
path, ω = ⟨v, v 1 , ... v̂⟩ .

[Flowchart for V: Start → Read σ, ω → Check that ω is a path in σ’s graph → If all vertices are visited exactly once then Accept, else Reject.]

If there is a Hamiltonian path then there is a witness ω so that V will accept its
input. Clearly that acceptance takes polytime. If the graph has no Hamiltonian
path then for no ω will V be able to verify the hint and accept its input.
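A Python sketch of this verifier, assuming for illustration that the graph comes as a vertex count n with vertices numbered 0, ..., n−1 and a list of undirected edges:

    def verify_ham_path(n, edges, v, v_hat, witness):
        # The witness is a purported Hamiltonian path from v to v_hat.
        if not witness or witness[0] != v or witness[-1] != v_hat:
            return False
        if sorted(witness) != list(range(n)):        # every vertex exactly once
            return False
        edge_set = {frozenset(e) for e in edges}
        return all(frozenset((a, b)) in edge_set     # consecutive pairs are edges
                   for a, b in zip(witness, witness[1:]))

Each of the three checks is plainly polynomial in the size of the input.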
5.12 Example The Composite problem asks whether a number has a nontrivial factor.

L = { n ∈ N+ | n has a divisor a with 1 < a < n }

Briefly, the verifier inputs ⟨σ , ω⟩ , where σ represents a number n > 1. As the


witness ω , we can use any number. The verifier checks that ω is between 1 and n ,
and that it divides n . If σ ∈ L then there is a suitable witness, which the verifier
can check in polytime. If σ ∉ L then no witness will make the verifier accept an input
pair ⟨σ , ω⟩ , because there is no nontrivial factor for ω to represent.
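A sketch of that verifier in Python; the check is one comparison and one division, so it is polynomial in the length of n.

    def verify_composite(n, witness):
        # The witness is a purported nontrivial divisor of n.
        return 1 < witness < n and n % witness == 0

    # verify_composite(91, 7) is True since 91 = 7 * 13;
    # no witness works for a prime such as 97.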
At last, here is the proof of Lemma 5.9.
Proof Suppose first that the language L is accepted by the nondeterministic Turing
machine P in polynomial time; we will construct a polynomial time verifier V . Let
p : N → N be the polynomial such that on input σ ∈ L, the machine P has an
accepting branch of length at most p(|σ |). Make a witness ω out of this accepting
branch: any Turing machine has a finite number of states, k , so we can represent

† Some authors instead define that the verifier runs in time polynomial in its input, ⟨σ, ω⟩, with the
restriction that ω must have length polynomial in σ . Without a restriction, the verifier could have
exponential runtime if the witness has exponential length.

a branch of a computation tree with a string of numbers less than k . With that
witness, a deterministic verifier can retrace P ’s accepting branch. The branch’s
length must be less than p(|σ |), so the verifier V can do the retracing in polynomial
time.
Conversely, suppose that the language L is accepted by a verifier V that runs in
time bounded by a polynomial q , and that takes input ⟨σ , ω⟩ . We will construct a
nondeterministic Turing machine P that accepts an input bitstring τ if and only if
τ ∈ L.
The key is that this machine is allowed to be nondeterministic. Given a
candidate bitstring τ, (1) P nondeterministically produces a witness bitstring κ of
length less than q(|τ |) (informally speaking, it guesses κ , or gets it from a demon)
(2) it then runs ⟨τ , κ⟩ through the verifier V , and (3) if the verifier accepts its input
then P accepts τ , while if the verifier does not accept then P also does not accept.
By definition, the nondeterministic machine P accepts the string if there is a
branch that accepts the string, and P rejects the string if every branch rejects it.
Suppose first that τ ∈ L. Because V is a verifier, in this case there exists a witness κ
that will result in V accepting ⟨τ , κ⟩ , so there is a way for the prior paragraph
to result in acceptance of τ, and so P accepts τ. Now suppose that τ ∉ L. By
the definition of a verifier, no hint κ will result in V accepting ⟨τ , κ⟩ , and thus P
rejects τ .
A common reaction to the second half of that proof is something like, “Wait,
the machine pulls the witness κ out of thin air? How is that possibly legal?”
This reaction — about nondeterministic Turing machines and everyday experience
versus abstraction — is common and very reasonable, so we will address it.
As to everyday reality, we today know of no way to build physical devices that
bear the same relationship to nondeterministic Turing machines that ordinary
computers bear to deterministic ones. (Of course, you can write a program to
simulate nondeterministic behavior, at a cost in efficiency, but no device does it
natively.) When Turing formulated his definition there were no practical physical
computers matching it, but they were clearly coming and appeared soon after;
will we someday have nondeterministic computers? Putting aside proposals that
involve things like time travel through wormholes as too exotic, we will address
the devices that seem most likely to be coming, quantum computers.
Well-established physical theory says that subatomic particles can be in a
superposition of many states at once. Naively, it might seem that because of
this multi-way branching, if we could manipulate these then we would have
nondeterministic computation. But, as far as we know, this is false. As far as we know,
to get information out of a quantum computer we must use interference, and
we cannot read individual particles.†
However, the fact that we do not have practical nondeterministic devices, and
do not believe that we will in the near future, does not mean that their study is a

† Some popularizations wrongly suggest that quantum computers are nondeterministic. That is, they
miss the point about interference.

purely academic exercise.† The nondeterministic Turing machine model is very


fruitful.
For one thing, Lemma 5.9 translates questions about nondeterministic machines
to questions about deterministic ones, the verifiers — a problem is in P if it has
a deterministic decider and is in NP if it has a deterministic verifier. Just as
computably enumerable sets seem to be the limit of what can in theory be
known, polytime verification seems to be the limit of what can feasibly be done.
For another thing, the problems that are associated with these machines
are eminently practical, and computer scientists have been trying to solve them
since computers have existed. Much more on this in a later section.
In summary, we are interested in knowing for which problems there are good
algorithms, and for which are there not. In this section, we defined the class of
problems for which there is a good way to verify a solution, in contrast with the
problems for which there is a good way to generate that solution. We will next
consider whether these two classes differ.

V.5 Exercises
✓ 5.13 Your study partner asks, “In Lemma 5.9, since the witness ω is not required
to be effectively computable, why can’t I just take it to be the bit 1 if σ ∈ L, and 0
if not? Then writing the verifier is easy: just ignore σ and follow the bit.” They
are confused. Straighten them out.
✓ 5.14 Decide if each formula is satisfiable.
(a) (P ∧ Q) ∨ (¬Q ∧ R)
(b) (P → Q) ∧ ¬((P ∧ Q) ∨ ¬P)
5.15 True or false? If a language is in P then it is in NP.
5.16 Uh-oh. You find yourself with a nondeterministic Turing machine where on
input σ , one branch of the computation tree accepts and one rejects. Some
branches don’t halt at all. What is the upshot?
✓ 5.17 You get an exercise, Write a nondeterministic algorithm that inputs a maze
and outputs 1 if there is a path from the start to the end.
(a) You hand in an algorithm that does backtracking to find any possible solution.
Your professor sends it back, and says to try again. What was wrong?
(b) You hand in an algorithm that, each time it comes to a fork in the maze,
chooses at random which way to go. Again you get it back with a note to
work out another try. What is wrong with this one?
(c) Give a right answer.
5.18 Sketch a nondeterministic algorithm to search an unordered array of numbers,
to see if it contains the number k . Describe it both in terms of unbounded
parallelism and in terms of guessing.

† Not that there is anything wrong with that.

5.19 A simple substitution cipher encrypts text by substituting one letter for
another. Start by fixing a permutation of the letters, for example ⟨F, P, ...⟩. Then
the cipher is that any A is replaced by a F, any B is replaced by a P, etc. Sketch three
algorithms for decoding a substitution cipher (assume that you have a program that
can recognize a correctly decoded string): (a) one that is deterministic, (b) one
that is nondeterministic and is expressed in terms of unbounded parallelism, and
(c) one expressed in terms of guessing.
✓ 5.20 Outline a nondeterministic algorithm that inputs a finite planar graph and
outputs Yes if and only if the graph has a four-coloring (that is, the algorithm
recognizes a correct four-coloring). Describe it both in terms of unbounded
parallelism and in terms of a demon providing a witness.
5.21 The Integer Linear Programming problem is to maximize a linear objective
function f (x 0 , ... x n ) = d 0x 0 + · · · + dn x n subject to constraints ai, 0x 0 + · · · +
ai,n x n ≤ bi , where all of x j , d j , b j , ai, j are integers. Recast it as a family of
language decision problems. Sketch a nondeterministic algorithm, giving both an
unbounded parallelism formulation and a guessing formulation.
✓ 5.22 The Semiprime problem inputs a number n ∈ N and decides if its prime
factorization has exactly two primes, n = p0^e0 · p1^e1 where e0, e1 > 0. State it as
a language decision problem. Sketch a nondeterministic algorithm that runs
in polytime. Give both an unbounded parallelism formulation and a guessing
formulation.
5.23 For each, give a language so that it is a language decision problem. Then
give a polytime nondeterministic algorithm. State it in terms of guessing.
(a) Three Dimensional Matching: where X , Y , Z are sets of integers having n
elements, given as input a set of triples M ⊆ X × Y × Z , decide if there
is an n -element subset M̂ ⊆ M so that no two triples agree on their first
coordinates, or second, or third.
(b) Partition: given a finite multiset A of natural numbers, decide if A splits into
multisets Â, A − Â so the elements total to the same number, ∑a∈Â a = ∑a∉Â a.

5.24 Sketch a nondeterministic algorithm that inputs a planar graph and a bound
B ∈ N and decides whether the graph is B -colorable. Describe it in terms of
unbounded parallelism and also in terms of the machine guessing.
✓ 5.25 For each problem, cast it as a language decision problem and then prove that
it is in NP by filling in the blanks in this argument.
Lemma 5.9 requires that we produce a deterministic Turing machine verifier, V . It must
input pairs of the form ⟨σ , ω⟩ , where σ is (1) . It must have the property that if
σ ∈ L then there is an ω such that V accepts the input, while if σ < L then there is no
such witness ω . And it must run in time polynomial in |σ | .
The verifier interprets the bitstring witness ω as (2) , and checks that (3) .
Clearly that check can be done in polytime.
If σ ∈ L then by definition there is (4) , and so a witness ω exists that will cause V

to accept the input pair ⟨σ , ω⟩ . If σ < L then there is no such (5) , and therefore no
witness ω will cause V to accept the input pair.
(a) The Double-SAT problem inputs a propositional logic statement and decides
whether it has at least two different substitutions of Boolean values that
make it true.
(b) The Subset Sum problem inputs a set of numbers S ⊂ N and a target
sum T ∈ N, and decides whether at least one subset of S adds to T.
✓ 5.26 In the popular British TV game show Countdown, players
are given six numbers from S = { 1, 2, ... 10, 25, 50, 75, 100 } (numbers may be
repeated), along with a target integer T ∈ [100 .. 999]. They must construct an
arithmetic expression that evaluates to the target, using the given numbers at
most once. The expression can involve addition, subtraction, multiplication, and
division (the division must have no remainder). Show that the decision problem
for this language

CD = { ⟨s0, ... s5, T⟩ ∈ S⁶ × I | an allowed combination of the si gives T }


is in NP. Use Lemma 5.9.


✓ 5.27 Recall that we recast the Traveling Salesman optimization problem as a language
decision problem for a family of languages. Show that each such language is
in NP by applying Lemma 5.9, sketching a verifier that works with a suitable
witness.
5.28 The problem of Independent Sets starts with a graph and a natural number n
and decides whether in the graph there are n -many independent vertices, that is,
vertices that are not connected. State it as a language decision problem, and use
Lemma 5.9 to show that this problem is in NP.
✓ 5.29 Use Lemma 5.9 to show that the Knapsack problem is in NP.
5.30 True or false? For the language { ⟨a, b, c⟩ ∈ N³ | a + b = c }, the problem of

deciding membership is in NP.


✓ 5.31 The Longest Path problem is to input a graph and a bound, ⟨G , B⟩ , and
determine whether the graph contains a simple path of length at least B ∈ N. (A
path is simple if no two of its vertices are equal). Show that this is in NP.
5.32 Recast each as a language decision problem and then show it is in NP.
(a) The Linear Divisibility problem inputs a pair of natural numbers σ = ⟨a, b⟩
and asks if there is an x ∈ N with ax + 1 = b .
(b) Given n points scattered on a line, how far they are from each other defines
a multiset. (Recall that a multiset is like a set but element repeats don’t
collapse.) The reverse of this problem, starting with a multiset M of numbers
and deciding whether there exists a set of points on a line whose pairwise
distances defines M , is the Turnpike problem.
5.33 Is NP countable or uncountable?

✓ 5.34 Show that this problem is in NP. A company has two delivery trucks.
They work with a weighted graph called the road map. (Some vertex is
distinguished as the start/finish.) Each morning the company gets a set of vertices,
V . They must decide if there are two cycles such that every vertex in V is on at
least one of the two cycles, and both cycles have length at most B ∈ N.
✓ 5.35 Two graphs G0 , G1 are isomorphic if there is a one-to-one and onto function
f : N0 → N1 such that {v, v̂ } is an edge of G0 if and only if { f (v), f (v̂) } is
an edge of G1 . Consider the problem of computing whether two graphs are
isomorphic.
(a) Define the appropriate language.
(b) Show that the problem of determining membership in that language is a
member of the class NP.
5.36 The proof of Lemma 5.4 leaves two things undone.
(a) (König’s Lemma) Prove that if a connected tree has infinitely many vertices,
but each vertex has finite degree, then the tree has an infinite path. Hint: fix
a vertex v 0 and for each of its neighbors, look at how many vertices can be
reached without going through v 0 . One of the neighbors must have infinitely
many such vertices; call it v 1 . Iterate.
(b) Prove that if a nondeterministic Turing machine recognizes a language then
there is a deterministic machine that also recognizes it.
5.37 Following the definition of Turing machine, on page 8, we gave a formal
description of how these machines act. We did the same for Finite State machines
on page 184, and for nondeterministic Finite State machines on page 192. Give a
formal description of the action of a nondeterministic Turing machine.
5.38 (a) Show that the Halting problem is not in NP.
(b) What is wrong with this reasoning? The Halting problem is in NP because
given ⟨P , x⟩ , we can take as the witness ω a number of steps for P to halt on
input x . If it halts in that number of steps then the verifier accepts, and if
not then the verifier rejects.

Section
V.6 Polytime reduction
When we studied incomputability we found a sense in which we could think of
some problems as harder than others. Consider the halts_on_three_checker
routine that, given x, decides whether the Turing machine Pₓ halts on input 3. We
showed that with such a program we could solve the Halting problem. We denoted
this with K ≤T halts_on_three_checker.
Formally, we write B ≤T A when there is a computable function f such that
x ∈ B if and only if f (x) ∈ A. We say that B is Turing-reducible to A, because to
solve B , it suffices to solve A.

In general, we say that problem B ‘reduces to’ problem A if we can answer


questions about B by accessing information about A. For example, in Calculus,
finding the maximum of a polynomial function on a closed interval reduces to
finding the zeroes of the derivative. Another example is that, given a list of numbers,
finding the median reduces to the problem of sorting the list. A reduction is a
way to translate problems from one domain to another. The intuition is that the
capability to do A gives the capability to do B , and so A is harder, or contains more
information, than B .
The reduction ≤T translates via arbitrary computable functions. But in this
chapter we are focused on solving problems efficiently. Consequently, the right idea
of reduction is that L1 ≤ L0 if a method for solving L0 efficiently gives us a method
to solve L1 efficiently, that is, to translate by doing at most polynomially-much
computation.
6.1 Definition Let L0 , L1 be languages, subsets of some Σ∗. Then L1 is polynomial
time reducible to L0 , or Karp reducible, or polynomial time mapping reducible, or
polynomial time many-one reducible, written L1 ≤p L0 , if there is a computable
reduction function or transformation function f : B∗ → B∗ that runs in polynomial
time and such that σ ∈ L1 if and only if f (σ ) ∈ L0 .

6.2 Example Recall the Shortest Path problem that inputs a weighted graph, two
vertices, and a bound, and decides if there is a path between the vertices of length
less than the bound.

L0 = { ⟨G, v0, v1, B⟩ | there is a path between the vertices of length less than B }

Recall also the Vertex-to-Vertex Path problem that inputs a graph and two vertices,
and decides if there is a path between the two.

L1 = { ⟨H, w0, w1⟩ | there is a path between the vertices }

Suppose that we have an algorithm to decide questions of membership in L0


that is fast. Then here is a strategy to leverage that algorithm that will quickly
decide whether ⟨H, w 0 , w 1 ⟩ ∈ L1 : make a weighted graph G by starting with H
and giving all its edges weight 1. Then present the input f (⟨H, w0, w1⟩) = ⟨G, w0, w1, |H|⟩
to the fast algorithm (where | H | is the number of vertices). Clearly the step of
translating H into G is fast, and so the result is a fast way to decide L1 . We write
Vertex-to-Vertex Path ≤p Shortest Path.
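The reduction function itself is only a few lines. Here is a sketch in Python, with graphs represented, by our own convention, as vertex and edge lists:

    def reduce_v2v_to_shortest_path(H_vertices, H_edges, w0, w1):
        # f gives every edge of H weight 1 and sets the bound B = |H|,
        # producing an instance of Shortest Path.
        G = [(u, v, 1) for (u, v) in H_edges]
        return (G, w0, w1, len(H_vertices))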
6.3 Lemma Polytime reduction is reflexive: L ≤p L for all languages. It is also
transitive: L2 ≤p L1 and L1 ≤p L0 imply that L2 ≤p L0 . Every nontrivial
language is P hard, that is, every nontrivial language L0 has the property that
if L1 ∈ P then L1 ≤p L0 . Under polytime reduction, the class P is closed
downward: if L0 ∈ P and L1 ≤p L0 then L1 ∈ P. So is the class NP.

Proof The first two sentences, and downward closure of NP, are in Exercise 6.23.

For the third sentence, fix an L0 that is nontrivial, so there is a σ ∈ L0 and a
τ ∉ L0. Let L1 be an element of P. We will specify a reduction function fL1 giving
L1 ≤p L0. For any α ∈ B∗, computing whether α ∈ L1 can be done in polytime. If
it is a member then set fL1(α) = σ, and if not then set fL1(α) = τ.
For downward closure of P, suppose that L1 ≤p L0 via the function f, and
also suppose that there is a polytime algorithm for determining membership
in L0. Determine membership in L1 by: given input σ, find f(σ) and apply the
L0-algorithm to determine if f(σ) ∈ L0. Where the L0 algorithm runs in time that
is O(n^i), and where f runs in time that is O(n^j), the output f(σ) has length
that is O(n^j), so determining L1 membership in this way runs in time O(n^(i·j)).

6.4 Figure: The bean encloses all problems, arranged from Fast at the bottom to Slow
at the top (we only show a few problems, for graphical clarity). Problems are shown
connected if there is a polynomial time reduction from one to the other. Highlighted
are connections within the complexity class P.

6.5 Example We will show that Subset Sum ≤p Knapsack. Recall that the Subset Sum
problem starts with a multiset S = {s 0 , ... sk −1 } ⊂ N (a set in which repeated
numbers are allowed; basically a list of numbers) and a target T ∈ N+ . It asks if
there is a subset whose elements add to the target.

L0 = { ⟨S, T⟩ | some subset of S adds to T }

The Knapsack problem starts with a multiset of objects K = { k0, ... kn−1 }, along
with a bound W ∈ N and a target V ∈ N. There are also two functions w, v : K → N+,
giving each ki a weight w(ki) and a value v(ki). The problem is to decide if there
is a subset A ⊆ K such that the sum of the element weights is less than or equal
to W while the sum of the element values is greater than or equal to V.

L1 = { ⟨K, w, v, W, V⟩ | some A ⊆ K has ∑a∈A w(a) ≤ W and ∑a∈A v(a) ≥ V }

A reduction function f must input pairs ⟨S,T ⟩ , must output 5-tuples ⟨K, w, v,W , V ⟩ ,
must run in polytime, and must be such that ⟨S,T ⟩ ∈ L0 holds if and only if
⟨K, w, v,W , V ⟩ ∈ L1 holds.
As an illustration, suppose that we want to know if there is a subset of
S = { 18, 23, 31, 33, 72, 86, 94 } that adds to T = 126, and we have access to an
oracle that can quickly solve any Knapsack problem. We could let K equal S , let
w and v be such that w(18) = v(18) = 18, w(23) = v(23) = 23, etc., and set the
weight and value targets W and V to be T = 126.

In general, given ⟨S,T ⟩ , take f (⟨S,T ⟩) = ⟨S, w, v,T ,T ⟩ , where the functions are
given by w(si ) = v(si ) = si . Then clearly ⟨S,T ⟩ ∈ L0 if and only if f (⟨S,T ⟩) ∈ L1 ,
and clearly f is polytime.
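As a sketch, here is that reduction function in Python. Indexing the elements is our own device to keep repeated members of the multiset distinct.

    def subset_sum_to_knapsack(S, T):
        # f maps <S, T> to <K, w, v, W, V>: each element is its own
        # weight and its own value, and both targets equal T.
        K = list(range(len(S)))
        w = {i: S[i] for i in K}
        v = dict(w)
        return (K, w, v, T, T)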
The prior two examples show one kind of natural reduction, when one problem
is a special case of another, or at least closely related.
In addition, the prior example suggests that where the transformation from one
problem set to another is concerned, the details can hide the ideas. Often authors
will suppress the details and instead outline the transformation. We will do the
same here.
6.6 Example We will sketch an argument that the Graph Colorability problem reduces
to the Satisfiability problem, that Graph Colorability ≤p Satisfiability.
Recall that a graph is k -colorable if we can partition the vertices into k many
classes, called ‘colors’ because that’s how they are pictured, so that there is no edge
between two same-colored vertices.

6.7 Animation: A 3-coloring of a graph.

And, a propositional logic expression is satisfiable if there is an assignment of the


values T and F to the variables that makes the statement as a whole evaluate to T .
We’ll go through the k = 3 construction; other k ’s work the same way. Write
the set of 3-colorable graphs as L1 and write the set of satisfiable propositional logic
statements as L0 . To show that L1 ≤p L0 , we will produce a translation function f .
It inputs a graph G and outputs a propositional logic expression E = f(G), such
that the graph is 3-colorable if and only if the expression is satisfiable. The function
can be computed in polytime.
Let G have vertices {v 0 , ... vn−1 }. The expression will have 3n -many Boolean
variables: a 0 , ... an−1 , and b0 , ... bn−1 , and c 0 , ... c n−1 . The idea is that if the i -th
vertex vi gets the first color then the associated variable is ai = T , while if it gets
the second color then bi = T , and if it gets the third color then c i = T . Thus, for
each vertex vi , create a clause saying that it gets at least one color.

(ai ∨ bi ∨ c i )

In addition, for each edge {vi , v j }, create three clauses that together ensure that
the edge does not connect two same-color vertices.

(¬ai ∨ ¬a j ) (¬bi ∨ ¬b j ) (¬c i ∨ ¬c j )



The desired expression is the conjunction of the clauses.


This illustrates.

[Figure: a graph with vertices v0, v1, v2, v3 and edges v0v1, v0v3, v1v2, and v2v3.]

E = (a0 ∨ b0 ∨ c0) ∧ (a1 ∨ b1 ∨ c1) ∧ (a2 ∨ b2 ∨ c2) ∧ (a3 ∨ b3 ∨ c3)
    ∧ (¬a0 ∨ ¬a1) ∧ (¬b0 ∨ ¬b1) ∧ (¬c0 ∨ ¬c1)
    ∧ (¬a0 ∨ ¬a3) ∧ (¬b0 ∨ ¬b3) ∧ (¬c0 ∨ ¬c3)
    ∧ (¬a1 ∨ ¬a2) ∧ (¬b1 ∨ ¬b2) ∧ (¬c1 ∨ ¬c2)
    ∧ (¬a2 ∨ ¬a3) ∧ (¬b2 ∨ ¬b3) ∧ (¬c2 ∨ ¬c3)

The graph has four vertices, so the expression starts with four clauses, saying that
for each vertex vi at least one of the associated variables ai , bi , or c i is T . The
graph has four edges, v 0v 1 , v 0v 3 , v 1v 2 , and v 2v 3 . The expression continues with
three clauses for each edge, together ensuring that the variables associated with
the edge’s vertices do not both have the value T . Thus, E is satisfiable if and only if
G has a 3-coloring.
Completing the proof means checking that the translation function, which
inputs a bitstring representation of G and outputs a bitstring representation of E, is
polynomial. That’s clear, although the argument is messy so we omit it.
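Though we omit the formal argument, a sketch of the translation in Python may help. Representing a literal as a pair (variable, negated?) and a clause as a list of literals is our own convention; the variable (i, c) plays the role of ai, bi, or ci according to the color c.

    def three_coloring_to_cnf(n, edges):
        # One clause per vertex: it gets at least one of the three colors.
        clauses = [[((i, c), False) for c in range(3)] for i in range(n)]
        # Three clauses per edge: its two ends do not share a color.
        for i, j in edges:
            for c in range(3):
                clauses.append([((i, c), True), ((j, c), True)])
        return clauses    # the expression is the conjunction of these clauses

The output has n + 3·|E| constant-size clauses, so the translation is plainly polytime.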
Echoing what we said above, the significance of the reduction is that we now
know that if we could solve the Satisfiability problem in polynomial time then we
could solve the Graph Colorability problem in polynomial time.
So in this sense, the Satisfiability problem is at least as hard as Graph Colorability.
This section’s final example gives a problem that is at least as hard as Satisfiability.
6.8 Example Recall that the Clique problem is the decision problem for the lan-
guage L = { ⟨G, B⟩ | G has a clique of at least B vertices }, where a clique is a set
of vertices that are all mutually connected. We will sketch the argument that
Satisfiability ≤p Clique.
The reduction f inputs a propositional logic expression E and outputs a pair
f (E) = ⟨G , B⟩ . It must run in polytime, and must be such that E ∈ SAT if and only
if f (E) ∈ L.
Consider this expression.

E = (x 0 ∨ x 1 ) ∧ (¬x 0 ∨ ¬x 1 ) ∧ (x 0 ∨ ¬x 1 )

It has three clauses, x 0 ∨ x 1 , ¬x 0 ∨ ¬x 1 , and x 0 ∨ ¬x 1 . In a clause, an atom is


either a variable x i or its negation ¬x i .
For each occurrence of an atom in a clause, put a vertex in G . The expression E
has three clauses with two atoms each, so the graph below has six vertices. As to
the edges in G , connect vertices if the associated atoms are in different clauses and
are not negations (that is, not x i and ¬x i ).

[Figure: the graph built from E, with six vertices v0,0, v0,1, v1,0, v1,1, v2,0, v2,1, one per atom occurrence; edges join atoms in different clauses that are not negations of one another.]

Observe that E is satisfiable if and only if the graph has a 3-clique. Showing that
the translation function f is polytime is routine.
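A sketch of f in Python, with an atom again represented as a (variable, negated?) pair and the expression as a list of clauses (our own conventions):

    def cnf_to_clique(clauses):
        # One vertex per occurrence of an atom in a clause.
        vertices = [(i, atom) for i, clause in enumerate(clauses)
                              for atom in clause]
        # Join atoms from different clauses that are not negations of each other.
        edges = [(p, q) for p in vertices for q in vertices
                 if p[0] < q[0]
                 and not (p[1][0] == q[1][0] and p[1][1] != q[1][1])]
        return vertices, edges, len(clauses)    # bound B = number of clauses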
Those examples give some sense of why the Satisfiability problem can be
convenient, a benchmark problem for reducibility. Often it is natural to describe
the conditions in a problem with logical statements. In the next section we will
give a theorem saying that Satisfiability is at least as hard as every problem in NP.
We close with a comment on ≤p. The definition of L1 ≤p L0 is that σ ∈ L1 if
and only if f (σ ) ∈ L0 for some polytime computable f . So f takes the input σ
and does a computation, and at the end asks the L0 oracle the single question of
the membership of f (σ ). Other reductions are possible, for instance one that can
consult L0 any finite number of times, called Cook reducibility.

V.6 Exercises
6.9 Your friend is confused. “Lemma 6.3 says that every language in P is ≤p
to every other language. But there are uncountably many languages and only
countably many f ’s because they each come from some Turing machine. So I’m
not seeing how there are enough reduction functions for a given language to
be related to all others.” Straighten them out. Hint: the definition of reduction
function is asymmetric. Such a function must do the right thing for every input,
but it need not be onto and so it may leave elements of the codomain untouched,
that is, free to vary.
✓ 6.10 Show that if L0 ∉ P and L0 ≤p L1 then L1 ∉ P also. What about NP?
6.11 Prove that L ≤p Lc if and only if Lc ≤p L.
6.12 Example 6.5 includes as illustration a Subset Sum problem, where S =
{ 18, 23, 31, 33, 72, 86, 94 } and T = 126. Solve it.
✓ 6.13 Suppose that the language A is polynomial time reducible to the language B ,
A ≤p B . Which of these are true?
(a) A tractable way to decide A can be used to tractably decide B .
(b) If A is tractably decidable then B is tractably decidable also.
(c) If A is not tractably decidable then B is not tractably decidable too.
✓ 6.14 Fix an alphabet Σ. The Substring problem inputs two strings and decides if
the second is a substring of the first. The Cyclic Shift problem inputs two strings
and decides if the second is a cyclic shift of the first. (Where α = a 0a 1 ... an−1
and β = b0b1 ... bn−1 are length n strings, β is a cyclic shift of α if there is an
index k ∈ [0 .. n − 1] such that ai = b(k +i) mod n for all i .)

(a) Name three cyclic shifts of α = 0110010.


(b) Decide whether β = 101001101 is a cyclic shift of α = 001101101.
(c) State the Substring problem as a language decision problem.
(d) Also state the Cyclic Shift problem as a language decision problem.
(e) Show that Cyclic Shift ≤p Substring.
✓ 6.15 The Independent Set problem inputs a graph and a bound, and decides if
there is a set of vertices, of size at least equal to the bound, that are not connected
by any edge. The Vertex Cover problem inputs a graph and a bound and decides
if there is a vertex set, of size less than or equal to the bound, such that every
edge contains at least one vertex in the set.
(a) State each as a language decision problem.
(b) Consider this graph. Find a vertex cover with four elements.

[Figure: a graph on vertices v0, ..., v9, with v0 through v4 in the top row and v5 through v9 in the bottom row.]

(c) In that graph find an independent set with six elements.


(d) Show that in a graph, S is an independent set if and only if N − S is a vertex
cover, where N is the set of vertices.
(e) Conclude that Vertex Cover ≤p Independent Set.
(f) Also conclude that Independent Set ≤p Vertex Cover.
✓ 6.16 Show that Hamiltonian Circuit ≤p Traveling Salesman.
(a) State each as a language decision problem.
(b) Produce the reduction function.
6.17 The Vertex Cover problem inputs a graph and a bound and decides if there is
a vertex set, of size less than or equal to the bound, such that every edge contains
at least one vertex in the set. The Set Cover problem inputs a set S , a collection
of subsets S 0 ⊆ S , . . . S n ⊆ S , and a bound, and decides if there is a subcollection
of the S j , with a number of sets at most equal to the bound, whose union is S .
(a) State each as a language decision problem.
(b) Find a vertex cover for this graph.
[Figure: a graph on vertices q0, ..., q9 with edges labeled a through n.]

(c) Make a set S consisting of all of that graph’s edges, and for each v make a
subset Sv of the edges incident on that vertex. Find a set cover.
(d) Show that Vertex Cover ≤p Set Cover.
✓ 6.18 In this network, each edge is labeled with a capacity. (Imagine railroad lines
going from q 0 to q 6 .)

[Figure: a network on vertices q0, ..., q6 with source q0 and sink q6; each edge is labeled with its capacity, between 1 and 4.]

The Max-Flow problem is to find the maximum amount that can flow from left to
right. That is, we will find a flow Fqi ,q j for each edge, subject to the constraints
that the flow through an edge must not exceed its capacity and that the flow
into a vertex must equal the flow out (except for the source q 0 and the sink q 6 ).
The problem is to find the edge flows so that the source and sink see maximal
total flow. The Linear Programming optimization problem starts with a list of
linear equalities and inequalities, such as ai,0 x0 + · · · + ai,n−1 xn−1 ≤ bi for
ai,0, ... ai,n−1, bi ∈ Q, and it looks for a sequence ⟨s0, ... sn−1⟩ ∈ Qⁿ that satisfies
all of the constraints, and such that a linear expression c 0x 0 + · · · + c n−1x n−1 is
maximal.
(a) Express each as a language decision problem, remembering the technique of
converting optimization problems using bounds.
(b) By eye, find the maximum flow for the above network.
(c) For each edge vi v j , define a variable x i, j . Describe the constraints on that
variable imposed by the edge’s capacity. Also describe the constraints on the
set of variables imposed by the limitation that for many vertices the flow in
must equal the flow out. Finally, use the variables to give an expression to
optimize in order to get maximum flow.
(d) Show that Max-Flow ≤p Linear Programming.
6.19 The Max-Flow problem inputs a directed graph where each edge is labeled
with a capacity, and the task is to find the maximum amount that can flow from
the source node to the sink node (for more, see Exercise 6.18). The Drummer
problem starts with two same-sized sets, the rock bands, B , and potential
drummers, D . Each band b ∈ B has a set Sb ⊆ D of drummers that they would
agree to take on. The goal is to make the greatest number of matches.
(a) Consider four bands B = {b 0 , b 1 , b 2 , b 3 } and drummers D = {d 0 , d 1 , d 2 , d 3 } .
Band b0 likes drummers d 0 and d 2 . Band b1 likes only drummer d 1 , and b2
also likes only d1. Band b3 likes the sound of both d2 and d3. What is the
largest number of matches?
(b) Express each as a language decision problem.
(c) Draw a graph with the bands on the left and the drummers on the right.
Make an arrow from a band to a drummer if there is a connection. Now add
a source and a sink node to make a flow diagram.
(d) Show that Drummer ≤p Max-Flow.
6.20 In a propositional logic expression, a single variable is an atom, and an
atom or its negation is a literal. We shall say that a clause is a disjunction of
literals, so that P 0 ∨ ¬P 1 ∨ ¬P 2 is a 3-literal clause. Note that a clause evaluates
to T if and only if at least one of the literals evaluates to T . A propositional

logic expression is in Conjunctive Normal Form if it consists of a conjunction of


clauses. The 3-Satisfiability problem is to decide the satisfiability of propositional
logic expression where every clause consists of three literals (involving different
variables). One such expression is (P 0 ∨ ¬P 1 ∨ ¬P 2 ) ∧ (P 1 ∨ P 2 ∨ ¬P 3 ). The
Independent Set problem inputs a graph and a bound, and decides if there is a
set of vertices, of size at least equal to the bound, that are not connected by any
edge.
(a) In this graph, find an independent set.

[Figure: a graph with vertices q0, q1, q2 in the top row and q3, q4, q5 in the bottom row.]

(b) State Independent Set as a language decision problem.


(c) Decide if E = (P 0 ∨ ¬P 1 ∨ ¬P 2 ) ∧ (P 1 ∨ P 2 ∨ ¬P 3 ) is satisfiable.
(d) State 3-Satisfiability as a language decision problem.
(e) With the expression E , make a triangle for each of the two clauses, where
the vertices of the first are v 0 , v 1 , and v 2 , while the vertices of the second
are w 1 , w 2 , and w 3 . In addition to the edges forming the triangles, also put
one connecting v 1 with w 1 , and one connecting v 2 with w 2 .
(f) Sketch an argument that 3-Satisfiability ≤p Independent Set.

✓ 6.21 The 3-Satisfiability problem is to decide the satisfiability of propositional


logic expression where every clause consists of three literals (see Exercise 7.24
for more). The Linear Programming language decision problem starts with a
list of linear equalities and inequalities, such as ai,0 x0 + · · · + ai,n−1 xn−1 ≤ bi
for ai,0, ... ai,n−1, bi ∈ Q, and it looks for a sequence ⟨s0, ... sn−1⟩ ∈ Qⁿ that
is feasible, that satisfies all of the constraints. The Integer Linear Programming
problem adds the requirement that all of the numbers be integers.
(a) Consider the propositional logic clause P 0 ∨ ¬P 1 ∨ ¬P 2 . Create variables z 0 ,
z 1 , and z 2 and list linear constraints such that each must be either 0 or 1.
Also give a linear inequality that holds if and only if the clause is true.
(b) Show that 3-Satisfiability ≤p Integer Linear Programming.

✓ 6.22 We can do reductions between problems of types other than language decision
problems. Here are two optimization problems. The Assignment problem inputs
two same-sized sets, of workers W = {w 0 , ... w n−1 } and tasks T = {t 0 , ... tn−1 }.
For each worker-task pair there is a cost C(w i , t j ). The goal is to assign each of
the tasks, one per worker, at minimal total cost. The Traveling Salesman problem,
of course, inputs a graph whose edge weights give a cost for traversing that edge,
and asks for circuit of minimal total cost.
(a) By eye, solve this Assignment problem instance.

Cost C(ti , w j ) w0 w1 w2 w3
t0 13 4 7 6
t1 1 11 5 4
t2 6 7 2 8
t3 1 3 5 9
(b) Consider this bipartite graph.
[Figure: the bipartite graph, with w0, w1, w2, w3 in the top row and t0, t1, t2, t3 in the bottom row.]

Each ti is shown connected to each w j . As edge weights, add the costs


from the table. In addition, connect each pair of w ’s with an edge of
weight 0, and similarly connect each pair of t ’s. Restate the Assignment
problem as that of finding a circuit in this graph. Use this to show that
Assignment ≤p Traveling Salesman.
✓ 6.23 Lemma 6.3 leaves a couple of points undone.
(a) Show that ≤p is reflexive and transitive.
(b) It says that nontrivial languages are P hard. What about trivial ones? Which
languages reduce to the empty set? To B∗ ?
(c) Show that NP is downward closed, that if L1 ≤p L0 and L0 ∈ NP then
L1 ∈ NP also.
6.24 Is there a connection between subset and polytime reducibility? Find
languages L0, L1 ∈ P(B∗) for each: (a) L0 ⊂ L1 and L0 ≤p L1, (b) L0 ⊄ L1
and L0 ≤p L1, (c) L0 ⊂ L1 and L0 ≰p L1, (d) L0 ⊄ L1 and L0 ≰p L1.
6.25 When Li ≤p Lj , does that mean that the best algorithm to decide Li takes
time that is less than or equal to the amount taken by the best algorithm for
Lj? Fix a language decision problem L0 whose fastest algorithm is O(n³), an L1
whose best algorithm is O(n²), an L2 whose best is O(2ⁿ), and an L3 whose best is
O(lg n). In the array entry i, j below, put ‘N’ if Li ≤p Lj is not possible.

L0 L1 L2 L3
L0 (0,0) (0,1) (0,2) (0,3)
L1 (1,0) (1,1) (1,2) (1,3)
L2 (2,0) (2,1) (2,2) (2,3)
L3 (3,0) (3,1) (3,2) (3,3)

Section
V.7 NP completeness
Because P ⊆ NP, the class NP contains lots of easy problems, ones with
a fast algorithm. For instance, one member of NP is the problem of
determining whether a number is odd. Nonetheless, the interest in the

class is that it also contains lots of problems that seem to be hard. Can
we prove that these problems are indeed hard?
This question was raised by S Cook in 1971. He noted that with
polynomial time reducibility we have a way to make precise that an
efficient solution for one problem yields an efficient solution for the
other. And, he showed that among the problems in NP, there are ones
that are maximally hard.†
7.1 Theorem (Cook-Levin theorem) The Satisfiability problem is in NP,
and has the property that any problem in NP reduces to it: L ≤p SAT for
any L ∈ NP.
First, SAT ∈ NP because a nondeterministic machine can guess which line
of the truth table to verify. Said another way: given a Boolean formula, use as
a witness ω a sequence giving an assignment of truth values that satisfies the
formula.
We will not step through the proof here, but here is the basic idea.
We are given L ∈ NP and must show that L ≤p SAT . For this, we must
produce a function fL that translates membership questions for L into
Boolean formulas, such that the membership answer is ‘yes’ if and only
if the formula is satisfiable.
The only thing that we know about L is that its member σ ’s are
accepted by a nondeterministic machine P in time given by a polyno-
mial q . So the proof constructs, from ⟨P , σ , q⟩ , a Boolean formula that
yields T if and only if P accepts σ. The Boolean formula encodes the
constraints under which a Turing machine operates, such as that the
only tape symbol that can be changed in one step is the symbol under
the machine’s Read/Write head.
[Margin photo: Leonid Levin, b 1948]
7.2 Definition A problem L is NP complete if it is a member of NP and any
member L̂ of NP is polynomial time reducible to it, L̂ ≤p L.

7.3 Definition A problem L is NP-hard if every problem in NP reduces to it.


In general, for a complexity class C, a problem L is C-hard when all problems
in that class reduce to it: if L̂ ∈ C then L̂ ≤p L. A problem is C complete if it is
hard for that class and also is a member of that class.
So a problem is NP complete if it is, in a sense, at least as hard as any problem
in NP. The Cook-Levin Theorem says that there is at least one NP complete
problem, namely SAT . In fact, we shall see that there are many such problems.
The NP complete problems are to the class NP as the problems Turing-equivalent
to K are to the computably enumerable sets, where K is the solution to the Halting
problem. They are at the top level of their class — if we could solve the one problem
then we could solve every other problem in that class. This sketch illustrates.

† This was also shown by L Levin, but he was behind the Iron Curtain so knowledge of his work did not
have a chance to spread to the rest of the world for some time.

7.4 Figure: The bean contains all problems, arranged from Fast at the bottom to Slow
at the top. The subset in the bottom right is NP, drawn with P as a proper subset
(although, strictly speaking, we don’t know that is true). The top right, shaded, has
the NP-hard problems. The highlighted intersection is the set of NP complete problems.

7.5 Lemma If L0 is NP complete, and L0 ≤p L1 , and L1 ∈ NP then L1 is NP complete.

Proof Exercise 7.30.


Soon after Cook raised the question of NP completeness, R Karp
brought it to widespread attention. He produced a list of twenty one
problems, drawn from Computer Science, Mathematics, and the natural
sciences, that were well-known to be difficult, so that many smart people
had for many years been unable to find efficient algorithms. Karp showed
that they were all NP complete and so if we could efficiently solve
any of them then we could efficiently solve every one. Not every hard
problem is NP complete but many thousands of problems have been
shown to be in this category and so whatever it is that makes these
problems hard, all of them share it.
[Margin photo: Richard M. Karp, b 1935]
Typically, we prove that a problem L is NP complete in two halves. The first
half is to show that L ∈ NP. Usually this is easy; we just produce a witness that can
be verified in polytime. The second half is to show that the problem is NP-hard.
Often this involves showing that some problem already known to be NP complete
reduces to L. The following list contains the NP complete problems that are most
often used for this. For instance, to show that L is NP hard we might show that
3-SAT ≤p L. These descriptions appeared earlier; they are repeated here for
convenience.
7.6 Theorem (Basic NP Complete Problems) Each of these problems is NP complete.
3-Satisfiability, 3-SAT Given a propositional logic formula in conjunctive normal
form in which each clause has at most 3 variables, decide if it is satisfiable.
3 Dimensional Matching Given as input a set M ⊆ X × Y × Z , where the sets
X , Y , Z all have the same number of elements, n, decide if there is a matching,
a set M̂ ⊆ M containing n elements such that no two of the triples in M̂ agree
on any of their coordinates.
Vertex cover Given a graph and a bound B ∈ N, decide if the graph has a B-vertex
cover, a size B set of vertices C such that for any edge v_i v_j, at least one of its
ends is a member of C.
Clique Given a graph and a bound B ∈ N, decide if the graph has a B-clique, a
set of B-many vertices such that any two are connected.
Hamiltonian Circuit Given a graph, decide if it contains a Hamiltonian circuit, a
cyclic path that includes each vertex.
Partition Given a finite multiset S, decide if there is a division of the set into
two parts Ŝ and S − Ŝ so the total of the elements in the two is the same,
Σ_{s∈Ŝ} s = Σ_{s∉Ŝ} s.

We will not show here that these are all NP complete; for that, see (Garey and
Johnson 1979).
7.7 Example The Traveling Salesman problem is NP complete. We can prove this by
showing that the Hamiltonian Circuit problem reduces to it: Hamiltonian Circuit ≤p
Traveling Salesman. Recall that we have recast Traveling Salesman as the decision
problem for the language of pairs ⟨G , B⟩ , where B is a parameter bound. Recall
also that this problem is a member of NP.
The reduction function inputs an instance of Hamiltonian Circuit, a graph Ĝ =
⟨N̂ , Ê ⟩ whose edges are unweighted. It returns the instance of Traveling Salesman
that uses the vertex set N̂ as cities, that takes the distances between the cities to
be d(v_i, v_j) = 1 if v_i v_j ∈ Ê and d(v_i, v_j) = 2 if v_i v_j ∉ Ê, and such that the bound
is the number of vertices, B = |N̂|.
This bound means that there will be a Traveling Salesman solution if and only if
there is a Hamiltonian Circuit solution, namely the salesman uses the edges of the
Hamiltonian circuit. All that remains is to argue that the reduction function runs
in polytime. The number of edges in a graph is at most quadratic in the number of
vertices, so polytime in the input graph size is the same as polytime in the number
of vertices. The reduction function's algorithm examines all pairs of vertices, which
takes time that is quadratic in the number of vertices.
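To make the reduction concrete, here is a short sketch in Python. The representation of the graph as a vertex list plus a set of edges, and the function name, are our own choices for illustration; they are not part of the problem statements.

    def ham_circuit_to_tsp(vertices, edges):
        # Sketch of the reduction: edges of the input graph get distance 1,
        # non-edges get distance 2, and the bound is the number of vertices.
        distance = {}
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                pair = frozenset((u, v))
                distance[pair] = 1 if pair in edges else 2
        return vertices, distance, len(vertices)

A tour meeting the bound B = |N̂| must use only distance-1 edges, so it traces a Hamiltonian circuit.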
A common strategy to show that a given problem is NP complete using the List
of Basic NP Complete Problems is to show that a special case of it is on the list.
7.8 Example The Knapsack problem starts with a multiset of objects S = {s_0, ... s_{k−1}},
where each element has a weight w(s_i) ∈ N+ and a value v(s_i) ∈ N+, and where
there are two overall criteria: a weight bound B ∈ N+ and a value target T ∈ N+.
The problem is to find a knapsack C ⊆ S whose elements have total weight less
than or equal to the bound, and total value greater than or equal to the target.
Observe first that this is an NP problem. As the witness we can use the k-bit
string ω such that ω[i] = 1 if s_i is in the knapsack C, and ω[i] = 0 if it is not. A
deterministic machine can verify this witness in polynomial time since it only has
to total the weights and values of the elements of C.
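Here is a sketch of that verification, with a representation and names of our own choosing.

    def verify_knapsack(weights, values, bound, target, witness):
        # One pass over the witness bits: total the chosen weights and values.
        total_weight = sum(w for w, bit in zip(weights, witness) if bit == 1)
        total_value = sum(v for v, bit in zip(values, witness) if bit == 1)
        return total_weight <= bound and total_value >= target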
To finish, we must show that Knapsack is NP-hard. We will show that a special
case is NP-hard. Consider the case where w(s_i) = v(s_i) for all s_i ∈ S, and where the
two criteria each equal half of the total of all the weights, B = T = 0.5 · Σ_{0≤i<k} w(s_i).
This is a Partition problem, which is in the above list of basic problems.


One of Karp’s points was the practical importance of NP completeness. Many
problems from applications fall into this class. The next example illustrates. It also
illustrates that many reductions are complex.
7.9 Example Usually, colleges make a schedule by putting classes into time slots and
then students pick which classes they will take. Imagine instead that students
first select the classes, and then the college decides if there is a non-conflicting
time schedule for those classes. We will show that this decision problem, L, is
NP complete.
To be more specific, here is an instance of the problem L. Consider a college
with 2 time slots and k classes, each with some capacity for enrolled students.
Every student s has two disjoint lists of classes ℓ_s, ℓ̂_s, and will enroll in one class
from each list. For example, perhaps they choose MA-101-A from the first list and
PY-101-A from the second. The college collects all these student choices and then
must decide if there is a way to partition the set of classes into two time slots, so
that no student has two classes in the same time slot.
First we show that L ∈ NP. As a witness ω we can use a sequence of assignments
of students to classes and classes to time slots. This has length polynomial in the
number of students, and we can check in polytime that each student is assigned
one class from each of their lists, that each class is given a time slot, and there is
no conflict.
What remains is to show that L is NP-hard. We will show that 3-SAT ≤p L.
We must produce a reduction function that takes as input instances of 3-SAT:
propositional logic expressions consisting of m-many clauses, each of which contains
3 or fewer of the Boolean variables or their negations, joined by ∨'s, and these clauses
are all joined by ∧'s. The reduction function's output is an instance of L, and
(besides being computable in polytime) it must have the property that the output
instance has a successful schedule if and only if the input formula is satisfiable.
We next describe what the reduction function computes. Its input instance
has Boolean variables, x_0, ... x_{n−1}. For each x_i, the output instance will have an
associated course, c_i, of capacity equal to the total number of students. There
will be more courses also, including one named 'T' and one named 'F'. The idea
is that if the input instance is satisfied by an assignment where x_i = T then the
output can allot course c_i to the same time slot as course 'T'. The courses 'T'
and 'F' will each have capacity 2m + 1 (where m is the number of clauses in the
expression). Also start by giving the output instance a student s, whose two lists
are ℓ_s = {T} and ℓ̂_s = {F}. The presence of this student in the problem instance
ensures that the 'T' and 'F' courses are in separate time slots.
Now consider the clauses in the input formula. The easiest clauses are the ones
without negations, of the form x_{i0}, or x_{i0} ∨ x_{i1}, or x_{i0} ∨ x_{i1} ∨ x_{i2}. If the clause is
x_{i0} then in the output instance we create a student s with the two lists ℓ_s = {c_{i0}}
and ℓ̂_s = {F} (thus, if the clause is x_7 then the lists are {c_7} and {F}). If the
clause is x_{i0} ∨ x_{i1} then the student gets ℓ_s = {c_{i0}, c_{i1}} and ℓ̂_s = {F}. Similarly, for
x_{i0} ∨ x_{i1} ∨ x_{i2} the created student's lists are ℓ_s = {c_{i0}, c_{i1}, c_{i2}} and ℓ̂_s = {F}.
Clauses consisting only of negations, either ¬x_{i0}, or ¬x_{i0} ∨ ¬x_{i1}, or ¬x_{i0} ∨
¬x_{i1} ∨ ¬x_{i2}, are similar. An example is that for the one-atom clause ¬x_7, we create
a student with the two lists ℓ_s = {T} and ℓ̂_s = {c_7}.
The messiest clauses are those with a mix of positive and negative atoms. We
will show what to do with an example: suppose that the j-th clause is x_{i0} ∨ x_{i1} ∨ ¬x_{i2}.
In addition to the associated classes c_{i0}, c_{i1}, and c_{i2} created earlier, we also create
three new classes t_j, f_j, and z_j, with capacity of one student each. And we create
three students: s_j, u_j, and v_j. Here are their lists.

    Student   ℓ                       ℓ̂
    s_j       {c_{i0}, c_{i1}, t_j}   {c_{i2}, f_j}
    u_j       {t_j, z_j}              {F}
    v_j       {T}                     {f_j, z_j}
To confirm that this reduction function suffices, we must show that if for the
input instance there is an assignment of truth values to the Boolean variables that
satisfies the expression then for the output course scheduling instance there is a
solution, and we must also show the converse.
So first suppose that there is a way to give each Boolean variable a value of T or
F such that the entire propositional logic formula evaluates to T. Then we get a
non-conflicting course time slot allotment by: where x_i = T, put the course c_i in
the same time slot as course 'T', and where x_j = F, put the course c_j in the same slot
as 'F'. This clearly works for the all-positive atom or all-negative atom clauses.
The interesting clauses are the mixed positive and negative atom ones, such as our
example x_{i0} ∨ x_{i1} ∨ ¬x_{i2}. We can check that for each possible assignment making
this clause evaluate to T, there is a non-conflicting way to arrange the courses. For
instance, one such assignment is x_{i0} = T, x_{i1} = F, and x_{i2} = F. Here is a way to
assign courses to time slots that will avoid conflicts. We have put course c_{i0} in the
time slot with course 'T', and student s_j can take it then. This student can also take
course c_{i2} in the time slot when 'F' is offered. As to student u_j, they take course t_j
when 'T' is offered and also take course 'F'. Likewise, student v_j takes 'T' and also
takes course f_j when 'F' is offered.
Conversely, suppose that for every assignment of truth values to the Boolean
variables, the input expression is not satisfied. We will show that as a result, for
any allocation of classes into time slots there is a conflict.
So fix an allocation of classes into time slots, say with the classes c_{i0}, c_{i1}, . . .
being offered in the same slot as class 'T' and the rest with class 'F'. Associate with
that allocation of classes the assignment of Boolean variables that sets x_{i0} = T,
x_{i1} = T, etc., and the rest to F. By our assumption this expression is not satisfied,
so this assignment causes the formula to yield a value of F. Thus the expression
has at least one clause, clause j, that does not evaluate to T.

The first possibility is that clause j has either all positive atoms or all negative
atoms. An example is the clause x_{i0} ∨ x_{i1} ∨ x_{i2}. To make this clause evaluate to F,
the assignment must have x_{i0} = F, x_{i1} = F, and x_{i2} = F, and so the allocation
of classes must be that all three of c_{i0}, c_{i1}, and c_{i2} go with class 'F'. That gives a
conflict in the problem instance's course assignments, because we created a student s_j
with the lists ℓ_{s_j} = {c_{i0}, c_{i1}, c_{i2}} and ℓ̂_{s_j} = {F}.
The other case is that clause j has a mixed form, such as x_{i0} ∨ x_{i1} ∨ ¬x_{i2}, so
that the Boolean variable assignment is x_{i0} = F, x_{i1} = F, and x_{i2} = T. The output
course assignment problem instance puts c_{i0} and c_{i1} into a course at the same
time as class 'F', and c_{i2} into a course at the same time as 'T'. We claim that there
is no non-conflicting way to assign these students to courses. Refer to the table
above. This allocation of classes could only be non-conflicting if student s_j selected
classes t_j and f_j. But then, because of the capacity of one in these classes, student u_j
must select z_j and 'F', leaving student v_j with a conflict.
Whew! To close, observe that the reduction function that creates the output L
instance from the input propositional logic formula runs in polytime.

Before we leave this discussion, we ask the natural question: what problems
are not complete? The short answer is that we don’t know. First, it is trivial from
the definition that if a problem is NP hard but not in NP then it is not NP complete.
Likewise, the empty language and the language of all strings are trivially not
complete. But as to proving that interesting problems from NP are not complete,
that is another matter. It is tied up with the question of whether P is unequal to NP,
which we address in the next subsection.
However, so that we have not just brushed past the question: with the
assumption that P ≠ NP, here are a few problems that are in NP and that
many people conjecture are tough but not NP complete. Most experts believe
that the Factoring problem is hard for classical computers† but that it is not
NP complete. Experts also suspect that the Graph Isomorphism problem and
the Vertex to Vertex Path problem are not NP complete. As always though, the
standard caution applies that without proof these judgements could be mistaken.

P = NP? Every deterministic Turing machine is trivially a nondeterministic
machine and so P ⊆ NP. What about the other direction? One way to think of
nondeterministic machines is that they are unboundedly parallel. So the P versus NP
question asks: does adding parallelism add speed?
The short answer is that no one knows. We don't know which of these two
pictures is right.


In 1994, P Shor discovered an algorithm for a quantum computer that solves the Factoring problem in
polynomial time. This will have significant implications if quantum computation proves to be possible
to engineer.

7.10 Figure: Which is it: P ⊂ NP or P = NP?

There are a number of simple ways to settle the question. By Lemma 7.5, if someone
shows that any NP complete problem is a member of P then P = NP. In addition, if
someone shows that there is an NP problem that is not a member of P then P ≠ NP.
However, despite nearly a half century of effort by many extremely smart people,
no one has done either one.
As formulated in Karp’s original paper, the question of whether P equals NP
might seem of only technical interest.
A large class of computational problems involve the determination of properties
of graphs, digraphs, integers, arrays of integers, finite families of finite sets, boolean
formulas and elements of other countable domains. Through simple encodings . . .
these problems can be converted into language recognition problems, and we can
inquire into their computational complexity. It is reasonable to consider such a problem
satisfactorily solved when an algorithm for its solution is found which terminates
within a number of steps bounded by a polynomial in the length of the input. We
show that a large number of classic unsolved problems of covering, matching, packing,
routing, assignment and sequencing are equivalent, in the sense that either each of
them possesses a polynomial-bounded algorithm or none of them does.
But Karp demonstrated that many of the problems that people had been struggling
with in practical applications fall into this category. Researchers who had been
trying to find an efficient solution to Vertex Cover, and those who had been
working on Clique, found that they are in some sense working on the same problem.
By now the list of NP complete problems includes determining the best layout of
transistors on a chip, developing accurate financial-forecasting models, analyzing
protein-folding behavior in a cell, and finding the most energy-efficient airplane
wing. So the question of whether P = NP is extremely practical, and extremely
important.†
In practice, proving that a problem is NP complete is often an ending point;
a researcher may well reason that continuing to try to find an efficient algorithm
will not be fruitful, since many of the best minds of Mathematics, Computer
Science, and the natural sciences have failed at it. They may instead turn their
attention elsewhere, perhaps to approximations that are good enough; see Extra B.


One indication of its importance is its inclusion on Clay Mathematics Institute’s list of problems for
which there is a one million dollar prize; see http://www.claymath.org/millennium-problems.

In the book’s first part we studied problems that


are unsolvable. That was a black and white situation;
either a problem is mechanically solvable in principle
or it is not. We now find that many problems of interest
are solvable in principle, but that finding a solution is
infeasible. That is, the class of NP complete problems
form a kind of transition between the possible and the
impossible.
We can use this to engineering advantage. For Courtesy xkcd.com
instance, schemes for holding elections are notoriously
prone to manipulation and there are theorems saying that they must be. But we
can hope to use system that, while manipulatable in principle, is constructed so
that it is in practice infeasible to compute how to do the manipulation. Another
example of the same thing is the celebrated RSA encryption system that is used to
protect Internet commerce; see Extra A.
This returns us to the book’s opening question about mathematical proof. Recall
the Entscheidungsproblem that was a motivation behind the definition of a Turing
machine. It looks for an algorithm that inputs a mathematical statement and
decides whether it is true. It is perhaps a caricature but imagine that the job of
mathematicians is to prove theorems. The Entscheidungsproblem asks if we can
replace mathematicians with machines.
In the intervening century we have come to understand, through the work
of Gödel and others, the difference between a statement’s being true and its
being provable. Church and Turing expanded on this insight to show that the
Entscheidungsproblem is unsolvable. Consequently, we change to asking for an
algorithm that inputs statements and decides whether they are provable.
In principle this is simple. A proof is a sequence of statements, σ0 , σ1 , . . . σk ,
where the final statement is the conclusion, and where each statement either is
an axiom or else follows from the statements before it by an application of a rule
of deduction (a typical rule allows the simultaneous replacement of all x ’s with
y + 4’s). In principle a computer could brute-force the question of whether a given
statement is provable by doing a dovetail, a breadth-first search of all derivations.
If a proof exists then it will appear eventually.
The difficulty is that final word, eventually. This algorithm is very slow. Is there
a tractable way?
In the terminology that we now have, the modified Entscheidungsproblem is
a decision problem: given a statement σ and bound B ∈ N, we ask if there is a
sequence of statements ω witnessing a proof that ends in σ and that is shorter
than the bound. A computer can quickly check whether a given proof is valid —
that is, this problem is in NP. With the current status of the P versus NP problem,
the answer to the question in the prior paragraph is that no one knows of a fast
algorithm, but no one can show that there isn’t one either.
As far back as 1956, Gödel raised these issues in a letter to von Neumann (this
letter did not become public until years later).


One can obviously easily construct a Turing machine, which for every formula F in
first order predicate logic and every natural number n, allows one to decide if there
is a proof of F of length n (length = number of symbols). Let Ψ(F, n) be the number
of steps the machine requires for this and let ϕ(n) = max_F Ψ(F, n). The question is
how fast ϕ(n) grows for an optimal machine. One can show that ϕ(n) ≥ k · n. If
there really were a machine with ϕ(n) ∼ k · n (or even ∼ k · n^2), this would have
consequences of the greatest importance. Namely, it would obviously mean that in spite
of the undecidability of the Entscheidungsproblem, the mental work of a mathematician
concerning Yes-or-No questions could be completely replaced by a machine. After all,
one would simply have to choose the natural number n so large that when the machine
does not deliver a result, it makes no sense to think more about the problem. Now it
seems to me, however, to be completely within the realm of possibility that ϕ(n) grows
that slowly. Since it seems that ϕ(n) ≥ k · n is the only estimation which one can obtain
by a generalization of the proof of the undecidability of the Entscheidungsproblem and
after all ϕ(n) ∼ k · n (or ∼ k · n^2) only means that the number of steps as opposed to trial
and error can be reduced from N to log N (or (log N)^2). . . . It would be interesting to
know, for instance, the situation concerning the determination of primality of a number
and how strongly in general the number of steps in finite combinatorial problems can
be reduced with respect to simple exhaustive search.
So we can compare P versus NP with the Halting problem. The Halting problem
and related results tell us, in the light of Church’s Thesis, what is knowable in
principle. The P versus NP question, in contrast, speaks to what we can ever know
in practice.

Discussion The P versus NP question is certainly the sexiest one in the Theory
of Computing today. It has attracted a great deal of speculation, and gossip. In 2018
a poll of experts found that out of 152 respondents, 88% thought that P ≠ NP while
only 12% thought that P = NP. This subsection discusses some of the intuition
around the question.
First, the intuition around the P ≠ NP conjecture. Imagine a
jigsaw puzzle. We perceive that if a demon gave us an assembled
puzzle ω , then checking that it is correct is very much easier than it
would have been to work out the solution from scratch. Checking
for correctness is mechanical, tedious. But the finding, we think,
is creative — we expect that solving a jigsaw puzzle by brute-force
trying every possible piece against every other is far too much
computation to be practical.
Similarly, schemes for encryption are engineered so that, given an encrypted
message, decrypting it with the key is fast and easy but trying to decrypt it by
trying all possible keys is, we think, just not tractable.
A problem is in P if finding a solution is fast, while a problem is in NP if verifying
the correctness of a given witness ω is fast. From this point of view, the result that
P ⊆ NP becomes the observation that if a problem is fast to solve then it must be

fast to verify. But most experts perceive that inclusion in the other direction is
extremely unlikely.
Restated informally, the P versus NP question asks if finding a solution is as fast
as recognizing one. If P = NP then the two jobs are, in a sense, equally difficult.
Some commentators have extended this thinking outside of Theoretical Computer
Science. Cook is one: "Similar remarks apply to diverse creative human endeavors,
such as designing airplane wings, creating physical theories, or even composing
music. The question in each case is to what extent an efficient algorithm for
recognizing a good result can be found." Perhaps it is hyperbole to say that if
P = NP then writing great symphonies would be a job for computers, a job for
mechanisms, but it is correct to say that if P = NP and if we can write fast
algorithms to recognize excellent music — and our everyday experience with
Artificial Intelligence makes this seem more and more likely — then we could have
fast mechanical writers of excellent music.
[Photo: A Selman's license plate, courtesy S Selman]
We finish with a taste of the contrarian view, the conjecture that P = NP.
Many observers have noted that there are cases where everyone “knew” that
some algorithm was the fastest but in the end it proved not to be so. The section on
Big-O begins with one, the grade school algorithm for multiplication. Another is
the problem of solving systems of linear equations. The Gauss's Method algorithm,
which runs in time O(n^3), is perfectly natural and had been known for centuries
without anyone making improvements. However, while trying to prove that Gauss's
Method is optimal, Strassen found an O(n^{lg 7}) method (lg 7 ≈ 2.81).†
A more dramatic speedup happens with the Matching problem: given a graph
with the vertices representing people, connect two vertices if the people are
compatible. We want a set of edges that is as large as possible, such that no two
edges share a vertex. The naive algorithm tries all possible match sets, which takes
2^m checks where m is the number of edges. Even with only a hundred people, there
are more things to try than atoms in the universe. But since the 1960s we have had
an algorithm that runs in polytime.
Every day on the Theory of Computing blog feed there are examples of this,
of researchers producing algorithms faster than the ones previously known. A
person can certainly have the sense that we are only just starting to explore what
is possible with algorithms. R J Lipton captured this sense.
Since we are constantly discovering new ways to program our “machines”, why not a
discovery that shows how to factor? or how to solve SAT ? Why are we all so sure that
there are no great new programming methods still to be discovered? . . . I am puzzled
that so many are convinced that these problems could not fall to new programming


tricks, yet that is what is done each and every day in their own research.
Here is an analogy: consider the problem of evaluating 2p^3 + 3p^2 + 4p + 5. Someone might claim
that writing it as 2·p·p·p + 3·p·p + 4·p + 5 makes obvious that it requires six multiplications. But
rewriting it as p·(p·(2·p + 3) + 4) + 5 shows that it can be done with just three. That is, naturalness
and obviousness do not guarantee that something is correct. Without a proof, we must worry that
someone will produce a clever way to do the job with fewer.
Knuth has a related but somewhat different take.
Some of my reasoning is admittedly naive: It's hard to believe that P ≠ NP and that
so many brilliant people have failed to discover why. On the other hand if you imagine
a number M that's finite but incredibly large . . . then there's a humongous number of
possible algorithms that do n^M bitwise or addition or shift operations on n given bits,
and it's really hard to believe that all of those algorithms fail.
My main point, however, is that I don’t believe that the equality P = NP will turn
out to be helpful even if it is proved, because such a proof will almost surely be
nonconstructive. Although I think M probably exists, I also think human beings will
never know such a value. I even suspect that nobody will even know an upper bound
on M .
Mathematics is full of examples where something is proved to exist, yet the proof
tells us nothing about how to find it. Knowledge of the mere existence of an algorithm
is completely different from the knowledge of an actual algorithm.
Of course, all this is speculation. Speculating is fun, and in order to make
progress in their work, investigators need to have intuition, need to have some
educated guesses. But in the end these researchers, and all of us, look to settle the
question with proof.

V.7 Exercises
✓ 7.11 You hear someone say, “The Satisfiability problem is NP because it is not
computable in polynomial time, so far as we know.” It’s a short sentence but find
three things wrong with it.
✓ 7.12 You have this person in your class who is no genius, which is fine, except that
they don’t know that. They say, “I will show that the Hamiltonian Circuit problem
is not in P, which will demonstrate that P ≠ NP. The algorithm to solve a given
instance G of the Hamiltonian Circuit problem is: generate all permutations of G ’s
vertices, test each to find if it is a circuit, and if any circuits appear then accept
the input, else reject the input. For sure that algorithm is not polynomial, since
the first step is exponential.” Where is the mistake?
✓ 7.13 Your friend says, “The problem of recognizing when one string is a substring
of another has a polytime algorithm, so it is not in NP.” They have misspoken;
help them out.
7.14 Someone in your study group wants to ask your professor, “Is the brute force
algorithm for solving the Satisfiability problem NP complete?” Explain to them
that it isn’t a sensible question, that they are making a type error.
7.15 True or false?
(a) The collection NP is a subset of the NP complete sets, which is a subset of
NP-hard.
(b) The collection NP is a specialization of P to nondeterministic machines, so it
is a subset of P.

✓ 7.16 Assume that P ≠ NP. Which of these statements can we infer from the fact
that the Prime Factorization problem is in NP, but is not known to be NP-complete?
(a) There exists an algorithm that solves arbitrary instances of the Prime Factorization
problem.
(b) There exists an algorithm that efficiently solves arbitrary instances of this
problem.
(c) If we found an efficient algorithm for the Prime Factorization problem then
we could immediately use it to solve Traveling Salesman.
✓ 7.17 Suppose that L1 ≤p L0 . For each, decide if you can conclude it. (a) If
L0 is NP complete then so is L1 . (b) If L1 is NP complete then so is L0 .
(c) If L0 is NP complete and L1 is in NP then L1 is NP complete. (d) If L1
is NP complete and L0 is in NP then L0 is NP complete. (e) It cannot be the
case that both L0 and L1 are NP complete. (f) If L1 is in P then so is L0 .
(g) If L0 is in P then so is L1 .
7.18 Show that each of these is in NP but is not NP complete, assuming that
P ≠ NP.
(a) The language of even numbers.
(b) The language { G | G has a vertex cover of size at most four }.
✓ 7.19 Traveling Salesman is NP complete. From P ≠ NP, which of the following
statements could we infer?
(a) No algorithm solves all instances of Traveling Salesman.
(b) No algorithm quickly solves all instances of Traveling Salesman.
(c) Traveling Salesman is in P.
(d) All algorithms for Traveling Salesman run in polynomial time.
✓ 7.20 Prove that the 4-Satisfiability problem is NP hard.
✓ 7.21 The Hamiltonian Path problem inputs a graph and decides if there are two
vertices in that graph such that there is a path between those two that contains
all the vertices.
(a) Show that Hamiltonian Path is in NP.
(b) This graph has a Hamiltonian path. Find it.
[Graph on vertices v0 through v8]
What can we say about v0 and v8?


(c) Show that Hamiltonian Circuit ≤p Hamiltonian Path.
(d) Conclude that the Hamiltonian Path problem is NP complete.
✓ 7.22 The Longest Path problem is to input a graph and find the longest simple
path in that graph.
(a) Find the longest path in this graph.
[Graph on vertices q0 through q8]

(b) Remembering the technique for converting an optimization problem to a


language decision problem by using bounds, state this as a language decision
problem. Show that Longest Path ∈ NP.
(c) Show that the Hamiltonian Path problem reduces to Longest Path. Hint: lever-
age the bound from the prior item.
(d) Use the prior exercise to conclude that the Longest Path problem is NP com-
plete.
✓ 7.23 The Subset Sum problem inputs a multiset T and a target B ∈ N, and decides
if there is a subset T̂ ⊆ T whose elements add to the target. The Partition problem
inputs a multiset S and decides whether or not it has a subset Ŝ ⊂ S so that the
sum of elements of Ŝ equals the sum of elements not in that subset.
(a) Find a subset of T = { 3, 4, 6, 7, 12, 13, 19 } that adds to B = 30.
(b) Find a partition of S = { 3, 4, 6, 7, 12, 13, 19 } .
(c) Show that if the sum of the elements in a set is odd then the set has no
partition.
(d) Express each problem as a language decision problem.
(e) Prove that Partition ≤p Subset Sum. (Hint: handle separately the case where
the sum of elements in S is odd.)
(f) Conclude that Subset Sum is NP complete.
7.24 The 3-Satisfiability problem is to decide the satisfiability of a propositional
logic expression where every clause consists of three literals (the things between
the ∨'s), all different. The Independent Set problem inputs a graph and a bound,
and decides if there is a set of vertices, of size at least equal to the bound, that
are not connected to each other by an edge.
(a) Find an independent set in this graph.
[Graph on vertices q0 through q5]
(b) State Independent Set as a language decision problem.


(c) Decide if E = (P_0 ∨ ¬P_1 ∨ ¬P_2) ∧ (P_1 ∨ P_2 ∨ ¬P_3) is satisfiable.
(d) State 3-Satisfiability as a language decision problem.
(e) With the expression E, make a triangle for each of the two clauses, where the
vertices of the first are labeled v_0, v_1, and v_2, while the vertices of the second
are labeled w_1, w_2, and w_3. In addition to the edges forming the triangles,
also put one connecting v_1 with w_1, and one connecting v_2 with w_2.
(f) Sketch an argument that 3-Satisfiability ≤p Independent Set.

✓ 7.25 The difficulty in settling P = NP is to get lower bounds. That is, the trouble
lies in showing that the given problem cannot be solved by any algorithm without
such-and-such many steps. A common mistake is to think that any algorithm
must visit all of its input and then to produce a problem with lots of input. Show
that the successor function can be done on a Turing machine in constant time,
in only a few steps, so that the running time if the input is large is the same as
the time if the input is small. That is, show that this problem can be done with
an algorithm that does not visit all the input: on a Turing machine given input n
in unary, with the head under the leftmost 1, end with n + 1-many 1’s, with the
head under the leftmost 1.
7.26 If P = NP then what happens to NP complete sets?
7.27 Are there any problems in NP and not in P that are known to not be NP
complete?
7.28 Find three languages so that L2 ≤p L1 ≤p L0 , and L2 , L0 are NP complete,
while L1 ∈ P.
7.29 Prove that if P = NP then every L ∈ P is NP complete, except for the problems
of determining membership in the empty language and the full language, L = ∅
and L = Σ∗.
7.30 Prove Lemma 7.5.
7.31 The class P has some nice closure properties, and so does NP.
(a) Prove that NP is closed under union, so that if L, L̂ ∈ NP then L ∪ L̂ ∈ NP.
(b) Prove that NP is closed under concatenation.
(c) Argue that no one can prove that NP is not closed under set complement.
7.32 Is the set of NP complete sets countable or uncountable?
7.33 We will sketch a proof that the Halting problem is NP-hard but not NP.
Consider the language HP = { ⟨P_e, x⟩ | ϕ_e(x)↓ }.
(a) Show that HP ∉ NP.
(b) Sketch an argument that for any problem L ∈ NP, there is a polynomial time
computable function f : B∗ → B∗ such that σ ∈ L if and only if f(σ) ∈ HP.

Section
V.8 Other classes
There are many other defined complexity classes. The first one below is very
natural in the light of what we have seen.
We have used the Satisfiability problem as a touchstone result among problems
in NP. We have discussed computing it using a nondeterministic Turing machine
that is unboundedly parallel, or alternatively using a witness and verifier. But,
naively, in the more familiar computational setting of a deterministic machine, it
appears that we must enumerate the truth table. That is, it appears to take time
that is exponential.

EXP In this chapter’s first section we included O(2n ) and O(3n ), and by extension
other exponentials, in the list of commonly encountered orders of growth.
Whereas a first take on polytime is “can conceivably be used,” a first approx-
imation of EXP is that for some of its problems the best algorithms are just too
slow to imagine using. However, the big take-away from EXP is that it contains
nearly every problem that we concern ourselves with in practice. We can construct
theories about still harder problems, but EXP is of interest because it is big enough
that it contains most problems that we seriously hope to ever attack.
8.1 Definition A language decision problem is an element of the complexity class
EXP if there is an algorithm for solving it that runs in time O(b^{p(n)}) for some
constant base b and polynomial p.
Satisfiability can be solved in exponential time by checking each row of the
truth table, and any NP problem can be solved from Satisfiability with only an
addition of polytime. The lemma below gives EXP's relationship to the classes that
we have already studied; first, here is a sketch of that truth table check.
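In this sketch, which is ours and for illustration only, a formula is represented as a Python function from a tuple of truth values to a truth value.

    from itertools import product

    def satisfiable(formula, num_vars):
        # Exponential time: try all 2^n rows of the truth table.
        for row in product([False, True], repeat=num_vars):
            if formula(row):
                return True
        return False

    # Example: (x0 or not x1) and (x1 or x2) is satisfiable.
    print(satisfiable(lambda v: (v[0] or not v[1]) and (v[1] or v[2]), 3))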
8.2 Lemma P ⊆ NP ⊆ EXP
Proof Fix L ∈ NP. We can verify L on a deterministic Turing machine P in
polynomial time using a witness whose length is bounded by the same polynomial.
Let this problem's bound be n^c.
We will decide L in exponential time by brute-forcing it: we will use P to run
every possible verification. Trying any single witness requires polynomial time,
n^c. Witnesses are in binary, so for lengths up to ℓ there are Σ_{0≤i≤ℓ} 2^i = 2^{ℓ+1} − 1 many
possible ones. In total then, brute force requires O(n^c · 2^{n^c}) operations. Finish by
observing that n^c · 2^{n^c} is in O(2^{n^{c+1}}), which has the form required by the definition.
We don’t know whether there are any NP problems that absolutely require
exponential time. Conceivably NP is contained in a smaller deterministic time
complexity class — for instance, maybe Satisfiability can be solved in less than
exponential time. But we just don’t know.

[Figure: problems arranged from Slow at the top to Fast at the bottom]

8.3 Figure: The bean encloses all problems. Shaded are the three classes P, NP, and
EXP, with EXP’s outline highlighted. They are drawn with strict containment, which
most experts guess is the true arrangement, but no one knows for sure.

Time Complexity Researchers have generalized to many more classes, trying


to capture various aspects of computation. For instance, the impediment that a

programmer runs across first is time.


8.4 Definition Let f : N → N. A decision problem for a language is an element of
DTIME(f ) if it is decided by a deterministic Turing machine that runs in time O(f ).
A problem is an element of NTIME(f ) if it is decided by a nondeterministic Turing
machine that runs in time O(f ).

8.5 Lemma A problem is polytime, P, if it is a member of DTIME(n^c) for some
power c ∈ N.

    P = ∪_{c∈N} DTIME(n^c) = DTIME(n) ∪ DTIME(n^2) ∪ DTIME(n^3) ∪ ···

The matching statements hold for NP and EXP.

    NP = ∪_{c∈N} NTIME(n^c) = NTIME(n) ∪ NTIME(n^2) ∪ NTIME(n^3) ∪ ···
    EXP = ∪_{c∈N} DTIME(2^{n^c}) = DTIME(2^n) ∪ DTIME(2^{n^2}) ∪ DTIME(2^{n^3}) ∪ ···

Proof The only equality that is not immediate is the last one. Recall that a
problem is in EXP if there is an algorithm for it that runs in time O(b^{p(n)}) for some
constant base b and polynomial p. The equality above only uses the base 2. To cover the
discrepancy, we will show that 3^n ∈ O(2^{n^2}). Consider lim_{x→∞} 2^{x^2}/3^x. Rewrite
the fraction as (2^x/3)^x, which when x > 2 is larger than (4/3)^x, which goes to
infinity. This argument works for any base, not just b = 3.

8.6 Remark While the above description of NP reiterates its naturalness, as we saw
earlier, the characterization that proves to be most useful in practice is that a
problem L is in NP if there is a deterministic Turing machine such that for each
input σ there is a polynomial length witness ω and the verification on that machine
for σ using ω takes polytime.

Space Complexity We can consider how much space is used in solving a problem.
8.7 Definition A deterministic Turing machine runs in space s : N → R+ if for
all but finitely many inputs σ, the computation on that input uses less than or
equal to s(|σ|)-many cells on the tape. A nondeterministic Turing machine runs
in space s if for all but finitely many inputs σ, every computation path on that
input uses less than or equal to s(|σ|)-many cells.
The machine must use less than or equal to s(|σ|)-many cells even on non-accepting
computations.
8.8 Definition Let s : N → N. A language decision problem is an element of
DSPACE(s), or SPACE(s), if that language is decided by a deterministic Turing
machine that runs in space O(s). A problem is an element of NSPACE(s) if
the language is decided by a nondeterministic Turing machine that runs in
space O(s).
The definitions arise from a sense we have of a symmetry between time and
space, that they are both examples of computational resources. (There are other
resources; for instance we may want to minimize disk reading or writing, which
may be quite different than space usage.) But space is not just like time. For one
thing, while a program can take a long time but use only a little space, the opposite
is not possible.
8.9 Lemma Let f : N → N. Then DTIME(f ) ⊆ DSPACE(f ). As well, this holds for
nondeterministic machines, NTIME(f ) ⊆ NSPACE(f ).

Proof A machine can use at most one cell per step.


8.10 Definition

    PSPACE = ∪_{c∈N} DSPACE(n^c) = DSPACE(n) ∪ DSPACE(n^2) ∪ DSPACE(n^3) ∪ ···
    NPSPACE = ∪_{c∈N} NSPACE(n^c) = NSPACE(n) ∪ NSPACE(n^2) ∪ NSPACE(n^3) ∪ ···

So PSPACE is the class of problems that can be solved by a deterministic Turing


machine using only a polynomially-bounded amount of space, regardless of how
long the computation takes.
As even those preliminary results suggest, restricting by space instead of time
allows for a lot more power.
8.11 Lemma P ⊆ NP ⊆ PSPACE
Proof For any problem in NP, check all possible witness strings ω, one after
another, reusing the same polynomially-many cells for each check. If any witness
works then the answer to the problem is 'yes'. Otherwise, the answer is 'no'.
Note that the method in the proof may take exponential time but it takes only
polynomial space.
Here is a result whose proof is beyond our scope, but that serves as a caution
that time and space are very different. We don’t know whether deterministic
polynomial time equals nondeterministic polynomial time, but we do know the
answer for space.
8.12 Theorem (Savitch’s Theorem) PSPACE = NPSPACE
We finish with a list of the most natural complexity classes.
8.13 Definition These are the canonical complexity classes.
1. L = DSPACE(lg n), deterministic log space, and NL = NSPACE(lg n), nondeterministic
log space
2. P, deterministic polynomial time, and NP, nondeterministic polynomial time
3. E = ∪_{k=1,2,...} DTIME(k^n) and NE = ∪_{k=1,2,...} NTIME(k^n)
4. EXP = ∪_{k=1,2,...} DTIME(2^{n^k}), deterministic exponential time, and
NEXP = ∪_{k=1,2,...} NTIME(2^{n^k}), nondeterministic exponential time
5. PSPACE, deterministic polynomial space
6. EXPSPACE = ∪_{k=1,2,...} DSPACE(2^{n^k}), deterministic exponential space

The Zoo Researchers have studied a great many complexity classes. There
are so many that they have been gathered into an online Complexity Zoo, at
complexityzoo.uwaterloo.ca/.
One way to understand these classes is that defining a class asks a type of
Theory of Computing question. For instance, we have already seen that asking
whether NP equals P is a way of asking whether unbounded parallelism makes any
essential difference — can a problem change from intractable to tractable if we
switch from a deterministic to a nondeterministic machine? Similarly, we know
that P ⊆ PSPACE. In thinking about whether the two are equal, researchers are
considering the space-time tradeoff: if you can solve a problem without much
memory does that mean you can solve it without using much time?
Here is one extra class, to give some flavor of the possibilities. For more, see
the Zoo.
The class BPP, Bounded-Error Probabilistic Polynomial Time, contains the
problems solvable by a nondeterministic polytime machine such that if the answer
is 'yes' then at least two-thirds of the computation paths accept, and if the answer is
'no' then at most one-third of the computation paths accept. (Here all computation
paths have the same length.) This is often identified as the class of feasible problems
for a computer with access to a genuine random-number source. Investigating
whether BPP equals P is asking whether every efficient randomized
algorithm can be made deterministic: are there some problems for which there are
fast randomized algorithms but no fast deterministic ones?
On reading in the Zoo, a person is struck by two things. There are many, many
results listed — we know a lot. But there also are many questions to be answered —
breakthroughs are there waiting for a discoverer.

V.8 Exercises
✓ 8.14 For each problem, give a naive algorithm that is exponential. (a) Subset Sum
problem (b) k Coloring problem
8.15 Show that n! is 2^{O(n^2)}. Show that Traveling Salesman ∈ EXP.
✓ 8.16 This illustrates how large a problem can be and still be in EXP. Consider a
game that has two possible moves at each step. The game tree is binary.
(a) How many elementary particles are there in the universe?

(b) At what level of the game tree will there be more possible branches than
there are elementary particles?
(c) Is that longer than a chess game can reasonably run?
8.17 We will show that a polynomial time algorithm that calls a polynomial time
subroutine can run, altogether, in exponential time.
(a) Verify that the grade school algorithm for multiplication gives that squaring
an n-bit integer takes time O(n^2).
(b) Verify that repeated squaring of an n-bit integer gives a result that has length
2^i · n, where i is the number of squarings.
(c) Verify that if your polynomial time algorithm calls a squaring subroutine n
times then the complexity is O(4^n · n^2), which is exponential.

Extra
V.A RSA Encryption
One of the great things about the interwebs, besides that you can get free Theory
of Computing books, is that you can buy stuff. You send a credit card number and a
couple of days later the stuff appears.
For this to be practical your credit card number must be kept secret. It must be
encrypted.
When you visit a web site using a https address, that site sends you information,
called a key, that your browser uses to encrypt your card number. The web site
then uses a different key to decrypt. This is an important point: the decrypter
must differ from the encrypter, since anyone on the net can see the encrypter
information that the site sent you. But the site keeps the decrypter information
private. These two, encrypter and decrypter, form a matched pair. We will describe
the mathematical technologies that make this work.

The arithmetic We can view everything on a modern computer as numbers.


Consider the message ‘send money’. Its ASCII encoding is 115 101 110 100 32 109
111 110 101 121. Converting to a bitstring gives 01110011 01100101 01101110
01100100 00100000 01101101 01101111 01101110 01100101 01111001. In
decimal that’s 544 943 221 199 950 100 456 825. So there is no loss in generality
in viewing everything we do, including encryption systems, as numerical operations.
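A few lines of Python confirm the conversion (the variable names here are ours).

    message = 'send money'
    bits = ' '.join(f'{ord(c):08b}' for c in message)
    number = int.from_bytes(message.encode('ascii'), 'big')
    print(bits)    # 01110011 01100101 ... 01111001
    print(number)  # 544943221199950100456825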
To make such systems, mathematicians and computer scientists have leveraged
that there are things we can do easily, but that we do not know how to
easily undo — that there are numerical operations we can use for encryption that are
fast, but such that the operations needed to decrypt (without the decrypter) are
believed to be so slow that they are completely impractical. So this is engineering
with Big-O.
We will describe an algorithm based on the Factoring Problem. We have
algorithms for multiplying numbers that are fast. The algorithms that we have
for starting with a number and decomposing it into factors are, by comparison,

quite slow. To illustrate this, you might contrast the time it takes you to multiply
two four-digit numbers by hand with the time it takes you to factor an eight-digit
number chosen at random. Set aside an afternoon for that second job, it’ll take a
while.
The algorithm that we shall describe exploits the difference.
It was invented in 1976 by three graduate students, R Rivest,
A Shamir, and L Adleman. Rivest read a paper proposing key
pairs and decided to implement the idea. Over the course of
a year, he and Shamir came up with a number of ideas and for
each Adleman would then produce a way to break it. Finally
they thought to use Fermat's Little Theorem. Adleman was
unable to break it since, he said, it seemed that only solving
Factoring would break it and no one knew how to do that. Their
algorithm, called RSA, was first announced in Martin Gardner's
Mathematical Games column in the August 1977 issue of Scientific American. It
generated a tremendous amount of interest and excitement.
[Photo: Adi Shamir (b 1952), Ron Rivest (b 1947), Leonard Adleman (b 1945)]
The basis of RSA is to find three numbers, a modulus n, an encrypter e, and a
decrypter d, related by this equation (here m is the message, as a number).

    (m^e)^d ≡ m (mod n)

The encrypted message is m^e mod n. To decrypt it, to recover m, calculate
(m^e)^d mod n. These three are chosen so that knowing e and n, or even the
encrypted m^e, still leaves a potential secret-cracker who is looking for d with an
extremely difficult job.
To choose them, first choose distinct prime numbers p and q. Pick these at
random so they are of about equal bit-lengths. Compute n = pq and φ(n) =
(p − 1) · (q − 1). Next, choose e with 1 < e < φ(n) that is relatively prime to φ(n).
Finally, find d as the multiplicative inverse of e modulo φ(n). (We shall show below
that all these operations, including using the keys for encryption and decryption,
can be done quickly.)
The pair ⟨n, e⟩ is the public key and the pair ⟨n, d⟩ is the private key. The
length of d in bits is the key length. Most experts consider a key length of 2 048
bits to be secure for the mid-term future, until 2030 or so, when computers will
have grown in power enough that they may be able to use an exhaustive brute-force
search to find d .
A.1 Example Alice chooses the primes p = 101 and q = 113 (these are too small to use
in practice but are good for an illustration) and then calculates n = pq = 11 413
and φ(n) = (p − 1)(q − 1) = 11 200. To get the encrypter she randomly picks
numbers 1 < e < 11 200 until she gets one that is relatively prime to 11 200,
choosing e = 3533. She publishes her public key ⟨n, e⟩ = ⟨11 413, 3 533⟩ on her
home page. She computes the decrypter d = e^{−1} mod 11 200 = 6 597, and finds a
safe place to store her private key ⟨n, d⟩ = ⟨11 413, 6 597⟩.

Bob wants to say 'Hi'. In ASCII that's 01001000 01101001. If he converted that
string into a single decimal number it would be bigger than n, so he breaks it into
two substrings, getting the decimals 72 and 105. Using her public key he computes

    72^3533 mod 11413 = 10496        105^3533 mod 11413 = 4861

and sends Alice the sequence ⟨10496, 4861⟩. Alice recovers his message by using
her private key.

    10496^6597 mod 11413 = 72        4861^6597 mod 11413 = 105
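The whole example can be checked with a few lines of Python; note that pow with a negative exponent, used here to get the inverse, needs Python 3.8 or later.

    p, q = 101, 113
    n, phi = p * q, (p - 1) * (q - 1)            # 11413 and 11200
    e = 3533
    d = pow(e, -1, phi)                          # 6597, the inverse of e modulo phi
    cipher = [pow(m, e, n) for m in (72, 105)]   # encrypt with the public key
    plain = [pow(c, d, n) for c in cipher]       # decrypt with the private key
    print(cipher, plain)                         # [10496, 4861] [72, 105]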


The arithmetic, fast We’ve just illustrated that RSA uses invertible operations.
There are lots of ways to get invertible operations so our understanding of RSA is
incomplete unless we know why it uses these particular operations. As discussed
above, the important point is that they can be done quickly, but undoing them,
finding the decrypter, is believed to take a very long time.
We start with a classic, beautiful result from Number Theory.
A.2 Theorem (Prime Number Theorem) The number of primes less than n ∈ N is
approximately n/ln(n); that is, this limit is 1.

    lim_{x→∞} (number of primes less than x)/(x/ln x)
This shows the number of primes less than n for some values up to a million.
[Plot: the count of primes less than x together with x/ln(x), for x up to 1 000 000; the count at one million is 78 498.]

This theorem says that primes are common. For example, the number of primes
less than 2^1024 is about 2^1024/ln(2^1024) ≈ 2^1024/709.78 ≈ 2^1024/2^9.47 ≈ 2^1015.
Said another way, if we choose a number n at random then the probability that it
is prime is about 1/ln(n), and so a random number that is 1024 bits long will be
a prime with probability about 1/(ln(2^1024)) ≈ 1/710. On average we need only
select 355 odd numbers of about that size before we find a prime. Hence we can
efficiently generate large primes by just picking random numbers, as long as we
can efficiently test their primality.
On our way to giving an efficient way to test primality, we observe that the
operations of multiplication and addition modulo m are efficient. (We will give
examples only, rather than the full analysis of the operations.)
A.3 Example Multiplying 3 915 421 by 52 567 004 modulo 3 looks hard. The naive
approach is to first take their product and then divide by 3 to find the remainder.
But there is a more efficient way. Rather than multiply first and then reduce
modulo m , reduce first and then multiply. That is, we know that if a ≡ b (mod m)

and c ≡ d (mod m) then ac ≡ bd (mod m) and so since 3 915 421 ≡ 1 (mod 3)


and 52 567 004 ≡ 2 (mod 3) we have this.

3 915 421 · 52 567 004 ≡ 1 · 2 (mod 3)
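A quick check of this, reducing first and then multiplying:

    a, c, m = 3915421, 52567004, 3
    print((a * c) % m)                # multiply first: 2
    print(((a % m) * (c % m)) % m)    # reduce first; same answer, smaller numbers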


Similarly, exponentiation modulo m is also efficient, both in time and in space.
A.4 Example Consider raising 4 to the 13-th power, modulo m = 497. The naive
approach would be to raise 4 to the 13-th power, which is a very large number, and
reduce modulo 497. But there is a better way.
Start by expressing the power 13 in base 2 as 13 = 8 + 4 + 1 = 1101_2. So,
4^13 = 4^8 · 4^4 · 4^1 and we need the 8-th power, the 4-th power, and the first power
of 4. If we can efficiently get those powers then we can multiply them modulo m
efficiently, so we will be set.
Get the powers by repeated squaring (modulo m). Start with 4^1. Squaring
gives 4^2, then squaring again gives 4^4, and squaring again gives 4^8. Getting these
powers (modulo m) just requires a multiplication each time, which we can do efficiently.
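Here is a sketch of that repeated squaring computation; Python's built-in pow(base, exp, m) does the same job.

    def power_mod(base, exp, m):
        # Square-and-multiply: one squaring per bit of the exponent.
        result, square = 1, base % m
        while exp > 0:
            if exp % 2 == 1:                  # this bit of the exponent is set
                result = (result * square) % m
            square = (square * square) % m    # next power of the base, mod m
            exp //= 2
        return result

    print(power_mod(4, 13, 497), pow(4, 13, 497))   # both give 445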
The last operation that we need to do efficiently is finding the multiplicative
inverse modulo m, which the key setup above uses to get the decrypter. Recall
that two numbers are relatively prime or coprime if their greatest common divisor
is 1. For example, 15 = 3 · 5 and 22 = 2 · 11 are relatively prime.
A.5 Lemma If a and m are relatively prime then there is an inverse for a modulo m, a
number k such that a · k ≡ 1 (mod m).
Proof Because the greatest common divisor of a and m is 1, Euclid's algorithm
gives a linear combination of the two that equals 1, sa + tm = 1 for some s, t ∈ Z.
Doing the operations modulo m gives sa + tm ≡ 1 (mod m). Since tm is a multiple
of m, we have tm ≡ 0 (mod m), leaving sa ≡ 1 (mod m), and so s is the inverse of a
modulo m.
Euclid’s algorithm is efficient, both in time and space, so finding an inverse
modulo m is efficient.
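A sketch of the computation from the lemma's proof (the function names are ours):

    def extended_gcd(a, b):
        # Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g.
        if b == 0:
            return a, 1, 0
        g, s, t = extended_gcd(b, a % b)
        return g, t, s - (a // b) * t

    def inverse_mod(a, m):
        g, s, _ = extended_gcd(a, m)
        if g != 1:
            raise ValueError('a and m are not relatively prime')
        return s % m

    print(inverse_mod(3533, 11200))   # 6597, as in Example A.1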
Now we can test for primes. The simplest way to test whether a number n is
prime is to try dividing n by all possible factors. But that is very slow. There is a
faster way, based on the next result.
A.6 Theorem (Fermat) For a prime p, if a ∈ Z is not divisible by p then a^{p−1} ≡ 1
(mod p).
Proof Let a be an integer not divisible by the prime p. Multiply a by each number
i ∈ {1, ... p − 1} and reduce modulo p to get the numbers r_i = ia mod p.
We will show that the set R = {r_1, ... r_{p−1}} equals the set P = {1, ... p − 1}.
First, R ⊆ P. Because p is prime and does not divide i or a, it does not divide their
product ia. Thus r_i = ia ≢ 0 (mod p) and so all the r_i are members of the set
{1, ... p − 1}.
To get inclusion the other way, P ⊆ R, note that if i_0 ≠ i_1 then r_{i_0} ≠ r_{i_1}. For,
with r_{i_0} − r_{i_1} = i_0·a − i_1·a = (i_0 − i_1)·a, because p is prime and does not divide i_0 − i_1
or a (as each is smaller in absolute value than p), it does not divide their product.
That means that the r_i are distinct, so the two sets have the same number of
elements, and P ⊆ R.
Now multiply together all of the elements of that set.

    a · 2a ··· (p − 1)a ≡ 1 · 2 ··· (p − 1) (mod p)
    (p − 1)! · a^{p−1} ≡ (p − 1)! (mod p)

Canceling the (p − 1)!'s gives the result.

A.7 Example Let the prime be p = 7. Any number a with 0 < a < p is not divisible
by p. Here is the list.

    a              1    2     3      4       5        6
    a^{7−1}        1    64    729    4 096   15 625   46 656
    (a^6 − 1)/7    0    9     104    585     2 232    6 665

For instance, 15 625 = 7 · 2 232 + 1.


Given n, if we find a base a with 0 < a < n so that a^{n−1} mod n is not 1 then n
is not prime.
A.8 Example Consider n = 415 693. If a = 2 then 2^{415 692} ≡ 58 346 (mod 415 693),
so n is not prime.
There are n ’s where an − 1 ≡ 1 (mod n) but n is not prime. Such a number
is a Fermat liar or Fermat pseudoprime with base a . One for base a = 2 is
n = 341 = 11 · 31. However, computer searches suggest that these are very rare.
The rarity of exceptions suggests that we use a probabilistic primality test: given
n ∈ N to test for primality, pick at random a base a with 0 < a < n and calculate
whether an − 1 ≡ 1 (mod n). If that is not true then n is not prime.† If it is true
then we have evidence that n is prime.
Researchers have shown that if n is not prime then each choice of base a has
a greater than 1/2 chance of finding that an − 1 ≡ 1 (mod n). So if n were not
prime and we did the test with two different bases a 0 , a 1 then there would be a less
than (1/2)2 chance of getting both an0 − 1 ≡ 1 (mod n) and an1 − 1 ≡ 1 (mod n).
So we figure that there is at least a 1 − (1/2)2 chance that n is prime. After k -many
iterations of choosing a base, doing the calculation, and never finding that that n is
not prime, then we have a greater than 1 − (1/2)k chance that n is prime.
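Here is a sketch of that procedure in Python; the function name fermat_test and the default of twenty bases are our choices, and the built-in pow does the modular exponentiation.

    import random

    def fermat_test(n, k=20):
        """Probabilistically test n > 1 for primality using k random bases."""
        if n < 4:
            return n in (2, 3)
        for _ in range(k):
            a = random.randrange(1, n)     # a base with 0 < a < n
            if pow(a, n - 1, n) != 1:
                return False               # a witnesses that n is not prime
        return True                        # passed k tests; very likely prime

    # fermat_test(341) almost always returns False, despite the liar a = 2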
In summary, if n passes k-many tests for any reasonable-sized k then we are quite confident that it is prime. Our interest in this test is that it is extremely fast; it runs in time O(k · (log n)^2 · log log n · log log log n). So we can run it lots of times, becoming very confident, in not very much time.


† In this case a is a witness to the fact that n is not prime.

A.9 Example We could test whether n = 7 is prime by computing, say, that 3^6 ≡ 1 (mod 7), and 5^6 ≡ 1 (mod 7), and 6^6 ≡ 1 (mod 7). The fact that n = 7 never fails these tests makes us confident that it is prime.
The RSA algorithm also uses this offshoot of Fermat’s Little Theorem.
A.10 Corollary Let p and q be unequal primes, let n = pq, and suppose that a is not divisible by either one. Then a^{(p−1)(q−1)} ≡ 1 (mod n).

Proof By Fermat, a^{p−1} ≡ 1 (mod p) and a^{q−1} ≡ 1 (mod q). Raise the first to the q − 1 power and the second to the p − 1 power.

    a^{(p−1)(q−1)} ≡ 1 (mod p)        a^{(p−1)(q−1)} ≡ 1 (mod q)

Since a^{(p−1)(q−1)} − 1 is divisible by both p and q, it is divisible by their product pq = n.
Experts think that the most likely attack on RSA encryption is by factoring the
modulus n . Anyone who factors n can use the same method as the RSA key setup
process to turn the encrypter e into the decrypter d . That’s why n is taken to be
the product of two large primes; it makes factoring as hard as possible.
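To make the connection concrete, here is a toy sketch in Python of key setup, encryption, and decryption. The small primes are only for illustration (real keys use primes hundreds of digits long), and pow(e, -1, phi) asks Python (3.8 or later) for the multiplicative inverse, the same computation that Euclid's algorithm does.

    # Toy RSA key setup; all of these values are illustrative only.
    p, q = 61, 53
    n = p * q                        # public modulus, 3233
    phi = (p - 1) * (q - 1)          # 3120; computable only by factoring n
    e = 17                           # public encrypter, coprime to phi
    d = pow(e, -1, phi)              # private decrypter, 2753

    message = 42                     # a message encoded as a number below n
    c = pow(message, e, n)           # encrypt
    assert pow(c, d, n) == message   # decrypt; Corollary A.10 explains why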
There is a factoring algorithm that takes only O(b^3) time (and O(b) space), called Shor's algorithm. But it runs only on quantum computers. At this moment there are no such computers built, although there has been progress on that. For the moment, RSA seems safe. (There are schemes that could replace it, if needed.)

V.A Exercises
✓ A.11 There are twenty-five primes less than or equal to 100. Find them.
✓ A.12 We can walk through an RSA calculation.
(a) For the primes, take p = 11, q = 13. Find n = pq and φ(n) = (p − 1) · (q − 1).
(b) For the encoder e use the smallest prime 1 < e < φ(n) that is relatively prime to φ(n).
(c) Find the decoder d, the multiplicative inverse of e modulo φ(n). (You can use Euclid's algorithm, or just test the candidates.)
(d) Take the message to be represented as the number m = 9. Encrypt it and
decrypt it.
A.13 To test whether a number n is prime, we could just try dividing it by all
numbers less than it.
(a) Show that we needn't try all numbers less than n; instead we can just try all k with 2 ≤ k ≤ √n.
(b) Show that we cannot lower that bound any further than √n.
(c) For input n = 10^{12} how many numbers would you need to test?
(d) Show that this is a terrible algorithm since it is exponential in the size of the
input.
A.14 Show that the probability that a random b -bit number is prime is about 1/b .

Extra
V.B Tractability and good-enoughness

Are we taking the right approach to characterizing the behavior of algorithms, to understanding the complexity of computations?
A theory shapes the way that you look at the world. For instance, Newton’s
F = ma points to an approach to analyzing physical situations: if you see a change,
look for a force. That approach has been fantastically successful, enabling us to
build bridges, send people to the moon, etc.
So we should ask if our theory is right. Of course, the theorems are right —
the proofs check out, the results stand up to formalization, etc. But it is healthy
to examine the current approach to ask whether there is a better way to see the
problems in front of us.
In the theory we’ve outlined, Cobham’s Thesis identifies P with the tractable
problems. However, the situation today is not so neat.
First, there are some problems known to be in P for which we do not know a practical approach. For one thing, as we discussed when we introduced Cobham's Thesis, a problem for which the smallest possible algorithm is O(n^{1000}) is not practical. True, for problems that are announced with best known algorithms having such huge exponents, over time researchers improve the algorithm and the exponents drop, but nonetheless there are problems in the current literature associated with impractical exponents. Also not practical is an algorithm that is O(n^2) but whose running time on close inspection proves to be something like 2^{1000} · n^2.
On the other side of the ledger we have problems not known to be in P for which we have solutions good enough for practice.
One such problem is the Traveling Salesman problem. Experts believe that it is not in P, since it is NP-complete, but nonetheless there exist algorithms that can in a reasonable time find solutions for problem instances involving millions of nodes, with a high probability of finding a path just two or three percent away from the optimal solution. An example is that recently a group of applied mathematicians solved the minimal pub crawl, the shortest route to visit all 24 727 UK pubs. The optimal tour is 45 495 239 meters. The algorithm took 305.2 CPU days, running in parallel on up to 48 cores on Linux servers.
(Figure: London pubs, courtesy Google Earth)
In May 2004, the Traveling Salesman instance of visiting all 24 978 cities
in Sweden was solved, giving a tour of about 72 500 kilometers. The
approach was to find a nearly-best solution and then use that to find the
best one. The final stages, that improved the lower bound by 0.000 023
percent, required eight years of computation time running in parallel on
a network of Linux workstations.
(Figure: Tour of Sweden)

There are many results that give answers that are practical for problems that our theory suggests are intractable. And there are many problems that are attackable in theory but that turn out to be awkward in practice. So much more work needs to be done.
Part Four

Appendix
Appendix A. Strings
An alphabet is a nonempty and finite set of symbols (sometimes called tokens). We
write symbols in a distinct typeface, as in 1 or a, because the alternative of quoting
them would be clunky.† A string or word over an alphabet is a finite sequence
of elements from that alphabet. The string with no elements is the empty string,
denoted ε .
One potentially surprising aspect of a ‘symbol’ is that a symbol may contain
more than one glyph. For instance, a programming language may have if as a
symbol, indecomposable into separate letters. For example, the Scheme alphabet
contains the symbols or and car, and allows variable names such as a, x, or
lastname. An example of a string is (or a ready), which is a sequence of five
alphabet elements ⟨(, or, a, ready, )⟩ .
Traditionally, alphabets are denoted with the Greek letter Σ. We will name strings with lower case Greek letters, and denote the items in the string with the associated lower case roman letter, as in σ = ⟨s_0, ..., s_{i−1}⟩ or τ = ⟨t_0, ..., t_{j−1}⟩. In place of s_i we may write σ[i]. We also may write σ[i : j] for the substring between terms i and j, including the first term but not the second, and we write σ[i :] for the tail substring that starts with term i. We also use σ[−1] for the final character, σ[−2] for the one before it, etc. For the string σ = ⟨s_0, ..., s_{n−1}⟩, the length |σ| is the number of symbols that it contains, n. In particular, the length of the empty string is |ε| = 0.
The diamond brackets and commas are ungainly. For small-scale examples
and exercises, we use the shortcut of working with alphabets of single-character
symbols and then writing strings by omitting the brackets and commas. That is,
we write abc instead of ⟨a, b, c⟩ .‡ This convenience comes with the disadvantage
that without the diamond brackets the empty string is just nothing, which is why
we use the separate symbol ε .§
The alphabet consisting of the zero and one characters is B = { 0, 1 }. Strings
over this alphabet are bitstrings or bit strings.||
Where Σ is an alphabet, for k ∈ N the set of length k strings over that alphabet is Σ^k. The set of strings over Σ of any (finite) length is Σ^∗ = ∪_{k∈N} Σ^k. The asterisk is the Kleene star, read aloud as “star.”
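For a small alphabet we can generate these sets mechanically; here is a sketch in Python, where the variable name Sigma is our choice.

    from itertools import product

    Sigma = ['a', 'b', 'c']

    def strings_of_length(k):
        """The set Σ^k: all length-k strings over Sigma, as tuples."""
        return list(product(Sigma, repeat=k))

    assert strings_of_length(0) == [()]       # just the empty string
    assert len(strings_of_length(2)) == 9     # |Σ|^2 many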
Strings are simple, so there are only a few operations. Let σ = ⟨s_0, ..., s_{i−1}⟩ and τ = ⟨t_0, ..., t_{j−1}⟩ be strings over an alphabet Σ. The concatenation σ⌢τ or στ appends the second sequence to the first: σ⌢τ = ⟨s_0, ..., s_{i−1}, t_0, ..., t_{j−1}⟩. Where

† We give them a distinct look to distinguish the symbol ‘a’ from the variable ‘a’, so that we can tell “let x = a” apart from “let x = a.” Symbols are not variables — they don’t hold a value, they are themselves a value. ‡ To see why, when we drop the commas, we want the alphabet to consist of single-character symbols, consider Σ = { a, aa } and the string aaa. Without the commas this string is ambiguous: it could mean ⟨a, aa⟩, or ⟨aa, a⟩, or ⟨a, a, a⟩. § Omitting the diamond brackets and commas also blurs the distinction between a symbol and a one-symbol string, between a and ⟨a⟩. However, dropping the brackets is so convenient that we accept this disadvantage. || Caution: in some contexts authors consider infinite bitstrings, although ours will always be finite.
σ = τ_0 ⌢ · · · ⌢ τ_{k−1} then we say that σ decomposes into the τ's and that each τ_i is a substring of σ. The first substring, τ_0, is a prefix of σ. The last, τ_{k−1}, is a suffix.
A power or replication of a string is an iterated concatenation with itself, so that σ^2 = σ⌢σ and σ^3 = σ⌢σ⌢σ, etc. We write σ^1 = σ and σ^0 = ε. The reversal σ^R of a string takes the symbols in reverse order: σ^R = ⟨s_{i−1}, ..., s_0⟩. The empty string's reversal is ε^R = ε.
For example, let Σ = { a, b, c } and let σ = abc and τ = bbaac. Then the concatenation στ is abcbbaac. The third power σ^3 is abcabcabc, and the reversal τ^R is caabb. A string that equals its own reversal is a palindrome; examples are α = abba, β = cbc, and ε.
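These operations are easy to express in Python if we model a string as a tuple of symbols, matching the diamond-bracket notation; a sketch.

    sigma = ('a', 'b', 'c')             # abc
    tau = ('b', 'b', 'a', 'a', 'c')     # bbaac

    assert sigma + tau == ('a', 'b', 'c', 'b', 'b', 'a', 'a', 'c')  # στ
    assert sigma * 3 == sigma + sigma + sigma                       # σ^3
    assert tau[::-1] == ('c', 'a', 'a', 'b', 'b')                   # τ^R

    def is_palindrome(s):
        """A string is a palindrome when it equals its own reversal."""
        return s == s[::-1]

    assert is_palindrome(('a', 'b', 'b', 'a')) and is_palindrome(())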

Exercises
A.1 Let σ = 10110 and τ = 110111 be bit strings. Find each. (a) σ⌢τ (b) σ⌢τ⌢σ (c) σ^R (d) σ^3 (e) 0^3 ⌢ σ
A.2 Let the alphabet be Σ = { a, b, c }. Suppose that σ = ab and τ = bca. Find each. (a) σ⌢τ (b) σ^2 ⌢ τ^2 (c) σ^R ⌢ τ^R (d) σ^3
A.3 Let L = { σ ∈ B^∗ : |σ| = 4 and σ starts with 0 }. How many elements are in that language?
A.4 Suppose that Σ = { a, b, c } and that σ = abcbccbba. (a) Is abcb a prefix of σ ?
(b) Is ba a suffix? (c) Is bab a substring? (d) Is ε a suffix?
A.5 What is the relation between |σ | , |τ | , and |σ ⌢ τ | ? You must justify your
answer.
A.6 The operation of string concatenation follows a simple algebra. For each of these, decide if it is true. If so, prove it. If not, give a counterexample.
(a) α⌢ε = α and ε⌢α = α (b) α⌢β = β⌢α (c) (α⌢β)^R = β^R ⌢ α^R (d) (α^R)^R = α (e) (α^i)^R = (α^R)^i
A.7 Show that string concatenation is not commutative, that there are strings σ and τ so that σ⌢τ ≠ τ⌢σ.
A.8 In defining decomposition above we have ‘σ = τ_0 ⌢ · · · ⌢ τ_{n−1}’, without parentheses on the right side. This takes for granted that the concatenation operation is associative, that no matter how we parenthesize it we get the same string. Prove this. Hint: use induction on the number of substrings, n.
A.9 Prove that this constructive definition of string power is equivalent to the one above.

    σ^n = ε              if n = 0
    σ^n = σ^{n−1} ⌢ σ    if n > 0
Appendix B. Functions
A function is an input-output relationship: each input is associated with a unique
output. An example is the association of each input natural number with an
output number that is twice as big. Another is the association of each string
of characters with the length of that string. A third is the association of each
polynomial a_n x^n + · · · + a_1 x + a_0 with a Boolean value T or F, depending on
whether 1 is a root of that polynomial.
For the precise definition, fix two sets, a domain D and a codomain C . A
function, or map, f : D → C is a set of pairs (x, y) ∈ D × C , subject to the
restriction of being well-defined, that every x ∈ D appears in one and only one
pair (more on this below). We write f(x) = y or x ↦ y and say ‘x maps to y’. (Note the difference between the arrow symbols in f : D → C and x ↦ y.) We
say that x is an input or argument to the function, and that y is an output or value.
An important point is what a function isn’t: it isn’t a formula or rule. The
function that gives the US presidents, f (0) = George Washington, etc., has no
sensible formula and isn’t determined by any rule less complex than an exhaustive
listing of cases. The same holds for a function that returns winners of the US
World Series, including next year’s winner. True, many functions are described by
a formula, such as E(m) = mc^2, and as well, many functions are computed by a
program. But what makes something a function is that for each input there is one
and only one associated output. If we can calculate the outputs from the inputs,
that’s great, but that is not required.
Some functions take more than one input, for instance dist(x, y) = √(x^2 + y^2). We say that dist is 2-ary, and other functions are 3-ary, etc. The number of inputs is the function's arity. If the function takes only one input but that input is a tuple, as with x = (3, 5), then we often drop the extra parentheses, so that instead of f(x) = f((3, 5)) we write f(3, 5).
Pictures We often illustrate functions using the familiar xy axes; here are graphs of f(x) = x^3 and f(x) = ⌊x⌋.


We also illustrate functions with a bean diagram, which separates the domain and
the codomain sets. Below on the left is the action of the exclusive or operator.
(Diagrams: the exclusive or operator as a bean diagram, with (F,F) ↦ F, (F,T) ↦ T, (T,F) ↦ T, and (T,T) ↦ F; and the absolute value function on the number line.)

On the right is a variant of the bean diagram, using the number line to show the
absolute value function mapping integers to integers.

Codomain and range Where S ⊆ D is a subset of the domain, its image is the set f(S) = { f(s) | s ∈ S }. Thus, under the floor function, the image of the positive reals is the nonnegative integers. Under the squaring function the image of S = { 0, 1, 2 } is f(S) = { 0, 1, 4 }.
The image of the entire domain is the range of the function, ran(f) = f(D) = { f(s) | s ∈ D }. For instance, the range of the floor function g : R → R given by g(x) = ⌊x⌋ is the integers. Note the difference between the range and the codomain; the codomain is a convenient superset. An example is that for the real function f(x) = x^6 − 31x^5 + 13x^2 + 4 we would usually write f : R → R, taking the codomain to be all of the reals, rather than trouble with its exact range.
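For a function with a finite domain, the image and the range are one-line computations; a sketch in Python, with names of our choosing.

    def image(f, S):
        """The image f(S) = { f(s) | s ∈ S } of a subset S of the domain."""
        return {f(s) for s in S}

    # Under the squaring function the image of {0, 1, 2} is {0, 1, 4}.
    assert image(lambda x: x**2, {0, 1, 2}) == {0, 1, 4}
    # The range is the image of the entire domain, ran(f) = image(f, D).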

Domain Sometimes the function's domain needs attention. Examples of such functions are that f(x) = 1/x is undefined at x = 0, and that the infinite series g(r) = 1 + r + r^2 + · · · diverges when r is outside the interval (−1 .. 1). Formally, when we define the function we must specify the domain to eliminate such problems, for instance by defining the domain of f as R − { 0 }. However, we are often casual about this.
In particular, in this subject we often have a function f : N → N such that on
some elements of the domain the function is undefined. We say that f is a partial
function. If instead it is defined on all inputs then it is a total function.
We sometimes have a function f : D → C and want to cut the domain back
to some subset S ⊆ D . The restriction f ↾S is the function with domain S and
codomain C defined by f ↾S (x) = f (x).
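One concrete model of these notions is a Python dict, which simply omits the inputs on which the function is undefined; restriction is then a comprehension. A sketch, with illustrative values.

    # A partial function on {0, ..., 4}, undefined at the input 3.
    f = {0: 1, 1: 3, 2: 5, 4: 9}

    def restrict(f, S):
        """The restriction f|S: the same outputs, domain cut back to S."""
        return {x: y for x, y in f.items() if x in S}

    assert restrict(f, {0, 2}) == {0: 1, 2: 5}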

Well-defined The definition of a function contains the condition that each domain
element maps to one and only one codomain element, y = f (x). We refer to this
condition by saying that functions are well-defined.
When we are considering a relationship between x ’s and y ’s and asking if it is
a function, well-definedness is typically what is at issue.† For instance, consider
the set of ordered pairs (x, y) where the square of y is x . If x = 9 then both
y = 3 and y = −3 are related to x , so this is not a functional relationship — it is
not well-defined — because there is not one and only one y for each x . Another
example is that when setting up a company’s email we may decide to use each

† Sometimes people say that they are, “checking that the function is well-defined.” In a strict sense this is confused, because if it is a function then it is by definition well-defined. However, while all tigers have stripes, we do sometimes say “striped tiger.” Natural language is funny that way.
person’s first initial and last name, but the problem is that there can easily be more
than one, say, mdouglas. The issue here is that the relation (email, person) is not
well-defined.
If a function is suitable for graphing on xy axes then visual proof of well-
definedness is that for any x in the domain, the vertical line at x intercepts the
graph in one and only one point.

One-to-one and onto The definition of function has an asymmetry: among the
ordered pairs (x, y), it requires that each domain element x be in one pair and
only one pair, but it does not require the same of the codomain elements.
A function is one-to-one (or 1-1, or an injection) if each codomain element y is
in at most one pair. The function below is one-to-one because for every element y
in the codomain bean, the bean on the right, there is at most one arrow ending
at y .

The most common way to verify that a function f is one-to-one is to assume that f(x_0) = f(x_1) and then deduce that therefore x_0 = x_1. If a function is suitable
for graphing on xy axes then visual proof is that for any y in the codomain, the
horizontal line at y intercepts the graph in at most one point.
A function is onto (or a surjection) if each codomain element y is in at least
one pair. Thus, a function is onto if its codomain equals its range. The function
below is onto because every element in the codomain bean has at least one arrow
ending at it.

The most common way to verify that a function is onto is to start with a codomain
element y and then produce a domain element x that maps to it. If a function is
suitable for graphing on xy axes then visual proof is that for any y in the codomain,
the horizontal line at y intercepts the graph in at least one point.
As the above pictures suggest, where the domain and codomain are finite, when there is a function f : D → C then we can conclude some things about the number of elements in the sets. The first is that if the function is one-to-one then the number of elements in the domain is less than or equal to the number in the codomain. The other is that if the function is onto then the number of elements in the domain is greater than or equal to the number in the codomain, with equality if and only if the function is one-to-one.
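For functions with finite domains, represented as Python dicts, both properties are directly checkable; a sketch, where the function names are ours.

    def is_one_to_one(f):
        """Each codomain element is hit at most once: all outputs distinct."""
        return len(set(f.values())) == len(f)

    def is_onto(f, C):
        """Each codomain element is hit at least once: the range is all of C."""
        return set(f.values()) == set(C)

    f = {0: 10, 1: 11, 2: 12}
    assert is_one_to_one(f) and is_onto(f, {10, 11, 12})
    g = {0: 10, 1: 10, 2: 12}
    assert not is_one_to_one(g)        # the output 10 is hit twice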
Correspondence A function is a correspondence (or bijection) if it is both one-to-
one and onto. The picture on the left shows a correspondence between two finite
sets, both with four elements, and the one on the right shows a correspondence
between the natural numbers and the primes.

0 1 2 3 4 5 6 7 ...

...
2 3 5 7 11 13 17 19

The most common way to verify that a function is a correspondence is to separately verify that it is one-to-one and that it is onto. If the function is suitable for graphing on xy axes then visual proof is that for any y in the codomain, the horizontal line at y intercepts the graph in exactly one point.
As the picture on the left above suggests, where the domain and codomain are
finite, if a function is a correspondence then its domain has the same number of
elements as its codomain.

Composition and inverse If f : D → C and g : C → B then their composition g ◦ f : D → B is defined by g ◦ f (d) = g( f(d) ). For instance, the real functions f(x) = x^2 and g(x) = sin(x) combine to give g ◦ f = sin(x^2).
Composition does not commute. Using the functions from the prior paragraph, g ◦ f = sin(x^2) is different from f ◦ g = (sin x)^2, since they are unequal when x = π. Composition can fail to commute more dramatically: if f : R^2 → R is given by f(x_0, x_1) = x_0, and g : R → R is g(x) = x, then g ◦ f (x_0, x_1) = x_0 is perfectly sensible but composition in the other order is not even defined.
The composition of one-to-one functions is one-to-one, and the composition of
onto functions is onto. Of course then, the composition of correspondences is a
correspondence.
An identity function id : D → D is given by id(d) = d for all d ∈ D. It acts as the identity element in function composition, so that if f : D → C then f ◦ id = f and if g : C → D then id ◦ g = g. As well, if h : D → D then h ◦ id = id ◦ h = h.
Given f : D → C, if g ◦ f is the identity function then g is a left inverse function of f, or what is the same thing, f is a right inverse of g. If g is both a left and right inverse of f then we simply say that it is an inverse (or two-sided inverse) of f, and denote it as f^{−1}. If a function has an inverse then that inverse is unique. A function has a two-sided inverse if and only if it is a correspondence.
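Continuing the dict representation of finite functions, composition and inversion are short computations; a sketch (this inverse assumes that the dict is a correspondence onto its set of output values).

    def compose(g, f):
        """The composition g ◦ f: apply f first, then g."""
        return {x: g[f[x]] for x in f}

    def inverse(f):
        """Invert a correspondence by swapping its pairs."""
        assert len(set(f.values())) == len(f)    # f must be one-to-one
        return {y: x for x, y in f.items()}

    f = {0: 'a', 1: 'b', 2: 'c'}
    g = {'a': 5, 'b': 7, 'c': 9}
    assert compose(g, f) == {0: 5, 1: 7, 2: 9}
    assert compose(inverse(f), f) == {0: 0, 1: 1, 2: 2}   # identity on D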

Exercises
B.1 Let f : R → R be f(x) = 3x + 1 and g : R → R be g(x) = x^2 + 1. (a) Show that f is one-to-one and onto. (b) Show that g is not one-to-one and not onto.
B.2 Show each.
(a) Let g : R^3 → R^2 be the projection map (x, y, z) ↦ (x, y) and let f : R^2 → R^3 be the map (x, y) ↦ (x, y, 0). Then g is a left inverse of f but not a right inverse.
(b) The function f : Z → Z given by f(n) = n^2 has no left inverse.
(c) Where D = { 0, 1, 2, 3 } and C = { 10, 11 }, the function f : D → C given by 0 ↦ 10, 1 ↦ 11, 2 ↦ 10, 3 ↦ 11 has more than one right inverse.
B.3 (a) Where f : Z → Z is f(a) = a + 3 and g : Z → Z is g(a) = a − 3, show that g is inverse to f.
(b) Where h : Z → Z is the function that returns n + 1 if n is even and returns n − 1 if n is odd, find a function inverse to h.
(c) If s : R^+ → R^+ is s(x) = x^2, find its inverse.
B.4 Let D = { 0, 1, 2 } and C = { 10, 11, 12 }. Also let f, g : D → C be f(0) = 10, f(1) = 11, f(2) = 12, and g(0) = 10, g(1) = 10, g(2) = 12. Then: (a) verify that f is a correspondence (b) construct an inverse for f (c) verify that g is not a correspondence (d) show that g has no inverse.
B.5 (a) Prove that a composition of one-to-one functions is one-to-one. (b) Prove that a composition of onto functions is onto. With the prior item this gives that a composition of correspondences is a correspondence. (c) Prove that if g ◦ f is one-to-one then f is one-to-one. (d) Prove that if g ◦ f is onto then g is onto. (e) If g ◦ f is onto, is f onto? If it is one-to-one, is g one-to-one?
B.6 Prove.
(a) A function has an inverse if and only if that function is a correspondence.
(b) If a function has an inverse then that inverse is unique.
(c) The inverse of a correspondence is a correspondence.
(d) If f and g are each invertible then so is g ◦ f, and (g ◦ f)^{−1} = f^{−1} ◦ g^{−1}.
B.7 Let D and C be finite sets. Prove that if there is a correspondence f : D → C
then the two have the same number of elements. Hint: for each you can do
induction either on |C | or |D| .
(a) If f is one-to-one then |C | ≥ |D| .
(b) If f is onto then |C | ≤ |D| .
Part Five

Notes
Notes
These are citations or discussions that supplement the text body. Each refers to a word or phrase from that text
body, in italics, and then the note is in plain text. Many of the entries include links to more detail.

Cover
Calculating the bonus http://www.loc.gov/pictures/item/npc2007012636/

Preface
in addition to technical detail, also attends to a breadth of knowledge S Pinker emphasizes that a liberal approach
involves understanding in a context (Pinker 2014). “It seems to me that educated people should know something
about the 13-billion-year prehistory of our species and the basic laws governing the physical and living world,
including our bodies and brains. They should grasp the timeline of human history from the dawn of agriculture
to the present. They should be exposed to the diversity of human cultures, and the major systems of belief
and value with which they have made sense of their lives. They should know about the formative events in
human history, including the blunders we can hope not to repeat. They should understand the principles behind
democratic governance and the rule of law. They should know how to appreciate works of fiction and art as
sources of aesthetic pleasure and as impetuses to reflect on the human condition. On top of this knowledge,
a liberal education should make certain habits of rationality second nature. Educated people should be able
to express complex ideas in clear writing and speech. They should appreciate that objective knowledge is
a precious commodity, and know how to distinguish vetted fact from superstition, rumor, and unexamined
conventional wisdom. They should know how to reason logically and statistically, avoiding the fallacies and
biases to which the untutored human mind is vulnerable. They should think causally rather than magically, and
know what it takes to distinguish causation from correlation and coincidence. They should be acutely aware of
human fallibility, most notably their own, and appreciate that people who disagree with them are not stupid
or evil. Accordingly, they should appreciate the value of trying to change minds by persuasion rather than
intimidation or demagoguery.” See also https://www.aacu.org/leap/what-is-a-liberal-education
Who among us has not had an Ah-ha! moment I ask the indulgence of these experts. I believe that I have used their
informal speech only in appropriate contexts, but I would like to be corrected if not.

Prologue
Entscheidungsproblem Pronounced en-SHY-dungs-problem.
D Hilbert and W Ackermann Hilbert was a very prominent mathematician, perhaps the world’s most prominent
mathematician, and Ackermann was his student. So they made an impression when they wrote, “[This] must
be considered the main problem of mathematical logic” (Hilbert and Ackermann 1950), p 73.
mathematical statement Specifically, the statement as discussed by Hilbert and Ackermann comes from a first-order
logic (versions of the Entscheidungsproblem for other systems had been proposed by other mathematicians).
First-order logic differs from propositional logic, the logic of truth tables, in that it allows variables. Thus for
instance if you are studying the natural numbers then you can have a Boolean function Prime(x). (In this
context a Boolean function is traditionally called ‘predicate’.) To make a statement that is either true or false we
must then quantify statements, as in the (false) statement “for all x ∈ N, Prime(x) implies PerfectSquare(x).”
The modifier “first-order” means that the variables used by the Boolean functions are members of the domain
of discourse (for Prime above it is N), but we cannot have that variables themselves are Boolean functions.
(Allowing Boolean functions to take Boolean functions as input is possible, but would make this a second-order,
or even higher-order, logic.)
after a run He was 22 years old at the time. (Hodges 1983), p 96. This book is the authoritative source for Turing’s
fascinating life. During the Second World War, he led a group of British cryptanalysts at Bletchley Park, Britain’s
code breaking center, where his section was responsible for German naval codes. He devised a number of
techniques for breaking German ciphers, including an electromechanical machine that could find settings for
the German coding machine, the Enigma. Because the Battle of the Atlantic was critical to the Allied war effort,
and because cracking the codes was critical to defeating the German submarine effort, Turing’s work was very
important. (The major motion picture on this The Imitation Game (Wikipedia 2016) is a fun watch but is not a
slave to historical accuracy.) After the war, at the National Physical Laboratory he made one of the first designs
for a stored-program computer. In 1952, when it was a crime in the UK, Turing was prosecuted for homosexual
acts. He was given chemical castration as an alternative to prison. He died in 1954 from cyanide poisoning
which an inquest determined was suicide. In 2009, following an Internet campaign, British Prime Minister
Gordon Brown made an official public apology on behalf of the British government for “the appalling way he
was treated.”
Olympic marathon His time at the qualifying event was only ten minutes behind what was later the winning time
in the 1948 Olympic marathon. For more, see https://www.turing.org.uk/book/update/part6.html and
http://www-groups.dcs.st-and.ac.uk/~history/Extras/Turing_running.html.
clerk Before the engineering of computing machines had advanced enough to make capable machines widely
available, much of what we would today do with a program was done by people, then called “computers.” This
book’s cover shows human computers at work.

Katherine Johnson, b 1918

Another example is that, as told in the film Hidden Figures, the trajectory for US astronaut John Glenn’s
pioneering orbit of Earth was found by the human computer Katherine Johnson and her colleagues, African
American women whose accomplishments are all the more impressive because they occurred despite appalling
discrimination.
don’t involve random methods We can build things that return completely random results; one example is a device
that registers consecutive clicks on a Geiger counter and if the second gap between clicks is longer than the first it returns 1, else it returns 0. See also https://blog.cloudflare.com/randomness-101-lavarand-in-
production/.
analog devices See (A/V Geeks 2013) about slide rules, (Wikipedia contributers 2016c) about nomograms,
(navyreviewer 2010) about a naval firing computer, and (Unknown 1948) about a more general-purpose machine.
reading results off of a slide rule or an instrument dial Suppose that an intermediate result of a calculation is 1.23.
If we read it off the slide rule with the convention that the resolution accuracy is only one decimal place then we
write down 1.2. Doubling that gives 2.4. But doubling the original number 2 · 1.23 = 2.46 and then rounding
to one place gives 2.5.
no upper bound This explication is derived from (Rogers 1987), p 1–5.
more is provided Perhaps the clerk has a helper, or the mechanism has a tender.
A reader may object that this violates the goal of the definition, to model physically-realizable computations We often
describe computations that do not have a natural resource bound. The algorithm for long division that we learn
in grade school has no inherent bounds on the lengths of either inputs or outputs, or on the amount of available
scratch paper.
are so elementary that we cannot easily imagine them further divided (Turing 1937)
LEGO’s See for instance https://www.youtube.com/watch?v=RLPVCJjTNgk&t=114s.
Finally, it trims off a 1 The instruction q 4 11q 5 won’t ever be reached, but it does no harm. It is there for the
definition of a Turing machine, to make ∆ defined on all qp Tp . See also the note to that definition.
transition function The definition describes ∆ as a function ∆ : Q × Σ → (Σ ∪ { L, R }) × Q . That is a white lie. In
Ppred the state q 3 is used only for the purpose of halting the machine, and so there is no defined next state. In
Padd , the state q 5 plays the same role. So strictly speaking, the transition function is a partial function, one
where for some members of the domain there is no associated value; see page 357. (Alternatively, we could
write the set of states as Q ∪ Q̂ where the states in Q̂ are there only for halting, and the transition function’s definition is ∆ : Q × Σ → (Σ ∪ { L, R }) × (Q ∪ Q̂).) We have left this point out of the main presentation since it doesn’t seem to cause confusion and the discussion can be a distraction.
a complete description of how these machines act It is reasonable to ask why our standard model is one that is so basic that programming can be annoying. Why not choose a real world machine? The reason is that, as here, we can completely describe the actions of the Turing machine model, or of any of the other simple models that are sometimes used, in only a few paragraphs. A real machine might take a full book. We use Turing machines because they are simple to describe, they are historically important, and the work in Chapter Five needs them.
q is a state, a member of Q We are vague about what ‘states’ are but we assume that whatever they are, the set of
states Q is disjoint from the set Σ ∪ { L, R }.
a snapshot, an instant in a computation So the configuration, along with the Turing machine, encapsulates the
future history of the computation.
omit the part about interpreting the strings We do this for the same reason that we would say, “This is me when I
was ten.” instead of, “This is a picture of me when I was ten.”
a physical system evolves through a sequence of discrete steps that are local, meaning that all the action takes place
within one cell of the head Adapted from (Widgerson 2017).
constructed the first machine See (Leupold 1725).
A number of mathematicians See also (Wikipedia contributers 2014).
Church suggested to Gödel (Soare 1999)
established beyond any doubt (Gödel 1995)
Church’s Thesis is central to the Theory of Computation Some authors have claimed that neither Church nor Turing
stated anything as strong as is given here but instead that they proposed that the set of things that can be done
by a Turing machine is the same as the set of things that are intuitively computable by a human computer; see
for instance (B. J. Copeland and Proudfoot 1999). But the thesis as stated here, that what can be done by a
Turing machine is what can be done by any physical mechanism that is discrete and deterministic, is certainly
the thesis as it is taken in the field today. And besides, Church and Turing did not in fact distinguish between
the two cases; (Hodges 2016) points to Church’s review of Turing’s paper in the Journal of Symbolic Logic: “The
author [i.e. Turing] proposes as a criterion that an infinite sequence of digits 0 and 1 be ‘computable’ that it
shall be possible to devise a computing machine, occupying a finite space and with working parts of finite size,
which will write down the sequence to any desired number of terms if allowed to run for a sufficiently long
time. As a matter of convenience, certain further restrictions are imposed on the character of the machine,
but these are of such a nature as obviously to cause no loss of generality — in particular, a human calculator,
provided with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine.” This
has Church referring to the human calculator not as the prototype but instead as a special case of the class of
defined machines.
we cannot give a mathematical proof We cannot give a proof that starts from axioms whose justification is on
firmer footing than the thesis itself. R Williams has commented, “[T]he Church-Turing thesis is not a formal
proposition that can be proved. It is a scientific hypothesis, so it can be ‘disproved’ in the sense that it is
falsifiable. Any ‘proof’ must provide a definition of computability with it, and the proof is only as good as that
definition.” (SE user Ryan Williams 2010)
formalizes the notion of ‘effective’ or ‘intuitively mechanically computable’ Kleene wrote that “its role is to delimit
precisely an hitherto vaguely conceived totality.” (Kleene 1952), p 318.
Turing wrote (Turing 1937)
systematic error (Dershowitz and Gurevich 2008) p 304.
it is the right answer Gödel wrote, “the great importance . . . [of] Turing’s computability [is] largely due to the
fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting
epistemological notion, i.e., one not depending on the formalism chosen.” (Gödel 1995), pages 150–153.
can compute all of the functions that can be done by a machine with two or more tapes For instance, we can simulate
a two-tape machine P2 on a one-tape machine P1 . One way to do this is by having P1 use its even-numbered
tape positions for P2 ’s first tape and using its odd tape positions for P2 ’s second tape. (A more hand-wavy
explanation is: a modern computer can clearly simulate a two-tape Turing machine but a modern computer has
sequential memory, which is like the one-tape machine’s sequential tape.)
evident immediately (Church 1937)
S Aaronson has made this point From his blog Shtetl-Optimized, (Aaronson 2012b).
supply a stream of random bits Some CPU’s come with that capability built in; see for instance https://en.
wikipedia.org/wiki/RdRand.
beyond discrete and deterministic From (SE author Andrej Bauer 2016): “Turing machines are described concretely
in terms of states, a head, and a working tape. It is far from obvious that this exhausts the computing possibilities
of the universe we live in. Could we not make a more powerful machine using electricity, or water, or quantum
phenomena? What if we fly a Turing machine into a black hole at just the right speed and direction, so that it
can perform infinitely many steps in what appears finite time to us? You cannot just say ‘obviously not’ — you
need to do some calculations in general relativity first. And what if physicists find out a way to communicate
and control parallel universes, so that we can run infinitely many Turing machines in parallel time?”
everything that experiments with reality would ever find to be possible Modern Physics is a sophisticated and
advanced field of study so we could doubt that anything large has been overlooked. However, there is historical
reason for supposing that such a thing is possible. The physicists H von Helmholtz in 1856, and S Newcomb
in 1892, calculated that the Sun is about 20 million years old (they assumed that the Sun glowed from the
energy provided by its gravitational contraction in condensing from a nebula of gas and dust to its current
state). Consistently with that, one of the world’s most reputable physicists, W Kelvin, estimated in 1897 that
the Earth was, “more than 20 and less than 40 million years old, and probably much nearer 20 than 40” (he
calculated how long it would take the Earth to cool from a completely molten object to its present temperature).
He said, “unless sources now unknown to us are prepared in the great storehouse of creation” then there was
not enough energy in the system to justify a longer estimate. One person very troubled by this was Darwin,
having himself found that a valley in England took 300 million years to erode, and consequently that there was
enough time, called “deep time,” for the slow but steady process of evolution of species to happen. Then, in
1896, everything changed. A Becquerel discovered radiation. All of the prior calculations did not account for it
and the apparent discrepancy vanished. (Wikipedia contributers 2016a)
the unique solution is not computable See (Pour-El and Richards 1981).
compute a solution See http://www.smbc-comics.com/?id=3054.
Three-Body Problem See https://en.wikipedia.org/wiki/Three-body_problem
we can still wonder See (Piccinini 2017).
This big question remains open A sample of readings: frequently cited is (Black 2000), which takes the thesis to
be about what is humanly computable, and (B. Jack Copeland 1996), (B. Jack Copeland 1999), and (B. Jack
Copeland 2002) argue that computations can be done that are beyond the capabilities of Turing machines, while
(Davis 2004), (Davis 2006), and (Gandy 1980) give arguments that most Theory of Computing researchers
consider persuasive.
Often when we want to show that something is computable by a Turing machine The same point stated another
way, from (SE author Andrej Bauer 2018): In books on computability theory it is common for the text to skip
details on how a particular machine is to be constructed. The author of the computability book will mumble
something about the Turing-Church thesis somewhere in the beginning. This is to be read as “you will have to
do the missing parts yourself, or equip yourself with the same sense of inner feeling about computation as I did”.
Often the author will give you hints on how to construct a machine, and call them “pseudo-code”, “effective
procedure”, “idea”, or some such. The Church-Turing thesis is the social convention that such descriptions of
machines suffice. (Of course, the social convention is not arbitrary but rather based on many years of experience
on what is and is not computable.) . . . I am not saying that this is a bad idea, I am just telling you honestly what
is going on. . . . So what are we supposed to do? We certainly do not want to write out detailed constructions
of machines, because then students will end up thinking that’s what computability theory is about. It isn’t.
Computability theory is about contemplating what machines we could construct if we wanted to, but we don’t.
As usual, the best path to wisdom is to pass through a phase of confusion.
Plonk! See (Wikipedia contributers 2015a).
Suppose that you have infinitely many dollars. (Joel David Hamkins 2010)
H Grassmann produced a more elegant definition Dedekind used this definition to give the first rigorous proof of
the laws of elementary school arithmetic.
logically problematic The sense of something perplexing about recursion is often expressed with a story. The
philosopher W James gave a public lecture on cosmology, and was approached by an older woman from the
audience. “Your theory that the sun is the center of the solar system, and the earth orbits around it, has a
good ring, Mr James, but it’s wrong.” she said. “Our crust of earth lies on the back of a giant turtle.” James
gently asked, “If your theory is correct then what does this turtle stand on?” “You’re very clever, Mr James,”
she replied, “but I have an answer. The first turtle stands on the back of a second, far larger, turtle.” James
persisted, “And this second turtle, Madam?” Immediately she crowed, “It’s no use Mr James — it’s turtles all the
way down.” (Wikipedia contributers 2016e)
See also Room 8, winner of the 2014 short film award from the British Academy of Film and Television Arts.
define the function on higher-numbered inputs using only its values on lower-numbered ones For the function specified
by f (0) = 1 and f (n) = n · f (f (n − 1) − 1), try computing the values f (0) through f (5).
One elegant thing about Grassmann approach An epigram of A Perlis, “Recursion is the root of computation since it trades description for time,” expresses this elegance. The grade school definition of addition is prescriptive in that it gives a procedure. But Grassmann’s definition is descriptive in giving the meaning, the semantics, of the operation. The recursive definition implicitly includes steps, and with them time, in that you need to keep expanding the recursive calls. But it does not include them in preference to what they are about.
the first sequence of numbers ever computed on an electronic computer It was computed on EDSAC, on 1949-May-06.
See (N. J. A. Sloane 2019) and (William S. Renwick 1949).
Towers of Hanoi The puzzle was invented by E Lucas in 1883, but the next year H De Parville made of it quite a grand problem with the delightful problem statement.
hyperoperation (Goodstein 1947)
H3 (4, 4) is much greater than the number of elementary particles in the universe The radius of the universe is about 45 × 10^9 light years. That’s about 10^62 Planck units. A system of much more than r^{1.5} particles packed in r Planck units will collapse rapidly. So the number of particles is less than 10^92, which is about 2^305, which is much less than H3 (4, 4). (Levin 2016)
a programming language having only bounded loops computes all of the primitive recursive functions (Meyer and
Ritchie 1966)
output only primes In fact, there is no polynomial with integer coefficients that outputs a prime for all integer
inputs, except if the polynomial is constant. This was shown in 1752 by C Goldbach. The proof is so simple, and
delightful, and not widely known, that we will give it here. Suppose p is a polynomial with integer coefficients
that on integer inputs returns only primes. Fix some n̂ ∈ N, and then p(n̂) = m̂ is a prime. Into the polynomial
plug n̂ + k · m̂ , where k ∈ Z. Expanding gives lots of terms with m̂ in them, and gathering together like terms
shows this.
p(n̂ + k · m̂) ≡ p(n̂) mod m̂
Because p(n̂) = m̂, this gives that p(n̂ + k · m̂) = m̂, since that is the only prime number that is a multiple of m̂ and p outputs only primes. But with that, the polynomial p(x) − m̂ has infinitely many roots, and so p is the constant polynomial m̂.
looking for something that is not there Goldbach’s conjecture is that every even number can be written as the sum
of at most two primes. Here are the first few instances: 2 = 2, 4 = 2 + 2, 6 = 3 + 3, 8 = 5 + 3, 10 = 7 + 3. A
natural attack is to do an unbounded computer search. As of this writing the conjecture has been tested up to 10^{18}.
Collatz conjecture See (Wikipedia contributors 2019a).
sin(x) may be calculated via its Taylor polynomial The Taylor series is sin(x) = x − x^3/3! + x^5/5! − x^7/7! + · · · . We might do a practical calculation by deciding that a sufficiently good approximation is to terminate that series at the x^5 term, giving a Taylor polynomial.
C Shannon See http://www.newyorker.com/tech/elements/claude-shannon-the-father-of-the-information-
age-turns-1100100.
master’s thesis See https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_and_Switching_Circuits.
kind of not gate This shows an N-type Metal Oxide Semiconductor Transistor. There are many other types.
problem of humans on Mars To get there the idea was to use a rocket ship impelled by dropping a sequence of
atom bombs out the bottom; the energy would let the ship move rapidly around the solar system. This sounds
like a crank plan but it is perfectly feasible (Brower 1983). Having been a key person in the development of the
atomic bomb, von Neumann was keenly aware of their capabilities.
J Conway Conway was a magnetic person, and extraordinarily creative. See an excerpt from an excellent biography
at https://www.ias.edu/ideas/2015/roberts-john-horton-conway.
earliest computer crazes (Bellos 2014)
zero-player game See https://www.youtube.com/watch?v=R9Plq-D1gEk.
a rabbit Discovered by A Trevorrow in 1986.
For technical convenience This presentation is based on that of (Hennie 1977), (Smoryński 1991), and (Robinson
1948).
giving a programming language that computes primitive recursive functions See the history at (Brock 2020).
LOOP program (Meyer and Ritchie 1966)

Background
Deep Field movie https://www.youtube.com/watch?v=yDiD8F9ItX0
two paradoxes These are veridical paradoxes: they may at first seem absurd but we will demonstrate that they are
nonetheless true. (Wikipedia contributers 2018)
Galileo’s Paradox He did not invent it but he gave it prominence in his celebrated Discourses and Mathematical
Demonstrations Relating to Two New Sciences.
same cardinality Numbers have two natures. First, in referring to the set of stars known as the Pleiades as the
“Seven Sisters” we mean to take them as a set, not ordered in any way. In contrast, second, in referring to
the “Seven Deadly Sins,” well, clearly some of them rate higher than others. The first reference speaks to
the cardinal nature of numbers and the second is their ordinal nature. For finite numbers the two are bound
together, as Lemma 1.5 says, but for infinite numbers they differ.
was proposed by G Cantor in the 1870’s For his discoveries, Cantor was reviled by a prominent mathematician and
former professor L Kronecker as a “corrupter of youth.” That was pre-Elvis.
which is Cantor’s definition (Gödel 1964)
the most important infinite set is N Its existence is guaranteed by the Axiom of Infinity, one of the standard axioms
of Mathematics, the Zermelo-Frankel axioms.
due to Zeno Zeno gave a number of related paradoxes of motion. See (Wikipedia contributers 2016f) (Huggett
2010), (Bragg 2016), as well as http://www.smbc-comics.com/comic/zeno and this xkcd.
Courtesy xkcd.com

the distances x i+1 − x i shrink toward zero, there is always further to go because of the open-endedness at the left of the
interval (0 .. ∞) A modern version of exploiting open-endedness is the Thomson’s Lamp Paradox: a person
turns on the room lights and then a minute later turns them off, a half minute later turns them on again, and a
quarter minute later turns them off, etc. After two minutes, are the lights on or off? This paradox was devised
in 1954 by J F Thomson to analyze the possibility of a supertask, the completion of an infinite number of tasks.
Thomson’s answer was that it creates a contradiction: “It cannot be on, because I did not ever turn it on without
at once turning it off. It cannot be off, because I did in the first place turn it on, and thereafter I never turned
it off without at once turning it on. But the lamp must be either on or off” (Thomson 1954). See also the
discussion of the Littlewood Paradox (Wikipedia contributers 2016d).
numbers the diagonals Really, these are the anti-diagonals, since the diagonal is composed of the pairs ⟨n, n⟩ .
arithmetic series with total d(d + 1)/2 It is called the d-th triangular number.
cantor(x, y) = x + [(x + y)(x + y + 1)/2] The Fueter-Pólya Theorem says that this is essentially the only quadratic
function that serves as a pairing; see (Smoryński 1991). No one knows whether there are pairing functions that
are any other kind of polynomial.
memoization The term was invented by Donald Michie (Wikipedia contributers 2016b), who among other
accomplishments was a coworker of Turing’s in the World War II effort to break the German secret codes.
assume that we have a family of correspondences f_j : N → S_j To pass from the original collection of infinitely many onto functions f_i : N → S_i to a single, uniform, family of onto functions f_j(y) = f(j, y) we need some version of the Axiom of Choice, perhaps Countable Choice. We omit discussion of that because it would take us far afield.
doesn’t matter much For more on “much” see (Rogers 1958).
but that we won’t make precise One problem with this scheme is that it depends on the underlying computer.
Imagine that your computer uses eight bit words. If we want the map from a natural number to a source code
and the input number is 9 then in binary that’s 1001, which is not eight bits, and to disassemble it you need to pad it out to the machine’s word length, as 00001001. Another issue is the ambiguity caused by leading 0’s, e.g. the bit string 00000000 00001001 also represents 9 but disassembles to a two-operation source. We could
address this by imagining that the operation with instruction code 00000000 is NOP and then disallow source
code that starts with such an instruction (reasoning that starting a serial program with fewer NOP’s won’t change
its input-output behavior), except for the source consisting of a single NOP. But we are getting into the weeds
of computer architecture here, which is not where we want to be, so we take this numbering scheme only
informally.
adding the instruction q j+k BBq j+k
This is essentially what a compiler calls ‘unreachable code’ in that it is not a state that the machine will ever be
in.
central to the entire Theory of Computation The classic text (Rogers 1987) says, “It is not inaccurate to say that our
theory is, in large part, a ‘theory of diagonalization’.”
This technique is diagonalization The argument just sketched is often called Cantor’s diagonal proof, although it
was not Cantor’s original argument for the result, and although the argument style is not due to Cantor but
instead to Paul du Bois-Reymond. The fact that scientific results are often attributed to people who are not
their inventor is Stigler’s law of eponymy, because it wasn’t invented by Stigler (who attributes it to Merton). In
mathematics this is called Boyer’s Law, who didn’t invent it either. (Wikipedia contributers 2015b).
Musical Chairs It starts with more children than chairs. Some music plays and the children walk around the chairs.
But the music stops suddenly and each child tries to sit, leaving someone without a chair. That child has to
leave the game, a chair is removed, and the game proceeds.
so many real numbers This is a Pigeonhole Principle argument.
there are jobs that no computer can do To a person with training in programming, where all of the focus is on
getting the computer to do things, the existence of jobs that cannot be done can be a surprise, perhaps even
a shock. One thing that it points out is that the topics introduced here are nontrivial, that formalizing the
definition of mechanical computation and the results about infinity leads to interesting conclusions.
Your friend is confused about the diagonal argument From (SE author Kaktus and various others 2019).
ENIAC, reconfigure by rewiring. Jean Jennings (left), Marlyn Wescoff (center), and Ruth Lichterman program the
ENIAC, circa 1946. U. S. Army Photo
A pattern in technology is for jobs done in hardware to migrate to software One story that illustrates the naturalness
of this involves the English mathematician C Babbage, and his protégée A Lovelace. In 1812 Babbage was
developing tables of logarithms. These were calculated by computers — the word then current for the people
who computed them by hand. To check the accuracy he had two people do the same table and compared. He
was annoyed at the number of discrepancies and had the idea to build a machine to do the computing. He got a
government grant to design and construct a machine called the difference engine, which he started in 1822.
This was a single-purpose device, what we today would call a calculator. One person who became interested in
the computations was an acquaintance of his, Lovelace (who at the time was named Byron, as she was the
daughter of the poet Lord Byron).

Charles Babbage, 1791–1871 Ada Lovelace (nee Byron), 1815–1852

However, this machine was never finished because Babbage had the thought to make a device that would be
programmable, and that was too much of a temptation. Lovelace contributed an extensive set of notes on a
proposed new machine, the analytical engine, and has become known as the first programmer.
controlled by paper cards It weaves with hooks whose positions, raised or lowered, are determined by holes punched in the cards.
have the same output behavior A technical point: Turing machines have a tape alphabet. So a universal machine’s
input or output can only involve symbols that it is defined as able to use. If another machine has a different
tape alphabet then how can the universal machine simulate it? As usual, we define things so that the universal
machine manipulates representations of the other machine’s alphabet. This is similar to the way that an
everyday computer represents decimals using binary.
flow chart Flowcharts are widely used to sketch algorithms; here is one from XKCD.

Courtesy xkcd

See also http://archive.computerhistory.org/resources/text/Remington_Rand/Univac.Flowmatic.1957.102646140.pdf.
consecutive nines At the 762-nd decimal point there are six nines in a row. This is called the Feynman point; see
https://en.wikipedia.org/wiki/Feynman_point. Most experts guess that for any n the decimal expansion
contains a sequence of n consecutive nines but no one has proved or disproved that.
there is a difference between showing that this function is computable This is a little like Schrödinger’s cat paradox
(see https://en.wikipedia.org/wiki/Schr%C3%B6dinger’s_cat) in that it seems that one of the two is
right but we just don’t know which.
“something is computable if you can write a program for it” is naive From (SE author JohnL 2020): “Most people,
I believe, felt a bit disoriented the first time when this kind of proof/conclusion was encountered. Or at least
myself. The essential point is we do not have to identify/construct/bind to one algorithm that decides [it]. We
do not have to understand fully what is [the problem]. All we need is there exists an algorithm that decides [it],
whatever [the answer] turns out to be. This deviates from . . . the naive sense of decidability . . . that you might
have even before you encountered the theory of computation/decidability/computability.
π ’s i -th decimal place As we have noted, some real numbers have two decimal representations, one ending in 0’s
and one ending in 9’s. But every such number is rational (as “ending in 0’s” implies) and π is not rational, so π
is not one of these numbers.
partial application See (Wikipedia contributors 2019d).
it must be effective In fact, careful analysis shows that it is primitive recursive.
In f ’s top case the output value doesn’t matter Sometimes we use 42 when we need an arbitrary output value,
because of its connection with The Hitchhiker’s Guide to the Galaxy, (Adams 1979). See also (Wikipedia
contributors 2020a).
undecidable The word ‘undecidable’ is used in mathematics in two different ways. The definition here of course
applies to the Theory of Computation. In relation to Gödel’s theorems, it means that a statement cannot be
proved true or proved false within a particular formal system.
halt on some inputs but not on others A Turing machine could fail to halt because it has an infinite loop. The
Turing machine P0 = { q0BBq0, q011q0 } never halts, cycling forever in state q0. We could patch this problem;
we could write a program inf_loop_decider that at each step checks whether a machine has ever before in
this computation had the same configuration as it has now. This program will detect infinite loops like the prior
one.
However, note that there are machines that fail to halt but do not have loops, in that they never repeat a
configuration. One is P1 = { q0B1q1, q11Rq0 } which when started on a blank tape will endlessly move to the
right, writing 1’s.
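As an illustration, here is a minimal sketch of the inf_loop_decider idea in Python. The step function and the configuration encoding are made up for the example; a real simulator would step an actual Turing machine.

    def detects_loop(step, config, max_steps=10_000):
        # Track every configuration seen so far; a repeat proves an infinite loop.
        seen = set()
        for _ in range(max_steps):
            if config is None:        # our convention here: None means it halted
                return False
            if config in seen:        # same configuration as before: a loop
                return True
            seen.add(config)
            config = step(config)
        return False                  # budget exhausted with no repeat found

    # Model P0 above: in state q0 it rewrites what it reads and stays in q0,
    # so every step leaves the configuration (state, head, tape) unchanged.
    p0_step = lambda c: c
    print(detects_loop(p0_step, ("q0", 0, "B")))   # True

A true checker would drop the max_steps budget, but then it can only confirm loops, never rule them out, which is consistent with machines like P1 that never repeat a configuration.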
Similarly, 28 = 1 + 2 + 4 + 7 + 14 is perfect After 6 and 28 come 496 and 8128. No one knows if there are
infinitely many.
understand the form of all even perfect numbers A number is an even perfect number if and only if it has the form
(2^p − 1) · 2^(p−1) where 2^p − 1 is prime.
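A quick check of this form against the first few perfect numbers (a sketch; the divisor sum is computed naively):

    def divisor_sum(n):
        # Sum of the proper divisors of n.
        return sum(d for d in range(1, n) if n % d == 0)

    for p in (2, 3, 5, 7):                 # for these p, 2^p - 1 is prime
        n = (2**p - 1) * 2**(p - 1)
        assert divisor_sum(n) == n         # n equals the sum of its proper divisors
        print(n)                           # 6, 28, 496, 8128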
involving an unbounded search A computer program that solved the Halting Problem, if one existed, could be very
slow. So this might not be a feasible way to settle this question. But at the moment we are studying what can be
done in principle.
functions that solve it (Wikipedia contributers 2017h)
dovetailing A dovetail joint is often used in woodworking, for example to hold together the sides of a drawer. It
weaves the two sides in alternately, as shown here.

won’t be a physically-realizable discrete and deterministic mechanism Turing introduced oracles in his PhD thesis. He
said, “We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.”
(Turing 1938)
magic smoke See (Wikipedia contributers 2017f).
we will instead use Church’s Thesis For a full treatment see (Rogers 1987).
the notion of partial computable function seems to have an in-built defense against diagonalization (Odifreddi 1992),
p 152.
this machine’s name is its behavior Nominative determinism is the theory that a person’s name has some influence
over what they do with their life. Examples are: the sprinter Usain Bolt, the US weatherman Storm Fields, the
baseball player Prince Fielder, and the Lord Chief Justice of England and Wales named Igor Judge, I Judge. See
https://en.wikipedia.org/wiki/Nominative_determinism.
considered mysterious, or at any rate obscure For example, “The recursion theorem . . . has one of the most
unintuitive proofs where I cannot explain why it works, only that it does.” (Fortnow and Gasarch 2002)
Once upon a time This mathematical fable came from David Hilbert in 1924. It was popularized by George Gamow
in One, Two, Three . . . Infinity. (Kragh 2014).
Napoleon’s downfall in the early 1800’s See (Wikipedia contributers 2017d).
period of prosperity and peace See (Wikipedia contributers 2017i).
A A Michelson, who wrote in 1899, “The more important fundamental laws and facts of physical science have all been
discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence
of new discoveries is exceedingly remote.” Michelson was a major figure. From 1901 to 1903 he was president of
the American Physical Society. In 1910–1911 he was president of the American Association for the Advancement
of Science and from 1923–1927 he was president of the National Academy of Sciences. In 1907 he received the
Copley Medal from the Royal Society in London, and the Nobel Prize. He remains well known today for the
Michelson–Morley experiment that tried to detect the presence of aether, the hypothesized medium through
which light waves travel.
working out the rules of a game by watching it being played See https://www.youtube.com/watch?v=o1dgrvlWML4
many observers thought that we basically had got the rules An example is that Max Planck was advised not to
go into physics by his professor, who said, “in this field, almost everything is already discovered, and all that
remains is to fill a few unimportant holes.” (Wikipedia contributors 2017)
the discovery of radiation This happened in 1896, before Michelson’s statement. Often the significance of things
takes time to be apparent.
“everything is relative.” Of course, the history around Einstein’s work is vastly more complex and subtle. But we
are speaking of the broad understanding, not of the truth.
loss of certainty This phrase is the title of a famous popular book on mathematics, by M Kline. The book is fun
and a thought-provoking read. Also thought-provoking are some criticisms of the book. (Wikipedia contributors
2019b) is a good introduction to both.
the development of a fetus is that it basically just expands The issue was whether the fetus began preformed or as a
homogeneous mass, see (Maienschein 2017). Today we have similar questions about the Big Bang — we are
puzzled to explain how a mathematical point, which is without internal structure and entirely homogeneous,
could develop into the very non-homogeneous universe that we see today.
potential infinite regress This line of thinking often depends on the suggestion that all organisms were created at
the same time, that they have existed since the beginning of the posited creation.
discovery by Darwin and Wallace of descent with modification through natural selection Darwin wrote in his
autobiography, “The old argument of design in nature, as given by Paley, which formerly seemed to me so
conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for
instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a
door by man. There seems to be no more design in the variability of organic beings and in the action of natural
selection, than in the course which the wind blows. Everything in nature is the result of fixed laws.”
the car is in some way less complex than the robot This is an information theoretic analog of the Second Law
of Thermodynamics. E Musk has expressed something of the same sentiment, “The extreme difficulty of
scaling production of new technology is not well understood. It’s 1000% to 10,000% harder than making
a few prototypes. The machine that makes the machine is vastly harder than the machine itself.” See
https://twitter.com/elonmusk/status/1308284091142266881.
self-reference ‘Self-reference’ describes something that refers to itself. The classic example is the Liar paradox,
the statement attributed to the Cretan Epimenides, “All Cretans are liars.” Because he is Cretan we take
the statement to be an utterance about utterances by him, that is, to be about itself. If we suppose that the
statement is true then it asserts that anything he says is false, so the statement is false. But if we suppose that it
is false then we take that he is saying the truth, that all his statements are false. It’s a paradox, meaning that the
reasoning seems locally sound but it leads to a global impossibility.
This is related to Russell’s paradox, which lies at the heart of the diagonalization technique, that if we define
the collection of sets R = { S | S ∉ S } then R ∈ R holds if and only if R ∉ R holds.
Self-reference is obviously related to recurrence. You see it sometimes pictured as an infinite recurrence, as
here on the front of a chocolate product.

Because of this product, having a picture contain itself is sometimes known as the Droste effect. See also
https://www.smithsonianmag.com/science-nature/fresh-off-the-3d-printer-henry-segermans-mathematical-sculptures-2894574/?no-ist.
Besides the Liar paradox there are many others. One is Quine’s paradox, a sentence that asserts its own
falsehood.
“Yields falsehood when preceded by its quotation”
yields falsehood when preceded by its quotation.

If this sentence were false then it would be saying something that is true. If this sentence were true then what it
says would hold and it would be not true.
A wonderful popular book exploring these topics and many others is (Hofstadter 1979).
quine Named for the philosopher Willard Van Orman Quine.
for routines to have access to their code Introspection is the ability to inspect code in the system, such as to inspect
the type of objects. Reflection is the ability to make modifications at runtime.
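For instance, in Python introspection looks like this (a small sketch; inspect.getsource needs the function to live in a source file):

    import inspect

    def report(obj):
        print(type(obj).__name__)          # introspect the type of an object
        print(inspect.getsource(report))   # this routine reading its own source

    report([1, 2, 3])                      # prints "list", then report's code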
We will show how a routine can know its source This is derived from the wonderful presentation in (Sipser 2013).
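As a preview, here is one classic Python quine, a program whose output is exactly its own source (an illustration only, not the construction given in the text):

    s = 's = %r\nprint(s %% s)'
    print(s % s)

Running it prints the two lines above, character for character: the %r substitution quotes the template string back into itself.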
The verb ‘to quine’ Invented by D Hofstadter.
which n -state Turing Machine leaves the most 1’s after halting R H Bruck famously wrote (R H Bruck n.d.), “I might
compare the high-speed computing machine to a remarkably large and awkward pencil which takes a long time
to sharpen and cannot be held in the fingers in the usual manner so that it gives the illusion of responding to my
thoughts, but is fitted with a rather delicate engine and will write like a mad thing provided I am willing to let
it dictate pretty much the subjects on which it writes.” The Busy Beaver machine is the maddest writer possible.
Radó noted in his 1962 paper This paper (Radó 1962) is exceptionally clear and interesting.
Σ(n) is unknowable See (Aaronson 2012a). See also https://www.quantamagazine.org/the-busy-beaver-
game-illuminates-the-fundamental-limits-of-math-20201210/.
a 7918-state Turing machine The number of states needed has since been reduced. As of this writing it is 1919.
the standard axioms for Mathematics This is ZFC, the Zermelo–Fraenkel axioms with the Axiom of Choice. (In
addition, they also took the hypothesis of the Stationary Ramsey Property.)
take the floor Let the n-th triangle number be t(n) = 0 + 1 + · · · + n = n(n + 1)/2. The function t is monotonically
increasing and there are infinitely many triangle numbers. Thus for every natural number c there is a unique
triangle number t(n) that is maximal so that c = t(n) + k for some k ∈ N. Because t(n + 1) = t(n) + n + 1, we
see that k < n + 1, that is, k ≤ n. Thus, to compute the diagonal number d from the Cantor number c of a pair,
we have (1/2)d(d + 1) ≤ c < (1/2)(d + 1)(d + 2). Applying the quadratic formula to the left and right
halves gives (1/2)(−3 + √(1 + 8c)) < d ≤ (1/2)(−1 + √(1 + 8c)). Taking (1/2)(−1 + √(1 + 8c)) to be α gives that
c ∈ (α − 1 .. α] so that d = ⌊α⌋. (Scott 2020)
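The computation is short in code. Here is a sketch that returns the diagonal number d and the offset k with c = t(d) + k:

    import math

    def diagonal_and_offset(c):
        d = (math.isqrt(1 + 8 * c) - 1) // 2    # d = floor(alpha), as above
        k = c - d * (d + 1) // 2                # 0 <= k <= d
        return d, k

    print(diagonal_and_offset(13))   # (4, 3), since 13 = t(4) + 3 = 10 + 3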
let’s extend to tuples of any size See https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it.

Languages
having elephants move to the left side of a road or to the right Less fancifully, we could be making a Turing machine
out of LEGO bricks and want to keep track by sliding a block from one side of a column to the other. Or, we could
use an abacus.
we could translate any such procedure While a person may quite sensibly worry that elephants could be not just on
the left side or the right, but in any of the continuum of points in between, we will make this assertion without
more philosophical analysis than by just referring to the discrete nature of our mechanisms (as Turing basically
did). That is, we take it as an axiom.
finite set { 1000001, 1100001 } Although it looks like two strings plucked from the air, the language is not without
sense. The bitstring 1000001 represents capital A in the ASCII encoding, while 1100001 is lower case a. The
American Standard Code for Information Interchange, ASCII, is a widely used, albeit quite old, way of encoding
character information in computers. The most common modern character encoding is UTF-8, which extends
ASCII. For the history see https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt.
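The claim is easy to check, say in Python:

    print(format(ord('A'), '07b'))   # 1000001, the 7-bit ASCII code of 'A'
    print(format(ord('a'), '07b'))   # 1100001, the 7-bit ASCII code of 'a'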
palindrome Sometimes people tease Psychology by labeling it the study of college freshmen because so many
studies start, roughly, “we put a bunch of college freshmen in a room, lied to them about what we were doing,
and . . . ” In the same way, Theory of Computing sometimes seems like the study of palindromes.
words from English that are palindromes Some people like to move beyond single word palindromes to make
sentence-length palindromes that make some sense. Some of the more famous are: (1) supposedly the first
sentence ever uttered, “Madam, I’m Adam” (2) Napoleon’s lament, “Able was I ere I saw Elba” and (3) “A man, a
plan, a canal: Panama”, about Theodore Roosevelt.
In practice a language is usually given by rules Linguists started formalizing the description of language, including
phrase structure, at the start of the 1900’s. Meanwhile, string rewriting rules as formal, abstract systems were
introduced and studied by mathematicians including Axel Thue in 1914, Emil Post from the 1920’s through the
1940’s and Turing in 1936. Noam Chomsky, while teaching linguistics to students of information theory at MIT,
combined linguistics and mathematics by taking Thue’s formalism as the basis for the description of the syntax
of natural language. (Wikipedia contributers 2017e)
“the red big barn” sounds wrong. Experts vary on the exact rules but one source gives (article) + number +
judgment/attitude + size, length, height + age + color + origin + material + purpose + (noun), so that “big
red barn” is size + color + noun, as is “little green men.” This is called the Royal Order of Adjectives; see
http://english.stackexchange.com/a/1159. A person may object by citing “big bad wolf” but it turns out
there is another, stronger, rule that if there are three words then they have to go I-A-O and if there are two
words then the order has to be I followed by either A or O. Thus we have tick tock but not tock tick. Similarly
for tic-tac-toe, mishmash, King Kong, or dilly dally.
very strict rules Everyone who has programmed has had a compiler chide them about a syntax violation.
grammars are the language of languages. From Matt Swift, http://matt.might.net/articles/grammars-bnf-
ebnf/.
this grammar Taken from https://en.wikipedia.org/wiki/Formal_grammar.
dangling else See https://en.wikipedia.org/wiki/Dangling_else.
postal addresses. Adapted from https://en.wikipedia.org/wiki/Backus-Naur_form.
Recall Turing’s prototype computer In this book we stick to grammars where each rule head is a single nonterminal.
That greatly restricts the languages that we can compute. More general grammars can compute more, including
every set that can be decided by a Turing machine.
often state their problems For instance, see the blogfeed for Theoretical Computer Science http://cstheory-feed.org/ (Various authors 2017).
represent a graph in a computer Example 3.2 makes the point that a graph is about the connections between vertices,
not about how it is drawn. This graph representation via a matrix also illustrates that point because it is, after
all, not drawn.
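For instance, here is a sketch of the adjacency matrix representation in Python; the little graph is made up for the example.

    n = 4
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1    # undirected, so the matrix is symmetric
    for row in adj:
        print(row)

Nothing in the matrix records where the vertices sit on the page; it stores only the connections.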
a standard way to express grammars One factor influencing its adoption was a letter that D Knuth wrote to the
Communications of the ACM (D. E. Knuth 1964). He listed some advantages over the grammar-specification
methods that were then widely used. Most importantly, he contrasted BNF’s ‘<addition operator>’ with ‘A’,
saying that the difference is a great addition to “the explanatory power of a syntax.” He also proposed the name
Backus Naur Form.
some extensions for grouping and replication The best current standard is https://www.w3.org/TR/xml/.
Time is a difficult engineering problem One complication of time, among many, is leap seconds. The Earth is
constantly undergoing deceleration caused by the braking action of the tides. The average deceleration of the
Earth is roughly 1.4 milliseconds per day per century, although the exact number varies from year to year
depending on many factors, including major earthquakes and volcanic eruptions. To ensure that atomic clocks
and the Earth’s rotational time do not differ by more than 0.9 seconds, occasionally an extra second is added to
civil time. This leap second can be either positive or negative depending on the Earth’s rotation — on occasion
there are minutes with 61 seconds, and potentially minutes with only 59.
Adding to the confusion is that the changes in rotation are uneven and we cannot predict leap seconds far into
the future. The International Earth Rotation Service publishes bulletins that announce leap seconds with a
few weeks warning. Thus, there is no way to determine how many seconds there will be between the current
instant and ten years from now. Since the first leap second in 1972, all leap seconds have been positive and
there were 23 leap seconds in the 34 years to January 2006. (U.S. Naval Observatory 2017)
RFC 3339 (Klyne and Newman 2002)
strings such as 1958-10-12T23:20:50.52Z This format has a number of advantages: it is human readable, sorting
a collection of these strings puts earlier times first, it is simple (there is only one format), and it includes
the time zone information.
a BNF grammar Some notes: (1) Coordinated Universal Time, the basis for civil time, is often called UTC, but is
sometimes abbreviated Z, (2) years are four digits to prevent the Y2K problem (Encyclopædia Britannica 2017),
(3) the only month numbers allowed are 01–12 and in each month only some day numbers are allowed, and
(4) the only time hours allowed are 00–23, minutes must be in the range 00–59, etc. (Klyne and Newman 2002)
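To give the flavor of the grammar, here is a simplified regular-expression sketch in Python. It enforces only the overall shape, not the per-month day counts or the leap seconds noted above.

    import re

    rfc3339 = re.compile(
        r'\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])'   # full date
        r'T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?'      # time of day
        r'(Z|[+-]([01]\d|2[0-3]):[0-5]\d)')              # zone offset
    print(bool(rfc3339.fullmatch('1958-10-12T23:20:50.52Z')))   # True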

Automata
what jobs can be done by a machine with bounded memory From Rabin, Scott, Finite Automata and Their Decision
Problems, 1959: Turing machines are widely considered to be the abstract prototype of digital computers; workers
in the field, however, have felt more and more that the notion of a Turing machine is too general to serve as an
accurate model of actual computers. It is well known that even for simple calculations it is impossible to give an a
priori upper bound on the amount of tape a Turing machine will need for any given computation. It is precisely this
feature that renders Turing’s concept unrealistic. In the last few years the idea of a finite automaton has appeared
in the literature. These are machines having only a finite number of internal states that can be used for memory
and computation. The restriction on finiteness appears to give a better approximation to the idea of a physical
machine. Of course, such machines cannot do as much as Turing machines, but the advantage of being able to
compute an arbitrary general recursive function is questionable, since very few of these functions come up in practical
applications.
transition function ∆ : Q × Σ → Q Some authors allow the transition function to be partial. That is, some authors
allow that for some state-symbol pairs there is no next state. This choice by an author is a matter of convenience,
as for any such machine you can create an error state q_error, or dead state, that is not an accepting state and that
transitions only to itself, and send all such pairs there. This transition function is total, and the new machine
has the same collection of accepted strings as the old.
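Here is a sketch of that construction, with the transition function as a Python dictionary keyed by state-symbol pairs (the little machine is made up for the example).

    def complete(delta, states, alphabet, dead='q_error'):
        # Fill in every missing state-symbol pair with the dead state.
        total = dict(delta)
        for q in list(states) + [dead]:
            for ch in alphabet:
                total.setdefault((q, ch), dead)   # the dead state loops to itself
        return total

    delta = {('q0', 'a'): 'q1'}                   # a partial transition function
    total = complete(delta, ['q0', 'q1'], 'ab')
    print(total[('q1', 'b')])                     # q_error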
Unicode While in the early days of computers characters could be encoded with standards such as ASCII, which
includes only upper and lower case unaccented letters, digits, a few punctuation marks, and a few control
characters, today’s global interconnected world needs more. The Unicode standard assigns a unique number
called a code point to every character in every language (to a fair approximation). See (Wikipedia contributers
2017k).
how phone numbers used to be handled in North America See the description of the North America Numbering Plan
(Wikipedia contributers 2017g).
same-area local exchange Initially, large states (those divided into multiple numbering plan areas) were assigned
area codes with a 1 in the second position, while areas that covered entire states or provinces got codes with 0
as the middle digit. This was abandoned by the early 1950s. (Wikipedia contributers 2017g).
switching with physical devices The devices to do the switching were invented in 1889 by an undertaker whose
competitor’s wife was the local telephone operator and routed calls to her husband’s business. (Wikipedia
contributers 2017b)
Alcuin of York (735–804) See https://www.bbc.co.uk/programmes/m000dqy8.
a wolf, a goat, and a bundle of cabbages This translation is from A Raymond, from the University of Washington.
Traveling Salesman problem See https://nbviewer.jupyter.org/url/norvig.com/ipython/TSP.ipynb.
roads in the US lower forty-eight states See https://wiki.openstreetmap.org/wiki/TIGER.
no-state A person can wonder about no-state. Where is it, exactly? We can think that it is like what happens if
you write a program with a sequence of if-then statements and forget to include an else. Obviously a computer
goes somewhere, the instruction pointer points to some next address, but what happens is not sensible in terms
of the model you’ve written.
Alternatively, the wonderful book (Hofstadter 1979) describes a place named Tumbolia, which is where holes
go when they are filled (also where your lap goes when you stand). Perhaps the machines go there.
amb(S, R0, R1, ... Rn−1) The name amb abbreviates ‘ambiguous function’. Here is a small example. Essentially
Amb(x,y,z) splits the computation into three possible futures: a future in which the value x is yielded, a future
in which the value y is yielded, and a future in which the value z is yielded. The future which leads to a
successful subsequent computation is chosen. The other “parallel universes” somehow go away. (Amb called
with no arguments fails.) The output is 2 4 because Amb(1,2,3) correctly chooses the future in which x has
value 2, Amb(7,6,4,5) chooses 4, and consequently Amb(x*y = 8) produces a success.
These were described by John McCarthy in (McCarthy 1963). “Ambiguous functions are not really functions.
For each prescription of values to the arguments the ambiguous function has a collection of possible values. An
example of an ambiguous function is less(n) defined for all positive integer values of n . Every non-negative
integer less than n is a possible value of less(n). First we define a basic ambiguity operator amb(x, y) whose
possible values are x and y when both are defined: otherwise, whichever is defined. Now we can define less(n)
by less(n) = amb(n − 1, less(n − 1)).”
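Here is a minimal Python sketch reproducing the example above. It simulates the nondeterminism by exhaustive search rather than by McCarthy's continuation-style mechanism, but the "choose a successful future" behavior is the same.

    from itertools import product

    def amb_search(domains, constraint):
        # Try each combination of choices; return the first successful future.
        for choice in product(*domains):
            if constraint(*choice):
                return choice
        raise ValueError('amb: no successful future')   # amb with no options fails

    x, y = amb_search([(1, 2, 3), (7, 6, 4, 5)], lambda x, y: x * y == 8)
    print(x, y)   # 2 4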
a demon The term ‘demon’ comes from Maxwell’s demon. This is a thought experiment created in 1867 by the
physicist J C Maxwell, about the second law of thermodynamics, which says that it takes energy to raise the
temperature of a sealed system. Maxwell imagined a chamber of gas with a door controlled by an all-knowing
demon. When the demon sees a slow-moving molecule of gas approaching, it opens the door and lets
that molecule out of the chamber, thereby raising the chamber’s temperature without applying any external
heat. See (Wikipedia contributors 2019c).
Pronounced KLAY-nee His son, Ken Kleene, wrote, “As far as I am aware this pronunciation is incorrect in all known
languages. I believe that this novel pronunciation was invented by my father.” (Computing 2017)
mathematical model of neurons (Wikipedia contributers 2017c)
have a vowel in the middle Most speakers of American English cite the vowels as ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’. See (Bigham
2014).
before and after pictures This diagram is derived from (Hopcroft, Motwani, and Ullman 2001).
The fact that we can describe these languages in so many different ways (SE author David Richerby 2018).
just list all the cases In practice the suggestion in the first paragraph to list all the cases may not be reasonable.
For example, there are finitely many people and each has finitely many active phone numbers so the set of all
currently-active phone numbers is a regular language. But constructing a Finite State machine for it is silly. In
addition, a finite regular language doesn’t have to be large for it to be difficult, in a sense. Take Goldbach’s
conjecture, that every even number greater than 2 is the sum of two primes, as in 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,
. . . (see https://en.wikipedia.org/wiki/Goldbach%27s_conjecture). Computer testing shows that this
pattern continues to hold up to very large numbers but no one knows if it is true for all evens. Now consider the
set consisting of the string σ ∈ { 0, ... 9 }∗ representing the smallest even number that is not the sum of two
primes. This set is finite since it has either one member or none. But while that set is tiny, we don’t know what
it contains.
performing that operation on its members always yields another member Familiar examples are that adding two
integers always gives an integer so the integers are closed under the operation of addition, and that squaring an
integer always results in an integer so that the integers are closed under squaring.
the machine accepts at least one string of length k , where n ≤ k < 2n This gives an algorithm that inputs a Finite
State machine and determines, in a finite time, if it recognizes an infinite language.
be aware that another algorithm See (Knuutila 2001).
For the third This is derived from the presentation in (Hopcroft, Motwani, and Ullman 2001).
\d We shall ignore cases of non-ASCII digits, that is, cases outside 0–9.
ZIP codes ZIP stands for Zone Improvement Plan. The system has been in place since 1963 so it, like the music
movement called ‘New Wave’, is an example of the danger of naming your project something that will become
obsolete if that project succeeds.
a colon and two forward slashes The inventor of the World Wide Web, T Berners Lee, has admitted that the two
slashes don’t have a purpose (Firth 2009).
more power than the theoretical regular expressions that we studied earlier Omitting this power, and keeping the
implementation in sync with the theory, has the advantage of speed. See (Cox 2007).
valid email addresses This expression follows the RFC 822 standard. The full listing is at
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html. It is due to Paul Warren, who did not write it by hand but
instead used a Perl program to concatenate a simpler set of regular expressions that relate directly to the
grammar defined in the RFC. To use the regular expression, should you be so reckless, you would need to
remove the formatting newlines.
J Zawinski The post is from alt.religion.emacs on 1997-Aug-12. For some reason it keeps disappearing from
the online archive.
Now they have two problems. A classic example is trying to use regular expressions to parse significant parts of an
HTML document. See (bobnice 2009).
regex golf See https://alf.nu/RegexGolf, and https://nbviewer.jupyter.org/url/norvig.com/ipython/xkcd1313.ipynb.

Complexity
A natural next step is to look to do jobs efficiently S Aaronson states it more provocatively as, “[A]s computers
became widely available starting in the 1960s, computer scientists increasingly came to see computability
theory as not asking quite the right questions. For, almost all the problems we actually want to solve turn out
to be computable in Turing’s sense; the real question is which problems are efficiently or feasibly computable.”
(Aaronson 2011)
A Karatsuba See https://en.wikipedia.org/wiki/Anatoly_Karatsuba.
clever algorithm The idea is: let k = ⌈n/2⌉ and write x = x1 · 2^k + x0 and y = y1 · 2^k + y0 (so for instance,
678 = 21 · 2^5 + 6 and 42 = 1 · 2^5 + 10). Then xy = A · 2^2k + B · 2^k + C where A = x1y1, and B = x1y0 + x0y1,
and C = x0y0 (for example, 28 476 = 21 · 2^10 + 216 · 2^5 + 60). The multiplications by 2^2k and 2^k are just
bit-shifts to known locations independent of the values of x and y, so they don’t affect the time much. But the
two multiplications for B seem to remove all the advantage and still give n^2 time. However, Karatsuba noted that
B = (x0 + x1) · (y0 + y1) − A − C. Boom: done. Just one multiplication.
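Here is a sketch of the algorithm in Python, following the note's notation (the idea only; production implementations add refinements).

    def karatsuba(x, y):
        if x < 16 or y < 16:                       # small cases: multiply directly
            return x * y
        n = max(x.bit_length(), y.bit_length())
        k = (n + 1) // 2                           # k = ceil(n/2)
        x1, x0 = x >> k, x & ((1 << k) - 1)        # x = x1*2^k + x0
        y1, y0 = y >> k, y & ((1 << k) - 1)        # y = y1*2^k + y0
        A = karatsuba(x1, y1)
        C = karatsuba(x0, y0)
        B = karatsuba(x0 + x1, y0 + y1) - A - C    # the single extra multiplication
        return (A << 2 * k) + (B << k) + C         # shifts, not multiplications

    assert karatsuba(678, 42) == 28476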
table below shows why This table is adapted from (Garey and Johnson 1979).
there are 3.16 × 107 seconds in a year The easy way to remember this is the bumper sticker slogan by Tom Duff
from Bell Labs: “π seconds is a nanocentury.”
very, very much larger than polynomial growth According to an old tale from India, the Grand Vizier Sissa Ben
Dahir was granted a wish for having invented chess for the Indian King, Shirham. Sissa said, “Majesty, give
me a grain of wheat to place on the first square of the board, and two grains of wheat to place on the second
square, and four grains of wheat to place on the third, and eight grains of wheat to place on the fourth, and so
on. Oh, King, let me cover each of the 64 squares of the board.”
“And is that all you wish, Sissa, you fool?” exclaimed the astonished King.
“Oh, Sire,” Sissa replied, “I have asked for more wheat than you have in your entire kingdom. Nay, for more
wheat that there is in the whole world, truly, for enough to cover the whole surface of the earth to the depth of
the twentieth part of a cubit.”
Sissa has the right idea but his arithmetic is slightly off. A cubit is the length of a forearm, from the tip of
the middle finger to the bottom of the elbow, so perhaps twenty inches. The geometric series formula gives
1 + 2 + 4 + · · · + 2^63 = 2^64 − 1 = 18 446 744 073 709 551 615 ≈ 1.84 × 10^19 grains of wheat. The surface area
of the earth, including oceans, is 510 072 000 square kilometers. There are 10^10 square centimeters in each
square kilometer so the surface of the earth is 5.10 × 10^18 square centimeters. That’s between three and four
grains of wheat on every square centimeter of the earth. Not wheat an inch thick, but still a lot.
Another way to get a sense of the amount of wheat is: there are about 7.5 billion people on earth so it is on the
order of 10^9 grains of wheat for each person in the world. There are about 1 000 000 = 10^6 grains of wheat in a
bushel. In sum, a couple of thousand bushels for each person.
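Redoing the arithmetic takes a moment:

    grains = 2**64 - 1                     # 1 + 2 + 4 + ... + 2^63
    earth_cm2 = 510_072_000 * 10**10       # Earth's surface in square centimeters
    print(grains / earth_cm2)              # about 3.6 grains per square centimeter
    print(grains / 7_500_000_000)          # about 2.5e9 grains per person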
Cobham’s thesis Credit for this goes to both A Cobham and J Edmonds, separately; see (Cobham 1965) and
(Edmunds 1965).

Jack Edmonds, b 1934 Alan Cobham, 1927–2011

Cobham’s paper starts by asking “is it harder to multiply than to add?”, a question that we still cannot
answer. Clearly we can add two n -bit numbers in O(n) time, but we don’t know whether we can multiply in
linear time.
Cobham then goes on to point out the distinction between the complexity of a problem and the running time
of a particular algorithm to solve that problem, and notes that many familiar functions, such as addition,
multiplication, division, and square roots, can all be computed in time “bounded by a polynomial in the lengths
of the numbers involved.” He suggests we consider the class of all functions having this property.
As for Edmonds, in a “Digression” he writes: “An explanation is due on the use of the words ‘efficient algorithm.’
According to the dictionary, ‘efficient’ means ‘adequate in operation or performance.’ This is roughly the
meaning I want — in the sense that it is conceivable for [this problem] to have no efficient algorithm. . . . There
is an obvious finite algorithm, but that algorithm increases in difficulty exponentially with the size of the graph.
It is by no means obvious whether or not there exists an algorithm whose difficulty increases only algebraically
with the size of the graph . . . If only to motivate the search for good, practical algorithms, it is important to
realize that it is mathematically sensible even to question their existence.”
tractable Another word that you can see in this context is ‘feasible’. Some authors use them to mean the same
thing, roughly that we can solve reasonably-sized problem instances using reasonable resources. But some
authors use ‘feasible’ to have a different connotation, for instance explicitly disallowing inputs that are too large,
such as having too many bits to fit in the physical universe. The word ‘tractable’ is more standard and works
better with the definition that includes the limit as the input size goes to infinity, so here we stick with it.
slower by nine extra assignments Assuming that the compiler does not optimize it out of the loop.
The definition of Big O ignores constant factors This discussion originated as (SE author babou and various others
2015).
the order of magnitude of these constants For a rough idea of what these may be, here are some numbers that every
programmer should know.

    Operation                              Cost in nanoseconds
    Cache reference                        0.5–7
    Branch mispredict                      5
    Main memory reference                  100
    Send 1K bytes over 1 Gbps network      10 000
    Read 1 MB sequentially from disk       20 000 000
    Send packet CA to Netherlands to CA    150 000 000

A nanosecond is 10^−9 seconds. For more, see https://www.youtube.com/watch?v=JEpsKnWZrJ8&app=desktop.
update that standard Even Knuth had to update standards, from his machine model MIX to MMIX.
an important part of the field’s culture That is, these are storied problems.
inventor of the quaternion number system See https://en.wikipedia.org/wiki/History_of_quaternions.
Around the World Another version was called The Icosian Game. See http://puzzlemuseum.com/month/picm02/
200207icosian.htm.
This is the solution given by L Euler The figure is from (Euler 1766).
find the shortest-distance circuit that visits every city Traveling Salesman was first posed by K Menger, in an article
that appeared in the same journal and the same issue as Gödel’s Incompleteness Theorem.
no circuit is possible For each land mass, each bridge in must have an associated bridge out. So, at the least, a
necessary condition is that each land mass have an even number of associated edges.
the countries must be contiguous A notable example of a non-contiguous country in the world today is that Russia is
separated from Kaliningrad, the city that used to be known as Königsberg.
we can draw it in the plane This is because the graph comes from a planar map.
inputs a planar graph The graph is undirected and without loops.
Counties of England and the derived planar graph This is today’s map. At the time, some counties were not
contiguous.
it was controversial See https://www.maa.org/sites/default/files/pdf/upload_library/22/Ford/Swart697-707.pdf.
An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities See
https://www.3quarksdaily.com/3quarksdaily/2018/02/george-boole-and-the-calculus-of-thought-5.html.
conjunctive normal form Any Boolean function can be expressed in that form. For instance, x XOR y equals (x ∨ y) ∧ (¬x ∨ ¬y).
Below are the numbers for the 2020 election Here are the state abbreviations.

    State                 Abbr.   State            Abbr.   State            Abbr.
    Alabama               AL      Kentucky         KY      North Dakota     ND
    Alaska                AK      Louisiana        LA      Ohio             OH
    Arizona               AZ      Maine            ME      Oklahoma         OK
    Arkansas              AR      Maryland         MD      Oregon           OR
    California            CA      Massachusetts    MA      Pennsylvania     PA
    Colorado              CO      Michigan         MI      Rhode Island     RI
    Connecticut           CT      Minnesota        MN      South Carolina   SC
    Delaware              DE      Mississippi      MS      South Dakota     SD
    District of Columbia  DC      Missouri         MO      Tennessee        TN
    Florida               FL      Montana          MT      Texas            TX
    Georgia               GA      Nebraska         NE      Utah             UT
    Hawaii                HI      Nevada           NV      Vermont          VT
    Idaho                 ID      New Hampshire    NH      Virginia         VA
    Illinois              IL      New Jersey       NJ      Washington       WA
    Indiana               IN      New Mexico       NM      West Virginia    WV
    Iowa                  IA      New York         NY      Wisconsin        WI
    Kansas                KS      North Carolina   NC      Wyoming          WY
ignore some fine points For example, both Maine and Nebraska have two districts, and each elects their own
representative to the Electoral College, rather than having two state-wide electors who vote the same way.
words can be packed into the grid The earliest known example is the Sator Square, five Latin words that pack into
a grid.

S A T O R
A R E P O
T E N E T
O P E R A
R O T A S

It appears in many places in the Roman Empire, often as graffiti. For instance, it was found in the ruins of
Pompeii. Like many word game solutions it sacrifices comprehension for form but it is a perfectly grammatical
sentence that translates as something like, “The farmer Arepo works the wheel with effort.”
popularized as a toy It was invented by Noyes Palmer Chapman, a postmaster in Canastota, New York. As early as
1874 he showed friends a precursor puzzle. By December 1879 copies of the improved puzzle were circulating
in the northeast and students in the American School for the Deaf and others started manufacturing it. It
became popular as the “Gem Puzzle.” Noyes Chapman had applied for a patent in February, 1880. By that time
the game had become a craze in the US, somewhat like Rubik’s Cube a century later. It was also popular in
Canada and Europe. See (Wikipedia contributers 2017a)
we know of no efficient algorithm to find divisors An effort in 2009 to factor a 768-bit number (232 digits) used
hundreds of machines and took two years. The researchers estimated that a 1024-bit number would take about
a thousand times as long.
Factoring seems, as far as we know today, to be hard Finding factors has for many years been thought hard. For
instance, a number is called a Mersenne prime if it is a prime number of the form 2^n − 1. They are named after
M Mersenne, a French friar and important figure in the early sharing of scientific results, who studied them
in the early 1600’s. He observed that if n is prime then 2^n − 1 may be prime, for instance with n = 3, n = 7,
n = 31, and n = 127. He suspected that others of that form were also prime, in particular n = 67.
On 1903-Oct-31 F N Cole, then Secretary of the American Mathematical Society, made a presentation at a
math meeting. When introduced, he went to the chalkboard and in complete silence computed 2^67 − 1 =
147 573 952 589 676 412 927. He then moved to the other side of the board, wrote 193 707 721 times
761 838 257 287, and worked through the calculation, finally finding equality. When he was done Cole returned
to his seat, having not uttered a word in the hour-long presentation. His audience gave him a standing ovation.
Cole later said that finding the factors had been a significant effort, taking “three years of Sundays.”
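Today the check that took Cole three years of Sundays takes an instant:

    assert 2**67 - 1 == 147_573_952_589_676_412_927
    assert 193_707_721 * 761_838_257_287 == 2**67 - 1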
Platonic solids See (Wikipedia contributers 2017j).
as shown Some PDF readers cannot do opacity, so you may not see the entire Hamiltonian path.
Six Degrees of Kevin Bacon One night, three college friends, Brian Turtle, Mike Ginelli, and Craig Fass, were
watching movies. Footloose was followed by Quicksilver, and between was a commercial for a third Kevin Bacon
movie. It seemed like Kevin Bacon was in everything! This prompted the question of whether Bacon had ever
worked with De Niro. The answer at that time was no, but De Niro was in The Untouchables with Kevin Costner,
who was in JFK with Bacon. The game was born. It became popular when they wrote to Jon Stewart about it
and appeared on his show. (From (Blanda 2013).) See https://oracleofbacon.org/.
uniform family of tasks From (Jones 1997).
There is no widely-accepted formal definition of ‘algorithm’ This discussion derives from (Pseudonym 2014).
default interpretation of ‘problem’ Not every computational problem is naturally expressible as a language decision
problem. Consider the task of sorting the characters of strings into ascending order. We could try to express it as
the language of sorted strings {σ ∈ Σ∗ | σ is sorted }. But this does not require that we find a good way to sort
an unsorted input. Another thought is to consider the language of pairs ⟨σ , p⟩ where p is a permutation of the
numbers 0, ... |σ | − 1 that brings the string into ascending order. But here also the formulation seems to not
capture the sorting problem, in that recognizing a correct permutation feels different than generating one from
scratch.
input two numbers and output their midpoint See https://hal.archives-ouvertes.fr/file/index/docid/576641/filename/computing-midpoint.pdf.
final two bits are 00 Decimal representation is not much harder since a decimal number is divisible by four if and
only if the final two digits are in the set { 00, 04, ... 96 }.
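In code the binary test is a single bit operation:

    def divisible_by_four(x):
        return x & 0b11 == 0    # true exactly when the final two bits are 00

    print(divisible_by_four(100), divisible_by_four(46))   # True False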
everything of interest can be represented with reasonable efficiency by bitstrings See https://rjlipton.wordpress.
com/2010/11/07/what-is-a-complexity-class/. Of course, a wag may say that if it cannot be represented
by bitstrings then it isn’t of interest. But we mean something less tautological: we mean that if we could want to
compute with it then it can be put in bitstrings. For example, we find that we can process speech, adjust colors
on an image, or regulate pressure in a rocket fuel tank, all in bitstrings, despite what may at first encounter
seem to be the inherently analog nature of these things.
Beethoven’s 9th Symphony The official story is that CDs are 74 minutes long so that they can hold this piece.
researchers often do not mention representations This is like a programmer saying, “My program inputs a number”
rather than, “My program inputs the binary representation of a number.” It is also like a person saying, “That’s
me on the card” rather than “On that card is a picture of me.”
leaving implementation details to a programmer (Grossman 2010)
complexity class There are various definitions, which are related but not equivalent. Some authors fold in the
requirement that a class be associated with some resource specification. This has some implications because
if an author, for instance, requires that each class be problems that are somehow solvable by Turing machines
then each class is countable. Our definition is more general and does not imply that a class is countable.
the time and space behavior We will concentrate our attention on resource bounds in the range from logarithmic to
exponential, because these are the most useful for understanding problems that arise in practice.
less than centuries See the video from Google at https://www.youtube.com/watch?v=-ZNEzzDcllU and S Aaronson’s
Quantum Supremacy FAQ at https://www.scottaaronson.com/blog/?p=4317.
the claim is the subject of scholarly controversy See the posting from IBM Research at
https://www.ibm.com/blogs/research/2019/10/on-quantum-supremacy/ and G Kalai’s Quantum Supremacy Skepticism FAQ
at https://gilkalai.wordpress.com/2019/11/13/gils-collegial-quantum-supremacy-skepticism-faq/.
We will give the class P a lot of attention This discussion gained much from the material in (Allender, Loui, and
Regan 1997). This includes several direct quotations.
not able rightly to apprehend This refers to a time when C Babbage was looking for money to build a mechanical
computer and was asked in a parliamentary committee, “Pray, Mr. Babbage, if you put into the machine wrong
figures, will the right answers come out?” and he said, “I am not able rightly to apprehend the kind of confusion
of ideas that could provoke such a question.” Babbage for the win.
RE Recall that ‘recursively enumerable’ is an older term for ‘computably enumerable’.
adds some wrinkles But it avoids a wrinkle that we needed for Finite State machines, ε transitions, since Turing
machines are not required to consume their input one character at a time.
function computed by a nondeterministic machine One thing that we can do is to define that the nondeterministic
machine computes f : B∗ → B∗ if on an input σ , all branches halt and they all leave the same value on the
tape, which we call f (σ ). Otherwise, the value is undefined, f (σ )↑.
might be much faster R Hamming gives this example to demonstrate that an order of magnitude change in speed
can change the world, can change what can be done: we walk at 4 mph, a car goes at 40 mph, and an airplane
goes at 400 mph. This relates to the bug picture that opens this chapter.
the problem of chess Chess is known to be a solvable game. This is Zermelo’s Theorem (Wikipedia contributers
2017l) — there is a strategy for one of the two players that forces a win or a draw, no matter how the opponent
plays.
at least appears to take exponential time In the terminology of a later section, chess is known to be EXP complete.
See (Fraenkel and Lichtenstein 1981).
in a sense, useless Being given an answer with no accompanying justification is a problem. This is like the Feynman
algorithm for doing Physics: “The student asks . . . what are Feynman’s methods? [M] Gell-Mann leans coyly
against the blackboard and says: Dick’s method is this. You write down the problem. You think very hard. (He
shuts his eyes and presses his knuckles parodically to his forehead.) Then you write down the answer.” (Gleick
1992) It is also like the mathematician S Ramanujan, who relayed that the advanced formulas that he produced
came in dreams from the god Narasimha. Some of these formulas were startling and amazing, but some of
them were wrong. (India Today 2017) And of course the most famous example of a failure to provide backing is
Fermat writing in a book he owned that there are no nontrivial solutions of x^n + y^n = z^n for n > 2 and
then saying, “I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.”
Countdown See https://en.wikipedia.org/wiki/Countdown_(game_show).
quite popular See also https://en.wikipedia.org/wiki/8_Out_of_10_Cats_Does_Countdown.
a graph This is the Petersen graph, a rich source of counterexamples for conjectures in Graph Theory.
Drummer problem This is often called the Marriage problem, where the men name suitable women. But perhaps
it is time for a new paradigm.
NP complete The name came from a contest run by Knuth; see http://blog.computationalcomplexity.org/
2010/11/by-any-other-name-would-be-just-as-hard.html.
there are many such problems The “at least as hard” is true in the sense that such problems can answer questions
about any other problem in that class. However, note that it might be that one NP complete problem runs in
nondeterministic time that is O(n) while another runs in O(n^1 000 000) time. So this sense is at odds with our
earlier characterization of problems that are harder to solve.
The following list These are from the classic standard reference (Garey and Johnson 1979).
tied up with the question of whether P is unequal to NP Ladner’s theorem is that if P ≠ NP then there is a problem
in NP − P that is not NP complete.
A large class See (Karp 1972).
often an ending point That is, as P Pudlàk observes, we treat P ≠ NP as an informal axiom. (Pudlàk 2013)
caricature Paul Erdős joked that a mathematician is a machine for turning coffee into theorems.
completely within the realm of possibility that ϕ(n) grows that slowly Hartmanis observes in (Hartmanis 2017) that
it is interesting that Gödel, the person who destroyed Hilbert’s program of automating mathematics, seemed to
think that these problems quite possibly are solvable in linear or quadratic time.
In 2018 a poll The poll was conducted by W Gasarch, a prominent researcher and blogger in Computational
Complexity. There were 124 respondents. For the description see https://www.cs.umd.edu/users/gasarch/
BLOGPAPERS/pollpaper3.pdf. Note the suggestions that both respondents and even the surveyor took the
enterprise in a light-hearted way.
88% thought that P ≠ NP Gasarch divided respondents into experts, the people who are known to have seriously
thought about the problem, and the masses. The experts were 99% for P ≠ NP.
Cook is one See (S. Cook 2000).
Many observers For example, (Viola 2018)
O(n^lg 7) method (lg 7 ≈ 2.81) Strassen’s algorithm is used in practice. The current record is O(n^2.37) but it is
not practical. It is a galactic algorithm: while it runs faster than any other known algorithm when the
problem is sufficiently large, the first such problem is so big that we never use the algorithm. For other
examples see (Wikipedia contributors 2020b).
Matching problem The Drummer problem described earlier is a special case of this for bipartite graphs.
Even with only a hundred people There are about 10^80 atoms in the universe. A graph with 100 vertices has the
potential for C(100, 2) edges, which is about 100^2. Trying every set of edges would be 2^10 000 ≈ 10^(10 000/3.32)
cases, which is much greater than 10^80.
since the 1960’s we have an algorithm Due to J Edmonds.
Theory of Computing blog feed (Various authors 2017)
R J Lipton captured this sense (Lipton 2009)
Knuth has a related but somewhat different take (D. Knuth 2014)
exploits the difference Recent versions of the algorithm used in practice incorporate refinements that we shall not
discuss. The core idea is unchanged.
Their algorithm, called RSA Originally the authors were listed in the standard alphabetic order: Adleman, Rivest,
and Shamir. Adleman objected that he had not done enough work to be listed first and insisted on being listed
last. He said later, “I remember thinking that this is probably the least interesting paper I will ever write.”
tremendous amount of interest and excitement In his 1977 column, Martin Gardner posed a $100 challenge, to
crack this message: 9686 9613 7546 2206 1477 1409 2225 4355 8829 0575 9991 1245 7431 9874 6951
2093 0816 2982 2514 5708 3569 3147 6622 8839 8962 8013 3919 9055 1829 9451 5781 5254. The
ciphertext was generated by the MIT team from a plaintext (English) message using e = 9007 and this number
n (which is too long to fit on one line).

114, 381, 625, 757, 888, 867, 669, 235, 779, 976, 146, 612, 010, 218, 296, 721, 242,
362, 562, 561, 842, 935, 706, 935, 245, 733, 897, 830, 597, 123, 563, 958, 705,
058, 989, 075, 147, 599, 290, 026, 879, 543, 541

In 1994, a team of about 600 volunteers announced that they had factored n .

p = 3, 490, 529, 510, 847, 650, 949, 147, 849, 619, 903, 898, 133, 417, 764,
638, 493, 387, 843, 990, 820, 577

and

q = 32, 769, 132, 993, 266, 709, 549, 961, 988, 190, 834, 461, 413, 177, 642, 967,
992, 942, 539, 798, 288, 533

That enabled them to decrypt the message: the magic words are squeamish ossifrage.
computer searches suggest that these are very rare For instance, among the numbers less than 2.5 × 10^10 there are
only 21 853 ≈ 2.2 × 10^4 pseudoprimes base 2; that’s six orders of magnitude fewer.
any reasonable-sized k Selecting an appropriate k is an engineering choice between the cost of extra iterations
and the gain in confidence.
we are quite confident that it is prime We are confident, but not certain. There are numbers, called Carmichael
numbers, that are pseudoprime for every base a relatively prime to n . The smallest example is n = 561 = 3 · 11 · 17,
and the next two are 1 105 and 1 729. Like pseudoprimes, these seem to be very rare. Among the numbers
less than 10^16 there are 279 238 341 033 922 primes, about 2.7 × 10^14, but only 246 683 ≈ 2.4 × 10^5
Carmichael numbers.
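Here is a sketch of the test in Python; fermat_test and the choice k = 20 are illustrative, not a hardened implementation.

    import random

    def fermat_test(n, k=20):
        # k rounds: each picks a random base a and checks a^(n-1) = 1 (mod n).
        assert n > 3
        for _ in range(k):
            a = random.randrange(2, n - 1)
            if pow(a, n - 1, n) != 1:
                return False   # a witnesses, with certainty, that n is composite
        return True            # n is prime, or a pseudoprime for every base tried

    print(pow(2, 340, 341) == 1)   # True: 341 = 11 * 31 is a pseudoprime base 2
    print(fermat_test(341))        # almost surely False; most bases expose 341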
the minimal pub crawl See (W. Cook et al. 2017).

Appendix
empty string, denoted ε Possibly ε came as an abbreviation for ‘empty’. Some authors use λ , possibly from the
German word for ‘empty’, leer. (Sirén 2016)
reversal σ R of a string The most practical current notion of a string, the Unicode standard, does not have string
reversal. All of the naive ways to reverse a string run into problems for arbitrary Unicode strings which may
contain non-ASCII characters, combining characters, ligatures, bidirectional text in multiple languages, and
so on. For example, merely reversing the chars (the Unicode scalar values) in a string can cause combining
marks to become attached to the wrong characters. Another example is: how to reverse ab<backspace>ab?
The Unicode Consortium has not gone through the effort to define the reverse of a string because there is no
real-world need for it. (From https://qntm.org/trick.)
Credits

Prologue
I.1.11 SE user Shuzheng, https://cs.stackexchange.com/q/45589/50343
I.1.12 Question by SE user Arsalan MGR, https://cs.stackexchange.com/q/
135343/50343

Background
II.2 Image credit: Robert Williams and the Hubble Deep Field Team (STScI) and
NASA.
II.1 Image credit File:Galilee.jpg. (2018, September 27). Wikimedia Commons, the
free media repository. Retrieved 22:19, January 26, 2020 from https://commons.
wikimedia.org/w/index.php?title=File:Galilee.jpg&oldid=322065651.
II.2.39 The answer derives from one by Edward James, along with one by Keith
Ramsay.
II.3.17 User scherk at pbworks.com.
II.3.27 Michael J Neely
II.3.29 Answer from Stack Exchange member Alex Becker.
II.4 ENIAC Programmers, 1946 U. S. Army Photo from Army Research Labs Technical
Library
II.4.5 Started on Stack Exchange
II.4.8 From a Stack Exchange question.
II.5.12 SE user npostavs, https://cs.stackexchange.com/a/44875/50343
II.5.29 SE user Raphael https://cs.stackexchange.com/a/44901/50343
II.6.10 Question by SE user MathematicalOrchid, https://cs.stackexchange.
com/q/2811/67754, and answer by SE user Andrej Bauer.
II.6.24 SE user Rajesh R
II.8.13 http://people.cs.aau.dk/~srba/courses/tutorials-CC-10/t5-sol.
pdf
II.8.15 SE user Karolis Juodelė
II.8.18 SE user Noah Schweber
II.9.10 (Rogers 1987), p 214.
II.9.12 (Rogers 1987), p 214.
II.9.15 (Rogers 1987), p 214.
II.A.1 https://www.ias.edu/ideas/2016/pires-hilbert-hotel
II.C.2 https://research.swtch.com/zip and Kevin Matulef
II.C.3 http://en.wikipedia.org/wiki/Quine_%28computing%29
II.C.4 J Avigad Computability and Incompleteness Lecture notes, https://www.
andrew.cmu.edu/user/avigad/Teaching/candi_notes.pdf
Languages
III.1.25 F Stephan, https://www.comp.nus.edu.sg/~fstephan/toc01slides.
pdf
III.1.36 SE user babou
III.2.9 SE user Rick Decker
III.2.16 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.2.19 (Hopcroft, Motwani, and Ullman 2001), exercise 5.1.2.
III.2.31 https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_
buffalo_buffalo_Buffalo_buffalo, https://cse.buffalo.edu/~rapaport/
BuffaloBuffalo/buffalobuffalo.html
III.2.35 http://www.cs.utsa.edu/~wagner/CS3723/grammar/examples.html
III.3.23 T Zaremba, http://www.geom.uiuc.edu/~zarembe/graph3.html.
III.A.10 http://people.cs.ksu.edu/~schmidt/300s05/Lectures/GrammarNotes/
bnf.html

Automata
IV.1.42 From Introduction to Languages by Martin, edition four, p 77.
IV.4.24 (Rich 2008), https://math.stackexchange.com/a/1102627
IV.4.27 (Rich 2008)
IV.5.19 SE user David Richerby, https://cs.stackexchange.com/a/97885/67754

IV.5.23 (Rich 2008)


IV.5.30 SE user Brian M Scott, https://math.stackexchange.com/a/1508488
IV.5.31 https://cs.stackexchange.com/a/30726
IV.6.15 https://www.eecs.wsu.edu/~cook/tcs/l10.html

Complexity
V.4 Some of the discussion is from https://softwareengineering.stackexchange.
com/a/20833.
V.4 Discussion of the third issue started as https://cs.stackexchange.com/
questions/9957/justification-for-neglecting-constants-in-big-o.
V.4 The fourth point derives from https://stackoverflow.com/a/19647659.
V.4 This discussion originated as (SE author templatetypedef 2013).
V.1.50 Stack Exchange user templatetypedef https://stackoverflow.com/a/
19647659/7168267
V.1.52 Stack Exchange user Daniel Fischer, https://math.stackexchange.com/
a/674039, and Stack Exchange user anon, https://math.stackexchange.com/
a/61741
V.1.58 Stack Exchange user Ilmari Karonen, https://math.stackexchange.com/
questions/925053/using-limits-to-determine-big-o-big-omega-and-big-
theta
V.2.24 Sean T. McCulloch, https://npcomplete.owu.edu/2014/06/03/3-dimensional-
matching/
V.2.61 Jan Verschelde, http://homepages.math.uic.edu/~jan/mcs401/partitioning.
pdf
V.3.9 A.A. at https://rjlipton.wordpress.com/2010/11/07/what-is-a-complexity-
class/#comment-8872
V.4.16 https://cs.stackexchange.com/q/57518
V.5.18 Paul Black, https://xlinux.nist.gov/dads/HTML/nondetermAlgo.html

V.6.21 SE user user326210, https://math.stackexchange.com/a/2564255


V.7.9 https://people.cs.umass.edu/~barring/cs311/disc/9.html
V.2 By Psyon (Own work) CC BY-SA 3.0 https://commons.wikimedia.org/wiki/
File:Jigsaw_Puzzle.svg
V.7.12 William Gasarch, https://www.cs.umd.edu/~gasarch/COURSES/452/F14/poly.pdf

V.7.16 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np.html
V.7.17 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np-sol.html
V.7.19 http://www.cs.princeton.edu/courses/archive/fall02/cos126/exercises/
np.html
V.7.22 Y Lyuu, https://www.csie.ntu.edu.tw/~lyuu/complexity/2016/20161129s.
pdf
V.7.28 SE user Yuval Filmus https://cs.stackexchange.com/a/132902/50343
V.8.17 SE user Yuval Filmas https://cs.stackexchange.com/a/54452/50343

Appendix

Notes
Bibliography
A/V Geeks, YouTube user, ed. (2013). Slide Rule - Proportion, Percentage, Squares And
Square Roots (1944). Division of Visual Aids, US Office of Education. url: https://www.youtube.com/watch?v=dT7bSn03lx0 (visited on 08/09/2015).
Aaronson, Scott (Aug. 14, 2011). Why Philosophers Should Care About Computational Com-
plexity. url: https://arxiv.org/abs/1108.1791.
— (May 3, 2012a). The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam
Yedidia and me. url: http://www.scottaaronson.com/blog/?p=2725.
— (Aug. 30, 2012b). The Toaster-Enhanced Turing Machine. url: http://www.scottaaronson.
com/blog/?p=1121 (visited on 05/28/2015).
Adams, Douglas (1979). The Hitchhiker’s Guide to the Galaxy. Harmony Books. isbn:
9780345391803.
Allender, Eric, Michael C. Loui, and Kenneth W. Regan (1997). “Complexity Classes”. In:
Algorithms and Theory of Computation Handbook. Ed. by Mikhail J. Atallah and Marina Blanton. Boca Raton, Florida: CRC Press. Chap. 27.
Bellos, Alex (Dec. 15, 2014). “The Game of Life: a beginner’s guide”. In: The Guardian.
url: http://www.theguardian.com/science/alexs-adventures-in-numberland/
2014/dec/15/the-game-of-life-a-beginners-guide (visited on 07/14/2015).
Bernstein, Ethan and Umesh Vazirani (1997). “Quantum Complexity Theory”. In: SIAM
Journal on Computing 26.5, pp. 1411–1473.
Bigham, D S (Aug. 19, 2014). How Many Vowels Are There in English? (Hint: It’s More Than
AEIOUY.) Slate. url: http://www.slate.com/blogs/lexicon_valley/2014/08/
19/aeiou_and_sometimes_y_how_many_english_vowels_and_what_is_a_vowel_
anyway.html (visited on 06/12/2017).
Black, Robert (2000). “Proving Church’s Thesis”. In: Philosophia Mathematica 8, pp. 244–258.
Blanda, Stephanie (2013). The Six Degrees of Kevin Bacon. [Online; accessed 2019-Apr-
01]. url: https://blogs.ams.org/mathgradblog/2013/11/22/degrees-kevin-bacon/.
bobnice, stackoverflow user (2009). Answer to: RegEx match open tags except XHTML self-
contained tags. url: https://stackoverflow.com/a/1732454/7168267 (visited on
01/27/2019).
Bragg, Melvyn (Sept. 2016). Zeno’s Paradoxes. Podcast. Guests: Marcus du Sautoy, Barbara
Sattler, and James Warren. British Broadcasting Corporation. url: https://www.bbc.
co.uk/programmes/b07vs3v1.
Brock, David C. (2020). Discovering Dennis Ritchie’s Lost Dissertation. [Online; accessed
2020-Jun-20]. url: https://computerhistory.org/blog/discovering-dennis-ritchies-lost-dissertation/.
Brower, Kenneth (1983). The Starship and the Canoe. Harper Perennial; Reprint edition. isbn:
978-0060910303.
Church, Alonzo (1937). “Review of Alan M. Turing, On computable numbers, with an
application to the Entscheidungsproblem”. In: Journal of Symbolic Logic 2, pp. 42–43.
Cobham, A (1965). “The intrinsic computational difficulty of functions”. In: Logic, Methodology
and Philosophy of Science: Proceedings of the 1964 International Congress. Ed. by Y Bar-
Hillel. North-Holland Publishing Company, pp. 24–30.
Computing, Free Online Dictionary of (2017). Stephen Kleene. [Online; accessed 21-June-2017].
url: http://foldoc.org/Stephen%20Kleene.
Cook, Stephen (2000). The P vs NP Problem. Official problem description. Clay Mathematics
Institute. url: https://www.claymath.org/sites/default/files/pvsnp.pdf
(visited on 01/11/2018).
Cook, William et al. (2017). UK Pubs Travelling Salesman Problem. url: http://www.math.
uwaterloo.ca/tsp/pubs/index.html (visited on 12/16/2017).
Copeland, B. J. and D. Proudfoot (1999). “Alan Turing’s Forgotten Ideas in Computer
Science”. In: Scientific American 280.4, pp. 99–103.
Copeland, B. Jack (Sept. 1996). “What is Computation?” In: Computation, Cognition and AI,
pp. 335–359.
— (1999). “Beyond the universal Turing machine”. In: Australasian Journal of Philosophy
77.1, pp. 46–67.
— (Aug. 19, 2002). The Church-Turing Thesis; Misunderstandings of the Thesis. url: http://
plato.stanford.edu/entries/church-turing/#Bloopers (visited on 01/07/2016).
Cox, Russ (2007). Regular Expression Matching Can Be Simple And Fast (but is slow in Java,
Perl, PHP, Python, Ruby, . . .) url: https://swtch.com/~rsc/regexp/regexp1.html
(visited on 06/29/2019).
Davis, Martin (2004). “The Myth of Hypercomputation”. In: Alan Turing: Life and Legacy of
a Great Thinker. Ed. by Christof Teuscher. Springer, pp. 195–211. isbn: 978-3-662-05642-4.
— (2006). “Why there is no such discipline as hypercomputation”. In: Applied Mathematics
and Computation 178, pp. 4–7.
Dershowitz, Nachum and Yuri Gurevich (Sept. 2008). “A Natural Axiomatization of Com-
putability and Proof of Church’s Thesis”. In: Bulletin of Symbolic Logic 14.3, pp. 299–
350.
Edmonds, Jack (1965). “Paths, trees, and flowers”. In: Canadian Journal of Mathematics 17,
pp. 449–467.
Encyclopædia Britannica, The Editors of (2017). Y2K bug. url: https://www.britannica.
com/technology/Y2K-bug (visited on 05/10/2017).
Euler, L (1766). “Solution d’une question curieuse que ne paroit soumise a aucune analyse
(Solution of a curious question which does not seem to have been subjected to any
analysis)”. In: Mémoires de l’Academie Royale des Sciences et Belles Lettres, Année 1759 15.
[Online; accessed 2017-Sep-23, article 309], pp. 310–337. url: http://eulerarchive.
maa.org/.
Firth, Niall (Oct. 14, 2009). “Sir Tim Berners-Lee admits the forward slashes in every
web address ‘were a mistake’”. In: Daily Mail. url: https://www.dailymail.co.uk/sciencetech/article-1220286/Sir-Tim-Berners-Lee-admits-forward-slashes-web-address-mistake.html (visited on 11/29/2018).
Fortnow, Lance and Bill Gasarch (2002). Computational Complexity Blog. [Online; ac-
cessed 2017-Nov-13]. url: http://blog.computationalcomplexity.org/2002/11/
foundations-of-complexitylesson-7.html.
Fraenkel, Aviezri S. and David Lichtenstein (1981). “Computing a Perfect Strategy for n × n
Chess Requires Time Exponential in n”. In: Journal of Combinatorial Theory, Series A,
pp. 199–214.
Gandy, Robin (1980). “Church’s Thesis and Principles for Mechanisms”. In: The Kleene
Symposium. Ed. by J. Barwise, H. J. Keisler, and K Kunen. North-Holland Amsterdam,
pp. 123–148. isbn: 978-0-444-85345-5.
Garey, Michael and David Johnson (1979). Computers and Intractability, A Guide to the
Theory of NP-Completeness. W. H. Freeman.
Gleick, James (Sept. 20, 1992). “Part Showman, All Genius”. In: New York Times Magazine.
url: https://www.nytimes.com/1992/09/20/magazine/part-showman-all-genius.html (visited on 11/27/2020).
Gödel, K. (1964). “What is Cantor’s Continuum Problem?” In: Philosophy of Mathematics:
Selected Readings. Ed. by Paul Benacerraf and Hilary Putnam. Cambridge University
Press, pp. 470–494.
— (1995). “Undecidable diophantine propositions”. In: Collected works Volume III: Unpub-
lished essays and lectures. Ed. by S. Feferman et al. Oxford University Press.
Goodstein, R. L. (Dec. 1947). “Transfinite Ordinals in Recursive Number Theory”. In: Journal
of Symbolic Logic 12.4, pp. 123–129.
Grossman, Lisa (2010). Metric Math Mistake Muffed Mars Meteorology Mission. [Online;
accessed 2017-May-25]. url: https://www.wired.com/2010/11/1110mars-climate-
observer-report/.
Hartmanis, J (2017). Gödel, von Neumann and the P =?NP Problem. url: http://www.cs.
cmu.edu/~15455/hartmanis-on-godel-von-neumann.pdf (visited on 12/25/2017).
Hennie, Fred (1977). Introduction to Computability. Addison-Wesley.
Hilbert, David and Wilhelm Ackermann (1950). Principles of theoretical logic. Trans. by
R E Luce. AMS Chelsea Publishing.
Hodges, Andrew (2016). Alan Turing in the Stanford Encyclopedia of Philosophy. url: http:
//www.turing.org.uk/publications/stanford.html (visited on 04/06/2016).
— (1983). Alan Turing: the enigma. Simon and Schuster. isbn: 0-671-49207-1.
Hofstadter, Douglas R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.
Hopcroft, John E, Rajeev Motwani, and Jeffrey D Ullman (2001). Introduction to Automata
Theory, Languages, and Computation. 2nd ed. Pearson Education. isbn: 0201441241.
Huggett, Nick (2010). Zeno’s Paradoxes — Stanford Encyclopedia of Philosophy. [Online;
accessed 23-Dec-2016]. url: https://plato.stanford.edu/entries/paradox-zeno/#ParMot.
India Today (Apr. 26, 2017). “Srinivasa Ramanujan: The mathematical genius who credited his
3900 formulae to visions from Goddess Mahalakshmi”. In: India Today. url: https://
www.indiatoday.in/education-today/gk-current-affairs/story/srinivasa-
ramanujan-life-story-973662-2017-04-26 (visited on 11/27/2020).
Joel David Hamkins, mathoverflow.net user (2010). Answer to: Infinite CPU clock rate and
hotel Hilbert. url: https://mathoverflow.net/a/22038 (visited on 04/19/2017).
Jones, Neil D. (1997). Computability and Complexity From a Programming Perspective. 1st ed.
MIT Press. isbn: 978-0262100649.
Karp, Richard M (1972). “Reducibility Among Combinatorial Problems”. In: Complexity of
Computer Computations. Ed. by R. E. Miller and J. W. Thatcher. New York: Plenum, pp. 85–103.
Kleene, Stephen (1952). Introduction to Metamathematics. North-Holland Amsterdam.
Klyne, G. and C. Newman (July 2002). Date and Time on the Internet: Timestamps. RFC 3339.
RFC Editor, pp. 1–18. url: https://www.ietf.org/rfc/rfc3339.txt.
Knuth, Donald (May 20, 2014). Twenty Questions for Donald Knuth. url: http://www.
informit.com/articles/article.aspx?p=2213858 (visited on 02/17/2018).
Knuth, Donald E. (Dec. 1964). Backus Normal Form vs. Backus Naur Form. Letter to the
Editor.
Knuutila, Timo (2001). “Redescribing an algorithm by Hopcroft”. In: Theoretical Computer
Science 250, pp. 333–363.
Kragh, Helge (Mar. 27, 2014). The True (?) Story of Hilbert’s Infinite Hotel. url: http://arxiv.org/abs/1403.0059.
Leupold, Jacob, 1674–1727 (1725). “Details of the mechanisms of the Leibniz calculator, the
most advanced of its time”. In: Illustration in: Theatrum arithmetico-geometricum, das
ist . . . [bound with Theatrum machinarium, oder, Schau-Platz der Heb-Zeuge/Jacob
Leupold. Leipzig, 1725]. Leipzig: Zufinden bey dem Autore und Joh. Friedr. Gleditschens
seel. Sohn: Gedruckt bey Christoph Zunkel, 1727. url: https://www.loc.gov/resource/cph.3c10471/ (visited on 11/14/2016).
Levin, Leonid A. (Dec. 7, 2016). Fundamentals of Computing. url: https://www.cs.bu.
edu/fac/lnd/toc/.
Lipton, Richard Jay (Sept. 22, 2009). It’s All Algorithms, Algorithms and Algorithms.
url: https://rjlipton.wordpress.com/2009/09/22/its-all-algorithms-algorithms-and-algorithms/ (visited on 02/17/2018).
Maienschein, Jane (2017). “Epigenesis and Preformationism”. In: The Stanford Encyclopedia
of Philosophy. Ed. by Edward N. Zalta. Spring 2017. Metaphysics Research Lab, Stanford
University.
McCarthy, John (1963). A Basis for a Mathematical Theory of Computation. url: http://www-
formal.stanford.edu/jmc/basis1.pdf (visited on 06/15/2017).
Meyer, Albert R. and Dennis M. Ritchie (1966). Research report: The complexity of loop
programs. Tech. rep. 1817. IBM.
N. J. A. Sloane, editor (2019). The On-Line Encyclopedia of Integer Sequences, A000290. url:
https://oeis.org/A000290 (visited on 03/02/2019).
navyreviewer, YouTube user (2010). Mechanical computer part 1. url: https : / / www .
youtube.com/watch?v=mpkTHyfr0pM (visited on 08/09/2015).
Odifreddi, Piergiorgio (1992). Classical Recursion Theory. Elsevier Science. isbn: 0-444-87295-7.
Piccinini, Gualtiero (2017). “Computation in Physical Systems”. In: The Stanford Encyclopedia
of Philosophy. Ed. by Edward N. Zalta. Summer 2017. Metaphysics Research Lab, Stanford
University.
Pinker, Steven (Sept. 4, 2014). The Trouble With Harvard. url: https://newrepublic.com/
article/119321/harvard-ivy-league-should-judge-students-standardized-
tests (visited on 12/23/2020).
Pour-El, M. B. and I. Richards (1981). “The wave equation with computable initial data such
that its unique solution is not computable”. In: Adv. in Math 39, pp. 215–239.
Pseudonym, cs.stackexchange user (2014). Answer to: What exactly is an algorithm? url:
https://cs.stackexchange.com/a/31953 (visited on 12/27/2018).
Pudlák, Pavel (2013). Logical Foundations of Mathematics and Computational Complexity.
Springer. isbn: 978-3-319-34268-9.
R H Bruck (n.d.). “Computational Aspects of Certain Combinatorial Problems”. In: AMS
Symposium in Applied Mathematics 6, p. 31.
Radó, Tibor (May 1962). “On Non-computable Functions”. In: Bell Systems Technical Journal,
pp. 877–884. url: https://ia601900.us.archive.org/0/items/bstj41-3-877/bstj41-3-877.pdf.
Rendell, Paul (2011). http://rendell-attic.org/gol/tm.htm. url: http://rendell-attic.
org/gol/tm.htm (visited on 07/21/2015).
Rich, Elaine (2008). Automata, Computability, and Complexity. Pearson. isbn: 978-0-13-
228806-4.
Robinson, Raphael (1948). “Recursion and Double Recursion”. In: Bulletin of the American
Mathematical Society 54.10, pp. 987–993.
Rogers Jr., Hartley (Sept. 1958). “Gödel numberings of partial recursive functions”. In:
Journal of Symbolic Logic 23.3, pp. 331–341.
— (1987). Theory of Recursive Functions and Effective Computability. MIT Press. isbn:
0-262-68052-1.
Scott, Brian M. (Feb. 14, 2020). Inverting the Cantor pairing function. Stack Exchange
user http://math.stackexchange.com/users/12042/brian-m-scott. url:
http://math.stackexchange.com/q/222835 (visited on 10/28/2012).
SE author Andrej Bauer (2016). Answer to: Is a Turing Machine “by definition” the most
powerful machine? [Online; accessed 2017-Nov-05]. Stack Overflow discussion board.
url: https://cs.stackexchange.com/a/66753/78536.
— (2018). Answer to: Problems understanding proof of smn theorem using Church-Turing
thesis. [Online; accessed 2020-Feb-13]. Stack Overflow discussion board. url: https://cs.stackexchange.com/a/97946/67754.
SE author babou and various others (2015). Justification for neglecting constants in Big O.
[Online; accessed 2017-Oct-29]. Computer Science Stack Exchange discussion board.
url: https://cs.stackexchange.com/a/41000/78536.
SE author David Richerby (2018). Why is there no permutation in Regexes? (Even if regular
languages seem to be able to do this). [Online; accessed 2020-Jan-01]. Stack Overflow
discussion board. url: https://cs.stackexchange.com/a/100215/67754.
SE author JohnL (2020). How to decide whether a language is decidable when not involving
turing machines? [Online; accessed 2020-Jun-11]. Computer Science Stack Exchange
discussion board. url: https://cs.stackexchange.com/a/127035/67754.
SE author Kaktus and various others (2019). Georg Cantor’s diagonal argument, what
exactly does it prove? [Online; accessed 2019-Dec-25]. Computer Science Stack Exchange
discussion board. url: https://math.stackexchange.com/q/2176304.
SE author templatetypedef (2013). What is pseudopolynomial time? How does it differ from
polynomial time? [Online; accessed 2017-Oct-29]. Stack Overflow discussion board. url:
https://stackoverflow.com/a/19647659.
SE user Ryan Williams (Sept. 2, 2010). Comment to answer for What would it mean to disprove
Church-Turing thesis? url: https://cstheory.stackexchange.com/a/105/4731
(visited on 06/24/2019).
Sipser, Michael (2013). Introduction to the Theory of Computation. 3rd ed. Cengage. isbn:
978-1-133-18779-0.
Sirén, Jouni (CS StackExchange user) (2016). Answer to: What is the origin of λ for empty
string? Accessed 2016-October-20. url: http://cs.stackexchange.com/a/64850/
50343.
Smoryński, Craig (1991). Logical Number Theory I. Springer-Verlag.
Soare, Robert I. (1999). “Computability and Incomputability”. In: Handbook of Computability
Theory. Ed. by E. R. Griffor. North-Holland, Amsterdam, pp. 3–36.
Thompson, Ken (Aug. 1984). “Reflections on trusting trust”. In: Communications of the ACM
27 (8), pp. 761–763.
Thomson, James F. (Oct. 1954). “Tasks and Super-Tasks”. In: Analysis 15.1, pp. 1–13.
Turing, A. M. (1937). “On Computable Numbers, with an Application to the Entschei-
dungsproblem”. In: Proceedings of the London Mathematical Society. 2nd ser. 42, pp. 230–265.
— (1938). “Systems of Logic Based on Ordinals”. PhD thesis. Princeton University.
U.S. Naval Observatory, Time Service Dept. (2017). Leap Seconds. [Online; accessed 10-May-
2017]. url: http://tycho.usno.navy.mil/leapsec.html.
Unknown (1948). UCLA’s 1948 Mechanical Computer. Accessed 2019-September-18. url:
https://vimeo.com/70589461.
Various authors (2017). Theory of Computing Blog Aggregator. [Online; accessed 17-May-2017].
url: http://cstheory-feed.org/.
Viola, Emanuele (Feb. 16, 2018). I believe P=NP. url: https://emanueleviola.wordpress.
com/2018/02/16/i-believe-pnp/ (visited on 02/16/2018).
Wigderson, Avi (2017). Mathematics and Computation. [Draft of a to-be-published book;
accessed 2017-Oct-27]. url: https://www.math.ias.edu/avi/book.
Wikipedia (2016). The Imitation Game — Wikipedia, The Free Encyclopedia. [Online; accessed
28-June-2016]. url: https://en.wikipedia.org/w/index.php?title=The_Imitation_Game&oldid=723336480.
Wikipedia contributors (2014). History of the Church–Turing thesis — Wikipedia, The Free
Encyclopedia. [Online; accessed 2-October-2016]. url: https://en.wikipedia.org/
w/index.php?title=History_of_the_Church%E2%80%93Turing_thesis&oldid=
618643863.
— (2015a). Plonk (Usenet) — Wikipedia, The Free Encyclopedia. [Online; accessed 20-April-
2016]. url: https://en.wikipedia.org/w/index.php?title=Plonk_(Usenet)
&oldid=687617103.
— (2015b). Stigler’s law of eponymy — Wikipedia, The Free Encyclopedia. url: https:
//en.wikipedia.org/w/index.php?title=Stigler%27s_law_of_eponymy&oldid=
691378684 (visited on 02/14/2016).
— (2016a). Age of the Earth — Wikipedia, The Free Encyclopedia. [Online; accessed 13-June-
2016]. url: https://en.wikipedia.org/w/index.php?title=Age_of_the_Earth&
oldid=724796250.
— (2016b). Donald Michie — Wikipedia, The Free Encyclopedia. [Online; accessed 24-March-
2016]. url: https://en.wikipedia.org/w/index.php?title=Donald_Michie&
oldid=708156000 (visited on 03/24/2016).
— (2016c). Nomogram — Wikipedia, The Free Encyclopedia. [Online; accessed 6-October-
2016]. url: https://en.wikipedia.org/w/index.php?title=Nomogram&oldid=
742964268.
— (2016d). Ross–Littlewood paradox — Wikipedia, The Free Encyclopedia. [Online; accessed
9-February-2017]. url: https://en.wikipedia.org/w/index.php?title=Ross%E2%
80%93Littlewood_paradox&oldid=739534216.
— (2016e). Turtles all the way down — Wikipedia, The Free Encyclopedia. [Online; accessed
2016-September-04]. url: https://en.wikipedia.org/w/index.php?title=Turtles_all_the_way_down&oldid=736001775.
Wikipedia contributors (2016f). Zeno’s paradoxes — Wikipedia, The Free Encyclopedia. [Online;
accessed 23-December-2016]. url: https://en.wikipedia.org/w/index.php?title=Zeno%27s_paradoxes&oldid=752685211.
— (2017a). 15 puzzle — Wikipedia, The Free Encyclopedia. [Online; accessed 16-September-
2017]. url: https://en.wikipedia.org/w/index.php?title=15_puzzle&oldid=
789930961.
— (2017b). Almon Brown Strowger — Wikipedia, The Free Encyclopedia. [Online; accessed
9-June-2017]. url: https://en.wikipedia.org/w/index.php?title=Almon_Brown_
Strowger&oldid=783883144.
— (2017c). Artificial neuron — Wikipedia, The Free Encyclopedia. [Online; accessed 21-June-
2017]. url: https://en.wikipedia.org/w/index.php?title=Artificial_neuron&
oldid=780239713.
— (2017d). Aubrey–Maturin series — Wikipedia, The Free Encyclopedia. [Online; accessed
28-March-2017]. url: https://en.wikipedia.org/w/index.php?title=Aubrey%E2%
80%93Maturin_series&oldid=771937634.
— (2017e). Backus–Naur form — Wikipedia, The Free Encyclopedia. [Online; accessed
7-May-2017]. url: https://en.wikipedia.org/w/index.php?title=Backus%E2%
80%93Naur_form&oldid=778354081.
— (2017f). Magic smoke — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-October-
11]. url: https://en.wikipedia.org/w/index.php?title=Magic_smoke&oldid=
785207817.
— (2017g). North American Numbering Plan — Wikipedia, The Free Encyclopedia. [Online;
accessed 9-June-2017]. url: https://en.wikipedia.org/w/index.php?title=
North_American_Numbering_Plan&oldid=780178791.
— (2017h). Ouija — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2017]. url:
https://en.wikipedia.org/w/index.php?title=Ouija&oldid=776109372.
— (2017i). Pax Britannica — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-
2017]. url: https://en.wikipedia.org/w/index.php?title=Pax_Britannica&
oldid=775067301.
— (2017j). Platonic solid — Wikipedia, The Free Encyclopedia. [Online; accessed 2017-
October-22]. url: https://en.wikipedia.org/w/index.php?title=Platonic_
solid&oldid=801264236.
— (2017k). Unicode — Wikipedia, The Free Encyclopedia. url: https://en.wikipedia.
org/w/index.php?title=Unicode&oldid=784443067.
— (2017l). Zermelo’s theorem (game theory) — Wikipedia, The Free Encyclopedia. [Online;
accessed 2017-Nov-26]. url: https://en.wikipedia.org/w/index.php?title=
Zermelo%27s_theorem_(game_theory)&oldid=806070716.
— (2018). Paradox — Wikipedia, The Free Encyclopedia. [Online; accessed 14-December-
2018]. url: https://en.wikipedia.org/w/index.php?title=Paradox&oldid=
871193884.
Wikipedia contributors (2017). Philipp von Jolly — Wikipedia, The Free Encyclopedia. [Online;
accessed 30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=
Philipp_von_Jolly&oldid=764485788.
— (2019a). Collatz conjecture — Wikipedia, The Free Encyclopedia. [Online; accessed
15-February-2019].
— (2019b). Mathematics: The Loss of Certainty — Wikipedia, The Free Encyclopedia. [Online;
accessed 30-January-2019]. url: https://en.wikipedia.org/w/index.php?title=
Mathematics:_The_Loss_of_Certainty&oldid=879406248.
— (2019c). Maxwell’s demon — Wikipedia, The Free Encyclopedia. [Online; accessed 1-
January-2020]. url: https://en.wikipedia.org/w/index.php?title=Maxwell%27s_demon&oldid=930445803.
— (2019d). Partial application — Wikipedia, The Free Encyclopedia. [Online; accessed
26-December-2019].
— (2020a). Foobar — Wikipedia, The Free Encyclopedia. [Online; accessed 2020-Feb-14].
url: https://en.wikipedia.org/w/index.php?title=Foobar&oldid=934819128.
— (2020b). Galactic algorithm — Wikipedia, The Free Encyclopedia. [Online; accessed
2020-Jun-17]. url: https://en.wikipedia.org/w/index.php?title=Galactic_
algorithm&oldid=957279293.
William S. Renwick (May 6, 1949). The start of the EDSAC log. [Online; accessed 2019-Mar-02].
url: https://www.cl.cam.ac.uk/relics/elog.html.
Index
+ operation on a language, 217 picture, 170
15 Game problem, 283, 297 Backus-Naur form, BNF, 170
3 Dimensional Matching problem, 327 Berra, Y
3-SAT, see 3-Satisfiability problem picture, 189
3-Satisfiability problem, 279, 296, 324, 327, Big O, 263
338 Big Θ, 265
4-Satisfiability problem, 337 bijection, 359
bit string, see bitstring
accept a language, 147 bitstring, 354
accept an input, 184, 193, 196 blank, 8, 306
acceptable numbering, 73 blank, B, 5
accepted language, see recognized, 291 BNF, 170–175
accepting state, 13, 179, 306 body of a production, 150
nondeterministic Finite State machine,
Boole, G
192
picture, 278
Pushdown machine, 236
boolean, 279
accepts, 180, 184, 193, 196
expression, 279
Ackermann function, 30–33, 35, 49–53
function, 279
Ackermann, W, 3
variable, 279
picture, 33
bottom, ⊥, 236
action set, 8
BPP, Bounded-Error Probabilistic Polynomial
action symbol, 8
Time problem, 343
addition, 6
bridge, 296
adjacency matrix, 163
Broadcast problem, 281
adjacent, 162
Busy Beaver, 132–135
Adleman, L
button
picture, 345
start, 5
Agrawal, M
picture, 284
c.e. set, see computably enumerable
AKS primality test, 284
caching, 71
algorithm, 290
Cantor’s correspondence, 68–76
definition, 290
Cantor’s Theorem, 78
reliance on model, 290
alphabet, 145, 354 Cantor, G
input, 179 and diagonalization, 372
Pushdown machine, 236 picture, 63
tape, 8 cardinality, 61–83
amb function, 381 less than or equal to, 78
ambiguous grammar, 155 Chromatic Number problem, 278
argument, to a function, 356 Church’s Thesis, 14–21
Aristotle’s Paradox, 61, 63 and uncountability, 79
Assignment problem, 324 argument by, 19
asymptotically equivalent, 266, 274 clarity, 17
atom, 285 consistency, 16
convergence, 16
Backus, J coverage, 15
Extended, 301 K is complete, 114
Church, A computably enumerable set, 107
picture, 14 collection of, RE, 304
Thesis, 15 computation
circuit, 163, 300 distributed, 290
Euler, 163 Finite State machine, 184
gate, 300 nondeterministic Finite State machine,
Hamiltonian, 163 192, 196
wire, 300 relative to an oracle, 112
Circuit Evaluation problem, 300 step, 8
class, 146, 298 Turing machine, 9
complexity, 298 concatenation of languages, 147
Clique problem, 281, 291, 303, 320, 328 concatenation of strings, 354
clique, in a graph, 281 configuration, 8, 184, 192, 195
closed walk, 162 halting, 184, 192, 196
closure under an operation, 214 initial, 8
CNF, Conjunctive Normal Form, 279 conjunctive normal form, 43, 279
co-NP, 310 connected component, 296
Cobham’s Thesis, 269 connected graph, 162
Cobham, A context free
picture, 383 grammar, 151
Thesis, 269 language, 243
codomain, 356 control, of a machine, 5
codomain versus range, 357 converge, 10
Collatz conjecture, 100 Conway, J
coloring of a graph, 164 picture, 46
colors, 278 Cook reducibility, 321
complete, 114 Cook, S
for a class, 326 picture, 325
NP, 326 Cook-Levin theorem, 326
complete graph, 166 correspondence, 62, 359
complexity class, 298 Cantor’s, 70
canonical, 342 countable, 64
P, 299 countably infinite, 64
polytime, 299 Course Scheduling problem, 287
complexity function, 263 CPU of Turing machine, 5
Complexity Zoo, 299, 343 Crossword problem, 283
Composite problem, 284, 291, 296, 311 current symbol, 8
composition, 359 CW, 167
computable cycle, 162
relative to a set, 112 Cyclic Shift problem, 321
set, 107
computable function, 11 daemon, see demon
computable relation, 11 DAG, directed acyclic graph, 162
computable set, 11 dangling else, 155
computably enumerable, 107–111 De Morgan, A
in an oracle, 117 picture, 277
decidable, 291 picture, 383
language, 102 effective, 3
set, 107 effective function, 9–11
decidable language, 304 Electoral College, 283
decide a language, 147 empty language
decided, 13 decision problem, 295
decided language, 184, 291 empty string, ε , 8, 354
of a nondeterministic Turing machine, encrypter, 345
307 Entscheidungsproblem, 3, 14, 291, 333
decider, 11 unsolvability, 100
decides enumerate, 64
language, 304 ε moves, 194
set, 11 ε transitions, 194–197
decides language, 304 equinumerous sets, 63
decision problem, 3, 291 equivalent growth rates, 265
decrypter, 345 Euler Circuit problem, 277, 297
degree of a vertex, 165 Euler circuit, 163
degree sequence, 165 Euler, L
demon, or daemon, 191 picture, 276
derivation, 151, 153 eval, 85
derivation tree, 151 EXP, 340
description number, 73 expansion of a production, 150
determinism, 8, 16 Extended Church’s Thesis, 301
diagonal enumeration, 70 extended regular expression, 244
diagonalization, 76–83, 117 extended transition function, 184
effectivized, 93 for nondeterministic Finite State ma-
digraph, 162 chines, 196
directed acyclic graph, 162 nondeterministic Finite State machine,
directed graph, 162 193
Discrete Logarithm problem, 296
disjunctive normal form, DNF, 43 F-SAT problem, 296
distinguishable states, 226 Factoring problem, 331, 345
distributed computation, 290 Prime Factorization problem, 296
diverge, 10 Fibonacci numbers, 29
Divisor problem, 284, 296 final state, 179
domain, 356, 357 nondeterministic Finite State machine,
Double-SAT problem, 315 192
doubler function, 3, 13 finite set, 64
dovetailing, 107 Finite State automata, see Finite State ma-
Droste effect, 376 chine
Drummer problem, 323 Finite State machine, 178–189
DSPACE, 341 accept string, 184, 196
DTIME, 341 accepting state, 179
alphabet, 179
edge, 161 computation, 184
edge weight, 162 configuration, 184
Edmonds, J final state, 179
halting configuration, 184 left inverse, 359
initial configuration, 184 logarithmic growth, 267
input string, 184 µ recursive function (mu recursive), 35
language of, 184 next-state, 8, 179
minimization, 225–235 one-to-one, 62
next-state function, 179 one-to-one, 358
nondeterministic, 192 onto, 62, 358
powerset construction, 198 order of growth, 263
product, 214 output, 356
reject string, 184 pairing, 69, 70
state, 179 partial, 10, 357
step, 184 partial recursive, 35
transition function, 179 polynomial growth, 267
Fixed point theorem, 117–123 predecessor, 6
discussion, 120–122 projection, 24, 35
Flauros, Duke of Hell range, 357
picture, 191 recursive, 11, 35
flow, 323 reduction, 317
flow chart, 85 restriction, 357
Four Color problem, 278 right inverse, 359
function, 356 successor, 12, 21, 24, 35
91 (McCarthy), 30 surjection, 358
Ackermann, 50 total, 10, 357
argument, 356 transition, 8, 179, 306
Big O, 263 unpairing, 69, 70
Big Θ, 265 value, 356
boolean, 279 well-defined, 356, 357
codomain, 356 zero, 24, 35
composition, 359 function problem, 290
computable, 11 functions, 356–360
computed by a Turing machine, 9 same behavior, 102
converge, 10 same order of growth, 265
correspondence, 62, 359
definition, 356 Galilei, G (Galileo)
diverge, 10 picture, 61
domain, 356 Galileo, 61
doubler, 3, 13 Galileo’s Paradox, 61, 63, 65
effective, 3 Game of Life, 46–49
enumeration, 64 gate, 43, 300
exponential growth, 267 general recursion, 30–37
extended transition, 184 general recursive function, 35
general recursive, 35 general unsolvability, 96–100
identity, 359 Gödel number, 73
image under, 357 Gödel, K, 14
index, 356 letter to von Neumann, 333
injection, 358 picture, 15
inverse, 359 picture with Einstein, 126
Gödel’s theorem, 14 planar, 167, 278
Goldbach’s conjecture, 102 representation, 163–164
grammar, 150–161 simple, 161
ambiguous, 155 spanning subgraph, 280
Backus-Naur form, BNF, 170 subgraph, 163
BNF, Backus-Naur form, 170 trail, 162
body of a production, 150 transition, 7
context free, 151 traversal, 162–163
derivation, 151 tree, 162, 280
expansion of a production, 150 vertex, 161
head, 150 vertex cover, 280
linear, 203 vertex degree, 165
nonterminal, 151 walk, 162
production, 150, 152 walk length, 162
rewrite rule, 150, 152 weighted, 162
right linear, 203 Graph Colorability problem, 278, 297, 319
start symbol, 152 Graph Connectedness problem, 296, 298
syntactic category, 151 Graph Isomorphism problem, 296, 331
terminal, 151 Grassmann, H, 22
graph, 161–170 picture, 21
adjacent edges, 162 guessing by a machine, 191, 194
bridge edge, 296
circuit, 163 Hailstone function, 100
clique, 281 Halt light, 5
closed walk, 162 halting configuration, 184, 192, 196
coloring, 164 Halting problem, 92–94, 102
colors, 278 as a decision problem, 297
complete, 166 discussion, 94–95
connected, 162 in wider culture, 124–126
connected component, 296 reduction to another problem, 99
cycle, 162 significance, 95–96
degree sequence, 165 unsolvability, 94
digraph, 162 halting state, 12
directed, 162 Halts on Three problem, 316
directed acyclic, 162 Hamilton, W R
edge, 161 picture, 275
edge weight, 162 Hamiltonian circuit, 163
Euler circuit, 163 Hamiltonian Circuit problem, 276, 297, 311,
Hamiltonian circuit, 163 322, 328
induced subgraph, 163 Hamiltonian Path problem, 311, 337
isomorphism, 164–165 hard
loop, 162 for a class, 326
matrix representation, 163 NP, 326
multigraph, 162 haystack, 300
node, 161 head
open walk, 162 read/write, 4
path, 162 head of a production, 150
Hilbert’s Hotel, 123–124 picture, 203
Hilbert, D, 3 Kn , 166
picture, 125 Knapsack problem, 282, 315, 328
Hofstadter, D, 377 Knight’s Tour problem, 276
hyperoperation, 32 Knuth, D
picture, 270
I/O head, see read/write head Kolmogorov, A
identity function, 359 picture, 259
Ignorabimus, 125 Königsberg, 277
image under a function, 357
Incompleteness theorem, 14 L’Hôpital’s Rule, 266
Independent Set problem, 288, 298, 322, 324, lambda calculus, λ calculus, 15
338 language, 145–150
index number, 73 + operation, 217
index set, 103 accept, 147
induced subgraph, 163 accepted by a Finite State machine, see
infinite set, 64 language,recognized by a Finite
infinity, 61–83 State machine
initial configuration, 8, 184, 192, 195 accepted by Turing machine, 106, 291
injection, 358 class, 146
input alphabet, 179 concatenation, 147
input string, 184, 192, 195 context free, 243
input symbol, 8 decidable, 102, 304
input, to a function, 356 decide, 147
instruction, 5, 8, 306 decided, 291
Integer Linear Programming problem, 314, 324 decided by a Finite State machine, 184
inverse of a function, 359 decided by a Turing machine, 13, 304
left, 359 decision problem, 291
right, 359 derived from a grammar, 153
two-sided, 359 grammar, 151
isomorphic, 165 Kleene star, 147
isomorphism, 165 non-regular, 219–225
of a Finite State machine, 184
K , the Halting problem set, 93, 109 of a nondeterministic Finite State ma-
complete among c.e. sets, 114 chine, 193
K 0 , set of halting pairs, 101, 110, 113 operations on, 147
Karatsuba, A, 260 power, 147
Karp reducible, 317 recognize, 147
Karp, R recognized, 291
picture, 327 recognized by a Finite State machine,
Kayal, N 184
picture, 284 recognized by Turing machine, 291
Kleene star, 64, 145, 147, 354 regular, 213–219
regular expression, 205 reversal, 147
Kleene’s fixed point theorem, 118 language decision problem, 291
Kleene’s theorem, 206–210 language recognition problem, 291
Kleene, S, 35 last in, first out (LIFO) stack, 236
left inverse, 359 Nearest Neighbor problem, 296, 298
leftmost derivation, 151 next state, 5, 8
LEGO, 5 next tape symbol, 5
length, 162 next-state function, 8, 179
length of a string, 354 nondeterministic Finite State machine,
Life, Game of, 46–49 192
LIFO stack, 236 NFSM, see nondeterministic Finite State ma-
light chine
Halt, 5 node, 161
Linear Divisibility problem, 315 nondeterminism, 189–203
Linear Programming language decision prob- for Finite State machines, 192
lem, 298, 324 for Turing machines, 305
Linear Programming optimization problem, nondeterministic Finite State machine, 192
323 accept string, 193, 196
Lipton’s Thesis, 293 computation, 192, 196
Longest Path problem, 315, 337 configuration, 192, 195
loop, 162 convert to a deterministic machine, 198
LOOP program, 53–58 ε moves, 194
ε transitions, 194
machine halting configuration, 192, 196
state, 9 initial configuration, 192, 195
map, see function input string, 192, 195
Marriage problem, see Drummer problem language of, 193
Matching problem, 335 language recognized, 193
matching, three dimensional, 282 reject string, 193, 196
Max-Flow problem, 323 nondeterministic Pushdown machine, 240–
McCarthy’s 91 function, 30 242
memoization, 71 nondeterministic Turing machine
memory, 4 accepting state, 306
metacharacter, 151, 171, 204 decided language, 307
minimization, 33 definition, 306
minimization of a Finite State machine, 225– instruction, 306
235 recognized language, 307
minimization, unbounded, 35 rejecting state, 306
Minimum Spanning Tree problem, 280 transition function, 306
modulus, 345 nonterminal, 151
Morse code, 167 NP, 305–316
µ -recursion (mu recursion), 33 NP complete, 325–331
µ recursive function, 35 basic problems, 327
multigraph, 162 NP hard, 326
multiset, 282 NSPACE, 341
Musical Chairs, 78 NTIME, 341
numbering, 73
n -distinguishable states, 225 acceptable, 73
n -indistinguishable states, 226
Naur, P Ω, Big Omega, 266
picture, 170 o , omicron, 266
one-to-one function, 62, 358 predecessor function, 6
onto function, 62, 358 prefix of a string, 355
open walk, 162 present state, 5, 8
optimization problem, 290 present tape symbol, 5
oracle, 111–117 Prime Factorization problem, 284, 290, 337
computably enumerable in, 117 primitive recursion, 21–30, 35
oracle Turing machine, 112 arity, 23
order of growth, 259–275 primitive recursive functions, 24
function, 263 private key, 345
Hardy hierarchy, 268 problem, 289
ouroboros, 84 decision, 291
output, from a function, 356 function, 290
Halting, 93, 94
P, 298–305 language decision, 291
P hard, 317 language recognition, 291
P versus NP, 331–336 optimization, 290
pairing function, 69, 70 search, 291
Paley, W unsolvable, 94
picture, 127 problem miscellany, 275–289
palindrome, 14, 146, 240, 355 problem reduction, 325
paradox problems
Aristotle’s, 61 tractable, 269
Galileo’s, 61 unsolvable, 107
Zeno’s, 65 product construction, 214
parameter, 88 production, 152
Parameter theorem, 88 production in a grammar, 150
parametrization, 88–90 program, 290
parametrizing, 88 projection function, 24, 35
parse tree, 151 propositional logic
partial function, 10, 357 atom, 285
partial recursive function, 35 pseudopolynomial, 272, 274
Partition problem, 283, 291, 314, 328, 329, 338 public key, 345
path, 162 Pumping lemma, 220
perfect number, 96 pumping length, 220
Péter, R Pushdown automata, see pushdown machine
picture, 49 Pushdown machine, 235–244
Petersen graph, 166 halting, 237
pipe symbol, | , 150 input alphabet, 236
planar graph, 167, 278 nondeterministic, 240–242
pointer, in C, 121 stack alphabet, 236
polynomial time, 299 transition function, 236, 237
polynomial time reducibility, 317
polytime, 299 Quantum Computing, 301
polytime reduction, 316 Quantum Supremacy, 301
power of a language, 147
power of a string, 355 quine, 128
powerset construction, 198 Quine’s paradox, 376
r.e. set, see computably enumerable set relation, computable, 11
Radó, T replication of a string, 355
picture, 133 representation, of a problem, 293
RAM, see Random Access machine restriction, 357
Random Access machine, 270 reversal of a language, 147
range of a function, 357 reversal of a string, 355
RE, computably enumerable sets, 304 rewrite rule, 150, 152
reachable vertex, 163, 279 Rice’s theorem, 102–107
read/write head, 4 right inverse, 359
recognize a language, 147 right linear, 203
recognized Ritchie, D, 53
by a nondeterministic Finite State ma- picture, 53
chine, 193 Rivest, R
recognized language, 291 picture, 345
of a Finite State machine, 184 RSA Encryption, 344–349
of a nondeterministic Turing machine,
307 s -m-n theorem, 88
recursion, 21–37 same behavior, functions with, 102
Recursion theorem, 118 same order of growth, 265
recursive function, 11, 35 SAT, see Satisfiability, 292, 309
recursive set, 11 Satisfiability problem, 279, 288, 309, 319, 320,
recursively enumerable set, see computably 326, 340
enumerable set as a language recognition problem, 292
reduces to, 112 on a nondeterministic Turing machine,
reducibility 308
Cook, 321 satisfiable, 279
Karp, 317 Saxena, N
polynomial time, 317 picture, 284
polytime, 317 schema of primitive recursion, 23
polytime many-one, 317 Science United, 290
polytime mapping, 317 search problem, 291
reduction self reproducing program, 128
from the Halting problem, 99 self reproduction, 127–131
reduction function, 317 self-reference, 376
Reflections on Trusting Trust, 131 semicomputable set, 107
regex, 244 semidecidable set, 107
regular expression, 203–213 semidecide a language, 147
extended, 244 semiprime, 284
in practice, 244–252 Semiprime problem, 314
operator precedence, 205 set
regex, 244 c.e., 107
semantics, 205 cardinality, 63
syntax, 205 computable, 11, 107
regular language, 213–219 computably enumerable, 107–111
reject an input, 184, 193, 196 countable, 64
rejecting state, 13, 306 countably infinite, 64
rejects, 180 decidable, 107
decider, 11 next, 5
equinumerous, 63 present, 5
finite, 64 reject, 13
index, 103 rejecting, 306
infinite, 64 start, 6
oracle, 111–117 unreachable, 106
r.e., see computably enumerable set working, 12
recursive, 11 state machine, 9
recursively enumerable, see computably states, 4
enumerable set distinguishable, 226
reduces to, 112 n -distinguishable, 225
semicomputable, 107 n -indistinguishable, 226
semidecidable, 107 set of, 8
T equivalent, 113 Sator Square, 386
Turing equivalent, 113 STCON problem, see Vertex-to-Vertex Path
uncountable, 77 problem
undecidable, 94 step, 8, 184
Set Cover problem, 322 store, of a machine, 4
Shamir, A string, 145, 354–355
picture, 345 concatenation, 354
Shannon, C decomposition, 355
picture, 43 empty, 8, 354
Shortest Path problem, 277, 297, 298, 317 length, 354
∼, asymptotically equivalent, 274 power, 355
simple graph, 161 prefix, 355
SPACE, 341 replication, 355
span a graph, 280 reversal, 355
spanning subgraph, 280 substring, 355
st -Connectivity problem, see Vertex-to-Vertex suffix, 355
Path problem string accepted
st -Path problem, see Vertex-to-Vertex Path by deterministic Finite State machine,
problem 180, 184
stack, 235 by nondeterministic Finite State ma-
alphabet, 236 chine, 193, 196
bottom, ⊥, 236 string rejected, 180
LIFO, Last-In, First-Out, 236 String Search problem, 300
pop, 236 subgraph, 163
push, 236 induced, 163
Start button, 5, 180 Subset Sum problem, 282, 291, 298, 315, 338
start state, 6, 179 substring, 355
Pushdown machine, 236 Substring problem, 321
start symbol, 152 successor function, 12, 21, 24, 35
state, 179 suffix of a string, 355
accept, 13 surjection, 358
accepting, 179, 306 symbol, 8, 145, 354
final, 179 action, 8
halting, 12 current, 8
input, 8 control, 5
next, 8 CPU, 5
syntactic category, 151 current symbol, 8
decidable, 291
T equivalent, 113 decides a set, 11
T reducible, 112 deciding a language, 304
table, transition, 7 definition, 8
tape, 4 description number, 73
tape alphabet, 8 deterministic, 8
tape symbol, 8 for addition, 6
blank, 5 function computed, 9
terminal, 151 Gödel number, 73
tetration, 32 index number, 73
Thompson, K input symbol, 8
picture, 131 instruction, 5, 8
Three Dimensional Matching problem, 314 language accepted, 291
Three-dimensional Matching problem, 282 language decided, 13, 291
time taken by a machine, 270 language recognized, 291
token, 145, 354 multitape, 21
total function, 10, 357 next state, 5, 8
Towers of Hanoi, 26 next symbol, 5, 8
tractable, 268–269 next-state function, 8
trail, 162 nondeterminism, 305
transformation function, see reduction func- numbering, 73
tion palindrome, 14
transition function, 8, 179, 306 present state, 5, 8
extended, 184, 196 present symbol, 5
graph of, 7 rejecting state, 13
Pushdown machine, 236 simulator, 37–41
table of, 7 tape alphabet, 8
transition graph, 7 transition function, 8
transition table, 7 universal, 84–86
Traveling Salesman problem, 189, 276, 308, with oracle, 112
322, 324, 328, 337 Turing reducible, 112, 316
tree, 162, 280 Turing, A
Triangle problem, 303 picture, 3
triangular numbers, 25 Turnpike problem, 315
truth table, 41, 279 two-sided inverse, 359
Turing equivalent, 113
Turing machine, 3–14 unbounded minimization, 33
accept a language, 106 unbounded search, 34
accepting a language, 304 uncountable, 77
accepting state, 13 undecidable, 94
action set, 8 Unicode, 181, 380
action symbol, 8 uniformity, 86–87
computation, 9 Universal Turing machine, 84–86
configuration, 8 universality, 83–92
unpairing function, 69, 70
unreachable state, 106
unsolvability, 107
unsolvable problem, 94, 107
use-mention distinction, 120

value, of a function, 356


vertex, 161
reachable, 163, 279
vertex cover, 280
Vertex Cover problem, 280, 288, 322, 327
Vertex-to-Vertex Path problem, 279, 298, 317, 331
von Neumann, J
picture, 46

walk, 162
walk length, 162
weight, 162
weighted graph, 162
well-defined, 356, 357
wire, 300
word, see string
working state, 12

⊢, yields
Finite State machine, 184
for Turing machines, 9
nondeterministic Finite State machine,
192

Zeno’s Paradox, 65
zero function, 24, 35
Zoo, Complexity, 343
