Theoretical Computer Science: An Introduction
Markus Bläser
Universität des Saarlandes
Draft—February 10, 2015 and forever
© Markus Bläser 2007–2015
Part I
Computability
0 Prelude: Sets, relations, words
0.1 Sets
A set is a “collection of objects”. While this is a rather naive definition that
has some pitfalls, it is sufficient for our needs here. Axiomatic set theory is
beyond the scope of this lecture. A set A₁ that contains the numbers 2, 3, and 5 is denoted by A₁ = {2, 3, 5}; 2, 3, and 5 are the elements of A₁. We write 2 ∈ A₁ to indicate that 2 is an element of A₁ and 4 ∉ A₁ to indicate
that 4 is not. If a set A contains only a finite number of elements, then we
call it finite and the number of elements is the size of A, which is denoted
by |A|.
Infinite sets cannot be denoted like this; we denote them, for instance, as
{0, 1, 2, 3, . . . } and hope for your intuition to fill in the remaining numbers.
The last set, as you already guessed, is the set of natural numbers. We will
also use the symbol N for it. Note that 0 ∈ N.
A set B is a subset of A if every element of B is also an element of A.
In this case, we write B ⊆ A. There is one distinguished set that contains
no elements, the empty set; we denote it by ∅. The empty set is a subset of
every other set.
There are various operations on sets. The union A ∪ B of two sets A and
B is the set that contains all elements that are in A or in B. The intersection
A ∩ B is the set that contains all elements that are simultaneously in A and
in B. For instance, the union of {2, 3, 5} and {2, 4} is {2, 3, 4, 5}, their
intersection is {2}. The cartesian product A × B is the set of ordered tuples
{(a, b) | a ∈ A, b ∈ B}. The cartesian product of {2, 3, 5} and {2, 4} consists
of the six tuples {(2, 2), (2, 4), (3, 2), (3, 4), (5, 2), (5, 4)}.
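These operations can be checked quickly in Python, whose set type mirrors the notation (a small illustration, not part of the lecture):

A, B = {2, 3, 5}, {2, 4}
print(A | B)                             # union: {2, 3, 4, 5}
print(A & B)                             # intersection: {2}
print({(a, b) for a in A for b in B})    # cartesian product: the six tuples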
0.2 Relations
Let A and B be two sets. A (binary) relation on A and B is a subset
R ⊆ A × B. If (a, b) ∈ R, then we will say that a and b stand in relation R.
Instead of (a, b) ∈ R, we will sometimes write aRb. While this looks weird when the relation is called R, let us have a look at the following example.
Example 0.1 R1 = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)} ⊆ {1, 2, 3} ×
{1, 2, 3} is a relation on {1, 2, 3}.
Exercise 0.1 Show that if R is an order, then R is acyclic, i.e., there does not exist a sequence a₁, . . . , aᵢ of pairwise distinct elements such that
Example 0.2 R3 = {(1, 1), (2, 2), (3, 3)} is the equality relation on {1, 2, 3}.
0.3 Functions
A relation f on A and B is called a partial function if for all a ∈ A there is at most one b ∈ B such that (a, b) ∈ f. If such a b exists, we also write f(a) = b. If such a b does not exist, we say that f(a) is undefined. Occasionally, we write f(a) = undefined, although this is not quite correct, since it suggests that "undefined" is an element of B. dom f denotes the set of all a such that f(a) is defined. f is called total if dom f = A.
A total function f is called injective if for all a, a′ ∈ A, f(a) = f(a′) implies a = a′. The image im f of a function is the set {f(a) | a ∈ A}. f is called surjective if im f = B.
A total function f is called bijective if it is injective and surjective. In this case, for every b ∈ B there is exactly one a ∈ A such that f(a) = b. Hence we can define the inverse f⁻¹ of f, which is a function B → A: f⁻¹(b) = a if f(a) = b.
0.4 Words
Let Σ be a finite nonempty set. In the context of words, Σ is usually called
an alphabet. The elements of Σ are called symbols or letters. A (finite)
word w over Σ is a finite sequence of elements from Σ, i.e., it is a function w : {1, . . . , ℓ} → Σ for some ℓ ∈ N. ℓ is called the length of w. The length of w will also be denoted by |w|. There is one distinguished word of length 0, the empty word. We will usually denote the empty word by ε. Formally, it is the function ∅ → Σ.
The concatenation wx of a word w of length ℓ and a word x of length k is the word of length ℓ + k given by
wx : {1, . . . , ℓ + k} → Σ,
i ↦ w(i) if 1 ≤ i ≤ ℓ, and i ↦ x(i − ℓ) if ℓ + 1 ≤ i ≤ ℓ + k.
Let R be a total order on Σ. For words u = u₁u₂ . . . uₙ and v = v₁v₂ . . . vₙ over Σ, define u ≤^R_lex v if uᵢ R vᵢ, where i = min{1 ≤ j ≤ n | uⱼ ≠ vⱼ}, or if no such i exists.
Show that ≤^R_lex is indeed a total order.
1 Introduction
During the last year you learnt what a computer can do: how to model a
problem, how to develop an algorithm, how to implement it and how to test
your program. In this lecture, you will learn what a computer cannot do.
Not just because you are missing some particular software or you are using
the wrong operating system; we will reason about problems that a computer
cannot solve no matter what.
One such task is verification: Our input is a computer program P and
an input/output specification S.1 S describes the desired input/output be-
haviour. We shall decide whether the program fulfills the specification or
not. Can this task be automated? That means, is there a computer pro-
gram V that given P and S returns 1, if P fulfills the specification and 0
otherwise? V should do this correctly for all possible pairs P and S.
Let us consider a (seemingly) easier task. Given a program P, decide whether P returns the value 0 on all inputs or not. That means, is there a program Z that, given P as an input, returns 1 if P returns the value 0 on all inputs and returns 0 if there is an input on which P does not return 0? How hard is this task? Does such a program Z exist? The following program indicates that this task is very hard (and one of the goals of the first part of this lecture is to show that it is impossible in this general setting).²
Program 1 expects four natural numbers as inputs that are initially stored in the variables x₀, . . . , x₃. We do not specify its semantics formally at this point, but I am sure that you understand what Program 1 does. This program returns 1 on some input if and only if there are natural numbers x₀ > 2 and x₁, x₂, x₃ ≥ 1 such that x₁^{x₀} + x₂^{x₀} = x₃^{x₀}. The famous Fermat's last theorem states that such four numbers do not exist. It took almost four hundred years until a valid proof of this conjecture was given.
Program 1 Fermat
Input: x0 , . . . , x3
1: if x0 ≤ 2 then
2: return 0
3: fi
4: if x1 = 0 or x2 = 0 or x3 = 0 then
5: return 0
6: fi
7: if x₁^{x₀} + x₂^{x₀} = x₃^{x₀} then
8: return 1
9: else
10: return 0
11: fi
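For comparison, here is Program 1 transcribed into Python (a direct transliteration; the pseudocode above remains the official formulation). A program Z as above, applied to this function, would in effect have to decide Fermat's last theorem:

def fermat(x0, x1, x2, x3):
    if x0 <= 2:
        return 0
    if x1 == 0 or x2 == 0 or x3 == 0:
        return 0
    # returns 1 iff (x1, x2, x3) is a counterexample for exponent x0
    return 1 if x1**x0 + x2**x0 == x3**x0 else 0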
²Fermat's last theorem states that the equation xⁿ + yⁿ = zⁿ has no solution in natural numbers for n ≥ 3 and x, y, z > 0. The case n = 1 is of course trivial and for n = 2, the equation is fulfilled by all Pythagorean triples. Although it was always called Fermat's last theorem, it was an unproven conjecture written by Fermat in the margin of his copy of the Ancient Greek text Arithmetica by Diophantus. This note was
discovered posthumously. The last part of this note became famous: “[...] cuius
rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet”
(I have discovered a truly marvelous proof of this proposition. This margin is too
narrow to contain it.) Fermat’s last theorem was finally proven in 1995 by Andrew
Wiles with the help of Richard Taylor.
2 WHILE and FOR programs
2.1 Syntax
Let us start with defining the syntax of WHILE programs. WHILE programs
are strings over some alphabet. This alphabet contains
variables: x0 , x1 , x2 , . . .
constants: 0, 1, 2, . . .
key words: while, do, od
other symbols: :=, ≠, ;, +, −
Note that every variable is a symbol on its own and so is every constant.
Also ":=" is treated as one symbol; we just write it like this in reminiscence of certain programming languages. (No programming language discrimination is intended. Please do not send any complaint emails like last year.)
A simple statement has one of the forms
xᵢ := xⱼ + xₖ or
xᵢ := xⱼ − xₖ or
xᵢ := c,
where i, j, k ∈ N and c ∈ N. Every simple statement is a WHILE program, and if P₁ and P₂ are WHILE programs, then so are
(a) while xᵢ ≠ 0 do P₁ od and
(b) P₁; P₂.
We call the set of all WHILE programs W. We give this set a little more
structure. The set of all WHILE programs that consist of only one simple
statement is called W₀. We inductively define the sets Wₙ as follows: Wₙ consists of all programs in Wₙ₋₁, all programs while xᵢ ≠ 0 do P₁ od with P₁ ∈ Wₙ₋₁, and all programs P₁; P₂ with P₁, P₂ ∈ Wₙ₋₁.
FOR programs are built in the same way. Their simple statements are again
xᵢ := xⱼ + xₖ or
xᵢ := xⱼ − xₖ or
xᵢ := c,
where i, j, k ∈ N and c ∈ N, and if P₁ and P₂ are FOR programs, then so are
(a) for xᵢ do P₁ od and
(b) P₁; P₂.
2.2 Semantics
A program P gets a number of inputs α0 , . . . , αs−1 ∈ N. The input is stored
in the variables x0 , . . . , xs−1 . The output of P is the content of x0 after the
execution of the program. The set X = {x₀, x₁, x₂, . . . } of possible variables is infinite, but each WHILE or FOR program P uses only a finite number of variables. Let ℓ = ℓ(P) denote the largest index of a variable in P. We always assume that ℓ ≥ s − 1. A state is a vector S ∈ N^{ℓ+1}. It describes the
is such a program.1 The while loop never terminates and there is no state
that can be reached since there is no “after the execution of the program”.
¹The assignment within the loop is necessary because the empty program is not a valid WHILE program. There is no particular reason for this; we just defined it like this.
does not terminate, or the ith position in Φ^{(r)}_{P₁}(S) equals 0.² Then
Φ_P(S) = Φ^{(r)}_{P₁}(S)   if r exists and Φ^{(r)}_{P₁}(S) is defined,
Φ_P(S) = undefined          otherwise.
(b) If P is P₁; P₂ for WHILE programs P₁ and P₂, then
Φ_P(S) = Φ_{P₂}(Φ_{P₁}(S))   if Φ_{P₁}(S) and Φ_{P₂}(Φ_{P₁}(S)) are both defined,
Φ_P(S) = undefined            otherwise.
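The definition of Φ translates almost verbatim into an interpreter. Here is a minimal sketch in Python; the nested-tuple encoding of programs is our own convention for this sketch, and subtraction is cut off at 0 since the variables hold natural numbers:

# ("add", i, j, k)  stands for  x_i := x_j + x_k
# ("sub", i, j, k)  stands for  x_i := x_j - x_k
# ("const", i, c)   stands for  x_i := c
# ("while", i, P1)  stands for  while x_i != 0 do P1 od
# ("seq", P1, P2)   stands for  P1; P2

def run(P, state):
    """Computes Phi_P(state) in place; loops forever where Phi_P is undefined."""
    op = P[0]
    if op == "add":
        _, i, j, k = P
        state[i] = state[j] + state[k]
    elif op == "sub":
        _, i, j, k = P
        state[i] = max(state[j] - state[k], 0)   # modified subtraction
    elif op == "const":
        _, i, c = P
        state[i] = c
    elif op == "while":
        _, i, P1 = P
        while state[i] != 0:
            run(P1, state)
    elif op == "seq":
        run(P[1], state)
        run(P[2], state)
    return state

print(run(("add", 0, 1, 2), [0, 3, 4])[0])   # x0 := x1 + x2, prints 7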
whenever all arising terms are defined. If one of them is not defined, then both Φ_{P₂}(Φ_{P₁}(S)) and Φ_{P₂′}(Φ_{P₁′}(S)) are undefined, which is exactly what we want. In the equation above, we used the induction hypothesis twice.
Exercise 2.3 Show that every FOR loop can be simulated by a WHILE loop.
(Simulation here means that for every FOR program P of the form for xi
do P1 od we can find a WHILE program Q such that ΦP = ΦQ , i.e., both
programs compute the same function.)
3 Syntactic sugar
The languages WHILE and FOR consist of five constructs. We now want to
convince ourselves that we are still able to compute every function Ns → N
that JAVA or C++ could compute. (And no, we are not able to display
fancy graphics.)
3.2 Assignments
WHILE does not contain assignments of the form xi := xj . But this can be
easily simulated by
1: xk := 0;
2: xi := xj + xk
where xk is a new variable.
Exercise 3.1 The evaluation strategy above is call by value. How do you
implement call by reference?
3.5 Arrays
Built-in types like int in C++ or JAVA can only store a limited amount of information. But we can get as many variables of this type as we want by using dynamic arrays (or malloc, if you prefer that). WHILE programs only
have a finite number of variables but each of them can store an arbitrarily
large amount of information. In this way, we can simulate arrays.
Lemma 3.1 There are FOR computable functions ⟨·, ·⟩ : N² → N and πᵢ : N → N, i = 1, 2, such that
πᵢ(⟨x₁, x₂⟩) = xᵢ for i = 1, 2.
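One standard choice that satisfies the lemma is the Cantor pairing function; the lecture's own definition may differ, but the properties stated in Lemma 3.1 are the same. A Python sketch:

def pair(x1, x2):
    # enumerate N x N along the diagonals x1 + x2 = const
    return (x1 + x2) * (x1 + x2 + 1) // 2 + x2

def unpair(z):
    w = 0
    while (w + 1) * (w + 2) // 2 <= z:   # recover the diagonal w = x1 + x2
        w += 1
    x2 = z - w * (w + 1) // 2
    return w - x2, x2

def pi1(z): return unpair(z)[0]
def pi2(z): return unpair(z)[1]

assert pi1(pair(17, 4)) == 17 and pi2(pair(17, 4)) == 4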
While you can forget about the rest of the construction of arrays once you believe that we can simulate them, this pairing function is essential for later chapters and you should not forget about its properties.
can convert it into a true WHILE/FOR program. The idea of the algorithm above is simple. If we want to extract aᵢ for i > 0, then we will get it via
π₁ ∘ π₂ ∘ · · · ∘ π₂(A),
where π₂ is applied n − i − 1 times. The element a₀ we get via
π₂ ∘ · · · ∘ π₂(A),
where π₂ is applied n − 1 times.
In the first for loop, we find the position of the element that we want
to change as we did before. In addition, we also store the elements that we re-
move from x₀ in x₁. Assume that, for instance, x₀ equals ⟨aᵢ, ⟨aᵢ₋₁, . . . , ⟨a₂, ⟨a₁, a₀⟩⟩ . . .⟩⟩ and x₁ equals ⟨aᵢ₊₁, ⟨aᵢ₊₂, . . . , ⟨aₙ₋₂, ⟨aₙ₋₁, 0⟩⟩ . . .⟩⟩. We replace aᵢ by b. If
i = 0, then ai is the remaining element in x0 and we can just overwrite it
with b. If i > 0, then we throw ai away by computing π2 (x0 ) and replace it
by b via the pairing function. Note the 0 appended to the end of x1 . This
makes a case distinction redundant when reconstructing A in the second for
loop; we can always extract the next element from x1 by applying π1 . (One
could also make this convention for A.) In this second loop, we insert the
elements that were removed from x0 back into x0 . We extract them one by
one from x1 and add them to x0 by using the pairing function.
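The whole array simulation fits in a few lines of Python, using pair/pi1/pi2 from the sketch in the previous section (again an illustration of the idea, not the literal WHILE code):

def encode_array(elems):
    # <a_{n-1}, <a_{n-2}, ..., <a_1, a_0>...>>, as in the text
    A = elems[0]
    for a in elems[1:]:
        A = pair(a, A)
    return A

def get(A, n, i):
    for _ in range(n - 1 - i):            # apply pi2 (n - 1 - i) times
        A = pi2(A)
    return A if i == 0 else pi1(A)

def set_elem(A, n, i, b):
    removed = []
    for _ in range(n - 1 - i):            # peel off a_{n-1}, ..., a_{i+1}
        removed.append(pi1(A))
        A = pi2(A)
    A = b if i == 0 else pair(b, pi2(A))  # replace a_i by b
    for a in reversed(removed):           # rebuild the outer part
        A = pair(a, A)
    return A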
Exercise 3.5 A stack is a data structure that stores some objects, here our
objects will be natural numbers. We can either push a number onto the stack.
This operation stores the number in the stack. Or we can pop an element
from the stack. This removes the element from the stack that was the last to
be pushed onto the stack among all elements still in the stack. If the stack is
empty and we want to pop an element from the stack, this will result in an
error. So it works like a stack of plates where you can only either remove the
top plate or put another plate on the top.2 There is usually also a function
isempty that allows you to check whether a stack is empty or not.
1. How do you store a stack of natural numbers in one natural number?
Exercise 3.7 The following two statements are also useful. Explain how to
simulate them in simple WHILE.
1. Input: v1 , . . . , vs declares v1 , . . . , vs as the input variables.
Nᵏ → N
(x₁, . . . , xₖ) ↦ p₁^{x₁} · p₂^{x₂} · · · pₖ^{xₖ},
where pᵢ denotes the ith prime number, is an injective mapping
A Ackermann function
Chapters that are numbered with Latin characters instead of numbers are for your personal entertainment only. They are not an official part of the lecture and, in particular, not relevant for any exams. But reading them does not hurt either . . .
It is clear that there are functions that are WHILE computable but not
FOR computable, since FOR programs can only compute total functions but
WHILE programs can compute partial ones. Are there total functions that
are WHILE computable but not FOR computable? That is, are WHILE loops
more powerful than FOR loops? The answer is affirmative.
A.1 Definition
The Ackermann function is a WHILE computable but not FOR computable total function, which was first published in 1928 by Wilhelm Ackermann, a student of David Hilbert. The so-called Ackermann–Péter function, which was defined later (1955) by Rózsa Péter and Raphael Robinson, has only two arguments (instead of three).
The Ackermann function is the simplest example of a well defined total
function that is WHILE computable but not FOR computable, providing
a counterexample to the belief in the early 1900s that every WHILE com-
putable function was also FOR computable. (At that time, the two concepts
were called recursive and primitive recursive.) It grows faster than an expo-
nential function, or even a multiple exponential function. In fact, it grows
faster than most people (including me) can even imagine.
The Ackermann function is defined recursively for non-negative integers
x, y by
a(0, y) = y + 1
a(x, 0) = a(x − 1, 1) for x > 0
a(x, y) = a(x − 1, a(x, y − 1)) for x, y > 0
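For small arguments, the definition can be evaluated directly; a Python sketch (memoized, and only usable for tiny x, since the values explode immediately):

from functools import lru_cache

@lru_cache(maxsize=None)
def a(x, y):
    if x == 0:
        return y + 1
    if y == 0:
        return a(x - 1, 1)
    return a(x - 1, a(x, y - 1))

print([a(1, y) for y in range(5)])   # y + 2
print([a(2, y) for y in range(5)])   # 2y + 3
print([a(3, y) for y in range(5)])   # 2^(y+3) - 3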
For x = 2, we have
a(3, y) = 2 · a(3, y − 1) + 3
= 2² · a(3, y − 2) + 3 · (1 + 2)
= 2² · (2 · a(3, y − 3) + 3) + 3 · (1 + 2)
= 2³ · a(3, y − 3) + 3 · (1 + 2 + 2²)
⋮
= 2ʸ · a(3, 0) + 3 · Σ_{i=0}^{y−1} 2ⁱ
= 2ʸ · a(3, 0) + 3 · (2ʸ − 1)
= 2ʸ · a(2, 1) + 3 · 2ʸ − 3
= 2ʸ · 5 + 3 · 2ʸ − 3
= 2^{y+3} − 3.
Exercise A.1 Show that a(4, y) = 2^{2^{·^{·^{2}}}} − 3, where the tower consists of y + 3 twos.
Lemma A.3 The function value is strictly greater than its second argument, i.e., y < a(x, y) for all x, y ∈ N.
Proof. The proof is by induction on x. Induction base:
a(0, y) = y + 1 > y.
where the first inequality follows from the induction hypothesis for x. The
induction step for the inner induction is shown as follows:
Proof. Using Lemma A.4 first and then Lemma A.5, we obtain
In other words, fP (n) bounds the sum of the values of x0 , . . . , x` after the
execution of P in terms of the sum of the values of x0 , . . . , x` before the
execution of P .
If P is xᵢ := 0, then the size of the output cannot be larger than the size of the input. Hence
f_P(n) ≤ n < a(0, n),
Proof. Assume that a were FOR computable. Then â(k) = a(k, k) would be FOR computable as well. Let P be a FOR program for â. Lemma A.7 tells us that there is a k such that
4 Gödel numberings
We will deal with two fundamental questions that we will use in the following
chapters. The first one is: How many WHILE programs are there? And the
second one is: How can we feed a WHILE program into another WHILE
program as an input?
There are certainly infinitely many WHILE programs: x0 := 0 is one,
x0 := 0; x0 := 0 is another, x0 := 0; x0 := 0; x0 := 0 is a third one; I guess
you are getting the idea. But there are different “sorts of infinite”.
Recall that the pairing function ⟨·, ·⟩ is a bijection from N × N to N. Thus N × N is countable, too.
We will show that the set of all WHILE programs is countably infinite.
By assigning each WHILE program a natural number in a unique way, we
can feed them into other WHILE programs, too. For just proving that the
set of all WHILE programs is countable, we can use any injective function.
But for the purpose of feeding WHILE programs into WHILE programs, this
function should also have some nice properties.
We will construct an injective function göd : W → N, that is, different
WHILE programs get different numbers. But this is not enough, we also
need the following two properties:
¹Recall that "injective" includes that the function is total.
Proof. We show by induction on n that göd(P) = göd(Q) implies P = Q for all P, Q ∈ Wₙ. From this, the assertion of the lemma follows.
Induction base: The statement is clear for all programs in W₀.
Induction step: Now assume that göd(P) = göd(Q) and assume that P ∈ Wₙ \ Wₙ₋₁ for some n > 0 and Q ∈ Wₙ. Since n > 0, göd(P) is either ⟨3, ⟨i, göd(P₁)⟩⟩ or ⟨4, ⟨göd(P₁), göd(P₂)⟩⟩ for some programs P₁, P₂ ∈ Wₙ₋₁. We only treat the case göd(P) = ⟨3, ⟨i, göd(P₁)⟩⟩; the other case is an exercise. göd(P) = göd(Q) in particular implies that π₁(göd(P)) = π₁(göd(Q)). This shows that göd(Q) = ⟨3, ⟨j, göd(Q₁)⟩⟩. But also π₂(göd(P)) = π₂(göd(Q)), i.e., ⟨i, göd(P₁)⟩ = ⟨j, göd(Q₁)⟩. But since ⟨·, ·⟩ is a bijection, this means that i = j and göd(P₁) = göd(Q₁). By the induction hypothesis, P₁ = Q₁. Thus P = Q.
Exercise 4.3 Fill in the missing part in the proof of Lemma 4.3.
The mapping göd associates with every i ∈ G a function ϕ_{göd⁻¹(i)}. Instead of ϕ_{göd⁻¹(i)} we write ϕᵢ. If we associate with every i ∉ G a fixed function ϕᵢ, say the function that is undefined everywhere, we get an infinite sequence of functions (ϕᵢ)_{i∈N}.
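To make the scheme concrete: for the tuple encoding of WHILE programs used in the interpreter sketch of Chapter 2 and the Cantor pairing from Chapter 3, göd can be written as follows (the tags 0–4 match the encodings used in this part of the lecture; the lecture's actual pairing function may differ):

def goedel(P):
    op = P[0]
    if op == "add":
        return pair(0, pair(P[1], pair(P[2], P[3])))
    if op == "sub":
        return pair(1, pair(P[1], pair(P[2], P[3])))
    if op == "const":
        return pair(2, pair(P[1], P[2]))
    if op == "while":
        return pair(3, pair(P[1], goedel(P[2])))
    if op == "seq":
        return pair(4, pair(goedel(P[1]), goedel(P[2])))

Injectivity of goedel is exactly the content of the lemma above.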
5 Diagonalization
Theorem 5.1 The set of all total functions N → {0, 1} is not countable.
Proof overview: The proof will use a technique that is called Cantor’s
diagonal argument. We assume that the set of all total functions N → {0, 1}, call it F, is countable. Then there is a bijection n between F and N, i.e., each function f ∈ F gets a "number" n(f). We construct a total function c : N → {0, 1} that differs from every f ∈ F on the input n(f). On the one hand, c ∈ F by construction; on the other hand, c differs from every f ∈ F on some input, so c cannot be in F. This is a contradiction.
        0       1       2       3     ···
0     f₀(0)   f₀(1)   f₀(2)   f₀(3)   ···
1     f₁(0)   f₁(1)   f₁(2)   f₁(3)   ···
2     f₂(0)   f₂(1)   f₂(2)   f₂(3)   ···
3     f₃(0)   f₃(1)   f₃(2)   f₃(3)   ···
⋮       ⋮       ⋮       ⋮       ⋮     ⋱

Figure 5.1: The diagonalization scheme.
c differs from the fᵢ's in the diagonal entries fᵢ(i) of the table in Figure 5.1. Clearly c ∈ F. But this means that there is an index i₀ such that fᵢ₀ = c, since {fᵢ | i ∈ N} = F. In particular,
fᵢ₀(i₀) = c(i₀) = 1 − fᵢ₀(i₀).
But this is a contradiction since fᵢ₀(i₀) is a natural number and the equation x = 1 − x has no integral solution.
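The construction can be played out in miniature: for any list of total 0/1-valued functions, the diagonal function differs from the ith one at input i. A small Python sketch:

fs = [lambda n: 0, lambda n: n % 2, lambda n: 1]

def c(i):
    return 1 - fs[i](i)          # flip the diagonal entry

print([f(i) for i, f in enumerate(fs)])   # diagonal entries: [0, 1, 1]
print([c(i) for i in range(3)])           # c differs everywhere: [1, 0, 0]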
Overview of the alternative proof of Corollary 5.2: We will use the same diagonalization scheme as in Figure 5.1. The construction becomes explicit, since we do not use a hypothetical enumeration of all characteristic functions but the enumeration of all WHILE programs that we already constructed.
6 A universal WHILE program
In this chapter, we will construct the WHILE programs C and U for our
function göd. Assume that we are given an index g ∈ im göd, i.e., a valid
encoding g of a WHILE program P and an m ∈ N. U now has to simulate
P on input m with only a fixed number of variables. The program P has a
fixed number of variables, too, but since U has to be capable of simulating
every WHILE program, there is no a priori bound on the number of variables
in P . Thus U will use an array X to store the values of the variables of P .
Luckily, we already know how to simulate arrays in WHILE (and even FOR). Let ℓ be the largest index of a variable that occurs in P. Then an array of length ℓ + 1 is sufficient to store all the values. It is not too hard to extract this number ℓ given g. But since any upper bound on ℓ is fine too, we just use an array of length g in U. g is an upper bound on ℓ because of Exercise 6.1 (and the way we constructed göd).
A simple statement is encoded as ⟨0, ⟨i, ⟨j, k⟩⟩⟩ (addition), ⟨1, ⟨i, ⟨j, k⟩⟩⟩ (subtraction), or ⟨2, ⟨i, c⟩⟩ (initialization with a constant). Using π₁, we can project onto the first component of these nested pairs and find out whether the statement is an addition, subtraction, or initialization with a constant. The result that we get by application of π₂ then gives us the information about the variables and/or constants involved. Program 6 shows how to perform the addition. X stores the array that we use for the simulation. When we plug this routine into U, we might have to rename variables.
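In Python terms, the dispatch that U performs on the code of a simple statement looks as follows; exec_simple is a hypothetical helper name, and pi1/pi2 are the projections from Chapter 3 (X plays the role of the array simulating the variables of P):

def exec_simple(code, X):
    tag, rest = pi1(code), pi2(code)
    i = pi1(rest)
    if tag == 0:                                   # x_i := x_j + x_k
        j, k = pi1(pi2(rest)), pi2(pi2(rest))
        X[i] = X[j] + X[k]
    elif tag == 1:                                 # x_i := x_j - x_k
        j, k = pi1(pi2(rest)), pi2(pi2(rest))
        X[i] = max(X[j] - X[k], 0)
    elif tag == 2:                                 # x_i := c
        X[i] = pi2(rest)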
Exercise 6.2 Write the corresponding programs for subtraction and initial-
ization with a constant.
Exercise 6.4 Modify the program U in such a way that we get the program
C that checks whether a given g is in im göd. Can you achieve that C is a
FOR program?
7 The halting problem
Here we want to know whether a WHILE program halts on its own Gödel
number. While this is not as natural as the regular halting problem, it is a
little easier to prove that it is not decidable. In the next chapter, we formally
show that it is indeed a special case of the halting problem and develop a
general method to show that problems are not decidable.
Further reading:
– The Turing Archive. www.turingarchive.org
– Andrew Hodges. Alan Turing: The Enigma, Walker Publishing Company, 2000.
Remark 7.3 In condition 1.(b) of the definition above, we can always assume that ϕ_P(x) is undefined. We can modify P in such a way that whenever it would return 0, it enters an infinite loop instead. Thus on x ∈ L, P halts (and outputs 1); on x ∉ L, P does not halt.
Theorem 7.4 The halting problem and the special halting problem are re-
cursively enumerable.
Proof. Let ⟨g, m⟩ be the given input. First we use the WHILE program C to check whether g ∈ im göd. If not, then we enter an infinite loop. If yes, then we simulate g on m using the universal WHILE program U. It is easy to see that this program terminates if and only if g encodes a WHILE program and göd⁻¹(g) halts on m. If it terminates, then we return 1.
Remark 7.5 The set that corresponds to the characteristic function c constructed in the alternative proof of Corollary 5.2 is not recursively enumerable, since we diagonalized against all WHILE programs, not only against those that compute total functions.
The following two statements are equivalent for every language L:
1. L ∈ REC.
2. L, L̄ ∈ RE.
Proof. For the "⇒" direction, note that L ∈ REC implies L̄ ∈ REC and that REC ⊆ RE.
For the other direction, note that there are WHILE programs P and P̄ that halt exactly on the m ∈ L and the m ∉ L, respectively. So either P or P̄ halts on a given m. The problem is that we do not know which. If we run P first, then it might not halt on m ∈ L̄ and we never get a chance to run P̄ on m.
The trick is to run P and P̄ in parallel. To achieve this, we modify
our universal WHILE program U . In the while loop of U , we will simulate
one step of P and one step of P̄ . (We need two stacks S1 , S2 to do this,
two instances cur 1 , cur 2 of the variable cur , etc.) Eventually, one of the
programs P or P̄ will halt. Then we know whether m ∈ L or not.
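The dovetailing in this proof can be sketched in Python; step_P and step_Pbar are assumed helpers (e.g., built from a clocked universal program) that report whether the respective program has halted within t steps:

from itertools import count

def decide(m, step_P, step_Pbar):
    for t in count():            # run both simulations "in parallel"
        if step_P(m, t):
            return 1             # m in L
        if step_Pbar(m, t):
            return 0             # m in the complement of L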
Exercise 7.1 Show that the following three statements are equivalent:1
1. L ∈ RE.
¹This explains the name recursively enumerable: there is a WHILE computable function, here ϕ_P, that enumerates L; that means, if we compute ϕ_P(0), ϕ_P(1), ϕ_P(2), . . . , we eventually enumerate all elements of L.
8 Reductions
Let us come back to the verification problem: Does a given program match a certain specification? One very general approach to model this is the following: Given two encodings i, j ∈ im göd, do the WHILE programs göd⁻¹(i) and göd⁻¹(j) compute the same function, i.e., is ϕ_{göd⁻¹(i)} = ϕ_{göd⁻¹(j)}? The index i is the program that we want to verify, the index j is the specification that it has to match. So let
One can of course complain that a WHILE program is a very powerful specification. So we will also investigate the following (somewhat artificial but undeniably simple) special case:
Example 8.2 If P were a program for H (we already know that it does not exist), then the following P₀ would be a program for H₀:
1: x₀ := ⟨x₀, x₀⟩;
2: P
The first line prepares the input for P. H₀ expects a Gödel number, say g, and the first line forms the pair ⟨g, g⟩, the input for H. We have g ∈ H₀ iff ⟨g, g⟩ ∈ H. Questions about membership in H₀ reduce to questions about membership in H.
The second important property of reductions is that they should be "easy": If g is the Gödel number of P, then ⟨4, ⟨göd(x₀ := ⟨x₀, x₀⟩), g⟩⟩ is the Gödel number of P₀. Note that göd(x₀ := ⟨x₀, x₀⟩) is a constant that can be hardwired into the reduction.
The arguments above show H₀ ≤ H.
It computes ϕ_P = ϕ_{P′} ∘ f. We have
and
x ∉ L ⇒ f(x) ∉ L′ ⇒ ϕ_{P′}(f(x)) is undefined ⇒ ϕ_P(x) is undefined.
Thus L ∈ RE (as witnessed by P). The second statement is shown in the same fashion; we just set ϕ_{P′}(m) = 0 if m ∉ L′ and replace "is undefined" in the second chain of implications by "= 0".
[Figure: the reduction f maps N to N, sending L into L′ and the complement of L into the complement of L′.]
Lemma 8.6 H0 ≤ V0 .
Note that Qᵢ completely ignores its input. If P halts on its own Gödel number i, then Qᵢ always outputs 0, i.e., ϕ_{Qᵢ}(x) = 0 for all x. If P does not halt on i, then Qᵢ never halts, that is, ϕ_{Qᵢ} is the function that is nowhere defined. In other words,
i ↦ ⟨4, ⟨⟨2, ⟨0, i⟩⟩, ⟨4, ⟨i, ⟨2, ⟨0, 0⟩⟩⟩⟩⟩⟩    (8.1)
Note that the right hand side is not a Gödel number if i is not. (Therefore,
we do not exactly compute f but something similar which does the job, too.)
The above mapping is WHILE computable since the pairing function is. So
the reduction H0 ≤ V0 has an easy explicit description.
But wait, is it really this easy? Well, almost. Note that when concatenating two programs with Gödel numbers j and k, the corresponding Gödel number is ⟨4, ⟨j, k⟩⟩. But this is only defined if the program corresponding to j is a simple statement or a while loop. So if P above is not a simple statement or a while loop, this reduction is not correct.
There are two ways to solve this. Either we can parse P and restructure the whole program. This can be done using a stack, like the programs C and U do. Or we can wrap P into a while loop that is executed exactly once:
1: xᵢ := 1;
2: while xᵢ ≠ 0 do
3:   xᵢ := 0;
4:   P
5: od
Note that the variable xᵢ cannot appear in P since i is the Gödel number of P. This works because our pairing function increases monotonically, so every Gödel number is larger than the indices of the variables occurring in the
corresponding program. In the new program, we now either concatenate
programs where the first statement is a simple statement or a while loop. So
the final reduction is given by an expression similar to (8.1), more precisely
by
i ↦ c(⟨2, ⟨0, i⟩⟩, c(⟨2, ⟨i, 1⟩⟩, c(⟨3, ⟨i, c(⟨2, ⟨i, 0⟩⟩, i)⟩⟩, ⟨2, ⟨0, 0⟩⟩)))
where c(a, b) := ⟨4, ⟨a, b⟩⟩. Thus f is WHILE computable.¹
Exercise 8.2 Show that there is a WHILE program that, given two encodings i, j ∈ im göd, constructs the Gödel number of a WHILE program that is the concatenation of göd⁻¹(i) and göd⁻¹(j). In the next chapter, we will see a more formal way to do this.
Lemma 8.7 V0 ≤ V .
If i ∉ H₀, then in step 2 of Kᵢ, P does not halt on i for any value of x₀. Thus Kᵢ always outputs 0. If i ∈ H₀, then there is a t ∈ N such that P halts within t steps on i. Thus Kᵢ will output 1 for every input ≥ t.
Thus the mapping
i ↦ göd(Kᵢ)   if i ∈ im göd,
i ↦ y         otherwise,
Lemma 8.9 V0 ≤ T .
Theorem 8.10 None of V, V₀, and T is recursively enumerable, and neither are their complements.
To reduce L to L′:
1. Define a (WHILE computable) mapping f : N → N.
2. Give a formal proof that your mapping f has indeed the reduction property.
9 More on reductions
From now on, if i ∈ im göd, we will use ϕᵢ as a synonym for ϕ_{göd⁻¹(i)}; this is only done to simplify notation as a reward for working through eight chapters so far.
4: xₘ₋₁ := ηₘ₋₁;
5:   ⋮
6: x₀ := η₀;
7: P_g
This program first copies the input z, which stands in the variables x₀, . . . , xₙ₋₁, into the variables xₘ, . . . , xₘ₊ₙ₋₁. Then it stores y into x₀, . . . , xₘ₋₁. The values of y are hardwired into Q_{g,y}. Then we run P_g on the input (y, z). Thus Q_{g,y} computes ϕ_{P_g}(y, z), but only the entries from z are considered as inputs.
The function
S^m_n : (g, y) ↦ göd(Q_{g,y})
is FOR computable. (We saw how to show this in the last chapter.) The constructions in the last chapter were built in such a way that we automatically get that if g is not a Gödel number, then (g, y) is not mapped to a Gödel number either. This mapping is the desired mapping, since
ϕ^{m+n}_g(y, z) = ϕ^{m+n}_{P_g}(y, z) = ϕ^n_{Q_{g,y}}(z) = ϕ^n_{göd(Q_{g,y})}(z) = ϕ^n_{S^m_n(g,y)}(z).
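For the tuple encoding of WHILE programs sketched in earlier chapters, S^1_1 can be written down explicitly; the helper below is our own sketch and follows the construction of Q_{g,y} (copy the input z from x₀ to x₁, hard-wire y into x₀, then run P_g):

def seq(*ps):
    P = ps[-1]
    for q in reversed(ps[:-1]):
        P = ("seq", q, P)
    return P

def S11(Pg, y):
    return seq(("const", 2, 0),      # x2 := 0 (scratch variable)
               ("add", 1, 0, 2),     # x1 := x0 + x2, i.e. copy z to x1
               ("const", 0, y),      # x0 := y (hard-wired)
               Pg)                   # run P_g on (y, z)

goedel(S11(Pg, y)) is then the number S^1_1(g, y) from the theorem.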
The function f is WHILE computable; we can use C and the clocked version of the universal Turing machine U. Let e be a Gödel number of f. By the S-m-n Theorem,
f(g, m) = ϕ²ₑ(g, m) = ϕ_{S^1_1(e,g)}(m)
for all g, m ∈ N. But by construction,
g ∈ H̄₀ ⟺ g ∉ im gödTM or göd⁻¹TM(g) does not halt on g
⟺ f(g, m) = 0 for all m ∈ N
⟺ S^1_1(e, g) ∈ V₀.
be the set of all encodings of Turing machines that compute a function that is defined for at least c different arguments. Here is a potential application: As the last assignment of the Programmierung 2 lecture, you have to deliver a program. You still need one point to be qualified for the exam. The TA claims that your program does not halt on any input and that you therefore get no points for your program. We will show that D₁ ∉ REC. This is good for you, since it means that the TA will not be able to algorithmically verify his claim. On the other hand, we will show that D₁ ∈ RE, which is again good for you, since it means that if your program halts on at least one input, you can algorithmically find this input and maybe get the missing point. . .
Let i ∈ H₀. Then f(i, x) is defined for all x by construction, i.e., the function x ↦ f(i, x) is total and, in particular, its domain has at least c elements. Thus S^1_1(e, i) ∈ D_c. Let i ∉ H₀. Then f(i, x) is not defined for any x by construction. Thus S^1_1(e, i) ∉ D_c. The function i ↦ S^1_1(e, i) is recursive by the S-m-n theorem (note that e is just a fixed number), thus it is the desired reduction.
10 Rice’s Theorem
We saw numerous proofs that certain languages are not decidable. Rice's Theorem states that any language L is not decidable if it is defined in semantic terms. This means that whether some i ∈ im göd is in L depends only on ϕᵢ, the function computed by the machine göd⁻¹(i).
Proof overview: Let f(g, z) = ϕₑ(g, z). Now the S-m-n Theorem states that f(g, z) = ϕ_{S^1_n(e,g)}(z) for all z. If we now set g = e, then we are almost there: we have e on the left-hand side and S^1_n(e, e) on the right-hand side, which is "almost the same". If we now replace g, the first argument of f, by something of the form S^1_n(y, y), then basically the same argument gives the desired result.
ϕ^{n+1}_e(y, z) = ϕ^n_{S^1_n(e,y)}(z) for all z ∈ Nⁿ.
Exercise 10.1 Show that there is a Gödel number j with dom ϕj = {j}.
This means that the WHILE program given by the Gödel number g computes
the constant function with value g and in this sense outputs its own source
code.
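In a real programming language, the same effect is achieved by the classical quine trick; here is one in Python:

s = 's = %r\nprint(s %% s)'
print(s % s)    # prints exactly its own two-line source code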
Assume that ϕₑ₀(e₀) is defined. Then f(e₀, e₀) = ϕₑ₀(e₀) is undefined by construction, a contradiction. But if ϕₑ₀(e₀) were undefined, then f(e₀, e₀) = ϕₑ₀(e₀) = 0, a contradiction again. Thus H cannot be decidable.
This is the set of all minimal WHILE programs ("shortest source codes") in the sense that for every g ∈ Min, whenever g′ computes the same function as g, then g ≤ g′.
Theorem 10.4 (Fixed Point Theorem) For all Turing computable total functions f : N → N with im f ⊆ im göd and for all n ∈ N \ {0}, there is an e ∈ im göd such that
ϕⁿ_{f(e)} = ϕⁿₑ.
Exercise 10.2 Show that I is an index set if and only if there is a set F of
WHILE computable functions such that I = {i ∈ im göd | ϕi ∈ F }.
Exercise 10.3 Let I be a recursively enumerable index set. Show that for all g ∈ I, there is an e ∈ I with dom ϕₑ ⊆ dom ϕ_g such that dom ϕₑ is finite.
11 Gödel’s incompleteness theorem
Loosely speaking, Gödel's incompleteness theorem states that there are formulas that are true, but we cannot prove that they are true. Formulas here means quantified arithmetic formulas, i.e., formulas with existential and universal quantifiers over the natural numbers, with addition and multiplication as our operations. "We cannot prove" means that there is no effective way to show that the formula is true.
1. If s and t are terms, then (s = t) is true if a(s) = a(t) for all assign-
ments a.
∃h(y1 = y2 + h + 1).
is a bijection.³
too.

³ i mod j here denotes the unique integer r ∈ {0, 1, . . . , j − 1} such that i = qj + r for some q.
⁴ We could also assume that t = 1. Then the induction base would be trivial. But it turns out that we would have to treat the case t = 2 in the induction step, so we do it right away.
⁵ We can get such integers, the so-called cofactors, via the extended Euclidean algorithm for computing gcds.
where π_{n₁,N} = (m₁, m₂). Since both mappings above are bijections, their "composition" π_{n₁,...,nₜ} is a bijection, too.
Proof. Assume there are i < j and a prime number p such that p | (1 + i · s!) and p | (1 + j · s!). Then p | ((j − i) · s!). Since 0 < j − i ≤ s and p is prime, p | s!. From this, p | 1 follows, a contradiction.
Lemma 11.9 For all numbers a₁, . . . , aₖ, there are numbers A and S and a formula M(x, u, v, w) such that M(x, κ, A, S) is true if and only if we substitute a_κ for x.
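The arithmetic device behind this lemma is Gödel's β-function: β(A, S, κ) = A mod (1 + (κ + 1) · S) recovers the κth element of a sequence from just two numbers. A Python sketch (requires Python 3.8+ for modular inverses; the concrete formula M in the text may be packaged differently):

from math import factorial

def beta(A, S, k):
    return A % (1 + (k + 1) * S)

def encode(seq):
    s = max(len(seq), max(seq) + 1)
    S = factorial(s)                   # makes the moduli 1+(k+1)S coprime
    A, M = 0, 1
    for k, r in enumerate(seq):        # Chinese remaindering
        m = 1 + (k + 1) * S
        A += M * (((r - A) * pow(M, -1, m)) % m)
        M *= m
    return A, S

A, S = encode([3, 1, 4, 1, 5])
print([beta(A, S, k) for k in range(5)])   # [3, 1, 4, 1, 5]

The coprimality of the moduli is exactly the statement about 1 + i · s! proved above.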
Proof. Consider
The variable t denotes the number of times the while loop is executed. The variables Aᵢ and Sᵢ store values that encode the values that xᵢ attains after each execution of P₁ in the while loop. The second line of the definition of F_P ensures that before the first execution, the value of the variable x_λ is y_λ, 0 ≤ λ ≤ ℓ. The third line ensures that after the tth execution, the value of the variable x_λ is z_λ, 0 ≤ λ ≤ ℓ. The fourth and fifth lines ensure that the first time that xᵢ contains the value 0 is after the tth execution of the while loop. The remainder of the formula ensures that the values that x₀, . . . , x_ℓ have after the (τ + 1)th execution of the while loop are precisely the values that we get if we run P₁ with x₀, . . . , x_ℓ containing the values after the τth execution.⁶ Note that the formula M is satisfied by at most one value for fixed τ, A_λ, and S_λ. This ensures consistency, i.e., even if A_λ and S_λ do not contain the values from Lemma 11.9, if the formula F_P is satisfied, then the values stored in A_λ and S_λ, 0 ≤ λ ≤ ℓ, correspond to an execution of the WHILE program P.

⁶P₁ or P₂ might not contain the variable x_ℓ. We pad Φ_{Pᵢ} to a function N^ℓ → N^ℓ in the obvious way.
Theorem 11.13 T ∉ RE.
1. P ⊆ N is decidable and
Theorem 11.16 There is no complete proof system for the set of all true
arithmetic formulas T .
12 Turing machines
Turing machines are another model for computability. They were intro-
duced by Alan Turing in the 1930s to give a mathematical definition of an
algorithm. When Turing invented his machines, real computers were still
to be built. Turing machines do not directly model any real computers
or programming languages. They are abstract devices that model abstract
computational procedures. The intention of Alan Turing was to give a for-
mal definition of “intuitively computable”; rather than modeling computers,
he modeled mathematicians. We will see soon that Turing machines and
WHILE programs essentially compute the same functions.
Turing machines are the model for computations that you find in
the textbooks. In my opinion, WHILE programs are easier to un-
derstand; it usually takes some time to get familiar with Turing ma-
chines.
I hope that at the end of this part you will see that it does not really matter
whether one uses Turing machines or WHILE programs. All we need is a
Gödel numbering, a universal Turing machine/WHILE program, and the
ability to compute a Gödel number of the composition of two programs
from the two individual Gödel numbers, i.e., an acceptable programming system. In theory, computer scientists are modest people.
12.1 Definition
A Turing machine M has a finite control and a number, say k, of tapes. The
finite control is in one of the states from a set of states Q. Each of the tapes
consists of an infinite number of cells and each cell can store one symbol
from a finite alphabet Γ, the tape alphabet. (Here, finite alphabet is just a fancy word for finite set.) Γ contains one distinguished symbol, the blank ␣. Each tape is two-sided infinite.¹ That is, we can formally model it as a
function T : Z → Γ and T (i) denotes the content of the ith cell. Each tape
has a head that resides on one cell. The head can be moved back and forth
on the tape, in a cell by cell manner. Only the content of the cells on which
the heads currently reside can be read by M . In one step,
¹In some textbooks, the tapes are only one-sided infinite. As we will see soon, this does not make any difference.
[Figure: the tapes of a Turing machine, two-sided infinite sequences of cells holding symbols such as 0 and 1, with one head per tape.]
3. M moves each head either one cell to the left, not at all, or one cell to
the right.
δ : Q × Γᵏ → Q × Γᵏ × {L, S, R}ᵏ.
In the beginning, all tapes are filled with blanks. The only exception is
the first tape; here the input is stored. The input of a Turing machine is a
string w ∈ Σ∗ where Σ ⊆ Γ \ {} is the input alphabet. It is initially stored
in the cells 0, . . . , |w| − 1 of the first tape. All heads stand on the cell with
number 0 of the corresponding tape. The Turing machine starts in its start
state q0 and may now perform one step after another as described by the
transition function δ.
         0              1              ␣
add    (back, 1, L)   (add, 0, R)    (back, 1, L)
back   (back, 0, L)   (back, 1, L)   (stop, ␣, R)
stop   —              —              —

Above, "—" stands for undefined. In the state add, INC goes to the right replacing every 1 by a 0 until it finds the first 0 or ␣. This 0 or ␣ is then replaced by a 1 and INC enters the state back. In the state back, INC goes to the left leaving the content of the cells unchanged until it finds the first ␣. It goes one step to the right and is done.
Instead of a table, a transition diagram is often more understandable. Figure 12.2 shows this diagram for the Turing machine of this example. The states are drawn as circles and an arrow from q to q′ with the label "α; β, r"
[Diagram: state add loops on "1; 0, R"; the edges "0; 1, L" and "␣; 1, L" lead from add to back; back loops on "1; 1, L" and "0; 0, L"; the edge "␣; ␣, R" leads from back to stop.]
Figure 12.2: The transition diagram of the Turing machine INC from Ex-
ample 12.2.
means that if the Turing machine is in state q and reads α, then it goes to state q′, writes β, and moves its head as given by r ∈ {L, S, R}.
x′_κ = u_κ β_κ v_κ
and
p′_κ = p_κ − 1 if r_κ = L,   p′_κ = p_κ if r_κ = S,   p′_κ = p_κ + 1 if r_κ = R,

x′_κ = u_κ β_κ
and
p′_κ = |x_κ| + 1.
We denote the fact that C′ is a successor of C by C ⊢_M C′. Note that by construction, each configuration has at most one successor. We denote the reflexive and transitive closure of the relation ⊢_M by ⊢*_M, i.e., C ⊢*_M C′ iff there are configurations C₁, . . . , C_ℓ for some ℓ such that C ⊢_M C₁ ⊢_M . . . ⊢_M C_ℓ ⊢_M C′. If M is clear from the context, we will often omit the subscript M.
A configuration that has no successor is called a halting configuration. A Turing machine M halts on input w iff SC_M(w) ⊢*_M C_t for some halting configuration C_t. (Note again that if it exists, then C_t is unique.) Otherwise M does not halt on w. If M halts on w and C_t is a halting configuration, we call a sequence SC_M(w) ⊢_M C₁ ⊢_M C₂ ⊢_M . . . ⊢_M C_t a computation of M on w. If M does not halt on w, then the corresponding computation is infinite.
Assume that SC_M(w) ⊢*_M C_t and C_t = (q, (p₁, x₁), . . . , (p_k, x_k)) is a halting configuration. Let i ≤ p₁ be the largest index such that x₁(i) = ␣. If such an index does not exist, we set i = 0. In the same way, let j ≥ p₁ be the smallest index such that x₁(j) = ␣. If such an index does not exist, then j = |x₁| + 1. Let y = x₁(i + 1) x₁(i + 2) . . . x₁(j − 1). In other words, y is the word that the head of tape 1 is standing on. y is called the output of M on w.
²So if we want to compute a function, a 5-tuple is sufficient to describe the Turing machine. If we want to decide or recognize languages, then we take a 6-tuple. We could also always take a 6-tuple and simply ignore the accepting states if we compute functions.
13 Examples, tricks, and
syntactic sugar
[Diagram: a two-state machine (erase, stop); in state erase it moves right over 0s and 1s, and on the first blank it stays put and enters stop.]
[Figure: transition diagram of the two-tape machine COPY with states copy, back, and stop. In state copy, both heads move right and each input symbol is also written on the second tape ("1;1,R | ␣;1,R" and "0;0,R | ␣;0,R"); in state back, both heads move left over 0s and 1s; on the blank, both heads move right and the machine enters stop.]
[Figure: transition diagram of the machine COMPARE with states backy, backn, yes, and no; it moves the head back to the left end and halts in yes if the tape content is zero and in no otherwise.]
the state whether the content is zero or not. The Turing machine stops in
the state yes or no.
Exercise 13.1 Construct a Turing machine DEC that decreases the content
of the tape by 1 if the content is > 0.
Figure 13.4: The concatenated Turing machine. The triangle to the left
of DEC indicates that the starting state of DEC is the starting state of the
new Turing machine. The arrow labeled with stop means that whenever
DEC wants to enter the state stop of DEC, it enters the starting state of
COMPARE instead. The lines labeled with yes and no mean that yes and
no are the two halting states of COMPARE. Note that this concatenation
works well in particular because the Turing machines move the head back to
the first cell of the tape.
Figure 13.5: The counting Turing machine. DEC is executed; after this, the content of the tape is compared with zero. If the state no is entered, the machine enters the starting state of DEC again and decreases the content once more. If the content of the tape is zero, then the Turing machine stops in the state yes.
13.2.2 Loops
If you concatenate Turing machines “with themselves”, you get loops. For
instance, if you want to have a counter on some tape that is decreased by
one until zero is reached, we can easily achieve this by concatenating the
machines DEC and COMPARE as depicted in Figure 13.5.
is defined by
if
The other case is defined symmetrically. If both are undefined, then Δ((q, q′), γ₁, . . . , γ_{k+k′}) is undefined, too. On the first k tapes, N behaves like M; on the other k′ tapes, N behaves like M′. If one machine stops, then N does not move its heads on the corresponding tapes anymore and just writes the symbols that it reads all the time. If the second machine stops, too, then N stops. (Of course, since N gets its input on the first tape, M′ is simulated on the empty tape. If we want to simulate M and M′ on the same input, then we have to copy the input from the first tape to the (k + 1)th tape, using for instance COPY, before N starts with the simulation.)
Here is one application: take M to be any machine and let M′ be the machine from Figure 13.5. Modify the function Δ such that M′ is simulated normally, but M only executes a step when M′ changes its state from the state no to its start state. In this way, M executes as many steps as given by the counter in the beginning. We will need this construction later on.
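As a sketch in Python terms (step_M is an assumed single-step simulator for M, not something defined in the text):

def run_clocked(step_M, config, n):
    # execute exactly n steps of M; the counter machine of Figure 13.5
    # plays the role of n in the actual Turing machine construction
    for _ in range(n):
        config, halted = step_M(config)
        if halted:
            break
    return config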
Once we have become even more experienced with Turing machines, we could even write:
If in doubt . . .
14 Church–Turing thesis
        f
   N ─────→ N
   │         │
 cod⁻¹     cod⁻¹
   ↓         ↓
{0,1}* ──→ {0,1}*
        f̂

In other words, f̂ = cod⁻¹ ∘ f ∘ cod. Similarly, for g : {0, 1}* → {0, 1}*:

        g
{0,1}* ──→ {0,1}*
   │         │
  cod       cod
   ↓         ↓
   N ─────→ N
        ĝ

that is, ĝ = cod ∘ g ∘ cod⁻¹.
Exercise 14.2 Show the following: For every f : N → N and g : {0, 1}* → {0, 1}*, applying the hat operator twice gives back f and g, respectively. ("ˆ is self-inverse.")
Remark 14.1 The mapping cod is not too natural. For instance, with f : N → N, we could also associate the mapping bin(x) ↦ bin(f(x)). But if cod is a bijection, then we have the nice property that ˆ is self-inverse.
Exercise 14.3 Show that cod and cod−1 are functions that are easy to com-
pute. In particular:
So the only reason why a WHILE program or Turing machine cannot compute cod or cod⁻¹ is that they cannot directly store elements from {0, 1}* or N, respectively.
¹This is quite funny. While both WHILE programs and Turing machines are mathematical objects, we write WHILE programs but construct Turing machines.
Exercise 14.4 Try to get even shorter encodings in this way. What is the
shortest that you can get?
1. xᵢ := xⱼ + xₖ or
2. xᵢ := xⱼ − xₖ or
3. xᵢ := c or
4. if xᵢ ≠ 0 then goto λ
The semantics of the first three statements is the same as for WHILE programs. After the µth statement is executed, the program goes on with the (µ + 1)th statement. The only exception is the conditional jump if xᵢ ≠ 0 then goto λ. If the content of xᵢ is zero, then we go on with the (µ + 1)th statement; otherwise, we go on with statement λ. If we ever reach a line that does not exist, the program stops.
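These semantics translate into a small Python interpreter; the tuple encoding is our own convention, analogous to the WHILE sketch, with statements numbered from 1:

def run_goto(prog, state):
    mu = 1
    while 1 <= mu <= len(prog):          # leaving the program means stopping
        stmt = prog[mu - 1]
        op = stmt[0]
        if op == "add":
            _, i, j, k = stmt; state[i] = state[j] + state[k]
        elif op == "sub":
            _, i, j, k = stmt; state[i] = max(state[j] - state[k], 0)
        elif op == "const":
            _, i, c = stmt; state[i] = c
        elif op == "goto":               # if x_i != 0 then goto lam
            _, i, lam = stmt
            if state[i] != 0:
                mu = lam
                continue
        mu += 1
    return state

# statement 1 jumps over statement 2 because x1 != 0:
print(run_goto([("goto", 1, 3), ("const", 0, 7), ("const", 0, 9)], [0, 1]))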
[Diagram: states q₁, q₂, q₃ connected through the submachines COMPARE (with exits yes and no) and INC.]
Figure 14.1: The simulating Turing machine for the example GOTO pro-
gram.
Corollary 14.5 (Kleene normal form) For every WHILE computable function f, there are FOR programs P₁, P₂, P₃ and a WHILE program of the form
P₁; while xᵢ ≠ 0 do P₂ od; P₃
that computes f.
For every function f, the following statements are equivalent:
1. f is WHILE computable.
2. f is GOTO computable.
3. f̂ is Turing computable.
B Primitive and µ-recursion
Definition B.1 The set of all primitive recursive functions is defined in-
ductively as follows:
1. Every constant function is primitive recursive.
2. Every projection pᵢˢ : Nˢ → N (mapping (a₁, . . . , aₛ) to aᵢ) is primitive recursive.
3. The successor function suc : N → N defined by suc(n) = n + 1 is primitive recursive.
4. If f : Nˢ → N and gᵢ : Nᵗ → N, 1 ≤ i ≤ s, are primitive recursive, then their composition defined by
(a₁, . . . , aₜ) ↦ f(g₁(a₁, . . . , aₜ), . . . , gₛ(a₁, . . . , aₜ))
is primitive recursive.
5. If g : Nˢ → N and h : N^{s+2} → N are primitive recursive, then the function f : N^{s+1} → N defined by
f(0, a₁, . . . , aₛ) = g(a₁, . . . , aₛ)
f(n + 1, a₁, . . . , aₛ) = h(f(n, a₁, . . . , aₛ), n, a₁, . . . , aₛ)
is primitive recursive. This scheme is called primitive recursion.
We want to show that primitive recursive functions are the same as FOR
computable functions. Therefore, we first look at some fundamental func-
tions that appear in FOR programs:
Above, we did not write down the projections explicitly. The correct definition looks like this:
add(0, y) = p₁¹(y),
add(x + 1, y) = suc(p₁³(add(x, y), x, y)).
Since this looks somewhat confusing, we will omit the projections in the following.
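The scheme is easy to animate in Python: primrec below turns g and h into f. The definition of add is the one from above; mult is the analogous next step (a sketch, with the projections again left implicit):

def primrec(g, h):
    def f(n, *a):
        acc = g(*a)
        for i in range(n):               # f(n+1, a) = h(f(n, a), n, a)
            acc = h(acc, i, *a)
        return acc
    return f

suc = lambda n: n + 1
add = primrec(lambda y: y, lambda acc, n, y: suc(acc))
mult = primrec(lambda y: 0, lambda acc, n, y: add(acc, y))
print(add(3, 4), mult(3, 4))   # 7 12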
In the same way, we can see that the bounded existential quantifier defined by
bounded-∃-P(n) := 1 if there is an x ≤ n with P(x) = 1, and 0 otherwise
is primitive recursive:
Above, P has only one argument. It is easy to see that for a predicate
with s arguments,
fi (0, a1 , . . . , as ) = gi (a1 , . . . , as ),
fi (n + 1, a1 , . . . , as ) = hi (f1 (n, a1 , . . . , as ), . . . , ft (n, a1 , . . . , as ), n, a1 , . . . , as ),
Lemma B.4 Let P be a FOR program with s inputs. Let ℓ be the largest index of a variable in P. Then there are primitive recursive functions v₀, . . . , v_ℓ : N^{ℓ+1} → N such that
for all a₀, . . . , a_ℓ ∈ N.
Define u₀, . . . , u_ℓ by
u_λ(0, a₀, . . . , a_ℓ) = a_λ,
u_λ(n + 1, a₀, . . . , a_ℓ) = v_{1,λ}(u₀(n, a₀, . . . , a_ℓ), . . . , u_ℓ(n, a₀, . . . , a_ℓ)).
Lemma B.5 For every primitive recursive function f, there is a FOR program P with ϕ_P = f.
f(0, a₁, . . . , aₛ) = g(a₁, . . . , aₛ),
f(n + 1, a₁, . . . , aₛ) = h(f(n, a₁, . . . , aₛ), n, a₁, . . . , aₛ),
then there are programs P and Q that compute h and g, respectively. Now the following program computes f(a₀, a₁, . . . , aₛ); the variable n counts the completed iterations, as required by the recursion scheme:
1: x₀ := g(a₁, . . . , aₛ);
2: n := 0;
3: for a₀ do
4:   x₀ := h(x₀, n, a₁, . . . , aₛ);
5:   n := n + 1
6: od
B.2 µ-recursion
The µ-operator allows unbounded search.
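In Python terms, the µ-operator is a potentially non-terminating search for the smallest zero of f in its first argument (a sketch):

from itertools import count

def mu(f):
    def g(*a):
        for n in count():        # unbounded search; may diverge
            if f(n, *a) == 0:
                return n
    return g

# least n with (n+1)^2 > x, i.e. the integer square root:
isqrt = mu(lambda n, x: 0 if (n + 1) ** 2 > x else 1)
print(isqrt(10))   # 3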
u_λ(µu_λ(a₀, . . . , a_ℓ), a₀, . . . , a_ℓ)
everywhere, ⟨1, ⟨s, i⟩⟩ encodes pᵢˢ, and so on. Then we define Gödel numbers for composition and for primitive and µ-recursion. If we have a function f of arity s and s functions gᵢ of arity t, and i and j₁, . . . , jₛ are their Gödel numbers, then ⟨3, t, s, i, j₁, . . . , jₛ⟩ is the Gödel number of their composition.
Let θi be the function that is computed by the recursion scheme with Gödel
number i. If i is not a valid Gödel number, then θi is some dummy function, for
instance, the function that is undefined everywhere. Then the sequence (θi )i∈N is
a programming system.
Once we have constructed a universal Turing machine and/or WHILE program,
we will see that it is also universal. It is clearly acceptable, since composition is
directly available in recursion schemes.
15 A universal Turing machine
where
r̂_κ = 00 if r_κ = S,   r̂_κ = 10 if r_κ = L,   r̂_κ = 01 if r_κ = R.
[·, ·] denotes (one of) the pairing functions discussed in Section 14.1.3. It is extended to larger tuples as expected:
The second part of the tuple is a dummy value; the non-existing state s + 1 is used for saying that the value is undefined.
We construct a mapping gödTM from the set of all Turing machines to
{0, 1}∗ by building a large pair consisting of:
Definition 15.1 A mapping g from the set of all Turing machines over the
input alphabet {0, 1} to {0, 1}∗ is called a Gödel numbering if
1. g is injective,
To the right of this block, there is the block corresponding to the (i + 1)th cells of M; to the left, the one corresponding to the (i − 1)th cells. Between two such blocks, UTM writes $ as a separator. So the k tapes of M are "interleaved".
Of course, UTM has to bring its second tape into this form. In the beginning, it initializes its second tape by writing the blocks
2. From these values, CTM can compute the size and the number
of tuples encoding the transition function.
Theorem 15.2 There is a Turing machine UTM that, given a pair [g, x] with g ∈ im gödTM and x ∈ {0, 1}*, computes ϕ_{göd⁻¹TM(g)}(x).
Exercise 15.1 Show that the constructed Turing machine UTM is correct.
2. UTM copies the input x from the first tape to the second tape
as described above.
It replaces all # of the first block by ∗.
3. UTM moves the head to the first symbol of the leftmost block.
5. If M := göd⁻¹TM(g) is supposed to compute a function, then UTM copies the content of tape 2 that corresponds to the first tape of M back to tape 1 and stops.
C Kolmogorov Complexity
C.1 Definition
Definition C.1 The Kolmogorov complexity K(x) of a string x ∈ {0, 1}* is the length of a shortest string y ∈ {0, 1}* such that the universal Turing machine UTM outputs x on input y.
This means that we measure the number of bits that we need to produce x. This can be seen as the ultimate compression task. The input y for UTM is the compressed version of x.
Why not just take the length of a shortest encoding of a Turing machine that outputs x on the empty word? The problem is that such an encoding usually needs |x| + 1 states. Thus the trivial encoding of x would have length Θ(|x| log |x|), which is not a disaster but also not very nice.
Lemma C.2 There is a constant u such that K(x) ≤ |x| + u for all x ∈
{0, 1}∗ .
¹Here is the place where we need that the length of a pair [a, b] can be bounded nicely in terms of |a| and |b|.
Input: x = bin(n)
1. Count in binary to x.
The exercise above shows that there are sequences of words whose Kol-
mogorov complexity grows very slowly. Are there sequences whose complex-
ity is close to the upper bound of Lemma C.2?
Pitfall
Lemma C.4 For every natural number n ≥ 1, there is a word wn ∈ {0, 1}n
such that K(wn ) ≥ n.
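The counting behind this lemma is elementary: there are only 2ⁿ − 1 strings of length less than n available as inputs y, but 2ⁿ strings of length n to be produced. Spelled out in Python:

n = 8
short_inputs = sum(2**k for k in range(n))   # 2^n - 1 = 255
print(short_inputs, "<", 2**n)               # so some w of length 8 has K(w) >= 8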
Exercise C.2 Show that there are at least 2ⁿ(1 − 2⁻ⁱ) words w in {0, 1}ⁿ such that K(w) ≥ n − i.
Corollary C.6 For all universal Turing machines V and V 0 , there is a con-
stant cV,V 0 such that |KV (x) − KV 0 (x)| ≤ cV,V 0 for all x ∈ {0, 1}∗ .
Remark C.7 Above, the Turing machines V and V 0 could even work with
different Gödel numberings. We could even compare Turing machines with
any other acceptable programming system.
Input: bin(n)
4. Print x.
This Turing machine finds aₙ on input bin(n). Let e be the Gödel number of this machine. By construction, UTM on input [e, bin(n)] outputs aₙ.
Remark C.10 If there is an ε > 0 such that |A ∩ {0, 1}ⁿ| ≤ (2 − ε)ⁿ for all n, then no long enough x in A can be Kolmogorov random.
Input: q1 , . . . , qs , log j
Lemma C.12 For each n + 1 ≤ i ≤ 2n, let sᵢ be the length of the sequence of states between cell i and i + 1. There is an n + 1 ≤ j ≤ 2n such that sⱼ ≤ t/n, where t is the total number of steps of M on y0ⁿ#.
Theorem C.13 For every 1-tape Turing machine M that solves the copy problem, there is an ε > 0 such that M makes at least ε · n² steps in the worst case on inputs of length n.
Input: bin n
4. Otherwise, print x.
Exercise C.3 Show that the language {[x, cod(k)] | K(x) = k} is not de-
cidable.
5. Return |z|.
N always terminates since in step 2 we check whether UTM will halt, and
since there is always a string of length |x| + O(1) on which UTM outputs
x. By construction, N finds a shortest string y such that UTM on input y
outputs x. The length |y| of this string is K(x).
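A hedged sketch of N's search, assuming a hypothetical halting oracle utm_halts and an evaluator utm for the universal machine (neither is computable; they are stand-ins that only make the enumeration order of step 2 explicit):

    from itertools import count, product

    def shortest_description(x, utm, utm_halts):
        """Enumerate candidate inputs y in order of increasing length and
        return the first one on which UTM halts with output x."""
        for n in count(0):
            for bits in product("01", repeat=n):
                y = "".join(bits)
                if utm_halts(y) and utm(y) == x:   # step 2: the oracle call
                    return y                       # |y| = K(x)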
Part II
Complexity
16 Turing machines and complexity classes
SC(x) ⊢_M C_1 ⊢_M · · · ⊢_M C_t.
In other words, TimeM (n) measures the worst case behaviour of M on inputs
of length n. Let t : N → N be some function. A deterministic Turing machine
M is t time bounded if TimeM (n) ≤ t(n) for all n.
For a configuration C = (q, (p_1, x_1), . . . , (p_k, x_k)), Space(C) = max_{1≤κ≤k} |x_κ|
is the space used by the configuration. Occasionally, we will equip Turing
machines with an extra input tape. This input tape contains, guess what,
the input x of the Turing machine. This input tape is read-only, that is, the
Turing machine can only read the symbols but not change them. (Techni-
cally, this is achieved by requiring that whenever the Turing machine reads
a symbol on the input tape it has to write the same symbol.) What is an
extra input tape good for? The space used on the input tape (that is, the
symbols occupied by the input) is not counted in the definition of Space(C).
In this way, we can talk about sublinear space complexity.
L can be recognized with space O(log n). We read the input and for every
0 that we encounter, we increase a binary counter on the work tape by one.
Then we read the input a second time and decrease the counter for every 1.
We accept if, in the end, the counter on the work tape is zero. In every step,
we store a number ≤ |x| on the work tape. This needs log n bits (on the work
tape).
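A small sketch (ours) of this two-pass algorithm; the integer counter is exactly what would live on the work tape in binary, so the space used is O(log n):

    def equal_zeros_and_ones(x):
        counter = 0                 # kept in binary on the work tape
        for sym in x:               # first pass: count the 0's
            if sym == '0':
                counter += 1
        for sym in x:               # second pass: subtract the 1's
            if sym == '1':
                counter -= 1
        return counter == 0         # accept iff the counter is zero

    print(equal_zeros_and_ones("0101"))  # True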
SC(x) ⊢_M C_1 ⊢_M · · · ⊢_M C_t
For a set of functions T, DTime(T) = ⋃_{t∈T} DTime(t). DTime_k(T) is defined
analogously.
The same is done for space complexity: A language L is deterministically
s space recognizable if there is a deterministic Turing machine M such that
L = L(M ) and SpaceM (n) ≤ s(n) for all n. Note that a space bounded
Turing machine might not halt on inputs that are not in L(M ). But we will
see in the next chapter that one can effectively detect when a space bounded
machine has entered an infinite loop. In the same way, a function f is deter-
ministically computable in space s if there is a deterministic Turing machine
M that computes f and SpaceM (n) ≤ s(n) for all n. We will see that for
space bounded computations, sublinear functions s are also meaningful. But
to speak of sublinear space complexity, the input should not be counted. We
will use a Turing machine M with an extra input tape.
Exercise 16.1 Intuitively, it is clear that sublinear time is not very mean-
ingful here. Give a formal proof for this. In particular show: Let M
be a deterministic Turing machine. Assume that there is an n such that
M reads at most n − 1 symbols of the input x for each x with |x| = n.
Then there are words a_1, . . . , a_m with |a_i| < n for all 1 ≤ i ≤ m such that
L(M) = ⋃_{i=1}^{m} a_i{0, 1}∗.
δ : Q × Γ^k → Q × Γ^k × {L, S, R}^k,
[Figure 16.1: a nondeterministic Turing machine with states ident, invert, and stop; transitions are labeled σ; τ, R.]
Figure 16.2: The computation tree of the Turing machine from Figure 16.1
on the word 010. The nodes are labeled with the configurations; to the right
of the tape content, the current state is shown. The position of the head is
marked by the small triangle. There are four paths in the computation tree,
two of them accepting (the state is stop).
If there is no x ∈ L(M ) with length n, then TimeM (n) = 0. Note that this
definition is somewhat different from the deterministic case, where we took
the maximum over all x of length n. Let t : N → N be some function. A
nondeterministic Turing machine M is weakly t time bounded if TimeM (n) ≤
t(n) for all n.
Exercise 16.2 Show that for any nondeterministic Turing machine M that
is weakly t time bounded there is an equivalent Turing machine M 0 (i.e.,
M (x) = M 0 (x) for all x) that is weakly O(t) time bounded such that for
every input x, the computation tree of M 0 on x is a binary tree.
For a set of functions T, NTime(T) = ⋃_{t∈T} NTime(t). NTime_k(T) is defined
analogously.
Warning! I am fully aware of the fact that there does not exist a
physical realization of a nondeterministic Turing machine! (At least, I do
not know of any.) Nondeterministic Turing machines are not interesting per
se (at least not for an overwhelming majority of the world population); they
are interesting because they characterize important classes of problems. The
most important ones are the so-called NP-complete problems, a class which
we will encounter soon. The example in Section 16.3 gives a first impression.
For a nondeterministic Turing machine M and an input x ∈ L(M ), we
define space SpaceM (x) as follows: we take the minimum over all accepting
paths of the maximum of the space used by any configuration along this path
if such an accepting path exists, and ∞ otherwise. We set
In the case of NSpace(s), the Turing machines have an extra input tape.
16.3 An example
Consider the following arithmetic formula
x1 + 2x2 (1 − x1 ) + x3 .
We want to know whether we can assign the values 0 and 1 to the variables
in such a way that the formula evaluates to 1. Above, x_1 ↦ 1, x_2 ↦ 0, and
x_3 ↦ 0 is such an assignment. The formula
x1 (1 − x1 )
does not have such an assignment. We want to decide whether a given formula
has such an assignment or not.
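To make the brute-force approach concrete, here is a sketch (ours); it tries all 2^n assignments, in line with the DTime(O(2^n · n³)) bound mentioned later in this section:

    from itertools import product

    def afsat(formula, num_vars):
        """formula: a function mapping an assignment tuple to an integer."""
        for a in product((0, 1), repeat=num_vars):
            if formula(a) == 1:
                return a            # a satisfying {0,1}-assignment
        return None

    # x1 + 2*x2*(1 - x1) + x3; prints the first satisfying assignment found
    print(afsat(lambda a: a[0] + 2*a[1]*(1 - a[0]) + a[2], 3))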
Excursus: Formalization
2. a(F · G) = a(F ) · a(G) and a(F + G) = a(F ) + a(G) for formulas F and G.
Since in every formula only a finite number of variables occur, we usually restrict
assignments to the variables occurring in a given formula. An assignment is called
an S-assignment for some S ⊆ Z if im a ⊆ S.
We can encode arithmetic formulas as follows: For instance, we can encode the
variable x_i by 0 bin(i) and a constant z by 1σ(z) bin(|z|), where σ(z) is 1 if z ≥ 0 and
0 otherwise. Then we define the encoding c inductively by c(F · G) = [0, c(F), c(G)]
and c(F + G) = [1, c(F), c(G)]. This is a very structured encoding, since it explicitly
stores the order in which operations are performed. Alternatively, we first encode
x_i by the string x bin(i) and z by σ(z) bin(|z|). Now we can view our formula
as a string over the alphabet {(, ), +, ·, x, 0, 1}. To get a string over {0, 1}, we just
replace each of the seven symbols by a different binary string of fixed length. (Length
three is sufficient, since 2³ ≥ 7.) This is a rather unstructured encoding. Nevertheless,
both encodings allow us to evaluate the formula in time O(ℓ³).
Since the encoding does not matter in the following, we will not specify
it explicitly. We just assume that the encoding is reasonable. Since there
³O(ℓ³) can easily be achieved for reasonable encodings: A formula F of length ℓ has
at most ℓ arithmetic operations and the value of the formula in the end has at most ℓ
bits (proof by induction). Addition and multiplication can be performed in time O(ℓ²) by
the methods that you learn in school and we have ≤ ℓ of them. Using more sophisticated
methods and a better analysis, one can bring down the evaluation time to O(ℓ^{1+ε}) for any
ε > 0.
is usually no danger of confusion, we will even write F for both the for-
mula itself (as a mathematical object) and its encoding (instead of c(F ) or
something like that). Let
AFSAT ∈ DTime(O(2^n · n³))
[Figure: part of a nondeterministic Turing machine with states gen and stop (transition labels 1; 1, R | 1, R, etc.) that guesses an assignment.]
17 Tape reduction, compression, and acceleration
Remark 17.3 The construction is quite similar to the universal Turing ma-
chine that we constructed in the first part of this lecture.
Figure 17.1: Left-hand side: the k tapes of the k-tape Turing machine M.
Right-hand side: the one and only tape of the simulating machine S. The
tape of S is divided into 2k tracks, two for each tape of M. The first track
of each such pair of tracks stores the content of the corresponding tape of
M; the second stores the position of the head, which is marked by “∗”.
the first blank (of S). On its way, S collects the k symbols under the heads of
M and stores them in its finite control. Once S has collected all the symbols,
it can simulate the transition of M. It changes the state accordingly and
now moves to the left until it reaches the first blank (of S). On its way back,
it makes the changes that M would make. It replaces the entries in the
components marked by a ∗ and moves the ∗ in the corresponding direction.
If M has not halted yet, S repeats the loop described above. If M halts,
S halts, too, and accepts iff M accepts.
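A sketch (ours) of the collecting sweep, with the single tape modeled as a list of 2k-tuples:

    def collect(track_tape, k):
        """track_tape: list of 2k-tuples; track 2i stores tape i's symbol,
        track 2i+1 stores '*' where tape i's head is."""
        under_heads = [None] * k
        for cell in track_tape:            # one sweep to the right
            for i in range(k):
                if cell[2*i + 1] == '*':
                    under_heads[i] = cell[2*i]
        return under_heads

    # two tapes: heads on '0' (tape 1, cell 0) and '1' (tape 2, cell 1)
    print(collect([('0', '*', '0', ' '), ('1', ' ', '1', '*')], 2))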
Remark 17.4 The above construction also works for nondeterministic Tur-
ing machines. Whenever S has collected all the symbols and simulates the
actual transition of M , it chooses one possible transition nondeterministi-
cally.
Remark 17.5 If M has an additional input tape, then we can also equip S
with an additional input tape. If M has a sublinear space bound, then S
also has a sublinear space bound.
Input: x ∈ Σ∗
Output: accept if x ∈ L(M ), reject otherwise
But we tacitly assume that you can write down—at least in principle—the
transition function of the 1-tape Turing machine constructed above.
Let’s convince ourselves that we can really do this: Consider the part of
S that collects the symbols that M would read. The states are of the form
{collect} × Q × (Γ ∪ {/})^k. The first entry of a tuple (collect, q, γ_1, . . . , γ_k)
indicates that we are in a collection phase. (If the collection phase were the
only phase that uses tuples of the form Q × (Γ ∪ {/})^k, then we could skip
this first component.) The second component stores the current state of M.
It shall not be changed during the collection phase. Finally, γ_κ stores the
symbol that is read by M on the κth tape. γ_κ = / indicates that the position
of the head on tape κ has not been found yet.
The transition function δ′ of S (restricted to the states of the collect
phase) is now defined as follows:

for 1 ≤ κ ≤ k. If the symbol η_{2κ} on the 2κth track is ∗, then we have found the
head on the κth tape and store η_{2κ−1}, the symbol that M reads on the κth
tape, in the state of S.
Proof overview: In the same way as a 64-bit architecture can store more
information in one memory cell than an 8-bit architecture, we enlarge the
tape alphabet to store several symbols in one symbol and then just simulate.
Q × {1, . . . , c}. (q, i) means that M is in state q and its head is on the ith
symbol of the current block. Assume that δ(q, η) = (q′, η′, R). Then

    δ′((q, i), (γ_1, . . . , γ_c)) = ((q′, i + 1), (γ′_1, . . . , γ′_c), S)   if i < c,
    δ′((q, i), (γ_1, . . . , γ_c)) = ((q′, 1), (γ′_1, . . . , γ′_c), R)     if i = c,

for all q ∈ Q, i ∈ {1, . . . , c}, and all (γ_1, . . . , γ_c) with γ_i = η, where γ′_j = γ_j
for j ≠ i and γ′_i = η′.
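Rendered as code, the rule reads as follows (a sketch under our representation of blocks as tuples; only the right-move case from above is covered):

    def delta_compressed(delta, state, block, c):
        """state = (q, i): M in state q, head on the ith symbol (1-based)
        of the current block of c symbols."""
        q, i = state
        q2, eta2, move = delta(q, block[i - 1])
        assert move == 'R'                                # right move only
        new_block = block[:i - 1] + (eta2,) + block[i:]   # replace gamma_i
        if i < c:
            return (q2, i + 1), new_block, 'S'            # stay on the block
        else:
            return (q2, 1), new_block, 'R'                # go to next block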
Exercise 17.1 Show the following “converse” of Theorem 17.10: For any
s space and t time bounded Turing machine M with input alphabet {0, 1},
there is an O(s) space and O(t) time bounded Turing machine that only uses
the work alphabet {0, 1, ␣}.
17.3 Acceleration
Next, we prove a similar speed up for time. This simulation is a little more
complicated than the previous one.
Exercise 17.2 Show the following: For all k ≥ 2, all t : N → N, and all
0 < ε ≤ 1,

    DTime_k(t(n)) ⊆ DTime_k(n + ε(n + t(n))),
    NTime_k(t(n)) ⊆ NTime_k(n + ε(n + t(n))).
18 Space versus Time, Nondeterminism versus Determinism
Time and space constructible functions “behave well”. One example of
this is the following result.
Proof. We start with the first statement: Let M be some weakly t time
bounded Turing machine with L(M) = L. Consider the following Turing
machine N:
Input: x
3. When more than t(|x|) steps have been simulated, then stop
and reject.
Input: x
1. Mark 2s(|x|) cells with a new symbol on each work tape (see
Exercise 18.1), s(|x|) to the left of cell 0 and s(|x|) to the right.
If the Turing machine does not have an extra input tape, the last factor
|x| + 2 is not necessary. It is easy to see that (18.1) is bounded by c^{s(|x|)}
for some constant c only depending on |Q|, |Γ|, and k. (To bound the last
factor |x| + 2, we need the assumption s(n) ≥ log n.)
Remark 18.6 Corollary 18.5 is trivially true for deterministic time classes,
we just have to exchange accepting states with rejecting states.
19 Space and time hierarchies
• We use one tape to simulate M . The ith symbol of the work alphabet
is represented by the string bin(i) and the symbols are separated by
#.
2. Mark s_2(|x|) symbols to the left and right of cell 0 on the first
tape.
3. Simulate M := göd^{-1}_{TM}(g) on x on the first tape (using the
machine U).
5. If the simulation ever leaves the marked cells, then stop and
reject.
6. If more than 3^{s_2(|x|)} steps are simulated, then stop and accept.
(We can count up to this value by marking s_2(|x|) cells and
counting in ternary.)
x. This means that either M performed more than 3^{s_2(|x|)} many steps or
M halts on x and rejects. In the second case, we are done. For the first
case, note that there is a constant c such that M cannot make more than
c^{s_1(|x|)} · (s_1(|x|) + 2) · (|x| + 2) steps without entering an infinite loop.¹ Thus
if 3^{s_2(|x|)} > c^{s_1(|x|)} · (s_1(|x|) + 2) · (|x| + 2), then we get that x ∉ L(M). But
3^{s_2(|x|)} > c^{s_1(|x|)} · (s_1(|x|) + 2) · (|x| + 2) is equivalent to

This is fulfilled by assumption for all long enough x, i.e., for long enough z,
because s_2(|x|) ≥ log(|x|).
The second case is x ∉ L(C_1). We will show that now M accepts x. Note
that C_1 always terminates. If C_1 rejects x, then M ran out of space or M
halted and accepted. In the second case, x ∈ L(M) and we are done. We will
next show that the first case cannot happen. Since M is s_1 space bounded,
the simulation via U needs space |g| · s_1(|x|). But |g| · s_1(|x|) ≤ s_2(|x|) for
sufficiently large |x|. Thus this case cannot happen.
C_1 is s_2 space bounded by construction. This proves the theorem.
The construction of C_2 is similar, even easier. We do not have to check
whether M runs out of space, and we do not need to count to 3^{s_2(|x|)} to detect
infinite loops. Instead we count the number of steps made by C_2. If more
than t_2(|x|) steps are made, then we stop and reject. In this way, C_2 becomes
O(t_2) time bounded. (To get down to t_2, use acceleration.) Since we can
simulate one step of M by |g| steps, the simulation of M takes |g| · t_1(|x|)
steps of C_2, provided that M is t_1 time bounded. This is less than t_2(|x|) if
z is long enough. The rest of the proof is similar.
DSpace(s_1) ⊊ DSpace(s_2).
¹We cannot bound |x| + 2 by d^{s_1(|x|)}, since s_1 might be sublogarithmic.
Next, we do the same for time complexity classes. The result will not be
as nice as for space complexity, since we cannot simulate arbitrary determin-
istic Turing machines by 1-tape Turing machines without any slowdown.
19.3 Remarks
The assumption t_1² = o(t_2) in the proof of the time hierarchy theorem is
needed, since we incur a quadratic slowdown when simulating k-tape Turing
machines by 1-tape Turing machines.
Hennie and Stearns showed the following theorem.
Theorem 19.4 (Hennie & Stearns) Every t time and s space bounded
deterministic k-tape Turing machine can be simulated by an O(t log t) time
bounded and O(s) space bounded deterministic 2-tape Turing machine.
DTime(t_1) ⊊ DTime(t_2).
If the number of tapes is fixed, then one can obtain a tight time hierarchy.
Again we do not give a proof here.
²If you can answer it, we should talk about your dissertation.
We conclude by pointing out that the assumptions that s_2 and t_2 are
constructible are really necessary.
For nondeterministic time, neither of the two approaches is known to work. But
one can get the following hierarchy result: For a function t : N → N, let t̃ be the
function defined by t̃(n) = t(n + 1). If t_1 is time constructible and t̃_1 = o(t_2), then
NTime(t_2) \ NTime(t_1) ≠ ∅.
The proof of this result is lengthy. Note that for polynomial functions or exponential
functions, t̃_1 = O(t_1). Thus we get a tight nondeterministic time hierarchy for these
functions.
20 P and NP
We are looking for complexity classes that are robust in the sense that
“reasonable” changes to the machine model should not change the class.
Furthermore, the classes should also characterize interesting problems.
Definition 20.1

    P = ⋃_{i∈N} DTime(O(n^i)),

    NP = ⋃_{i∈N} NTime(O(n^i)).
P (P stands for polynomial time) is the class of problems that are con-
sidered to be feasible or tractable. Frankly, an algorithm with running time
O(n^{1024}) is not feasible in practice, but the definition above has been very
fruitful. If a natural problem turns out to be in P, then we usually will
have an algorithm whose running time has a low exponent. In this sense, P
contains all languages that we can decide quickly.
NP (NP stands for nondeterministic polynomial time and not for non-
polynomial time) on the other hand, is a class of languages that we would
like to decide quickly. There are thousands of interesting and important
problems in NP for which we do not know deterministic polynomial time
algorithms.
The class P is a robust class. A language that can be decided by a
deterministic Turing machine in polynomial time can be decided by a WHILE
program in polynomial time and vice versa. (This follows easily by inspecting
the simulations that we designed in the first part of the lecture. But read
the excursus in Chapter 16.) This is also true for NP, if we equip WHILE
programs with nondeterminism in a suitable way.
The question whether P = NP is one of the big open problems in com-
puter science. Most researchers believe that these classes are different, but
there is no valid proof so far. The best that we can show is
    NP = ⋃_{i∈N} NTime(O(n^i)) ⊆ ⋃_{i∈N} DTime(2^{O(n^i)}) =: EXP,
20.1 Problems in P
Here is one important problem in P. You may consult your favourite book
on algorithms for many other ones.
s-t-CONN is the problem whether a directed graph has a path from a
given source node s to a target node t:
(G, s, t) is an encoding of the graph G and the source and target nodes s
and t. A reasonable encoding would be the following: All nodes are rep-
resented by numbers 1, . . . , n, written down in binary. We encode an edge
by [bin(i), bin(j)]. We encode the whole graph by building a large pair that
consists of bin(n), bin(s), bin(t), and the encodings of all edges, using our
pairing function. Since we only talk about polynomial time computability,
the concrete encoding does not matter, and we will not specify the encoding
in the following.
We will also just write (G, s, t) or G and will not apply an encoding
function. You are now old enough to distinguish whether we mean the graph
G itself or its encoding.
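For instance, breadth-first search decides s-t-CONN in polynomial time; a standard sketch (ours, using the encoding conventions from above):

    from collections import deque

    def st_conn(n, edges, s, t):
        """Nodes 1..n, edges a list of directed pairs (i, j)."""
        adj = {v: [] for v in range(1, n + 1)}
        for i, j in edges:
            adj[i].append(j)
        seen, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            if v == t:
                return True
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False

    print(st_conn(4, [(1, 2), (2, 3)], 1, 3))  # True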
2. For all x ∉ L and all c ∈ {0, 1}∗, M on input [x, c] reads at most p(|x|)
bits of c and always rejects [x, c].
We denote the language L that M verifies by V(M).
20.3 Problems in NP
There is an abundance of problems in NP. We here just cover the most basic
ones (most likely, even less).
A clique of a graph G = (V, E) is a subset C of V such that for all
u, v ∈ C with u 6= v, {u, v} ∈ E. A clique C is called a k-clique if |C| = k.
Clique is the following language:
Theorem 20.5 Clique, VC, Subset-Sum, HC, TSP, SAT, ℓSAT ∈ NP.
Proof. We show that all of the problems have a polynomial time verifier.
Let’s start with Clique. On input [x, y], a verifier M for Clique first checks
whether x is an encoding of the form (G, k). If not, M rejects. It now
interprets the string y as a list of k nodes of G; for instance, such an encoding
could be bin(i_1)$ . . . $ bin(i_k), where i_1, . . . , i_k are nodes of G. (Since y ∈
{0, 1}∗, we would then map, for instance, 0 ↦ 00, 1 ↦ 01, and $ ↦ 11.) If
y is not of this form, then M rejects. If y has this form, then M checks
whether {i_j, i_h} is an edge of G for all 1 ≤ j < h ≤ k. If yes, then M accepts,
otherwise it rejects.
We have to show that there is a y such that [x, y] ∈ L(M ) iff x = (G, k)
for some graph G that has a k-clique. Assume that x = (G, k) for some
graph G that has a k-clique. Then a list of the nodes that form a clique is
a proof that makes M accept. On the other hand, if G has no k-clique or x
is not a valid encoding, then no proof will make M accept.
For SAT and ℓSAT, an assignment to the variables that satisfies the
formula is a possible proof. For VC, a subset of the nodes of size at most
k that covers all edges is a possible proof. For Subset-Sum, it is the set of
indices I; for HC and TSP, it is the appropriate permutation. The rest of
the proof is now an easy exercise.
21 Reduction and completeness
• f is Turing computable.
• f is total.
• ≤ is transitive.
• ≤_P is transitive.
• If L ≤_P L′ and L′ ∈ P, then L ∈ P.
Proof. We show that SAT ≤_P ℓSAT. It suffices to show this for ℓ = 3. Let
φ be a formula in CNF. We have to map φ to a formula ψ in 3-CNF such
that φ is satisfiable iff ψ is satisfiable.
We replace each clause c of length > 3 of φ by a bunch of new clauses.
Let c = ℓ_1 ∨ · · · ∨ ℓ_k with literals ℓ_κ. Let y_1, . . . , y_{k−3} be new variables. (We
need new variables for each clause.) We replace c by

(ℓ_1 ∨ ℓ_2 ∨ y_1) ∧ (ȳ_1 ∨ ℓ_3 ∨ y_2) ∧ · · · ∧ (ȳ_{k−4} ∨ ℓ_{k−2} ∨ y_{k−3}) ∧ (ȳ_{k−3} ∨ ℓ_{k−1} ∨ ℓ_k)   (21.1)
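A sketch (ours) of this clause splitting; literals are integers, with −v standing for x̄_v, and the new variables y_1, . . . , y_{k−3} get fresh indices:

    def split_clause(clause, next_var):
        """Replace a clause l1 v ... v lk (k > 3) by the 3-clauses of (21.1).
        Returns the new clauses and the next unused variable index."""
        k = len(clause)
        if k <= 3:
            return [clause], next_var
        ys = list(range(next_var, next_var + k - 3))      # new variables
        clauses = [[clause[0], clause[1], ys[0]]]
        for i in range(k - 4):
            clauses.append([-ys[i], clause[i + 2], ys[i + 1]])
        clauses.append([-ys[-1], clause[-2], clause[-1]])
        return clauses, next_var + k - 3

    print(split_clause([1, 2, 3, 4, 5], 6))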
Reducing SAT to ℓSAT was not too hard. (Or at least, it does not look
too unreasonable that one can find such a reduction.) Reducing SAT or ℓSAT
to Clique, for instance, looks much harder, since these problems seem to be
completely unrelated. At first, such reductions look like art, but nowadays
constructing them has become routine work (with some exceptions) and
there is a huge toolbox available.
We have to construct a pair (G, k) such that G = (V, E) has a k-clique iff φ
is satisfiable. We set V = {(1, 1), (1, 2), (1, 3), . . . , (m, 1), (m, 2), (m, 3)}, one
node for each literal of a clause. E is the set of all {(i, s), (j, t)} such that
i ≠ j and ℓ_{i,s} ≠ ℓ̄_{j,t}. In other words, there is no edge {(i, s), (j, t)} iff ℓ_{i,s}
and ℓ_{j,t} cannot be simultaneously set to 1 (because one is the negation of
the other). Finally, we set k = m.
If φ is satisfiable, then there is a satisfying assignment for φ, i.e., an
assignment that assigns to at least one literal of each clause the value 1. Let
ℓ_{1,s_1}, . . . , ℓ_{m,s_m} be these literals. Then (1, s_1), . . . , (m, s_m) form a clique of
size m in G.
Conversely, if G has a clique of size k, then it is of the form (1, s_1), . . . , (m, s_m),
because there is no edge between (i, s) and (i, t) for s ≠ t. Then we can set
all the literals ℓ_{1,s_1}, . . . , ℓ_{m,s_m} to 1 and hence φ is satisfiable.
The mapping φ ↦ (G, k) is obviously polynomial time computable.
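The construction translates directly into code; a sketch (ours), with clauses given as lists of integer literals as before:

    def sat_to_clique(clauses):
        """Nodes are pairs (i, s): occurrence s in clause i. Edges join
        occurrences in different clauses that are not complementary."""
        nodes = [(i, s) for i, c in enumerate(clauses) for s in range(len(c))]
        edges = {frozenset({u, v})
                 for u in nodes for v in nodes
                 if u[0] != v[0]                              # i != j
                 and clauses[u[0]][u[1]] != -clauses[v[0]][v[1]]}
        return nodes, edges, len(clauses)

    # (x1 v x2) ^ (-x1 v x2): a 2-clique exists iff the formula is satisfiable
    nodes, edges, k = sat_to_clique([[1, 2], [-1, 2]])
    print(k, len(edges))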
Theorem 21.15 Clique, VC, Subset-Sum, HC, TSP, SAT, 3SAT, and
AFSAT are NP-complete.
The proof of the theorem follows from the lemmas in this chapter.
[Figure: the tree of reductions among AFSAT, SAT, 3SAT, Clique, VC, Subset-Sum, HC, and TSP used to prove Theorem 21.15.]
22 More reductions
Only the result on Subset-Sum was discussed in class. The results on HC are
not relevant for the exam.
In this chapter, we construct the two missing reductions from the last
chapter. They are more complicated than the ones in the last chapter, but
now you should be old enough to understand them. When you see such re-
ductions for the first time, they look like complicated magic, but constructing
them has become a routine job, with some notable exceptions.
22.1 Subset-Sum
We start with the proof of Lemma 21.12. As Exercise 21.2 suggests, the
instances created by the reduction will use large numbers, that is, numbers
whose size is exponential in the number of clauses of the Boolean formula
(or equivalently, the length of the binary representation will be polynomial
in m).
Proof of Lemma 21.12. Let φ be a formula in 3-CNF. We have to construct
an instance of Subset-Sum, i.e., numbers a_1, . . . , a_t, b such that there
is a subset I ⊆ {1, . . . , t} with Σ_{i∈I} a_i = b iff φ is satisfiable.
Let x_1, . . . , x_n be the variables of φ. Let c_1, . . . , c_m be the clauses of φ.
For each literal ℓ we will construct a number a(ℓ) as follows: The number
a(ℓ) is of the form a_0(ℓ) + 10^n · a_1(ℓ). The first part a_0(ℓ) is the variable part,
the second part a_1(ℓ) is the clause part. For a variable x_ν, let c_{µ_1}, . . . , c_{µ_{s_ν}}
be the clauses in which it appears positively, or in other words, c_{µ_1}, . . . , c_{µ_{s_ν}}
are the clauses that contain the literal x_ν. Then

a(x_ν) = 10^{ν−1} + 10^n (10^{µ_1−1} + · · · + 10^{µ_{s_ν}−1}).

For a literal x̄_ν, let c_{µ̄_1}, . . . , c_{µ̄_{s̄_ν}} be the clauses that contain the literal x̄_ν.
Then

a(x̄_ν) = 10^{ν−1} + 10^n (10^{µ̄_1−1} + · · · + 10^{µ̄_{s̄_ν}−1}).
Choosing a(x_ν) indicates that we set x_ν to 1. Choosing a(x̄_ν) indicates that
we set x̄_ν to 1, i.e., x_ν to 0. Of course, we can set x_ν either to 1 or to 0. This
means that we shall only be able to select one of a(x_ν) and a(x̄_ν). Thus, in
the “target number” b = b_0 + 10^n b_1, we set b_0 = 1 + 10 + · · · + 10^{n−1}.
The numbers a(x_ν) and a(x̄_ν) have digits 0 or 1. For each position 10^i,
there are at most 3 numbers that have digit 1 at position 10^i. In the variable
part, this is clear, since only a(x_i) and a(x̄_i) have a 1 in position 10^i. In
the clause part, this is due to the fact that each clause consists of at most
three literals. Since our base 10 is larger than 3, in the sum of any subset
of a(x_ν), a(x̄_ν), 1 ≤ ν ≤ n, no carry can occur. (We could have chosen a
smaller base, but 10 is so convenient.) This means that any sum that yields
b_0 in the lower n digits contains either a(x_ν) or a(x̄_ν), 1 ≤ ν ≤ n. This
ensures consistency; that means we can read off a corresponding assignment
from the chosen numbers.
Finally, we have to ensure that the assignment is also satisfying. This
is done by choosing b_1 properly. Each clause should be satisfied, so a first
try would be to set b_1 = 1 + 10 + · · · + 10^{m−1}. But a clause c_µ could be
satisfied by two or three literals; in this case the digit at position 10^{n−1+µ} is 2 or
3. The problem is that we do not know in advance whether it is 1, 2, or 3.
Therefore, we set b_1 = 3(1 + 10 + · · · + 10^{m−1}) and introduce “filler numbers”
c_{µ,1} = c_{µ,2} = 10^{n−1+µ}, 1 ≤ µ ≤ m. We can use these filler numbers to reach
the digit 3 in position 10^{n−1+µ}. But to reach 3, at least one 1 has to come
from one of the numbers a(ℓ); thus the clause is satisfied if we reach 3.
Overall, the considerations above show that φ has a satisfying assignment
iff a subset of the a(x_ν), a(x̄_ν), 1 ≤ ν ≤ n, and c_{µ,1}, c_{µ,2}, 1 ≤ µ ≤ m, sums up to
b. Thus the reduction above is a polynomial time many-one reduction from
3SAT to Subset-Sum.
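A hedged sketch (ours) of the whole construction; it builds the numbers a(x_ν), a(x̄_ν), the filler numbers, and the target b in base 10:

    def sat_to_subset_sum(clauses, n):
        """clauses: lists of integer literals, -v for the negated variable v;
        variables are numbered 1..n."""
        m = len(clauses)
        nums = []
        for v in range(1, n + 1):
            for lit in (v, -v):                      # a(x_v) and a(x-bar_v)
                a = 10**(v - 1)                      # variable part
                for mu, c in enumerate(clauses):
                    if lit in c:                     # clause part
                        a += 10**(n + mu)
                nums.append(a)
        nums += [10**(n + mu) for mu in range(m) for _ in range(2)]  # fillers
        b = sum(10**i for i in range(n)) + 3 * sum(10**(n + mu) for mu in range(m))
        return nums, b

    print(sat_to_subset_sum([[1, 2], [-1, 2]], 2))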
Figure 22.1: The gadget for the reduction: u_1, . . . , u_s are the nodes such that
there is an edge (u_i, v) ∈ E, that is, an edge entering v. The nodes w_1, . . . , w_t
are the nodes such that there is an edge (v, w_j) ∈ E, that is, an edge leaving
v. The two lists u_1, . . . , u_s and w_1, . . . , w_t need not be disjoint. The right-hand
side shows the gadget. Every node is replaced by three nodes, v, v_in, v_out.
For every directed edge (x, y), we add the undirected edge {x_out, y_in}.
three nodes v_in, v, v_out and connect v_in with v and v_out with v.¹ Then for
every directed edge (x, y) ∈ E, we add the undirected edge {x_out, y_in} to E′.
Figure 22.1 shows this construction.
Given G, we can construct G′ in polynomial time. Thus what remains to
show is the following: G has a Hamiltonian cycle iff G′ has a Hamiltonian cycle.
Assume that G has a Hamiltonian cycle C. Then we get a Hamiltonian cycle
of G′ as follows. For every edge (x, y) of C, we take the edge {x_out, y_in}.
Furthermore, we add the edges {v_in, v} and {v, v_out} for every node v. This
gives a Hamiltonian cycle of G′.
For the converse direction, observe that every node in a Hamiltonian cycle
is incident with two edges of the cycle. Since every node v is only incident with
two edges in G′, the edges {v_in, v} and {v, v_out} have to be in a Hamiltonian
cycle in G′. The other edges of the Hamiltonian cycle in G′ induce a Hamiltonian
cycle in G.
For 3SAT, we need a more complicated gadget.
¹Such a thing is usually called a gadget. We replace a node or edge (or something like
this) by a small graph (or something like this). The term gadget is used in an informal
way; there is no formal definition of a gadget.
Figure 22.4: No matter how we connect a1 with b2 , there are always inner
nodes left that are not covered and cannot be covered by other paths.
1. For every nonempty subset S ⊆ {(a_1, a_2), (b_1, b_2), (c_1, c_2)}, there are
node-disjoint paths from s to t for all (s, t) ∈ S such that all inner
nodes of G lie on one of these paths.
Proof. We start with the first part: Figure 22.3 shows these paths in the
case of one, two, or three pairs. Only the number of pairs in S matters, since
the structure of the gadget G is invariant under simultaneous cyclic shifts of
the nodes a1 , b1 , c1 and a2 , b2 , c2 .
For the second part, consider any other pair. Since the gadget is invariant
under cyclic shifts, it is enough to consider the pair (a_1, b_2) and the pair
(a_2, b_1). Figure 22.4 shows all possibilities of how to connect a_1 with b_2. In
each case, inner nodes are not covered, and it is not possible to cover all of
them with other paths. The other pair is an exercise.
Exercise 22.1 Draw the corresponding figures for the pair (a2 , b1 ).
c Markus Bläser 2007–2015
152 22. More reductions
corresponds to the fact that x_i is set to 1; the other one corresponds to the
case that x_i is set to 0. Let C_j be the gadget that represents c_j. Assume
that x_i occurs positively in clauses c_{j_1}, . . . , c_{j_s} and negatively in the clauses
c_{k_1}, . . . , c_{k_t}. Then an edge goes from x_i to C_{j_1}. If it is the first literal in c_{j_1},
then this edge goes to the node a_1; if it is the second, then it enters through
b_1; and if it is the third, it uses c_1. Then there is an edge from C_{j_1} to C_{j_2}.
It leaves C_{j_1} through the node corresponding to the entry node, i.e., if we
entered C_{j_1} through a_1, we also leave it through a_2, and so on. Finally,
the edge leaving C_{j_s} goes to x_{i+1}. The second path is constructed in the
same manner and goes through C_{k_1}, . . . , C_{k_t}. Every clause gadget appears
on one, two, or three paths, depending on the number of its literals. Finally,
we remove all the a_i, b_i, and c_i nodes, i = 1, 2. For each such node, if there
is one edge going into it and a second one leaving it, then we replace these
two edges by one edge going from the start node of the first edge to the end
node of the second edge. When such a node is only incident with one edge,
we remove it and its edge completely.
Figure 22.5 shows an example. x_2 is the first literal of c_3, the third of c_5,
and the first of c_8. x̄_2 is the second literal of c_2.
Let G be the graph constructed from φ. G can certainly be constructed in
polynomial time. So it remains to show that φ has a satisfying assignment
iff G has a Hamiltonian cycle.
For the “⇒”-direction, let a be a satisfying assignment of φ. We construct
a Hamiltonian cycle as follows. If a(x_i) = 1, we use all the edges of the path
to x_{i+1} that contains the clauses in which x_i occurs positively. In the other
case, we use the other path. Since a is a satisfying assignment, at least one
of the inner nodes of each clause gadget that were to the right of a_1, b_1, or c_1
is incident with one of these edges. And by construction, so are the
corresponding inner nodes that were to the left of a_2, b_2, or c_2. By the first
part of Lemma 22.2, we can connect the corresponding pairs such that all
inner nodes of the gadget lie on a path. This gives a Hamiltonian cycle of G.
For the converse direction, let H be a Hamiltonian cycle of G. By the
second part of Lemma 22.2, when the cycle enters a clause gadget through
the inner node that was to the right of a_1, it leaves it through the inner node
to the left of a_2, and so forth. This means that the next variable node that
the cycle visits after x_i is x_{i+1}. Since only one edge can leave x_i, the cycle
either goes through the path with the positive occurrences of x_i or through
the path with the negative occurrences of x_i. In the first case, we set x_i to 1;
in the second, to 0. Since H is a Hamiltonian cycle, it goes through each
clause gadget at least once. Hence this assignment will be a satisfying assignment.
[Figure 22.5: the two paths between the variable nodes x2 and x3, one through the clause gadgets C3, C5, and C8 and one through C2, shown before and after removing the ai, bi, and ci nodes.]
D Proof of the Cook–Karp–Levin theorem
Reducing 3SAT to Subset-Sum, for instance, was a hard job, because the
problems look totally different. To show that SAT is NP-hard, we have to
reduce any language in NP to SAT. The only thing that we know about such
an L is that there is a polynomial time bounded nondeterministic Turing
machine M with L(M) = L. Thus we have to reduce the question whether
a Turing machine M accepts a word x to the question whether some formula
in CNF is satisfiable. (This makes 3SAT ≤_P Subset-Sum look like a picnic.)
The general reduction scheme looks as follows:
The general reduction scheme looks as follows:
Turing machines
↓
oblivious Turing machines
↓
Boolean circuits
↓
CSAT
↓
SAT
Exercise D.1 Every Boolean function f : {0, 1}^n → {0, 1} can be computed
by a Boolean circuit of size 2^{O(n)}. (Remark: This can be sharpened to (1 +
ε) · 2^n/n for any ε > 0. The latter bound is tight: For any ε > 0 and any
large enough n, there is a Boolean function f : {0, 1}^n → {0, 1} such that
every circuit computing f has size (1 − ε)2^n/n.)
3. C computes the function {0, 1}∗ → {0, 1} given by x ↦ C_{|x|}(x). Since
we can interpret this as a characteristic function, we also say that C
decides a language. We write L(C) for this language.
poor circuit simulate the behaviour of the Turing machine on all inputs of
length n. The idea is to tame the Turing machine.
Lemma D.4 Let t be time constructible. For every t time bounded deter-
ministic Turing machine M , there is an oblivious O(t2 ) time bounded 1-tape
deterministic Turing machine S with L(M ) = L(S).
feed the input bits into these packets of edges that correspond to the n cells
that contain the input. In all other packets, we feed constants that encode
the blank, say 0 . . . 01. Into the edges that correspond to the state, we feed
the constants that encode the start state, that is, 0 . . . 0. After the
last layer, we feed the edges that carry the state into a small circuit E that
outputs 1 iff the encoded state is accepting, and 0 otherwise. See Figure D.1
for a sketch of C_n.
On input 1^n, a Turing machine N can construct C_n as follows: C_n has a
very regular structure, so N constructs it layer by layer. The circuit D can
be “hard-wired”¹ into N, since the size of D is finite. The only problem is to
find out where to place D. But since M is oblivious, it suffices to simulate
M on 1^n. This simulation also gives us the number of layers that C_n has,
namely the number of steps that M performs.
Since M is polynomial time bounded, the family (C_n) is polynomial time
uniform.
Proof. Let L ∈ NP and let M be a polynomial time verifier for it. We can
assume that M is oblivious. Let p be the polynomial that bounds the length
of the certificates. We can also assume that all certificates y such that M
accepts [x, y] have length exactly p(|x|). To do so, we can for instance replace
each 0 of y by 00 and each 1 by 11 and pad the certificate by appending 01.
This doubles the length of the certificates, which is fine.
We saw in Lemma D.5 that for any oblivious polynomial time bounded
Turing machine, there is a polynomial time uniform family of polynomial
size circuits C_i that decides the same language.
Now our reduction works as follows: Since for each x of length n, all
interesting certificates (certificates such that M might accept [x, y]) have
the same length, all interesting pairs [x, y] have the same length ℓ(n), which
depends only on n. Given x, we construct C_{ℓ(|x|)}. Then we construct a
circuit with n + p(n) inputs that, given x and y, computes [x, y], and use its
output as the input to C_{ℓ(|x|)}. Finally, we specialize the inputs belonging to
the symbols of the first part of the input to x. Our reduction simply maps
x to this circuit.
¹This means that there is a “subroutine” in N that prints D on the tape.
Figure D.1: The circuit C_n that simulates the Turing machine on inputs of
length n. At the top, the tape of the Turing machine is shown. The input
is 010. The Turing machine moves its head two times to the right during its
first two steps. The states are a subset of {0, 1}³ and the symbols of the tape
alphabet Γ are encoded as words from {0, 1}². Since 0 ∈ Γ is encoded by 00
and 1 ∈ Γ is encoded by 11, we can just duplicate the input node x_ν and
feed the two edges into D. A blank is represented by 01. There are “edges”
at the bottom that do not end in any nodes. They do not actually appear in
C_n; we have just drawn them to depict the regular structure of the layers.
Part III
Formal languages
23 Finite automata and regular languages
Figure 23.1: A finite automaton that models the coffee vending machine. A
label of 1 or 2 on the edge means that the customer has inserted 10 or 20
cents, respectively. The label B means that the “Money back” button was
pressed.
10c, 20c, and 30c correspond to the amount of money inserted so far, the
state brew is entered if at least 40 cents have been inserted. Of course, the
machine starts in the state 0c. This is indicated by the triangle on the left
side of the circle. Figure 23.1 shows a diagram of the automaton COFFEE.
An arc from one state to another means that if the customer performs the
action the edge is labeled with, then the automaton will change the state
accordingly. Once the state brew is reached, the machine is supposed to
brew a coffee. A clever coffee machine would then go back to the start state
but we leave our machine as it is for now.
Exercise 23.1 1. Modify the coffee automaton such that it gives change
back. The amount of change should be indicated by the state that the
automaton ends in.
2. Modify the coffee automaton such that the customer has the choice
between several types of coffees.
• Finite automata can be used for string matching, see for instance Chap-
ter 32 in “Introduction to Algorithms” by Cormen, Leiserson, Rivest,
and Stein.
In other words, M has no memory and can read each symbol of the input
once, and that's it. “No memory” means no memory on the tape; M can store
a finite amount of information in its states. This is—surprise!—the reason
why M is called a finite automaton.
After throwing away all the things that are not necessary, this is what is
left of the Turing machine:
(a) s0 = q0 ,
(b) for all 0 ≤ ν < n: δ(sν , wν+1 ) = sν+1 ,
²This renders the first condition useless, but we keep it for aesthetic reasons.
Figure 23.2: The finite automaton M1 . The double circle around state 3
indicates that 3 is an accepting state.
set

    δ∗(q, ε) = q,
    δ∗(q, wσ) = δ(δ∗(q, w), σ) if δ∗(q, w) is defined, and undefined otherwise.
The first line basically states that if the automaton reads the empty word
(which it cannot do), then it stays in its current state. Next, we get
δ∗(q, σ) = δ(δ∗(q, ε), σ) = δ(q, σ) for all q ∈ Q, σ ∈ Σ. So for words of length
1, δ and δ∗ coincide, which is what we want. Intuitively, δ∗(q, w) is the state
that the automaton reaches if it starts in state q and then reads w.
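The inductive definition translates directly into code; a sketch (ours) with δ given as a Python dictionary:

    def delta_star(delta, q, w):
        """delta: dict mapping (state, symbol) to a state (partial)."""
        if w == "":
            return q                       # delta*(q, eps) = q
        r = delta_star(delta, q, w[:-1])   # process the prefix first
        if r is None:
            return None                    # "undefined" propagates
        return delta.get((r, w[-1]))       # then read the last symbol

    # a two-state automaton that flips its state on every 1:
    delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0}
    print(delta_star(delta, 0, "101"))     # 0: an even number of 1's read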
Exercise 23.2 The automaton M1 basically searches for the string 000 in
the word w. Design a similar automaton that searches for the sequence 01011.
Can you devise an algorithm that given any sequence s, constructs an au-
tomaton that searches for s in a given word w?
Lemma 23.6 Let M_1 = (Q_1, Σ, δ_1, q_{0,1}, Q_{acc,1}) and M_2 = (Q_2, Σ, δ_2, q_{0,2}, Q_{acc,2})
be two finite automata such that δ_1 and δ_2 are total functions. Then the transition
function ∆ defined by

    ∆ : (Q_1 × Q_2) × Σ → Q_1 × Q_2
    ((q_1, q_2), σ) ↦ (δ_1(q_1, σ), δ_2(q_2, σ))

fulfills

    ∆∗((q_1, q_2), w) = (δ_1∗(q_1, w), δ_2∗(q_2, w))

for all q_1 ∈ Q_1, q_2 ∈ Q_2, and w ∈ Σ∗.
Theorem 23.7 REG is closed under intersection, union, and set difference,
i.e., if A, B ⊆ Σ∗ are regular languages, then A ∩ B, A ∪ B, and A \ B are
regular, too.
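A sketch (ours) of the product construction behind Lemma 23.6 and Theorem 23.7; choosing the accepting pairs differently yields intersection, union, or difference:

    def product_automaton(d1, d2, q01, q02, acc1, acc2, alphabet, mode="and"):
        """d1, d2: total transition dicts; states are pairs, and both
        automata run in parallel on the same input."""
        states1 = {p for (p, _) in d1}
        states2 = {q for (q, _) in d2}
        delta = {((p, q), s): (d1[(p, s)], d2[(q, s)])
                 for p in states1 for q in states2 for s in alphabet}
        if mode == "and":                      # A intersected with B
            acc = lambda p, q: p in acc1 and q in acc2
        elif mode == "or":                     # A union B
            acc = lambda p, q: p in acc1 or q in acc2
        else:                                  # A \ B
            acc = lambda p, q: p in acc1 and q not in acc2
        return delta, (q01, q02), acc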
24 Nondeterministic finite automata
In the last chapter, we showed that REG is closed under the operations
complementation, union, and intersection. In this chapter, among other
things, we want to show the closure under concatenation and Kleene closure.
AB = {wx | w ∈ A, x ∈ B}.
A∗ = {x1 x2 . . . xm | m ≥ 0 and xµ ∈ A, 1 ≤ µ ≤ m}
2. {aa, b}∗ = {ε, aa, b, aaaa, aab, baa, bb, aaaaaa, aaaab, . . . } is the set of
all words in which a’s always occur in pairs.
1. ∅A = A∅ = ∅ for all A ⊆ Σ∗ .
24.1 Nondeterminism
When we showed that REG is closed under union or intersection, we took
two automata for A, B ∈ REG and constructed another automaton out of
these two automata for A ∩ B or A ∪ B. Here is an attempt for AB: w ∈ AB
if there are x ∈ A and y ∈ B such that w = xy. So we could first run A on x
and then B on y. The problem is that we do not know when we leave x and
enter y. The event that A enters an accepting state is not enough; during
the computation on x, A can enter and leave accepting states several times.
For instance, let A = {x ∈ {0, 1}∗ | the number of 0’s in x is even} and
B = {y ∈ {0, 1}∗ | the number of 1's in y is odd}. What does an automaton
for AB look like? In a first part, we have to count the 0's modulo 2. At
some point, we have to switch and count the 1’s modulo 2. Figure 24.1 shows
an automaton for AB. The part consisting of the states 0-even and 0-odd
counts the 0’s modulo 2. From the state 0-even, we can go to the second part
of the automaton consisting of the states 1-even and 1-odd. This part counts
the number of 1’s modulo 2. The state 0-even is left by two arrows that are
labeled with 0 and two arrows that are labeled with 1. So this automaton is
nondeterministic. We introduced nondeterminism in the context of Turing
machines and the concept is the same for automata (which are restricted
Turing machines): The automaton can choose which of the transitions it
will make. The automaton accepts a word if there is a sequence of choices
such that the automaton ends in an accepting state. Among other things,
we use nondeterminism here to construct a nondeterministic automaton for
AB. The amazing thing is that, in contrast to general Turing machines, we
know how to simulate a nondeterministic finite automaton by a deterministic
one (without any time loss).
Another way to introduce nondeterminism are ε-transitions. These are
arrows in the transition diagram that are labeled with ε. This means that the
automaton may choose to make the ε-transition without reading a symbol
of the input. Figure 24.2 shows an automaton with ε-transitions for AB.
[Figure 24.1: a nondeterministic automaton for AB with states 0-even, 0-odd, 1-even, and 1-odd; from 0-even one can nondeterministically switch to the second part.]
0
0−even o−odd
0
0 0
1
1−even 1−odd
1
(a) s0 = q0 .
(b) for all 0 ≤ µ < m, sµ+1 ∈ δ(sµ , uµ+1 ).
and σ ∈ Σ, δ^{(ε)}(q, σ) denotes all states that we can reach from q by making
an arbitrary number of ε-transitions and then one transition that is labeled
with σ. (And we are not allowed to make any ε-transitions afterwards.)
Formally,

    δ^{(ε)} : Q × Σ → P(Q)
    (q, σ) ↦ {r | there are k ≥ 0 and s_0 = q, s_1, . . . , s_k such that
              s_{κ+1} ∈ δ(s_κ, ε), 0 ≤ κ < k, and r ∈ δ(s_k, σ)}.

For a subset R ⊆ Q of the states, R^{(ε)} denotes all the states in Q from which
we can reach a state in R just by ε-transitions. Formally,

    R^{(ε)} = {r ∈ Q | there are k ≥ 0 and s_0 = r, s_1, . . . , s_k such that
              s_{κ+1} ∈ δ(s_κ, ε), 0 ≤ κ < k, and s_k ∈ R}.
δ ∗ (q, x) are all the states that we can reach if we start from q, read x, and
do not allow any ε-transition after reading the last symbol of x.
Figure 24.3: The computation tree of the automaton from Figure 24.1 on the
word 01011. The root is the start state 0-even. Then the automaton reads a
0. It has two possibilities: either it moves to 0-odd or to 1-even. These two
possibilities are represented by the two children. There are four different
computations on 01011, two of them are accepting. These two accepting
computations correspond to splitting 01011 either as ε ∈ A and 01011 ∈ B
or 0101 ∈ A and 1 ∈ B.
Proof overview: By Lemma 24.6, we can assume that M does not have
any ε-transitions. Look at the computation tree of M on some word w of
length n. Each level i of the tree contains all the states that M can reach
after reading the first i symbols of w. If the nth level contains a state from
Qacc , then M accepts w. A deterministic automaton has to keep track of all
the states that are in one level. But it has to store them in one state. The
solution is to take P(Q), the power set of Q, as Q̂, the set of states of M̂ .
Proof. By Lemma 24.6, we can assume that M does not have any ε-
transitions. We set Q̂ = P(Q) and q̂0 = {q0 }. We define the transition
function δ̂ as follows:

    δ̂(R, σ) = ⋃_{r∈R} δ(r, σ) for all R ∈ Q̂ and σ ∈ Σ.
(Note that δ̂ is a function Q̂ × Σ → Q̂. It does map into the power set of Q,
but not into the power set of Q̂. Thus M̂ is deterministic!) Finally, we set
Q̂acc = {R ∈ Q̂ | R ∩ Qacc ≠ ∅}.
We now show by induction on the length of w ∈ Σ∗ that for all R ⊆ Q
and w ∈ Σ∗,

    δ̂∗(R, w) = ⋃_{r∈R} δ∗(r, w).
Above, the second equality follows from the induction hypothesis and the
third equality follows from the definition of δ̂. M̂ accepts w iff δ̂∗(q̂_0, w) ∈
Q̂acc. M accepts w iff δ∗({q_0}, w) ∩ Qacc ≠ ∅. From the definition of Q̂acc it
follows that M̂ accepts w iff M accepts w. Thus L(M̂) = L(M).
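A sketch (ours) of the subset construction, building only the part of P(Q) that is reachable from {q_0}:

    def subset_construction(delta, q0, acc, alphabet):
        """delta: dict mapping (state, symbol) to a *set* of states (no
        eps-transitions). Returns the reachable deterministic automaton."""
        start = frozenset({q0})
        d_hat, todo, seen = {}, [start], {start}
        while todo:
            R = todo.pop()
            for s in alphabet:
                T = frozenset(q for r in R for q in delta.get((r, s), set()))
                d_hat[(R, s)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
        acc_hat = {R for R in seen if R & acc}   # R intersects Q_acc
        return d_hat, start, acc_hat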
[Figure: a four-state nondeterministic automaton (states q0, q1, q2, q3) witnessing the exponential blow-up of the subset construction.]
Exercise 24.3 Why does this approach not work for general nondetermin-
istic Turing machines?
Somehow one feels cheated when seeing the subset construction, but it is
correct. The deterministic finite automaton pays for being deterministic by a
huge increase in the number of states. If the nondeterministic automaton has
n states, the deterministic automaton has 2^n states, and there are examples
where this is (almost) necessary. Here is one such example.
M′. Formally,

    γ_1(q, σ) = δ(q, σ)    if q ∈ Q and σ ≠ ε,
    γ_1(q, σ) = δ′(q, σ)   if q ∈ Q′ and σ ≠ ε,
    γ_1(q, σ) = q′_0       if q ∈ Qacc and σ = ε,
    γ_1(q, σ) = undefined  otherwise.
Exercise 24.4 Give an example showing that, in the proof above, we need the
extra state r in N_2, i.e., show that it is not correct to create an ε-transition
from each accepting state to the start state and make the start state an
accepting state, too.
25 Regular expressions
Above, ∅ and ε are symbols that represent the empty set and the set
{ε}, but they are not the empty set or the empty word themselves. Since
distinguishing the symbols typographically usually looks awkward, we simply
write ∅ and ε for both. It is usually clear from the context whether we mean
the symbols for the empty set and the empty word or the objects themselves.
25.1.2 Semantics
Definition 25.2 Let E be a regular expression. The language L(E) denoted
by E is defined inductively:
1. If E = ∅, then L(E) = ∅.
If E = ε, then L(E) = {ε}.
The symbol ∅ represents the empty set and the symbol ε represents the
set that contains solely the empty word. A symbol σ ∈ Σ represents the
set that contains the symbol itself. These three cases form the basis of the
definition. Next come the cases where E is composed of smaller expressions.
The operator “+” corresponds to the union of the corresponding languages,
the concatenation of the expression corresponds to the concatenation of the
corresponding languages, and the “∗”-operator stands for the Kleene closure.
Union, concatenation, and Kleene closure are also called the regular operations.
¹But notice that the languages of all valid source codes are usually not regular. They
are usually context-free, a concept that we will meet after Christmas. The use of “usually
context-free” is questionable here; in particular, Bodo disagrees. The set of source codes
of a “pure” and “simple” programming language like WHILE is context-free; the set of an
overloaded one like C++ or JAVA usually is not.
7. E + E = E (union is idempotent),
10. (E ∗ )∗ = E ∗ ,
11. ∅∗ = ε,
12. ε∗ = ε.
Proof. We only prove the first, sixth, and tenth part. The rest is left as
an exercise.
For the first part, we use the fact that the union of sets is commutative:
L(E + F ) = L(E) ∪ L(F ) = L(F ) ∪ L(E) = L(F + E).
Part 6: For any two languages A and B, AB is the set of all words w = ab
with a ∈ A and b ∈ B. If one of A and B is empty, then no such word w
exists, thus AB = ∅ in this case.
Part 10: Let L := L(E). L∗ ⊆ (L∗ )∗ is clear, since (L∗ )∗ is the set of
all words that we get by concatenating an arbitrary number of words from
1. (E + F )∗ = (E ∗ F ∗ )∗ .
2. ε + EE ∗ = E ∗ .
3. (ε + E)∗ = E ∗ .
• Finally, we group all unions. Again, the order does not matter, but we
will do it from the left to the right by convention.
Exercise 25.3 Construct finite automata that accept the languages ∅, {ε},
and {σ} for σ ∈ Σ.
L(E^n_{i,j}) is exactly the set of all strings w such that δ∗(i, w) = j, i.e., if M starts
Proof. By induction on k, we construct expressions E^k_{i,j} such that

    L(E^k_{i,j}) = {w | δ∗(i, w) = j and for each prefix w′ of w
                   with w′ ≠ ε and w′ ≠ w, δ∗(i, w′) ≤ k}   (25.1)

for all 1 ≤ i, j ≤ n and 0 ≤ k ≤ n.
Induction base: If k = 0, then M is not allowed to enter any intermediate
state when going from i to j. If i ≠ j, then this means that M can only take
an arc directly from i to j. Let σ_1, . . . , σ_t be the labels of the arcs from i to j.
Then E^0_{i,j} = σ_1 + σ_2 + · · · + σ_t, with the convention that this means E^0_{i,j} = ∅
if t = 0, i.e., there are no direct arcs from i to j. If i = j, then let σ_1, . . . , σ_t
be the labels of all arcs from i to itself. Now we set E^0_{i,i} = ε + σ_1 + · · · + σ_t.
It is clear from the construction that E^0_{i,j} fulfills (25.1).
Induction step: Assume that we have found regular expressions such that
(25.1) holds for some k. We have to construct the expressions E^{k+1}_{i,j}. A path
that goes from i to j such that all states in between are from {1, . . . , k + 1} can go
from i to j with only going through states from {1, . . . , k}. For this, we already
know a regular expression, namely E^k_{i,j}. Or, when going from i to j, we
Remark 25.7 The algorithm above does essentially the same as the Floyd–
Warshall algorithm for computing all-pairs shortest paths. If we replace
E^{k+1}_{i,j} = E^k_{i,j} + E^k_{i,k+1}(E^k_{k+1,k+1})∗ E^k_{k+1,j} by d^{k+1}_{i,j} = min{d^k_{i,j}, d^k_{i,k+1} + d^k_{k+1,j}}, we can
compute the shortest distances between all pairs of nodes.
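A sketch (ours) of the recurrence, with regular expressions built as strings; note the same triple loop as in Floyd–Warshall:

    def dfa_to_regex(n, label):
        """label[(i, j)]: a regular expression (a string) for the direct
        arcs from i to j, if any; states are 1..n. E[i][j] plays E^k_{i,j}."""
        E = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):                  # base case k = 0
            for j in range(1, n + 1):
                e = label.get((i, j))
                E[i][j] = ("eps" if e is None else "eps+" + e) if i == j else e
        for k in range(1, n + 1):                  # now allow state k in between
            newE = [row[:] for row in E]
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if E[i][k] is not None and E[k][j] is not None:
                        via = "(%s)(%s)*(%s)" % (E[i][k], E[k][k], E[k][j])
                        newE[i][j] = via if E[i][j] is None else E[i][j] + "+" + via
            E = newE
        return E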
26 The pumping lemma
A = {0^n 1^n | n ∈ N}.
If there were a finite automaton for A, then it would have to keep track of
the number of 0's that it has read so far and compare it with the number of
1's. But in a finite number of states, you can only store a finite amount of
information, and M potentially has to be able to store an arbitrarily large
amount of information, namely n. (Warning! Never ever write this in an
exam! This is just an intuition. Maybe there is a way other than counting
to check whether the input is of the form 0^n 1^n—there is not, but you have
to give a formal proof.)
and s_{m+n′+ℓ} ∈ Qacc. Since s_m, . . . , s_{m+n′} are more than n states, there are
indices m ≤ j_1 < j_2 ≤ m + n′ such that s_{j_1} = s_{j_2} by the pigeonhole principle.
Let v = xyz with |x| = j_1 − m and |y| = j_2 − j_1 > 0. Then δ∗(q_0, ux) = s_{j_1},
δ∗(s_{j_1}, y) = s_{j_2}, and δ∗(s_{j_2}, zw) ∈ Qacc. Since s_{j_1} = s_{j_2},
B = {x ∈ {0, 1}∗ | the number of 0’s equals the number of 1’s in x}.
Proving that a language L is not regular via the pumping lemma can
be considered as a game between you and an adversary, for instance
your professor.
3. Your professor picks words x, y, z such that v = xyz and |y| >
0.
4. You pick an i ∈ N.
To show that a language L is not regular you can show that it does not
satisfy the condition of the pumping lemma. But you cannot prove that L
is regular by showing that L fulfills the condition of the pumping lemma.
There are non-regular languages that fulfill the condition of the pumping
lemma.
2. L = {0^p | p is prime},
3. L = {a^i b^j c^k | i + j = k}.
E The Myhill-Nerode theorem and minimal automata
Remark E.3 If all states of M are reachable from the start state, i.e., for
all q ∈ Q, there is an x ∈ Σ∗ such that δ ∗ (q0 , x) = q, then index(≡M ) = |Q|.
1. L is regular.
L = {x ∈ Σ* | δ*(q0, x) ∈ Qacc} = ⋃_{x : δ*(q0, x) ∈ Qacc} [x]_M.
Although the union is over infinitely many words x, only finitely many dis-
tinct equivalence classes appear in it.
2. =⇒ 3.: Let R be a right invariant equivalence relation with finite index such that L = [x1]_R ∪ · · · ∪ [xt]_R. We show that R is a refinement of ∼L, that is, every equivalence class C of R is a subset of some equivalence class C′ of ∼L. So let x, y ∈ C. Since R is right invariant, xz and yz lie in the same equivalence class of R for every z ∈ Σ*. Because L is a union of equivalence classes of R, this means that
xz ∈ L ⇐⇒ yz ∈ L for all z ∈ Σ∗ .
Thus x ∼L y.
3. =⇒ 1.: Given ∼L, we construct a deterministic finite automaton M = (Q, Σ, δ, q0, Qacc) with L = L(M). We set

• Q = {[x]∼L | x ∈ Σ*}, which is a finite set since ∼L has finite index,
• q0 = [ε]∼L,
• δ([x]∼L, σ) = [xσ]∼L for all x ∈ Σ* and σ ∈ Σ, which is well defined since ∼L is right invariant, and
• Qacc = {[x]∼L | x ∈ L}, which is well defined,

since the words in an equivalence class of ∼L are either all in L or all not in L.
Exercise E.1 Show that δ ∗ ([ε]∼L , x) = [x]∼L for all x ∈ Σ∗ in the “3. =⇒
1.”-part of the proof of the Myhill–Nerode theorem.
Pumping lemma: often easy to apply but does not always work
1. i(δ(q, σ)) = δ′(i(q), σ) for all q ∈ Q and σ ∈ Σ,
2. i(q0) = q′0,
3. i(Qacc) = Q′acc.

Such a mapping i is called an isomorphism. The first condition says that the following diagram commutes:

[Diagram omitted: applying δ(·, σ) in M and then i yields the same state as applying i first and then δ′(·, σ) in M′.]
Together with the second and third condition, this means that two isomor-
phic automata are the same up to renaming the states.
Proof. Part 1: If we combine the “1. =⇒ 2.”- and the “2. =⇒ 3.”-part of the proof of the Myhill-Nerode theorem, we see that the relation ≡M′ is a refinement of ∼L. Thus |Q′| ≥ index(≡M′) ≥ index(∼L).
Part 2: Assume now that |Q′| = |Q|. This means that index(∼L) = index(≡M′). Since ≡M′ is a refinement of ∼L, the equivalence classes of both relations are the same and hence both relations coincide. In particular, we can simply write [x] for the equivalence class of x in either of the two relations. Let
b : Q′ → Q, q′ ↦ [x], where x is chosen such that (δ′)*(q′0, x) = q′.

We claim that

1. b(δ′(q′, σ)) = δ(b(q′), σ) for all q′ ∈ Q′ and σ ∈ Σ, and
2. b(Q′acc) = Qacc.

For the first statement, let q = [x] and let q′ = b^{-1}(q). Then b(δ′(q′, σ)) = [xσ] = δ(q, σ) by the definition of b. For the second statement, let q′ ∈ Q′acc and let (δ′)*(q′0, x) = q′. Then b(q′) = [x]. We have x ∈ L(M′) = L and thus [x] ∈ Qacc.
This argument can be reversed, and thus we are done.
Example E.9 Consider the language L = L(0*10*10*). L is the language of all words in {0, 1}* that contain exactly two 1’s. We claim that ∼L has four equivalence classes: the words with no 1, the words with exactly one 1, the words with exactly two 1’s (these are precisely the words of L), and the words with three or more 1’s.
[Figure E.1: The minimal automaton for L(0*10*10*). It has four states [0], [1], [11], [111] that count the number of 1’s; every state carries a 0-loop, 1-transitions lead from [0] to [1] to [11] to [111], and [111] has a 1-loop.]
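In code (our own rendering of Figure E.1, with states named by representative words), the minimal automaton is just a small transition table:

    # The minimal automaton for L(0*10*10*); [11] is the only accepting state.
    delta = {
        ("[0]", "0"): "[0]",     ("[0]", "1"): "[1]",
        ("[1]", "0"): "[1]",     ("[1]", "1"): "[11]",
        ("[11]", "0"): "[11]",   ("[11]", "1"): "[111]",
        ("[111]", "0"): "[111]", ("[111]", "1"): "[111]",
    }

    def accepts(w, start="[0]", accepting={"[11]"}):
        q = start
        for sigma in w:
            q = delta[(q, sigma)]
        return q in accepting

For instance, accepts("01010") returns True, while accepts("0111") returns False.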
If we find two equivalent states q and q′, we can merge them: all arcs in the transition diagram that come into q now point to q′ instead (i.e., if δ(p, σ) = q, then δ(p, σ) = q′ in the new automaton). The new automaton has one state less, and we can go on until we do not find a pair of equivalent states.
But there is a much faster algorithm. Basically, when we have such a z that proves that q and q′ are not equivalent, then all the pairs of states we go through when reading z are not equivalent either. Algorithm 1 constructs these pairs backwards, starting from those pairs that have exactly one state in Qacc and one state not in Qacc.
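The following Python sketch is our own rendering of this backward-marking idea (not the book’s Algorithm 1 verbatim); all names are ours, and delta is a dictionary mapping (state, symbol) to a state:

    from collections import deque

    def inequivalent_pairs(Q, Sigma, delta, Q_acc):
        # Seed: pairs with exactly one accepting state are not equivalent.
        marked = set()
        queue = deque()
        for p in Q:
            for q in Q:
                if p != q and ((p in Q_acc) != (q in Q_acc)):
                    pair = frozenset((p, q))
                    if pair not in marked:
                        marked.add(pair)
                        queue.append((p, q))
        # Propagate backwards: if (delta(r, a), delta(s, a)) is marked,
        # then (r, s) is marked, too.
        while queue:
            p, q = queue.popleft()
            for r in Q:
                for s in Q:
                    if r == s:
                        continue
                    for a in Sigma:
                        if {delta[(r, a)], delta[(s, a)]} == {p, q}:
                            pair = frozenset((r, s))
                            if pair not in marked:
                                marked.add(pair)
                                queue.append((r, s))
        return marked  # unmarked pairs of distinct states are equivalent

Pairs of distinct states that never get marked are equivalent and can be merged, which yields the minimal automaton.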
27 Grammars
In the example above, Satz would be the start variable. Subjekt, Prädikat,
. . . would be variables. The letters “d”, “e”, “r”, . . . are terminal symbols.
Example 27.4 Let G1 = ({S}, {0, 1}, P1, S) where P1 consists of the productions

S → ε
S → 0S1

G1 generates the language {0^n 1^n | n ∈ N}.
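For instance, the word 0011 is obtained by the derivation

S ⇒ 0S1 ⇒ 00S11 ⇒ 0011,

applying S → 0S1 twice and then S → ε.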
It is quite easy (but a little tedious) to see that L(G2) is the set of all WHILE programs (now over a finite alphabet).² From N⁺, we can derive all decimal representations of natural numbers (without leading zeros). From V, we can derive all variable names. From W, we can derive all WHILE programs. The first three productions produce the simple statements, the other two produce the while loop and the concatenation.
Example 27.7 exhibits a context-sensitive grammar G3 for the language {0^n 1^n 2^n | n ≥ 1}; its productions are

S → 0EZ | 0SEZ
ZE → EZ
0E → 01
1E → 11
1Z → 12
2Z → 22
² Because of the simple structure of WHILE programs, we do not even need whitespace to separate the elements. Feel free to insert it if you like.
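As a worked example of our own showing how the markers E and Z travel, here is a derivation of 001122 with G3:

S ⇒ 0SEZ ⇒ 00EZEZ ⇒ 00EEZZ ⇒ 001EZZ ⇒ 0011ZZ ⇒ 00112Z ⇒ 001122,

using S → 0SEZ, S → 0EZ, ZE → EZ, 0E → 01, 1E → 11, 1Z → 12, and 2Z → 22 in that order.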
2. Show that whenever S ⇒*_{G3} w, then the number of 0’s in w equals the number of E’s plus 1’s in w, and it also equals the number of Z’s plus 2’s in w.
to a linear chain, and “right” because the variables stand at the right-hand end of the productions. Type-3 grammars are also called regular grammars. Theorem 27.12 explains this: type-3 languages are exactly the regular languages. The grammar that we get from the grammar in Example 27.6 by only taking the variables {N, N⁺, Z, Z⁺}, those productions that use the variables {N, N⁺, Z, Z⁺}, and the start symbol N⁺ generates the decimal representations without leading zeros of all natural numbers. It is “almost” right linear. The variables Z and Z⁺ are just placeholders for a bunch of terminals. The grammar becomes right linear if we replace the productions of the form N → ZN and N⁺ → Z⁺N by productions of the form N → 0N, N → 1N, etc.
S → ε | S′
S′ → 01 | 0S′1
is a type-2 grammar for {0n 1n | n ∈ N}. We will later see a general way
to get rid of productions of the form A → ε in an “almost context-free”
grammar. (Note that this is not possible for context-sensitive grammars!)
In the same way, type-2 languages are a subset of the type-1 languages.
The language {0n 1n 2n | n ≥ 1} is context-sensitive, as shown in Exam-
ple 27.7, but we will see soon that it is not context-free. Hence this inclusion
is also strict.
The set of all type-0 languages equals RE—a fact that we will not prove
here. On the other hand, CSL ⊆ REC: Given a string w ∈ Σ∗ , we can
generate all derivations for words of length |w|: once we have reached a sentence of length > |w| in a derivation, we can stop, since productions of context-sensitive grammars can never shorten a sentence. Thus the type-1 languages are a strict subset of the type-0 languages.
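A Python sketch of this decision procedure (our own illustration; symbols are single characters, and productions are given as (left, right) pairs of strings with len(left) ≤ len(right), the non-shortening property of context-sensitive grammars):

    def csg_membership(w, productions, start):
        # Explore all sentences derivable from the start symbol, pruning
        # every sentence longer than w: it can never shrink back.
        seen = {start}
        frontier = [start]
        while frontier:
            sent = frontier.pop()
            if sent == w:
                return True
            for left, right in productions:
                i = sent.find(left)
                while i != -1:
                    nxt = sent[:i] + right + sent[i + len(left):]
                    if len(nxt) <= len(w) and nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
                    i = sent.find(left, i + 1)
        return False

Since there are only finitely many sentences of length at most |w|, the search terminates; membership for context-sensitive grammars is thus decidable, albeit very slowly with this naive procedure.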
Although all these concepts describe regular languages, they have different properties: nondeterministic finite automata often have far fewer states than deterministic ones for the same language; deciding whether two deterministic finite automata recognize the same language is easy, whereas this is a hard problem for regular expressions (we will see this later on); etc.
28 Context-free grammars
S → ε | 0S1
S′ → ε | S
S → 01 | 0S1
E → E ∗ E | E + E | (E) | x
It generates all arithmetic expressions with the operations ∗ and + over the variable x. A word w ∈ Σ* is in L(G) if S ⇒* w. A derivation is a witness for the fact that S ⇒* w, i.e., a sequence of sentences such that S ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wt ⇒ w. Usually, a word w has many derivations. Here are two examples for the word x + x ∗ x in the example above:

E ⇒ E + E ⇒ x + E ⇒ x + E ∗ E ⇒ x + x ∗ E ⇒ x + x ∗ x   (28.1)
E ⇒ E + E ⇒ E + E ∗ E ⇒ E + E ∗ x ⇒ E + x ∗ x ⇒ x + x ∗ x   (28.2)
In the first derivation, we always replace the leftmost variable. Such deriva-
tions are called leftmost derivations. In the second derivation, we always
replace the rightmost variable. Such derivations are called, guess what, right-
most derivations. Although the two derivations are different, they are not
“really different”, they correspond to the same derivation tree.
1. A derivation tree (or parse tree) is an ordered tree with a node labeling such that:

• the root is labeled with the start variable S,
• every inner node is labeled with a variable from V, and
• if an inner node labeled A has children labeled x1, . . . , xt from left to right, then A → x1 · · · xt is a production in P.⁵
Figure 28.1 shows the derivation tree that corresponds to the two derivations (28.1) and (28.2). The leftmost derivation (28.1) is obtained by doing a depth-first search and visiting the children from left to right; the rightmost derivation (28.2) is obtained by visiting them from right to left. In general, each derivation tree corresponds to exactly one leftmost derivation and exactly one rightmost derivation.
But there is another derivation tree for x + x ∗ x. It is shown in Figure 28.2. Having several derivation trees for the same word is in general a bad thing. The derivation tree in Figure 28.2 is unnatural, because it does not respect the usual precedence of the operators “∗” and “+”.
⁵ If t > 1, then every xτ ∈ V ∪ Σ. If t = 1, then x1 = ε is possible. In this case, A → ε is a production of P.
⁶ There are far too many names for this.
[Figure 28.1: derivation tree for x + x ∗ x with + at the root; the left subtree derives x, the right subtree derives x ∗ x.]

[Figure 28.2: a second derivation tree for x + x ∗ x with ∗ at the root; the left subtree derives x + x, the right subtree derives x.]
But there is an unambiguous grammar:
E →T |T +E
T →F |F ∗T
F → x | (E)
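This grammar is directly suited for recursive-descent parsing. The following Python recognizer is our own sketch (single-character tokens, no whitespace handling), with one function per variable:

    def parse(s):
        pos = 0
        def expr():              # E -> T | T + E
            nonlocal pos
            term()
            if pos < len(s) and s[pos] == "+":
                pos += 1
                expr()
        def term():              # T -> F | F * T
            nonlocal pos
            factor()
            if pos < len(s) and s[pos] == "*":
                pos += 1
                term()
        def factor():            # F -> x | (E)
            nonlocal pos
            if pos < len(s) and s[pos] == "x":
                pos += 1
            elif pos < len(s) and s[pos] == "(":
                pos += 1
                expr()
                if pos >= len(s) or s[pos] != ")":
                    raise SyntaxError("missing )")
                pos += 1
            else:
                raise SyntaxError(f"unexpected input at position {pos}")
        expr()
        return pos == len(s)     # True iff the whole input is an expression

The structure of the grammar lets the recognizer commit after seeing a “+” or “∗”, so no backtracking is needed; parse("x+x*x") returns True, and the call structure implicitly builds the tree with “+” at the root.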
Exercise 28.1 Show that the language from Theorem 28.3 is context-free.
S → AB | 0
A→0
We cannot derive any terminal word from B, hence we remove the production S → AB. Now we cannot derive any sentence of the form xAy from S, hence we also remove the rule A → 0. If we had reversed the order of the two steps, then we would have removed nothing in the first step and only the rule S → AB in the second step; the production A → 0 would not have been removed.
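A Python sketch of the two elimination steps (our own illustration; productions are (head, body) pairs of strings, variables are uppercase letters, everything else is a terminal):

    def eliminate_useless(productions, start):
        # Step 1: keep only variables from which a terminal word is derivable.
        generating = set()
        changed = True
        while changed:
            changed = False
            for head, body in productions:
                if head not in generating and all(
                        not c.isupper() or c in generating for c in body):
                    generating.add(head)
                    changed = True
        prods = [(h, b) for h, b in productions
                 if h in generating
                 and all(not c.isupper() or c in generating for c in b)]
        # Step 2: keep only variables reachable from the start symbol.
        reachable = {start}
        changed = True
        while changed:
            changed = False
            for head, body in prods:
                if head in reachable:
                    for c in body:
                        if c.isupper() and c not in reachable:
                            reachable.add(c)
                            changed = True
        return [(h, b) for h, b in prods if h in reachable]

On the example above, eliminate_useless([("S", "AB"), ("S", "0"), ("A", "0")], "S") keeps only S → 0, exactly as described: first S → AB is dropped (B generates nothing), then A → 0 (A is no longer reachable).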
Exercise 28.2 Show that the Algorithms 2 and 3 are indeed correct.
29 Chomsky normal form
In this chapter, we show a normal form for context-free grammars, the so-called Chomsky normal form. On the way, we also see how to eliminate ε-productions.
A context-free grammar G is in Chomsky normal form if every production is of the form

A → BC   or   A → σ

with A, B, C ∈ V and σ ∈ Σ.
Proof. By the results of the previous sections, we can assume that G does not contain any ε-productions or chain rules. Thereafter, L(G) does not contain ε.
c Markus Bläser 2007–2015
214 29. Chomsky normal form
A production A → A1A2 · · · At with t ≥ 3 is replaced by the productions

A → A1C2
C2 → A2C3
...
C_{t−2} → A_{t−2}C_{t−1}
C_{t−1} → A_{t−1}At

where C2, . . . , C_{t−1} are new variables.
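For instance (a small illustration of our own), a production A → BCDE with t = 4 is replaced by the three productions

A → BC2,   C2 → CC3,   C3 → DE,

and from A one can derive exactly the same terminal words as before.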
Exercise 29.4 Prove that the grammar G′ constructed in the proof of Theorem 29.6 indeed fulfills L(G′) = L(G) \ {ε} (and even L(G′) = L(G), since we assumed that G does not contain any ε-productions).
Exercise 29.6 Show that for every context-free grammar G, there is a context-
free grammar H in Greibach normal form such that L(G) \ {ε} = L(H).
(Hint: First convert into Chomsky normal form.)