Models of Computation

Konrad Slind
slind@cs.utah.edu
Contents

1 Introduction
  1.1 Why Study Theory?
  1.2 Overview
    1.2.1 Computability
    1.2.2 Context-Free Grammars
    1.2.3 Automata

2 Background Mathematics
  2.1 Some Logic
  2.2 Some Sets
    2.2.1 Functions
  2.3 Alphabets and Strings
    2.3.1 Strings
  2.4 Languages
  2.5 Proof
    2.5.1 Review of proof terminology
    2.5.2 Review of methods of proof
    2.5.3 Some simple proofs
    2.5.4 Induction

3 Models of Computation
  3.1 Turing Machines
    3.1.1 Example Turing Machines
    3.1.2 Extensions
    3.1.3 Coding and Decoding
    3.1.4 Universal Turing machines
  3.2 Register Machines
  3.3 The Church-Turing Thesis
    3.3.1 Equivalence of Turing and Register machines
  3.4 Recognizability and Decidability
    3.4.1 Decidable problems about Turing machines
    3.4.2 Recognizable problems about Turing Machines
    3.4.3 Closure Properties
  3.5 Undecidability
    3.5.1 Diagonalization
    3.5.2 Existence of Undecidable Problems
    3.5.3 Other undecidable problems
    3.5.4 Unrecognizable languages

4 Context-Free Grammars
  4.1 Aspects of grammar design
    4.1.1 Proving properties of grammars
  4.2 Ambiguity
  4.3 Algorithms on CFGs
    4.3.1 Chomsky Normal Form
  4.4 Context-Free Parsing
  4.5 Grammar Decision Problems
  4.6 Push Down Automata
  4.7 Equivalence of PDAs and CFGs
    4.7.1 Converting a CFG to a PDA
    4.7.2 Converting a PDA to a CFG
  4.8 Parsing

5 Automata
  5.1 Deterministic Finite State Automata
    5.1.1 Examples
    5.1.2 The regular languages
    5.1.3 More examples
  5.2 Nondeterministic finite-state automata
  5.3 Constructions
    5.3.1 The product construction
    5.3.2 Closure under union
    5.3.3 Closure under intersection
    5.3.4 Closure under complement
    5.3.5 Closure under concatenation
    5.3.6 Closure under Kleene star
    5.3.7 The subset construction
  5.4 Regular Expressions
    5.4.1 Equalities for regular expressions
    5.4.2 From regular expressions to NFAs
    5.4.3 From DFA to regular expression
  5.5 Minimization
  5.6 Decision Problems for Regular Languages
    5.6.1 Is a string accepted/generated?
    5.6.2 L(M) = ∅?
    5.6.3 L(M) = Σ∗?
    5.6.4 L(M1) ∩ L(M2) = ∅?
    5.6.5 L(M1) ⊆ L(M2)?
    5.6.6 L(M1) = L(M2)?
    5.6.7 Is L(M) finite?
    5.6.8 Does M have as few states as possible?
Chapter 1
Introduction
2. Theory gives exposure to ideas that permeate Computer Science:
logic, sets, automata, grammars, recursion. Familiarity with these
concepts will make you a better computer scientist.
8. Theory gives a nice setting for honing your problem solving skills.
You probably haven’t gotten smarter since you entered university,
but you have learned many subjects and—more importantly—you
have been trained to solve problems. The belief is that improving
your problem-solving ability through practice will help you in your
career. Theory courses in general, and this one in particular, provide
good exposure to a wide variety of problems, and the techniques you
learn are widely applicable.
1.2 Overview
Although the subject matter of this course is models of computation, we need
a framework—some support infrastructure—in which to work.
1.2.1 Computability
In this section, we will start by considering a classic model of computation—
that of Turing machines (TMs). Unlike the other models we will study, a TM
can do everything a modern computer can do (and more). The study of
’fully fledged’, or unrestricted, models of computation, such as TMs, is
known as computability.
We will see how to program TMs and, through experience, convince
ourselves of their power, i.e., that every algorithm can be programmed on
a TM. We will also have a quick look at Register Machines, which are quite
different from TMs, but of equivalent power. This leads to a discussion of
‘what is an algorithm’ and the Church-Turing thesis. Then we will see a
limitative result: the undecidability of the halting problem. This states that
it is not possible to mechanically determine whether or not an arbitrary
program will halt on all inputs. At the time, this was a very surprising
result. It has a profound influence on Computer Science since it can be
leveraged to show that all manner of useful functionality that one might
wish to have computers provide is, in fact, theoretically impossible.
1.2.3 Automata
Automata (singular: automaton) are a simple but very important class of
computing devices. They are heavily used in compilers, text editors, VLSI
circuits, Artificial Intelligence, databases, and embedded systems.
We will introduce and give a precise definition of finite state automata
(FSAs) before investigating their extension to non-deterministic FSAs (NFAs).
It turns out that FSAs are equivalent to NFAs, and we will prove this. We
will discuss the languages recognized by FSAs, the so-called regular lan-
guages.
Automata are used to recognize, or accept, strings in a language. An
alternative viewpoint is that of regular expressions, which generate strings.
Regular expressions are equivalent to FSAs, and we will prove this.
Finally, we will prove the pumping lemma for regular languages. This,
along with the undecidability of the halting problem, is another of what
might be called negative, or limitative theorems, which show that there
are some aspects of computation that are not captured by the model be-
ing considered. In other words, they show that the model is too weak to
capture important notions.
Historical Remark. The history of the development of models of com-
putation is a little bit odd, because the most powerful models were in-
vestigated first. The work of Turing (Turing machines), Church (lambda
calculus), Post (Production Systems), and Goedel (recursive functions) on
computability happened largely in the 1930’s. These mathematicians were
trying to nail down the notion of algorithm, and came up with quite differ-
ent explanations. They were all right! Or at least that is the claim of the
Church-Turing Thesis, an important philosophical statement, which we will
discuss.
In the 1940’s restricted notions of computability were studied, in or-
der to give mathematical models of biological behaviour, such as the fir-
ing of neurons. These led to the development of automata theory. In the
1950’s, formal grammars and the notion of context-free grammars (and
much more) were invented by Noam Chomsky in his study of natural lan-
guage.
Chapter 2
Background Mathematics
This should be review from cs2100, but we may be rusty after the summer
layoff. We need some basic amounts of logic, set theory, and proof, as well
as a smattering of other material.
A ∧ B      conjunction
A ∨ B      disjunction
A ⇒ B      (material) implication
A iff B    equivalence
¬A         negation
∀x. A      universal quantification
∃x. A      existential quantification
After syntax we have semantics. The meaning of a formula is expressed
in terms of truth.
• A iff B is true iff A and B have the same truth value.
• ¬A is true iff A is false
• ∀x. A is true iff A is true for all possible values of x.
• ∃x. A is true iff A is true for at least one value of x.
Note the recursion: the truth value of a formula depends on the truth
values of its sub-formulas. This prevents the above definition from being
circular. Also, note that the apparent circularity in defining iff by using ‘iff’
is only apparent—it would be avoided in a completely formal definition.
Remark. The definition of implication can be a little confusing. Implication
is not ‘if-then-else’. Instead, you should think of A ⇒ B as meaning ‘if A
is true, then B must also be true. If A is false, then it doesn’t matter what
B is; the value of A ⇒ B is true’.
Thus a statement such as 0 < x ⇒ x2 ≥ 1 is true no matter what the
value of x is taken to be (supposing x is an integer). This works well with
universal quantification, allowing the statement ∀x. 0 < x ⇒ x2 ≥ 1 to be
true. However, the price is that some plausibly false statements turn out
to be true; for example: 0 < 0 ⇒ 1 < 0. Basically, in an absurd setting,
everything is held to be true.
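The truth table of implication can be checked concretely. Here is a small Python sketch (the function name implies is ours, introduced only for illustration):

def implies(a, b):
    # A => B is false only when A is true and B is false
    return (not a) or b

for a in (True, False):
    for b in (True, False):
        print(a, b, implies(a, b))

# the 'vacuous truth' case discussed above: 0 < 0 => 1 < 0 comes out true
assert implies(0 < 0, 1 < 0)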
Example 1. Suppose we want to write a logical formula that captures the
following well-known saying:
You can fool all of the people some of the time, and you can fool
some of the people all of the time, but you can’t fool all of the people
all of the time.
We start by letting the atomic proposition F (x, t) mean ‘you can fool x at
time t’. Then the following formula
(∀x.∃t. F (x, t)) ∧
(∃x.∀t. F (x, t)) ∧
¬(∀x.∀t. F (x, t))
precisely captures the statement. Notice that the first line asserts that each
person could be fooled at a different time. If one wanted to express that
there is a specific time at which everyone gets fooled, it would be
∃t. ∀x. F (x, t) .
Example 2. What about
Everybody loves my baby, but my baby don’t love nobody but me.
Let the atomic proposition L(x, y) mean ‘x loves y’ and let b mean ‘my
baby’ and let me stand for me. Then the following formula
(∀x. L(x, b)) ∧ L(b, me) ∧ (∀x. L(b, x) ⇒ (x = me))
precisely captures the statement. It is interesting to pursue what this means,
since if everybody loves b, then b loves b. So I am my baby, which may be
troubling for some.
Example 3 (Lewis Carroll). From the following assertions
1. There are no pencils of mine in this box.
2. No sugar-plums of mine are cigars.
3. The whole of my property, that is not in the box, consists of cigars.
we can conclude that no pencils of mine are sugar-plums. Transcribed to
logic, the assertions are
∀x. inBox (x) ⇒ ¬Pencil (x)
∀x. sugarPlum(x) ∧ Mine(x) ⇒ ¬Cigar (x)
∀x. Mine(x) ∧ ¬inBox (x) ⇒ Cigar(x)
From (1) and (3) we can conclude All my pencils are cigars. Now we can use
this together with (2) to reach the conclusion
∀x. Pencil(x) ∧ Mine(x) ⇒ ¬sugarPlum(x).
These examples feature somewhat whimsical subject matter. In the
course we will be using symbolic logic when a high level of precision is
needed.
• B = {true, false}. The booleans, also known as the bit values. In
situations where no confusion with numbers is possible, one could
have B = {0, 1}.
Note. Z, Q, R, and C will not be much used in the course, although Q and
R will feature in one lecture.
Note. Some mathematicians think that N starts with 1. We will not adopt
that approach in this course!
There is a rich collection of operations on sets. Interestingly, all these
operations are ultimately built from membership.
• P ⊆ Q ∧ Q ⊆ R ⇒ P ⊆ R

There is also a useful notion of proper subset: R ⊂ S means that all elements of R are in S, but S has one or more extras. Formally, R ⊂ S iff R ⊆ S ∧ R ≠ S.
It is a common error to confuse ∈ and ⊆. For example, x ∈ {x, y, z}, but
that doesn’t allow one to conclude x ⊆ {x, y, z}. However, it is true that
{x} ⊆ {x, y, z}.
Singleton sets A set with one element is called a singleton. Note well that a singleton set is not the same as its element: ∀x. x ≠ {x}, even though x ∈ {x}, for any x.
Universe and complement Often we work in a setting where all sets are subsets of some fixed set U (sometimes called the universe). In that case we can write \overline{S} to mean U − S. For example, if our universe is N, and Even is the set of even numbers, then \overline{Even} is the set of odd numbers.
Example 4. Let us take the Flintstone characters as our universe, where F is the Flintstone family and R is the Rubble family. Then we know

∅ = F ∩ R

because the two families are disjoint. Also, we can see that

\overline{F ∪ R} = {Mr. Slate}
Empty set The symbol ∅ stands for the empty set: the set with no ele-
ments. The notation {} may also be used. The empty set acts as an alge-
braic identity for several operations:
∅ ∪ S = S
∅ ∩ S = ∅
∅ ⊆ S
∅ − S = ∅
S − ∅ = S
\overline{∅} = U
Set comprehension This is also known as set builder notation. The notation is

{ template | condition }

This denotes the set of all items matching the template, which also meet the condition. This, combined with logic, gives a natural way to concisely describe sets:
{x | x < 1} = {0}
{x | x > 1} = {2, 3, 4, 5, . . .}
{x | x ∈ R ∧ x ∈ S} = R∩S
{x | ∃y.x = 2y} = {0, 2, 4, 6, 8, . . .}
{x | x ∈ U ∧ x is male} = {Fred, Barney, BamBam, Mr . Slate}
Indexed union and intersection It sometimes happens that one has a set of sets {S1 , . . . , Sn }. The union and intersection of all its members can be written S1 ∪ . . . ∪ Sn and S1 ∩ . . . ∩ Sn . More generally, given a set Si for each element i of an index set I, we can form

⋃_{i∈I} Si = {x | ∃i. i ∈ I ∧ x ∈ Si }
⋂_{i∈I} Si = {x | ∀i. i ∈ I ⇒ x ∈ Si }

The generality obtained from using index sets allows one to take the
bigunion of an infinite set of sets.
Power set The set of all subsets of a set S is known as the powerset of S,
written variously as P(S), Pow (S), or 2S .
Pow (S) = {s | s ⊆ S}
For example,
Pow {1, 2, 3} = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
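For finite sets the powerset is easy to compute. The following Python sketch is one of many possible ways (it uses frozensets so that the subsets can themselves be collected into a set); the helper name powerset is ours:

from itertools import chain, combinations

def powerset(s):
    # all subsets of s, of every size from 0 to |s|
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

print(len(powerset({1, 2, 3})))   # 8, matching the example above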
Product of sets R×S, the product of two sets R and S, is made by pairing
each element of R with each element of S. Using set-builder notation, this
can be concisely expressed:
R × S = {(x, y) | x ∈ R ∧ y ∈ S}.
Example 5.

F × R = { (Fred, Barney), (Fred, Betty), (Fred, BamBam),
          (Wilma, Barney), (Wilma, Betty), (Wilma, BamBam),
          (Pebbles, Barney), (Pebbles, Betty), (Pebbles, BamBam),
          (Dino, Barney), (Dino, Betty), (Dino, BamBam) }
In general, the size of the product of two sets will be the product of the
sizes of the two sets.
S1 × S2 × . . . × Sn = {(a1 , . . . an ) | a1 ∈ S1 ∧ . . . ∧ an ∈ Sn }
An n-tuple (a1 , . . . an ) is formally written as (a1 , (a2 , . . . , (an−1 , an ) . . .)), but,
by convention, the parentheses are dropped. For example, (a, b, c, d) is the
conventional way of writing the 4-tuple (a, (b, (c, d))). Unlike sets, tuples
are ordered. Thus a is the first element of the tuple, b is the second, c is the
third, and d is the fourth. Equality on n-tuples is captured by the following property:

(a1 , . . . , an ) = (b1 , . . . , bn ) iff a1 = b1 ∧ . . . ∧ an = bn
Size of a set The size of a set, also known as its cardinality, is just the
number of elements in the set. It is common to write |A| to denote the
cardinality of set A.
A∪B = B∪A
A∩B = B∩A
A∪A = A
A∩A = A
A∪∅ = A
A∩∅ = ∅
The following identities are associative, distributive, and absorptive
properties:
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∪ (A ∩ B) = A
A ∩ (A ∪ B) = A
The following identities are the so-called De Morgan laws, plus a few
others.
\overline{A ∪ B} = \overline{A} ∩ \overline{B}
\overline{A ∩ B} = \overline{A} ∪ \overline{B}
\overline{\overline{A}} = A
A ∩ \overline{A} = ∅
2.2.1 Functions
Informally, a function is a mechanism that takes an input and gives an
output. One can also think of a function as a table, with the arguments
down one column, and the results down another. In fact, if a function is
finite, a table can be a good way to present it. Formally however, a function
f is a set of ordered pairs with the property
(a, b) ∈ f ∧ (a, c) ∈ f ⇒ b = c
This just says that a function is, in a sense, univocal, or deterministic:
there is only one possible output for an input. Of course, the notation
f (a) = b is preferred over (a, b) ∈ f . The domain and range of a function f are defined as follows:

domain(f ) = {a | ∃b. (a, b) ∈ f }
range(f ) = {b | ∃a. (a, b) ∈ f }
A common notation for specifying that function f has domain A and
range B is the following:
f :A→B
Another common usage is to say ‘a function over (or on) a set’. This just
means that the function takes its inputs from the specified set. As a trivial
example, consider f , a function over N, described by f (x) = x + 2.
3. the algorithm runs for a very very very long time before returning
an answer.
The second and third kind of partiality are similar but essentially dif-
ferent. Pragmatically, there is no difference between a program that will
never return and one that will return after a trillion years. However, the-
oretically there is a huge difference: instances of the second kind are truly
partial functions, while instances of the third are still total functions. A
course in computational complexity explores the similarities and differ-
ences between the options.
If a partial function f is defined at an argument a, then we write f (a) ↓.
Otherwise, f (a) is undefined and we write f (a) ↑.
Believe it or not, ∅ is a function: it’s the nowhere-defined function.
Example 8 (Square root in N). Let √n denote the number x ∈ N such that x² ≤ n and (x + 1)² > n.

n                           √n
0                            0
1, 2, 3                      1
4, 5, 6, 7, 8                2
9, 10, 11, 12, 13, 14, 15    3
16                           4
...                        ...
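The definition can be computed directly. Here is a Python sketch of an integer square root that simply searches for the x in the definition (not efficient, but faithful to it; the name isqrt is ours):

def isqrt(n):
    x = 0
    while (x + 1) * (x + 1) <= n:
        x += 1
    return x              # x*x <= n < (x+1)*(x+1)

assert [isqrt(n) for n in range(17)] == [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4]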
A set is closed under an operation if applying the operation to elements of the set always yields an element of the set. For example, N is closed under squaring:

∀n. n ∈ N ⇒ n² ∈ N.

A counter-example: N is not closed under subtraction: 2 − 3 ∉ N (unless subtraction is somehow re-defined so that p − q = 0 when p < q).
The ‘closure’ terminology can be used for functions taking more than
one argument; thus, for example, N is closed under +.
2.3 Alphabets and Strings

An alphabet is a finite, non-empty set of symbols, conventionally written Σ.

Examples

Σ = {0, 1}
Σ = {a, b, c, d}
Σ = {foo, bar}
Non-examples
• sets having symbols with shared substructure, e.g., {foo, foobar}, since
this can lead to nasty, horrible ambiguity.
2.3.1 Strings
A string over an alphabet Σ is a finite sequence of symbols from Σ. For
example, if Σ = {0, 1}, then 000 and 0100001 are strings over Σ. The strings
provided in most programming languages are over the alphabet of ASCII characters (and more extensive alphabets, such as Unicode, are common).
NB. Authors are sometimes casual about representing operations on strings:
for example, string construction and string concatenation are both written
by adjoining blocks of text. This is usually OK, but can be ambiguous: if
Σ = {o, f, a, b, r} we could write the string foobar, or f · o · o · b · a · r (to be
really precise). Similarly, if Σ = {foo, bar}, then we could also write foobar,
or foo · bar .
The empty string There is a unique string ε which is the empty string.
There is an analogy between ε for strings and 0 for N. For example, both
are very useful as identity elements.
NB. Some authors use Λ to denote the empty string.
NB. The empty string is not a symbol, it’s a string with no symbols in it.
Therefore ε can’t appear in an alphabet.
Length The length of a string x, written len(x), is the number of symbols in it. For example:

len(ε) = 0
len(foobar) = 6

but len(foobar) = 2, if Σ = {foo, bar }.
NB. Unlike some programming languages, strings are not terminated with
an invisible ε symbol.
Concatenation The concatenation of two strings x and y just places them
next to each other, giving the new string xy. If we needed to be precise, we
could write x · y. Some properties of concatenation:
Exponentiation The string x^n consists of n copies of x concatenated together. For example:

(aab)^3 = aabaabaab
(aab)^1 = aab
(aab)^0 = ε

The formal definition is by recursion on n:

x^0 = ε
x^(n+1) = x^n · x
The function count(a, w) returns the number of occurrences of the symbol a in the string w. For example:

count(0, 0010) = 3
count(1, 000) = 0
count(0, ε) = 0
The formal definition of count is by recursion:
count(a, ε) = 0
count(a, b · t) = if a = b then count(a, t) + 1 else count(a, t)
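The recursive definition transcribes directly into Python; in this sketch a is a one-character symbol and w is a string:

def count(a, w):
    if w == "":                              # count(a, ε) = 0
        return 0
    b, t = w[0], w[1:]                       # w = b · t
    return count(a, t) + (1 if a == b else 0)

assert count("0", "0010") == 3
assert count("1", "000") == 0
assert count("0", "") == 0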
Prefix A string x is a prefix of string y iff there exists w such that y = x · w.
For example, abaab is a prefix of abaababa. Some properties of prefix:
Pitfalls Here are some common mistakes people make when first con-
fronted with sets and strings. All the following are true, but surprise some
students.
• ∅ ≠ ε ≠ {ε}: the empty set, the empty string, and the singleton set holding the empty string are three different things.

• “The empty set has no elements in it. The empty string has no characters in it. So . . . the empty set is the same as the empty string.”
The first two assertions are true; however, the conclusion is false. Al-
though the length of ε is 0, and the size of ∅ is also 0, they are two quite
different things.
2.4 Languages
So much for strings. Now we discuss sets of strings, also called languages.
Languages are one of the important themes of the course.
We will start our discussion with Σ∗ , the set of all strings over alpha-
bet Σ. The set Σ∗ contains all strings that can be generated by iteratively
concatenating symbols from Σ, any number of times.
For example, if Σ = {a, b, c}, then

Σ∗ = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, aab, aac, . . .}
NB
Union
{a, b, ab} ∪ {a, c, ba} = {a, b, ab, c, ba}
Intersection
{a, b, ab} ∩ {a, c, ba} = {a}
Complement Usually, Σ∗ is the universe that a complement is taken with
respect to. Thus
\overline{A} = {x ∈ Σ∗ | x ∉ A}

Reversal The reversal of a language reverses each of its strings:

A^R = {x^R | x ∈ A}

Concatenation The concatenation AB of languages A and B concatenates their strings in all possible ways:

AB = {xy | x ∈ A ∧ y ∈ B}
or using the ‘dot’ notation to emphasize that we are concatenating (note
the overloading of ·):
A · B = {x · y | x ∈ A ∧ y ∈ B}
Example 11. {a, ab} {b, ba} = {ab, abba, aba, abb}
Example 12. Two languages L1 and L2 such that L1 · L2 = L2 · L1 and L1 is
not a subset of L2 and L2 is not a subset of L1 and neither language is {ε}
are the following:
L1 = {aa} L2 = {aaa}
Notes
• In general AB ≠ BA. Example: {a}{b} ≠ {b}{a}.
• A · ∅ = ∅ = ∅ · A.
• A · {ε} = A = {ε} · A.
• A · ε is nonsense—it’s syntactically malformed.
Iterated language concatenation Well, if we can concatenate two lan-
guages, then we can certainly repeat this to concatenate any number of
languages. Or concatenate a language with itself any number of times.
The operation An denotes the concatenation of A with itself n times. The
formal definition is
A^0 = {ε}
A^(n+1) = A · A^n
Another way to characterize this is that a string is in An if it can be split
into n pieces, each of which is in A:
x ∈ An iff ∃w1 . . . wn . w1 ∈ A ∧ . . . ∧ wn ∈ A ∧ (x = w1 · · · wn ).
For example, let A = {a, ab}. Unwinding the definition, A^3 is

A · A · A · {ε} = A · A · A
              = A · {aa, aab, aba, abab}
              = {a, ab} · {aa, aab, aba, abab}
              = {aaa, aaba, abaa, ababa, aaab, aabab, abaab, ababab}
The Kleene star A∗ of a language A collects together all the finite powers of A:

A∗ = ⋃_{n∈N} A^n
   = A^0 ∪ A^1 ∪ A^2 ∪ . . .
   = {x | ∃n. x ∈ A^n}
   = {x | x is the concatenation of zero or more strings from A}
The related operation A+ omits the n = 0 term:

A+ = ⋃_{n>0} A^n = A^1 ∪ A^2 ∪ A^3 ∪ . . .
Example 14.

A = {a, ab}
A∗ = A^0 ∪ A^1 ∪ A^2 ∪ . . . = {ε} ∪ {a, ab} ∪ {aa, aab, aba, abab} ∪ . . .
A+ = {a, ab} ∪ {aa, aab, aba, abab} ∪ . . .
• L ⊆ L∗ .
Summary of useful properties of languages
Since languages are just sets of strings, the identities from Section 2.2 may
freely be applied to language expressions. Beyond those, there are a few
others:
A · (B ∪ C) = (A · B) ∪ (A · C)
(B ∪ C) · A = (B · A) ∪ (C · A)
A · (B0 ∪ B1 ∪ B2 ∪ . . .) = (A · B0 ) ∪ (A · B1 ) ∪ (A · B2 ) . . .
(B0 ∪ B1 ∪ B2 ∪ . . .) · A = (B0 · A) ∪ (B1 · A) ∪ (B2 · A) ∪ . . .
A∗∗ = (A∗ )∗ = A∗
A∗ · A∗ = A∗
A∗ = {ε} ∪ A+
∅∗ = {ε}
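For finite languages these operations can be experimented with in Python. A sketch (the function names are ours, and the Kleene star is necessarily truncated to a bounded number of factors):

def concat(A, B):
    return {x + y for x in A for y in B}

def power(A, n):                      # A^0 = {ε}, A^(n+1) = A · A^n
    return {""} if n == 0 else concat(A, power(A, n - 1))

def star_upto(A, k):                  # A^0 ∪ A^1 ∪ ... ∪ A^k
    result = set()
    for n in range(k + 1):
        result |= power(A, n)
    return result

A = {"a", "ab"}
assert power(A, 2) == {"aa", "aab", "aba", "abab"}
assert concat(A, set()) == set()      # A · ∅ = ∅
assert concat(A, {""}) == A           # A · {ε} = A
assert "" in star_upto(A, 3) and "aab" in star_upto(A, 3)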
2.5 Proof
Now we discuss proof. In this course we will go through many proofs;
indeed, in order to pass this course, you will have to write correct proofs
of your own. This raises the weighty question
What is a proof?
which has attracted much philosophical discussion over the centuries. Here
are some (only a few) answers:
• A proof is an argument that convinces a machine. If humans cause so
much trouble, let’s banish them in favour of machines! After all,
machines have the advantage of being faster and more reliable than
humans. In the late 19th and early 20th Centuries, philosophers and
mathematicians developed the notion of a formal proof, one which
is a chain of extremely simple reasoning steps expressed in a rig-
orously circumscribed language. After computers were invented,
people realized that such proofs could be automatically processed:
a computer program could analyze a purported proof and render a
yes/no verdict, simply by checking all the reasoning steps.
This approach is quite fruitful (it’s my research area) but the proofs
are far too detailed for humans to deal with: they can take megabytes
for even very simple proofs. In this course, we are after proofs that
are readable but still precise enough that mistakes can easily be caught.
divides(x, y) = ∃z.x ∗ z = y
prime(n) = 1 < n ∧ ∀k. divides(k, n) ⇒ k = 1 ∨ k = n
Conjecture An unproved proposition. A conjecture has the connotation
that the author has attempted—but failed—to prove it.
To prove A iff B: There are three ways to deal with this (the first one is
most common):
• prove A ⇒ B and also B ⇒ A
• prove A ⇒ B and also ¬A ⇒ ¬B
To prove A ∨ B: Rarely happens. Select whichever of A, B seems to be true and prove it.
u<6⇒u<5
for an arbitrary u.
Not all universal statements are proved in this way. In particular, when
the quantification is over numbers (or other structured data, such as strings),
one often uses induction or case analysis. We will discuss these in more
depth shortly.
To prove ∃x. A: Supply a witness for x that will make A true. For ex-
ample, if we needed to show ∃x. even(x) ∧ prime(x), we would give the
witness 2 and continue on to prove even(2) ∧ prime(2).
or ML executes. This correspondence was recognized and made mathe-
matically precise in the late 1980’s by Tim Griffin, then a PhD student at
Cornell.
∀x. x ∈ P iff x ∈ Q
We will use the latter in the following proof, which will exercise some basic
definitions.
x ∈ \overline{A ∩ B} iff x ∈ (U − (A ∩ B))
                     iff x ∈ U ∧ x ∉ (A ∩ B)
                     iff x ∈ U ∧ (x ∉ A ∨ x ∉ B)
                     iff (x ∈ U ∧ x ∉ A) ∨ (x ∈ U ∧ x ∉ B)
                     iff (x ∈ U − A) ∨ (x ∈ U − B)
                     iff (x ∈ \overline{A}) ∨ (x ∈ \overline{B})
                     iff x ∈ (\overline{A} ∪ \overline{B})
Case ε ∈ A. Then A = {ε} ∪ A, so

A+ = A^1 ∪ A^2 ∪ . . .
   = ({ε} ∪ A) ∪ A^2 ∪ . . .
   = A^0 ∪ A^1 ∪ A^2 ∪ . . .
   = A∗

Case ε ∉ A. Then every string in A has length greater than 0, so every string in A+ has length greater than 0. But ε, which has length 0, is in A∗, so A∗ ≠ A+. [Merely noting that A ≠ {ε} ∪ A and concluding that A∗ ≠ A+ isn't sufficient, because you have to make the argument that ε doesn't somehow get added in A^2 ∪ A^3 ∪ . . . .]
Example 20. Let A = {w ∈ {0, 1}∗ | w has an unequal number of 0s and 1s}.
Prove that A∗ = {0, 1}∗ .
Proof. We show that A∗ ⊆ {0, 1}∗ and {0, 1}∗ ⊆ A∗ . The first assertion is
easy to see, since any set of binary strings is a subset of {0, 1}∗ . For the
second assertion, the theorem in Example 16 lets us reduce the problem to
showing that {0, 1} ⊆ A, which is true, since 0 ∈ A and 1 ∈ A.
Example 21. Prove that L∗ = L∗ · L∗ .
Proof. Assume x ∈ L∗ . We need to show that x ∈ L∗ · L∗ , i.e., that there exist u, v such that x = u · v and u ∈ L∗ and v ∈ L∗ . By taking u = x and v = ε we satisfy the requirements and so x ∈ L∗ · L∗ , as required.
Conversely, assume x ∈ L∗ · L∗ . Thus there exist u, v such that x = uv and u ∈ L∗ and v ∈ L∗ . Now, since u ∈ L∗ , there exists i such that u ∈ L^i ; similarly, there exists j such that v ∈ L^j . Hence uv ∈ L^(i+j) . So there exists an n (namely i + j) such that x ∈ L^n . So x ∈ L∗ .
Now we will move on to an example that uses proof by contradiction.
Example 22 (Euclid). The following famous theorem has an elegant proof
that illustrates some of our techniques, proof by contradiction in particu-
lar. The English statement of the theorem is
The prime numbers are an infinite set.
Re-phrasing this as For every prime, there is a larger one, we obtain, in mathematical notation:

∀p. prime(p) ⇒ ∃q. prime(q) ∧ p < q

The proof uses the factorial function, defined by

0! = 1
(n + 1)! = (n + 1) ∗ n!
Proof. Towards a contradiction, assume the contrary, i.e., that there are
only finitely many primes. That means there’s a largest one, call it p.
Consider the number k = p! + 1. Now, k > p so k is not prime, by our
assumption. Since k is not equal to 1, it has a prime factor. Formally,
2.5.4 Induction
The previous methods we’ve seen are generally applicable. Induction, on
the other hand, is a specialized proof technique that only applies to struc-
tured data such as numbers and strings. Induction is used to prove uni-
versal properties.
Example 23. Consider the statement ∀n. 0 < n!. This statement is easy to
check, by calculation, for any particular number:
0 < 0!
0 < 1!
0 < 2!
...
but not for all of them (that would require an infinite number of cases to
be calculated, and proofs can’t be infinitely long). This is where induction
comes in: induction “bridges the gap with infinity”. How? In 2 steps:
Base Case Prove the property holds for 0: 0 < 0!, i.e., 0 < 1.
Step Case Assume the proposition for an arbitrary number, say k, and
then show the proposition holds for k + 1: thus we assume the in-
duction hypothesis (IH) 0 < k!. Now we need to show 0 < (k + 1)!. By
the definition of factorial, we need to show
0 < (k + 1) ∗ k! i.e.,
0 < k ∗ k! + k! (by the definition of factorial)
In your work, we will require that the base cases and step cases be
clearly labelled as such, and we will also need you to identify the IH in the
step case. Finally, you will also need to show when you use the IH in the
proof of the step case.
Example 24. Iterated sums, via the Σ operator, yield many problems which can be tackled by induction. Informally, Σ^n_{i=0} i = 0 + 1 + . . . + (n − 1) + n. Let’s prove

∀n. Σ^n_{i=0} (2i + 1) = (n + 1)²
Proof. By induction on n.

Base case. Σ^0_{i=0} (2i + 1) = 1 = (0 + 1)².

Step case. Assume the IH: Σ^n_{i=0} (2i + 1) = (n + 1)². We must show that Σ^{n+1}_{i=0} (2i + 1) = ((n + 1) + 1)² = (n + 2)² = n² + 4n + 4. Now

Σ^{n+1}_{i=0} (2i + 1) = Σ^n_{i=0} (2i + 1) + 2(n + 1) + 1
                       = (n + 1)² + 2(n + 1) + 1        (use of IH)
                       = n² + 2n + 1 + 2n + 2 + 1
                       = n² + 4n + 4

as required.
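A quick numerical sanity check of the identity (not a substitute for the proof, of course) can be run in Python:

for n in range(100):
    assert sum(2 * i + 1 for i in range(n + 1)) == (n + 1) ** 2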
Proof. The ‘right-to-left’ direction is easy since A ⊆ A∗ , for all A. Thus it remains to show L∗∗ ⊆ L∗ . Assume x ∈ L∗∗ . We wish to show x ∈ L∗ . By the assumption there is an n such that x ∈ (L∗ )n . We now induct on n.
Base case. n = 0, so x ∈ (L∗ )0 , i.e., x ∈ {ε}, i.e., x = ε. This completes the
base case, as ε is certainly in L∗ .
Step case. Let IH = ∀x. x ∈ (L∗ )n ⇒ x ∈ L∗ . We want to show x ∈
(L∗ )n+1 ⇒ x ∈ L∗ . Thus, assume x ∈ (L∗ )n+1 , i.e., x ∈ L∗ · (L∗ )n . This
implies that there exists u, v such that u ∈ L∗ and v ∈ (L∗ )n . By the IH, we
have v ∈ L∗ . But then we have x ∈ L∗ because A∗ · A∗ = A∗ , for all A, as
was shown in Example 21.
This can be 2-colored as follows:
Now pick one side of the line (the left, say), and ‘flip’ the colors of the
regions on that side. Leave the coloring on the right side alone. This gives us a coloring which is again 2-colored. Now let’s see how to prove that this works in
general.
Proof. By induction on the number of lines on the plane.
Base case. If there are no lines on the plane, then pick a color and color
the plane with it. Since there are no adjacent regions, the property
holds.
Step case. Suppose the plane has n lines on it. The IH says that adjacent
regions have different colors. Now we add a line ℓ to the plane, and
recolor regions on the left of ℓ as stipulated above. Now consider any
two adjacent regions. There are three possible cases:
Example 28 (Incorrect use of induction). Let’s say that a set is monochrome
if all elements in it are the same color. The following argument is flawed.
Why?
Base case. The only set of size zero is the empty set, and clearly the
empty set of pigs is monochrome.
Step case. The inductive hypothesis is that any set with n pigs is
monochrome. Now we show that any set {p1 , . . . , pn+1 } consist-
ing of n + 1 pigs is also monochrome. By the IH, we know that
{p1 , . . . , pn } is monochrome. Similarly, we know that {p2 , . . . , pn+1 }
is also monochrome. So pn+1 is the same color as the pigs in
{p1 , . . . , pn }. Therefore {p1 , . . . , pn+1 } is monochrome.
3. Since all finite sets of pigs are monochrome, the set of all pigs is
monochrome. Since we just painted our pig yellow, it follows that
all pigs are painted yellow.
[Flaw: We make two uses of the IH in the proof, and implicitly take two
pigs out of {p1 , . . . , pn+1 }. That means that {p1 , . . . , pn+1 } has to be of size
at least two. Suppose it is of size 2, i.e., consider some two-element set of
pigs {p1 , p2 }. Now {p1 } is monochrome and so is {p2 }. But the argument
in the proof doesn’t force every pig in {p1 , p2 } to be the same color.]
Strong Induction
Occasionally, one needs to use a special kind of induction called strong,
or complete, induction to make a proof work. The difference between this
kind of induction and ordinary induction is the following: in ordinary in-
duction, the induction step is just that we assume the property P holds for
n and use that as a tool to show that P holds for n + 1; in strong induction,
the induction hypothesis is that P holds for all m strictly smaller than n
and the goal is to show that P holds for n.
Specified formally, we have
Mathematical induction:

(P (0) ∧ ∀n. P (n) ⇒ P (n + 1)) ⇒ ∀n. P (n)

Strong induction:

(∀n. (∀m. m < n ⇒ P (m)) ⇒ P (n)) ⇒ ∀n. P (n)
Some remarks:
• w = ε, i.e., ε ∈ X, or ε ∈ (A · X) ∪ B. But note that ε ∉ (A · X), by the assumption ε ∉ A. Thus we have ε ∈ B, and so ε ∈ A∗ · B, as desired.

• w ≠ ε. Since w ∈ (A · X) ∪ B, we consider the following cases:

  (a) w ∈ (A · X). Since ε ∉ A, there exist u, v such that w = uv, u ∈ A, v ∈ X, and len(v) < len(w). By the IH, we have v ∈ A∗ · B hence, by the semantics of Kleene star, we have uv ∈ A∗ · B, as required.

  (b) w ∈ B. Then w ∈ A∗ · B, since ε ∈ A∗ .
Chapter 3
Models of Computation
Now we start the course for real. The questions we address in this part of the course deal with models for sequential computation[1] in a setting where there are no resource limits (time and space). Here are a few of the questions that arise:
• What about assembly language (say, for the x86)? How does it compare? Are low-level languages more powerful than high-level languages?
• What is an algorithm?
• What can’t computers do? For example, are there some optimizations
that a compiler can’t make? Are there purported programming tasks
that can’t be implemented, no matter how clever the programmer(s)?
[1] We won’t consider models for concurrency, for example.
This is undoubtedly a collection of serious questions, and we should
say how we go about investigating them. First, we are not going to use any
particular real-world programming language: they tend to be too big.2 In-
stead, we will deal with relatively simple machines. Our approach will be
to convince ourselves that the machines are powerful enough to compute
whatever general-purpose computers can, and then to go on to consider
the other questions.
• There has to be a way to keep track of the current step of the calcula-
tion.
• There has to be a way to view the complete current state of the cal-
culation.
left and stretching off to the right. The tape is divided into cells, each of
which can hold one symbol. The input of the machine is a string w =
a1 · a2 · . . . · an initially written on the leftmost portion of the tape, followed
by an infinite sequence of blanks ( ):
a1 a2 · · · an−1 an ···
The machine is able to move a read/write head left and right over the
tape as it performs its computation. It can read and write symbols on the
tape as it pleases. These considerations led Turing to the following formal
definition.
Definition 1 (Turing Machine). A Turing machine is a 7-tuple
(Q, Σ, Γ, δ, q0 , qA , qR )
where
• Q is a finite set of states.

• Σ is the input alphabet, a finite set of symbols that does not include the blank symbol.

• Γ is the tape alphabet, a finite set of symbols with Σ ⊂ Γ; the blank symbol is in Γ.

• δ : Q × Γ → Q × Γ × {L, R} is the transition function.

• q0 ∈ Q is the start state, qA ∈ Q is the accept state, and qR ∈ Q is the reject state.

A transition

δ(qi , a) = (qj , b, d)

is taken when the machine is in state qi and the symbol in the current tape cell is a. As a result:
• The symbol b is written to the current cell, replacing a.

• The current state changes from qi to qj .
• The tape head moves to the left or right by one cell, depending on
whether d is L or R.
Example 30. We’ll build a TM that merely moves all the way to the end of
its input and stops. The states of the machine will just be {q0 , qA , qR }. (We
have to include qR as a state, even though it will never be entered.) The
input alphabet Σ = {0, 1}, for simplicity. The tape alphabet Γ = Σ ∪ { }
includes blanks, but is otherwise the same as the input alphabet. All that is
left to specify is the transition function. The machine simply moves right
along the tape until it hits a blank, then halts. Thus, at each step, it just
writes back the current symbol, remains in q0 , and moves right one cell:
δ(q0 , 0) = (q0 , 0, R)
δ(q0 , 1) = (q0 , 1, R)
Once the machine hits a blank, it moves one cell to the left and stops:
δ(q0 , ) = (qA , , L)
Notice that if the input string is ε, the first step the machine makes is mov-
ing left from the leftmost cell: it can’t do that, so the tape head just stays
in the leftmost cell.
Turing machines can also be represented by transition diagrams. A tran-
sition δ(qi , a) = (qj , b, d) between state qi and qj can be drawn as
a/b, d
qi qj
and means that if the machine is in state qi and the current cell has an a
symbol, then the current cell is updated to have a b symbol, the tape head
moves one cell to the left or right (according to whether d = L or d = R),
and the current state becomes qj .
For the current example, the state diagram is quite simple:
[State diagram: q0 loops to itself on 0/0, R and on 1/1, R, and takes the blank (writing back a blank, moving L) to qA .]
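To make the definitions concrete, here is a small Python sketch of a single-tape machine simulator together with the machine of this example. The function run_tm, the dictionary representation of δ, and the use of 'B' as a stand-in for the blank symbol are our own conventions, not part of the formal definition:

def run_tm(delta, w, start="q0", accept="qA", reject="qR", blank="B", max_steps=10000):
    tape = list(w) if w else [blank]
    head, state = 0, start
    for _ in range(max_steps):
        if state in (accept, reject):
            return state, "".join(tape)
        state, written, direction = delta[(state, tape[head])]
        tape[head] = written
        if direction == "R":
            head += 1
            if head == len(tape):
                tape.append(blank)       # the tape extends with blanks on demand
        else:
            head = max(head - 1, 0)      # bumping into the left edge: the head stays put
    return "still running", "".join(tape)

# The machine of Example 30: sweep right over the input, accept at the first blank.
delta30 = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "B"): ("qA", "B", "L"),
}

print(run_tm(delta30, "0100"))           # ('qA', '0100B')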
Example 31 (Unary addition). (Worked out in class.) Although Turing
machines manipulate symbols and not numbers, they are quite often used
to compute numerical functions such as addition, subtraction, multipli-
cation, etc. To take a very simple example, suppose we want to add two
numbers given in unary, i.e., as strings over Σ = {1}. In this representa-
tion, for example, 3 is represented by 111 and 0 is represented by ε. The
two strings to be added will be separated by a marker symbol X. Thus, if
we wanted to add 3 and 2, the input would be

1 1 1 X 1 1 ···

and, when the machine halts, the tape should hold the sum in unary:

1 1 1 1 1 ···
Here is the desired machine. It traverses the first number, then replaces
the X with 1, then copies the second number, then erases the last 1 before
accepting.
[State diagram: q0 loops on 1/1, R and takes X/1, R to q1 ; q1 loops on 1/1, R and, on reading a blank, moves left into q2 ; q2 erases the final 1 (writing a blank, moving L) and enters qA .]
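Using the hypothetical simulator sketched after Example 30, the adder can be written down and run as follows (again, the dictionary encoding and the blank stand-in 'B' are our conventions):

delta31 = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "X"): ("q1", "1", "R"),   # replace the separator X with a 1
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "B"): ("q2", "B", "L"),   # reached the right end of the second number
    ("q2", "1"): ("qA", "B", "L"),   # erase the extra trailing 1 and accept
}

print(run_tm(delta31, "111X11"))     # ('qA', '11111BB'): the tape holds 11111, i.e. 5 in unary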
[State diagram: states q1 –q6 together with qA and qR . The upper loop q1 → q2 → q3 → q4 → q1 cancels a 0 from each end of the input, the lower loop q1 → q5 → q6 → q4 → q1 cancels a 1 from each end; the machine accepts when no uncancelled symbols remain and rejects from q3 or q6 on a mismatch.]
The general idea is to go from the ‘outside-in’ on the input string, can-
celling off equal symbols at each end. The loop q1 → q2 → q3 → q4 → q1
replaces the leading symbol (a 0) with a blank, then moves to the right-
most uncancelled symbol, checks that it is a 0, overwrites it with a blank,
then moves to the leftmost uncancelled symbol. If there isn’t one, then the
machine accepts. The lower loop q1 → q5 → q6 → q4 → q1 is essentially
the same as the upper loop, except that it cancels off a matching pair of
1s from each end. If the sought-for 0 (or 1) is not found at the rightmost
uncancelled symbol, then the machine rejects (from q3 or q6 ).
Now back to some more definitions. A configuration is a snapshot of the
complete state of the machine.
Definition 2 (Configuration). A Turing machine configuration is a triple ⟨ℓ, q, r⟩, where ℓ is a string denoting the tape contents to the left of the tape head and r is a string representing the tape to the right of the tape head. Since the tape is infinite, there is a point past which the tape is nothing but blanks. By convention, these are not included in r.[4] The leftmost symbol of r is the current tape cell. The state q is the current state of the machine.

A Turing machine starts in the configuration ⟨ε, q0 , w⟩ and repeatedly
makes transitions until it ends up in qA or qR . Note that a machine may
[4] However, this is not completely correct, since the machine may, for example, be given the empty string as input, in which case r must have at least one blank.
never end up in qA or qR , in which case it is said to be looping or diverging.
After all, we would certainly want to model programs that never stop: in
many cases such programs are useless, but they are undeniably part of
what we understand by computation.
⟨u, qi , a · w⟩ −→ ⟨u · b, qj , w⟩.
An execution of a machine M on input w is a sequence of configurations, starting with the configuration ⟨ε, q0 , w⟩, where the con-
figuration at step i + 1 is derived by making a transition from the configu-
ration at i.
A terminating execution is one which ends in an accepting configura-
tion ⟨u, qA , w⟩ or a rejecting configuration ⟨u, qR , w⟩, for some u, w.
Remark. The following distinctions are important:
• M accepts w iff the execution of M on w is terminating and ends in
the accept state:
⟨ε, q0 , w⟩ −→∗ ⟨ℓ, qA , r⟩
• M rejects w iff the execution of M on w is terminating and ends in the
reject state:
⟨ε, q0 , w⟩ −→∗ ⟨ℓ, qR , r⟩
Once the machine enters state q3 , it has performed the addition and
now uses a loop to move the tape head leftmost. But when the machine is
moving left in a loop, such as in state q3 , there is a difficulty: the machine
should leave the loop once it bumps into the left edge of the tape. But once
the tape head reaches the leftmost cell, the machine will repeatedly try to
move left on a ‘1’, unaware that it is overwriting the same ‘1’ eternally.
There are two ways to deal with this problem:
• When making a looping scan to the leftmost cell, add some special-
purpose code to detect the left edge. We know that when the ma-
chine bumps into the left edge, it writes the new character on top of
the old and then can’t move the tape head. The idea is to write a
‘marked’ version of the symbol on the tape and attempt to move left.
In the next step, if the marked symbol is seen, then the machine must
be at the left edge and the loop can be exited. If the marked symbol is
not seen, then the machine has been able to move left, and we go back
and ‘erase’ the mark from the symbol before continuing. For the cur-
rent example this yields the following machine, where the leftward
loop in state q3 has been replaced by the loop q3 → q4 → q5 → q3 .
[State diagram: the adder of Example 31 with the leftward loop at q3 replaced by the loop q3 → q4 → q5 → q3 , which writes a marked 1̇, tries to move left, and uses the mark to detect the left edge before moving to qA .]
Here is an execution of the second machine on the input 111X11:
⟨ε, q0 , 111X11⟩
⟨1, q0 , 11X11⟩
⟨11, q0 , 1X11⟩
⟨111, q0 , X11⟩
⟨1111, q1 , 11⟩
⟨11111, q1 , 1⟩
⟨111111, q1 , ⟩
⟨11111, q2 , 1 ⟩
⟨1111, q3 , 1 ⟩
⟨111, q4 , 11̇ ⟩
⟨1111, q5 , 1̇ ⟩
⟨111, q3 , 11 ⟩
⟨11, q4 , 11̇1 ⟩
⟨111, q5 , 1̇1 ⟩
⟨11, q3 , 111 ⟩
⟨1, q4 , 11̇11 ⟩
⟨11, q5 , 1̇11 ⟩
⟨1, q3 , 1111 ⟩
⟨ε, q4 , 11̇111 ⟩
⟨1, q5 , 1̇111 ⟩
⟨ε, q3 , 11111 ⟩
⟨ε, q4 , 1̇1111 ⟩
⟨ε, qA , 11111 ⟩
3. Otherwise, overwrite the ‘)’ with an ‘X’ and scan left for a ‘(’.
4. If one is found, overwrite it with ‘X’ and go to 1. Otherwise, reject.
The following diagram captures this algorithm. It is not quite right, be-
cause of left-edge detection problems, but it is close.
[State diagram: in q0 the machine scans right over ‘(’ and X looking for a ‘)’; finding one, it overwrites it with X and enters q1 , which scans left for the matching ‘(’; if instead the blanks are reached, the machine enters q2 , which scans left over X’s to decide between qA and qR . The leftward scans do not yet detect the left edge.]
In state q0 , we scan right, skipping open parens and X’s, looking for a
closing parenthesis, and transition to state q1 when one is found. If one is
not found, we must hit blanks, in which case we transition to state q2 .
q1 If we find ourselves in q1 , we’ve found the first ‘)’ and replaced it with
an ‘X’. Now we have to scan left and find the matching open paren-
thesis, skipping over any ‘X’ symbols. (Caution: left edge detection
needed!) Once the first open paren to the left is found, we over-write
it with an ‘X’ and go to state q0 . Thus we have successfully cancelled
off one pair of matching parens, and can go to the beginning of the
loop, i.e., q0 , to look for another pair.
q2 If we find ourselves in q2 , we have unsuccessfully searched to the right
looking for a closing paren. That means that every closing paren has
been paired up with an open paren. However, we must still deal with
the possibility that there are more open parens than closing parens
in the input, in which case we should reject. So we search back left
looking for a remaining open paren. If none exist, we accept; other-
wise we reject.
Thus, in state q2 we scan to the left, skipping over ‘X’ symbols. If we
encounter an open paren, we transition to state qR and reject. If we
don’t, then we ought to accept.
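The cancellation idea is easy to prototype on an ordinary Python list before worrying about Turing-machine details. This sketch mirrors the algorithm (find the first ‘)’, scan left for an unmatched ‘(’, overwrite both with X), though it is of course not itself a TM:

def balanced(s):
    tape = list(s)
    while True:
        closes = [i for i, c in enumerate(tape) if c == ")"]
        if not closes:                       # no ')' left: accept iff no '(' remains either
            return "(" not in tape
        close = closes[0]
        opens = [i for i in range(close) if tape[i] == "("]
        if not opens:                        # a ')' with no matching '(': reject
            return False
        tape[close] = tape[opens[-1]] = "X"  # cancel the matching pair

assert balanced("(()())") and not balanced("(()") and not balanced("())")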
Now we will re-do the BAL example properly, using both ways of de-
tecting the left edge.
Example 35 (BAL done right (1)). We expect a ⋆ in the first cell, followed
by the real input.
[State diagram: as before, but preceded by a start state s that checks for the ⋆ marker; the leftward scans in q1 and q2 now stop on ⋆, transitioning to qR and qA respectively.]
qA
Example 36 (BAL done right (2)). Each loop implementing a leftward scan
is augmented with extra states. The naive (incorrect) loop implementing
the left scan at q1 is replaced by a loop q1 → q4 → q5 → q1 , which is exited
either by encountering an open paren (transition to q0 ) or by bumping
against the left edge (no corresponding open paren to a close paren, so
transition to reject state qR ).
Similarly, the incorrect loop implementing the left scan at q2 is replaced
by a loop q2 → q6 → q7 → q2 , which is exited either by encountering an
open paren (open paren with no corresponding close paren, so transition
to qR ) or by bumping against the left edge (no unclosed open parens, so
transition to accept state qA ).
[State diagram: the leftward scan from q1 is replaced by the marking loop q1 → q4 → q5 → q1 , and the leftward scan from q2 by the loop q2 → q6 → q7 → q2 , exiting to q0 , qR , or qA as described above.]
{w · w | w ∈ {a, b}∗ }
go all the way to the left and mark the leftmost unmarked symbol.
Then go all the way to the right and mark the rightmost unmarked
symbol. Repeat until there are no unmarked symbols. Because we
have worked ‘outside-in’ this phase of processing should end up
with the tape head on the first symbol of the second half of the string.
If the string is not of even length then, at some step, the leftmost
symbol will get marked, but there will be no corresponding right-
most unmarked symbol.
2. Now we check that the two halves are equal. Starting from the first
character of the right half of the string, call it ċ, we remove the mark
and move left until the leftmost marked symbol is detected. We will
have to detect the left edge in this step! If the leftmost marked sym-
bol is indeed ċ, then we unmark it (otherwise we reject). Then we
scan right over (a) remaining marked symbols in the left half of the
string and then (b) unmarked symbols in the first part of the right
half of the string. We then either find a marked symbol, or we hit the
blanks.
3. Repeat for the second, third, etc characters. Finally, the rightward
scan for a marked symbol on the rhs doesn’t find anything and ends
in the blanks. And then we can accept.
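As a point of reference, membership in {w · w | w ∈ {a, b}∗} is trivial to decide in Python by splitting the string in half; the point of the exercise is that the Turing machine described below must achieve the same effect using only tape markings:

def is_ww(s):
    if len(s) % 2 != 0:            # odd-length strings cannot be of the form w·w
        return False
    mid = len(s) // 2
    return s[:mid] == s[mid:]

assert is_ww("abab") and is_ww("") and not is_ww("aba") and not is_ww("abba")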
Now that we have a good idea of how the algorithm should work, we
will go ahead and design the TM in detail. (But note that often this higher
level of description suffices to convince people that a proposed algorithm
is implementable on a TM, and actually providing the full TM description
is not necessary.)
In the transition diagram, we use several shorthand notations:
• Σ/Σ, L says that the transition replaces any symbol (so either 0 or
1) by itself and moves left. Thus Σ is being used to represent any
particular symbol in Σ, saving us from writing out two transitions.
• Σ⋆ = Σ ∪ {⋆}.
• Σ␣ = Σ ∪ { ␣ }, where ␣ is the blank symbol.
[Transition diagram: states 0–12 together with qA and qR . States 1–4 implement the first, marking, phase; states 5–9 and 10–12 implement the two nearly identical comparison loops of the second phase, as described below.]
We assume that the input is prefixed with ⋆, thus the transition from
state 1 to 2 just hops over the ⋆. If the input is not prefixed with ⋆, there
is a transition to qR (not included in the diagram). Having got to state 2,
the first pass of processing proceeds in the loop of states 1 → 2 → 3 →
4 → 1. In state 2 the leftmost unmarked character is marked and then
there is a sweep over unmarked characters until either a marked character
or the blanks are encountered (state 2). Then the tape head is moved one
cell to the left. (Note that Σ = {0, 1} in this example.) In state 3, we
should be at the rightmost unmarked symbol on the tape. If it is however,
a marked symbol, that means that the leftmost unmarked symbol has no
corresponding rightmost unmarked symbol, so we reject. Otherwise, we
loop left over the unmarked symbols until we hit a marked symbol, then
move right.
We are then either at an unmarked symbol, in which case we go through
the 1 → 2 → 3 → 4 → 1 loop again, or else we are at a marked sym-
bol. In fact, this will be the first symbol in the second half of the string,
and we move to the second phase of processing. This phase features two
nearly identical loops. If the marked symbol is a 1̇, then the left loop
5 → 6 → 7 → 8 → 9 is taken; otherwise, if the marked symbol is 0̇, the
right loop 10 → 11 → 12 → 8 → 9 is taken.
We now describe the left loop. In state 1 the leftmost marked cell in
the second half of the string is a 1̇; now we traverse over the prefix of
unmarked cells in the second half (state 5); then we traverse over the suffix
of marked cells in the first half of the string (state 6). Thus we arrive either
at ⋆ or at the rightmost unmarked cell in the first half of the string, and
move right into state 7. This leaves us looking at a marked cell. We expect
the matching 1̇ to the one seen in state 1, which takes us to state 8 (after
unmarking it); if we see a 0̇, then we reject. So if we are in state 8, we have
located the matching symbol in the first half of the string, and unmarked
it. Now we move right to the next element to consider in the second half
of the string. This involves skipping over the remaining marked symbols
in the first half of the string (state 9), then the prefix of unmarked symbols
in the second half of the string (state 10).
Then we are either looking at a marked symbol, in which case we go
around the loop again (either to state 5 if it is a 1̇ or to state 10 if it is a 0̇).
Or else we are looking at the blanks, which means that there are no more
symbols to unmark, and we can accept.
We now trace the execution of M on the string 010010, by giving a
sequence of machine configurations. In several steps (17, 20, 24, 31, and
34) we use ellipsis to abbreviate a sequence of steps. Hopefully, these will
be easy to fill in!
Step Config Step Config
1 (ε, q0 , ⋆010010) 26 (⋆0̇1̇, q10 , 0̇01̇0̇)
2 (ε, q1 , 010010) 27 (⋆0̇, q11 , 1̇0̇01̇0̇)
3 (⋆, q1 , 010010) 28 (⋆, q11 , 0̇1̇0̇01̇0̇)
4 (⋆0̇, q2 , 10010) 29 (ε, q11 , ⋆0̇1̇0̇01̇0̇)
5 (⋆0̇1, q2 , 0010) 30 (⋆, q12 , 0̇1̇0̇01̇0̇)
6 (⋆0̇10, q2 , 010) 31 (⋆0, q8 , 1̇0̇01̇0̇) . . .
7 (⋆0̇100, q2 , 10) 32 (⋆01̇0̇, q8 , 01̇0̇)
8 (⋆0̇1001, q2, 0) 33 (⋆01̇0̇0, q9 , 1̇0̇)
9 (⋆0̇10010, q2, ) 34 (⋆01̇0̇, q5 , 010̇) . . .
10 (⋆0̇1001, q3, 0) 35 (⋆, q6 , 01̇0̇010̇)
11 (⋆0̇100, q4 , 10̇) 36 (⋆0, q7 , 1̇0̇010̇)
12 (⋆0̇10, q4 , 010̇) 37 (⋆01, q8 , 0̇010̇)
13 (⋆0̇1, q4 , 0010̇) 38 (⋆010̇, q8 , 010̇)
14 (⋆0̇, q4 , 10010̇) 39 (⋆010̇0, q9 , 10̇)
15 (⋆, q4 , 0̇10010̇) 40 (⋆010̇01, q9 , 0̇)
16 (⋆0̇, q1 , 10010̇) 41 (⋆010̇0, q10 , 10)
17 (⋆0̇1̇, q2 , 0010̇) . . . 42 (⋆010̇, q10 , 010)
18 (⋆0̇1̇001, q2, 0̇) 43 (⋆01, q10 , 0̇010)
19 (⋆0̇1̇00, q3 , 10̇) 44 (⋆0, q11 , 10̇010)
20 (⋆0̇1̇00, q4 , 1̇0̇) . . . 45 (⋆01, q12 , 0̇010)
21 (⋆0̇1̇, q4 , 001̇0̇) 46 (⋆010, q8, 010)
22 (⋆0̇, q1 , 1̇001̇0̇) 47 (⋆0100, q9, 10)
23 (⋆0̇1̇, q1 , 001̇0̇) 48 (⋆01001, q9, 0)
24 (⋆0̇1̇, q1 , 001̇0̇) . . . 49 (⋆010010, q9, )
25 (⋆0̇1̇0̇, q1 , 0̇1̇0̇) 50 (⋆01001, qA, 0)
End of example
ficult to model the essential aspects of a microprocessor as a Turing ma-
chine: the ALU operations (addition, multiplication, etc.) can be imple-
mented by the standard grade-school algorithms, the registers of the ma-
chine can be placed at certain specified sections of the tape, and the random-
access memory can also be modelled by the tape. And so on.
3.1.2 Extensions
On top of the basic Turing machine model, more convenient models can
be built. These new models still recognize the same set of languages, how-
ever.
Multiple Tapes
It can be very convenient to use Turing machines with multiple (unbounded)
tapes. For example, if asked to implement addition of binary numbers on
a Turing machine, it would be quite useful to have five tapes: a tape for
each number being added, one to hold the sum, one for the carry being
propagated, and one for holding the two original inputs. Such require-
ments can be easily implemented by an ordinary Turing machine: for n
tapes,
[Figure: the n tapes, tape 1 through tape n, of a multi-tape machine.]
we simply divide the single tape into n distinct regions, separated by special markers such as X. The transition function of an n-tape machine has the form

δ : Q × Γ^n → Q × Γ^n × {L, R}^n
Each of the n sub-tapes will have one cell deemed to be the current cell.
We will use a system of markings to implement this idea. Since cells can’t
be marked, we will mark the contents of the current cell. This means that
the tape alphabet Γ will double in size: each symbol ai ∈ Γ will have a
marked analogue ȧi , and by convention, there will be exactly one marked
symbol per sub-tape. With this support, a move in the n-tape machine
will consist of (1) a left-to-right sweep wherein the steps prescribed by δ
are taken at each marked symbol, followed by (2) a right-to-left sweep to
reset the tape head to the left-most cell on the underlying tape.
By convention, a computation would start with the tape head on the left-
most symbol of the input string. This machine model is relatively easy
(although detailed) to implement using an ordinary Turing machine. The
main technique is to ‘fold’ the doubly infinite tape in two and merge the
cells so that alternating cells on the resulting singly infinite tape belong
to the two halves of the doubly-infinite tape. It helps to think of the tape
elements as being labelled with integers. Thus if the original tape was
labelled as
· · · −3 −2 −1 0 1 2 3 · · ·
the single-tape version would be laid out as
⋆ 0 1 −1 2 −2 3 −3 · · ·
Again, the details of how the control of the machine is achieved with an
ordinary Turing machine are relatively detailed.
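The folding itself can be described by a simple index map; a Python sketch (with position 0 reserved for the ⋆ marker, following the layout above):

def fold(i):
    # cell i of the doubly infinite tape -> position on the singly infinite tape
    return 2 * i if i > 0 else -2 * i + 1

assert [fold(i) for i in (0, 1, -1, 2, -2, 3, -3)] == [1, 2, 3, 4, 5, 6, 7]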
A further extension—non-determinism—can be built on top of ordi-
nary Turing machines, and is discussed later.
A pair of objects (a, b), where a is encoded by the string wa and b by the string wb , can be encoded as

◭ wa ‡ wb ◮
where ◭, ◮, and ‡ are symbols not occurring in the encoding of a and
b . The encoding of a list of objects [a1 , · · · , an ] can be implemented by
iterating the encoding of pairs, i.e., as
◭ a1 ‡ ◭ a2 ‡ ◭ · · · ‡ ◭ an−1 ‡ an ◮ · · · ◮◮◮
Finite sets can be encoded in the same way as lists. Arbitrary trees can
be represented by binary trees, and binary trees can be encoded, again, as
nested pairs. In effect, we are re-creating Lisp s-expressions. A graph is
usually represented as (V, E) where V is a finite set of nodes and E is a
finite set of edges, i.e., a set of pairs of nodes. Again, this format is easy to
encode and decode.
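The pair and list encodings are easy to program. Here is a Python sketch using ordinary ASCII characters as stand-ins for the markers ◭, ‡, ◮ (the helper names are ours):

LM, SEP, RM = "<", "#", ">"              # stand-ins for ◭, ‡, ◮

def encode_pair(wa, wb):
    return LM + wa + SEP + wb + RM

def encode_list(items):
    # a non-empty list is encoded by iterating the pair encoding
    if len(items) == 1:
        return items[0]
    return encode_pair(items[0], encode_list(items[1:]))

print(encode_list(["a1", "a2", "a3"]))   # <a1#<a2#a3>>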
The art of coding and decoding data pervades Computer Science. There
is even a related research area known as Coding Theory, but the subject of
efficient algorithms for coding and decoding is really orthogonal to our
purposes in this course.
We shall henceforth assume that encoding and decoding of any desired
high-level data can be accomplished. We will assume that, for each par-
ticular problem intended to be solved by a TM, the input is given in an
encoding that the machine can decode correctly. Similarly, if a machine
produces output, it will likewise be decodable. For this, we will adopt the
notation ⟨A⟩ to represent an object A which has been encoded to a string, and which will be decodable by the TM to a correct representation of A. A tuple of objects in the input will be written as ⟨A1 , . . . , Ak ⟩.
A machine (Q, Σ, Γ, δ, q0 , qA , qR ) will be written on the tape as a string of the form

wQ X wΣ X wΓ X wδ X wq0 X wqA X wqR X ···
where wQ , wΣ , wΓ , wδ , wq0 , wqA and wqR are strings representing the compo-
nents of the machine. In detail
• Q is a finite set of states. We could explicitly list out the state ele-
ments, but will instead just write out the number of states in unary
notation.
• Σ is the input alphabet, a finite set of symbols. Our format will list
the symbols out, in no particular order, separated by blanks. Recall
that the blank is not itself an input symbol.
• Γ is the tape alphabet, a finite set of symbols with the property that
Σ ⊂ Γ. In our format, we will just list the extra symbols not already
in Σ. Blank is a tape symbol.
• δ is a function and one might think that there would be a problem
with representing it, especially since functions can have infinite do-
mains. Fortunately, δ has a finite domain (since Q is finite and Γ
is finite, Q × Γ is finite). Therefore δ can be listed out. Each indi-
vidual transition δ(p, a) = (q, b, d) can be represented as a 5-tuple
(p, a, q, b, d). On the tape each of these tuples will look like p a q b d.
(If a or b happens to be the blank symbol, no ambiguity should result.)
Each 5-tuple will be separated from the others by XX; a sketch of this
serialization in code appears after this list.
• q0 , qA , qR will be represented by numbers in unary notation.
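Here is a minimal Python sketch of such a serialization for δ, following the
conventions above (unary states, L encoded as 1 and R as 11, tuples separated
by XX); the state numbering and helper names are our own choices.

    def unary(n):
        """n in unary notation: 1 -> '1', 2 -> '11', and so on."""
        return "1" * n

    def encode_delta(delta):
        """delta maps (state, symbol) to (state, symbol, direction).
        Each transition becomes a 5-tuple  p a q b d  with states and
        directions in unary, and tuples separated by XX."""
        dirs = {"L": 1, "R": 2}
        tuples = []
        for (p, a), (q, b, d) in sorted(delta.items()):
            tuples.append(" ".join([unary(p), a, unary(q), b, unary(dirs[d])]))
        return " XX ".join(tuples)

    # The machine of Example 38, taking q0 = 1, qA = 2, and '_' for the blank.
    delta = {(1, "0"): (1, "0", "R"),
             (1, "1"): (1, "1", "R"),
             (1, "_"): (2, "_", "L")}
    print(encode_delta(delta))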
Example 38. The following simple machine has two states, q0 (the start state)
and qA (the accept state). In q0 it moves right over 0s and 1s, leaving them
unchanged (0/0, R and 1/1, R), and on reading a blank it writes a blank, moves
left, and enters qA. (The original state diagram and its tape encoding are not
reproduced here.)
Note that the direction L is encoded by 1 and R is encoded by 11.
Example 39. A TM M that takes an arbitrary TM description and tests
whether it has an even number of states can be programmed as follows:
on input ⟨Q, Σ, Γ, δ, q0, qA, qR⟩, M checks that the input is in fact a represen-
tation of a TM and then checks the number of states in Q. If that number
is even, then M clears the tape and transitions into its accept state (which
is different from the qA of the input machine) and halts; otherwise, it clears
the tape and rejects.
67
If M eventually halts on w, then U will detect this (since the last config-
uration on the tape will be in state qA or qR ) and will then transition into
the corresponding terminal state of U. Thus, if M halts when run on w,
then the simulation of M on w by U will also halt. If M loops on w, the
simulation of M’s execution of w will also diverge, endlessly appending
new configurations to the end of the tape.
Self application
A Turing machine M that takes as input an (encoded) TM and performs
some calculation using the description of the machine can, of course, be
applied to all manner of Turing machines. Is there a problem with ap-
plying M to its own description? After all, the notion of self-application
can be extremely confusing, since infinite regress is a lurking possibility.
But consider the ‘even-number-of-states-tester’ given above in Example
39. When applied to itself, i.e., to its own description, it performs in a
sensible manner since it just treats its input machine description as data.
The following ‘real-world’ example similarly shows that the treatment of
programs as (encoded) data allows some instances of self-reference to be
straightforward.
Example 40. The Unix wc utility counts the number of characters in a file.
It can be applied to itself
bash-2.05b$ /usr/bin/wc /usr/bin/wc
58 480 22420 /usr/bin/wc
with no fear of infinite regress. Turing machines that treat their input TM
descriptions simply as data typically don’t raise any difficult conceptual
issues.
3.2 Register Machines

A register machine has a finite collection of registers R0, R1, R2, . . ., each
holding a natural number, together with a program: a finite, numbered list of
instructions, where instruction 0 is always HALT and execution begins at
instruction 1. There are two kinds of instruction:

• Inc r i. Add 1 to register r and move to instruction i.

• Test r i j. If register r holds 0, move to instruction i; otherwise subtract 1
from register r and move to instruction j.

For example, the one-instruction program

0 HALT
1 Test R0 I0 I0

decrements R0 (if it is non-zero) and halts, while

0 HALT
1 Test R0 I1 I1

empties R0 and then loops forever at instruction 1. The next program adds the
contents of R0 to R1, destroying R0 in the process:

0 HALT
1 Test R0 I0 I2
2 Inc R1 I1
Execution starts at instruction 1. The Test instruction checks R0 , exiting if
it holds 0. Otherwise, it decrements R0 and transfers control to instruction
2. This then adds 1 to R1 and transfers control back to instruction 1.
The following table shows how the execution evolves, step by step
when R0 has been loaded with 3 and R1 with 19.
Step R0 R1 Instr
0 3 19 1
1 2 19 2
2 2 20 1
3 1 20 2
4 1 21 1
5 0 21 2
6 0 22 1
7 0 22 HALT
In the beginning (step 0), the machine is loaded with its input numbers,
and is at instruction 1. At step 1 R0 is decremented and the machine moves
to instruction 2. And so on.
Notice that we could also represent the execution by the following se-
quence of triples (R0 , R1 , Instr):
(3, 19, 1), (2, 19, 2), (2, 20, 1), (1, 20, 2), (1, 21, 1), (0, 21, 2), (0, 22, 1), (0, 22, 0).
OK, one more example. How about adding R0 and R1 , putting the
result in R2 , leaving R0 and R1 unchanged?
Example 44. As always, it is worth thinking about this at a high level be-
fore diving in and writing out the exact instructions. The best approach I
could think of uses five registers R0 , R1 , R2 , R3 , R4 . We use R3 and R4 to
store the original values of R0 and R1 . The program first (instructions 1–3)
repeatedly decrements R0 and adds 1 to both R2 and R3 . At the end of this
phase, R0 is 0 and both R2 and R3 will hold the original contents of R0 .
Next (instructions 4–6) the program repeatedly decrements R1 and adds
1 to both R2 and R4 . At the end of this phase, R1 is 0, R2 holds the sum of
the original values of R0 and R1 , and R4 holds the original contents of R1 .
Finally, a couple of loops (instructions 7–8 and 9–10) move the contents
of R3 and R4 back to R0 and R1 . Here is the program:
70
0 HALT
1 Test R0 I4 I2
2 Inc R2 I3
3 Inc R3 I1
4 Test R1 I7 I5
5 Inc R2 I6
6 Inc R4 I4
7 Test R3 I9 I8
8 Inc R0 I7
9 Test R4 HALT I10
10 Inc R1 I9
For the intrepid, the step-by-step execution of the machine when R0 = 2 and
R1 = 3 can be tabulated in the same fashion as before.
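Rather than tabulate the run by hand, here is a short Python interpreter for
this instruction set—a sketch of our own, not part of the development—that
prints exactly this kind of trace.

    def run(program, registers, max_steps=10_000):
        """program: dict mapping instruction numbers to ('HALT',),
        ('Inc', r, i) or ('Test', r, i, j).  registers: dict r -> value.
        Prints one line per step, as in the traces above."""
        pc, step = 1, 0
        while program[pc][0] != "HALT" and step < max_steps:
            print(step, dict(registers), pc)
            op = program[pc]
            if op[0] == "Inc":
                _, r, i = op
                registers[r] += 1
                pc = i
            else:                          # Test
                _, r, i, j = op
                if registers[r] == 0:
                    pc = i
                else:
                    registers[r] -= 1
                    pc = j
            step += 1
        print(step, dict(registers), "HALT")

    # The addition program of Example 44, with R0 = 2 and R1 = 3.
    prog = {0: ("HALT",), 1: ("Test", 0, 4, 2), 2: ("Inc", 2, 3), 3: ("Inc", 3, 1),
            4: ("Test", 1, 7, 5), 5: ("Inc", 2, 6), 6: ("Inc", 4, 4),
            7: ("Test", 3, 9, 8), 8: ("Inc", 0, 7), 9: ("Test", 4, 0, 10),
            10: ("Inc", 1, 9)}
    run(prog, {0: 2, 1: 3, 2: 0, 3: 0, 4: 0})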
3.3 The Church-Turing Thesis
Computability theory developed in the 1930’s in an amazing burst of cre-
ativity by logicians. We have seen two fully-featured models of computation—
Turing machines and Register machines—but there are many more, for
example λ-calculus (due to Alonzo Church), combinators (due to Haskell
Curry), Post systems (due to Emil Post), Term Rewriting systems, unre-
stricted grammars, cellular automata, FRACTRAN, etc.
These models have all turned out to be equivalent, in that each allows
the same set of functions to be computed. Before we give an indication of
what such an equivalence proof looks like in the case of Turing machines
and Register machines, we can make some general remarks.
A full model of computation can be seen as a setting forth of a general
way to do sequential computation, i.e., to deterministically compute the
values of functions, over some kind of data (often strings or numbers).
The requirement on an author of such a model is to show how all tasks
we regard as being ‘computable’ by a real mechanical device, or solvable
by an algorithm, may be realized in the model. Typically, this splits into
showing
• How all manner of data, e.g., trees, graphs, arrays, etc., can be en-
coded to, and decoded from, the data representation used by the
model.

• How the steps of any algorithm—sequencing, case analysis, repetition,
and so on—can be mimicked by the basic operations of the model.
Put this way, it seems possible that a chaotic situation could have de-
veloped, where multiple competing notions of computability struggled for
supremacy. But that hasn’t happened. Instead, the proofs of equivalence
among all the different models mean that people can use whatever model
they prefer, secure in the knowledge that, were it necessary or convenient,
they could have worked in any of the other models. This is the Church-
Turing Thesis: that any model of computation is as powerful as any other,
and that any fresh one proposed is anticipated to be equivalent to all the
others. This is what gives people confidence that any algorithm coded as a
‘C’ program can also be coded up in Java, or Perl, or any other general pur-
pose programming language.

(A figure appears here depicting a web of mutually equivalent formalisms:
FRACTRAN, unrestricted grammars, Register machines, and programming
languages such as C, ML, and Java.)

Note well, however, that all such considerations are relative to some model of
computation: there is no abstract definition of the term algorithm, of which
the models of com-
putation are instances. Thus the search for an algorithm implementing a
requirement has to be met by supplying a program. If such a program ex-
ists in one model of general computation or programming language, then
it can be translated to any other model of general computation or pro-
gramming language. Conversely, if a requirement cannot be implemented
in a particular model of computation, then it also cannot be implemented
in any other.
As a result of adopting the CT-Thesis, we can use abstract methods,
e.g., notation from high-level programming languages or even mathemat-
ics, to describe algorithmic behaviour, and know that the algorithm can be
implemented on a Turing Machine. Thus we may shift our attention from
painstakingly implementing algorithms in horrific detail on simple mod-
els of computation. We will now assume programs exist to implement the
desired algorithms. Of course, we may be challenged to show that a pur-
ported algorithm is indeed implementable; then we may choose whatever
Turing-equivalent model we wish in order to write the program.
Finally, the scope of the CT-Thesis must be understood. The models of
computation are intended to capture computation of mathematical func-
tions. In other words, whenever a TM M is applied to an input string w,
it will always return the same answer. Dealing with interactive, random-
ized, or distributed computation requires extensions which have been the
source of much debate. They are, however, beyond the scope of this course.
3.3.1 Equivalence of Turing and Register machines

To prove two models of computation A and B equivalent, one shows that any
program of A can also be written in B, and vice versa. The simulation of A
programs by B programs is captured in the following diagram:

                 runA
    (pA, iA) -----------> resultA
       |                     ^
      toB                   toA
       |                     |
       v         runB        |
    (pB, iB) -----------> resultB

which expresses the following equation: toA(runB(toB(pA, iA))) = runA(pA, iA).
That is, translating an A program and its input into B, running it there, and
translating the result back gives the same answer as running the program
directly in A.

One direction of the equivalence has a Turing machine simulate a register
machine: the registers are laid out on the tape as bit-strings, and each
register-machine instruction becomes a block of TM states. Simulating, say,
Inc Rk involves steps such as the following:
• Increment that register (which is of course represented by a bit-
string). Also note that this operation could require resizing the
portion of tape for Ri .
• Move tape head all the way left.
• Move to state that represents the beginning of the TM instruc-
tion sequence for instruction k.
Strings can also be regarded as numbers. Goedel's idea was basically to treat
w as a representation of the prime factorization of some number.
Example 45. Let's use the string foobar as an example. Taking ASCII as
the coding system, we have: C(f) = 102, C(o) = 111, C(b) = 98, C(a) = 97,
and C(r) = 114. The Goedel number of foobar is then obtained by using these
character codes as the exponents of successive primes:

2^102 · 3^111 · 5^111 · 7^98 · 11^97 · 13^114
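Assuming, as above, that the character codes become the exponents of successive
primes, the computation is easy to sketch in Python:

    def primes(n):
        """First n primes, by trial division (fine for short strings)."""
        ps, cand = [], 2
        while len(ps) < n:
            if all(cand % p for p in ps):
                ps.append(cand)
            cand += 1
        return ps

    def goedel_number(s):
        """Encode s as  2^C(s1) * 3^C(s2) * 5^C(s3) * ...  using ASCII codes."""
        result = 1
        for p, ch in zip(primes(len(s)), s):
            result *= p ** ord(ch)
        return result

    n = goedel_number("foobar")   # 2^102 * 3^111 * 5^111 * 7^98 * 11^97 * 13^114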
3.4 Recognizability and Decidability
Now that we have become familiar with Turing machines and Register
machines, we should ask what they are good for. For example, Turing
machines definitely aren’t good for programming, so why do we study
and use them?
We know that a Turing machine takes an input and either accepts it,
rejects it, or loops. In order to tell if a Turing machine works properly, one
needs to specify what answers it should compute. The usual way to do
this is to specify the set of strings the TM must accept, i.e., its language.
We now make a crucial, as it turns out, distinction between TMs that reject
strings outside their language and TMs that either reject or loop when
given a string outside their language.
Definition 7 (Turing-recognizable, Recursively enumerable). A language
L is Turing recognizable, or recursively enumerable, if there is some Turing
machine M (called a recognizer) that accepts each string in L. For each
string not in L, M may either reject the string or diverge.
The following definition captures the subset of recognizers that never
loop.

Definition 8 (Turing-decidable, Recursive). A language L is Turing decidable,
or recursive, if there is some Turing machine M (called a decider) that accepts
every string in L and rejects every string not in L.
Thus a decider always says ‘yes’ given a string in the specified lan-
guage and always says ‘no’ given a string outside the specified language.
Obviously, it is better to have a decider than a recognizer, since a decider
always gives a verdict in a finite time. Moreover, a decider is automatically
a recognizer.
A restatement of these definitions is the following.
Definition 9 (Decision problem). The decision problem for a language L is
just the question: is there a decider for L? Equivalently, one asks if L is de-
cidable. Similarly, the recognition problem for a language L is the question:
is L recognizable (recursively enumerable)?
The following languages, for example, are decidable:

• binary strings
• natural numbers
• Well-formed C programs
• {⟨i, j, k⟩ | i + j = k}
• {⟨ℓ, ℓ′⟩ | ℓ′ is a sorted permutation of ℓ}
In a better world than ours, one would expect that more informative
properties and questions—does a given program halt on a given input? do two
programs compute the same results? does a program ever return 42?—would also
be decidable.
Unfortunately, such is not the case. We can prove that none of these
questions are decidable, as we will see. Notice that, in order to make
such claims, a clever proof is necessary, since we are asserting that var-
ious problems are algorithmically unsolvable, i.e., that no program can be
constructed to solve the problem.
But first we are going to look at a set of decision problems about TMs
that can be solved. If Turing machines seem too unworldly for you, the
problems are easily restated to apply to your favorite programming lan-
guage or microprocessor.
3. Does M take more than 3100 steps on some input?
The decider will simulate M on all strings of length ≤ 3100, for 3100
steps. If M has not entered qA or qR by then, for at least one string,
accept, else reject. We need only consider strings of length ≤ 3100: in 3100
steps, M can read at most the first 3100 symbols of its input, so its behaviour
during those steps on a longer string is identical to its behaviour on that
string's length-3100 prefix.
4. Does M take more than 3100 steps on all inputs?
Similar to the previous, except that we require that M take more than 3100
steps on every string of length ≤ 3100.
5. Does M ever move the tape head more than 3100 cells away from the
starting position?
M will either loop infinitely within the 3100 cells, stop within the
3100 cells, or break out of the 3100 cells. We can detect the infinite
loop by keeping track of the configurations that M can get into: for
a fixed tape size (3100 in this problem), this is a finite number of
configurations. In particular, if M has n states and k tape symbols,
then the number of configurations it can get into is 3100 · n · k^3100
(head positions × states × tape contents). If M hasn't re-entered a
previous configuration or halted in that number of moves, the tape head
must have moved more than 3100 cells away from its initial position. A
sketch of this idea in code appears after this list.
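The loop-detection idea can be sketched abstractly in Python: simulate a step
function while remembering the configurations already seen. The function and
configuration representation below are our own illustration.

    def classify(step, start, is_halted, bound):
        """Simulate `step` from configuration `start`, remembering every
        configuration seen.  Within a bounded region there are at most
        `bound` distinct configurations, so the run either halts, repeats
        a configuration (an infinite loop), or must leave the region."""
        seen = set()
        config = start
        while len(seen) <= bound:
            if is_halted(config):
                return "halts"
            if config in seen:
                return "loops"
            seen.add(config)
            config = step(config)
        return "leaves the region"

    # Toy use: counting modulo 5 never halts and is caught as a loop.
    print(classify(lambda c: (c + 1) % 5, 0, lambda c: False, bound=100))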
The following example illustrates an important technique for building
deciders and recognizers.
Example 47 (Dovetailing). Suppose our task is to take an arbitrary Turing
machine description M and tell whether there is any string that it accepts.
This decision problem can be formally stated as
ASome = {⟨M⟩ | ∃w. M accepts w}
A naive stab at an answer would say that a recognizer for ASome is
readily implemented by generating strings in Σ∗ in increasing order, one
at a time, and running M on them. If a w is generated so that a simulated
execution of M accepts w, then our recognizer halts and accepts. However,
it may be the case that M accepts nothing, in which case this program will
loop forever. Thus, on the face of it, this is a recognizer but not a decider.
However, this is not even a recognizer! What if M loops on some w,
but would accept some (longer) string w ′ ? Blind simulation will loop on w
and M will never get invoked on w ′ .
This problem can be solved, in roughly the same way as the same prob-
lem is solved in time-shared operating systems running processes: some
form of fair scheduling. This can be implemented by interleaving the gen-
eration of strings with applying M to each of them for a limited number
of steps. The algorithm goes round-by-round. In the first round of the
algorithm, M is simulated for one step on ε. In the second round of the
algorithm, M is simulated for one more step on ε, and M is simulated for
one step on 0. In the third round of the algorithm, M is simulated for
one more step on ε and 0, and M is simulated for one step on 1. In the
fourth round, M is simulated for one more step on ε, 0, 1, and is simulated
for one step on 00. Computation proceeds, where in each round all exist-
ing sub-computations advance by one step, and a new sub-computation
on the next string in increasing order is started. This proceeds until in
some sub-computation M enters the accept state. If it enters a reject state,
that sub-computation is dropped. The process just outlined is often called
dovetailing, because of the fan shape that the computation takes.
Clearly, if some string w is accepted by M, it will start being processed
in some round, and eventually accepted, possibly much later. So the lan-
guage ASome is recognizable.
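Dovetailing is easy to sketch in Python if we model a machine as a generator
that yields once per simulated step and returns True or False when it halts;
this modelling, and the toy machine below, are our own illustration.

    from itertools import count

    def binary_strings():
        """ε, 0, 1, 00, 01, 10, 11, 000, ... in increasing order."""
        yield ""
        for n in count(1):
            for i in range(2 ** n):
                yield format(i, "b").zfill(n)

    def dovetail(machine, max_rounds=200):
        """Round k advances every active computation by one step and starts
        a new computation on the next string; accept as soon as any
        sub-computation accepts, and drop rejecting ones."""
        strings = binary_strings()
        active = []                                  # pairs (w, generator)
        for _ in range(max_rounds):
            w = next(strings)
            active.append((w, machine(w)))
            survivors = []
            for w, g in active:
                try:
                    next(g)                          # one more simulated step
                    survivors.append((w, g))
                except StopIteration as halt:
                    if halt.value:                   # accepted: done
                        return w
            active = survivors
        return None

    def toy(w):
        """Toy machine: loops forever on the string 0, otherwise takes
        len(w) steps and accepts iff w contains 11."""
        while w == "0":
            yield
        for _ in w:
            yield
        return "11" in w

    print(dovetail(toy))   # prints '11' even though the machine loops on '0'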
One way of showing a language L is decidable is to show that there
are recognizers for both L and its complement L̄.
Theorem 1. If L is recognizable and L̄ is recognizable then L is decidable.
Proof. Let M1 be a recognizer for L and M2 a recognizer for L̄. For any x ∈ Σ∗,
x ∈ L or x ∈ L̄. A decider for L can thus be implemented that, on input
x, dovetails execution of M1 and M2 on x. Eventually, one of the machines
must halt with an accept or reject verdict. If M1 halts first, the decider
returns the verdict of M1; if M2 halts first, the decider returns the opposite
of the verdict of M2.
Example 48. The decidable languages are closed under union. Formally,
this is expressed as

Decidable(L1) ∧ Decidable(L2) ⇒ Decidable(L1 ∪ L2)

Proof. Suppose L1 and L2 are decidable. Then there exists a decider M1 for
L1 and a decider M2 for L2. Now we claim there is a decider M for L1 ∪ L2.
Let M be the following machine: on input x, run M1 on x; if it accepts, accept;
otherwise run M2 on x and report its verdict.
That M is a decider is clear since both M1 and M2 halt on all inputs. That
M accepts the union of L1 and L2 is also clear, since M accepts x if and
only if one or both of M1, M2 accept x.
When seeking to establish a closure property for recognizable languages,
we have to guard against the fact that the recognizers may not terminate
on objects outside of their language.
Example 49. The recognizable languages are closed under union. Formally,
this is expressed as

Recognizable(L1) ∧ Recognizable(L2) ⇒ Recognizable(L1 ∪ L2)

A recognizer for L1 ∪ L2 cannot simply run a recognizer for L1 to completion
and then one for L2, since the first may loop. Instead, the two recognizers
are dovetailed on the input x, and the combined machine accepts as soon as
either of them accepts.
3.5 Undecidability
Many problems are not decidable. This is easy to see by a cardinality ar-
gument: there are simply far more languages (uncountable) than there are
algorithms (countable). So some languages—almost all in fact—have no
decider, or recognizer. However, that is an abstract argument; what we
want to know is whether specific problems of interest to computer science
are decidable or recognizable.
In this section we will show the undecidability of the Halting Problem, a
result due to Alan Turing. The importance of this theorem is twofold: first,
it is the earliest and arguably most fundamental result about the limits of
computation (namely, some problems can not be algorithmically solved);
second, many other undecidability results stem from it. The proof uses a
cool technique, which we pause to introduce here.
3.5.1 Diagonalization
The diagonalization proof technique works as follows: assume that you
have a complete listing of objects; then construct a new object that should
be in the list but can’t be since it differs, in at least one place, with every
object in the list. A contradiction thereby ensues. This technique was in-
vented by Georg Cantor to show that R is not countable, i.e., that there
are far more real numbers than natural numbers. This shows that there is
more than one size of infinity.
The existence of a bijection between two sets is used to implement the
notion of the sets ‘being the same size’, or equinumerous. When the sets are
finite, we just count their elements and compare the resulting numbers.
However, equinumerosity behaves counter-intuitively when the sets are infinite.
For example, the set of even numbers is equinumerous with N even though it
is a proper subset of N!
Theorem 2. There is no surjection from N onto the set of real numbers between
0 and 1; hence R is not countable.
Towards a contradiction, suppose that there is such a surjection, i.e.,
the real numbers between 0 and 1 can be arranged in a complete listing in-
dexed by natural numbers. This gives us a table, infinite in both directions.
Each row of the table represents one real number, and all real numbers are
in the table.
0 . 5 3 1 1 7 8 2 ···
0 . 4 3 0 0 1 2 9 ···
0 . 7 7 6 5 1 0 2 ···
0 . 0 1 0 0 0 0 0 ···
0 . 9 0 3 2 6 8 4 ···
0 . 0 0 0 1 1 1 0 ···
⋮
The arrangement of the numbers in the table doesn’t matter; what is
important is that the listing is complete and that each row is indexed by a
natural number. Now we build an infinite sequence of digits D by travers-
ing the diagonal in the listing and changing each digit of the diagonal. For
example, we could build
D = 0.647172 . . .
Now, D is a real number between 0 and 1, so if the listing were complete it
would have to appear in some row, say row i. But D differs from the number in
row i at the i-th digit, by construction. This contradiction shows that no
such complete listing—that is, no surjection from N onto the reals between 0
and 1—can exist.
Note. Pedants may enjoy pointing out that there is a difficulty with in-
finitely repeated digits since, e.g., 0.19999 . . . = 0.2000 . . .. If D ended
with an infinite repetition of 0s or 9s the argument wouldn’t work (be-
cause, e.g., 0.199999 . . . differs from 0.2000 . . . at each digit, but they are
equal numbers). We therefore exclude 0 and 9 from being used in the con-
struction of D.
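The diagonal construction itself is completely mechanical, as the following
Python sketch (ours) shows; it reproduces the D built from the table above.

    def diagonal(listing, n_digits):
        """listing(i, j) gives digit j of the i-th real in the table.
        Digit i of D is chosen to differ from listing(i, i), and is
        never 0 or 9."""
        digits = []
        for i in range(n_digits):
            d = listing(i, i) + 1
            if d in (9, 10):       # stepping onto 9 (or past it): use 1 instead
                d = 1
            digits.append(d)
        return "0." + "".join(map(str, digits))

    # The table from the text: rows 0.53117..., 0.43001..., 0.77651..., ...
    table = [[5, 3, 1, 1, 7, 8, 2], [4, 3, 0, 0, 1, 2, 9], [7, 7, 6, 5, 1, 0, 2],
             [0, 1, 0, 0, 0, 0, 0], [9, 0, 3, 2, 6, 8, 4], [0, 0, 0, 1, 1, 1, 0]]
    print(diagonal(lambda i, j: table[i][j], 6))   # 0.647172, as in the text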
Note. Diagonalization can also be used to show that there is no surjection
from a set to its power set; hence there is no bijection, hence the sets are not
equinumerous. The proof is almost identical to the one we’ve just seen.
We now apply the same technique to Turing machines. List all binary strings
across the top of a table, in increasing order, and list a Turing machine Mw
down the side, one for each binary string w:

ε 0 1 00 01 10 11 000 · · ·
Mε
M0
M1
M00
M01
M10
M11
M000
..
.
There is a tight connection between binary string w and machine Mw . If
w is a valid encoding of a machine, we use Mw to stand for that machine.
If w is not a valid encoding of a machine, it will denote a dummy machine
that simply halts in the accept state as soon as it starts to run. Thus each
Mw is a valid Turing machine, and every TM may be found at least once
in the list. This bears repeating: this list of TMs is complete, every possible
TM is in it.
Therefore an entry of the table indexed by (i, j) represents machine Mi
with input being the binary string corresponding to j: the table represents
all possible TM computations.
Theorem 3 (Turing). The halting problem HP = {⟨M, w⟩ | M halts when run on w}
is undecidable.

Now let's get started on the argument. Towards a contradiction, we
suppose that there is a decider H for the halting problem. Thus, let H be
a TM having the property that, when given input ⟨w, u⟩, for any w and u,
it correctly determines if Mw halts when given input u. Being a decider, H
must itself always finish its computation in a finite time.
Therefore, we could use H to fill in any cell in the table with T (signi-
fying that the program halted on the input) or F (the program goes into
an infinite loop when given the particular input). Notice that H can not
simply blindly execute Mw on u and return T when Mw on u terminates:
what if Mw on u didn’t terminate?
Now consider the following TM N, which calls H as a sub-routine. On input x,
N runs H on ⟨x, x⟩ to ask whether Mx halts on x; if the answer is yes, N
deliberately enters an infinite loop, and if the answer is no, N halts.

Since N is itself a Turing machine, it should appear somewhere in the
complete listing of machines. However:

• N behaves differently from every machine in the list, for at least one
argument (it loops on x iff Mx halts on x). So N can't possibly be on
the list.

This contradiction shows that the assumed decider H cannot exist.
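In code, the construction of N looks like the following Python sketch; the
decider H is of course hypothetical—no such function can be written, which is
exactly what the argument shows.

    def H(machine_description, input_string):
        """Purported halting decider: True iff the described machine halts
        on the given input.  Hypothetical: it cannot actually exist."""
        raise NotImplementedError

    def N(x):
        """The diagonal machine.  On input x, treat x as the description of
        the machine M_x and do the opposite of what H predicts."""
        if H(x, x):          # M_x halts on x ...
            while True:      # ... so N loops on x
                pass
        else:                # M_x loops on x ...
            return           # ... so N halts on x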
Alternate proof A slightly different proof—one which explicitly makes
use of self-reference—proceeds as follows: we construct N as before, but
then ask the question: how does N behave when applied to itself, i.e., what
is the value of N(N)? By instantiating the definition of N, we have that N(N)
halts if and only if N(N) loops — an immediate contradiction.
This result lets us conclude that there cannot be a procedure that tells
whether an arbitrary program halts on an arbitrary input. In other words, the
halting problem is algorithmically unsolvable. (Note the use of the CT-Thesis
here: we have moved from a result about Turing machines to a claim about
all algorithms.)
To be clear: certain syntactically recognizable classes of programs, e.g.,
those in which the only looping construct is bounded for-loops, always
halt. Hence, although the general problem is undecidable, there are impor-
tant subsets that are decidable. However, it’s impossible to write a halting
detector that will work correctly for all programs.
The language AP asks whether the given machine accepts the given
input. AP is closely related to HP , but is not the same. The language
HP42 is a specialization of HP: it essentially asks the question ‘Once the
input is known, does the Halting problem become decidable?’. The language
HP ∃ asks whether a machine halts on at least one input, while HP ∀ asks if
a machine halts on all its inputs. The language Const 42 asks whether the
given machine returns 42 as an answer in all possible computations. The
language Equiv asks whether two machines compute the same answers in
all cases, i.e., whether they are equivalent. Finally, the language Finite asks
whether the given machine accepts only a finite number of inputs.
All of these undecidable problems can be shown undecidable by a tech-
nique that amounts to employing the contrapositive, which embodies a ba-
sic proof strategy.
Definition 10 (Contrapositive). The contrapositive is a way of reasoning
that says: in order to prove P ⇒ Q, we can instead prove ¬Q ⇒ ¬P . Put
formally:
(P ⇒ Q) iff (¬Q ⇒ ¬P )
We are going to use the following instance of the contrapositive for
undecidability arguments:
Decidable(L) ⇒ Decidable(HP )
i.e., that if L was decidable, then we could decide the halting problem. But
since HP is not decidable, then neither is L.
This approach to undecidability proofs is called reduction. In particular,
the above approach is said to be a reduction from HP to L.7 We can also
reduce from other undecidable problems, if that is convenient.
Remark. It may seem intuitive to reason as follows:
If HP is a subset of L, then L is undecidable. Why? Because if
all occurrences of HP are found in L, deciding L has to be at least as
hard as solving HP .
7
You will sometimes hear people saying things like “we prove L undecidable by re-
ducing L to HP ”. This is backwards, and you will just have to make the mental transla-
tion.
This view is, however, deeply and horrifically wrong. The problem
with this argument is that HP ⊆ Σ∗ but Σ∗ is decidable.
We now go through a few examples of undecidability proofs. They all
share a distinctive pattern of reasoning.
Example 52 (HP42 is undecidable).
Proof. Suppose HP42 is decidable. We therefore have a decider D that,
given input ⟨M⟩, accepts if M halts on 42 and rejects otherwise. Now we
construct a TM Q for deciding HP. On input ⟨M, w⟩, Q performs the following
steps:

1. Build the description of a machine N that ignores its own input, writes w
on its tape, and then runs M on w, halting exactly when M halts.

2. Run D on ⟨N⟩.

3. Report D's verdict: N halts on 42 if and only if M halts on w, so this
verdict decides HP.

Since HP is undecidable, no such decider D can exist.
Example 53 (HP∀ is undecidable).
Proof. Suppose D decides HP∀. We can then decide HP42 with the follow-
ing TM Q:

4. Run D on ⟨N, Loop⟩ and reverse its verdict, i.e., switch Accept with
Reject, and Reject with Accept.
We now prove a theorem that provides a sweeping generalization of
many undecidability proofs. But first we need the following concept:
Definition 11 (Index set). A set of TMs S is an index set if and only if it has
the following property:

∀M1, M2. L(M1) = L(M2) ⇒ (⟨M1⟩ ∈ S ⇔ ⟨M2⟩ ∈ S)

This formula expresses, in a subtle way, that an index set contains all (and
only) those TMs having the same language. Thus index sets let us focus on
properties of a TM that are about the language of the TM, or the function
computed by the TM, and not the actual components making up the TM,
or how the TM executes.
Example 55. The following are index sets: H42, HSome, HEvery, Const42, and
Finite. HP and AP can't be index sets, since they are sets of ⟨M, w⟩ strings
rather than the required ⟨M⟩ strings. Similarly, Equiv is not an index set
since it is a set of ⟨M1, M2⟩ strings. The following languages are also not
index sets:
• {⟨M⟩ | M has exactly 4 control states}. This is not an index set since
it is easy to provide TMs M1 and M2 such that L(M1) = L(M2) = ∅,
but M1 has 4 control states, while M2 has 3.

• {⟨M⟩ | ∀x. M halts in at most 2 × len(x) steps, when run on x}. This is
not an index set since there exist TMs M1, M2 with L(M1) = L(M2)
such that M1 always halts within 2 × len(x) steps, while M2 exceeds
that bound on some input x.

• {⟨M⟩ | M accepts more inputs than it has states}. This is not an index
set since, for example, there exists a TM M1 with 6 states that accepts
all binary strings of length 3 (so the number of strings in the language
of M1 is 8) and a TM M2 with 8 states that accepts the same language.
Theorem 4 (Rice). Let S be an index set such that there is at least one TM in S,
and not all TMs are in S. Then S is undecidable.
Proof. Towards a contradiction, suppose index set S is decided by TM D.
Consider Loops, a TM that loops on every input: L(Loops) = ∅. Either
Loops ∈ S or Loops ∉ S. Let's do the case where Loops ∉ S. Since S ≠ ∅,
there is some TM K ∈ S. We can decide HP by the following machine Q:
1. The input to Q is ⟨M, w⟩.

2. Build the machine description for the following machine N and write
it to tape: on input x, N first simulates M on w (ignoring x); if and when
that simulation halts, N simulates K on x and does whatever K does.

3. Run D on ⟨N⟩ and report its verdict.

If M halts on w then L(N) = L(K), so ⟨N⟩ ∈ S, since S is an index set
containing K; if M loops on w then L(N) = ∅ = L(Loops), so ⟨N⟩ ∉ S. Thus D's
verdict on ⟨N⟩ decides whether M halts on w, contradicting the undecidability
of HP. The case Loops ∈ S is symmetric, applying the same argument to the
complement of S.
Example 56. Since Finite is an index set, and (1) the language of at least
one TM is finite, and (2) the language of at least one TM is not finite, Rice’s
theorem allows us to immediately conclude that Finite is undecidable.
Example 57. The recognizable languages are not closed under comple-
ment.
Consider the complement of the halting problem, HP̄. This is the set
of all ⟨M, w⟩ pairs where M does not halt on w. If this problem were rec-
ognizable, then we could get a decision procedure for the halting prob-
lem. How? Since HP is recognizable and (by assumption) HP̄ is recog-
nizable, a decision procedure can be built that works as follows: on in-
put ⟨M, w⟩, incrementally execute both recognizers for the two languages.
Since HP ∪ HP̄ = Σ∗, one of the recognizers will eventually accept ⟨M, w⟩.
So in finite time, the halting (or not) of M on w will be detected, for any
M and w. But this can't be, because we have already shown that HP is un-
decidable. Thus HP̄ can't be recognizable and so must be a member of a
class of languages properly outside the recognizable languages.
Thus the set of all halting programs is recognizable, but the set of all
programs that do not halt can’t be recognizable for otherwise the set of
halting programs would be decidable, and we already know that such is
not the case.
There are many more non-recognizable languages. To prove that such
languages are indeed non-recognizable, one can use Theorem 1 or the no-
tion of reducibility. The latter is again embodied in an application of the
contrapositive:

(Recognizable(A) ⇒ Recognizable(B)) ⇒ (¬Recognizable(B) ⇒ ¬Recognizable(A))

In particular, to show that a language A is not recognizable it suffices to
show

Recognizable(A) ⇒ Recognizable(HP̄).
Example 58.
Chapter 4
Context-Free Grammars
Context Free Grammars (CFGs) first arose in the late 1950s as part of Noam
Chomsky’s work on the formal analysis of natural language. CFGs can
capture some of the syntax of natural languages, such as English, and also
of computer programming languages. Thus CFGs are of major importance
in Artificial Intelligence and the study of compilers.
Compilers use both automata and CFGs. Usually the lexical structure
of a programming language is given by a collection of regular expressions
which define the identifiers, keywords, literals, and comments of the lan-
guage. These regular expressions can be translated into an automaton,
usually called the lexer, which recognizes the basic lexical elements (lex-
emes) of programs. A parser for the programming language will take a
stream of lexemes coming from a lexer and build a parse tree (also known
as an abstract syntax tree or AST) by using a CFG. Thus parsing takes the
linear string of symbols given by a program text and produces a tree struc-
ture which is more suitable for later phases of compilation such as seman-
tic analysis, optimization, and code generation. This is illustrated in Fig-
ure 4.1. This is a naive picture; many compilers use more than one kind of
abstract syntax tree in their work. The main point is that tree structures
are far easier to work with than linear strings.
while terminals, or literals, are in lower case.
This can be pictured with a so-called parse tree, which summarizes the
ways in which the sentence may be produced.
(A parse tree appears here for the sentence ‘the girl with a flower touches
the boy’: SENTENCE splits into NP and VP; the NP expands through CNOUN and PP
to ‘the girl with a flower’, and the VP expands through CVERB, VERB and NP to
‘touches the boy’.)
Reading the leaves of the parse tree from left to right yields the original
string. The parse tree represents the possible derivations of the sentence.
Definition 12 (Context-free grammar). A context-free grammar G is a quadruple
(V, Σ, R, S) where V is a finite set of variables, Σ is a finite alphabet of
terminal symbols, S ∈ V is the start variable, and R is a finite set of rules,
each of the form

A −→ w

where A ∈ V and w ∈ (V ∪ Σ)∗.
Note. V ∩ Σ = ∅. This helps us keep our sanity, because variables and ter-
minals can’t be confused. In general, our convention will be that variables
are upper-case while terminals are in lower case.
A CFG is a device for generating strings. The way a string is gener-
ated is by starting with the start variable S and performing replacements for
variables, according to the rules.
A sentential form is a string in (V ∪ Σ)∗ . A sentence is a string in Σ∗ . Thus
every sentence is a sentential form, but in general a sentential form might
not be a sentence, in particular when it has variables occurring in it.
Example 60. If Σ = {0, 1} and V = {U, W }, then 00101 is a sentence and
therefore a sentential form. On the other hand, W W and W 01U are sen-
tential forms that are not sentences.
To rephrase our earlier point: a CFG is a device for generating, ulti-
mately, sentences. However, at intermediate points, the generation pro-
cess will produce sentential forms.
Definition 13 (One step replacement). Let u, v, w ∈ (V ∪ Σ)∗. Let A ∈ V .
We write

uAv ⇒G uwv

to stand for the replacement of the indicated occurrence of the variable A
by w. This replacement is only allowed if there is a rule A −→ w in R. When
it is clear which grammar is being referred to, the G in ⇒G will be omitted.
Thus we can replace any variable A in a sentential form by its ‘right
hand side’ w. Note that there may be more than one occurrence of A in
the sentential form; in that case, only one occurrence may be replaced in a
step. Also, there may be more than one variable possible to replace in the
sentential form. In that case, it is arbitrary which variable gets replaced.
Example 61. Suppose that we have the grammar (V, Σ, R, S) where V =
{S, U} and Σ = {a, b} and R is given by
S −→ UaUbS
U −→ a
U −→ b
Then we can write S ⇒ UaUbS. Now consider UaUbS. There are 3 loca-
tions of variables that could be replaced (two Us and one S). In one step
we can get to the following sentential forms:
• UaUbS ⇒ aaUbS or UaUbS ⇒ baUbS (replacing the first U)
• UaUbS ⇒ UaabS or UaUbS ⇒ UabbS (replacing the second U)
• UaUbS ⇒ UaUbUaUbS (replacing S)
A chain of one-step replacements S ⇒ α1 ⇒ α2 ⇒ · · · ⇒ w
is said to be a derivation of w.
Now we can define the set of strings derivable from a grammar, i.e.,
the language of the grammar: it is the set of sentences, i.e., strings lacking
variables, generated by G.
Formally, L(G) = {w ∈ Σ∗ | S ⇒∗ w}.
One question that is often asked is Why Context-Free?; in other words,
what aspect of CFGs is ‘free of context’ (whatever that means)? The an-
swer comes from examining the allowed structure of a rule. A rule in a
context-free grammar may only have the form V −→ w. When making a
replacement for V in a derivation, the symbols surrounding V in the sen-
tential form do not affect whether the replacement can take place or not.
Hence context-free. In contrast, there is a class of grammars called context-
sensitive grammars, in which the left hand side of a rule can be an arbitrary
sentential form; such a rule could look like abV c −→ abwc, and a replace-
ment for V would only be allowed in a sentential form where V occurred
in the ‘context’ abV c. Context-sensitive grammars, and phrase-structure
grammars are more powerful formalisms than CFGs, and we won’t be
discussing them in the course.
Example 62. Consider the grammar with rules S −→ ε | 0S1. Some derivations:

• S ⇒ ε
• S ⇒ 0S1 ⇒ 0ε1 = 01
We “see” that L(G) = {0n 1n | n ≥ 0}. A rigorous proof of this would re-
quire proving the statement

for all w ∈ Σ∗, S ⇒∗ w iff w = 0n 1n for some n ≥ 0

and the proof would proceed by induction on the length of the derivation.
Example 63. Give a CFG for the language L = {0n 12n | n ≥ 0}.
The answer to this can be obtained by a simple adaptation of the gram-
mar in the previous example:
S −→ ε
S −→ 0S11
Convention. We will usually be satisfied to give a CFG by giving its rules.
Usually, the start state will be named S, and the variables will be written
in upper case, while members of Σ will be written in lower case. Fur-
thermore, multiple rules with the same left-hand side will be collapsed
into a single rule, where the right-hand sides are separated by a |. Thus,
the previous grammar could be completely and unambiguously given as
S −→ ε | 0S11.
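Viewing a grammar as a string generator is easy to make executable; the
following Python sketch (ours) enumerates sentences breadth-first, writing
variables in upper case as in the convention above.

    from collections import deque

    def sentences(rules, start="S", limit=5):
        """rules: dict variable -> list of right-hand sides ('' for ε).
        Expand sentential forms breadth-first, collecting the first
        `limit` sentences (forms with no upper-case variables)."""
        out, queue, seen = [], deque([start]), {start}
        while queue and len(out) < limit:
            form = queue.popleft()
            i = next((k for k, c in enumerate(form) if c.isupper()), None)
            if i is None:
                out.append(form)
                continue
            for rhs in rules[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
        return out

    # The grammar S -> ε | 0S11 of Example 63.
    print(sentences({"S": ["", "0S11"]}))   # ['', '011', '001111', ...]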
Example 64 (Palindromes). Give a grammar for generating palindromes
over {0, 1}∗ . Recall that the palindromes over alphabet Σ can be defined
as PAL = {w ∈ Σ∗ | w = w R }. Some examples are 101 and 0110 for binary
strings. For ASCII, there are some famous palindromes:1
• madamImadam, the first thing said in the Garden of Eden.
• amanaplanacanalpanama
The shortest palindromes are ε, 0, and 1. Given a palindrome w, longer
palindromes can be built in two ways:

• 0w0 ∈ PAL
• 1w1 ∈ PAL
A little thought doesn’t reveal any other ways of building elements of PAL,
so our final grammar is
S −→ ε | 0 | 1 | 0S0 | 1S1.
Example 65. Give a grammar for the strings of balanced parentheses. The empty
string is balanced, which gives the rule
S −→ ε. Now we assume that we have a string w with balanced parenthe-
ses, and want to generate a new string in the language from w. There are
two ways of doing this:
• (w)
• ww
So the grammar can be given by
S −→ ε | (S) | SS.
Example 66. Give a grammar that generates the set of strings over {0, 1}
having an equal number of 0s and 1s.
A first attempt

S −→ ε | 0S1 | 1S0

doesn't work: it cannot generate strings such as 0110 that begin and end with
the same symbol. Also, we couldn't just add 0w0 and 1w1 in an effort to repair
this shortcoming, because then we could generate strings not in L, such as
00.
We want to think of S as denoting all strings with an equal number
of 0s and 1s. The previous attempts have the right idea—take a balanced
string w and make another balanced string from it—but only add the 0s
and 1s at the outer edges of the string. Instead, we want to add them at
internal locations as well. The following grammar supports this:

S −→ ε | S0S1S | S1S0S     (4.1)

An alternative is to also allow two balanced strings to be concatenated:

S −→ ε | 0S1 | 1S0 | SS     (4.2)
Here's a derivation of the string 03 16 03 using grammar (4.1).

S ⇒ S1S0S
  ⇒ S1S1S0S0S
  ⇒ S1S1S1S0S0S0S
  ⇒∗ S1ε1ε1ε0ε0ε0ε
  ⇒ S0S1S1ε1ε1ε0ε0ε0ε
  ⇒ S0S0S1S1S1ε1ε1ε0ε0ε0ε
  ⇒ S0S0S0S1S1S1S1ε1ε1ε0ε0ε0ε
  ⇒∗ ε0ε0ε0ε1ε1ε1ε1ε1ε1ε0ε0ε0ε
  = 000111111000
  = 03 16 03 .

Note that we used ⇒∗ to abbreviate multiple steps.
The context-free languages are closed under union, concatenation and Kleene
star. Suppose we are given the grammars

G = (V, Σ, R, S)
G1 = (V1, Σ1, R1, S1)
G2 = (V2, Σ2, R2, S2)

Assume V1 ∩ V2 = ∅. Let S0, S3 and S4 be variables not occurring in
V ∪ V1 ∪ V2. These assumptions are intended to avoid confusion when
making the constructions. For the Kleene star of L(G), for example, one takes
S0 as the new start variable together with the rule set

R5 = R ∪ {S0 −→ S0 S | ε} .
Remark. One might ask: what about closure under intersection and comple-
ment? It happens that the CFLs are not closed under intersection and we
can see this by the following counterexample.
Example 67. Let grammar G1 be given by the following rules:
A −→ P Q
P −→ aP b | ε
Q −→ cQ | ε
Then L(G1 ) = {ai bi cj | i, j ≥ 0}. Let grammar G2 be given by
B −→ RT
R −→ aR | ε
T −→ bT c | ε
Then L(G2) = {ai bj cj | i, j ≥ 0}. Thus L(G1) ∩ L(G2) = {ai bi ci | i ≥ 0}.
But this is not a context-free language, as we shall see after discussing the
pumping lemma for CFLs.
Example 68. Construct a CFG for

L = {0m 1n | m ≠ n} .

We split L into two pieces:

L1 = {0m 1n | m < n}
L2 = {0m 1n | m > n}

Since L = L1 ∪ L2, we take the rule S −→ S1 | S2 together with

S1 −→ 1 | S1 1 | 0S1 1
S2 −→ 0 | 0S2 | 0S2 1 .
Example 69. Give a CFG for

L = {x#y | xR is a substring of y}
  = {x#uxR v | x, u, v ∈ {0, 1}∗}
  = {x#uxR | x, u ∈ {0, 1}∗} · {v | v ∈ {0, 1}∗}

where we call the first factor L1 and the second L2. A grammar for L2:

S2 −→ ε | 0S2 | 1S2
A grammar for L1 :
S1 −→ 0S1 0 | 1S1 1 | #S2
Thus, the final grammar is
S −→ S1 S2
S1 −→ 0S1 0 | 1S1 1 | #S2
S2 −→ ε | 0S2 | 1S2
Example 70. Give a CFG for L = {0m 1n | m < n} ∪ {0m 1n | n < m < 2n}.
A grammar for the first language:

S1 −→ AB
A −→ 0A1 | ε
B −→ 1B | 1

The second language can be rephrased as {0j+k 1j | k < j}, and that can be
rephrased in terms of k (letting j = k + ℓ + 1, for some ℓ) as

{0 (00)^k 0^ℓ 1^ℓ 1^k 1 | k, ℓ ≥ 0}

and from this we have the grammar for the second language

S2 −→ 0X1
X −→ 00X1 | Y
Y −→ 0Y 1 | ε
Putting it all together gives
S −→ S1 | S2
S1 −→ AB
A −→ 0A1 | ε
B −→ 1B | 1
S2 −→ 0X1
X −→ 00X1 | A
Example 71. Give a CFG for L = {ai bj ck | i = j + k}.
If we note that
L = {aj ak bj ck | j, k ≥ 0}
= {ak aj bj ck | j, k ≥ 0}
we quickly get the grammar
S −→ aSc | A
A −→ aAb | ε
Example 72. Give a CFG for L = {ai bj ck | i ≠ j + k}.
The solution begins by splitting the language into two pieces:

L = {ai bj ck | i ≠ j + k}
  = {ai bj ck | i < j + k} ∪ {ai bj ck | j + k < i}

where we call the first piece L1 and the second L2.
In L1 , there are more bs and cs, in total, than as. We again start by attempt-
ing to scrub off equal numbers of as and cs. At the end of that phase, there
may be more as left, in which case the cs are gone, or, there may be more
cs left, in which case the as are gone.
S1 −→ aS1 c | A | B
A −→ aAb | C
B −→ bD | Dc
The rule for A scrubs off any remaining as, leaving a non-empty string of
bs. The rule for B deals with a (non-empty) string bi cj . Thus we add the
rules
C −→ b | bC
D −→ EF
E −→ ε | bE
F −→ ε | cF
To obtain a grammar for L2 is easier:
S2 −→ aS2 c | B2
B2 −→ aB2 b | C2
C2 −→ aC2 | a
And finally we complete the grammar with
S −→ S1 | S2
Example 73. Give a CFG for
L = {am bn cp dq | m + n = p + q} .
This example takes some thought. At its core, the problem is (essen-
tially) a perverse elaboration of the language {0n 1n | n ≥ 0} (which is gen-
erated by the rules S −→ ε | 0S1). Now, strings in L have the form

am bn ‖ cp dq

where m + n = p + q and the double line ‖ marks the midpoint of the string.
We will build the grammar in stages. We first construct a rule that will
‘cancel off’ a and d symbols from the outside-in :
S −→ aSd
In fact, min(m, q) symbols get cancelled. After this step, there are two cases
to consider:
1. m ≤ q, i.e., all the leading a symbols have been removed, leaving the
remaining string bn cp di , where i = q − m.
2. q ≤ m, i.e., all the trailing d symbols have been removed, leaving the
remaining string aj bn cp , where j = m − q.
bn cp d i
We now cancel off b and d symbols from the outside-in (if possible—it
could be that i = 0) using the following rule:
A −→ bAd
After this rule finishes, all trailing d symbols have been trimmed and the
situation looks like
bn−i cp
Now we can use the rule
C −→ bCc | ε
to trim all the matching b and c symbols that remain (there must be an
equal number of them). Thus, for this case, we have constructed the gram-
mar
S −→ aSd | A
A −→ bAd | C
C −→ bCc | ε
The second case, q ≤ m, is completely similar: the situation looks like
aj bn cp
We now cancel off a and c symbols from the outside-in (if possible—it
could be that j = 0) using the following rule:
B −→ aBc
After this rule finishes, all trailing c symbols have been trimmed and the
situation looks like
bn cp−j
Now we can re-use the rule
C −→ bCc | ε
to trim all the matching b and c symbols that remain. Thus, to handle the
case q ≤ m we have to add the rule B −→ aBc to the grammar, resulting
in the final grammar
S −→ aSd | A | B
A −→ bAd | C
B −→ aBc | C
C −→ bCc | ε
Now we examine a few problems about the language generated by a gram-
mar.
Example 74. What is the language generated by the grammar given by the
following rules?
S −→ ABA
A −→ a | bb
B −→ bB | ε
The answer is easy: (a + bb)b∗ (a + bb). The reason why it is easy is that
an A leads in one step to terminals (either a or bb); also, B expands to an
arbitrary number of bs.
Now for a similar grammar which is harder to understand:
Example 75. What is the language generated by the grammar given by the
following rules?
S −→ ABA
A −→ a | bb
B −→ bS | ε
We see that the grammar is nearly identical to the previous, except for
recursion on the start variable: a B can expand to bS, which means that
another trip through the grammar will be required. Let’s generate some
sentential forms to get a feel for the language (it will be useful to refrain
from substituting for A):
Example 76. Prove that every string produced by
4.2 Ambiguity
It is well known that natural languages such as English allow ambiguous
sentences: ones that can be understood in more than one way. At times
ambiguity arises from differences in the semantics of words, e.g., a word
may have more than one meaning. One favourite example is the word
livid, which can mean ‘ashen’ or ‘pallid’ but could also mean ‘black-and-
blue’. So when one is livid with rage, is their face white or purple?
Ambiguity of a different sort is found in the following sentences: com-
pare Fruit flies like a banana with Time flies like an arrow. The structures of
the parse trees for the two sentences are completely different. In natural
languages, ambiguity is a good thing, allowing much richness of expres-
sion, including puns. On the other hand, ambiguity is a terrible thing for
computer languages. If a grammar for a programming language allowed
some inputs to be parsed in two different ways, then different compilers
could compile a source file differently, which leads to much unhappiness.
In order to deal with ambiguity formally, we have to make a few def-
initions. To assert that a grammar is ambiguous, we really mean to say
that some string has more than one parse tree. But we want to avoid for-
malizing what parse trees are. Instead, we’d like to formalize the notion
in terms of derivations. However, we can’t simply say that a grammar is
ambiguous if there is some string having more than one derivation. That
doesn’t work, since there can be many ‘essentially similar’ derivations of
a string. (In fact, this is exactly what a parse tree captures.) The following
notion forces some amount of determinism on all derivations of a string: a
leftmost derivation is one in which, at every step, the variable replaced is
the leftmost variable of the sentential form, and a grammar is called
ambiguous if some string in its language has more than one leftmost derivation.
But that can't take care of it all. The choice of variable to be replaced
in a leftmost derivation might be fixed, but there could be multiple right
hand sides for that variable. This is what leads to different parse trees.
Example 77. Let G be
E −→ E+E
| E−E
| E∗E
| E/E
| −E
| C
| V
| (E)
C −→ 0|1
V −→ x|y|z
That G is ambiguous is easy to see: consider the expression x + y ∗ z. By
expanding the ‘+’ rule first, we have a derivation that starts E ⇒ E + E ⇒
· · · and the expression would be parsed as x + (y ∗ z). By expanding the
‘∗’ rule first, we have a derivation that starts E ⇒ E ∗ E ⇒ · · · and the
expression would be parsed as (x + y) ∗ z.
The usual repair is to stratify the grammar so that + and − bind less tightly
than ∗ and /, and unary − binds tighter still. We can summarize this for arith-
metic operations as follows:
E −→ E +T |E −T |T
T −→ T ∗ U | T /U | U
U −→ −U | F
F −→ C | V | (E)
C −→ 0|1
V −→ x|y|z
E ⇒ E + T ⇒ (T − T ) + T ⇒ (U − T ) + T
  ⇒ (F − T ) + T ⇒ (V − T ) + T ⇒ (x − T ) + T
  ⇒ (x − (T ∗ U)) + T ⇒ (x − (U ∗ U)) + T ⇒ (x − (F ∗ U)) + T
  ⇒∗ (x − (y ∗ U)) + T ⇒ (x − (y ∗ z)) + T
  ⇒∗ (x − (y ∗ z)) + x
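The stratified grammar translates directly into a recursive-descent parser in
which each grammar level becomes one function, so precedence falls out of the
call structure. The following Python sketch (ours) uses loops in place of the
left-recursive rules, and its tokenizer is more liberal than the C and V rules
of the grammar.

    import re

    def parse(text):
        toks = re.findall(r"[0-9]+|[a-z]+|[-+*/()]", text)
        pos = 0

        def peek():
            return toks[pos] if pos < len(toks) else None

        def eat(t):
            nonlocal pos
            assert peek() == t, f"expected {t}"
            pos += 1

        def E():                     # E -> E + T | E - T | T  (as a loop)
            node = T()
            while peek() in ("+", "-"):
                op = peek(); eat(op)
                node = (op, node, T())
            return node

        def T():                     # T -> T * U | T / U | U
            node = U()
            while peek() in ("*", "/"):
                op = peek(); eat(op)
                node = (op, node, U())
            return node

        def U():                     # U -> - U | F
            if peek() == "-":
                eat("-")
                return ("neg", U())
            return F()

        def F():                     # F -> C | V | ( E )
            if peek() == "(":
                eat("("); node = E(); eat(")")
                return node
            tok = peek(); eat(tok)
            return tok

        return E()

    print(parse("x + y * z"))   # ('+', 'x', ('*', 'y', 'z')): * binds tighter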
as the dangling else. A skeletal grammar including both forms is
S −→ if B then A | A
A −→ if B then A else S | C
B −→ b1 | b2 | b3
C −→ a1 | a2 | a3
Then the sentence

if b1 then if b2 then a1 else if b3 then a2 else a3

can be parsed as

if b1 then (if b2 then a1 else (if b3 then a2 else a3 ))

or as

if b1 then (if b2 then a1 else if b3 then a2 ) else a3
How can this be repaired?
The previous algorithm propagates markings from right to left in rules.
To compute the reachable variables, we do the opposite: processing pro-
ceeds top down and from left to right. Thus, we begin by marking the start
variable. Then we look at the rhs of every production of the form S −→ rhs
and mark every unmarked variable in rhs. We continue in this way, now
considering rules whose left-hand sides are marked, until no new variables
become marked. The reachable variables are those
that are marked.
Definition 22 (Useful variables). A variable A in a context-free grammar
G = (V, Σ, R, S) is said to be useful if for some string x ∈ Σ∗ there is a
derivation of x that takes the form S ⇒∗ αAβ ⇒∗ x. A variable that is not
useful is said to be useless. If a variable is not live or is not reachable then
it is clearly useless.
Example 80. Find a grammar having no useless variables which is equiv-
alent to the following grammar
S −→ ABC | BaB
A −→ aA | BaC | aaa
B −→ bBb | a
C −→ CA | AC
The reachable variables of this grammar are {S, A, B, C} and the live
variables are {A, B, S}. Since C is not live, L(C) = ∅, hence L(ABC) = ∅
and also L(BaC) = ∅, so we can delete the rules S −→ ABC and A −→
BaC to obtain the new, equivalent, grammar
S −→ BaB
A −→ aA | aaa
B −→ bBb | a
In this grammar, A is not reachable, so any rules with A on the lhs can be
dropped. This leaves
S −→ BaB
B −→ bBb | a
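Both marking procedures are easy to express in code. The following Python
sketch (ours) computes the live and reachable variables of the grammar from
Example 80, writing a grammar as a dictionary from variables to lists of
right-hand sides.

    def live_variables(rules):
        """A variable is live if it derives a terminal string: repeatedly mark
        A whenever some right-hand side uses only terminals and live variables."""
        live, changed = set(), True
        while changed:
            changed = False
            for a, rhss in rules.items():
                if a not in live and any(all(s in live or s.islower() for s in rhs)
                                         for rhs in rhss):
                    live.add(a)
                    changed = True
        return live

    def reachable_variables(rules, start="S"):
        """Mark the start variable, then every variable appearing in a
        right-hand side of an already-marked variable."""
        reach, frontier = {start}, [start]
        while frontier:
            a = frontier.pop()
            for rhs in rules.get(a, []):
                for s in rhs:
                    if s.isupper() and s not in reach:
                        reach.add(s)
                        frontier.append(s)
        return reach

    g = {"S": ["ABC", "BaB"], "A": ["aA", "BaC", "aaa"],
         "B": ["bBb", "a"], "C": ["CA", "AC"]}
    print(live_variables(g))        # {'S', 'A', 'B'}
    print(reachable_variables(g))   # {'S', 'A', 'B', 'C'}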
Grammars would also be simpler if rules with ε on the right hand side were
eliminated. Similarly, a rule such as P −→ Q is an indirection that can
seemingly be eliminated. A grammar in Chomsky Normal Form is one in which
these redundancies do not occur. However, the simplification steps are
somewhat technical, so we will have to take some care in their application.
Definition 23 (Chomsky Normal Form). A grammar is in Chomsky Nor-
mal Form if every rule has one of the following forms:
• A −→ BC
• A −→ a
where A, B, C are variables, and a is a terminal. Furthermore, in all rules
A −→ BC, we require that neither B nor C are the start variable for the
grammar. Notice that the above restrictions do not allow a rule of the form
A −→ ε; however, forbidding ε-rules entirely would make languages containing
ε impossible to generate. Therefore, we allow the rule S −→ ε, where S is the
start variable.
The following algorithm translates grammar G = (V, Σ, R, S) to Chom-
sky Normal Form:
1. Create a new start variable S0 and add the rule S0 −→ S. Now the
start variable is not on the right hand side of any rule.
2. Eliminate all rules of the form A −→ ε. For each rule of the form
V −→ uAw, where u, w ∈ (V ∪ Σ)∗ , we add the rule V −→ uw. It is
important to notice that we must do this for every occurrence of A in
the right hand side of the rule. Thus the rule
V −→ uAwAv

gives rise to the new rules V −→ uwAv, V −→ uAwv, and V −→ uwv.
3. Eliminate all rules which merely replace one variable by another, e.g.,
V −→ W . These are sometimes called unit rules. Thus, for each rule
W −→ u where u ∈ (V ∪ Σ)∗ , we add V −→ u.
Finally, right hand sides longer than two symbols are broken up into binary
rules, and terminals occurring in such rules are replaced by new variables.

Example 81. Let us convert the following grammar to Chomsky Normal Form:

S −→ ASA | aB
A −→ B | S
B −→ b | ε
1. Add new start variable. This is accomplished by adding the new rule
S0 −→ S.
2. Eliminate B −→ ε. Treating every occurrence of B on a right hand
side as optional, the grammar from the previous step is transformed to
S0 −→ S
S −→ ASA | aB | a
A −→ B|S|ε
B −→ b
Notice that, for example, we don’t drop A −→ B; instead we keep it
and add A −→ ε. So we’ve dropped one ε-rule and added another.
3. Eliminate A −→ ε. This yields the following grammar:
S0 −→ S
S −→ ASA | AS | SA | S | aB | a
A −→ B|S
B −→ b
We have now finished eliminating ε-rules and can move to eliminat-
ing unit rules.
4. Eliminate S −→ S. This illustrates a special case: when asked to
eliminate a rule V −→ V , the rule may simply be dropped without
any more thought. Thus we have the grammar
S0 −→ S
S −→ ASA | AS | SA | aB | a
A −→ B|S
B −→ b
5. Eliminate S0 −→ S. For every rule S −→ w we add S0 −→ w, giving
S0 −→ ASA | AS | SA | aB | a.

6. Eliminate A −→ B. Since the only rule for B is B −→ b, we add A −→ b.
7. Eliminate A −→ S. In this case, that means that wherever there is a
rule S −→ w, we will add A −→ w. Thus we have
S0 −→ ASA | AS | SA | aB | a
S −→ ASA | AS | SA | aB | a
A −→ ASA | AS | SA | aB | a | b
B −→ b
That finishes the elimination of unit rules. Now we map the gram-
mar to binary form.
Breaking the three-symbol right hand side ASA into AA1 with a new
variable A1 −→ SA, and replacing the terminal a in aB by a new variable
U −→ a, gives the final grammar in Chomsky Normal Form:

S0 −→ AA1 | AS | SA | UB | a
S −→ AA1 | AS | SA | UB | a
A −→ AA1 | AS | SA | UB | a | b
A1 −→ SA
B −→ b
U −→ a
• a new terminal symbol appears.
In Chomsky Normal Form, every derivation of a string w of length n takes
exactly 2n − 1 steps, so one brute-force parsing method is to enumerate all
derivations of length 2n − 1 (there are a finite number of these), checking
each to see if w is derived. If it is, then acceptance; otherwise no derivation
of w of length 2n − 1 exists, so no derivation of w exists at all, so rejection.
Again, this is quite inefficient.
Fortunately, there are general algorithms for context-free parsing that
run relatively efficiently. We are going to look at one, known as the CKY
algorithm (after the co-inventors Cocke, Kasami, and Younger), which is
directly based on grammars in Chomsky Normal Form.
If G is in CNF then it has only rules of the form
S −→ V1 V2 | . . .
(For the moment, we’ll ignore the fact that a rule S −→ ε may be allowed.
Also we will ignore rules of the form V −→ a.) Suppose that we want to
parse the string
w = w1 w2 . . . wn
Now, S ⇒∗ w if

S ⇒ V1 V2 and
V1 ⇒∗ w1 . . . wi and
V2 ⇒∗ wi+1 . . . wn
for some splitting of w at index i. This recursive splitting process proceeds
until the problem size becomes 1, i.e., the problem becomes one of finding
a rule V −→ wi that generates a single terminal.
Now, of course, the problem is that there are n − 1 ways to split a string
of length n in two pieces having at least one symbol each. The algorithm
considers all of the splits, but in a clever way. The processing goes bottom-
up, dealing with shorter strings before longer ones. In this way, solutions
to smaller problems can be re-used when dealing with larger problems.
Thus this algorithm is an instance of the technique known as dynamic pro-
gramming.
The main notion in the algorithm is
N[i, i + j]
which denotes the set of variables in G that can derive the substring wi . . . wi+j−1 .
Thus N[i, i + 1] refers to the variables that can derive the single symbol wi .
If we can properly implement this abstraction, then all we have to do to de-
cide if S ⇒∗ w, roughly speaking, is compute N[1, n + 1] and check whether
S is in the resulting set. (Note: we will index strings starting at 1 in this
section.)
Thus we will systematically compute the following, moving from a
step-size of 1, to one of n, where n is the length of w:
(A table indicating the order of computation—step sizes 1 through n—appears
here.)
As an example, let us use CKY to parse strings of balanced parentheses.
Starting from the grammar S −→ ε | (S) | () | SS, we first convert it to
Chomsky Normal Form. Adding a new start variable gives S0 −→ S together with
the original rules; then:

• Eliminate S −→ ε

S0 −→ S | ε
S −→ (S) | () | S | SS
• Drop S −→ S
S0 −→ S | ε
S −→ (S) | () | SS
• Eliminate S0 −→ S
S0 −→ ε | (S) | () | SS
S −→ (S) | () | SS
• Put in binary rule format. We add two rules for deriving the opening
and closing parentheses:
L −→ (
R −→)
and then the final grammar is
S0 −→ ε | LA | LR | SS
S −→ LA | LR | SS
A −→ SR
L −→ (
R −→ )
Now, let’s try the algorithm on parsing the string (()(())) with this
grammar. The length n of this string is 8. We start by constructing an
array N with n + 1 = 9 rows and n columns. Then we write the string to
be parsed along the diagonal:
1 (
2 (
3 )
4 )
5 (
6 )
7 )
8 )
9
1 2 3 4 5 6 7 8
Now we consider, for each substring of length 1 in the string, the vari-
ables that could derive it. For example, the element at N[2, 3] will be L,
since the rule L −→ ( can be used to generate a ‘(’ symbol. In this way,
each N[i, i + 1], i.e., just below the diagonal is filled in:
1 (
2 L (
3 L )
4 R (
5 L (
6 L )
7 R )
8 R )
9 R
1 2 3 4 5 6 7 8
Now we consider, for each substring of length 2 in the string, the vari-
ables that could derive it. Now here’s where the cleverness of the algo-
rithm manifests itself. All the information for N[i, i + 2] can be found by
looking at N[i, i + 1] and N[i + 1, i + 2]. So we can re-use information al-
ready calculated and stored in N. For strings of length 2, it’s particularly
easy, since the relevant information is directly above and directly to the
right. For example, the element at N[1, 3] is calculated by asking “is there
a rule of the form V −→ LL?” There is none, so N[1, 3] = ∅. Similarly,
the entry at N[2, 4] = S0 , S because of the rules S0 −→ LR and S −→ LR.
Proceeding in this way, the next diagonal of the array is filled in as follows:
1 (
2 L (
3 ∅ L )
4 S0 , S R (
5 ∅ L (
6 ∅ L )
7 S0 , S R )
8 ∅ R )
9 ∅ R
1 2 3 4 5 6 7 8
Now substrings of length 3 are addressed. It’s important to note that
all ways of dividing a string of length 3 into non-empty substrings has to
be considered. Thus N[i, i+3] is computed from N[i, i+1] and N[i+1, i+3]
as well as N[i, i + 2] and N[i + 2, i + 3]. For example, let’s calculate N[1, 4]
• N[1, 2] = L and N[2, 4] = S, but there is no rule of the form V −→ LS,
so this split produces no variables
• N[1, 3] = ∅ and N[3, 4] = R, so this split produces no variables also
Hence N[1, 4] = ∅. By similar calculations, N[2, 5], N[3, 6], N[4, 7] are all ∅.
In N[5, 8] though, we can use the rule A −→ SR to derive N[5, 7] followed
by N[7, 8]. Thus the next diagonal is filled in:
1 (
2 L (
3 ∅ L )
4 ∅ S0 , S R (
5 ∅ ∅ L (
6 ∅ ∅ L )
7 ∅ S0 , S R )
8 A ∅ R )
9 ∅ ∅ R
1 2 3 4 5 6 7 8
Filling in the rest of the diagonals yields
1 (
2 L (
3 ∅ L )
4 ∅ S0 , S R (
5 ∅ ∅ ∅ L (
6 ∅ ∅ ∅ ∅ L )
7 ∅ ∅ ∅ ∅ S0 , S R )
8 ∅ S0 , S ∅ S0 , S A ∅ R )
9 S0 , S A A ∅ ∅ ∅ R
1 2 3 4 5 6 7 8
Since S0 ∈ N[1, 9], we have shown the existence of a parse tree for the
string (()(())).
An implementation of this algorithm can be coded in a concise triply-
nested loop of the form:
For each substring length
    For each substring u of that length
        For each split of u into non-empty pieces
            ....
As a result, the running time of the algorithm is O(n3 ) in the length of
the input string.
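For completeness, here is a compact Python rendering of the algorithm just
described (a sketch of ours, for CNF grammars without the ε-rule); it fills the
same table N[i, j], indexing the string from 1 as in the text.

    def cky(rules, w):
        """rules: list of (A, rhs) with rhs either a terminal or a pair (B, C).
        Returns N, where N[(i, j)] is the set of variables deriving w_i..w_{j-1}."""
        n = len(w)
        N = {}
        for i in range(1, n + 1):                      # substrings of length 1
            N[(i, i + 1)] = {A for A, rhs in rules if rhs == w[i - 1]}
        for length in range(2, n + 1):                 # then longer substrings
            for i in range(1, n - length + 2):
                j = i + length
                N[(i, j)] = set()
                for k in range(i + 1, j):              # every split point
                    for A, rhs in rules:
                        if (isinstance(rhs, tuple)
                                and rhs[0] in N[(i, k)] and rhs[1] in N[(k, j)]):
                            N[(i, j)].add(A)
        return N

    # The CNF grammar for balanced parentheses derived earlier.
    g = [("S0", ("L", "A")), ("S0", ("L", "R")), ("S0", ("S", "S")),
         ("S",  ("L", "A")), ("S",  ("L", "R")), ("S",  ("S", "S")),
         ("A",  ("S", "R")), ("L", "("), ("R", ")")]
    w = "(()(()))"
    table = cky(g, w)
    print("S0" in table[(1, len(w) + 1)])   # True: the string is derivable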
Other algorithms for context-free parsing are more popular than the
CKY algorithm. In particular, a top-down CFL parser due to Earley is
more efficient in many cases.
4.5 Grammar Decision Problems

memberCFL. Is a given string w in L(G)?
Decidable. How? Convert G to Chomsky Normal Form and run the CKY algorithm.

fullCFL. Does a CFG generate all strings over the alphabet?
Undecidable.
subCFL. Does one CFG generate a subset of the strings generated by an-
other?
subCFL = {hG1 , G2 i | L(G1 ) ⊆ L(G2 )}
Undecidable.
sameCFL. Do two CFGs generate the same language?
Undecidable.
ambigCFG. Is a CFG ambiguous, i.e., is there some string w ∈ L(G) with
more than one leftmost derivation using G? Undecidable.
4.6 Push Down Automata
Push Down Automata (PDAs) are a machine counterpart to context-free
grammars. PDAs consume, or process, strings, while CFGs generate strings.
A PDA can be roughly characterized as a finite-state machine augmented with a
stack, which it may manipulate with the following operations:

Push Place a symbol on top of the stack.

Pop Remove the symbol on top of the stack.

Empty Test the stack to see if it is empty. We won't use this feature in our
work.
Only the top of the stack may be accessed in any one step; multiple pushes
and pops can be used to access other elements of the stack.
Use of the stack puts an explicit memory at our disposal. Moreover, a
stack can hold an unbounded amount of information. However, the con-
straint to access the stack in LIFO style means that use of memory is also
constrained.
Here’s the formal definition.
Formally, a PDA is a 6-tuple M = (Q, Σ, Γ, δ, q0, F), where Q is a finite set
of states, Σ is the input alphabet, q0 ∈ Q is the start state, F ⊆ Q is the
set of accept states, and moreover:
• Γ is the stack alphabet (finite set of symbols). Σ ⊆ Γ. As for Turing
machines, the need for Γ being an extension of Σ comes from the
fact that it is sometimes convenient to use symbols other than those
found in the input alphabet as special markers in the stack.
• δ : Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) −→ 2^(Q×(Γ∪{ε})) is the transition function.
Although δ seems daunting, it merely incorporates the use of the
stack. When making a transition step, the machine uses the current
input symbol and the top of the stack in order to decide what state to
move to. However, that’s not all, since the machine must also update
the top of the stack at each step.
It is obviously complex to make a step of computation in a PDA. We
have to deal with non-determinism and ε-transitions, but the stack must
also be taken account of. Suppose q ∈ Q, a ∈ Σ, and u ∈ Γ. Then a
computation step
δ(q, a, u) = {(q1 , v1 ), . . . , (qn , vn )}
means that if the PDA is in state q, reading tape symbol a, and symbol u
is at the top of the stack, then there are n possible outcomes. In outcome
(qi , vi ), the machine has moved to state qi , and u at the top of the stack has
been replaced by vi .
Descriptions of how the top of stack changes in a computation step
seem peculiar at first glance. We summarize the possibilities in the follow-
ing table.
a, ε → y : push y (nothing is popped)
a, x → ε : pop x (nothing is pushed)
a, x → y : replace the top symbol x by y
a, ε → ε : leave the stack unchanged
• When pushing symbol x, the configuration changes from (q, a·w, ε·t)
to (qi , w, x · t).
• When popping symbol x, the configuration changes from (q, a·w, x·t)
to (qi , w, t).
2. Computation steps
132
3. Final condition rm ∈ F and sm = ε.
[Diagram: a PDA with states q and p; q has push loops a, ε → a and b, ε → b,
the transition c, ε → ε leads from q to p (an accept state), and p has pop loops
a, a → ε and b, b → ε.]
At this point the input string is exhausted and the computation stops. We
cannot accept the original string—although we are in an accept state—
because the stack is not empty. Thus we see that
L(M) = {wcw R | w ∈ {a, b}∗ } .
This example used c as a marker telling the machine when to change states.
It turns out that such an expedient is not needed because we have non-
determinism at our disposal.
Example 84. Find a PDA for L = {ww R | w ∈ {a, b}∗ }.
The PDA is just that of the previous example with the seemingly innocu-
ous alteration of the transition from p to q to be an ε-transition.
[Diagram: a PDA with states q and p; q has push loops a, ε → a and b, ε → b,
an ε, ε → ε transition leads from q to p, and p has pop loops a, a → ε and b, b → ε.]
134
Example 85. Build a PDA to recognize
L = {a^i b^j c^k | i + k = j}
The basic idea in finding a solution is to use states to enforce the order
of occurrences of letters, and to use the stack to enforce the requirement
that i + k = j.
[Diagram: a PDA with states q0, q1, q2 connected by ε, ε → ε transitions; q0
loops on a, ε → a (pushing the a's), q1 loops on b, a → ε and b, ε → b (first
matching b's against pushed a's, then pushing the surplus b's), and q2 loops on
c, b → ε (matching the c's against the pushed b's).]
[Diagram fragment: transitions a, b → ε, b, a → ε, a, ε → a, b, ε → b.]
Example 87. Give a PDA for L = {x ∈ {a, b}∗ | count(a, x) < 2 ∗ count(b, x)}.
This problem can be rephrased as: x is in L if, after doubling the number
of b’s in x, we have more b’s than a’s. We can build a machine to do this
explicitly: every time it sees a b in the input string, it will treat it as 2
consecutive b’s.
[Diagram: a PDA implementing this idea; on reading a b it pushes two b's (using
an extra ε, ε → b move) or cancels pushed a's, on reading an a it pushes an a or
cancels a pushed b, and ε-transitions lead to the accept states.]
Note that we could shrink this machine by merging the last two states:
[Diagram: the same machine with its last two states merged.]
This machine non-deterministically chooses to cancel one or two b symbols
for each a seen in the input. Note that we could also write an equivalent
machine that non-deterministically chooses to push one or two a symbols
to the stack:
[Diagram: an equivalent machine in which each input a pushes either one a
(a, ε → a) or two (a, ε → a followed by ε, ε → a), and each input b pops an a
(b, a → ε).]
L = {a^i b^j | 2i = 3j}
[Diagram: a PDA with states q0 through q4; each a read pushes two a's (a, ε → a
followed by ε, ε → a), each b read pops three a's (b, a → ε followed by ε, a → ε
twice), and an ε, ε → ε transition leads to the accept state q2.]
L = {a^i b^j | 2i ≠ 3j}
This is a more difficult problem. We will be able to re-use the basic idea
of the previous example, but must now take extra cases into account. The
success state q2 of the previous example will now change into a reject state.
But there is much more going on. We will build the solution incrementally.
The basic skeleton of our answer is
[Diagram: the same five-state skeleton as in the previous example, except that q2
is no longer an accept state.]
If we arrive in q2 where the input has been exhausted and the stack is
empty, we should reject, and that is what the above machine does. The
other cases in q2 are
• There is remaining input and the stack is not empty. This case is
already covered: go to q3 .
• There is remaining input and the stack is empty. We can assume that
the head of the remaining input is a b. (All the leading a symbols
have already been dealt with in the q0 , q1 pair.) We need to transition
to an accept state where we ensure that the rest of the input is all b
symbols. Thus we invent a new accept state q5 where we discard the
remaining b symbols in the input.
[Diagram: the skeleton extended with a new accept state q5; a b, ε → ε transition
leads from q2 to q5, and q5 loops on b, ε → ε, discarding the remaining b symbols.]
We further notice that this situation can happen in q3 and q4 , so we
add ε-transitions from them to q5 :
[Diagram: as above, with additional ε, ε → ε transitions from q3 and q4 to q5.]
• The input is exhausted, but the stack is not empty. Thus we have
excess a symbols on the stack and we need to jettison them before
accepting. This is handled in a new final state q6 :
[Diagram: the complete PDA; a new final state q6 is reached from q2 by ε, a → ε and
loops on ε, a → ε, jettisoning the excess a symbols from the stack.]
This is the final PDA.
The idea behind converting a CFG into an equivalent PDA is that a variable on top of the
stack will get replaced by the rhs of some grammar rule. Of course, there
are several problems with implementing this concept. For one, the PDA
can only access the top of its stack: it can’t find a variable below the top.
For another, even if the PDA could find such a variable, it couldn’t fit the
rhs into a single stack slot. But these are not insurmountable. We simply
have to arrange things so that the PDA always has the leftmost variable
of the sentential form on top of the stack. If that can be set up, the PDA
can use the technique of using extra states to push multiple symbols ‘all at
once’.
The other consideration is that we are constructing a PDA after all,
so it needs to consume the input string and give a verdict. This fits in
nicely with our other requirements. In brief, the PDA will use ε-transitions
to push the rhs of rules into the stack, and will use ‘normal’ transitions
to consume input. In consuming input, we will be able to remove non-
variables from the top of the stack, always guaranteeing that a variable is
at the top of the stack.
Here are the details. Let G = (V, Σ, R, S). We will construct
M = (Q, Σ, Γ, δ, q0, {q}), with stack alphabet Γ = V ∪ Σ, where
• Q = {q0 , q} ∪ RuleStates;
• δ has rules for getting started, for consuming symbols from the input,
and for pushing the rhs of rules onto the stack.
For a rule V −→ w1 w2 · · · wn, δ contains a chain of transitions leading from q
back to q,
ε, V → wn, then ε, ε → wn−1, . . . , then ε, ε → w1,
which pushes the rhs of the rule, using n − 1 intermediate states. Note
that the symbols in the rhs of the rule are pushed on the stack
in right-to-left order.
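As a rough illustration of the construction, the following Python sketch generates the
transition relation of such a PDA from a grammar given as a hypothetical dictionary
mapping each variable to its right-hand sides. It abbreviates two details discussed in
the text: the chains of extra states are collapsed into single transitions that push a
whole right-hand side, and the end-of-stack bookkeeping is omitted.

def cfg_to_pda(grammar, terminals, start):
    # Each transition is (state, input symbol or '', popped stack symbol or '',
    # next state, string of symbols pushed onto the stack).
    delta = set()
    delta.add(('q0', '', '', 'q', start))        # get started: push the start variable
    for var, rhss in grammar.items():
        for rhs in rhss:
            # replace the variable on top of the stack by the rhs of a rule
            delta.add(('q', '', var, 'q', rhs))
    for a in terminals:
        # consume an input symbol that matches the top of the stack
        delta.add(('q', a, a, 'q', ''))
    return delta

For the grammar below one would call cfg_to_pda({'S': ['aS', 'aSbS', '']}, {'a', 'b'}, 'S').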
S −→ aS | aSbS | ε
The corresponding PDA is
[Diagram: the PDA obtained from this grammar, with states A through F; an initial
ε, ε → S transition pushes S, the main state carries the loops ε, S → ε (for the rule
S → ε) and a, a → ε, b, b → ε (matching input against the stack), and chains of extra
states push the right-hand sides aS and aSbS in right-to-left order.]
As a sequence of machine configurations, this looks like
(pushing b on the stack), ε, ε → ε, or a, ε → ε (ignoring stack) are
allowed. How can these be eliminated from δ without changing the
behaviour? First we need to make sure that the stack is never empty,
for if M ′ is going to look at the top element of the stack, the stack had
better never be empty. This can be ensured by starting the compu-
tation with a special token ($) in the stack and then maintaining an
invariant that the stack never thereafter becomes empty. It will also
be necessary to allow M ′ to push two stack symbols in one move:
since M ′ always looks at the top stack symbol, we need to push two
symbols in order to get the effect of a push operation on a stack. This
can be implemented by using extra states, but we will simply assume
that M ′ has this extra convenience.
Furthermore, we are going to add a new start state s and the tran-
sition ε, ε → $, which pushes $ on the stack when moving from the
new start state s to the original start state q0 . We also add a new final
state qf , with ε, $ → ε transitions from all members of F to qf . Thus
the machine M ′ has a single start state and a single end state, always
examines the top of its stack, and behaves the same as the machine
M.
• Construct G so that it simulates the working of M ′ . We first construct
the set of variables of G.
[The rules of G are generated by a case analysis of the transitions of M′, according to
how each transition manipulates the stack between a pair of states p and q; the three
cases shown in the source are transitions of the forms a, A → B, a, A → BA, and
a, A → ε.]
Thus by making Vq0 $qf the start symbol of the grammar, we have
achieved our goal.
4.8 Parsing
To be added ...
144
Chapter 5
Automata
145
We said that automata are a model of computation. That means that
they are a simplified abstraction of ‘the real thing’. So what gets abstracted
away? One thing that disappears is any notion of hardware or software.
We merely deal with states and transitions between states.
We keep: some notion of state; stepping between states; a start state; end states.
We drop: any notion of memory; variables, commands, and expressions; syntax.
The distinction between program and machine executing it disappears.
One could say that an automaton is the machine and the program. This
makes automata relatively easy to implement in either hardware or soft-
ware.
From the point of view of resource consumption, the essence of a finite
automaton is that it is a strictly finite model of computation. Everything
in it is of a fixed, finite size and cannot be extended in the course of the
computation.
146
An automaton processes a string on the tape by repeating the following
actions until the tape head has traversed the entire string:
• The tape head reads the current tape cell and sends the symbol s
found there to the control. Then the tape head moves to the next cell.
The tape head can only move forward.
• The control takes s and the current state and consults the state tran-
sition function to get the next state, which becomes the new current
state.
Once the entire string has been traversed, the final state is examined.
If it is an accept state, the input string is accepted; otherwise, the string is
rejected. All the above can be summarized in the following formal defini-
tion:
M = (Q, Σ, δ, q0 , F )
where
• Q is a finite set of states
• Σ is a finite alphabet
• δ : Q × Σ → Q is the transition function
• q0 ∈ Q is the start state
• F ⊆ Q is the set of final (accept) states
A single step of computation is given by
step(M, q, a) = δ(q, a)
A sequence of steps ∆ is defined as
∆(M, q, ε) = q
∆(M, q, a · x) = ∆(M, step(M, q, a), x)
Finally, an execution of M on string x is a sequence of computation steps
beginning in the start state q0 of M:
execute(M, x) = ∆(M, q0 , x)
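The functions step, ∆, and execute translate directly into code. Here is a minimal
Python sketch, in which δ is represented by a hypothetical dictionary from
(state, symbol) pairs to states:

def step(delta, q, a):
    return delta[(q, a)]

def run(delta, q, x):                  # the function written ∆ in the text
    for a in x:
        q = step(delta, q, a)
    return q

def accepts(delta, q0, finals, x):     # execute, plus the acceptance test
    return run(delta, q0, x) in finals

# Example: a DFA over {0, 1} that accepts strings containing the substring 00.
delta = {('q0', '0'): 'q1', ('q0', '1'): 'q0',
         ('q1', '0'): 'q2', ('q1', '1'): 'q0',
         ('q2', '0'): 'q2', ('q2', '1'): 'q2'}
print(accepts(delta, 'q0', {'q2'}, '01001'))   # True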
5.1.1 Examples
Now we shall review a collection of examples of DFAs.
• Q = {q0 , q1 , q2 , q3 }
• Σ = {0, 1}
• F = {q1 , q2 }
• q0 is the start state
• δ is given by the following table:
0 1
q0 q1 q3
q1 q2 q3
q2 q2 q2
q3 q3 q3
148
0
q0 0 q1 0 q2
1
1 1
q3
0 1
Notice that the start state is designated by an arrow with no source.
Final states are marked by double circles. The strings accepted by M are the string
0 and every string that begins with 00.
• (both) pad 1 and pad 2
We will need 2 sensors, one for each pad, and some external mecha-
nism to convert these two inputs into one of the possibilities. So our al-
phabet will be {b, f, r, n}, standing for {both, front, rear , neither}. Now the
task is to define the transition function. This is most easily expressed as a
diagram:
[Diagram: two states, open and closed; input f moves from closed to open, input n
moves from open back to closed, and all other inputs leave the state unchanged.]
where δ is defined as
δ(open, x) = if x = n then closed else open
δ(closed , x) = if x = f then open else closed
In the course, there are two main questions asked about automata:
150
Example 94. Give a DFA for recognizing the set of all strings over {0, 1},
i.e., {0, 1}∗ . This is also known as the set of all binary strings. There is a
very simple automaton for this:
0
q0
1
Example 95. Give a DFA for recognizing the set of all binary strings be-
ginning with 01. Here’s a first attempt (which doesn’t quite work):
0, 1
q0 0 q1 1 q2
The problem is that this diagram does not describe a DFA: δ is not total.
Here is a fixed version:
0, 1
q0 0 q1 1 q2
1
0
q3
0, 1
0, 1
0 0 0
q0 1 q1 1 q2 1 q3
Example 97. Let Σ = {0, 1} and L = {w | len(w) is at most 5}. Show that L
is regular.
151
0, 1 0, 1 0, 1 0, 1 0, 1
q0 q1 q2 q3 q4 q5
0, 1
q6
0, 1
152
Now we give a name to the set of all languages that can be recognized by a DFA: a
language is called regular just when some DFA recognizes it.
The regular languages give a uniform way to relate the languages recognized by DFAs
and NFAs, along with the languages generated by regular expressions.
Example 98. The set of all binary strings having a substring 00 is regular.
To show this, we need to construct a DFA that recognizes all and only
those strings having 00 as a substring. Here’s a natural first try:
q0 0 q1 0 q2
0, 1 0, 1
However, this is not a DFA (it is an NFA, which we will discuss in the
next lecture). A second try can be constructed by trying to implement the
following idea: we try to find 00 in the input string by ‘shifting along’ until
a 00 is seen, whereupon we go to a success state. We start with a preliminary
DFA, that expresses the part of the machine that detects successful input:
q0 0 q1 0 q2
0, 1
And now we consider, for each state, the moves needed to make the
transition function total, i.e., we need to consider all the missing cases.
153
• If we are in q0 and we get a 1, then we should try again, i.e., stay in q0 .
So q0 is the machine state where it is looking for a 0. So the machine
should look like
1
q0 0 q1 0 q2
0, 1
1
0 0
q0 q1 q2
1
0, 1
Example 99. Give a DFA that recognizes the set of all binary strings having
a substring 00101. A straightforward—but incorrect—first attempt is the
following:
1 0, 1
q0 0 q1 0 q2 1 q3 0 q4 1 q5
1 0 1 0
1
q0 0 q1 0 q2 1 q3
1
154
If the machine is in q2 , it has seen a 00. If we then get another 0, we
could be seeing 000101. In other words, if the next 3 symbols after 2 or
more consecutive 0s are 101, we should accept. Therefore, once, we see 00,
we should stay in q2 as long as we see more 0s. Thus we can refine our
diagram to
1 0
q0 0 q1 0 q2 1 q3
1
1 0
q0 0 q1 0 q2 1 q3 0 q4
1
1
Now q4 . We’ve seen . . . 0010. If we now see a 1, then we’ve found our
substring, and can accept. Otherwise, we’ve seen . . . 00100, i.e., have seen
a 00, therefore should go to q2 . This gives the final solution (somewhat
rearranged):
q3
1 1 0 0, 1
q0 q4 1 q5
0 1
q1 0
1 0
q2
155
5.2 Nondeterministic finite-state automata
A nondeterministic finite-state automaton (NFA) N = (Q, Σ, δ, q0 , F ) is de-
fined in the same way as a DFA except that the following liberalizations
are allowed:
• ε-transitions
ε-Transitions
In an ε-transition, the tape head doesn’t do anything—it doesn’t read and
it doesn’t move. However, the state of the machine can be changed. For-
mally, the transition function δ is given the empty string. Thus
δ(q, ε) = {q1 , . . . , qk }
means that the next state could be one of q1 , . . . , qk without consuming
the next input symbol. When an NFA executes, it makes transitions as
a DFA does. However, after making a transition, it can make as many
ε-transitions as are possible.
Formally, all that has changed in the definition of an automaton is δ:
DFA δ : Q × Σ → Q
NFA δ : Q × (Σ ∪ {ε}) → 2^Q
156
Note. Some authors write Σε instead of Σ ∪ {ε}.
Don’t let any of this formalism confuse you: it’s just a way of saying
that δ delivers a set of next states, each of which is a member of Q.
Example 100. Let δ, the transition function, be given by the following table
0 1 ε
q0 ∅ {q0 , q1 } {q1 }
q1 {q2 } {q1 , q2 } ∅
q2 {q2} ∅ {q1 }
Also, let F = {q2 }. Note that we must take account of the possibility of ε
transitions in every state. Also note that each step can lead to one of a set
of next states. The state transition diagram for this automaton is
1 1 0
0
q0 1 q1 q2
ε
ε 1
Note. In a transition diagram for an NFA, we draw arrows for all tran-
sitions except those landing in the empty set (can one land in an empty
set?).
Note. δ is still a total function, i.e., we have to specify its behaviour in ε,
for each state.
Question : Besides δ, what changes when moving from DFA to NFA?
Answer : The notion that there is a single computation path for a string,
and therefore, the definitions of acceptance and rejection of strings. Con-
sequently, the definition of L(N), where N is an NFA.
Example 101. Giving the input 01 to our example NFA allows 3 computa-
tion paths:
• q0 −ε→ q1 −0→ q2 −ε→ q1 −1→ q1
• q0 −ε→ q1 −0→ q2 −ε→ q1 −1→ q2
• q0 −ε→ q1 −0→ q2 −ε→ q1 −1→ q2 −ε→ q1
157
Notice that, in the last path, we can see that even after the input string
has been consumed, the machine can still make ε-transitions. Also note
that two paths, the first and third, do not end in a final state. The second
path is the only one that ends in a final state
In general, the computation paths for input x form a computation tree:
the root is the start state and the paths branch out to (possibly) different
states. For our example, with the input 01, we have the tree
q1
1
q0 ε q1 0 q2 ε q1 q2
1 ε
q1
Note that marking q2 as a final state is just a marker to show that a path
(the second) ends at that point; of course, the third path continues from
that state.
An NFA accepts an input x if at least one path in the computation tree for
x leads to a final state. In our example, 01 is accepted because q2 is a final
state.
Definition 31 (Acceptance by an NFA). Let N = (Q, Σ, δ, q0 , F ) be an NFA.
N accepts w if we can write w as w1 · w2 · . . . · wn , where each wi is a member
of Σ ∪ {ε} and a sequence of states q0 , . . . , qk exists, with each qi ∈ Q such
that the following conditions hold
• q0 is the start state of N
• qk is a final state of N (qk ∈ F )
• qi+1 ∈ δ(qi , wi+1 )
As for DFAs, the language recognized by an NFA N is the set of strings
accepted by N.
Definition 32 (Language of an NFA). The language of an NFA N is written
L(N) and defined
L(N) = {x | N accepts x}
158
Example 102. A diagram for an NFA that accepts all binary string having
a substring 010 is the following:
0, 1 0, 1
q0 0 q1 1 q2 0 q3
This machine accepts the string 1001010 because there exists at least one
accepting path (in fact, there are 2). The computation tree looks like q
0
0
q0 1 q0
0 0
q0 1 q0 q1
0 0
q0 q1 1 q2 0 q3
0 0
q0 1 q0 q1 1 q2 0 q3 1 q3 0 q3
0
q1 0
∅
Example 103. Design an NFA that accepts the set of binary strings begin-
ning with 010 or ending with 110. The solution to this uses a decompo-
sition strategy: do the two cases separately then join the automata with
ε-links. An automaton for the first case is the following
0, 1
q0 0 q1 1 q2 0 q3
159
0, 1
q0 1 q1 1 q2 0 q3
0, 1
q1 0 q2 1 q3 0 q4
ε
q0
ε
q5 1 q6 1 q7 0 q8
0, 1
0, 1
1 0, 1 0, 1 0, 1 0, 1
q0 q1 q2 q3 q4 q5
160
Example 105. Find an NFA that accepts the set of binary strings with at
least 2 occurrences of 01, and which end in 11.
The solution uses ε-moves to connect 3 NFAs together:
0, 1
q0 0 q1 1 q2
0, 1 ε
q3 q4 1 q5
0
0, 1 ε ε
q6 q7 1 q8
1
Note. There is a special case to take care of the input ending in 011; whence
the ε-transition from q5 to q7 .
5.3 Constructions
OK, now we have been introduced to DFAs and NFAs and seen how they
accept/reject strings. Now we are going to examine various constructions
that operate on automata, yielding other automata. It’s a way of building
automata from components.
161
in solving this requirement is to make the states of the product automaton
be pairs of states from the component automaton.
Definition 33 (Product construction). Let M1 = (Q1 , Σ, δ1 , q1 , F1 ) and M2 =
(Q2 , Σ, δ2 , q2 , F2 ) be DFAs. Notice that they share the same alphabet. The
product of M1 and M2 —sometimes written as M1 × M2 —is (Q, Σ, δ, q0 , F ),
where
• Q = Q1 × Q2 . Recall that this is {(p, q) | p ∈ Q1 ∧ q ∈ Q2}, or, informally,
all possible pairings of states in Q1 with states in Q2 . The size of Q is the
product of the sizes of Q1 and Q2 .
• δ((p, q), a) = (δ1(p, a), δ2(q, a)), i.e., the two component machines step in
parallel on the same input symbol.
• q0 = (q1 , q2 ), where q1 is the start state for M1 and q2 is the start state
for M2 .
• F is chosen according to the language operation being implemented: for the
union of the two languages, F = {(p, q) | p ∈ F1 ∨ q ∈ F2}; for the intersection,
F = F1 × F2 .
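A minimal Python sketch of the product construction follows, assuming each DFA is
given as a tuple (states, alphabet, transition dictionary, start state, final states);
the combine argument decides how the component final states determine the final
states of the product.

def product(M1, M2, combine):
    (Q1, sigma, d1, s1, F1) = M1
    (Q2, _, d2, s2, F2) = M2
    Q = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in Q for a in sigma}        # step both machines in parallel
    F = {(p, q) for (p, q) in Q if combine(p in F1, q in F2)}
    return (Q, sigma, delta, (s1, s2), F)

# combine = lambda x, y: x or y  gives the union of the two languages;
# combine = lambda x, y: x and y gives the intersection.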
Let’s build 2 automata separately and then use the product construc-
tion to join them. The first DFA, call it M1 , is
1 0, 1
q0 0 q1 0 q2
1
The second, call it M2 , is
1 0
q0 0 q1 1 q2
0
1
The next thing to do is to construct the state space of the product ma-
chine, and use that to figure out δ. The following table gives the details:
Q1 × Q2 0 1
(q0 q0 ) (q1 , q1 ) (q0 , q0 )
(q0 q1 ) (q1 , q1 ) (q0 , q2 )
(q0 q2 ) (q1 , q1 ) (q0 , q0 )
(q1 q0 ) (q2 , q1 ) (q0 , q0 )
(q1 q1 ) (q2 , q1 ) (q0 , q2 )
(q1 q2 ) (q2 , q1 ) (q0 , q0 )
(q2 q0 ) (q2 , q1 ) (q2 , q0 )
(q2 q1 ) (q2 , q1 ) (q2 , q2 )
(q2 q2 ) (q2 , q1 ) (q2 , q0 )
This is of course easy to write out, once you get used to it (a bit mind-
less though). Now there are several of the combined states that aren’t
reachable from the start state and can be pruned. The following is a dia-
gram of the result.
1 0
q0 , q0 0 q1 , q1 0 q2 , q1
0 0 1
1 0
1 1
q0 , q2 q2 , q0 q2 , q2
1
163
The final states of the automaton are {(q0 , q2 ), (q2 , q1 ), (q2 , q0 ), (q2 , q2 )}
(underlined states are the final states in the component automata).
164
Proof. Let M = (Q, Σ, δ, q0 , F ) be a DFA recognizing A. So M accepts all
strings in A and rejects all others. Thus a DFA recognizing the complement of A is obtained
by switching the final and non-final states of M, i.e., the desired machine
is M ′ = (Q, Σ, δ, q0 , Q − F ). Note that M ′ recognizes Σ∗ − L(M).
• Q = Q1 ∪ Q2
• q0 = q01
• F = F2
(The machine constructed is, in fact, an NFA.)
• δ(q, a) is defined by cases, as to whether it is operating ‘in’ M1 , tran-
sitioning between M1 and M2 , or operating ‘in’ M2 :
166
Example 107. Let M be given by the following DFA:
a a, b
b a, b
q0 q1 q2
a a, b
ε b a, b
qs q0 q1 q2
ε
NFA: δ : Q × (Σ ∪ {ε}) → 2^Q
DFA: δ′ : 2^Q × Σ → 2^Q
In other words, the NFA N is always in a single state and can have multiple
successor states for symbol a via δ. In contrast, the DFA M is always in a
set (possibly empty) of states and moves into a set of successor states via
δ′, which is defined in terms of δ. (This determinization is due to Rabin and Scott,
who shared a Turing award for it; however, both researchers are famous for much
other work as well.) This is formalized as follows:
δ ′ ({q1 , . . . , qk }, a) = δ(q1 , a) ∪ . . . ∪ δ(qk , a).
Example 108. Let’s consider the NFA N given by the diagram
0, 1
q0 1 q1 0 q2
N evidently accepts the language {x10 | x ∈ {0, 1}∗ }. The subset con-
struction for N proceeds by constructing a transition function over all sub-
sets of the states of N. Thus we need to consider
∅, {q0 }, {q1 }, {q2 }, {q0 , q1 }, {q0 , q2 }, {q1 , q2 }, {q0 , q1 , q2 }
as possible states of the DFA M to be constructed. The following table
describes the transition function for M.
Q 0 1
∅ ∅ ∅ (unreachable)
{q0 } {q0 } {q0 , q1 } (reachable)
{q1 } {q2 } ∅ (unreachable)
{q2 } ∅ ∅ (unreachable)
{q0 , q1 } {q0 , q2 } {q0 , q1 } (reachable)
{q0 , q2 } {q0 } {q0 , q1 } (reachable)
{q1 , q2 } {q2 } ∅ (unreachable)
{q0 , q1 , q2 } {q0 , q2 } {q0 , q1 } (unreachable)
And here’s the diagram. Unreachable states have been deleted. State A =
{q0 } and B = {q0 , q1 } and C = {q0 , q2 }.
0 1 1
1
A B C
0
0
Note that the final states of the DFA will be those that contain at least
one final state of the NFA.
168
So by making the states of the DFA be sets of states of the NFA, we
seem to get what we want: the DFA will accept just in case the NFA would
accept. This apparently gives us the best of both characterizations: the ex-
pressive power of NFAs, coupled with the straightforward executability
of DFAs. However, there is a flaw: some NFAs map to DFAs with expo-
nentially more states. A class of examples with this property are those
expressed as
Construct a DFA accepting the set of all binary strings in which
the nth symbol from the right is 0.
Also, we have not given a complete treatment: we still have to account
for ε-transitions, via ε-closure.
ε-Closure
Don't be confused by the terminology: ε-closure has nothing to do with
closure of regular languages under ∩, ∪, etc.
The idea of ε-closure is the following: when moving from a set of states
Si to a set of states Si+1, we have to take account of all ε-moves that could
be made after the transition. Why do we have to do that? Because the DFA
is over the alphabet Σ, instead of Σ ∪ {ε}, so we have to squeeze out all the
ε-moves. Thus we define, for a set of states Q,
E(Q) = {q | q can be reached from a state in Q by 0 or more ε-moves}
• Q′ = 2^Q
• q0′ = E({q0})
• δ′({q1, . . . , qk}, a) = E(δ(q1, a) ∪ . . . ∪ δ(qk, a)) = E(δ(q1, a)) ∪ . . . ∪ E(δ(qk, a))
• F′ = {S ∈ 2^Q | ∃q ∈ S. q ∈ F}
The essence of the argument for correctness of the subset construction
amounts to noticing that the generated DFA mimicks the transition be-
haviour of the NFA and accepts and rejects strings exactly as the NFA
does.
Theorem 15 (Correctness of subset construction). If DFA M is derived by
applying the subset construction to NFA N, then L(M) = L(N).
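Here is a minimal Python sketch of the subset construction with ε-closure. It assumes
δ is a dictionary from (state, symbol) pairs to sets of states, with the empty string ''
standing for ε, and it only builds the subset-states reachable from the start (as the
worked example below also does).

def eclose(delta, S):
    # E(S): all states reachable from S by zero or more ε-moves
    stack, closed = list(S), set(S)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), ()):
            if r not in closed:
                closed.add(r)
                stack.append(r)
    return frozenset(closed)

def subset_construction(delta, sigma, q0, finals):
    start = eclose(delta, {q0})
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for a in sigma:
            moved = set()
            for q in S:
                moved |= set(delta.get((q, a), set()))
            T = eclose(delta, moved)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_finals = {S for S in seen if S & set(finals)}
    return seen, dfa_delta, start, dfa_finals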
Example 109. Convert the following NFA to an equivalent DFA.
0, 1
q0 ε q1 0 q5
1 1 1
ε
q2 0 q3 q4
0
170
But that would lead to madness. Instead we should build the table in
an on-the-fly manner, wherein we only write down the transitions for the
reachable states, the ones we could actually get to by following transitions
from the start state. First, we need to decide on the start state: it is not
{q0 }! We have to take the ε-closure of {q0 }:
E{q0 } = {q0 , q1 }
In the following, it also helps to name the reached state sets, for concision.
states 0 1
A = {q0 , q1 } {q0 , q1 , q5 } {q0 , q1 , q2 }
B = {q0 , q1 , q5 } B D
C = {q0 , q1 , q2 } E C
D = {q0 , q1 , q2 , q4 } E C
E = {q0 , q1 , q3 , q4 , q5 } E D
So for example, if we are in state C, the set of states we could be in after
a 0 on the input are:
E(δ(q0 , 0)) ∪ E(δ(q1 , 0)) ∪ E(δ(q2 , 0)) = {q0 , q1 } ∪ {q5 } ∪ {q3 , q4 }
= {q0 , q1 , q3 , q4 , q5 }
= E
Similarly, if we are in state C and see a 1, the set of states we could be
in are:
E(δ(q0 , 1)) ∪ E(δ(q1 , 1)) ∪ E(δ(q2 , 1)) = {q0 , q1 } ∪ {q2 } ∪ ∅
= {q0 , q1 , q2 }
= C
A diagram of this DFA is
0
0 1
A B D
1 0
1 1
0
C E
1 0
171
Summary
For every DFA, there’s a corresponding (trivial) NFA. For every NFA,
there’s an equivalent DFA, via the subset construction. So the 2 models,
apparently quite different, have the same power (in terms of the languages
they accept). But notice the cost of ‘compiling away’ the non-determinism:
the number of states in a DFA derived from the subset construction can be
exponentially larger than in the original. Implementability has its price!
• a ∈ R, if a ∈ Σ
• ε∈R
• ∅∈R
• r1 + r2 ∈ R, if r1 ∈ R ∧ r2 ∈ R
• r1 · r2 ∈ R, if r1 ∈ R ∧ r2 ∈ R
• r ∗ ∈ R, if r ∈ R
• Nothing else is in R
172
Definition 36 (Semantics of regular expressions). The meaning of a regular
expression r, written L(r) is defined as follows:
Note the overloading. The occurrence of · and ∗ on the right hand side
of the equations are operations on languages, while on the left hand side,
they are nodes in a tree structure.
• r + = rr ∗.
Σ = {+, −} ∪ D ∪ {.}
where D = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. (We will underline the 'plus sign' + of the
alphabet to distinguish it from the + used to build the regular expression.) Then
(+ + − + ε) · (D⁺ + D⁺.D∗ + D∗.D⁺)
is a concise description of a simple class of floating point constants for a
programming language. Examples of such constants are: +3, −3.2, −.235.
Example 111. Give a regular expression for the binary representation of
the numbers which are powers of 4:
{4^0, 4^1, 4^2, . . .} = {1, 4, 16, 64, 256, . . .}
Merely transcribing to binary gives us the important clue we need:
{1, 100, 10000, 1000000, . . .}
The regular expression generating this language is 1(00)∗ .
Example 112. Give a regular expression for the set of binary strings which
have at least one occurrence of 001. One answer is
(0 + 1)∗ 001(0 + 1)∗ or Σ∗ 001Σ∗
Example 113. Give a regular expression for the set of binary strings which
have no occurrence of 001. This example is much harder, since the problem
is phrased negatively. In fact, this is an instance where it is easier to build
an automaton for recognizing the given set:
• Build an NFA for recognizing any string where 001 occurs. This is
easy.
• Convert to a DFA. We know how to do this (subset construction).
• Complement the resulting automaton. (Note that directly complementing the NFA won't work in general.)
However, we are required to come up with a regular expression. How
to start? First, note that a string w in the set can have no occurrence of 00,
unless w is a member of the set denoted by 000∗. The set of binary strings
having no occurrence of 00 and ending in 1 is
(01 + 1)∗
And now we can append any number of 0s to this and get the specified
set:
(01 + 1)∗ 0∗
5.4.1 Equalities for regular expressions
The following equalities are useful when manipulating regular expres-
sions. They should mostly be familiar and can be proved simply by reduc-
ing to the meaning in languages and using the techniques and theorems
we have already seen.
r1 + (r2 + r3 ) = (r1 + r2 ) + r3
r1 + r2 = r2 + r1
r+r =r
r+∅=r
εr = r = rε
∅r = ∅ = r∅
∅∗ = ε
r1 (r2 r3 ) = (r1 r2 )r3
r1 (r2 + r3 ) = r1 r2 + r1 r3
(r1 + r2 )r3 = r1 r3 + r2 r3
ε + rr ∗ = r ∗
(ε + r)∗ = r ∗
rr ∗ = r ∗ r
r∗r∗ = r∗
r∗∗ = r∗
(r1 r2 )∗ r1 = r1 (r2 r1 )∗
(r1 ∗ r2 )∗ r1 ∗ = (r1 + r2 )∗
175
and
(00)∗ = {ε, 0^2, 0^4, 0^6, . . .} = {0^n | n is even}
Thus (00)∗ 0 + (00)∗ = 0∗ .
0(00)∗ + 1(00)∗ + 0 + 1
(0 + ε)0∗ 1 = (0 + ε)(0∗1)
= 0 + 1 + 0∗ 1
= 0∗ 1.
Example 118. Show that (0^2 + 0^3)∗ = (0^2 0∗)∗. Examining the left hand side,
we have
(0^2 + 0^3)∗ = {ε, 0^2, 0^3, 0^4, . . .}
= 0∗ − {0}.
On the right hand side, we have
(0^2 0∗)∗ = (000∗)∗
= {ε} ∪ {0^2, 0^3, 0^4, 0^5, . . .}
= {ε} ∪ {0^(k+2) | 0 ≤ k}
= 0∗ − {0}.
Example 119. Prove the identity (0 + 1)∗ = (1∗ (0 + ε)1∗ )∗ using the al-
gebraic identities. We will work on the rhs, underlining subexpressions
about to be changed.
(1∗(0 + ε)1∗)∗
= (1∗01∗ + 1∗1∗)∗                [next apply a∗a∗ = a∗]
= (1∗ + 1∗01∗)∗                  [next apply (a + b)∗ = (a∗b)∗a∗, with a = 1∗, b = 1∗01∗]
= (1∗∗1∗01∗)∗1∗∗                 [next apply a∗∗ = a∗ and a∗a∗ = a∗]
= (1∗01∗)∗1∗                     [next apply (ab)∗a = a(ba)∗, with a = 1∗, b = 01∗]
= 1∗(01∗1∗)∗                     [next apply a∗a∗ = a∗]
= 1∗(01∗)∗                       [next apply a(ba)∗ = (ab)∗a, with a = 1∗, b = 0]
= (1∗0)∗1∗                       [next apply (a∗b)∗a∗ = (a + b)∗, with a = 1, b = 0]
= (0 + 1)∗
177
• DFAs into equivalent regular expressions
(init) A −r→ B
(plus) A −r1+r2→ B =⇒ A −r1→ B and A −r2→ B (two parallel edges)
(concat) A −r1·r2→ B =⇒ A −r1→ C −r2→ B (for a fresh node C)
(star) A −r∗→ B =⇒ A −ε→ C −ε→ B, with an r-loop on the fresh node C
The init rule is used to start the process off: it sets the regular expression
as a label between the start and accept states. The idea behind the rule
applications is to iteratively replace each regular expression by a fragment
of automaton that implements it.
Application of these rules, the star rule especially, can result in many
useless ε-transitions. There is a complicated rule for eliminating these,
which can be applied only after all the other rules can no longer be applied.
• If the edge labelled with ε is the only edge leaving qi then qi can be
replaced by qj . If qi is the start node, then qj becomes the new start
state.
178
• If the edge labelled ε is the only edge entering qj then qj can be re-
placed by qi . If qj is a final state, then qi becomes a final state.
Example 120. Find an equivalent NFA for (11 + 0)∗ (00 + 1)∗ .
ε ε ε ε
star (twice)
11 + 0 00 + 1
0 1
plus (twice) ε ε ε ε
11 00
0 1
ε ε ε ε
concat (twice)
1 1 0 0
That ends the elaboration of the regular expression into the correspond-
ing NFA. All that remains is to eliminate redundant ε-transitions. The ε
transition from the start state can be dispensed with, since it is a unique
out-edge from a non-final node; similarly, the ε transition into the final
state can be eliminated because it is a unique in-edge to a non-initial node.
This yields
179
0 1
ε ε
1 1 0 0
We are not yet done. One of the two middle ε-transitions can be eliminated—
in fact the middle node has a unique in-edge and a unique out-edge—so
the middle state can be dropped.
0 1
1 1 0 0
180
[Diagram: a state A with a c-labelled self-loop, a b-transition to state B, and an
a-transition to state C.]
any string that will eventually be accepted will be of one of the forms
A = cA + bB + aC
the right hand side of which looks very much like a regular expression, ex-
cept for the occurrences of the variables A, B, and C. Indeed, the equation
solving process eliminates these variables so that the final expression is a
bona fide regular expression. The goal, of course, is to solve for the variable
representing the start state.
Accept states are somewhat special since the machine, if run from them,
would accept the empty string. This has to be reflected in the equation.
Thus
[Diagram: the same fragment, with A now an accept state.]
A = cA + bB + aC + ε
181
Using Arden’s Lemma to solve a system of equations
An important theorem about languages, proved earlier in these notes, is
the following:
Theorem 16 (Arden's Lemma). Assume that A and B are two languages with
ε ∉ A. Also assume that X is a language having the property X = (A · X) ∪ B.
Then X = A∗ · B.
What this theorem allows is the finding of closed form solutions to equa-
tions where the variable (X in the theorem) appears on both sides. We can
apply this theorem to the equations read off from DFAs quite easily: the
side condition that ε ∈
/ A always holds, since DFAs have no ε-transitions.
Thus, from our example, the equation characterizing the strings accepted
from state A
A = cA + bB + aC + ε
is equivalent, by application of Arden’s Lemma to
A = c∗ (bB + aC + ε)
Once the closed form Q = rhs for a state Q is found, rhs can be sub-
stituted for Q throughout the remainder of the equations. This is repeated
until finally the start state has a regular expression representing its lan-
guage.
Example 121. Give an equivalent regular expression for the following DFA:
b
a
a
A B
b a
b C
We now make an equational presentation of the DFA:
A = aB + bC
B = bB + aA + ε
C = aB + bA + ε
182
We eventually need to solve for A, but can start with B or C. Let’s start
with B. By application of Arden’s lemma, we get
B = b∗ (aA + ε)
And now we do some regular expression algebra to prepare for the final
application of the lemma:
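One possible route for that algebra (the intermediate steps here are a reconstruction) is:
A = a(b∗(aA + ε)) + bC = ab∗aA + ab∗ + bC
C = a(b∗(aA + ε)) + bA + ε = ab∗aA + ab∗ + bA + ε
Substituting the expression for C into the equation for A and collecting the terms that
mention A gives
A = ab∗aA + ab∗ + b(ab∗aA + ab∗ + bA + ε)
  = (ab∗a + bab∗a + bb)A + (ab∗ + bab∗ + b)
and one final application of Arden's Lemma yields
A = (ab∗a + bab∗a + bb)∗ (ab∗ + bab∗ + b)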
Notice how this quite elegantly summarizes all the ways to loop back
to A when starting from A, followed by all the non-looping paths from A
to an accept state.
End of example
183
To summarize, we have seen methods for translating between DFAs,
NFAs, and regular expressions:
Notice that, in order to say that these translations work, i.e., are correct,
we need to use the concept of formal language.
5.5 Minimization
Now we turn to examining how to reduce the size of a DFA such that it
still recognizes the same language. This is useful because some transfor-
mations and tools will generate DFAs with a large amount of redundancy.
0, 1 0, 1
q0 0 q1 1 q2 0 q3
1 0 0 0 1
p0 0 p1 1 p2 0 p3 p4 1 p5
1
1 0
which has 6 reachable states, out of a possible 2^4 = 16. But notice
that p3 , p4 , and p5 are all accept states, and it’s impossible to ‘escape’ from
them. So you could collapse them to one big success state. Thus the DFA
is equivalent to the following DFA with 4 states:
1 0 0, 1
p0 0 p1 1 p2 0 p3
a q1 a, b a, b
q0 q3
b q2 a, b
185
is clearly equivalent to the following 3 state machine:
a, b
a, b a, b
q0 q12 q2
q1 0 q3
0 0, 1
1 0, 1
q0 q5
1 1
q2 q4 0, 1
0
q1 0, 1
0 0, 1
0, 1
q0 q34 q5
1 q2 0, 1
0, 1
0, 1 0, 1 0, 1
q0 q12 q34 q5
186
q5 0 q4
0 0
q0 q3
0 0
q1 q2
0
recognizes the language
{0n | ∃k. n = 3k + 1}
q0 0 q2
0 0
q1
187
Definition 38 (DFA state equivalence).
p ≈ q iff ∀x ∈ Σ∗ . ∆(p, x) ∈ F iff ∆(q, x) ∈ F
where F is the set of final states of the automaton.
Question: What is ∆?
Answer ∆ is the extension of δ from symbols (single step) to strings
(multiple steps). Its formal definition is as follows:
∆(q, ε) = q
∆(q, a · x) = ∆(δ(q, a), x)
Thus ∆(q, x) gives the state after the machine has made a sequence of tran-
sitions while processing x. In other words, it’s the state at the end of the
computation path for x, where we treat q as the start state.
Remark. ≈ is an equivalence relation, i.e., it is reflexive, symmetric, and
transitive:
• p≈p
• p≈q⇒q≈p
• p≈q∧q ≈r ⇒p≈r
An equivalence relation partitions the underlying set (for us, the set of
states Q of an automaton) into disjoint equivalence classes. This is denoted
by Q/ ≈. Each element of Q is in one and only one partition of Q/ ≈.
Example 126. Suppose we have a set of states Q = {q0 , q1 , q2 , q3 , q4 , q5 } and
we define qi ≈ qj iff i mod 2 = j mod 2, i.e., qi and qj are equivalent if i and
j are both even or both odd. Then Q/ ≈ = {{q0 , q2 , q4 }, {q1 , q3 , q5 }}.
The equivalence class of q ∈ Q is written [q], and defined
[q] = {p | p ≈ q} .
We have the equality
p≈q iff ([p] = [q])
| {z } | {z }
equivalence of states equality of sets of states
188
Definition 39 (Quotient automaton). Let M = (Q, Σ, δ, q0 , F ) be a DFA.
The quotient automaton is M/ ≈ = (Q′ , Σ, δ ′ , q0′ , F ′) where
• Σ is unchanged
• q0′ = [q0 ], i.e., the start state in the new machine is the equivalence
class of the start state in the original.
• if there exists an unmarked pair (p, q) in the table such that the pair
(δ(p, a), δ(q, a)) is marked, for some a ∈ Σ, then mark (p, q).
4. Done. Read off the equivalence classes: if (p, q) is not marked, then
p ≈ q.
Remark. We may have to revisit the same (p, q) pair several times, since
combining two states can suddenly allow hitherto equivalent states to be
markable.
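A minimal Python sketch of this marking procedure is given below; it assumes
unreachable states have already been removed and δ is a dictionary from
(state, symbol) pairs to states. It returns the set of marked (inequivalent) pairs,
so every unmarked pair is an equivalent pair.

from itertools import combinations

def inequivalent_pairs(states, sigma, delta, finals):
    # step 1: mark every pair in which exactly one state is final
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:                      # repeat until no new pair gets marked
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in sigma:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)    # p and q transition to inequivalent states
                    changed = True
                    break
    return marked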
0
1
0 1 0
A B C D
0 0 1 1
1
1 1
E F G H
0
1
0
0
We start by setting up our table. We will be able to restrict our attention
to the lower left triangle, since equivalence is symmetric. Also, each box
on the diagonal will be marked with ≈, since every state is equivalent to
itself. We also notice that state D is not reachable, so we will ignore it.
A B C D E F G H
A ≈ − − − − − − −
B ≈ − − − − − −
C ≈ − − − − −
D − − − − − − − −
E − ≈ − − −
F − ≈ − −
G − ≈ −
H − ≈
Now we split the states into final and non-final. Thus, a box indexed by
p, q will be labelled with an X if p is a final state and q is not, or vice versa.
190
Thus we obtain
A B C D E F G H
A ≈ − − − − − − −
B ≈ − − − − − −
C X0 X0 ≈ − − − − −
D − − − − − − − −
E X0 − ≈ − − −
F X0 − ≈ − −
G X0 − ≈ −
H X0 − ≈
State C is inequivalent to all other states. Thus the row and column la-
belled by C get filled in with X0 . (We will subscript each X with the step
at which it is inserted into the table.) However, note that C, C is not filled
in, since C ≈ C. Now we have the following pairs of states to consider:
{AB, AE, AF, AG, AH, BE, BF, BG, BH, EF, EG, EH, F G, F H, GH}
Now we introduce some notation which compactly captures how the ma-
chine transitions from a pair of states to another pair of states. The notation
p1p2 ←0− q1q2 −1→ r1r2
means q1 −0→ p1 and q2 −0→ p2 and q1 −1→ r1 and q2 −1→ r2.
r1 , or r2 are already marked in the table, then there is a way to distinguish
q1 and q2 : they transition to inequivalent states. Therefore q1 6≈ q2 and the
box labelled by q1 q2 will become marked. For example, if we take the state
pair AB, we have
BG ←0− AB −1→ FC
and since F C is marked, AB becomes marked as well.
A B C D E F G H
A ≈ − − − − − − −
B X1 ≈ − − − − − −
C X0 X0 ≈ − − − − −
D − − − − − − − −
E X0 − ≈ − − −
F X0 − ≈ − −
G X0 − ≈ −
H X0 − ≈
191
In a similar fashion, we examine the remaining unassigned pairs:
• BH ←0− AE −1→ FF. Unable to mark.
• BC ←0− AF −1→ FG. Mark, since BC is marked.
• BG ←0− AG −1→ FE. Unable to mark.
• BG ←0− AH −1→ FC. Mark, since FC is marked.
• GH ←0− BE −1→ CF. Mark, since CF is marked.
• GC ←0− BF −1→ CG. Mark, since CG is marked.
• GG ←0− BG −1→ CE. Mark, since CE is marked.
• GG ←0− BH −1→ CC. Unable to mark.
• HC ←0− EF −1→ FG. Mark, since CH is marked.
• HG ←0− EG −1→ FE. Unable to mark.
• HG ←0− EH −1→ FC. Mark, since CF is marked.
• CG ←0− FG −1→ GE. Mark, since CG is marked.
• CG ←0− FH −1→ GC. Mark, since CG is marked.
• GG ←0− GH −1→ EC. Mark, since EC is marked.
A B C D E F G H
A ≈ − − − − − − −
B X1 ≈ − − − − − −
C X0 X0 ≈ − − − − −
D − − − − − − − −
E X1 X0 − ≈ − − −
F X1 X1 X0 − X1 ≈ − −
G X1 X0 − X1 ≈ −
H X1 X0 − X1 X1 X1 ≈
192
Next round. The following pairs need to be considered:
The previously calculated transitions can be re-used; all that will have
changed is whether the ‘transitioned-to’ states have been subsequently
marked with an X1 :
A B C D E F G H
A ≈ − − − − − − −
B X1 ≈ − − − − − −
C X0 X0 ≈ − − − − −
D − − − − − − − −
E X1 X0 − ≈ − − −
F X1 X1 X0 − X1 ≈ − −
G X2 X1 X0 − X2 X1 ≈ −
H X1 X0 − X1 X1 X1 ≈
Next round. The following pairs remain: {AE, BH}. However, neither
makes a transition to a marked pair, so the round adds no new markings
to the table. We are therefore done. The quotiented state set is
{{A, E}, {B, H}, {C}, {F}, {G}}.
In other words, we have been able to merge states A and E, and B and H.
The final automaton is given by the following diagram.
193
[Diagram: the minimized DFA, with states AE, BH, G, F, and C.]
194
7. Given DFAs M1 and M2 , L(M1 ) ⊆ L(M2 )?
10. Given DFA M, is M the DFA having the fewest states that recognizes
L(M)?
It turns out that all these problems do have algorithms that correctly
answer the question. Some of the algorithms differ in how efficient they
are; however, we will not delve very deeply into that issue, since this class
is mainly oriented towards qualitative aspects of computation, i.e., can the
problems be solved at all? (For some decision problems, as we shall see
later in the course, the answer is, surprisingly, no.)
are easily solved: for the first, merely run the string x through the DFA
and check whether the machine is in an accept state at the end of the run.
For the second, first translate the NFA to an equivalent DFA by the subset
construction and then run the DFA on the string. For the third, one must
translate the regular expression to an NFA and then translate the NFA to
a DFA before running the DFA on x.
However, we would like to avoid the step mapping from NFAs to
DFAs, since the subset construction can create a DFA with exponentially
more states than the NFA. Happily, it turns out that an algorithm that
maintains the set of possible current states in an on-the-fly manner works
relatively efficiently. The algorithm will be illustrated by example.
Example 128. Does the following NFA accept the string aaaba?
195
a a, b
b, ε a
q0 q1 q2
b b
a a, b
q3 q4
ε
The initial set of states that the machine could be in is {q0 , q1 }. We then
have the following table, showing how the set of possible current states
changes with each new transition:
input symbol possible current states
{q0 , q1 }
a ↓
{q0 , q1 , q2 }
a ↓
{q0 , q1 , q2 }
a ↓
{q0 , q1 , q2 }
b ↓
{q1 , q2 , q3 , q4 }
a ↓
{q2 , q3 , q4 }
After the string has been processed, we examine the set of possible states
{q2 , q3 , q4 } and find q4 , so the answer returned is true.
In an implementation, the set of possible current states would be kept
in a data structure, and each transition would cause states to be added or
deleted from the set. Once the string was fully processed, all that needs to
be done is to take the intersection between the accept states of the machine
and the set of possible current states. If it was non-empty, then answer true;
otherwise, answer false.
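A minimal Python sketch of this on-the-fly simulation is shown below, with δ
represented as in the earlier subset-construction sketch (a dictionary from
(state, symbol) pairs to sets of states, the empty string standing for ε):

def nfa_accepts(delta, q0, finals, x):
    def eclose(S):                       # states reachable by 0 or more ε-moves
        stack, closed = list(S), set(S)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ''), ()):
                if r not in closed:
                    closed.add(r)
                    stack.append(r)
        return closed
    current = eclose({q0})               # the set of possible current states
    for a in x:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, a), set()))
        current = eclose(moved)
    return bool(current & set(finals))   # accept iff some possible state is final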
5.6.2 L(M) = ∅?
There are a couple of possible approaches to checking language emptiness.
The first idea is to minimize M to an equivalent minimum state machine
196
M′ and check whether M′ is equal to the following DFA, which is a minimum-state
DFA (having only one state) that recognizes ∅, i.e., accepts no strings:
[Diagram: a single non-accepting state with a self-loop on every symbol of Σ.]
This is a good idea; however, recall that the first step in minimizing a
DFA is to first remove all unreachable states. A reachable state is one that
some string will put the machine into. In other words, the reachable states
are just those you can get to from the start state by making a finite number
of transitions.
• q0 is reachable.
• If q is reachable and δ(q, a) = q′ for some a ∈ Σ, then q′ is reachable.
The reachable states can be computed by the following iteration, starting from R = {q0}:
reachable R =
  let new = {q′ | ∃q a. q ∈ R ∧ q′ ∉ R ∧ a ∈ Σ ∧ δ(q, a) = q′}
  in
  if new = ∅ then R else reachable(new ∪ R)
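The same computation, as a minimal Python sketch with δ given as a dictionary:

def reachable(delta, sigma, q0):
    R = {q0}
    while True:
        new = {delta[(q, a)] for q in R for a in sigma} - R
        if not new:
            return R
        R |= new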
197
5.6.3 L(M) = Σ∗?
To decide whether a machine M accepts all strings over its alphabet, we
can use one of the following two algorithms:
1. Check if a minimized version of M is equal to the following DFA:
[Diagram: a single accepting state with a self-loop on every symbol of Σ.]
198
5.6.6 L(M1 ) = L(M2 )?
• One algorithm directly uses the fact S1 = S2 iff S1 ⊆ S2 ∧ S2 ⊆ S1 .
199
Chapter 6
So far, we have not yet tied together the 3 different components of the
course. What are the relationships between Regular, Context-Free, Decid-
able, Recognizable, and not-even-Recognizable languages?
It turns out that there is a (proper) inclusion hierarchy, known as the
Chomsky hierarchy:
200
6.1 The Pumping Lemma for Regular Languages
The pumping lemma provides one way out of this problem. It exposes a
property, pumpability, that all regular sets have.
Theorem 19 (Pumping lemma for regular languages). Suppose that M =
(Q, Σ, δ, q0 , F ) is a DFA recognizing L. Let p be the number of states in Q, and
s ∈ L be a string w0 · . . . · wn−1 of length n ≥ p. Then there exists x, y, and z
such that s = xyz and
(a) x y^i z ∈ L, for all i ∈ N
(b) y ≠ ε (i.e., len(y) > 0)
(c) len(xy) ≤ p
Proof. Suppose M is a DFA with p states which recognizes L. Also suppose
there’s an s = w0 · . . . · wn−1 ∈ L where n ≥ p. Then the computation path
q0 −w0→ q1 −w1→ · · · −wn−1→ qn
for s traverses at least n + 1 states. Now n + 1 > p, so, by the Pigeon
Hole Principle, there's a state, call it q, which occurs at least twice in the
computation path. Let qj and qk be the first and second occurrences of q in
the computation path. So we have
q0 −w0→ q1 −w1→ · · · −wj−1→ qj −wj→ · · · −wk−1→ qk −wk→ · · · −wn−1→ qn
Now we partition the path into 3 as follows
q0 −w0→ q1 −w1→ · · · −wj−1→ qj  |  −wj→ · · · −wk−1→  |  qk −wk→ · · · −wn−1→ qn
where the segment before the first bar spells out x, the segment between the bars
spells out y, and the segment after the second bar spells out z.
We have thus used our assumptions to construct a partition of s into x, y, z.
Note that this works for any string in L with length not less than p. Now
we simply have to show that the remaining conditions hold:
(a) The sub-path from qj to qk moves from q to q, and thus constitutes a
loop. We may go around the loop 0, 1, or more times to generate
ever-larger strings, each of which is accepted by M and is thus in L.
(The Pigeon Hole Principle is informally stated as: given n + 1 pigeons and n boxes,
any assignment of pigeons to boxes must result in at least one box having at least 2
pigeons.)
(b) This is clear, since qj and qk are separated by at least one label (note
that j < k).
The criteria (a) allows one to pump sufficiently long strings arbitrar-
ily often, and thus gives us insight as to the nature of regular languages.
However, it is the application of the pumping lemma to proofs of non-
regularity of languages that is of interest.
{0^n 1^n | n ≥ 0} .
202
Proof. Suppose the contrary, i.e., that L is regular. Then there’s a DFA M
that recognizes L. Let p be the number of states in M.
Crucial Creative Step: Let s = 0p 1p .
Now, s ∈ L and len(s) ≥ p. Thus, the hypotheses of the pumping
lemma hold, and we are given a partition of s into x, y, and z such that
s = xyz and
(a) x y^i z ∈ L, for all i ∈ N
(c) len(xy) ≤ p
all hold. Consider the string xz. By (a), xz = x y^0 z ∈ L. By (c), xy is
composed only of zeros, and hence x is all zeros. By (b), x has fewer zeros
than xy. So xz has fewer than p zeros, but has p ones. Thus there is no way
to express xz as 0^k 1^k, for any k. So xz ∉ L. Contradiction.
Here’s a picture of the situation:
Notice that x, y, and z are abstract; we really don't know anything about
them other than what we can infer by application of constraints (a)–(c). We
have x = 0^u and y = 0^v (v ≠ 0) and z = 0^w 1^p. We know that u + v + w = p,
but we also know that u + w < p, so we know xz = 0^(u+w) 1^p ∉ L.
There’s always huge confusion with the pumping lemma. Here’s a
slightly alternative view—the pumping lemma protocol—on how to use it to
prove a language is not regular. Suppose there’s an office O to support
pumping lemma proofs.
1. To start the protocol, you inform O that L is regular.
3. You then think about L and invent a witness s. You send s off to O,
along with some evidence (proofs) that s ∈ L and len(s) ≥ p. Often
this is very easy to see.
203
4. O checks your proofs. Then it divides s into 3 pieces x, y, and z, but
it doesn’t send them to you. Instead O gives you permission to use (a),
(b), and (c).
5. You don’t know what x, y, and z are, but you can use (a), (b) and
(c), plus your knowledge of s to deduce facts. After some ingenious
steps, you find a contradiction, and send the proof of it off to O.
6. O checks the proof and, if it is OK, sends you a final message con-
firming that L is not regular after all.
Example 130. The following language L is not regular:
{w | w has an equal number of 0s and 1s} .
Proof. Towards a contradiction, suppose L is regular. Then there’s a DFA
M that recognizes L. Let p be the number of states in M. Let s = 0p 1p .
Now, s ∈ L and len(s) ≥ p, so we know that s = xyz, for some x, y, and z.
We also know (a), (b), and (c) from the statement of the pumping lemma.
By (c) xy is composed only of 0s. By (b) xz = 0k 1p and k < p; thus xz ∈
/ L.
However, by (a), xz = xy 0 z ∈ L. Contradiction.
So why did we choose 0p 1p for s? Why not (01)p , for example? The
answer comes from recognizing that, when s is split into x, y, and z, we
have no control over how the split is made. Thus y can be any non-empty
string of length ≤ p. So if s = 0101 . . . 0101, then y could be 01. In that case,
repeated pumping will only ever lead to strings still in L and we will not
be able to obtain our desired contradiction.
Upshot. s has to be chosen such that pumping it (adding in copies of y)
will lead to a string not in L. Note that we can pump down, by adding in 0
copies of y, as we have done in the last two proofs.
204
Example 132. The following language L is not regular:
{ww | w ∈ {0, 1}∗ } .
Proof. Towards a contradiction, suppose L is regular. Then there’s a DFA
M that recognizes L. Let p > 0 be the number of states in M. Let s =
0p 10p 1. Now, s ∈ L and len(s) ≥ p, so we know that s = xyz, for some x,
y, and z. We also know (a), (b), and (c) from the statement of the pumping
lemma. By (c) xy is composed only of 0s. By (b) xz = 0k 10p 1 where k < p
so xz ∈ / L. However, by (a), xz = xy 0 z ∈ L. Contradiction.
Here’s an example where pumping up is used.
Example 133. The following language L is not regular:
{1^(n²) | n ≥ 0}.
Proof. This language is the set of all strings of 1s with length a square num-
ber. Towards a contradiction, suppose L is regular. Then there's a DFA M
that recognizes L. Let p > 0 be the number of states in M. Let s = 1^(p²).
This is the only natural choice; now let's see if it works! Now, s ∈ L and
len(s) ≥ p, so we have s = xyz, for some x, y, and z. We also know (a),
(b), and (c) from the statement of the pumping lemma. Now we know that
1^(p²) = xyz. Let i = len(x), j = len(y) and k = len(z). Then i + j + k = p². Also,
len(xyyz) = i + 2j + k = p² + j. However, (b) and (c) imply that 0 < j ≤ p.
Now the next element of L larger than 1^(p²) must be 1^((p+1)²) = 1^(p²+2p+1), but
p² < p² + j < p² + 2p + 1, so xyyz ∉ L. Contradiction.
And another.
Example 134. Show that L = {0^i 1^j 0^k | k > i + j} is not regular.
Proof. Towards a contradiction, suppose L is regular. Then there’s a DFA
M that recognizes L. Let p > 0 be the number of states in M. Let s =
0^p 1^p 0^(2p+1). Now, s ∈ L and len(s) ≥ p, so we know that s = xyz, for some x,
y, and z. We also know (a), (b), and (c) from the statement of the pumping
lemma. By (c) we know
s = 0^a · 0^b · 0^c 1^p 0^(2p+1), where x = 0^a, y = 0^b, and z = 0^c 1^p 0^(2p+1).
205
6.1.2 Is L(M) finite?
Recall the problem of deciding whether a regular language is finite. The
ideas in the pumping lemma provide another way to provide an algorithm
solving this problem. The idea is, given DFA M, to try M on a finite set
of strings and then render a verdict. Recall that the pumping lemma says,
loosely, that every ‘sufficiently long string in L can be pumped’: if we
could find a sufficiently long string w that M accepts, then L(M) would be
infinite.
All we have to do is figure out what ‘sufficiently long’ should mean.
Two facts are important:
In the worst case for a machine with p states, it will take a string of
length p − 1 to get to an accept state, plus another p symbols in order
to see if that state gets revisited. Thus our upper bound h = 2p.
The decision algorithm generates the (finite) set of strings having length
at least p and at most 2p − 1 and tests to see if M accepts any of them. If it
does, then L(M) is infinite; otherwise, it is finite.
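A minimal Python sketch of this test follows; it reuses the accepts function from the
earlier DFA sketch and simply tries every string whose length lies between p and 2p − 1:

from itertools import product as cartesian

def is_infinite(delta, sigma, q0, finals):
    p = len({q for (q, _) in delta})        # number of states (δ is assumed total)
    for n in range(p, 2 * p):               # lengths p, p+1, ..., 2p-1
        for w in cartesian(sigma, repeat=n):
            if accepts(delta, q0, finals, ''.join(w)):
                return True
    return False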
206
6.2 The Pumping Lemma for Context-Free Lan-
guages
As for the regular languages, the context-free languages admit a pump-
ing lemma which illustrates an interesting way in which every context-free
language has a precise notion of repetition in its elements. For regular lan-
guages, the important idea in the proof was an application of the Pigeon
Hole Principle in order to show that once an automaton M made n + 1
transitions (where n was the number of states of M) it would have to visit
some state twice. If it could visit twice, it could visit any number of times.
Thus we could pump any sufficiently long string in order to get longer and
longer strings, all in the language.
The same sort of argument, suitably adapted, can be applied to context-
free languages. If a sufficiently long string is generated by a grammar G,
then some rule in G has to be applied at least twice, by appeal to the PHP.
Therefore the rule can be repeatedly applied in order to pump the string. The precise
statement: if L is a context-free language, then there is a pumping length p > 0 such
that any s ∈ L with len(s) ≥ p can be written as s = uvxyz where
• len(vy) > 0
• len(vxy) ≤ p
• ∀i ≥ 0. u v^i x y^i z ∈ L
[Figure: a parse tree for s in which some variable repeats along a path from the root;
u, v, x, y, and z are read off from the subtrees at the two occurrences of the repeated
variable.]
• len(vy) > 0. This holds since T was chosen to be the smallest parse
tree satisfying the other constraints: if both v and y were ε, then the
resulting tree would be smaller than T.
Recall that the main application of the pumping lemma for regular lan-
guages was to show that various languages were not regular, by contradic-
tion. The same is true for the context-free languages. However, the details
of the proofs are more complex, as we shall see. We will go through one
proof in full detail and then see how—sometimes—much of the complex-
ity can be avoided.
208
there exists a pumping length p > 0. Consider s = ap bp cp . (This is the
first creative bit.) Evidently, s ∈ L, and len(s) ≥ p. Therefore, there exists
u, v, x, y, z such that s = uvxyz and the following hold
1. len(vy) > 0
2. len(vxy) ≤ p
3. ∀i ≥ 0. u v^i x y^i z ∈ L
Now we consider where vxy can occur in s. Pumping lemma proofs for
context-free languages are all about case analysis. Here we have a number
of cases (some of which have sub-cases): vxy can occur
• completely within the leading ap symbols.
• completely within the middle bp symbols.
• completely within the trailing cp symbols.
• partly in the ap and partly in the bp .
• partly in the bp and partly in the cp .
What cannot happen is for vxy to start with some a symbols, span all p
b symbols, and finish with some c symbols: clause (2) above prohibits this.
Now, if vxy occurs completely within the leading ap symbols, then
pumping up once yields the string s′ = u v^2 x y^2 z = a^q b^p c^p, where p < q,
by (1). Thus s′ ∉ L, contradicting (3).
Similarly, if vxy occurs completely within the middle bp symbols, pump-
ing up once yields the string s′ = ap bq cp , where p < q. Contradiction. Now,
of course, it can easily be seen that a very similar proof handles the case
where vxy occurs completely within the trailing cp symbols. We are now
left with the hybrid cases, where vxy spans two kinds of symbol. These
need further examination.
Suppose vxy occurs partly in the ap and partly in the bp . Thus, at some
point, vxy changes from a symbols to b symbols. The change-over can
happen in v, x, or in y:
• in v. Then we've deduced that the split of s looks like
s = a^i · a^j b^k · b^ℓ · b^m · b^n c^p, where u = a^i, v = a^j b^k, x = b^ℓ, y = b^m, z = b^n c^p.
If we now pump up, we obtain
s′ = a^i · (a^j b^k)(a^j b^k) · b^ℓ · (b^m)(b^m) · b^n c^p,
in which an a occurs after a b, so s′ is not of the form a∗b∗c∗ and s′ ∉ L.
Pumping down instead gives
s′ = a^i b^ℓ b^n c^p.
• in x. Then the split of s looks like
s = a^i · a^j · a^k b^ℓ · b^m · b^n c^p, where u = a^i, v = a^j, x = a^k b^ℓ, y = b^m, z = b^n c^p.
Pumping up once yields
s′ = a^i a^2j a^k b^ℓ b^2m b^n c^p, with i + 2j + k > p ∨ ℓ + 2m + n > p,
so the number of a's or the number of b's exceeds the number of c's, and s′ ∉ L.
• in y. This case is very similar to the case where the change-over hap-
pens in v. We have
s = a^i · a^j · a^k · a^ℓ b^m · b^n c^p, where u = a^i, v = a^j, x = a^k, y = a^ℓ b^m, z = b^n c^p.
210
• completely within the leading ap symbols or completely within the
middle bp symbols, or completely within the trailing cp symbols. These
are all complete, and were easy.
• partly in the ap and partly in the bp . This has just been completed. A
subsidiary case analysis on where the change-over from a to b hap-
pens was needed: in v, in x, or in y .
• partly in the bp and partly in the cp . Not done, but requires case
analysis on where the change-over from b to c happens: in v, in x, or
in y. With minor changes, the arguments we gave for the previous
case will establish this case, so we won’t go through them.
Now that we have seen a fully detailed case analysis of the problem, it
is worth considering whether there is a shorter proof. All that case analysis
was pretty tedious! A different approach, which is sometimes a bit simpler
for some (not all) pumping lemma proofs, is to use zones. Let’s try the
example again.
Example 136 (Repeated). The language L = {an bn cn | n ≥ 0} is not context-
free.
Proof. Let the same boilerplate and witness be given. Thus we have the
same facts at our disposal, but will make a different case analysis in the
proof. Notice that vxy can occur either in
• zone A:
s = a^p b^p c^p, with zone A consisting of the a^p b^p prefix (i.e., vxy lies
entirely within the a's and b's).
In this case, if we pump up (to get s′), we will add a non-zero number
of a and/or b symbols to zone A. Thus count(s′, a) + count(s′, b) >
2 ∗ count(s′, c), which implies that s′ ∉ L. Contradiction.
• or zone B:
s = a^p b^p c^p, with zone B consisting of the b^p c^p suffix.
If we pump up (to get s′), we will add a non-zero number of b and/or
c symbols to zone B. Thus 2 ∗ count(s′, a) < count(s′, b) + count(s′, c),
which implies that s′ ∉ L. Contradiction.
211
Remark. The argument for zone A uses the following obvious lemma, which
we will spell out for completeness.
count(w, a) + count(w, b) > 2 ∗ count(w, c) ⇒ w ∉ {a^n b^n c^n | n ≥ 0}
212
• zone B:
s = 0^p 1^p 0^p 1^p, with zone B consisting of the middle 1^p 0^p.
• zone A:
s = a^p b^p c^p, with zone A consisting of the a^p b^p prefix.
Here we have to pump up: pumping down could preserve the in-
equality. Thus s′ = a^i b^j c^p, where i > p ∨ j > p. In either case, s′ ∉ L.
Contradiction.
• zone B:
s = a^p b^p c^p, with zone B consisting of the b^p c^p suffix.
213
The pumping lemma also shows that the language L = {a^(n²) | n ≥ 0}, the set of strings of a’s whose length is a perfect square, is not context-free.
Proof. Assume L is a context-free language. Then there exists a pumping length p > 0. Consider s = a^(p²). Thus s ∈ L, and len(s) ≥ p. Therefore, there exist u, v, x, y, z such that s = uvxyz and the following hold: (1) len(vy) > 0, (2) len(vxy) ≤ p, and (3) ∀i ≥ 0. uv^i xy^i z ∈ L.
By (1) and (2), we know 0 < len(vy) ≤ p. Thus if we pump up once, we obtain a string s′ = a^n, where p² < n ≤ p² + p. Now consider L. The next² element of L after s must be of length (p + 1)², i.e., of length p² + 2p + 1. Since
p² < n ≤ p² + p < p² + 2p + 1 = (p + 1)²,
the length of s′ lies strictly between two consecutive squares, and we conclude s′ ∉ L. Contradiction.
² L is a set, of course, and so has no notion of ‘next’; however, for every element x of L, there is an element y ∈ L such that len(y) > len(x) and y is the shortest element of L longer than x. Thus y would be the next element of L after x.
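The arithmetic fact this proof relies on, namely that there is no perfect square strictly between p² and (p + 1)², can be checked mechanically for small p. The following fragment (mine, not from the notes) does exactly that:

from math import isqrt

for p in range(1, 500):
    for k in range(1, p + 1):          # the pump adds between 1 and p symbols
        n = p * p + k
        assert p * p < n < (p + 1) ** 2
        assert isqrt(n) ** 2 != n      # n is not a perfect square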
Chapter 7
Further Topics
What happens if we avoid machines and stipulate a direct mapping
from regular expressions to regular expressions? Is it possible? It turns
out that the answer is “yes”. In the first few pages of Derivatives of Regular
Expressions,1 Brzozowski defines an augmented set of regular expressions,
and then introduces the idea of the derivative2 of a regular expression with
respect to a symbol of the alphabet. He goes on to give a recursive function
to compute the derivative and shows how to use it in regular expression
matching.
An extended regular expression adds intersection (∩) and complementation operations to the usual regular expression operations. This allows any boolean operation on languages to be expressed. Formally, the set R of extended regular expressions over an alphabet Σ is the least set satisfying the following rules:
• a ∈ R, if a ∈ Σ
• ε∈R
• ∅∈R
• \overline{r} ∈ R, if r ∈ R (new)
• r1 ∩ r2 ∈ R, if r1 ∈ R ∧ r2 ∈ R (new)
• r1 + r2 ∈ R, if r1 ∈ R ∧ r2 ∈ R
• r1 · r2 ∈ R, if r1 ∈ R ∧ r2 ∈ R
• r ∗ ∈ R, if r ∈ R
• Nothing else is in R
1 Journal of the ACM, October 1964, pages 481 to 494.
2 This is not the familiar notion from calculus, although it was so named because the algebraic equations are similar.
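Before giving the semantics, it may help to fix one possible machine representation of this syntax. The following Python encoding (the names and tuple layout are my own choices, not the notes') is reused in the later sketches in this section:

# Extended regular expressions as nested tuples:
#   ('sym', a)        a single symbol a in Sigma
#   ('eps',)          epsilon
#   ('empty',)        the empty language
#   ('not', r)        complement            (new)
#   ('and', r1, r2)   intersection r1 ∩ r2  (new)
#   ('or', r1, r2)    union, written r1 + r2 in the notes
#   ('cat', r1, r2)   concatenation r1 · r2
#   ('star', r)       Kleene star r*

def sym(a):
    return ('sym', a)

def cat(*rs):
    """Right-nested concatenation of several expressions."""
    out = rs[-1]
    for r in reversed(rs[:-1]):
        out = ('cat', r, out)
    return out

SIGMA_STAR = ('star', ('or', sym('0'), sym('1')))     # Sigma* for Sigma = {0, 1}

# "contains 00" and its complement, "does not contain 00":
HAS_00 = cat(SIGMA_STAR, sym('0'), sym('0'), SIGMA_STAR)
NO_00  = ('not', HAS_00)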
Definition 42 (Semantics of extended regular expressions). The meaning of an extended regular expression r, written L(r), is defined as follows:
L(a) = {a}, if a ∈ Σ
L(ε) = {ε}
L(∅) = ∅
L(\overline{r}) = Σ∗ − L(r)
L(r1 ∩ r2) = L(r1) ∩ L(r2)
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) · L(r2)
L(r∗) = L(r)∗
For example, consider the language of all binary strings in which no two consecutive symbols are identical. One can write an ordinary regular expression for it, something like
ε + 0 + 1 + (01)∗ + (10)∗,
but it requires a few moments’ thought to decide whether this is a correct regular expression for the language. However, the following extended regular expression
\overline{Σ∗(00 + 11)Σ∗}
for the language is immediately understandable.
Example 141. The following extended regular expression generates the language of all binary strings with at least two consecutive zeros and not ending in 01:
(Σ∗00Σ∗) ∩ \overline{Σ∗01}
One might think that this can be expressed just as simply with ordinary regular expressions: something like
Σ∗00Σ∗(10 + 11 + 00 + 0)
or
Σ∗00Σ∗(10 + 11 + 00 + 0 + ε),
but convincing oneself that either of these denotes the intended language takes considerably more effort than reading the extended expression.
Example 142. The following extended regular expression generates the
language of all strings with at least three consecutive ones and not ending
in 01 or consisting of all ones.
Definition 43 (Nullable).
nullable(a) = false if a ∈ Σ
nullable(ε) = true
nullable(∅) = false
nullable(\overline{r}) = ¬(nullable(r))
nullable(r1 ∩ r2 ) = nullable(r1 ) ∧ nullable(r2 )
nullable(r1 + r2 ) = nullable(r1 ) ∨ nullable(r2 )
nullable(r1 · r2 ) = nullable(r1 ) ∧ nullable(r2 )
nullable(r ∗ ) = true
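Definition 43 transcribes directly into code. A minimal sketch over the tuple encoding above (again my own, not the notes'):

def nullable(r):
    """True exactly when the empty string belongs to L(r)."""
    tag = r[0]
    if tag == 'sym':   return False
    if tag == 'eps':   return True
    if tag == 'empty': return False
    if tag == 'not':   return not nullable(r[1])
    if tag == 'and':   return nullable(r[1]) and nullable(r[2])
    if tag == 'or':    return nullable(r[1]) or nullable(r[2])
    if tag == 'cat':   return nullable(r[1]) and nullable(r[2])
    if tag == 'star':  return True
    raise ValueError('not a regular expression: %r' % (r,))

assert nullable(('star', ('sym', 'a')))                                 # a*
assert not nullable(('cat', ('sym', 'a'), ('star', ('sym', 'b'))))      # a b*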
Definition 44 (Derivative). The derivative of a regular expression r with respect to a string u is the set of strings
Derivative(u, r) = {w | u · w ∈ L(r)}
Theorem 21.
w ∈ L(r) iff ε ∈ Derivative(w, r)
Definition 45 (Derivative of a symbol). The derivative D(a, r) of a regular
expression r with respect to a symbol a ∈ Σ is defined by
D(a, ε) = ∅
D(a, ∅) = ∅
D(a, a) = ε
D(a, b) = ∅ if a ≠ b
D(a, \overline{r}) = \overline{D(a, r)}
D(a, r1 + r2 ) = D(a, r1 ) + D(a, r2 )
D(a, r1 ∩ r2 ) = D(a, r1 ) ∩ D(a, r2 )
D(a, r1 · r2 ) = (D(a, r1 ) · r2 ) + D(a, r2 ) if nullable(r1 )
D(a, r1 · r2 ) = D(a, r1 ) · r2 if ¬nullable(r1 )
D(a, r ∗ ) = D(a, r) · r ∗
Consider r′ = D(a, r). Intuitively, L(r′) consists of the strings of L(r) that begin with a, with that leading a dropped. Formally,
Theorem 22.
w ∈ L(D(a, r)) iff a · w ∈ L(r)
The derivative with respect to a whole string is computed by iterating D over its symbols:
Der(ε, r) = r
Der(a · w, r) = Der(w, D(a, r))
Theorem 23.
L(Der(w, r)) = Derivative(w, r)
Recall that the standard way to check if w ∈ L(r) requires the transla-
tion of r to a state machine, followed by running the state machine on w. In
contrast, the use of derivatives allows one to merely evaluate nullable(Der(w, r)),
i.e., to stay in the realm of regular expressions. However, this can be ineffi-
cient, since taking the derivative can substantially increase the size of the
regular expression.
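To make the recipe concrete, the following Python sketch (mine, not the notes') transcribes Definitions 43 and 45 directly over the tuple encoding used earlier, with no simplification of the growing derivative, which is exactly the source of the inefficiency just mentioned:

def nullable(r):
    tag = r[0]
    if tag in ('sym', 'empty'): return False
    if tag in ('eps', 'star'):  return True
    if tag == 'not':            return not nullable(r[1])
    if tag in ('and', 'cat'):   return nullable(r[1]) and nullable(r[2])
    if tag == 'or':             return nullable(r[1]) or nullable(r[2])
    raise ValueError(r)

def D(a, r):
    """The derivative of r with respect to the single symbol a (Definition 45)."""
    tag = r[0]
    if tag == 'sym':            return ('eps',) if r[1] == a else ('empty',)
    if tag in ('eps', 'empty'): return ('empty',)
    if tag == 'not':            return ('not', D(a, r[1]))
    if tag == 'and':            return ('and', D(a, r[1]), D(a, r[2]))
    if tag == 'or':             return ('or', D(a, r[1]), D(a, r[2]))
    if tag == 'cat':
        left = ('cat', D(a, r[1]), r[2])
        return ('or', left, D(a, r[2])) if nullable(r[1]) else left
    if tag == 'star':           return ('cat', D(a, r[1]), r)
    raise ValueError(r)

def Der(w, r):
    """Derivative with respect to a whole string, one symbol at a time."""
    for a in w:
        r = D(a, r)
    return r

def matches(w, r):
    """w in L(r)  iff  nullable(Der(w, r))."""
    return nullable(Der(w, r))

# (0+1)*1 : binary strings ending in 1
r = ('cat', ('star', ('or', ('sym', '0'), ('sym', '1'))), ('sym', '1'))
assert matches('0101', r) and not matches('0110', r)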
Generating automata from extended regular expressions
Instead, Brzozowski’s primary purpose in introducing derivatives was to
use them as a way of directly producing minimal DFAs from extended reg-
ular expressions. The process works as follows. Suppose Σ = {a1, . . . , an}
and r is a regular expression. We think of r as representing the start state of
the desired DFA. Since the transition function δ of the DFA is total, the suc-
cessor states may be obtained by taking the derivatives D(a1, r), . . . , D(an, r).
This is repeated until no new states can be produced. Final states are just
those that are nullable. The resulting state machine accepts the language
generated by r. This is an amazingly elegant procedure, especially in com-
parison to the translation to automata. However, it depends on being able
to decide when two regular expressions have the same language (so that
seemingly different states can be equated, which is necessary for the pro-
cess to terminate).
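The following Python sketch (not from the notes) carries out this process. To make it terminate on simple inputs, derivatives are built through 'smart constructors' that apply a few ad-hoc simplifications (r + ∅ = r, ∅·r = r·∅ = ∅, ε·r = r·ε = r, r + r = r); as the text points out, in general one needs a genuine way of identifying regular expressions with the same language, and these rules are only a crude stand-in for that.

def mk_or(r1, r2):
    if r1 == ('empty',): return r2
    if r2 == ('empty',): return r1
    if r1 == r2:         return r1
    return ('or', r1, r2)

def mk_cat(r1, r2):
    if ('empty',) in (r1, r2): return ('empty',)
    if r1 == ('eps',): return r2
    if r2 == ('eps',): return r1
    return ('cat', r1, r2)

def mk_and(r1, r2):
    if ('empty',) in (r1, r2): return ('empty',)
    if r1 == r2: return r1
    return ('and', r1, r2)

def nullable(r):
    tag = r[0]
    if tag in ('sym', 'empty'): return False
    if tag in ('eps', 'star'):  return True
    if tag == 'not':            return not nullable(r[1])
    if tag in ('and', 'cat'):   return nullable(r[1]) and nullable(r[2])
    if tag == 'or':             return nullable(r[1]) or nullable(r[2])
    raise ValueError(r)

def D(a, r):
    """As in Definition 45, but building simplified results."""
    tag = r[0]
    if tag == 'sym':            return ('eps',) if r[1] == a else ('empty',)
    if tag in ('eps', 'empty'): return ('empty',)
    if tag == 'not':            return ('not', D(a, r[1]))
    if tag == 'and':            return mk_and(D(a, r[1]), D(a, r[2]))
    if tag == 'or':             return mk_or(D(a, r[1]), D(a, r[2]))
    if tag == 'cat':
        left = mk_cat(D(a, r[1]), r[2])
        return mk_or(left, D(a, r[2])) if nullable(r[1]) else left
    if tag == 'star':           return mk_cat(D(a, r[1]), r)
    raise ValueError(r)

def build_dfa(r, alphabet):
    """States are regular expressions; the start state is r itself;
    delta(q, a) = D(a, q); a state is final iff it is nullable."""
    states, delta, work = {r}, {}, [r]
    while work:
        q = work.pop()
        for a in alphabet:
            q2 = D(a, q)
            delta[(q, a)] = q2
            if q2 not in states:
                states.add(q2)
                work.append(q2)
    return states, delta, r, {q for q in states if nullable(q)}

# (0+1)*1 yields a two-state DFA, exactly as in the worked example below.
r = ('cat', ('star', ('or', ('sym', '0'), ('sym', '1'))), ('sym', '1'))
states, delta, start, finals = build_dfa(r, '01')
assert len(states) == 2 and len(finals) == 1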
As an example, let us construct the DFA for r = (0 + 1)∗1 over Σ = {0, 1}. The start state is q0 = (0 + 1)∗1 itself. Computing its successors (and simplifying), D(0, q0) = (0 + 1)∗1 = q0 and D(1, q0) = (0 + 1)∗1 + ε. Since this regular expression is not equal to that associated with any other state, we allocate a new state q1 = (0 + 1)∗1 + ε. Note that q1 is a final state because its associated regular expression is nullable. We now
compute the successors to q1:
D(0, (0 + 1)∗1 + ε) = D(0, (0 + 1)∗1) + D(0, ε) = (0 + 1)∗1 + ∅
So δ(q1, 0) = q0. Also
D(1, (0 + 1)∗1 + ε) = D(1, (0 + 1)∗1) + D(1, ε) = ((0 + 1)∗1 + ε) + ∅
so δ(q1, 1) = q1, and no new states arise; the construction is complete.
[Diagram: the resulting DFA. The start state q0 has a self-loop on 0 and a transition on 1 to q1; the final state q1 has a self-loop on 1 and a transition on 0 back to q0.]
7.1.2 How to Learn a DFA
7.1.3 From DFAs to regular expressions (Again)
[ The following subsection takes a traditional approach to the translation of DFAs
to regexps. In the body of the notes, I have instead used the friendlier (to the
instructor and the student) approach based on representing the automaton by
systems of equations and then iteratively using Arden’s lemma to solve for the
starting state. ]
The basic idea is to transform an automaton M into an equivalent regular expression through a series of steps, each of which preserves L(M). At each step we drop a state from the automaton and, in order to still recognize L(M), we ‘patch up’ the labels on the edges between the remaining states. The technical device for accomplishing this is the so-called GNFA, which is an NFA with arbitrary regular expressions labelling transitions. (You can think of the intermediate automata in the just-seen regular-expression-to-NFA translation as being GNFAs.)
We will look at a very simple example of the translation to aid our
intuition when thinking about the general case.
Example 144. Let the example automaton be given by the following dia-
gram
[Diagram: states q0 and q1; q0 has a self-loop labelled a and an edge labelled b to q1; q1 has a self-loop labelled a, b.]
The first step is to add a new start state and a single new final state,
connected to the initial automaton by ε-transitions. Also, multiple edges
from a source to a target are agglomerated into one, by joining the labels
via a + operation.
[Diagram: the resulting GNFA. s →ε→ q0, a self-loop a on q0, q0 →b→ q1, a self-loop a + b on q1, and q1 →ε→ f.]
Now we iteratively delete nodes. It doesn’t matter in which order we
delete them—the language will remain the same—although pragmatically,
the right choice of node to delete can make the work much simpler.3 Let’s
delete q1 . Now we have to patch the hole left. In order to still accept the
same set of strings, we have to account for the b label, the a + b label on
the self-loop of q1 , and the ε label leading from q1 to f . Thus the following
automaton:
[Diagram: s →ε→ q0, a self-loop a on q0, and q0 →b(a + b)∗→ f.]
Similarly, deleting q0 yields the final automaton:
[Diagram: s →a∗b(a + b)∗→ f.]
Constructing a GNFA
To make an initial GNFA (call it GNFA0) from an NFA N = (Q, Σ, δ, q0, F) requires the following steps:
1. Make a new start state with an ε-transition to q0. This new state, rather than q0, is the start state of GNFA0.
2. Make a new final state with ε-transitions from all the states in F. The states in F are no longer considered to be final states in GNFA0.
3 The advice of the experts is to delete the node which ‘disconnects’ the automaton as much as possible.
The transitions must also be arranged so that the machine has the special form required of a GNFA; achieving this may require adding in lots of weird new edges. In particular, a GNFA must have the following special form:
• The new start state must have arrows going to every other state (but
no arrows coming in to it).
• The new final state must have arrows coming into it from every other
state (but no arrows going out of it).
• For all other states (namely all those in Q) there must be a single ar-
row to every other state, plus a self loop. In order to agglomerate
multiple edges from the same source to a target, we make a ‘sum’ of
all the labels.
Note that if a transition didn’t exist between two states in N, one would
have to be created. For this purpose, such an edge would be labelled with
∅, which fulfills the syntactic requirement without actually enabling any
new behaviour by the machine (since transitions labelled with ∅ can never
be followed). Thus, our simple example
[Diagram: the automaton of Example 144, with a self-loop a on q0, q0 →b→ q1, and a self-loop a, b on q1]
becomes the GNFA0
[Diagram: s →ε→ q0, a self-loop a on q0, q0 →b→ q1, a self-loop a + b on q1, q1 →ε→ f, together with ∅-labelled edges for all the remaining required transitions (from s to q1 and to f, from q0 to f, and so on).]
Concretely, suppose we are deleting the state qj, and consider any two remaining states qi and qk for which qi →r1→ qj, there is a self-loop r2 on qj, qj →r3→ qk, and there is an existing direct edge qi →r4→ qk. We replace this configuration by the single edge
qi →r1r2∗r3 + r4→ qk
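The rip-out step is easy to phrase directly in code. The following sketch (mine, not the notes') stores a GNFA as a dictionary of edge labels, using the tuple encoding of regular expressions from the earlier sketches; a missing entry plays the role of an ∅-labelled edge.

# The helpers apply a few obvious simplifications so the labels stay readable.
def mk_or(r1, r2):
    if r1 == ('empty',): return r2
    if r2 == ('empty',): return r1
    return ('or', r1, r2)

def mk_cat(r1, r2):
    if ('empty',) in (r1, r2): return ('empty',)
    if r1 == ('eps',): return r2
    if r2 == ('eps',): return r1
    return ('cat', r1, r2)

def mk_star(r):
    return ('eps',) if r in (('empty',), ('eps',)) else ('star', r)

def rip(states, label, q):
    """Delete state q from a GNFA.  label[(i, k)] is the regular expression
    on the edge i -> k (absent means the edge is labelled with the empty
    language).  For every remaining pair (i, k) the new label is
    r1 r2* r3 + r4, exactly as in the replacement rule above."""
    rest = [s for s in states if s != q]
    new = {}
    for i in rest:
        for k in rest:
            r1 = label.get((i, q), ('empty',))
            r2 = label.get((q, q), ('empty',))
            r3 = label.get((q, k), ('empty',))
            r4 = label.get((i, k), ('empty',))
            new[(i, k)] = mk_or(mk_cat(r1, mk_cat(mk_star(r2), r3)), r4)
    return rest, new

# The automaton of Example 144: states s, q0, q1, f; delete q1, then q0.
a, b = ('sym', 'a'), ('sym', 'b')
label = {('s', 'q0'): ('eps',), ('q0', 'q0'): a, ('q0', 'q1'): b,
         ('q1', 'q1'): ('or', a, b), ('q1', 'f'): ('eps',)}
states, label = rip(['s', 'q0', 'q1', 'f'], label, 'q1')
states, label = rip(states, label, 'q0')
print(label[('s', 'f')])   # the tuple form of a*b(a+b)*, as computed by hand above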
Example 145. Give an equivalent regular expression for the following DFA:
[Diagram: a DFA with states q0 (the start state), q1, and q2, where q1 and q2 are final, and with transitions δ(q0, a) = q1, δ(q0, b) = q2, δ(q1, a) = q0, δ(q1, b) = q1, δ(q2, a) = q1, δ(q2, b) = q0.]
a
s ε q0 a q1 ε
b a
ε f
b q2
The ‘set-up’ of the initial GNFA means that, for any state qj except s and f, the following pattern holds: qi →r1→ qj, a self-loop r2 on qj, qj →r3→ qk, and a direct edge qi →r4→ qk.
For instance, taking q0 as the middle state, one such pattern is s →ε→ q0 →a→ q1. Let us delete q0 first. However, we have to consider all such patterns, i.e., all pairs of states that q0 lies between. There are surprisingly many (eight more, in all):
1. s →ε→ q0 →b→ q2
2. s →ε→ q0 →∅→ f
3. q1 →a→ q0 →b→ q2
4. q1 →a→ q0 →a→ q1
5. q1 →a→ q0 →∅→ f
6. q2 →b→ q0 →a→ q1
7. q2 →b→ q0 →b→ q2
8. q2 →b→ q0 →∅→ f
Now we apply our rule to get the following new transitions, which replace any old ones:
1. s →ε∅∗a + ∅→ q1, i.e., s →a→ q1
2. s →ε∅∗b + ∅→ q2, i.e., s →b→ q2
3. s →ε∅∗∅ + ∅→ f, i.e., s →∅→ f
4. q1 →a∅∗b + ∅→ q2, i.e., q1 →ab→ q2
5. q1 →a∅∗a + b→ q1, i.e., q1 →aa + b→ q1
6. q1 →a∅∗∅ + ε→ f, i.e., q1 →ε→ f
7. q2 →b∅∗a + a→ q1, i.e., q2 →ba + a→ q1
8. q2 →b∅∗b + ∅→ q2, i.e., q2 →bb→ q2
9. q2 →b∅∗∅ + ε→ f, i.e., q2 →ε→ f
Thus GNFA1 is: s →a→ q1, s →b→ q2, s →∅→ f; a self-loop b + aa on q1, q1 →ab→ q2, q1 →ε→ f; q2 →ba + a→ q1, a self-loop bb on q2, and q2 →ε→ f.
Now let’s toss out q1 . We therefore have to consider the following cases:
• s →a→ q1 (self-loop b + aa on q1) →ab→ q2, together with the direct edge s →b→ q2: the new label is a(b + aa)∗ab + b.
• s →a→ q1 (self-loop b + aa) →ε→ f, together with the direct edge s →∅→ f: the new label is a(b + aa)∗ε + ∅, i.e., a(b + aa)∗.
• q2 →ba + a→ q1 (self-loop b + aa) →ab→ q2, together with the direct self-loop bb on q2: the new label is (ba + a)(b + aa)∗ab + bb.
• q2 →ba + a→ q1 (self-loop b + aa) →ε→ f, together with the direct edge q2 →ε→ f: the new label is (ba + a)(b + aa)∗ε + ε, i.e., (ba + a)(b + aa)∗ + ε.
Thus GNFA2 is: s →a(aa + b)∗ab + b→ q2, a self-loop (ba + a)(aa + b)∗ab + bb on q2, q2 →(ba + a)(aa + b)∗ + ε→ f, and s →a(aa + b)∗→ f.
Finally, deleting q2 leaves a single edge from s to f. Writing
r1 = a(aa + b)∗ab + b,  r2 = (ba + a)(aa + b)∗ab + bb,  r3 = (ba + a)(aa + b)∗ + ε,  r4 = a(aa + b)∗,
the label on that edge is r1r2∗r3 + r4, namely
(a(aa + b)∗ab + b)((ba + a)(aa + b)∗ab + bb)∗((ba + a)(aa + b)∗ + ε) + a(aa + b)∗,
and this is the desired regular expression.
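As a quick cross-check (not part of the notes), one can compare this expression with the original DFA of Example 145 on every short string, writing the expression in Python's re syntax (+ becomes |, and ε becomes an empty alternative):

import re
from itertools import product

# The DFA of Example 145: start state q0, final states q1 and q2.
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
         ('q1', 'a'): 'q0', ('q1', 'b'): 'q1',
         ('q2', 'a'): 'q1', ('q2', 'b'): 'q0'}

def dfa_accepts(w):
    q = 'q0'
    for c in w:
        q = delta[(q, c)]
    return q in ('q1', 'q2')

R = re.compile(r'(a(aa|b)*ab|b)((ba|a)(aa|b)*ab|bb)*((ba|a)(aa|b)*|)|a(aa|b)*')

for n in range(11):
    for w in map(''.join, product('ab', repeat=n)):
        assert dfa_accepts(w) == bool(R.fullmatch(w)), w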
7.1.4 Summary
We have now seen a detailed example of translating a DFA to a regular
expression, the denotation of which is just the language accepted by the
DFA. The translations used to convert back and forth can be used to prove
the following important theorem.