Computer Science Press - Introduction To Logic and Automata
= {x^(2n+1) : n = 0, 1, 2, 3, ...}.
Example: Alphabet Σ = {0, 1, 2, ..., 9}. Language
L3 = {any string of alphabet letters that does not start with the letter 0}
   = {1, 2, 3, ..., 9, 10, 11, ...}.
Definition: For any set S, we use the notation w ∈ S to denote that w is
an element of the set S. Also, we use the notation y ∉ S to denote that y
is not an element of the set S.
Example: If L1 = {x^n : n = 1, 2, 3, ...}, then x ∈ L1 and xxx ∈ L1, but
Λ ∉ L1.
Definition: The set ∅, which is called the empty set, is the set consisting of
no elements.
Fact: Note that Λ ∉ ∅, since ∅ has no elements.
Example: Let Σ = {a, b}, and we can define a language L consisting of all
strings that begin with a followed by zero or more b's; i.e.,
L = {a, ab, abb, abbb, ...}
  = {ab^n : n = 0, 1, 2, ...}.
2.3 Set Relations and Operations
Definition: If A and B are sets, then A ⊆ B (A is a subset of B) if w ∈ A
implies that w ∈ B; i.e., each element of A is also an element of B.
Examples:
Suppose A = {ab, ba} and B = {ab, ba, aaa}. Then A ⊆ B, but B ⊄ A.
Suppose A = {x, xx, xxx, ...} and B = {Λ, x, xx, xxx, ...}. Then A ⊆ B, but B ⊄ A.
Suppose A = {ba, ab} and B = {aa, bb}. Then A ⊄ B and B ⊄ A.
Definition: Let A and B be 2 sets. A = B if A ⊆ B and B ⊆ A.
Examples:
Suppose A = {ab, ba} and B = {ab, ba}. Then A ⊆ B and B ⊆ A, so A = B.
Suppose A = {ab, ba} and B = {ab, ba, aaa}. Then A ⊆ B, but B ⊄ A, so A ≠ B.
Suppose A = {x, xx, xxx, ...} and B = {x^n : n ≥ 1}. Then A ⊆ B and B ⊆ A, so A = B.
Definition: Given two sets of strings S and T, we define
S + T = {w : w ∈ S or w ∈ T}
to be the union of S and T; i.e., S + T consists of all words either in S or in
T (or in both).
Examples:
Suppose S = {ab, bb} and T = {aa, bb, a}. Then S + T = {ab, bb, aa, a}.
Definition: Given two sets S and T of strings, we define
S ∩ T = {w : w ∈ S and w ∈ T},
which is the intersection of S and T; i.e., S ∩ T consists of strings that are in
both S and T.
Definition: Sets S and T are disjoint if S ∩ T = ∅.
Examples:
Suppose S = {ab, bb} and T = {aa, bb, a}. Then S ∩ T = {bb}.
Suppose S = {ab, bb} and T = {ab, bb}. Then S ∩ T = {ab, bb}.
Suppose S = {ab, bb} and T = {aa, ba, a}. Then S ∩ T = ∅, so S and T
are disjoint.
Definition: For any 2 sets S and T of strings, we define S − T = {w : w ∈ S, w ∉ T}.
Examples:
Suppose S = {a, b, bb, bbb} and T = {a, bb, bab}. Then S − T = {b, bbb}.
Suppose S = {ab, ba} and T = {ab, ba}. Then S − T = ∅.
Definition: For any set S, we define |S|, which is called the cardinality of S,
to be the number of elements in S.
Examples:
Suppose S = {ab, bb} and T = {a^n : n ≥ 1}. Then |S| = 2 and |T| = ∞.
If S = ∅, then |S| = 0.
Definition: If S is any set, we say that S is finite if |S| < ∞. If S is not
finite, then we say that S is infinite.
Examples:
Suppose S = {ab, bb}. Then S is finite.
Suppose T = {a^n : n ≥ 1}. Then T is infinite.
Fact: If S and T are 2 disjoint sets (i.e., S ∩ T = ∅), then |S + T| = |S| + |T|.
Fact: If S and T are any 2 sets such that |S ∩ T| < ∞, then
|S + T| = |S| + |T| − |S ∩ T|.
In particular, if S ∩ T = ∅, then |S + T| = |S| + |T|.
Examples:
Suppose S = {ab, bb} and T = {aa, bb, a}. Then
S + T = {ab, bb, aa, a}
S ∩ T = {bb}
|S| = 2
|T| = 3
|S ∩ T| = 1
|S + T| = 4.
Suppose S = {ab, bb} and T = {aa, ba, a}. Then
S + T = {ab, bb, aa, ba, a}
S ∩ T = ∅
|S| = 2
|T| = 3
|S ∩ T| = 0
|S + T| = 5.
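The examples above can be checked directly with Python sets. The sketch below is not from the original notes; it simply verifies the inclusion-exclusion fact |S + T| = |S| + |T| − |S ∩ T| on the two examples.

    # Verify |S + T| = |S| + |T| - |S ∩ T| on the examples above.
    S = {"ab", "bb"}
    T = {"aa", "bb", "a"}
    union = S | T                # S + T
    inter = S & T                # S ∩ T
    print(union, inter)          # {'ab', 'bb', 'aa', 'a'} {'bb'}
    print(len(union) == len(S) + len(T) - len(inter))   # True

    T2 = {"aa", "ba", "a"}       # disjoint from S
    print(S & T2 == set(), len(S | T2) == len(S) + len(T2))   # True True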
Definition: The Cartesian product (or direct or cross product) of two sets A
and B is the set A × B = {(x, y) : x ∈ A, y ∈ B} of ordered pairs.
Examples:
If A = {ab, ba, bbb} and B = {bb, ba}, then
A × B = {(ab, bb), (ab, ba), (ba, bb), (ba, ba), (bbb, bb), (bbb, ba)}.
Note that (ab, ba) ∈ A × B.
Also, note that
B × A = {(bb, ab), (bb, ba), (bb, bbb), (ba, ab), (ba, ba), (ba, bbb)}.
Note that (bb, ba) ∈ B × A, but (bb, ba) ∉ A × B, so B × A ≠ A × B.
We can also define the Cartesian product of more than 2 sets.
Definition: The Cartesian product (or direct or cross product) of n sets
A1, A2, ..., An is the set
A1 × A2 × ··· × An = {(x1, x2, ..., xn) : xi ∈ Ai for i = 1, 2, ..., n}
of ordered n-tuples.
Examples:
Suppose
A1 = {ab, ba, bbb},
A2 = {a, bb},
A3 = {ab, b}.
Then
A1 × A2 × A3 = {(ab, a, ab), (ab, a, b), (ab, bb, ab), (ab, bb, b), (ba, a, ab), (ba, a, b),
(ba, bb, ab), (ba, bb, b), (bbb, a, ab), (bbb, a, b), (bbb, bb, ab), (bbb, bb, b)}.
Note that (ab, a, ab) ∈ A1 × A2 × A3.
Definition: If S and T are sets of strings, we define the product set (or
concatenation) ST to be
ST = {w = w1w2 : w1 ∈ S, w2 ∈ T}
Examples:
If S = {a, aa, aaa} and T = {b, bb}, then
ST = {ab, abb, aab, aabb, aaab, aaabb}
If S = {a, ab, aba} and T = {Λ, b, ba}, then
ST = {a, ab, aba, abb, abba, abab, ababa}
If S = {Λ, a, aa} and T = {Λ, bb, bbbb, bbbbbb, ...}, then
ST = {Λ, a, aa, bb, abb, aabb, bbbb, abbbb, ...}
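A minimal sketch (the helper name is mine, not from the notes) computing the product set ST by concatenating every pair of strings; the null string Λ is modeled as the empty Python string "".

    def product_set(S, T):
        # ST = {w1 + w2 : w1 in S, w2 in T}
        return {w1 + w2 for w1 in S for w2 in T}

    S = {"a", "aa", "aaa"}
    T = {"b", "bb"}
    print(sorted(product_set(S, T)))
    # ['aaab', 'aaabb', 'aab', 'aabb', 'ab', 'abb']

    # Second example above, with Λ written as "":
    print(len(product_set({"a", "ab", "aba"}, {"", "b", "ba"})))   # 7 distinct words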
Definition: For any set S, define 2^S, which is called the power set, to be the
set of all possible subsets of S; i.e., 2^S = {A : A ⊆ S}.
Example: If S = {a, bb, ab}, then
2^S = {∅, {a}, {bb}, {ab}, {a, bb}, {a, ab}, {bb, ab}, {a, bb, ab}}.
Fact: If |S| < ∞, then |2^S| = 2^|S|; i.e., there are 2^|S| different subsets of S.
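A short sketch (not part of the notes) that enumerates all subsets of a finite set with itertools, confirming that a 3-element set has 2^3 = 8 subsets.

    from itertools import combinations

    def power_set(S):
        elems = list(S)
        return [set(c) for r in range(len(elems) + 1)
                       for c in combinations(elems, r)]

    S = {"a", "bb", "ab"}
    subsets = power_set(S)
    print(len(subsets))          # 8, i.e., 2 ** len(S)
    for A in subsets:
        print(A)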
2.4 Functions and Operations
Definition: For any string s, the length of s is the number of letters in s.
We will sometimes denote the length of a string s by length(s) or by |s|.
Examples:
length(cat) = 3. Also, |cat| = 3. If we define a string s such that s = cat,
then |s| = 3.
|Λ| = 0.
Definition: A function (or operator, operation, map, or mapping) f maps
each element in a domain D into a single element in a range R. We denote
this by f : D → R. Also, we say that the mapping f is defined on the domain
D and that f is an R-valued mapping. In particular, if the range R ⊆ ℝ, i.e.,
if the range is a subset of the real numbers, then we say that f is a real-valued
mapping.
Examples:
Let ℝ denote the real numbers, and let ℝ+ denote the non-negative real
numbers. We can define a function f : ℝ → ℝ+ as f(x) = x^2.
If we define f such that f(3) = 4 and f(3) = 8, then f is not a function
since it maps 3 to more than one value.
Let D be any collection of strings, and let R be the non-negative integers.
Then we can define f : D → R to be such that for any string s ∈ D,
f(s) = |s|,
which is the length of s.
We can define a function f : ℝ × ℝ → ℝ to be f(x, y) = x + y.
Let L1 and L2 be two sets of strings. Then we can define the concatenation
operator as the function f : L1 × L2 → L1L2 such that
f(w1, w2) = w1w2
Language L1 = {x^n : n ≥ 1} from before.
Can concatenate a = xxx and b = x to get ab = xxxx.
Note that a, b ∈ L1 and that ab ∈ L1.
Language L2 = {x^(2n+1) : n ≥ 0} from before.
Can concatenate a = xxx and b = x to get ab = xxxx.
Note that a, b ∈ L2 but that ab ∉ L2.
Definition: For a mapping f defined on a domain D, we define
f(D) = {f(x) : x ∈ D};
i.e., f(D) is the set of all possible values that the mapping f can take on when
applied to values in D.
Example:
If f(x) = x^2 and D = ℝ, then f(D) = ℝ+, the set of non-negative real
numbers.
Definition: Suppose f is a mapping defined on a domain D. We say that D
is closed under mapping f if f(D) ⊆ D; i.e., if x ∈ D implies that f(x) ∈ D.
In other words, D is closed under f if applying f to any element in D results
in an element in D.
Definition: Suppose f is a mapping defined on a domain D × D. We say
that D is closed under mapping f if f(D, D) ⊆ D; i.e., if (x, y) ∈ D × D
implies that f(x, y) ∈ D.
Examples:
L1 = {x^n : n = 1, 2, 3, ...} is closed under concatenation.
L2 = {x^(2n+1) : n = 0, 1, 2, ...} is not closed under concatenation since x
concatenated with x yields xx ∉ L2.
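A rough bounded check (a sketch, not a proof) of these two closure facts: both languages are enumerated only up to a finite length N, and every concatenation short enough to be judged is tested.

    N = 20
    L1 = {"x" * n for n in range(1, N + 1)}            # {x^n : n >= 1}
    L2 = {"x" * (2 * n + 1) for n in range(0, N // 2)} # {x^(2n+1) : n >= 0}

    def closed_under_concat(L):
        # Only test concatenations whose length stays within the bound N.
        return all(u + v in L for u in L for v in L if len(u + v) <= N)

    print(closed_under_concat(L1))   # True: L1 is closed under concatenation
    print(closed_under_concat(L2))   # False: x concatenated with x gives xx, not in L2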
Definition: For any string w, the reverse of w, written as reverse(w) or w^R,
is the same string of letters written in reverse order. Thus, if w = w1 w2 ··· wn,
where each wi is a letter, then reverse(w) = wn w_{n-1} ··· w1.
Examples:
For xxxx ∈ L1 = {x^n : n = 1, 2, 3, ...}, reverse(xxxx) = xxxx ∈ L1. We
can show that L1 is closed under reversal.
Recall L3 is the set of strings over the alphabet Σ = {0, 1, 2, ..., 9} such
that the first letter is not 0. For 48 ∈ L3, reverse(48) = 84 ∈ L3.
Example: For 90210 ∈ L3, reverse(90210) = 01209 ∉ L3. Thus, L3 is
not closed under reversal.
Definition: Over the alphabet Σ = {a, b}, the language PALINDROME
is defined as
PALINDROME = {Λ and all strings x such that reverse(x) = x}
            = {Λ, a, b, aa, bb, aaa, aba, ...}
Note that for the language PALINDROME, the words abba, a ∈ PALINDROME,
but their concatenation abbaa is not in PALINDROME.
Definition: Suppose f : D → ℝ and g : D → ℝ are real-valued mappings
such that f(x) ≤ g(x) for all x ∈ D. Then f is bounded above by g, or g
is an upper bound for f. In addition, if there exists some x ∈ D such that
f(x) = g(x), then we say that g is a tight upper bound for f.
Examples:
If f(x) = sin(x) and g(x) = 2 for all x ∈ ℝ, then g is an upper bound of
f, but g is not a tight upper bound of f.
If f(x) = sin(x) and g(x) = 1 for all x ∈ ℝ, then g is a tight upper
bound of f.
Suppose f(x) = x and g(x) = x^2. Then g is an upper bound for f for
all x ≥ 1, and g is tight since g(x) = f(x) for x = 1.
Suppose f(x) = x^2 and g(x) = 2^x. Then g is an upper bound for f for
all x ≥ 4. Also, g is a tight upper bound over x ≥ 4 since g(x) = f(x)
for x = 4.
2.5 Closures
Definition: Given an alphabet Σ, let
Σ* = {w = w1 w2 ··· wn : n ≥ 0, wi ∈ Σ for i = 1, 2, ..., n},
where we define w1 w2 ··· wn = Λ when n = 0.
Example: Alphabet Σ = {x}. Then, the closure of Σ is
Σ* = {Λ, x, xx, xxx, ...}
Example: Alphabet Σ = {0, 1, 2, ..., 9}. Then, the closure of Σ is the set of
all strings of digits, including the null string Λ.
More generally, for any set S of strings, the Kleene closure of S is
S* = S^0 + S^1 + S^2 + S^3 + ···,
where S^n denotes the set of strings formed by concatenating n strings from S.
In set notation,
S* = {w = w1 w2 w3 ··· wn : n ≥ 0 and wi ∈ S for all i = 1, 2, 3, ..., n},
where we interpret w1 w2 w3 ··· wn for n = 0 to be the null string Λ. Thus,
S^0 = {Λ} for any set S. In particular, if S = ∅, we still have S^0 = {Λ}.
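A sketch (the helper is mine, with a length bound, since S* is infinite) that enumerates S* following the definition S* = S^0 + S^1 + S^2 + ···, with Λ modeled as the empty string. It also spot-checks the claim in the example below that no word of {ba, a}* contains bb.

    def kleene_star(S, max_len):
        result = {""}                # S^0 = {Λ}
        frontier = {""}
        while frontier:
            frontier = {w + s for w in frontier for s in S if len(w + s) <= max_len}
            frontier -= result       # keep only newly discovered words
            result |= frontier
        return result

    print(sorted(kleene_star({"ba", "a"}, 4), key=len))
    print(any("bb" in w for w in kleene_star({"ba", "a"}, 8)))   # False
    print(kleene_star(set(), 5))                                  # {''} : empty-set case, S* = {Λ}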
Example: If S = {ba, a}, then no word in S* contains the substring bb.
To see why, note that for any w ∈ S*, we can write w = w1 w2 ··· wn, for some n ≥ 0, where each
wi ∈ S, i = 1, 2, ..., n. Now consider any 2-letter substring xy of w.
Since S = {ba, a}, there are five possibilities for how the 2-letter substring
xy could have arisen:
1. xy is the concatenation of two 1-letter words from S; i.e., for some
i = 1, 2, ..., n-1, we have that xy = wi w_{i+1}, where wi and w_{i+1} are
words from S having only one letter each. Since the only 1-letter
word from S is a, we must have that wi = w_{i+1} = a. In this case,
xy = aa, which is not bb.
2. xy is a 2-letter word from S; i.e., for some i = 1, 2, ..., n, we have
that xy = wi, where wi is a 2-letter word from S. Since the only
2-letter word from S is ba, we must have that wi = ba. In this case,
xy = ba, which is not bb.
3. xy is the concatenation of a 1-letter word from S and the first letter
of a 2-letter word from S; i.e., for some i = 1, 2, ..., n-1, we have
that xy = wi w_{i+1,1}, where
wi is a 1-letter word from S.
w_{i+1} is a 2-letter word of S with w_{i+1} = w_{i+1,1} w_{i+1,2}, where w_{i+1,1}
and w_{i+1,2} are the two letters of w_{i+1}.
Since the only 1-letter word from S is a, we must have that wi = a.
Since the only 2-letter word from S is ba, we must have that w_{i+1} =
ba, whose first letter is b. In this case, xy = ab, which is not bb.
4. xy is the concatenation of the second letter of a 2-letter word from
S and a 1-letter word from S; i.e., for some i = 1, 2, ..., n-1, we
have that xy = w_{i,2} w_{i+1}, where
wi is a 2-letter word of S with wi = w_{i,1} w_{i,2}, where w_{i,1} and w_{i,2}
are the two letters of wi.
w_{i+1} is a 1-letter word from S.
Since the only 2-letter word from S is ba, we must have that wi = ba,
whose second letter is a. Since the only 1-letter word from S is a,
we must have that w_{i+1} = a. In this case, xy = aa, which is not bb.
5. xy is the concatenation of the second letter of a 2-letter word from
S and the first letter of a 2-letter word from S; i.e., for some i =
1, 2, ..., n-1, we have that xy = w_{i,2} w_{i+1,1}, where
wi is a 2-letter word of S with wi = w_{i,1} w_{i,2}, where w_{i,1} and w_{i,2}
are the two letters of wi.
w_{i+1} is a 2-letter word of S with w_{i+1} = w_{i+1,1} w_{i+1,2}, where w_{i+1,1}
and w_{i+1,2} are the two letters of w_{i+1}.
Since the only 2-letter word from S is ba, we must have that wi =
w_{i+1} = ba, whose first letter is b and whose second letter is a. In
this case, xy = ab, which is not bb.
This exhausts all of the possibilities for how a 2-letter substring xy can
arise in this example. Since all of them result in xy ≠ bb, we have
completed the proof.
Example: If S = {xx, xxx}, then
S* = {Λ, xx, xxx, xxxx, xxxxx, ...} = {Λ} + {x^n : n ≥ 2}.
Example: If S = ∅, then S* = {Λ}.
Remarks:
Two words are considered the same if all their letters are the same and
in the same order, so there is only one possible word of no letters, Λ.
There is an important difference between the word Λ that has no letters
and the language that has no words, which we denote by ∅.
It is not true that Λ is a word in the language ∅ since ∅ doesn't have any
words at all.
If a language L does not contain the word Λ and we wish to add it to L,
we use the union of sets operation denoted by + to form L + {Λ}.
Note that L ≠ L + {Λ} if Λ ∉ L.
Note that L = L + ∅.
Definition: If S is some set of words, then S+ = S^1 + S^2 + S^3 + ···, which
is the set of all finite strings formed by concatenating some positive number
of strings from S.
Example: If Σ = {x}, then Σ+ = {x, xx, xxx, ...}.
Definition: If A and B are sets, then A ⊆ B (A is a subset of B) if w ∈ A
implies that w ∈ B; i.e., each element of A is also an element of B.
Suppose that we have two sets A and B, and we want to prove that A = B.
One way of proving this is to show that
1. A ⊆ B, and
2. B ⊆ A.
Example: Suppose A = {x, xx} and B = {x, xx, xxx}. Note that A ⊆ B,
but B ⊄ A, and so A ≠ B.
Theorem 1 For any set S of strings, we have that S** = S*.
Proof. The way we will prove this is by showing two things:
1. S** ⊆ S*; i.e., every word in S** is also in S*.
2. S* ⊆ S**.
To show part 1, let w0 be any word in S**.
Note that since w0 ∈ S**, w0 is made up of factors, say w1, w2, ..., wk,
k ≥ 0, from S*; i.e., w0 = w1 w2 ··· wk, with k ≥ 0 and wi ∈ S* for
i = 1, 2, ..., k.
Also, each factor wi, i = 1, 2, ..., k, is from S*, and so it is made up of
a nonnegative number of factors from S; i.e., wi = w_{i,1} w_{i,2} ··· w_{i,ni}, with
ni ≥ 0 and w_{i,j} ∈ S for j = 1, 2, ..., ni.
Therefore, we can write
w0 = w1 w2 ··· wk = w_{1,1} w_{1,2} ··· w_{1,n1} w_{2,1} w_{2,2} ··· w_{2,n2} ··· w_{k,1} w_{k,2} ··· w_{k,nk},
where each w_{i,j} ∈ S, i = 1, 2, ..., k, j = 1, 2, ..., ni. So the original word
w0 ∈ S*.
Since w0 was arbitrary, we have just shown that every word in S** is also
a word in S*; i.e., S** ⊆ S*.
To show part 2, note that in general, for any set A, we know that A ⊆ A*.
Hence, letting A = S*, we see that S* ⊆ S**.
Chapter 3
Recursive Definitions
3.1 Definition
A recursive definition is characteristically a three-step process:
1. First, we specify some basic objects in the set. The number of basic
objects specified must be finite.
2. Second, we give a finite number of rules for constructing more objects in
the set from the ones we already know.
3. Third, we declare that no objects except those constructed in this way
are allowed in the set.
3.2 Examples
Example: Consider the set P-EVEN, which is the set of positive even numbers.
We can define the set P-EVEN in several different ways:
We can define P-EVEN to be the set of all positive integers that are
evenly divisible by 2.
P-EVEN is the set of all 2n, where n = 1, 2, ....
P-EVEN is defined by these three rules:
Rule 1 2 is in P-EVEN.
Rule 2 If x is in P-EVEN, then so is x + 2.
Rule 3 The only elements in the set P-EVEN are those that can be
produced from the two rules above.
Note that the first two definitions of P-EVEN are much easier to apply than
the last.
In particular, to show that 12 is in P-EVEN using the last definition, we would
have to do the following:
1. 2 is in P-EVEN by Rule 1.
2. 2 + 2 = 4 is in P-EVEN by Rule 2.
3. 4 + 2 = 6 is in P-EVEN by Rule 2.
4. 6 + 2 = 8 is in P-EVEN by Rule 2.
5. 8 + 2 = 10 is in P-EVEN by Rule 2.
6. 10 + 2 = 12 is in P-EVEN by Rule 2.
We can make another definition for P-EVEN as follows:
Rule 1 2 is in P-EVEN.
Rule 2 If x and y are both in P-EVEN, then x + y is in P-EVEN.
Rule 3 No number is in P-EVEN unless it can be produced by rules 1 and 2.
We can use the new definition of P-EVEN to show that 12 is in P-EVEN (a
small programmed version of this definition follows the derivation below):
1. 2 is in P-EVEN by Rule 1.
2. 2 + 2 = 4 is in P-EVEN by Rule 2.
3. 4 + 4 = 8 is in P-EVEN by Rule 2.
4. 4 + 8 = 12 is in P-EVEN by Rule 2.
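A minimal sketch (mine, not from the notes) of the second recursive definition of P-EVEN: start from the basic object 2 and repeatedly apply Rule 2 (x + y), keeping only numbers up to a chosen limit.

    def p_even_up_to(limit):
        produced = {2}                       # Rule 1
        changed = True
        while changed:
            changed = False
            for x in list(produced):
                for y in list(produced):
                    z = x + y                # Rule 2
                    if z <= limit and z not in produced:
                        produced.add(z)
                        changed = True
        return produced

    print(sorted(p_even_up_to(12)))          # [2, 4, 6, 8, 10, 12]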
Example: Let PALINDROME be the set of all strings over the alphabet Σ =
{a, b} that are the same spelled forward as backwards; i.e., PALINDROME
= {w : w = reverse(w)} = {Λ, a, b, aa, bb, aaa, aba, bab, bbb, aaaa, abba, ...}.
A recursive definition for PALINDROME is as follows:
Rule 1 Λ, a, and b are in PALINDROME.
Rule 2 If w ∈ PALINDROME, then so are awa and bwb.
Rule 3 No other string is in PALINDROME unless it can be produced by
rules 1 and 2.
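A sketch of this recursive definition in code (length-bounded, since the language is infinite): Rule 1 supplies Λ, a, b, and Rule 2 wraps an existing palindrome w as awa or bwb.

    def palindromes_up_to(max_len):
        words = {"", "a", "b"}                       # Rule 1 (Λ modeled as "")
        frontier = set(words)
        while frontier:
            frontier = {c + w + c for w in frontier for c in ("a", "b")
                        if len(w) + 2 <= max_len}
            frontier -= words
            words |= frontier
        return words

    P = palindromes_up_to(4)
    print(sorted(P, key=lambda w: (len(w), w)))
    print(all(w == w[::-1] for w in P))              # True: every word equals its reverse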
Example: Let us now define a set AE of certain valid arithmetic expressions.
The set AE will not include all possible arithmetic expressions.
The alphabet of AE is
Σ = {0 1 2 3 4 5 6 7 8 9 + - * / ( )}
We recursively define AE using the following rules:
Rule 1 Any number (positive, negative, or zero) is in AE.
Rule 2 If x is in AE, then so are (x) and -(x).
Rule 3 If x and y are in AE, then so are
(i) x + y (if the first symbol in y is not -)
(ii) x - y (if the first symbol in y is not -)
(iii) x * y
(iv) x / y
(v) x ** y (our notation for exponentiation)
Rule 4 AE consists of only those things that can be created by the above three
rules.
For example,
(5 * (8 + 2))
and
5 * (8 + 1)/3
are in AE since they can be generated using the above definition.
However,
((6 + 7)/9
and
4(/9 - 4)
are not since they cannot be generated using the above definition.
Now we can use our recursive definition of AE to show that
8 * 6 + ((4/2) + (3 - 1) * 7)/4
is in AE.
1. Each of the numbers is in AE by Rule 1.
2. 8 * 6 is in AE by Rule 3(iii).
3. 4/2 is in AE by Rule 3(iv).
4. (4/2) is in AE by Rule 2.
5. 3 - 1 is in AE by Rule 3(ii).
6. (3 - 1) is in AE by Rule 2.
7. (3 - 1) * 7 is in AE by Rule 3(iii).
8. (4/2) + (3 - 1) * 7 is in AE by Rule 3(i).
9. ((4/2) + (3 - 1) * 7) is in AE by Rule 2.
10. ((4/2) + (3 - 1) * 7)/4 is in AE by Rule 3(iv).
11. 8 * 6 + ((4/2) + (3 - 1) * 7)/4 is in AE by Rule 3(i).
Chapter 4
Regular Expressions
4.1 Some Denitions
Definition: If S and T are sets of strings of letters (whether they are finite
or infinite sets), we define the product set of strings of letters to be
ST = {w = w1w2 : w1 ∈ S, w2 ∈ T}
Example: If S = {a, aa, aaa} and T = {b, bb}, then
ST = {ab, abb, aab, aabb, aaab, aaabb}
Example: If S = {a, ab, aba} and T = {Λ, b, ba}, then
ST = {a, ab, aba, abb, abba, abab, ababa}
Example: If S = {Λ, a, aa} and T = {Λ, bb, bbbb, bbbbbb, ...}, then
ST = {Λ, a, aa, bb, abb, aabb, bbbb, abbbb, ...}
Definition: Let s and t be strings. Then s is a substring of t if there exist
strings u and v such that t = usv.
Example: Suppose s = aba and t = aababb.
Then s is a substring of t since we can define u = a and v = bb, and then
t = usv.
Example: Suppose s = abb and t = aaabb.
Then s is a substring of t since we can define u = aa and v = Λ, and then
t = usv.
Example: Suppose s = bb and t = aababa.
Then s is not a substring of t.
Definition: Over the alphabet Σ = {a, b}, a string contains a double letter
if it has either aa or bb as a substring.
Example: Over the alphabet Σ = {a, b},
1. The string abaabab contains a double letter.
2. The string bb contains a double letter.
3. The string aba does not contain a double letter.
4. The string abbba contains two double letters.
4.2 Defining Languages Using Regular Expressions
Previously, we defined the languages:
L1 = {x^n : n = 1, 2, 3, ...}
L2 = {x, xxx, xxxxx, ...}
But these are not very precise ways of defining languages.
So we now want to be very precise about how we define languages, and
we will do this using regular expressions.
Languages that are associated with these regular expressions are called
regular languages and are also said to be defined by a finite representation.
Regular expressions are written in boldface letters and are a way of
specifying the language.
Recall that we previously saw that for sets S, T, we defined the operations
S + T = {w : w ∈ S or w ∈ T}
ST = {w = w1w2 : w1 ∈ S, w2 ∈ T}
S* = S^0 + S^1 + S^2 + ···
S+ = S^1 + S^2 + ···
We will precisely define what a regular expression is later. But for now,
let's work with the following sketchy description of a regular expression.
Loosely speaking, a regular expression is a way of specifying a language
in which the only operations allowed are
union (+),
concatenation (or product),
Kleene-* closure,
superscript-+.
The allowable symbols are parentheses, Λ, and ∅, as well as each letter
in Σ written in boldface. No other symbols are allowed in a regular expression.
Also, a regular expression must only consist of a finite number
of symbols.
To introduce regular expressions, think of
x = {x};
i.e., x represents the language (i.e., set) consisting of exactly one string,
x. Also, think of
a = {a},
b = {b},
so a is the language consisting of exactly one string a, and b is the
language consisting of exactly one string b.
Using this interpretation, we can interpret ab to mean
ab = {a}{b} = {ab}
since the concatenation (or product) of the two languages {a} and {b}
is the language {ab}.
We can also interpret a + b to mean
a + b = {a} + {b} = {a, b}
We can also interpret a* to mean
a* = {a}* = {Λ, a, aa, aaa, ...}
We can also interpret a+ to mean
a+ = {a}+ = {a, aa, aaa, ...}
Also, we have
(ab + a)*b = ({ab} + {a})*{b} = {ab, a}*{b}
Example: Previously, we saw language
L4 = {Λ, x, xx, xxx, ...}
   = {x}*
   = language(x*)
Example: Language
L1 = {x, xx, xxx, xxxx, ...}
   = language(xx*)
   = language(x*x)
   = language(x+)
   = language(x*xx*)
   = language(x*x+)
Note that there are several different regular expressions associated with L1.
Example: alphabet Σ = {a, b}
language L of all words of the form one a followed by some number (possibly
zero) of b's.
L = language(ab*)
Example: alphabet Σ = {a, b}
language L of all words of the form some positive number of a's followed by
exactly one b.
L = language(aa*b)
Example: alphabet Σ = {a, b}
language
L = language(ab*a),
which is the set of all strings of a's and b's that have at least two letters, that
begin and end with one a, and that have nothing but b's inside (if anything at
all).
L = {aa, aba, abba, abbba, ...}
Example: alphabet Σ = {a, b}
The language L consisting of all possible words over the alphabet Σ has the
following regular expressions:
(a + b)* and (Λ + a + b)*.
Example: alphabet Σ = {x}
language L with an even number (possibly zero) of x's
L = {Λ, xx, xxxx, xxxxxx, ...}
  = language((xx)*)
Example: alphabet Σ = {x}
language L with a positive even number of x's
L = {xx, xxxx, xxxxxx, ...}
  = language(xx(xx)*)
  = language((xx)+)
Example: alphabet Σ = {x}
language L with an odd number of x's
L = {x, xxx, xxxxx, ...}
  = language(x(xx)*)
  = language((xx)*x)
Is L = language(x*xx*)?
No, since that language includes the word (xx)x(x) = xxxx, which has an even
number of x's.
Example: alphabet Σ = {a, b}
language L of all three-letter words starting with b
L = {baa, bab, bba, bbb}
  = language(b(a + b)(a + b))
  = language(baa + bab + bba + bbb)
Example: alphabet Σ = {a, b}
language L of all words starting with a and ending with b
L = {ab, aab, abb, aaab, aabb, abab, abbb, ...}
  = language(a(a + b)*b)
Example: alphabet Σ = {a, b}
language L of all words starting and ending with b
L = {b, bb, bab, bbb, baab, babb, bbab, bbbb, ...}
  = language(b + b(a + b)*b)
Example: alphabet Σ = {a, b}
language L of all words with exactly two b's
L = language(a*ba*ba*)
Example: alphabet Σ = {a, b}
language L of all words with at least two b's
L = language((a + b)*b(a + b)*b(a + b)*)
Note that bbaaba ∈ L since
bbaaba = (Λ)b(Λ)b(aaba) = (b)b(aa)b(a)
Example: alphabet Σ = {a, b}
language L of all words with at least two b's
L = language(a*ba*b(a + b)*)
Note that bbaaba ∈ L since bbaaba = (Λ)b(Λ)b(aaba)
Example: alphabet Σ = {a, b}
language L of all words with at least one a and at least one b
L = language((a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*)
  = language((a + b)*a(a + b)*b(a + b)* + bb*aa*)
where
the first regular expression comes from separately considering the two
cases:
1. requiring an a before a b,
2. requiring a b before an a.
the second expression comes from the observation that the first term
in the first expression only omits words that are of the form some b's
followed by some a's.
Example: alphabet Σ = {a, b}
language L consists of Λ and all strings that are either all a's or b followed by
a nonnegative number of a's
L = language(a* + ba*)
  = language((Λ + b)a*)
Theorem 5 If L is a finite language, then L can be defined by a regular
expression.
Proof. To make a regular expression that defines the language L, turn all
the words in L into boldface type and put pluses between them.
Example: language
L = {aba, abba, bbaab}
Then a regular expression to define L is
aba + abba + bbaab
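Several of the regular expressions in this section can be tried out directly with Python's re module (a sketch; Python writes union as | rather than +, and fullmatch anchors the whole string).

    import re

    tests = [
        (r"ab*",                 ["a", "ab", "abbb"],    ["ba", "aab"]),
        (r"a(a|b)*b",            ["ab", "aabb", "abab"], ["ba", "a", "abba"]),
        (r"(a|b)*(aa|bb)(a|b)*", ["baab", "abb"],        ["ababa", "b"]),
    ]
    for pattern, accept, reject in tests:
        ok = all(re.fullmatch(pattern, w) for w in accept) and \
             not any(re.fullmatch(pattern, w) for w in reject)
        print(pattern, ok)       # each line should print True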
4.3 The Language EVEN-EVEN
Example: Consider the regular expression
E = [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]*.
We now prove that the regular expression E generates the language EVEN-EVEN,
which consists exactly of all strings that have an even number of a's
and an even number of b's; i.e.,
EVEN-EVEN = {Λ, aa, bb, aabb, abab, abba, baab, baba, bbaa, aaaabb, ...}.
Proof.
Let L1 be the language generated by the regular expression E.
Let L2 be the language EVEN-EVEN.
So we need to prove that L1 = L2, which we will do by showing that
L1 ⊆ L2 and L2 ⊆ L1.
First note that any word generated by E is made up of syllables of
three types:
type1 = aa
type2 = bb
type3 = (ab + ba)(aa + bb)*(ab + ba)
E = [type1 + type2 + type3]*
Each syllable of type1, type2, or type3 contributes an even number of a's and
an even number of b's, so every word generated by E is in EVEN-EVEN; i.e.,
L1 ⊆ L2.
To show L2 ⊆ L1, take any w = w1 w2 ··· ∈ EVEN-EVEN and read its letters
two at a time. If the current pair wi w_{i+1} is aa or bb, use a syllable of type1
or type2. Otherwise the pair is unbalanced; start a syllable of
type3 = (ab + ba)(aa + bb)*(ab + ba),
and do the following:
If (wi = a and w_{i+1} = b), then choose ab in the first part
of the type3 syllable.
If (wi = b and w_{i+1} = a), then choose ba in the first part
of the type3 syllable.
Do the following while either (w_{i+2} = a and w_{i+3} = a) or
(w_{i+2} = b and w_{i+3} = b):
Let i = i + 2.
If wi = a and w_{i+1} = a, then iterate the inner star of
the type3 syllable, and use aa.
If wi = b and w_{i+1} = b, then iterate the inner star of
the type3 syllable, and use bb.
Let i = i + 2.
If (wi = a and w_{i+1} = b), then choose ab in the last part
of the type3 syllable.
If (wi = b and w_{i+1} = a), then choose ba in the last part
of the type3 syllable.
Remarks:
We must eventually read in either ab or ba, which balances
out the previous unbalanced pair. This completes a syllable of type3.
If we never read in the second unbalanced pair, then
either the number of a's is odd or the number of b's is
odd, which is a contradiction.
Let i = i + 2.
This algorithm shows how to use the regular expression E to generate
any string in EVEN-EVEN; i.e., if w ∈ EVEN-EVEN, then
we can use the above algorithm to generate w using E.
Thus, L2 ⊆ L1.
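A quick brute-force sketch (mine) cross-checking the regular expression E against a direct count of a's and b's for every string of length at most 6; Python writes union as |.

    import re
    from itertools import product

    E = r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*"

    def in_even_even(w):
        return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

    for n in range(7):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            assert bool(re.fullmatch(E, w)) == in_even_even(w)
    print("E agrees with EVEN-EVEN on all strings of length <= 6")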
4.4 More Examples and Definitions
Example: b*(abb*)* is a regular expression.
Definition: A regular expression over an alphabet Σ is built by the following rules:
Rule 1 Every letter of Σ written in boldface, as well as Λ and ∅, is a regular expression.
Rule 2 If r1 and r2 are regular expressions, then so are (r1), r1r2, r1 + r2, r1*, and r1+.
Rule 3 Nothing else is a regular expression.
Definition: For a regular expression r, let L(r) denote the language generated
by (or associated with) r; i.e., L(r) is the set of strings that can be generated
by r.
Definition: The following rules define the language associated with (or generated by) any regular expression:
Rule 1 (i) If σ ∈ Σ, then L(σ) = {σ}; i.e., the language associated with the
regular expression that is just a single letter is that one-letter word
alone.
(ii) L(Λ) = {Λ}; i.e., the language associated with Λ is {Λ}, a one-word
language.
(iii) L(∅) = ∅; i.e., the language associated with ∅ is ∅, the language
with no words.
Rule 2 If r1 is a regular expression associated with the language L1 and r2 is
a regular expression associated with the language L2, then
(i) The regular expression (r1)(r2) is associated with the language L1
concatenated with L2:
language(r1r2) = L1L2.
We define L1∅ = ∅L1 = ∅.
(ii) The regular expression r1 + r2 is associated with the language formed
by the union of the sets L1 and L2:
language(r1 + r2) = L1 + L2
(iii) The language associated with the regular expression (r1)* is L1*, the
Kleene closure of the set L1 as a set of words:
language(r1*) = L1*
(iv) The language associated with the regular expression (r1)+ is L1+:
language(r1+) = L1+
Chapter 5
Finite Automata
5.1 Introduction
Modern computers are often viewed as having three main components:
1. the central processing unit (CPU)
2. memory
3. input-output devices (IO)
The CPU is the thinker
1. Responsible for such things as individual arithmetic computations
and logical decisions based on particular data items.
2. However, the amount of data the unit can handle at any one time
is fixed forever by its design.
3. To deal with more than this predetermined, limited amount of
information, it must ship data back and forth, over time, to and from
the memory and IO devices.
Memory
1. The memory may in practice be of several different kinds, such as
magnetic core, semiconductor, disks, and tapes.
2. The common feature is that the information capacity of the memory
is vastly greater than what can be accommodated, at any one
instant of time, in the CPU.
3. Therefore, this memory is sometimes called auxiliary, to distinguish
it from the limited storage that is part of the CPU.
4. At least in theory, the memory can be expanded without limit, by
adding more core boxes, more tape drives, etc.
IO devices are the means by which information is communicated back
and forth to the outside world; e.g.,
1. terminals
2. printers
3. tapes
We now will study a severely restricted model of an actual computer called a
finite automaton (FA).
Like a real computer, it has a central processor with fixed finite capacity,
depending on its original design.
Unlike a real computer, it has no auxiliary memory at all.
It receives its input as a string of characters.
It delivers no output at all, except an indication of whether the input is
considered acceptable.
It is a language-recognition device.
Why should we study such a simple model of computer with no memory?
Actually, finite automata do have memory, but the amount they have is
fixed and cannot be expanded.
Finite automata are applicable to the design of several common types of
computer algorithms and programs.
For example, the lexical analysis phase of a compiler is often based
on the simulation of a finite automaton.
The problem of finding an occurrence of one string within another
(for example, a particular word within a large text file) can
also be solved efficiently by methods originating from the theory of
finite automata.
To introduce finite automata, consider the following scenario:
Play a board game in which two players move pieces around different
squares.
Throw dice to determine where to move.
Players have no choices to make when making their move. The move is
completely determined by the dice.
A player wins if after 10 throws of the dice, his piece ends up on a certain
square.
Note that no skill or choice is involved in the game.
Each possible position of pieces on the board is called a state.
Every time the dice are thrown, the state changes according to what
came up on the dice.
We call the winning square a final state (also known as a halting state,
terminal state, or accepting state).
There may be more than one nal state.
Lets look at another simple example
Suppose you have a simple computer (machine), as described above.
Your goal is to write a program to compute 3 + 4.
The program is a sequence of instructions that are fed into the computer
one at a time.
Each instruction is executed as soon as it is read, and then the next
instruction is read.
If the program is correct, then the computer outputs the number 7 and
terminates execution.
We can think of taking a snapshot of the internals (i.e., contents of
memory, etc.) of the computer after every instruction is executed.
Each possible configuration of 0's and 1's in the cells of memory represents
a different state of the system.
We say the machine ends in a final state (also called a halting, terminal,
or accepting state) if when the program finishes executing, it outputs
the number 7.
Two machines are in the same state if their output pages look the same
and their memories look the same cell by cell.
The computer is deterministic, i.e., on reading one particular input in-
struction, the machine converts itself from one given state to some par-
ticular other state (which is possibly the same), where the resultant state
is completely determined by the prior state and the input instruction.
No choice is involved.
The success of the program (i.e., it outputs 7) is completely determined
by the sequence of inputs (i.e., the lines of code).
We can think of the set of all computer instructions as the letters of an
alphabet.
We can then dene a language to be the set of all words over this alphabet
that lead to success.
This is the language with words that are all programs that print a 7.
5.2 Finite Automata
Definition: A finite automaton (FA), also known as a finite acceptor, is a
collection M = (K, Σ, δ, s, F) where:
1. K is a finite set of states.
Exactly one state s ∈ K is designated as the initial state (or start
state).
Some set F ⊆ K is the set of final states, where we allow F = ∅ or
F = K or F could be any other subset of K.
2. An alphabet Σ of possible input letters, from which are formed strings,
that are to be read one letter at a time.
3. δ : K × Σ → K is the transition function.
In other words, for each state and for each letter of the input alphabet,
the function δ tells which (one) state to go to next; i.e., if
x ∈ K and σ ∈ Σ, then δ(x, σ) is the state that you go to when you
are in state x and read in σ.
For each state x and each letter σ ∈ Σ, there is exactly one arc
leaving x labeled with σ.
Thus, there is no choice in how to process a string, and so the
machine is deterministic.
An FA works as follows:
It is presented with an input string of letters.
It starts in the start state.
It reads the string one letter at a time, starting from the left.
The letters read in determine a sequence of states visited.
Processing ends after the last input letter has been read.
If after reading the entire input string the machine ends up in a final
state, then the input string is accepted. Otherwise, the input string is
rejected.
Example: Consider an FA with three states (x, y, and z) with input alphabet
Σ = {a, b}.
Define the following transition table for the FA:
                 a    b
start     x     y    z
          y     x    z
final     z     z    z
Input the string aaaa to the FA:
Start in state x and read in the first a, which takes us to state y.
From state y, read in the second a, which takes us to state x.
From state x, read in the third a, which takes us to state y.
From state y, read in the fourth a, which takes us to state x.
No more letters in the input string, so stop.
Note that on input aaaa,
We ended up in state x, which is not a final state.
Thus, we say that aaaa is not accepted (i.e., it is rejected) by this FA.
Now consider the input string abab:
Start in state x and read in the first letter, which is a, which takes us to state y.
From state y, read in the second letter, which is b, which takes us to state z.
From state z, read in the third letter, which is a, which takes us to state z.
From state z, read in the fourth letter, which is b, which takes us to state z.
No more letters in the input string, so stop.
On the input string abab:
We ended up in state z, which is a final state.
Thus, we say that abab is accepted by this FA.
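A minimal simulation sketch (mine, not from the notes) of this three-state FA, storing the transition table as a Python dictionary and replaying the two traces above.

    delta = {
        ("x", "a"): "y", ("x", "b"): "z",
        ("y", "a"): "x", ("y", "b"): "z",
        ("z", "a"): "z", ("z", "b"): "z",
    }

    def accepts(word, start="x", finals={"z"}):
        state = start
        for letter in word:
            state = delta[(state, letter)]   # deterministic: exactly one next state
        return state in finals

    print(accepts("aaaa"))   # False: ends in state x, which is not final
    print(accepts("abab"))   # True: ends in the final state z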
Definition: The set of all strings accepted is the language associated with
(or accepted by) the FA.
Note that
the above FA accepts all strings that have the letter b in them and no
other strings.
the language accepted by this FA is the one defined by the regular expression
(a + b)*b(a + b)*
Example: regular expression
(a + b)+
All strings over the alphabet Σ = {a, b} except Λ.
[Figure: two-state FA for (a + b)+; the start state goes to the final state on a, b, which then loops on a, b.]
Example: regular expression
(a + b)*
[Figure: one-state FA whose start state is also final, with a loop labeled a, b.]
This FA accepts all strings over the alphabet Σ = {a, b} including Λ.
There are FAs that accept the language having no words:
An FA with no final states.
[Figure: FA diagram with no final state.]
An FA whose final state cannot be reached from the start state because the graph is disconnected.
[Figure: FA diagram with a disconnected final state.]
An FA whose final state cannot be reached from the start state because there is no path to it.
[Figure: FA diagram with an unreachable final state.]
Example: Build an FA to accept all words in the language
a(a + b)*.
[Figure: two different FA diagrams accepting a(a + b)*.]
Note that
there can be more than one possible FA for any given language
an FA can have more than one final state
Example:
[Figure: four-state FA diagram with states 1 (start), 2, 3, and 4 (final).]
Note that
ababa is not accepted.
baaba is accepted.
This FA accepts strings that have a double letter.
A regular expression for the language is (a + b)*(aa + bb)(a + b)*.
Example: Language with words having at least one letter and the number of b's divisible
by 4.
[Figure: five-state FA diagram with states 1 (start), 2, 3 (final), 4, and 5.]
Example: An FA that accepts only the word Λ.
[Figure: two-state FA whose start state is final; every letter leads to a non-final state.]
Example: Regular expression:
(a + b)*b
Words that end with b; does not include Λ.
[Figure: two-state FA diagram.]
Example: Regular expression:
Λ + (a + b)*b
Either Λ or words that end in b; i.e., words that do not end in a.
[Figure: two-state FA diagram whose start state is also final.]
Example: Regular expression:
(a + b)*aa + (a + b)*bb
Words that end in a double letter.
[Figure: FA diagram with two final states, one reached after aa and one after bb.]
Example: EVEN-EVEN
[Figure: four-state FA for EVEN-EVEN with states 1 (start, final), 2, 3, and 4 arranged in a square.]
Note that
Every b moves us either left or right.
Every a moves us either up or down.
Chapter 6
Transition Graphs
6.1 Introduction
Each FA has the following properties (among others):
For each state x and each letter σ ∈ Σ, there is exactly one arc leaving
x labeled with σ.
Can only read one letter at a time when traversing an arc.
Exactly one start state.
Now we want a different kind of machine that relaxes the above requirements:
For each state x and each letter σ ∈ Σ, we do not require that there is
exactly one arc leaving x labeled with σ.
Able to read any number of letters at a time when traversing an arc.
Specifically, each arc is now labeled with a string s ∈ Σ*, so the string s
might be Λ or it might be a single letter σ ∈ Σ.
If an arc is labeled with Λ, we traverse the arc without reading any
letters from the input string.
If an arc is labeled with a non-empty string s ∈ Σ*, we can traverse
the arc if and only if the next unread letter(s) from the original input
string are the string s.
We allow for the possibility that for any state x ∈ K and any string
s ∈ Σ*, there is more than one arc (or no arc at all) leaving x labeled with s.
[Figure: a transition graph with arcs labeled aba and a, b.]
Example: this TG accepts the language of all words that begin and end with
the same letter and have at least two letters.
[Figure: TG diagram with a start state, two final states, and arcs labeled a, b.]
Example: this TG accepts the language of all words in which the a's occur
in clumps of three and that end in four or more b's.
[Figure: TG diagram with arcs labeled aaa and b.]
Example: this is the TG for EVEN-EVEN
[Figure: two-state TG; each state has a loop labeled aa, bb, and the two states are joined in both directions by arcs labeled ab, ba.]
Example: Is the word baaabab accepted by this machine?
Chapter 7
Kleene's Theorem
7.1 Kleene's Theorem
The following theorem is the most important and fundamental result in the
theory of FAs:
Theorem 6 Any language that can be dened by either
regular expression, or
nite automata, or
transition graph
can be dened by all three methods.
Proof. The proof has three parts:
Part 1: (FA → TG) Every language that can be defined by an FA can also
be defined by a transition graph.
Part 2: (TG → RegExp) Every language that can be defined by a transition
graph can also be defined by a regular expression.
Part 3: (RegExp → FA) Every language that can be defined by a regular
expression can also be defined by an FA.
7.2 Proof of Part 1: FA → TG
We previously saw that every FA is also a transition graph.
Hence, any language that has been defined by an FA can also be defined
by a transition graph.
7.3 Proof of Part 2: TG → RegExp
We will give a constructive algorithm for proving part 2.
Thus, we will describe an algorithm to take any transition graph T and
form a regular expression corresponding to it.
The algorithm will work for any transition graph T.
The algorithm will finish in finite time.
An overview of the algorithm is as follows:
Start with any transition graph T.
First, transform it into an equivalent transition graph having only one
start state and one final state.
In each following step, eliminate either some states or some arcs by
transforming the TG into another equivalent one.
We do this by replacing the strings labelling arcs with regular expressions.
We can traverse an arc labelled with a regular expression using any string
that can be generated by the regular expression.
End up with a TG having only two states, start and final, and one arc
going from start to final.
The final TG will have a regular expression on its one arc.
Note that in each step we eliminate some states or arcs.
Since the original TG has a finite number of states and arcs, the algorithm
will terminate in a finite number of iterations.
Algorithm:
1. If T has more than one start state, add a new state and add arcs labeled
Λ going to each of the original start states.
[Figure: a TG with two start states is transformed into an equivalent TG with one new start state and Λ-arcs to the original start states.]
2. If T has more than one final state, add a new state and add arcs labeled
Λ going from each of the original final states to the new state. Need to
make sure the final state is different than the start state.
[Figure: a TG with two final states is transformed into an equivalent TG with one new final state and Λ-arcs from the original final states.]
3. Now we give an iterative procedure for eliminating states and arcs.
(a) If T has some state with n > 1 loops circling back to itself, where the
loops are labeled with regular expressions r1, r2, ..., rn, then replace
the n loops with a single loop labeled with the regular expression
r1 + r2 + ··· + rn.
[Figure: three loops labeled r1, r2, r3 are replaced by a single loop labeled r1 + r2 + r3.]
(b) If two states are connected by n > 1 direct arcs in the same direction,
where the arcs are labelled with the regular expressions
r1, r2, ..., rn, then replace the n arcs with a single arc labeled with
the regular expression r1 + r2 + ··· + rn.
[Figure: two parallel arcs labeled r1 and r2 are replaced by a single arc labeled r1 + r2.]
(c) Bypass operation:
i. If there are three states x, y, z such that
there is an arc from x to y labelled with the regular expression r1 and
an arc from y to z labelled with the regular expression r2,
then replace the two arcs and the state y with a single arc from
x to z labelled with the regular expression r1r2.
[Figure: the arcs x → y → z labeled r1 and r2 become a single arc x → z labeled r1r2; if y also has a loop labeled r3, the new arc is labeled r1r3*r2.]
ii. If there are
n + 2 states x, y, z1, z2, ..., zn such that there is an arc from
x to y labelled with the regular expression r0, and
an arc from y to zi, i = 1, 2, ..., n, labelled with the regular
expression ri, and
an arc from y back to itself labelled with regular expression
r_{n+1},
then replace the n + 1 original arcs and the state y with n
arcs from x to zi, i = 1, 2, ..., n, each labelled with the regular
expression r0 r_{n+1}* ri.
[Figure: the state y with loop r_{n+1} is bypassed; the new arcs from x to z1, z2, ..., zn are labeled r0 r_{n+1}* r1, r0 r_{n+1}* r2, ..., r0 r_{n+1}* rn.]
iii. If any other arcs led directly to y, divert them directly to the
zi's.
iv. Need to make sure that all paths possible in the original TG
are still possible after the bypass operation.
Example
[Figure: a worked example of the bypass operation on a TG with states w, x, y, z and arcs labeled r1, ..., r5.]
Example:
Suppose we want to get rid of state y.
Need to account for all paths that go through state y.
There are arcs coming from x, w, and z going into y.
There are arcs from y to x and z.
Thus, we need to account for each possible path from a
state having an arc into y (i.e., x, w, z) to each state
having an arc from y (i.e., x, z).
Thus, we need to account for the paths from
x to y to x, which has regular expression r1 r2* r5
x to y to z, which has regular expression r1 r2* r3
w to y to x, which has regular expression r7 r2* r5
w to y to z, which has regular expression r7 r2* r3
z to y to x, which has regular expression r6 r2* r5
z to y to z, which has regular expression r6 r2* r3
Thus, after eliminating state y, we get the following:
[Figure: the resulting TG after eliminating state y.]
v. Never delete the unique start or final state.
Example:
[Figure: a five-state TG is reduced step by step; parallel paths are combined into arc labels abba, abb, and bb, and the resulting regular expression is a*(abba + abb + bb)(a + b)*.]
Example:
[Figure: a second worked example of reducing a TG to a regular expression; intermediate arc labels include bb*a(a + b)*, a(a + b)*, and Λ + b, and the resulting regular expression is
a(ba)*a(a + b)* + ab(ab)*bb*(Λ + a(a + b)*) + (Λ + b)((ab)*bb*(Λ + a(a + b)*) + a(ba)*a(a + b)*).]
7.4 Proof of Part 3: RegExp → FA
To show: every language that can be defined by a regular expression can also
be defined by an FA.
We will do this by using a recursive definition and a constructive algorithm.
Recall
every regular expression can be built up from the letters of the alphabet
Σ and Λ and ∅.
Also, given some existing regular expressions, we can build new regular
expressions by applying the following operations:
1. union (+)
2. concatenation
3. closure (Kleene star)
We will not include r+ in our discussion here, but this will not be a
problem since r+ = rr*.
Recall that we had the following recursive definition for regular expressions:
Rule 1: If x ∈ Σ, then x is a regular expression. Λ is a regular expression.
∅ is a regular expression.
Rule 2: If r1 and r2 are regular expressions, then r1 + r2 is a regular expression.
Rule 3: If r1 and r2 are regular expressions, then r1r2 is a regular expression.
Rule 4: If r1 is a regular expression, then r1* is a regular expression.
Based on the above recursive definition for regular expressions, we have the
following recursive definition for FAs associated with regular expressions:
Rule 1:
There is an FA that accepts the language L defined by the regular
expression x; i.e., L = {x}, where x ∈ Σ, so language L consists of
only a single word and that word is the single letter x.
There is an FA that accepts the language defined by regular expression
Λ; i.e., the language {Λ}.
There is an FA that accepts the language defined by the regular expression ∅;
i.e., the language with no words, which is ∅.
Rule 2: If there is an FA called FA1 that accepts the language defined by
the regular expression r1 and there is an FA called FA2 that accepts the
language defined by the regular expression r2, then there is an FA called
FA3 that accepts the language defined by the regular expression r1 + r2.
Rule 3: If there is an FA called FA1 that accepts the language defined by
the regular expression r1 and there is an FA called FA2 that accepts the
language defined by the regular expression r2, then there is an FA called
FA3 that accepts the language defined by the regular expression r1r2,
which is the concatenation.
Rule 4: If there is an FA called FA1 that accepts the language defined by
the regular expression r1, then there is an FA called FA2 that accepts
the language defined by the regular expression r1*.
Let's now show that each of the rules holds by construction:
Rule 1: There is an FA that accepts the language L defined by the regular
expression x; i.e., L = {x}, where x ∈ Σ. There is an FA that accepts
the language defined by the regular expression Λ. There is an FA that
accepts the language defined by the regular expression ∅.
If x ∈ Σ, then the following FA accepts the language {x}:
[Figure: FA whose start state goes to a final state on x and to a non-final state on Σ − {x}.]
An FA that accepts the language {Λ} is
[Figure: FA whose start state is final and whose arcs all lead to a non-final state.]
An FA that accepts the language ∅ is
[Figure: FA with a start state and no final states.]
Rule 2: If there is an FA called FA1 that accepts the language defined by
the regular expression r1 and there is an FA called FA2 that accepts the
language defined by the regular expression r2, then there is an FA called
FA3 that accepts the language defined by the regular expression r1 + r2.
Suppose regular expressions r1 and r2 are defined with respect to a
common alphabet Σ.
Let L1 be the language generated by regular expression r1.
L1 has finite automaton FA1.
Let L2 be the language generated by regular expression r2.
L2 has finite automaton FA2.
Regular expression r1 + r2 generates the language L1 + L2.
Recall L1 + L2 = {w ∈ Σ* : w ∈ L1 or w ∈ L2}.
Thus, w ∈ L1 + L2 if and only if w is accepted by either FA1 or
FA2 (or both).
We need FA3 to accept a string if the string is accepted by FA1 or
FA2 or both.
We do this by constructing a new machine FA3 that simultaneously
keeps track of where the input would be if it were running on FA1
and where the input would be if it were running on FA2.
Suppose FA1 has states x1, x2, ..., xm, and FA2 has states y1, y2, ..., yn.
Assume that x1 is the start state of FA1 and that y1 is the start
state of FA2.
We will create FA3 with states of the form (xi, yj).
The number of states in FA3 is at most mn, where m is the number
of states in FA1 and n is the number of states in FA2.
Each state in FA3 corresponds to a state in FA1 and a state in
FA2.
FA3 accepts string w if and only if either FA1 or FA2 accepts w.
So final states of FA3 are those states (x, y) such that x is a final
state of FA1 or y is a final state of FA2.
We use the following algorithm to construct FA3 from FA1 and FA2.
Suppose that Σ is the alphabet for both FA1 and FA2.
Given FA1 = (K1, Σ, δ1, s1, F1) with
Set of states K1 = {x1, x2, ..., xm}
s1 = x1 is the initial state
F1 ⊆ K1 is the set of final states of FA1.
δ1 : K1 × Σ → K1 is the transition function for FA1.
Given FA2 = (K2, Σ, δ2, s2, F2) with
Set of states K2 = {y1, y2, ..., yn}
s2 = y1 is the initial state
F2 ⊆ K2 is the set of final states of FA2.
δ2 : K2 × Σ → K2 is the transition function for FA2.
We then define FA3 = (K3, Σ, δ3, s3, F3) with
Set of states K3 = K1 × K2 = {(x, y) : x ∈ K1, y ∈ K2}
The alphabet of FA3 is Σ.
FA3 has transition function δ3 : K3 × Σ → K3 with
δ3((x, y), σ) = (δ1(x, σ), δ2(y, σ)).
The initial state s3 = (s1, s2).
The set of final states
F3 = {(x, y) ∈ K1 × K2 : x ∈ F1 or y ∈ F2}.
Since K3 = K1 × K2, the number of states in the new machine FA3
is |K3| = |K1| · |K2|.
But we can leave out a state (x, y) ∈ K1 × K2 from K3 if (x, y)
is not reachable from FA3's initial state (s1, s2).
This would result in fewer states in K3, but still we have |K1| · |K2|
as an upper bound for |K3|; i.e., |K3| ≤ |K1| · |K2|.
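A sketch of this product construction in code (the tuple format (states, start, finals, delta) and the helper names are mine): FA3 runs FA1 and FA2 in parallel and accepts when either component would accept.

    def union_fa(fa1, fa2, alphabet=("a", "b")):
        (K1, s1, F1, d1), (K2, s2, F2, d2) = fa1, fa2
        K3 = [(x, y) for x in K1 for y in K2]
        d3 = {((x, y), c): (d1[(x, c)], d2[(y, c)]) for (x, y) in K3 for c in alphabet}
        F3 = {(x, y) for (x, y) in K3 if x in F1 or y in F2}
        return K3, (s1, s2), F3, d3

    def run(fa, word):
        _, start, finals, delta = fa
        state = start
        for c in word:
            state = delta[(state, c)]
        return state in finals

    # FA1 accepts words containing b; FA2 accepts words of even length.
    fa1 = ({"p", "q"}, "p", {"q"}, {("p", "a"): "p", ("p", "b"): "q",
                                    ("q", "a"): "q", ("q", "b"): "q"})
    fa2 = ({"e", "o"}, "e", {"e"}, {("e", "a"): "o", ("e", "b"): "o",
                                    ("o", "a"): "e", ("o", "b"): "e"})
    fa3 = union_fa(fa1, fa2)
    print(run(fa3, "aaa"))   # False: no b and odd length
    print(run(fa3, "aa"))    # True: even length
    print(run(fa3, "ab"))    # True: contains b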
Example: L1 = {words with b as second letter}
with regular expression r1 = (a + b)b(a + b)*
L2 = {words with odd number of a's}
with regular expression r2 = b*a(b + ab*a)*
[Figure: FA1 for L1 (states x1, x2, x3, x4), FA2 for L2 (states y1, y2), and the product machine FA3 for L1 + L2 whose states are pairs such as (x1, y1), (x2, y1), (x2, y2), (x3, y1), (x3, y2), (x4, y1), (x4, y2).]
Rule 3: If there is an FA called FA1 that accepts the language defined by
the regular expression r1 and there is an FA called FA2 that accepts the
language defined by the regular expression r2, then there is an FA called
FA3 that accepts the language defined by the regular expression r1r2.
For this part,
we need FA3 to accept a string if the string can be factored into
two substrings, where the first factor is accepted by FA1 and the
second factor is accepted by FA2.
One problem is we don't know when we reach the end of the first
factor and the beginning of the second factor.
Example: L1 = {words that end with aa}
with regular expression r1 = (a + b)*aa
L2 = {words with odd length}
with regular expression r2 = (a + b)((a + b)(a + b))*
The machine FA3 keeps track of the single state that FA1 is in together with
the set of states that copies of FA2 could be in. Its transition function is
δ3({x, y1, ..., yn}, σ)
  = {δ1(x, σ), δ2(y1, σ), ..., δ2(yn, σ)}      if δ1(x, σ) ∉ F1,
  = {δ1(x, σ), δ2(y1, σ), ..., δ2(yn, σ), s2}  if δ1(x, σ) ∈ F1,
where {x, y1, ..., yn} ∈ K3, n ≥ 0, x ∈ K1, yi ∈ K2 for
i = 1, ..., n, and σ ∈ Σ.
Final states
F3 = {{x, y1, ..., yn} : n ≥ 1, yi ∈ F2 for some i = 1, ..., n}.
The number of states in FA3 is
|K3| = |K1| · |2^K2| = |K1| · 2^|K2|.
Actually, we can leave out from K3 any states {x, y1, ..., yn} that are not
reachable from the initial state of FA3, so |K1| · 2^|K2| is only an upper
bound on |K3|.
[Figure: FA diagrams for the example with r1 = (a + b)*aa and r2 = (a + b)((a + b)(a + b))*.]
Rule 4: If there is an FA called FA1 that accepts the language defined by the
regular expression r1, then there is an FA called FA2 that accepts the language
defined by the regular expression r1*.
Basic idea of how to build machine FA2:
Each state of FA2 corresponds to one or more states of FA1.
FA2 initially acts like FA1.
When FA2 hits a final state of FA1, then FA2 simultaneously keeps
track of how the rest of the string would be processed on FA1 from
where it left off and how the rest of the string would be processed
on FA1 starting in the start state.
Whenever FA2 hits a final state of FA1, we have to start a new
process starting in the start state of FA1 (if no version of FA1 is
currently in its start state).
The final states of FA2 are those states which have a correspondence
to some final state of FA1.
We need to be careful about making sure that FA2 accepts Λ.
To have FA2 accept Λ, we make the start state of FA2 also a final
state.
But we need to be careful when there are arcs going into the start
state of FA1.
Formally, we build the machine FA2 for L1* as follows:
Let L1 be the language generated by regular expression r1 and having
finite automaton FA1 = (K1, Σ, δ1, s1, F1).
For now, assume that FA1 does not have any arcs entering the
initial state s1.
Know that language L1 is generated by regular expression r1.
Define FA2 = (K2, Σ, δ2, s2, F2) for L1* with
States K2 = 2^K1.
Initial state s2 = {s1}.
Transition function δ2 : K2 × Σ → K2 with
δ2({x1, ..., xn}, σ)
  = {δ1(x1, σ), ..., δ1(xn, σ)}      if δ1(xk, σ) ∉ F1 for all k = 1, ..., n,
  = {δ1(x1, σ), ..., δ1(xn, σ), s1}  if δ1(xk, σ) ∈ F1 for some k = 1, ..., n,
where {x1, ..., xn} ∈ K2, n ≥ 1, xi ∈ K1 for all i = 1, ..., n,
and σ ∈ Σ.
Final states
F2 = {{s1}} + {{x1, ..., xn} : n ≥ 1, xi ∈ F1 for some i = 1, ..., n}.
The number of states in FA2 is
|K2| = |2^K1| = 2^|K1|.
Actually, we can leave out from K2 any state {x1, ..., xn} that
is not reachable from the initial state s2.
In this case, 2^|K1| still provides an upper bound for |K2|; i.e.,
|K2| ≤ 2^|K1|.
Example: Consider language L having regular expression
r = (a + bb*ab*a)((b + ab*a)b*a)*b*
[Figure: a four-state FA for L with states x1 (start), x2 (final), x3, x4, and the FA for L* constructed from it, whose states are sets of FA states such as {x1}, {x2, x1}, {x4, x2, x1}, {x3, x4}, {x2, x1, x3}, and {x1, x2, x3, x4}.]
Example: Consider language L having regular expression
(a + b)*b
Need to be careful since we can return to the start state.
[Figure: two-state FA for L with states x1 (start) and x2 (final).]
If we blindly applied the previous method for constructing the FA for L*, we
get the following:
[Figure: FA with states {x1} (start, final) and {x2, x1} (final).]
Problem:
Note that the start state is a final state.
But this FA accepts a ∉ L*, so the construction must be modified when FA1
has arcs entering its start state.
Chapter 9
Regular Languages
Theorem 10 If L1 and L2 are regular languages, then L1 + L2, L1L2, and L1*
are also regular languages.
Proof. (by regular expressions)
If L1 and L2 are regular languages, then there are regular expressions r1
and r2 that define these languages.
r1 + r2 is a regular expression that defines the language L1 + L2, and so
L1 + L2 is a regular language.
r1r2 is a regular expression that defines the language L1L2, and so L1L2
is a regular language.
r1* is a regular expression that defines the language L1*, and so L1* is a
regular language.
Proof. (by machines)
If L1 and L2 are regular languages, then there are transition graphs TG1
and TG2 that accept them by Kleene's Theorem.
We may assume that TG1 has a unique start state and unique final state,
and the same for TG2.
We construct the TG for L1 + L2 as follows:
[Figure: a new start state with Λ-arcs to the start states of TG1 and TG2.]
We construct the TG for L1L2 as follows:
[Figure: TG1 followed by TG2, with a new start state, a Λ-arc from the final state of TG1 to the start state of TG2, and a new final state.]
We construct the TG for L1* as follows:
[Figure: a new start state and a new final state joined to TG1 by Λ-arcs, with a Λ-arc from the original final state back to the original start state.]
Remarks:
The technique given in the tapes of lectures 11 and 12 is wrong.
To see why, consider the following FA for the language
L = {words having an odd number of b's}
[Figure: two-state FA for L.]
Note that L* = {Λ} + {words having at least one b} = language(Λ + (a + b)*b(a + b)*).
If we use the technique for L* given
in the taped lecture, then we get the following:
[Figure: the same FA with the start state also made a final state.]
However, the above TG accepts the string a ∉ L*.
On the other hand, if we use the method presented above to construct
a TG for L*, we obtain a correct machine.
Example: r1 = (a + b)*a and r2 = (a + b)*aa(a + b)*.
[Figure: FA1 for r1 and FA2 for r2.]
r1 + r2 = (a + b)*a + (a + b)*aa(a + b)*
[Figure: TG for r1 + r2, formed with a new start state and Λ-arcs to the start states of FA1 and FA2.]
[Figure: TG for r1r2, formed by joining the final state of FA1 to the start state of FA2 with a Λ-arc.]
[Figure: TG for r2*, formed with a new start state and a new final state joined to FA2 by Λ-arcs.]
9.2 Complementation of Regular Languages

Definition: If L is a language over the alphabet Σ, we define L' to be its complement, which is the language of all strings of letters from Σ that are not words in L; i.e.,
L' = {w ∈ Σ* : w ∉ L}.
Example: alphabet Σ = {a, b}. If L = the language of all words in Σ* (i.e., L = Σ*), then L' = ∅.
Theorem 11 If L is a regular language, then L' is also a regular language.

Proof.
- Kleene's Theorem implies that there is an FA that accepts L.
- Construct a new machine FA' from FA as follows: make every final state of FA a nonfinal state of FA', and make every nonfinal state of FA a final state of FA'.
- FA' accepts exactly the strings that FA rejects, so FA' accepts L'.
- Kleene's Theorem implies that L' is a regular language.
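A minimal code sketch of this construction (not from the text): given a deterministic FA with a transition for every letter, represented here as Python dicts (an illustrative representation, not the book's), the machine for L' keeps the same states and transitions and simply swaps which states are final.

```python
def complement_fa(states, delta, start, finals):
    """FA for the complement language: same states and transitions,
    but final and nonfinal states are exchanged."""
    return states, delta, start, set(states) - set(finals)

def accepts(delta, start, finals, word):
    """Run a word through a deterministic FA."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

# Example: an FA over {a, b} accepting words that end in b.
states = {"q0", "q1"}
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
_, _, _, co_finals = complement_fa(states, delta, "q0", {"q1"})
assert not accepts(delta, "q0", co_finals, "ab")  # ab ends in b, so ab is not in L'
assert accepts(delta, "q0", co_finals, "ba")      # ba does not end in b, so ba is in L'
```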
CHAPTER 9. REGULAR LANGUAGES 9-9
Example: Σ = {a, b}
L = all words with length at least 2 and second letter b

Theorem 12 If L1 and L2 are regular languages, then L1 ∩ L2 is a regular language.

Proof. By DeMorgan's law, L1 ∩ L2 = (L1' + L2')'.

[Figure: schematic of the construction L1, L2 → L1', L2' → L1' + L2' → (L1' + L2')' = L1 ∩ L2.]
- Since L1 and L2 are regular languages, Theorem 11 implies that L1' and L2' are regular languages.
- Theorem 10 then implies that L1' + L2' is a regular language.
- Theorem 11 then implies that (L1' + L2')' is a regular language.
CHAPTER 9. REGULAR LANGUAGES 9-11
Example: alphabet Σ = {a, b}
L1 = all words with length at least 2 and second letter b
L2 = all words containing the substring ab.
CHAPTER 9. REGULAR LANGUAGES 9-12
[Figures: FA1 for L1 (states x1, x2, x3, x4; start x1, final x3) and FA2 for L2 (states y1, y2, y3; start y1, final y3), together with the complement machines FA1' and FA2' obtained by swapping the final and nonfinal states.]

r1 = (a+b)b(a+b)*        r2 = (a+b)*ab(a+b)*
CHAPTER 9. REGULAR LANGUAGES 9-13
[Figures: the FA for L1' + L2', whose states are pairs such as (x1, y1) (start), (x2, y1), (x2, y2), (x3, y1), (x3, y2), (x3, y3), (x4, y2), (x4, y3), and the FA for (L1' + L2')' obtained from it by swapping final and nonfinal states.]
CHAPTER 9. REGULAR LANGUAGES 9-14
As an exercise, we will now derive a regular expression for L1 ∩ L2 using our FA for (L1' + L2')'.

For any word w ∈ Σ*, define na(w) to be the number of a's in w, and nb(w) to be the number of b's in w.
- Define the language L = {w ∈ Σ* : na(w) ≥ nb(w)}; i.e., L consists of strings w for which the number of a's in w is at least as large as the number of b's in w.
- For example, abbaa ∈ L since the string has 3 a's and 2 b's, and 3 ≥ 2.
- We can prove that L is a nonregular language using the pumping lemma.
- What string w ∈ L should we use to get a contradiction?
Example: Consider the language EQUAL = {Λ, ab, ba, aabb, abab, abba, baba, bbaa, . . .}, which consists of all words having an equal number of a's and b's. We now prove that EQUAL is a non-regular language.
CHAPTER 10. NONREGULAR LANGUAGES 10-11
- We will prove this by contradiction, so suppose that EQUAL is a regular language.
- Note that {a^n b^n : n ≥ 0} = a*b* ∩ EQUAL.
- Recall that the intersection of two regular languages is a regular language.
- Note that a*b* is a regular language, so {a^n b^n : n ≥ 0} would then be a regular language, contradicting the fact (shown earlier with the pumping lemma) that it is nonregular. Hence, EQUAL is not regular.

Definition: For a language Q and a language R, Pref(Q in R) = {p : pq ∈ R for some word q ∈ Q}; i.e., the prefixes of words of R that can be completed to a word of R by some word of Q.

Example: R = (ba)* and Pref(Q in R) = (ba)*b.
CHAPTER 10. NONREGULAR LANGUAGES 10-12
Theorem 16 If R is a regular language and Q is any language whatsoever, then the language
P = Pref(Q in R)
is regular.

Proof.
- Since R is a regular language, it has some finite automaton FA1 that accepts it.
- FA1 has one start state and some number of final states (possibly none or just one).
- For each state s in FA1, do the following:
  - Using s as the start state, process all words in the language Q on FA1.
  - When starting from s, if some word in Q ends in a final state of FA1, then paint state s blue.
- So for each state s in FA1 that is painted blue, there exists some word in Q that can be processed on FA1 starting from s and end up in a final state.
- Now construct another machine FA2:
  - FA2 has the same states and arcs as FA1.
  - The start state of FA2 is the same as that of FA1.
  - The final states of FA2 are the ones that were previously painted blue (regardless of whether they were final states in FA1).
- We will now show that FA2 accepts exactly the prefix language P = Pref(Q in R).
- To prove this, we have to show two things:
  - Every word in P is accepted by FA2.
  - Every word accepted by FA2 is in P.
- First, we show that every word accepted by FA2 is in P.
CHAPTER 10. NONREGULAR LANGUAGES 10-13
- Consider any word w accepted by FA2.
  - Starting in the start state of FA2, process the word w on FA2; we end up in a final state of FA2.
  - Final states of FA2 were painted blue.
  - Now we can start from here and process some word from Q and end up in a final state of FA1.
  - Thus, the word w ∈ P.
- Now we prove that every word in P is accepted by FA2.
  - Consider any word p ∈ P.
  - By definition, there exists some word q ∈ Q and a word w ∈ R such that pq = w.
  - This implies that if pq is processed on FA1, then we end up in a final state of FA1.
  - When processing the string pq on FA1, consider the state s we are in just after finishing processing p and at the beginning of processing q.
  - State s must be a blue state, since we can start here and process q and end in a final state.
  - Hence, by processing p, we start in the start state and end in the blue state s.
  - Thus, p is accepted by FA2.
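The blue-paint construction in this proof can be sketched in code. This is a minimal illustration (not from the text), assuming the FA is deterministic and given as dicts, and that Q is supplied as a finite list of words so the painting loop terminates; for an infinite Q a different test would be needed. All names are illustrative.

```python
def pref_fa(states, delta, start, finals, Q):
    """Build FA2 for Pref(Q in R): same machine, but the new final
    states are those s from which some word of Q reaches a final state."""
    def run_from(s, word):
        for letter in word:
            s = delta[(s, letter)]
        return s

    blue = {s for s in states
            if any(run_from(s, q) in finals for q in Q)}
    return states, delta, start, blue    # blue states are the new finals

# Example: R = words ending in b (the FA below), Q = ["b"].
states = {"q0", "q1"}
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
_, _, _, blue = pref_fa(states, delta, "q0", {"q1"}, ["b"])
# From every state, reading "b" ends in the final state, so every state is blue.
assert blue == {"q0", "q1"}
```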
Chapter 11
Decidability for Regular
Languages
11.1 Introduction
We have three basic questions to answer:
1. How can we tell if two regular expressions dene the same language?
2. How can we tell if two FAs are equivalent?
3. How can we tell if the language dened by an FA has nitely many or
innitely many words in it?
Note that questions 1 and 2 are essentially the same by Kleenes Theorem.
11.2 Decidable Problems
Denition: A problem is eectively solvable if there is an algorithm that
provides the answer in a nite number of steps, no matter what the particular
inputs are (but may depend on the size of the problem).
The maximum number of steps the algorithm will take must be predictable
before we begin executing the procedure.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-2
Example: Problem: find the roots of the quadratic equation ax^2 + bx + c = 0.

Solution: use the quadratic formula
x = (−b ± √(b^2 − 4ac)) / (2a)

No matter what the coefficients a, b, and c are, we can compute the solution using the following operations:
- four multiplications
- two subtractions
- one square root
- one division

Another solution: keep guessing until we find a root.
This approach is not guaranteed to find a root in a fixed number of steps.
Example: Find the maximum of n numbers. An eective solution for this is
to scan through the list once while updating the maximum observed thus far.
This takes O(n) steps.
Denition: An eective solution to a problem that has a yes or no answer is
called a decision procedure. A problem that has a decision procedure is called
decidable.
11.2.1 Is L1 = L2?

Determine if two languages L1 and L2 are the same:

Method 1: Check if the language
L3 = (L1 ∩ L2') + (L1' ∩ L2)
has any words (even Λ).
- If L1 = L2, then L3 = ∅.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-3
- If L1 ≠ L2, then L3 ≠ ∅.

Example: Suppose L1 = {a, aa} and L2 = {a, aa, aaa}. Then L1 ∩ L2' = ∅, but L1' ∩ L2 = {aaa}. Thus, L1 ≠ L2.

So now we have reduced the problem of determining if L1 = L2 to determining if L3 = ∅.
11.2.2 Is L = ∅?

So we need a method for determining if a regular language is empty.
- Since the language is regular, it has a regular expression and an FA.
- Given a regular expression, check if there is any part that is not concatenated with ∅.
- Specifically, use the following algorithm to determine if L = ∅ given a regular expression r for L:
Method 1 (for deciding if a language L = ∅ given a regular expression r for L):
- Write r as
  r = r1 + r2 + · · · + rn,
  where for each i = 1, 2, . . . , n, ri = r_{i,1} r_{i,2} · · · r_{i,ji} for some ji ≥ 1; i.e., r is written as a sum of other regular expressions ri, i = 1, 2, . . . , n, where each ri is a concatenation of regular expressions.
- It is always possible to write any regular expression r in this form.
- If there exists some i = 1, 2, . . . , n such that r_{i,j} ≠ ∅ for all 1 ≤ j ≤ ji, then L ≠ ∅. In other words, if one of the summands has none of its factors being ∅, then the language L is not empty.
- If for each i = 1, 2, . . . , n, at least one of r_{i,1}, r_{i,2}, . . . , r_{i,ji} is ∅, then L = ∅. In other words, if each of the summands has at least one factor being ∅, then the language L is empty.
Example: The regular expression
(b + a)*∅ + b
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-4
has the last b not concatenated with ∅, so the language is not empty.
Example: The regular expression
(b + a)*∅ + b∅
has all parts concatenated with ∅, so the language is empty.
Remarks: The algorithm in the book for determining if L = ∅ given a regular expression for L is incorrect.
Method 2 (for deciding if a language L = ∅): Given an FA, we check if there are any paths from the − state to some + state by using the blue paint algorithm:
1. Paint the start state blue.
2. From every blue state, follow each edge that leads out of it and paint the connecting state blue, then delete this edge from the machine.
3. Repeat Step 2 until no new state is painted blue, then stop.
4. When the procedure has stopped, if any of the final states are painted blue, then the machine accepts some words, and if not, the machine accepts no words.

Remarks on Method 2:
- The above algorithm will iterate Step 2 at most N times, where N is the number of states in the machine.
- Thus, it is a decision procedure.
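A minimal sketch of the blue paint algorithm (not from the text), assuming the FA is given as a dict of transitions; the painting is simply a reachability search from the start state, and it stops after at most N rounds.

```python
def accepts_some_word(states, delta, start, finals):
    """Blue paint algorithm: paint the start state, then repeatedly
    paint every state reachable by one more edge.  The language is
    nonempty exactly when some final state ends up painted blue."""
    blue = {start}
    changed = True
    while changed:                       # at most |states| iterations
        changed = False
        for (state, _letter), target in delta.items():
            if state in blue and target not in blue:
                blue.add(target)
                changed = True
    return any(f in blue for f in finals)
```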
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-5
Example:

[Figure: a worked example of the blue paint algorithm on a five-state FA; at each step (=>) one more state reachable from the start state is painted blue and the followed edges are removed, until no new state turns blue.]
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-6
Theorem 17 Let F be an FA with N states. Then if F accepts any strings at all, it accepts some string with N − 1 or fewer letters.

Proof.
- Consider any string w that is accepted by F.
- Let s = w and DONE = NO.
- Do while (DONE == NO)
  - Trace the path of s through F.
  - If there are no circuits in the path, then set DONE = YES.
  - If there are circuits in the path, then
    - eliminate the first circuit in the path;
    - let s be the string resulting from the new path.
- Resulting path:
  - Starts in the initial state.
  - Ends in a final state.
  - Has no circuits, so it visits at most N states.
  - This corresponds to a string of at most N − 1 letters.
  - The string is accepted by the FA.

Method 3 (for deciding if a language L = ∅): Test all words with N − 1 or fewer letters by running them on the FA.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-7
Example: Consider the languages L1 and L2 with FAs FA1 (states x1, . . . , x5) and FA2 (states y1, y2, y3):

[Figures: FA1 and FA2, and the product machine on state pairs (x1, y1) (start), (x2, y1), (x3, y2), (x4, y3), (x5, y2), which serves as the FA for both L1 ∩ L2 and L1 + L2 (with the appropriate choice of final states).]
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-8
Theorem 18 There are effective procedures to decide whether:
1. A given FA accepts any words.
2. Two FAs are equivalent; i.e., the two FAs accept the same language.
3. Two regular expressions are equivalent; i.e., the two regular expressions generate the same language.

Remarks:
- We can establish part 3 of Theorem 18 by first converting the regular expressions into FAs.
- We previously saw an effective procedure for doing this in the proof of Kleene's Theorem.
- Then we just developed an effective procedure to decide whether two FAs are equivalent.

11.2.3 Is L infinite?

Determining if a language L is infinite:
- If we have a regular expression for L, then all we need to do is check if the * is applied to some part of the regular expression that is not Λ nor ∅.
- Note that Λ* = Λ and ∅* = Λ.
- Note that a* is infinite.
Theorem 19 Let F be an FA with N states. Then
1. If F accepts an input string w such that
   N ≤ length(w) < 2N,
   then F accepts an infinite language.
2. If F accepts infinitely many words, then F accepts some word w such that
   N ≤ length(w) < 2N.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-9
Proof.
1. Assume that F accepts an input string w such that N ≤ length(w) < 2N.
   - Since length(w) ≥ N, the second version of the pumping lemma (Theorem 14) implies that there exist substrings x, y, and z such that y ≠ Λ and xy^n z, n = 0, 1, 2, . . ., are all accepted by F.
   - Thus, the FA accepts infinitely many words.
2. Assume that F accepts infinitely many words.
   - This implies that there exists some word u accepted by F whose path contains a circuit (possibly more than one). Why?
   - Each circuit can consist of at most N states, since F has only N states.
   - Iteratively eliminate the first circuit in the path until only one circuit is left (as in the proof of Theorem 17).
   - Let v correspond to the word from this one-circuit path, and note that v is accepted by F.
   - We can write v as the concatenation of three strings x, y, and z, i.e.,
     v = xyz,
     such that
     - x consists of the letters read before the circuit,
     - y consists of the letters read along the circuit,
     - z consists of the letters read after the circuit.
   - We can show that 0 < length(y) ≤ N as follows:
     - Since we have eliminated all but the first circuit, the circuit starts and ends in the same state, and all of its other states are distinct.
     - Thus, the circuit can visit at most N + 1 states (with at most one state repeated).
     - This corresponds to reading at most N letters.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-10
     - Also, since a circuit corresponds to at least one transition and each transition in an FA uses up exactly one letter, we see that length(y) > 0.
   - We can show that length(x) + length(z) < N as follows:
     - Since we constructed the string v by eliminating all but the first circuit, the paths followed by processing x and z have no circuits.
     - Thus, all of the states visited along the paths followed by processing x and z are distinct.
     - Hence, the paths followed by processing x and z visit at most N states.
     - This means that length(x) + length(z) ≤ N − 1 < N.
   - Thus,
     length(v) = length(x) + length(y) + length(z) ≤ N − 1 + N < 2N.
   - If v has at least N letters, then we are done.
   - If v has fewer than N letters, then we can pump up the circuit some number of times to obtain a word that has the desired characteristics, since 0 < length(y) ≤ N.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-11
Example:

[Figure: an FA with six states 1 (start), 2, 3, 4, 5, 6 (final) over the alphabet {a, b}.]
- Consider the word w = abaaaababbabb.
  - length(w) = 13 > 2N = 12.
  - w is accepted by the FA.
  - Processing w on the FA takes the path
    1 2 5 3 [circuit 1] 2 4 5 [circuit 2] 4 5 [circuit 3] 4 6 5 [circuit 4] 4 6
  - Bypassing all but the first circuit yields the path
    1 2 5 3 2 4 6,
    which corresponds to the word abaaab, which has length 6.
  - Thus, Theorem 19 implies that the FA accepts an infinite language.
- Consider the word w = bbaabb.
  - length(w) = 6 = N.
  - w is accepted by the FA.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-12
  - Processing w on the FA takes the path
    1 3 [circuit 1] 3 2 4 6 [circuit 2] 6
  - Bypassing all but the first circuit yields the path
    1 3 [circuit 1] 3 2 4 6,
    which corresponds to the word bbaab, which has length 5.
  - However, we can go around the circuit one more time, yielding the path
    1 3 [circuit 1] 3 [circuit 1] 3 2 4 6,
    which corresponds to the word bbbaab, which has length 6.
CHAPTER 11. DECIDABILITY FOR REGULAR LANGUAGES 11-13
Theorem 20 There is an effective procedure to decide whether a given FA accepts a finite or an infinite language.

Proof.
- Suppose that the FA has N states.
- Suppose that the alphabet Σ consists of m letters.
- Then by Theorem 19, we only need to check all strings w with
  N ≤ length(w) < 2N
  to determine if the FA accepts an infinite language.
- If any of these are accepted, then the FA accepts an infinite language. Otherwise, it accepts a finite language.
- The number of strings w satisfying N ≤ length(w) < 2N is
  m^N + m^{N+1} + m^{N+2} + · · · + m^{2N−1},
  which is finite.
- Thus, checking all of these strings is an effective procedure.
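This brute-force decision procedure can be sketched directly. A minimal illustration (not from the text), assuming a deterministic FA given as dicts: it runs every string w with N ≤ length(w) < 2N through the machine, exactly as in the proof above. The cost is exponential in N, but it is a finite, predictable number of steps, so it is an effective procedure.

```python
from itertools import product

def accepts(delta, start, finals, word):
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

def is_infinite(states, alphabet, delta, start, finals):
    """Decide whether the FA accepts an infinite language by testing
    all strings w with N <= length(w) < 2N (Theorems 19 and 20)."""
    N = len(states)
    for length in range(N, 2 * N):
        for letters in product(alphabet, repeat=length):
            if accepts(delta, start, finals, letters):
                return True
    return False
```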
Chapter 12
Context-Free Grammars
12.1 Introduction
English grammar has rules for constructing sentences; e.g.,
1. A sentence can be a subject followed by a predicate .
2. A subject can be a noun-phrase .
3. A noun-phrase can be an adjective followed by a noun-phrase .
4. A noun-phrase can be an article followed by a noun-phrase .
5. A noun-phrase can be a noun .
6. A predicate can be a verb followed by a noun-phrase .
7. A noun can be:
person sh stapler book
8. A verb can be:
buries touches grabs eats
9. An adjective can be:
big small
10. An article can be:
the a an
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-2
These rules can be used to construct the following sentence:
The small person eats the big sh
sentence subject predicate Rule 1
noun-phrase predicate Rule 2
noun-phrase verb noun-phrase Rule 6
article noun-phrase verb noun-phrase Rule 4
article adjective noun-phrase verb noun-phrase Rule 3
article adjective noun verb noun-phrase Rule 5
article adjective noun verb article noun-phrase Rule 4
article adjective noun verb article adjective noun-phrase Rule 3
article adjective noun verb article adjective noun Rule 5
the adjective noun verb article adjective noun Rule 10
the small noun verb article adjective noun Rule 9
the small person verb article adjective noun Rule 7
the small person eats article adjective noun Rule 8
the small person eats the adjective noun Rule 10
the small person eats the big noun Rule 9
the small person eats the big sh Rule 7
Denition: The things that cannot be replaced by anything are called ter-
minals.
Denition: The things that must be replaced by other things are called
nonterminals.
In the above example,
small and eats are terminals.
noun-phrase and verb are nonterminals.
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-3
Example: a restricted class of arithmetic expressions on integers.

start → AE
AE → AE + AE
AE → AE − AE
AE → AE * AE
AE → AE / AE
AE → AE ** AE
AE → (AE)
AE → −AE
AE → ANY-NUMBER

nonterminals: start, AE
terminals: ANY-NUMBER, +, −, *, /, **, (, )

Can generate the arithmetic expression
ANY-NUMBER + (ANY-NUMBER * ANY-NUMBER) / ANY-NUMBER
as follows:

start ⇒ AE
      ⇒ AE + AE
      ⇒ AE + AE / AE
      ⇒ AE + (AE) / AE
      ⇒ AE + (AE * AE) / AE
      ⇒ ANY-NUMBER + (AE * AE) / AE
      ⇒ ANY-NUMBER + (ANY-NUMBER * AE) / AE
      ⇒ ANY-NUMBER + (ANY-NUMBER * ANY-NUMBER) / AE
      ⇒ ANY-NUMBER + (ANY-NUMBER * ANY-NUMBER) / ANY-NUMBER
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-4
Could also make ANY-NUMBER a nonterminal:

Rule 1   ANY-NUMBER → FIRST-DIGIT
Rule 2   FIRST-DIGIT → FIRST-DIGIT OTHER-DIGIT
Rule 3   FIRST-DIGIT → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Rule 4   OTHER-DIGIT → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

In this case,
nonterminals: ANY-NUMBER, FIRST-DIGIT, OTHER-DIGIT
terminals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Can produce the number 90210 as follows:

Rule 1   ANY-NUMBER ⇒ FIRST-DIGIT
Rule 2   ⇒ FIRST-DIGIT OTHER-DIGIT
Rule 2   ⇒ FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT
Rule 2   ⇒ FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT
Rule 2   ⇒ FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT
Rule 3   ⇒ 9 OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT
Rule 4   ⇒ 9 0 OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT
Rule 4   ⇒ 9 0 2 OTHER-DIGIT OTHER-DIGIT
Rule 4   ⇒ 9 0 2 1 OTHER-DIGIT
Rule 4   ⇒ 9 0 2 1 0
Note that we had rules of the form:
one nonterminal → string of nonterminals
or
one nonterminal → choice of terminals

Definition: The sequence of applications of the rules that produces the finished string of terminals from the starting symbol is called a derivation or production.
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-5
12.2 Context-Free Grammars

Example: terminals: Σ = {a}
nonterminal: Γ = {S}
productions:
S → aS
S → Λ

Can generate a^4 as follows:
S ⇒ aS
  ⇒ aaS
  ⇒ aaaS
  ⇒ aaaaS
  ⇒ aaaaΛ = aaaa

Example: terminal: a
nonterminal: S
productions:
S → SS
S → a
S → Λ

Can write this in more compact notation:
S → SS | a | Λ
which is called the Backus Normal Form or Backus-Naur Form (BNF).

The CFL of this CFG is a*.

Can generate a^2 as follows:
S ⇒ SS
  ⇒ SSS
  ⇒ SSa
  ⇒ SSSa
  ⇒ SaSa
  ⇒ aSa
  ⇒ aΛa = aa
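Grammars like these can be represented directly as data, and a derivation is just repeated replacement of a nonterminal in the working string. The following minimal sketch (not from the text) encodes the first grammar, S → aS | Λ, as a Python dict (with "" standing for Λ) and replays the derivation of a^4 shown above; all names are illustrative.

```python
# Productions of the CFG S -> aS | Lambda, with "" standing for Lambda.
productions = {"S": ["aS", ""]}
nonterminals = set(productions)

def apply_leftmost(working, rhs):
    """Replace the leftmost nonterminal in the working string by rhs."""
    for i, symbol in enumerate(working):
        if symbol in nonterminals:
            return working[:i] + rhs + working[i + 1:]
    raise ValueError("no nonterminal left to replace")

working = "S"
for rhs in ["aS", "aS", "aS", "aS", ""]:   # the derivation of aaaa
    assert rhs in productions["S"]
    working = apply_leftmost(working, rhs)
print(working)   # -> aaaa
```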
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-6
In previous example, unique way to generate any word.
Here, each word in CFL has innitely many derivations.
Definition: A context-free grammar (CFG) is a collection G = (Σ, Γ, R, S), with
1. A (finite) alphabet Σ of letters called terminals, from which we make the strings that will be the words of the language.
2. A finite set Γ of symbols called nonterminals, one of which is the symbol S (i.e., S ∈ Γ), standing for "start here."
3. A finite set R of productions, with R ⊆ Γ × (Σ + Γ)*. A production (N, w) ∈ R with N ∈ Γ and w ∈ (Σ + Γ)* is written as N → w.
Example: Consider again the CFG with productions S → aS and S → Λ. Let L1 be the language generated by this CFG, and let L2 = {a^n : n ≥ 0} = a*.

Claim: L1 = L2.
Proof:
- We first show that L2 ⊆ L1.
  - Consider a^n ∈ L2 for n ≥ 1. We can generate a^n by using the first production n times, and then the second production.
  - We can generate Λ ∈ L2 by using the second production only.
  - Hence L2 ⊆ L1.
- We now show that L1 ⊆ L2.
  - Since a is the only terminal, the CFG can only produce strings consisting of a's.
  - Thus, L1 ⊆ L2.
Note that
- There are two types of arrows:
  - → is used in the statement of productions.
  - ⇒ is used in the derivation of a word.
- In the above derivation of a^4, there were many unfinished stages that consisted of both terminals and nonterminals. These are called working strings.
- Λ is neither a nonterminal (since it cannot be replaced with something else) nor a terminal (since it disappears from the string).
12.3 Examples

Example: terminals: a, b
nonterminals: S
productions:
S → aS
S → bS
S → a
S → b
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-8
More compact notation:
S → aS | bS | a | b

Can produce the word abbab as follows:
S ⇒ aS
  ⇒ abS
  ⇒ abbS
  ⇒ abbaS
  ⇒ abbab
Let L1 be the CFL of this grammar, and let L2 be the language generated by the regular expression (a + b)+.

Claim: L1 = L2.
Proof:
- First we show that L2 ⊆ L1.
  - Consider any string w ∈ L2.
  - Read the letters of w from left to right.
  - For each letter read in, if it is not the last, then
    - use the production S → aS if the letter is a, or
    - use the production S → bS if the letter is b.
  - For the last letter of the word,
    - use the production S → a if the letter is a, or
    - use the production S → b if the letter is b.
  - In each stage of the derivation, the working string has the form
    (string of terminals)S.
  - Hence, we have shown how to generate w using the CFG, which means that w ∈ L1.
  - Hence, L2 ⊆ L1.
- Now we show that L1 ⊆ L2.
  - To show this, we need to show that if w ∈ L1, then w ∈ L2.
  - This is equivalent to showing that if w ∉ L2, then w ∉ L1.
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-9
  - Note that the only string w ∉ L2 is w = Λ.
  - But note that Λ cannot be generated by the CFG, so Λ ∉ L1.
  - Hence, we have proven that L1 ⊆ L2.
Example: terminals: a, b
nonterminals: S, X, Y
productions:
S → X | Y
X → Λ
Y → aY | bY | a | b

- Note that if we use the first production (S → X), then the only word we can generate is Λ.
- The second production (S → Y) leads to a collection of productions identical to the previous example.
- Thus, the second production produces (a + b)+.
- The CFL of this CFG is (a + b)*.

Example: terminals: a, b
nonterminals: S
productions:
S → aS | bS | a | b | Λ

- The CFL is (a + b)*.
- For this CFG, the sequence of productions used to generate a word is not unique.
- E.g., we can generate bab using
  S ⇒ bS
    ⇒ baS
    ⇒ babS
    ⇒ babΛ = bab
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-10
or
  S ⇒ bS
    ⇒ baS
    ⇒ bab
Example: terminals: a, b
nonterminals: S, X
productions:
S → XaaX
X → aX | bX | Λ

- The last set of productions generates any word from (a + b)*.
- The CFL is (a + b)*aa(a + b)*.
Example: the language EVEN-EVEN of all words with an even number of a's and an even number of b's, which has regular expression
[aa + bb + (ab + ba)(aa + bb)*(ab + ba)]*
and a CFG with productions
S → SS | BS | USU | Λ
B → aa | bb
U → ab | ba
- Consider any word generated from the regular expression for EVEN-EVEN. Let's examine the way it was generated using the regular expression, and show how to generate the same word using our CFG.
- Start our derivation using the CFG from S.
- Every time we iterate the outer star in the regular expression, we choose one of the three syllables.
  1. If we choose a syllable of type 1, then first use the production S → BS and then the production B → aa. Thus, we end up with a working string of aaS for this iteration of the outer star.
  2. If we choose a syllable of type 2, then first use the production S → BS and then the production B → bb. Thus, we end up with a working string of bbS for this iteration of the outer star.
  3. If we choose a syllable of type 3, then
     (a) First use the production S → SS.
     (b) Then change the first S using the production S → USU, resulting in USUS.
     (c) If the first (ab + ba) in the syllable (ab + ba)(aa + bb)*(ab + ba) is used to generate ab, then replace the first U in USUS using the production U → ab, resulting in abSUS. If the first (ab + ba) is used to generate ba, replace the first U using U → ba instead, and handle the second U according to the last (ab + ba) of the syllable. Thus,
CHAPTER 12. CONTEXT-FREE GRAMMARS 12-13
     we now have xSyS as a working string for this iteration of the outer star of the regular expression, where x is either ab or ba, and y is either ab or ba.
     (d) Now suppose the (aa + bb)* is iterated n times, n ≥ 0. If n = 0, then change the first S in xSyS using the production S → Λ, resulting in xΛyS = xyS. If n ≥ 1, then change the first S in xSyS using the production S → BS, and do this n times, resulting in xBBB···BSyS, where there are n B's in the clump of B's. Then change the first S using the production S → Λ, resulting in xBBB···BΛyS = xBBB···ByS, where there are n B's in the clump of B's. Finally, if on the kth iteration, k ≤ n, of the star in (aa + bb)* the syllable aa is chosen, replace the kth B using B → aa, and otherwise replace it using B → bb.

Recall that a production (N, w) ∈ R is written as
N → w
Definition: For a given CFG G = (Σ, Γ, R, S), W is a semiword if W ∈ Σ*Γ; i.e., W is a string of terminals (maybe none) concatenated with exactly one nonterminal (on the right).

Example: aabaN is a semiword if N is a nonterminal and a and b are terminals.
CHAPTER 13. GRAMMATICAL FORMAT 13-5
Definition: G = (Σ, Γ, R, S) is a regular grammar if (N, w) ∈ R implies w ∈ Σ*Γ + Σ*; i.e., the right-hand side of every production is either a semiword or a word.

Given a language L, let L0 be all words in L except Λ.
Theorem 23 If L is a CFL generated by a CFG G1 that includes Λ-productions, then there is another CFG G2 with no Λ-productions that generates L0.

Basic Idea.
We give a constructive algorithm to convert a CFG G1 with Λ-productions into an equivalent CFG G2 with no Λ-productions:
1. Delete all Λ-productions.
CHAPTER 13. GRAMMATICAL FORMAT 13-13
2. For each production
   X → something
   with at least one nullable nonterminal on the right-hand side, do the following for each possible nonempty subset of nullable nonterminals on the RHS:
   (a) create a new production
       X → new something
       where the new RHS is the same as the old RHS except with the entire current subset of nullable nonterminals removed;
   (b) do not create the production
       X → Λ
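The two steps above can be sketched in code. This is a minimal illustration (not from the text): productions are a dict mapping each nonterminal to a list of right-hand sides written as strings of one-letter symbols, with "" standing for Λ; the nullable nonterminals are found by a fixed point, and each production then gets one copy for every subset of nullable occurrences removed. All names are illustrative.

```python
from itertools import combinations

def eliminate_lambda(productions):
    """Return productions with no Lambda-productions, generating L - {Lambda}."""
    # Find the nullable nonterminals by a fixed-point computation.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for N, rhss in productions.items():
            if N not in nullable and any(all(s in nullable for s in rhs)
                                         for rhs in rhss):
                nullable.add(N)
                changed = True
    # For each production, add a copy for every subset of nullable
    # occurrences removed (the empty subset keeps the original);
    # never add N -> Lambda.
    new = {N: set() for N in productions}
    for N, rhss in productions.items():
        for rhs in rhss:
            spots = [i for i, s in enumerate(rhs) if s in nullable]
            for k in range(len(spots) + 1):
                for drop in combinations(spots, k):
                    cand = "".join(s for i, s in enumerate(rhs) if i not in drop)
                    if cand != "":
                        new[N].add(cand)
    return {N: sorted(rhss) for N, rhss in new.items()}

# The first example below: S -> a | Xb | aYa, X -> Y | Lambda, Y -> X | a.
g = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["X", "a"]}
print(eliminate_lambda(g))   # S gains b and aa; X -> Lambda disappears
```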
Example: CFG G1
S → a | Xb | aYa
X → Y | Λ
Y → X | a
has nullable nonterminals X, Y.

We create new productions:
Original Production    New Production
S → Xb                 S → b
S → aYa                S → aa
X → Y                  Nothing
Y → X                  Nothing

New CFG G2:
S → a | Xb | aYa | b | aa
X → Y
Y → X | a
CHAPTER 13. GRAMMATICAL FORMAT 13-14
Example: CFG G1
S → X | XY | Z
X → Z | Λ
Y → Wa | a
Z → WX | aZ | Zb
W → XYZ | bXa | Λ
has nullable nonterminals S, X, Z, W.

We create new productions:
Original Production    New Production
S → X                  Nothing
S → XY                 S → Y
S → Z                  Nothing
X → Z                  Nothing
Y → Wa                 Y → a
Z → WX                 Z → W and Z → X
Z → aZ                 Z → a
Z → Zb                 Z → b
W → XYZ                W → YZ, W → XY, and W → Y
W → bXa                W → ba

New CFG G2:
S → X | XY | Z | Y
X → Z
Y → Wa | a
Z → WX | aZ | Zb | W | X | a | b
W → XYZ | bXa | YZ | XY | Y | ba
We need to show two things:
1. All non-Λ words generated using the original CFG G1 can be generated using the new CFG G2.
2. All words generated using the new CFG G2 can be generated using the original CFG G1.
CHAPTER 13. GRAMMATICAL FORMAT 13-15
First we show that all non-Λ words generated using the original CFG G1 can be generated using the new CFG G2.
- Suppose our CFG G1 included the productions A → bBb and B → Λ.
- Suppose we had the following derivation of a word:
  S ⇒ . . .
    ⇒ baAaAa
    ⇒ babBbaAa        (from A → bBb)
    ⇒ . . .
    ⇒ babBbaabAa
    ⇒ babbaabAa       (from B → Λ)
    ⇒ . . .
- There would have been no difference if we had applied the production A → bb rather than A → bBb in the third line.
- More generally, we can see that any non-Λ word generated using the original CFG G1 can be generated using the new CFG G2.

Now we show that all words generated using the new CFG G2 can be generated using the original CFG G1.
- Note that each new production is just a combination of old productions (e.g., X → aYa and Y → Λ give X → aa).
- We can show that any derivation using G2 has a corresponding derivation using G1 that possibly uses a Λ-production.
- Hence, all words generated using the new CFG G2 can be generated using the original CFG G1.
13.2.2 Unit Productions

Definition: A production (N, w) ∈ R is a unit production if w ∈ Γ; i.e., the production is of the form
one nonterminal → one nonterminal
CHAPTER 13. GRAMMATICAL FORMAT 13-16
Theorem 24 If a language L is generated by a CFG G1 that has no Λ-productions, then there is also a CFG G2 for L with no Λ-productions and no unit productions.

Basic Idea.
Use the following rules to create the new CFG:
- For each pair of nonterminals A and B such that there is a production
  A → B
  or a chain of productions (a unit derivation)
  A ⇒ · · · ⇒ B,
  introduce the following new productions:
  - if the non-unit productions from B are
    B → s1 | s2 | . . . | sn,
    where the si ∈ (Σ + Γ)*, then create the productions A → s1 | s2 | . . . | sn.
Example: Consider processing the input string aabb on the PDA. See what happens when we process it:
CHAPTER 14. PUSHDOWN AUTOMATA 14-6
STATE      STACK    TAPE (letters already read are shown in parentheses; Δ is a blank)
START      Δ        aabb
READ1      Δ        (a)abb
PUSH a     a        (a)abb
READ1      a        (a)(a)bb
PUSH a     aa       (a)(a)bb
READ1      aa       (a)(a)(b)b
POP1       a        (a)(a)(b)b
READ2      a        (a)(a)(b)(b)
POP1       Δ        (a)(a)(b)(b)
READ2      Δ        (a)(a)(b)(b)(Δ)
POP2       Δ        (a)(a)(b)(b)(Δ)
ACCEPT     Δ        (a)(a)(b)(b)(Δ)
The language accepted by the PDA is
{a^n b^n : n = 0, 1, 2, . . .},
which is a nonregular language.

Proof. See pages 295–299 of the text.

So, why can PDAs accept certain nonregular languages?
- The STACK is memory with unlimited capacity.
- FAs only had a fixed amount of memory built in.
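A minimal sketch (not from the text) of how a stack gives a machine enough memory to accept {a^n b^n}: push an a for every a read, pop one for every b, and accept only if the letters come in the right order and the stack is empty at the end. This is a direct simulation of the deterministic PDA traced above, not a general PDA interpreter.

```python
def accepts_anbn(word):
    """Accept exactly the strings a^n b^n, n = 0, 1, 2, ..."""
    stack = []
    i = 0
    while i < len(word) and word[i] == "a":   # READ a's, PUSH a's
        stack.append("a")
        i += 1
    while i < len(word) and word[i] == "b":   # READ b's, POP a's
        if not stack:
            return False                      # more b's than a's: crash
        stack.pop()
        i += 1
    # Accept only if the whole word was read and the stack is empty.
    return i == len(word) and not stack

assert accepts_anbn("") and accepts_anbn("aabb")
assert not accepts_anbn("aab") and not accepts_anbn("ba")
```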
CHAPTER 14. PUSHDOWN AUTOMATA 14-7
14.3 Determinism and Nondeterminism
Denition: A PDA is deterministic if each input string can only be processed
by the machine in one way.
Denition: A PDA is nondeterministic if there is some string that can be
processed by the machine in more than one way.
A nondeterministic PDA
may have more than one edge with the same label leading out of a certain
READ state or POP state.
may have more than one arc leaving the START state.
Both deterministic and nondeterministic PDAs
may have no edge with a certain label leading out of a certain READ
state or POP state.
if we are in a READ or POP state and encounter a letter for which there
is no out-edge from this state, the PDA crashes.
Remarks:
For FAs, nondeterminism does not increase power of machines.
For PDAs, nondeterminism does increase power of machines.
CHAPTER 14. PUSHDOWN AUTOMATA 14-8
14.4 Examples
Example: Language PALINDROMEX, which consists of all words of the
form
sXreverse(s)
where s is any string generated by (a +b)
.
PALINDROMEX = X, aXa, bXb, aaXaa, abXba, baXab, bbXbb, . . .
Each word in PALINDROMEX has odd length and X in middle.
When processing word on PDA, rst read letters from TAPE and PUSH
letters onto STACK until read in X.
Then POP letters o STACK, and check if they are the same as rest of
input string on TAPE.
PDA:
Input alphabet = a, b, X
Stack alphabet = a, b
[Figure: the PDA for PALINDROMEX, with states START, READ1, PUSH a, PUSH b, READ2, POP1, POP2, POP3, and ACCEPT; the X-arc out of READ1 moves the machine from the pushing phase to the popping phase.]
CHAPTER 14. PUSHDOWN AUTOMATA 14-9
Example: The language ODDPALINDROME consists of all words over Σ = {a, b} having odd length that are the same forwards and backwards.

ODDPALINDROME = {a, b, aaa, aba, bab, bbb, aaaaa, . . .}

Remarks:
- For PALINDROMEX, it is easy to detect the middle of the word when reading the TAPE, since the middle is marked by X.
- For ODDPALINDROME, it is impossible to detect the middle of the word when reading the TAPE.
- We need to use nondeterminism.
[Figure: a nondeterministic PDA for ODDPALINDROME, with the same shape as the PALINDROMEX machine but with the X-transition replaced by nondeterministic a, b transitions into the popping phase.]
CHAPTER 14. PUSHDOWN AUTOMATA 14-10
Example: The language EVENPALINDROME consists of all words over Σ = {a, b} having even length that are the same forwards as backwards.

EVENPALINDROME = {s reverse(s) : s can be generated by (a + b)*}

[Figure: a nondeterministic PDA for EVENPALINDROME.]

Consider the input string baab. See what happens when we process it:
STATE      STACK    TAPE
START      Δ        baab
READ1      Δ        (b)aab
PUSH b     b        (b)aab
READ1      b        (b)(a)ab
PUSH a     ab       (b)(a)ab
READ1      ab       (b)(a)(a)b
POP1       b        (b)(a)(a)b
READ2      b        (b)(a)(a)(b)
POP2       Δ        (b)(a)(a)(b)
READ2      Δ        (b)(a)(a)(b)(Δ)
POP3       Δ        (b)(a)(a)(b)(Δ)
ACCEPT     Δ        (b)(a)(a)(b)(Δ)
Alternatively, we could have processed it as follows:

STATE      STACK    TAPE
START      Δ        baab
READ1      Δ        (b)aab
PUSH b     b        (b)aab
READ1      b        (b)(a)ab
POP1       Δ        (b)(a)ab
CRASH
This time the PDA crashes.
But since there is at least one way of processing the string baab that leads to an ACCEPT state, the string is accepted by the PDA.
CHAPTER 14. PUSHDOWN AUTOMATA 14-12
14.5 Formal Denition of PDA and More Ex-
amples
Denition: A pushdown automaton (PDA) is a collection of eight things:
1. An alphabet of input letters.
2. An input TAPE (innite in one direction), which initially contains the
input string to be processed followed by an innite number of blanks
3. An alphabet of STACK characters.
4. A pushdown STACK (innite in one direction), which initially contains
all blanks .
5. One START state that has only out-edges, no in-edges. Can have more
than one arc leaving the START state. There are no labels on arcs
leaving the START state.
6. Halt states of two kinds:
(a) zero or more ACCEPT states
(b) zero or more REJECT states
Each of which have in-edges but no out-edges.
7. Finitely many nonbranching PUSH states that introduce characters from
onto the top of the STACK.
8. Finitely many branching states of two kinds:
(a) READ states, which read the next unused letter from TAPE and
may have out-edges labeled with letters from or a blank .
(There is no restriction on duplication of labels and no requirement
that there be a label for each letter of , or .)
(b) POP states, which read the top character of STACK and may have
out-edges labeled with letters of and the blank character , with
no restrictions.
Remarks:
CHAPTER 14. PUSHDOWN AUTOMATA 14-13
The denition for PDA allows for nondeterminism.
If we want to consider a PDA that does not have nondeterminism, then
we will call it a deterministic PDA.
CHAPTER 14. PUSHDOWN AUTOMATA 14-14
Example: CFG:
S → S + S | S * S | 3
terminals: +, *, 3
nonterminals: S
(Nondeterministic) PDA:
[Figure: a nondeterministic PDA for this grammar, with states START, PUSH1 S, a POP state, PUSH states 2–7 (pushing S, +, and *), READ1–READ4, and ACCEPT.]
Process 3 * 3 + 3 on the PDA, where we now erase the input TAPE as we read in letters:
CHAPTER 14. PUSHDOWN AUTOMATA 14-15
STATE      STACK    TAPE
START               3 * 3 + 3
PUSH1 S    S        3 * 3 + 3
POP                 3 * 3 + 3
PUSH5 S    S        3 * 3 + 3
PUSH6 *    *S       3 * 3 + 3
PUSH7 S    S*S      3 * 3 + 3
POP        *S       3 * 3 + 3
READ1      *S       * 3 + 3
POP        S        * 3 + 3
READ3      S        3 + 3
POP                 3 + 3
PUSH2 S    S        3 + 3
PUSH3 +    +S       3 + 3
PUSH4 S    S+S      3 + 3
POP        +S       3 + 3
READ1      +S       + 3
POP        S        + 3
READ2      S        3
POP                 3
READ1
POP
READ4
ACCEPT
14.6 Some Properties of PDA
Theorem 28 For every regular language L, there is some PDA that accepts
it.
CHAPTER 14. PUSHDOWN AUTOMATA 14-16
Note that PDA can reach ACCEPT state and still have non-blank letters on
TAPE and/or STACK.
Example:

[Figure: a PDA (states START, PUSH S, PUSH X, READ1, ACCEPT, REJECT) that can reach ACCEPT while the TAPE and STACK still contain non-blank characters.]
Theorem 29 Given any PDA, there is another PDA that accepts exactly the
same language with the additional property that whenever a path leads to AC-
CEPT, the STACK and the TAPE contain only blanks.
Proof. Can convert above PDA into equivalent one below:
CHAPTER 14. PUSHDOWN AUTOMATA 14-17
[Figure: the equivalent PDA, which uses an additional READ2 loop and a POP loop to empty the TAPE and the STACK before entering ACCEPT.]
Chapter 15
CFG = PDA
15.1 Introduction
We will now see that the following are equivalent:
1. the set of all languages accepted by PDAs
2. the set of all languages generated by CFGs.
15.2 CFG PDA
Theorem 30 Given a language L generated by a particular CFG, there is a
PDA that accepts exactly L.
Proof. By construction
By Theorem 26, we can assume that the CFG is in CNF.
CHAPTER 15. CFG = PDA 15-2
Example: CFG in CNF:
S → AS
S → BC
B → AA
A → a
C → b

Propose the following (nondeterministic) PDA for the above CFG:
[Figure: the PDA, with a single POP state and, for each production, a circuit that pops the left-hand nonterminal and either pushes the two right-hand nonterminals or reads the corresponding terminal; states include START, PUSH S, READ1–READ3, and ACCEPT.]
STACK alphabet: Γ = {S, A, B, C}
Input TAPE alphabet: Σ = {a, b}
CHAPTER 15. CFG = PDA 15-3
Consider the following leftmost derivation of the word aaaab:
S ⇒ AS
  ⇒ aS
  ⇒ aAS
  ⇒ aaS
  ⇒ aaBC
  ⇒ aaAAC
  ⇒ aaaAC
  ⇒ aaaaC
  ⇒ aaaab

Now process the string aaaab on the PDA:
CHAPTER 15. CFG = PDA 15-4
Leftmost derivation   STATE      TAPE              STACK
                      START      aaaab
S                     PUSH S     aaaab             S
                      POP (S)    aaaab
                      PUSH S     aaaab             S
⇒ AS                  PUSH A     aaaab             AS
                      POP (A)    aaaab             S
⇒ aS                  READ2      (a)aaab           S
                      POP (S)    (a)aaab
                      PUSH S     (a)aaab           S
⇒ aAS                 PUSH A     (a)aaab           AS
                      POP (A)    (a)aaab           S
⇒ aaS                 READ2      (a)(a)aab         S
                      POP (S)    (a)(a)aab
                      PUSH C     (a)(a)aab         C
⇒ aaBC                PUSH B     (a)(a)aab         BC
                      POP (B)    (a)(a)aab         C
                      PUSH A     (a)(a)aab         AC
⇒ aaAAC               PUSH A     (a)(a)aab         AAC
                      POP (A)    (a)(a)aab         AC
⇒ aaaAC               READ2      (a)(a)(a)ab       AC
                      POP (A)    (a)(a)(a)ab       C
⇒ aaaaC               READ2      (a)(a)(a)(a)b     C
                      POP (C)    (a)(a)(a)(a)b
⇒ aaaab               READ1      (a)(a)(a)(a)(b)
                      POP (Δ)    (a)(a)(a)(a)(b)
                      READ3      (a)(a)(a)(a)(b)
                      ACCEPT     (a)(a)(a)(a)(b)
- Note that just before entering the POP state, the current working string in the LMD is the same as the cancelled letters on the TAPE concatenated with the current contents of the STACK.
- Before the first time we enter POP,
  working string = S
  letters cancelled = none
  string of nonterminals in STACK = S
- Just before entering POP for the last time,
  working string = whole word
CHAPTER 15. CFG = PDA 15-5
  letters cancelled = all
  string of nonterminals in STACK = Λ (empty)
CHAPTER 15. CFG = PDA 15-6
Consider the following CFG in CNF:
X1 → X2X3
X1 → X1X3
X4 → X2X5
  .
  .
  .
X2 → a
X3 → a
X4 → b
  .
  .
  .
where the start symbol S = X1.
Terminals: a, b
Nonterminals: X1, X2, . . . , Xn
- The construction of the PDA will correspond to leftmost derivations of words.
- The PDA will have only one POP state and will be nondeterministic.
- Begin constructing the PDA by starting with:

[Figure: START → PUSH X1 → POP.]
CHAPTER 15. CFG = PDA 15-7
For each production of the form
Xi → XjXk
we include this circuit from the POP back to itself:

[Figure: from POP, on popping Xi, push Xk and then Xj, and return to POP.]
CHAPTER 15. CFG = PDA 15-8
For all productions of the form
Xi → b
we add the following circuit to the above POP:

[Figure: from POP, on popping Xi, go to a READ state that reads b and returns to POP.]

Finally, add the following to the above POP:

[Figure: from POP, on popping a blank, go to a READ state that reads a blank and enters ACCEPT.]
CHAPTER 15. CFG = PDA 15-9
- Recall that languages that include the word Λ cannot be put into CNF.
- To take care of this, we add a loop to the above POP when Λ is in the language:

[Figure: a loop at POP labeled S.]

- This last loop will kill the nonterminal S without replacing it with anything.
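The PDA just described can be simulated with a stack of nonterminals and backtracking over the nondeterministic choices. The sketch below (not from the text) assumes the grammar is in CNF (no Λ- or unit productions, every nonterminal derives at least one letter) and is given as a dict; the pruning step uses that CNF property, since a stack longer than the remaining input can never be emptied. All names are illustrative.

```python
def cnf_accepts(productions, start, word):
    """Backtracking simulation of the one-POP PDA for a CNF grammar:
    pop a nonterminal, then either push the two nonterminals of a
    production X -> YZ, or read a letter for a production X -> a."""
    def search(stack, rest):
        if not stack:
            return rest == ""                  # POP blank, READ blank, ACCEPT
        if len(stack) > len(rest):             # CNF: each symbol needs >= 1 letter
            return False
        top, below = stack[0], stack[1:]
        for rhs in productions.get(top, []):
            if len(rhs) == 1:                  # X -> a : READ the letter a
                if rest and rest[0] == rhs and search(below, rest[1:]):
                    return True
            else:                              # X -> YZ : PUSH Z then Y
                if search(rhs + below, rest):
                    return True
        return False
    return search(start, word)

# The CNF grammar from the example above: S->AS|BC, B->AA, A->a, C->b.
g = {"S": ["AS", "BC"], "B": ["AA"], "A": ["a"], "C": ["b"]}
assert cnf_accepts(g, "S", "aaaab")
assert not cnf_accepts(g, "S", "ab")
```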
CHAPTER 15. CFG = PDA 15-10
Example: Let L0 be the language of the following CFG in CNF:
S → AB
S → SB
A → CA
A → a
B → b
C → b

We now want a PDA for the language L = L0 + Λ.

Propose the following (nondeterministic) PDA for the above CFG:
[Figure: the PDA, with one POP state, a circuit for each production, the Λ-loop labeled S, and states START, READ1–READ4, PUSH states, and ACCEPT.]

STACK alphabet: Γ = {S, A, B, C}
CHAPTER 15. CFG = PDA 15-11
Input TAPE alphabet: Σ = {a, b}
Consider the following leftmost derivation of the word babb:
S ⇒ SB
  ⇒ ABB
  ⇒ CABB
  ⇒ bABB
  ⇒ baBB
  ⇒ babB
  ⇒ babb

Now process the string babb on the PDA:
Leftmost derivation   STATE      TAPE            STACK
                      START      babb
                      PUSH S     babb            S
                      POP (S)    babb
                      PUSH B     babb            B
S ⇒ SB                PUSH S     babb            SB
                      POP (S)    babb            B
                      PUSH B     babb            BB
⇒ ABB                 PUSH A     babb            ABB
                      POP (A)    babb            BB
                      PUSH A     babb            ABB
⇒ CABB                PUSH C     babb            CABB
                      POP (C)    babb            ABB
⇒ bABB                READ3      (b)abb          ABB
                      POP (A)    (b)abb          BB
⇒ baBB                READ1      (b)(a)bb        BB
                      POP (B)    (b)(a)bb        B
⇒ babB                READ2      (b)(a)(b)b      B
                      POP (B)    (b)(a)(b)b
⇒ babb                READ2      (b)(a)(b)(b)
                      POP (Δ)    (b)(a)(b)(b)
                      READ4      (b)(a)(b)(b)
                      ACCEPT     (b)(a)(b)(b)
CHAPTER 15. CFG = PDA 15-12
15.3 PDA CFG
Theorem 31 Given a language L that is accepted by a certain PDA, there
exists a CFG that generates exactly L.
Proof. Strategy of proof:
1. Start with any PDA
2. Put the PDA into a standardized form, known as conversion form.
3. The purpose of putting a PDA in conversion form is that since the PDA
now has a standardized form, we can easily convert the pictorial rep-
resentation of the PDA into a table. This table will be known as a
summary table. Number the rows in the summary table.
The summary table and the pictorial representation of the PDA will
contain exactly the same amount of information. In other words, if
you are only given a summary table, you could draw the PDA from
it.
The correspondence between the pictorial representation of the PDA
and the summary table is similar to the correspondence between a
drawing of a nite automaton and a tabular representation of the
FA.
4. Processing and accepting a string on the PDA will correspond to a par-
ticular sequence of rows from the summary table. But not every possible
sequence of rows from the summary table will correspond to a processing
of a string on the PDA. So we will come up with a way of determining
if a particular sequence of rows from the summary table corresponds to
a valid processing of a string on the PDA.
5. Then we will construct a CFG that will generate all valid sequences of
rows from the summary table. We call the collection of all valid sequences
of rows the row-language.
6. Convert this CFG for row-language into CFG that generates all words
of as and bs in original language of PDA.
We now begin by showing how to transform a given PDA into conversion form:
CHAPTER 15. CFG = PDA 15-13
rst introduce new state HERE in PDA.
HERE state does not read TAPE nor push or pop the STACK.
HERE is just used as a marker.
Denition: A PDA is in conversion form if it meets all of the following
conditions:
1. there is only one ACCEPT state.
2. there are no REJECT states.
3. Every READ or HERE is followed immediately by a POP.
4. POPs must be separated by READs or HEREs.
5. All branching occurs at READ or HERE states, none at POP states,
and every edge has only one label.
6. The STACK is initially loaded with the symbol $ on top. If the
symbol is ever popped in processing, it must be replaced immedi-
ately. The STACK is never popped beneath this symbol. Right
before entering ACCEPT, this symbol is popped and left out.
7. The PDA must begin with the sequence:
START POP PUSH $
HERE
or
READ
$
8. The entire input string must be read before the machine can accept
a word.
CHAPTER 15. CFG = PDA 15-14
Note that we can convert any PDA into an equivalent PDA in conversion
form as follows:
1. There is only one ACCEPT state:
If there is more than one ACCEPT state, then delete all but one
and have all the edges that formerly went into the others feed into
the remaining one:
ACCEPT ACCEPT
becomes
ACCEPT
CHAPTER 15. CFG = PDA 15-15
2. There are no REJECT states:
If there were previously any REJECT states in the original PDA,
just delete them from the new PDA. This will just lead to a crash,
which is equivalent to going to a REJECT state.
READ REJECT
b
a
becomes
READ
a
CHAPTER 15. CFG = PDA 15-16
3. Every READ or HERE is followed immediately by a POP:
a
READ READ
1 2
b
becomes
READ
1
READ
2
b
a
POP PUSH b
PUSH $
PUSH a
a
b
$
becomes (by property 5)
READ
1
READ
2
POP
POP
POP
PUSH b
PUSH a
PUSH $
a
b
b
b
b
$
a
CHAPTER 15. CFG = PDA 15-17
4. POPs must be separated by READs or HEREs:
POP POP
1 2
b
becomes
POP POP
2 1
b
HERE
CHAPTER 15. CFG = PDA 15-18
5. All branching occurs at READ or HERE states, none at POP states,
and every edge has only one label.
READ
2
b
POP READ
1
READ
3
a
b
becomes
READ
2
READ
3
POP
POP
b
READ
1
a
b
b
CHAPTER 15. CFG = PDA 15-19
6. The STACK is initially loaded with the symbol $ on top. If the
symbol is ever popped in processing, it must be replaced immedi-
ately. The STACK is never popped beneath this symbol. Right
before entering ACCEPT, this symbol is popped and left out.
$
7. The PDA must begin with the sequence:
START POP PUSH $
HERE
or
READ
$
Simple.
8. The entire input string must be read before the machine can accept
a word:
Use algorithm of Theorem 29.
CHAPTER 15. CFG = PDA 15-20
Example: PDA for the language {a^(2n) b^n : n = 1, 2, 3, . . .}:

[Figure: the original PDA, with states START, READ1, READ2, POP1, POP2, POP3, PUSH a, and ACCEPT.]

PDA in conversion form:

[Figure: the same PDA redrawn in conversion form, using the $ bottom-of-STACK marker, a HERE state, and POP states POP1–POP6.]
CHAPTER 15. CFG = PDA 15-21
Example: PDA for the language ab:

[Figure: a PDA with states START, READ1, PUSH a, POP, READ2, POP, and ACCEPT.]
PDA in conversion form:

[Figure: the PDA for ab redrawn in conversion form, using the $ STACK symbol, PUSH $ and PUSH a states, and POP states between the READ states.]
From      To        READ     POP    PUSH    Row
where     where     what     what   what    number
START     READ1     —        $      $       1
READ1     READ1     a        $      a$      2
READ1     READ1     a        a      aa      3
READ1     READ2     b        a      —       4
READ2     ACCEPT    Δ        $      —       5
CHAPTER 15. CFG = PDA 15-23
Purpose of conversion form is to decompose machine into path segments,
each of the form:
From To Reading Popping Pushing
START READ One or no Exactly Any string
or READ or HERE input letters one STACK onto the
or HERE or ACCEPT character STACK
The states START, READ, HERE, and ACCEPT are called joints.
We can break up any PDA in conversion form into a collection of joint-
to-joint segments.
Each joint-to-joint segment has the following form:
1. It starts with a joint.
2. The rst joint is immediately followed by exactly one POP.
3. The one POP is immediately followed by zero or more PUSHes.
4. The PUSHes are immediately followed by another JOINT.
Summary table describes the entire PDA as list of all joint-to-joint seg-
ments:
From      To        READ     POP    PUSH    Row
where     where     what     what   what    number
START     READ1     —        $      $       1
READ1     READ1     a        $      a$      2
READ1     READ1     a        a      aa      3
READ1     READ2     b        a      —       4
READ2     ACCEPT    Δ        $      —       5
Consider processing string ab on PDA:
CHAPTER 15. CFG = PDA 15-24
STATE      Corresponding Row Number
START      1
POP
PUSH $
READ1      2
POP
PUSH $
PUSH a
READ1      4
POP
READ2      5
POP
ACCEPT
- Every path through the PDA corresponds to a sequence of rows of the summary table.
- Not every sequence of rows of the summary table corresponds to a path through the PDA.
  - Need to make sure the sequence is joint consistent; i.e., the last STATE of one row is the same as the first STATE of the next row in the sequence.
  - Need to make sure the sequence is STACK consistent; i.e., when a row pops a character, that character should be at the top of the STACK.
- Define the row-language of the PDA represented by a summary table:
  - Alphabet letters (i.e., terminals): Σ = {Row1, Row2, . . . , Row5}.
  - All valid words are sequences of alphabet letters that correspond to paths from START to ACCEPT that are joint consistent and STACK consistent.
  - All valid words begin with Row1 and end with Row5.
- The string
  Row1 Row4 Row3 Row3
  is not a valid word:
CHAPTER 15. CFG = PDA 15-25
  - It does not end with Row5.
  - It is not joint consistent, since Row4 ends in state READ2 and Row3 begins in state READ1.
  - It is not STACK consistent, since Row1 ends with $ on the top of the STACK, and Row4 tries to pop a from the top of the STACK.
- We will develop a CFG for the row-language and then transform it into another CFG for the original language accepted by the PDA.
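Joint consistency and STACK consistency can be checked mechanically from the summary table. The sketch below (not from the text) stores each row as a tuple (from, to, read, pop, push) keyed by its row number, and replays a proposed sequence of rows; the names and the use of "" for an empty PUSH are illustrative conventions only.

```python
# Summary table rows: (from, to, read, pop, push); "" means nothing.
ROWS = {
    1: ("START", "READ1", "",  "$", "$"),
    2: ("READ1", "READ1", "a", "$", "a$"),
    3: ("READ1", "READ1", "a", "a", "aa"),
    4: ("READ1", "READ2", "b", "a", ""),
    5: ("READ2", "ACCEPT", "", "$", ""),
}

def is_valid_row_word(row_numbers):
    """Check that a sequence of row numbers is joint consistent and
    STACK consistent, and runs from START to ACCEPT."""
    stack = ["$"]                       # conversion form: $ starts on top
    here = "START"
    for n in row_numbers:
        frm, to, _read, pop, push = ROWS[n]
        if frm != here:                 # joint consistency
            return False
        if not stack or stack[0] != pop:  # STACK consistency
            return False
        stack = list(push) + stack[1:]  # pop one symbol, push a string
        here = to
    return here == "ACCEPT" and not stack

assert is_valid_row_word([1, 2, 4, 5])        # the word ab
assert not is_valid_row_word([1, 4, 3, 3])    # the invalid example above
```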
Recall the strategy of our proof:
1. Start with any PDA
2. Redraw PDA in conversion form.
3. Build summary table and number the rows.
4. Dene row-language to be set of all sequences of rows that corre-
spond to paths through PDA. Make sure STACK consistent.
5. Determine a CFG that generates all words in row-language.
6. Convert this CFG for row-language into CFG that generates all
words of as and bs in original language of PDA.
We are now up to Step 5.
- So for Step 5, we want to determine a CFG for the row-language.
- Define the nonterminal S to be used to start any derivation in the row-language grammar.
- The other nonterminals in the row-language grammar are
  Net(X, Y, Z)
  where
  - X and Y are specific joints (START, READ, HERE, ACCEPT), and
  - Z is any character from the stack alphabet Γ.
- Interpretation: there is some path going from joint X to joint Y (possibly passing through other joints) that has the net effect on the STACK of removing the symbol Z from the top of the STACK.
- The STACK is never popped below the initial Z on the top, but it may be built up along the path, and it eventually ends with the Z popped.
Example:

[Figure: a path READ1 → POP(Z) → PUSH b → PUSH a → POP(a) → POP(b) → READ2.]

This path has the net effect of popping Z, and so it is a Net(READ1, READ2, Z).
Example:

[Figure: a path READ1 → POP(Z) → POP(a) → PUSH a → READ2, in which the STACK dips below the initial Z.]

This path does not have the net effect of popping Z, since the STACK went below the initial Z. Hence, this is not a Net(READ1, READ2, Z).
CHAPTER 15. CFG = PDA 15-27
- Productions in the CFG for the row-language will typically have
  - a nonterminal Net(·, ·, ·) on the LHS, and
  - on the RHS, a terminal Rowi followed by zero or more nonterminals Net(·, ·, ·).
- The LHS and RHS of each production will have the same net effect on the STACK.

Recall that the summary table for our example is

From      To        READ     POP    PUSH    Row
where     where     what     what   what    number
START     READ1     —        $      $       1
READ1     READ1     a        $      a$      2
READ1     READ1     a        a      aa      3
READ1     READ2     b        a      —       4
READ2     ACCEPT    Δ        $      —       5
Example: Production:
Net(READ1, READ2, a) → Row4

Example: Production:
Net(READ1, ACCEPT, $) → Row2 Net(READ1, READ2, a) Net(READ2, ACCEPT, $)
CHAPTER 15. CFG = PDA 15-28
[Figure: the path corresponding to Row2 (pop $, push $ then a, end in READ1), followed by a Net(READ1, READ2, a) segment and a Net(READ2, ACCEPT, $) segment.]
CHAPTER 15. CFG = PDA 15-29
In the last example, note that
- Row2 POPs the $ off the STACK, then PUSHes $ and then a, and ends in state READ1.
- Then, Net(READ1, READ2, a) starts in state READ1, has the net effect of POPping the a off the top of the STACK, and ends in state READ2.
- Then, Net(READ2, ACCEPT, $) starts in state READ2, has the net effect of POPping the $ off the top of the STACK, and ends in state ACCEPT.
- The above three steps can be summarized by Net(READ1, ACCEPT, $).
More generally, use the following rules to create productions:

Rule 1: Create the production
S → Net(START, ACCEPT, $)

Rule 2: For every row of the summary table that has no PUSH entry, such as

FROM    TO    READ        POP    PUSH    ROW
X       Y     anything    Z      —       i

we include the production:
Net(X, Y, Z) → Rowi
CHAPTER 15. CFG = PDA 15-30
Rule 3: For every row that pushes n ≥ 1 characters onto the STACK, such as

FROM    TO    READ        POP    PUSH             ROW
X       Y     anything    Z      m1 m2 · · · mn    j

for all sets of n READ, HERE, or ACCEPT states S1, S2, . . . , Sn, we create the productions:

Net(X, Sn, Z) → Rowj Net(Y, S1, m1) Net(S1, S2, m2) · · · Net(Sn−1, Sn, mn)
[Figure: the path for Rowj, which pops Z and pushes mn, . . . , m1, followed by the segments Net(Y, S1, m1), Net(S1, S2, m2), and so on up to Net(Sn−1, Sn, mn).]
CHAPTER 15. CFG = PDA 15-31
Some productions generated may never be used in a derivation of a word. This is analogous to the following:

Example: CFG:
S → X | Y
X → aX
Y → ab

The production S → X doesn't lead to a word.
Applying Rule 1 gives

PROD 1   S → Net(START, ACCEPT, $)

Applying Rule 2 to Rows 4 and 5 gives

PROD 2   Net(READ1, READ2, a) → Row4
PROD 3   Net(READ2, ACCEPT, $) → Row5

Applying Rule 3 to Row 1 gives

Net(START, S1, $) → Row1 Net(READ1, S1, $)

where S1 can take on the values READ1, READ2, ACCEPT.

PROD 4   Net(START, READ1, $) → Row1 Net(READ1, READ1, $)
PROD 5   Net(START, READ2, $) → Row1 Net(READ1, READ2, $)
PROD 6   Net(START, ACCEPT, $) → Row1 Net(READ1, ACCEPT, $)
CHAPTER 15. CFG = PDA 15-32
Applying Rule 3 to Row 2 gives

Net(READ1, S2, $) → Row2 Net(READ1, S1, a) Net(S1, S2, $)

where S2 can be any joint except START, and S1 can be any joint except START or ACCEPT.

PROD 7    Net(READ1, READ1, $) → Row2 Net(READ1, READ1, a) Net(READ1, READ1, $)
PROD 8    Net(READ1, READ1, $) → Row2 Net(READ1, READ2, a) Net(READ2, READ1, $)
PROD 9    Net(READ1, READ2, $) → Row2 Net(READ1, READ1, a) Net(READ1, READ2, $)
PROD 10   Net(READ1, READ2, $) → Row2 Net(READ1, READ2, a) Net(READ2, READ2, $)
PROD 11   Net(READ1, ACCEPT, $) → Row2 Net(READ1, READ1, a) Net(READ1, ACCEPT, $)
PROD 12   Net(READ1, ACCEPT, $) → Row2 Net(READ1, READ2, a) Net(READ2, ACCEPT, $)
CHAPTER 15. CFG = PDA 15-33
Applying Rule 3 to Row 3 gives

Net(READ1, S2, a) → Row3 Net(READ1, S1, a) Net(S1, S2, a)

where S2 can be any joint except START, and S1 can be any joint except START or ACCEPT.

PROD 13   Net(READ1, READ1, a) → Row3 Net(READ1, READ1, a) Net(READ1, READ1, a)
PROD 14   Net(READ1, READ1, a) → Row3 Net(READ1, READ2, a) Net(READ2, READ1, a)
PROD 15   Net(READ1, READ2, a) → Row3 Net(READ1, READ1, a) Net(READ1, READ2, a)
PROD 16   Net(READ1, READ2, a) → Row3 Net(READ1, READ2, a) Net(READ2, READ2, a)
PROD 17   Net(READ1, ACCEPT, a) → Row3 Net(READ1, READ1, a) Net(READ1, ACCEPT, a)
PROD 18   Net(READ1, ACCEPT, a) → Row3 Net(READ1, READ2, a) Net(READ2, ACCEPT, a)

Our CFG for the row-language has
- 5 terminals: Row1, Row2, . . . , Row5
- 16 nonterminals: S, 9 of the form Net(·, ·, $), and 6 of the form Net(·, ·, a)
- 18 productions: PROD 1, . . . , PROD 18
CHAPTER 15. CFG = PDA 15-34
Can derive a word in the row-language using a leftmost derivation:

S ⇒ Net(START, ACCEPT, $)                                       PROD 1
  ⇒ Row1 Net(READ1, ACCEPT, $)                                  PROD 6
  ⇒ Row1 Row2 Net(READ1, READ2, a) Net(READ2, ACCEPT, $)        PROD 12
  ⇒ Row1 Row2 Row4 Net(READ2, ACCEPT, $)                        PROD 2
  ⇒ Row1 Row2 Row4 Row5                                         PROD 3

- Not all productions in the CFG will be used in derivations of actual words.
- Our CFG doesn't yet generate words of a's and b's. It generates words using the terminals Row1, Row2, . . . , Row5.
- We need to transform this CFG into another CFG that has terminals a and b.
CHAPTER 15. CFG = PDA 15-35
To convert the previous CFG for the row-language into a CFG for the original language of a's and b's,
- change the terminals Rowi into nonterminals,
- add the new terminals a, b (and also use Λ), and
- create more productions as below:

Rule 4: For every row

FROM    TO    READ    POP    PUSH    ROW
A       B     C       D      EFGH    i

create the production
Rowi → C

Applying Rule 4 gives

PROD 19   Row1 → Λ
PROD 20   Row2 → a
PROD 21   Row3 → a
PROD 22   Row4 → b
PROD 23   Row5 → Λ
We can continue with the previous derivation in the row-language grammar to get a word in the original language:

S ⇒ Net(START, ACCEPT, $)       PROD 1
  ⇒ Row1 Row2 Row4 Row5         PRODs 6, 12, 2, 3
  ⇒ Row2 Row4 Row5              PROD 19
  ⇒ a Row4 Row5                 PROD 20
  ⇒ a b Row5                    PROD 22
  ⇒ a b                         PROD 23

giving us the word ab.

The word ab can be accepted by the PDA in conversion form by following the path
Row1 Row2 Row4 Row5.
Chapter 17
Context-Free Languages
17.1 Closure Under Unions
We will now prove some properties of CFLs.
Theorem 36 If L1 and L2 are CFLs, then their union L1 + L2 is a CFL.
Proof. By grammars.
- L1 CFL implies that L1 has a CFG, CFG1, that generates it.
  - Assume that the nonterminals in CFG1 are S, A, B, C, . . ..
  - Change the nonterminals in CFG1 to S1, A1, B1, C1, . . ..
  - Do not change the terminals in CFG1.
- L2 CFL implies that L2 has a CFG, CFG2, that generates it.
  - Assume that the nonterminals in CFG2 are S, A, B, C, . . ..
  - Change the nonterminals in CFG2 to S2, A2, B2, C2, . . ..
  - Do not change the terminals in CFG2.
- Now CFG1 and CFG2 have nonintersecting sets of nonterminals.
- We create a CFG for L1 + L2 as follows:
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-2
  - Include all of the nonterminals S1, A1, B1, C1, . . . and S2, A2, B2, C2, . . ..
  - Include all of the productions from CFG1 and CFG2.
  - Create a new nonterminal S and a production
    S → S1 | S2
- To see that this new CFG generates L1 + L2,
  - note that any word in language Li, i = 1, 2, can be generated by first using the production S → Si;
  - also, since there is no overlap in the use of nonterminals in CFG1 and CFG2, once we start a derivation with the production S → S1, we can only use the productions originally in CFG1 and cannot use any of the productions from CFG2, and so we can only produce words in L1;
  - a similar situation occurs when we start a derivation with the production S → S2.
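The renaming construction in this proof is easy to mechanize. The sketch below (not from the text) represents a CFG as a dict from nonterminal names to lists of right-hand sides (lists of symbols), renames every nonterminal of grammar i by appending the subscript i, and adds the new start productions S → S1 | S2. It assumes every nonterminal appears as a dict key; the grammars and names are illustrative only.

```python
def rename(grammar, suffix):
    """Append suffix to every nonterminal, leaving terminals alone."""
    nts = set(grammar)
    return {nt + suffix: [[s + suffix if s in nts else s for s in rhs]
                          for rhs in rhss]
            for nt, rhss in grammar.items()}

def union_grammar(g1, g2):
    """CFG for L1 + L2: disjoint copies of g1 and g2 plus S -> S1 | S2."""
    new = {}
    new.update(rename(g1, "1"))
    new.update(rename(g2, "2"))
    new["S"] = [["S1"], ["S2"]]
    return new

# Tiny example: L1 = {a^n b^n}, L2 = c*  (RHSs are lists of symbols;
# [] stands for a Lambda right-hand side).
g1 = {"S": [["a", "S", "b"], []]}
g2 = {"S": [["c", "S"], []]}
print(union_grammar(g1, g2))
# {'S1': [['a','S1','b'], []], 'S2': [['c','S2'], []], 'S': [['S1'], ['S2']]}
```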
Example:

CFG1 for L1:
S → SS | AaAb | BBB | Λ
A → SaS | bBb | abba
B → SSS | baab

CFG2 for L2:
S → aS | aAba | BbB | Λ
A → aSa | abab
B → BabaB | bb

To construct a CFG for L1 + L2:

transform CFG1:
S1 → S1S1 | A1aA1b | B1B1B1 | Λ
A1 → S1aS1 | bB1b | abba
B1 → S1S1S1 | baab
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-3
transform CFG2:
S2 → aS2 | aA2ba | B2bB2 | Λ
A2 → aS2a | abab
B2 → B2abaB2 | bb

construct the CFG for L1 + L2:
S → S1 | S2
S1 → S1S1 | A1aA1b | B1B1B1 | Λ
A1 → S1aS1 | bB1b | abba
B1 → S1S1S1 | baab
S2 → aS2 | aA2ba | B2bB2 | Λ
A2 → aS2a | abab
B2 → B2abaB2 | bb
Proof. (of Theorem 36 by machines)
- Since L1 is a CFL, Theorem 30 implies that there exists some PDA, PDA1, that accepts L1.
- Since L2 is a CFL, Theorem 30 implies that there exists some PDA, PDA2, that accepts L2.
- Construct a new PDA3 to accept L1 + L2 by combining PDA1 and PDA2 into one machine, coalescing the START states of PDA1 and PDA2 into a single START state.
- Note that once we leave the START state of PDA3, we can never come back to the START state.
- Also, there is no way to cross over from PDA1 to PDA2.
- Hence, any word accepted by PDA3 must also be accepted by either PDA1 or PDA2.
- Also, it is obvious that any word accepted by either PDA1 or PDA2 will be accepted by PDA3.
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-4
Example:

[Figures: PDA1 for L1 and PDA2 for L2.]
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-5
[Figure: PDA3 for L1 + L2, obtained by coalescing the START states of PDA1 and PDA2.]
17.2 Closure Under Concatenations
Theorem 37 If L1 and L2 are CFLs, then L1L2 is a CFL.
Proof. By grammars.
- L1 CFL implies that L1 has a CFG, CFG1, that generates it.
  - Assume that the nonterminals in CFG1 are S, A, B, C, . . ..
  - Change the nonterminals in CFG1 to S1, A1, B1, C1, . . ..
  - Do not change the terminals in CFG1.
- L2 CFL implies that L2 has a CFG, CFG2, that generates it.
  - Assume that the nonterminals in CFG2 are S, A, B, C, . . ..
  - Change the nonterminals in CFG2 to S2, A2, B2, C2, . . ..
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-6
  - Do not change the terminals in CFG2.
- Now CFG1 and CFG2 have nonintersecting sets of nonterminals.
- We create a CFG for L1L2 as follows:
  - Include all of the nonterminals S1, A1, B1, C1, . . . and S2, A2, B2, C2, . . ..
  - Include all of the productions from CFG1 and CFG2.
  - Create a new nonterminal S and a production
    S → S1S2
- To see that this new CFG generates L1L2,
  - obviously, we can generate any word in L1L2 using our new CFG;
  - also, since there is no overlap in the use of nonterminals in CFG1 and CFG2, once we start a derivation with the production S → S1S2, the S1 part will generate a word from L1 and the S2 part will generate a word from L2;
  - hence, any word generated by the new CFG will be in L1L2.
Example:

CFG1 for L1:
S → SS | AaAb | BBB | Λ
A → SaS | bBb | abba
B → SSS | baab

CFG2 for L2:
S → aS | aAba | BbB | Λ
A → aSa | abab
B → BabaB | bb

To construct a CFG for L1L2:
CHAPTER 17. CONTEXT-FREE LANGUAGES 17-7
transform CFG1:
S1 → S1S1 | A1aA1b | B1B1B1 | Λ
A1 → S1aS1 | bB1b | abba
B1 → S1S1S1 | baab

transform CFG2:
S2 → aS2 | aA2ba | B2bB2 | Λ
A2 → aS2a | abab
B2 → B2abaB2 | bb

construct the CFG for L1L2:
S → S1S2
S1 → S1S1 | A1aA1b | B1B1B1 | Λ
A1 → S1aS1 | bB1b | abba
B1 → S1S1S1 | baab
S2 → aS2 | aA2ba | B2bB2 | Λ
A2 → aS2a | abab
B2 → B2abaB2 | bb
Remarks:
- It is difficult to prove Theorem 37 by machines.
- We cannot just combine PDA1 and PDA2 by removing the ACCEPT state of PDA1 and replacing it with the START state of PDA2.
- The problem is that we can reach the ACCEPT state of PDA1 while there are still unread characters on the input TAPE and there are still characters on the STACK.
- Thus, when we go to PDA2, we may process the last part of the word in L1 and the entire word in L2 and incorrectly accept or reject the entire word.
17.3 Closure Under Kleene Star

Theorem 38 If L is a CFL, then L* is a CFL.
Proof.

Since L is a CFL, by definition there is some CFG that generates L.

Suppose the CFG for L has nonterminals S, A, B, C, . . ..

Change the nonterminal S to S1.

We create a new CFG for L* as follows:

  Include all the nonterminals S1, A, B, C, . . . from the CFG for L.

  Include all of the productions from the CFG for L.

  Add the new nonterminal S and the new production

    S → S1S | Λ

We can repeatedly apply this last production:

  S ⇒ S1S ⇒ S1S1S ⇒ S1S1S1S ⇒ S1S1S1S1S ⇒ · · · ⇒ S1S1S1S1S1S1S1S1

Note that any word in L* can be generated by the new CFG in this way. To see the converse, note that each of the S1's above generates a word in L.

Also, there is no interaction between the different S1's.
Example: CFG for L:

  S → AaAb | BBB | Λ
  A → SaS | bBb | abba
  B → SSS | baab

Convert the CFG for L:

  S1 → AaAb | BBB | Λ
  A  → S1aS1 | bBb | abba
  B  → S1S1S1 | baab

New CFG for L*:

  S  → S1S | Λ
  S1 → AaAb | BBB | Λ
  A  → S1aS1 | bBb | abba
  B  → S1S1S1 | baab
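Under the same hypothetical dictionary representation used in the earlier sketch, the Kleene-star construction is equally mechanical. (Here every nonterminal is suffixed, not just S; this is a harmless variation of the renaming in the proof.)

  def star_cfg(cfg):
      """CFG for L*: rename the nonterminals of cfg and add S -> S1 S | Lambda."""
      def fix(sym):
          return sym + "1" if sym in cfg else sym
      new = {nt + "1": [[fix(s) for s in prod] for prod in prods]
             for nt, prods in cfg.items()}
      new["S"] = [["S1", "S"], []]          # S -> S1 S | Lambda
      return new

  cfg = {"S": [["A", "a", "A", "b"], ["B", "B", "B"], []],
         "A": [["S", "a", "S"], ["b", "B", "b"], list("abba")],
         "B": [["S", "S", "S"], list("baab")]}
  print(sorted(star_cfg(cfg)))   # ['A1', 'B1', 'S', 'S1']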
17.4 Intersections

We now give an example showing that the intersection of two CFLs may not be a CFL.

To show this, we will need to use the fact that the language L3 = { a^n b^n a^n : n = 0, 1, 2, . . . } is not a context-free language. This is shown in the textbook in Chapter 16. L3 is the set of words with some number of a's, followed by an equal number of b's, and ending with the same number of a's.
Example:

Let L1 be generated by the following CFG:

  S → XY
  X → aXb | Λ
  Y → aY | Λ

Thus, L1 = { a^n b^n a^m : n, m ≥ 0 }, which is the set of words that have a clump of a's, followed by a clump of b's, and ending with another clump of a's, where the number of a's at the beginning is the same as the number of b's in the middle. The number of a's at the end of the word is arbitrary, and does not have to equal the number of a's and b's that come before it.
Let L2 be generated by the following CFG:

  S → WZ
  W → aW | Λ
  Z → bZa | Λ

Thus, L2 = { a^i b^k a^k : i, k ≥ 0 }, which is the set of words that have a clump of a's, followed by a clump of b's, and ending with another clump of a's, where the number of b's in the middle is the same as the number of a's at the end. The number of a's at the beginning of the word is arbitrary, and does not have to equal the number of b's and a's that come after it.
Note that L1 ∩ L2 = L3, where L3 = { a^n b^n a^n : n = 0, 1, 2, . . . }, which is a non-context-free language. (A quick brute-force check of this claim is sketched below.)
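This claim is easy to spot-check by brute force. The Python fragment below is only an illustration: it tests membership in L1, L2, and L3 directly from the definitions above and confirms the identity on all words of length at most 6.

  from itertools import product

  def in_L1(w):   # a^n b^n a^m
      return any(w == "a" * n + "b" * n + "a" * m
                 for n in range(len(w) + 1) for m in range(len(w) + 1))

  def in_L2(w):   # a^i b^k a^k
      return any(w == "a" * i + "b" * k + "a" * k
                 for i in range(len(w) + 1) for k in range(len(w) + 1))

  def in_L3(w):   # a^n b^n a^n
      return any(w == "a" * n + "b" * n + "a" * n for n in range(len(w) + 1))

  words = ("".join(p) for r in range(7) for p in product("ab", repeat=r))
  assert all((in_L1(w) and in_L2(w)) == in_L3(w) for w in words)
  print("L1 intersect L2 agrees with L3 on all words of length <= 6")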
However, sometimes the intersection of two CFLs is a CFL.
For example, suppose that L1 and L2 are regular languages. Then Theorem 21 implies that L1 and L2 are CFLs. Also, Theorem 12 implies that L1 ∩ L2 is a regular language, and so L1 ∩ L2 is also a CFL by Theorem 21. Thus, this is an example of two CFLs whose intersection is a CFL.

Thus, in general, we cannot say whether the intersection of two CFLs is a CFL.
17.5 Complementation
Question: If L is a CFL, is its complement L′ a CFL?

We now show by contradiction that the complement of a CFL may not be a CFL:

  Suppose that it is always true that if L is a CFL, then L′ is a CFL.

  Suppose that L1 and L2 are CFLs.

  Then by our assumption, we must have that L1′ and L2′ are CFLs.

  Theorem 36 implies that L1′ + L2′ is a CFL.

  Then by our assumption, we must have that (L1′ + L2′)′ is a CFL.

  But we know that (L1′ + L2′)′ = L1 ∩ L2 by De Morgan's law.

  However, we previously showed that the intersection of two CFLs is not always a CFL, which contradicts the previous two steps.

  So our assumption that CFLs are always closed under complementation must not be true.

Thus, in general, we cannot say whether the complement of a CFL is a CFL.
Chapter 18
Decidability for CFLs
18.1 Membership: The CYK Algorithm
We want to determine if a given string s can be generated from a particular CFG G.

Theorem 45 Let L be a language generated by a CFG G with alphabet Σ. Given a string s ∈ Σ*, there is an algorithm to decide whether or not s ∈ L.

First convert G into an equivalent CFG G1 in Chomsky normal form. Write s = s1 s2 · · · sn, and for 1 ≤ i ≤ k ≤ n and each nonterminal X of G1, let T[i, k, X] be true exactly when X generates the substring si · · · sk. Then

  T[i, k, X] =
    true,  if i = k and G1 has a production X → si;
    true,  if i < k and G1 has a production X → YZ such that, for some j with i ≤ j < k, both T[i, j, Y] and T[j + 1, k, Z] are true;
    false, otherwise.
Can solve the recursion using dynamic programming.

  Store the values of T in an array that is initialized to false everywhere.

  Need to go through the array in such an order that T[i, j, Y] and T[j + 1, k, Z] are evaluated before T[i, k, X] for i ≤ j < k.

  Can do this by going through the array for increasing values of k and, subject to that, decreasing values of i.

CYK Algorithm: to determine if s ∈ L, where L is generated by CFG G1 in Chomsky normal form.

  /* initialization */
  n = length(s);
  for every nonterminal X, do begin
    for i = 1 to n do
      for k = i to n do
        T[i, k, X] = false;
    for i = 1 to n do
      if G1 has production X → si, then
        T[i, i, X] = true;
  end;

  for k = 2 to n do
    for i = k − 1 down to 1 do
      for all productions in G1 of the form X → YZ do
        for j = i to k − 1 do
          if T[i, j, Y] and T[j + 1, k, Z] then
            T[i, k, X] = true;

  s ∈ L iff T[1, n, S] = true;
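As an illustration, the algorithm above translates almost line for line into Python. The grammar representation below (a list of (nonterminal, right-hand side) pairs for a grammar in Chomsky normal form) is our own choice, not the textbook's.

  def cyk(s, productions, start="S"):
      """Decide whether the CNF grammar generates s.  A production is a pair
      (X, rhs) with rhs either a terminal character or a pair (Y, Z)."""
      n = len(s)
      T = {}                                      # missing entries mean false
      for i in range(1, n + 1):                   # initialization: substrings of length 1
          for X, rhs in productions:
              if rhs == s[i - 1]:
                  T[(i, i, X)] = True
      for k in range(2, n + 1):                   # increasing k, then decreasing i
          for i in range(k - 1, 0, -1):
              for X, rhs in productions:
                  if isinstance(rhs, tuple):
                      Y, Z = rhs
                      if any(T.get((i, j, Y)) and T.get((j + 1, k, Z))
                             for j in range(i, k)):
                          T[(i, k, X)] = True
      return T.get((1, n, start), False)

  # A CNF grammar for { a^n b^n : n >= 1 }:  S -> AT | AB, T -> SB, A -> a, B -> b.
  prods = [("S", ("A", "T")), ("S", ("A", "B")),
           ("T", ("S", "B")), ("A", "a"), ("B", "b")]
  print(cyk("aabb", prods), cyk("aab", prods))    # True False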
Chapter 19
Turing Machines
19.1 Introduction
Turing machines will be our ultimate model for computers, so one might think they need output capabilities. However, even computers without output statements can tell us something.

Consider the following program:
1. READ X
2. IF X=1 THEN END
3. IF X=2 THEN DIVIDE X BY 0
4. IF X>2 THEN GOTO STATEMENT 4
If we assume that the input is always a positive integer, then
if the program terminates naturally, then we know X was 1.
if the program terminates with an error message saying there is an overflow (i.e., crashes), then we know X was 2.
if the program does not terminate, then we know X was greater
than 2.
Definition: A Turing machine (TM) is T = (Σ, TAPE, TAPE HEAD, Γ, K, s, H, Π), where

1. An alphabet Σ of input letters; assume that the blank Δ ∉ Σ.
2. A Tape divided into a sequence of numbered cells, each containing one character or a blank.
   The input word is presented to the machine on the tape with one letter per cell, beginning in the leftmost cell, called cell i.
   The rest of the Tape is initially filled with blanks, Δ.
   The Tape is infinitely long in one direction.

   [Picture of the Tape: cells i, ii, iii, iv, v, . . . , with the Tape Head pointing at cell i]
3. A Tape Head that can in one step read the contents of a cell on the
Tape, replace it with some other character, and reposition itself to the
next cell to the right or to the left of the one it has just read.
At the start of the processing, the Tape Head always begins by
reading the input in cell i.
The Tape Head can never move left from cell i. If it is given orders
to do so, the machine crashes.
The location of the Tape Head is indicated as in the above picture.
4. An alphabet Γ of characters that can be printed on the Tape by the Tape Head.
   Assume that the blank Δ is not in Γ; the alphabets Σ and Γ may have characters in common.
   The Tape Head may erase a cell, which corresponds to writing Δ in the cell.
5. A finite set K of states, including
   Exactly one START state s ∈ K, from which we begin execution (and which we may reenter during execution).
   A set H ⊆ K of HALT states, which cause execution to terminate when we enter any of them. There are zero or more HALT states.
   The other states have no function, only names such as q1, q2, q3, . . . or 1, 2, 3, . . ..
6. A program Π, which is a finite set of rules that, on the basis of the state we are in and the letter the Tape Head has just read, tells us
   (a) how to change states,
   (b) what to print on the Tape,
   (c) where to move the Tape Head.

   The program

     Π ⊆ K × K × (Σ + Γ + Δ) × (Γ + Δ) × {L, R},

   with the restriction that

     if (q1, q2, σ, c, d) ∈ Π and (q1′, q2′, σ′, c′, d′) ∈ Π with q1 = q1′ and σ = σ′, then q2 = q2′, c = c′, and d = d′;

   i.e., for any state q1 and any character σ ∈ Σ + Γ + Δ, there is only one arc leaving state q1 corresponding to reading the character σ from the Tape.

   This restriction means that TMs are deterministic.

We depict the program Π as a collection of directed edges connecting the states. Each edge is labeled with a triplet of information:

  (character, character, direction) ∈ (Σ + Γ + Δ) × (Γ + Δ) × {L, R},

where

  The first character (either Δ or a character from Σ or Γ) is the character the Tape Head reads from the cell to which it is pointing.
    From any state, there can be at most one arc leaving that state corresponding to Δ or any given letter of Σ + Γ; i.e., there cannot be two arcs leaving a state both with the same first letter (i.e., a Turing machine is deterministic).

  The second character (either Δ or a character from Γ) is what the Tape Head prints in the cell before it leaves.

  The third component, the direction, tells the Tape Head whether to move one cell to the right, R, or one cell to the left, L.
Remarks:

The above definition does not require that every state have an edge leaving it corresponding to each letter of Σ + Γ + Δ.

If we are in a state and read a letter for which there is no arc leaving that state corresponding to that letter, then the machine crashes. In this case, the machine terminates execution unsuccessfully.

To terminate execution successfully, the machine must be led to a HALT state. In this case, we say that the word on the input tape is accepted by the TM.

If the Tape Head is currently in cell i and the program tells the Tape Head to move left, then the machine crashes.

Our definition of TMs requires them to be deterministic. There are also non-deterministic TMs. When we say just TM, we mean the definition above, which is deterministic.
Definition: A string w ∈ Σ* is accepted by a TM if, when we run the TM with w on the input Tape, the TM reaches a HALT state.
Example: Consider the following TM with input alphabet Σ = {a, b} and tape alphabet Γ = {a, b}:

  [State diagram: START 1 goes to 2 on (a,a,R) and on (b,b,R); 2 goes to 3 on (b,b,R); 3 loops to 3 on (a,a,R) and on (b,b,R); 3 goes to HALT 4 on (Δ,Δ,R)]

and an input tape containing the input aba:

  [Tape: cell i = a, cell ii = b, cell iii = a, cells iv, v, vi, . . . blank]
We start in state START 1 with the Tape Head reading cell i, and we denote this by

  1
  [a]ba

The number on top denotes the state we are currently in. The string below represents the current contents of the tape, with the letter about to be read underlined (shown here in brackets).

After reading the a in state 1, the TM takes the top arc from state 1 to state 2, and so it prints a into cell i and the Tape Head moves to the right to cell ii. We record this action by writing

  1       2
  [a]ba ⇒ a[b]a
The tape now looks like

  [Tape: a b a Δ Δ Δ . . . , with the Tape Head at cell ii]
Now we are in state 2, and the Tape Head is pointing to cell ii. Since cell ii contains b, we will take the arc from state 2 to state 3, print b in cell ii, and move the Tape Head to the right to cell iii. We record this action by writing

  1       2       3
  [a]ba ⇒ a[b]a ⇒ ab[a]

The tape now looks like

  [Tape: a b a Δ Δ Δ . . . , with the Tape Head at cell iii]
Now we are in state 3, and the Tape Head is pointing to cell iii. Since cell iii contains a, we will take the arc labeled (a, a, R) from state 3 back to state 3, print a in cell iii, and move the Tape Head to the right to cell iv, which contains a blank Δ. We record this action by writing

  1       2       3       3
  [a]ba ⇒ a[b]a ⇒ ab[a] ⇒ aba[Δ]

The tape now looks like

  [Tape: a b a Δ Δ Δ . . . , with the Tape Head at cell iv]
Now we are in state 3, and the Tape Head is pointing to cell iv. Since cell iv contains Δ, we will take the arc labeled (Δ, Δ, R) from state 3 to state HALT 4, print Δ in cell iv, and move the Tape Head to the right to cell v, which contains a blank Δ. We record this action by writing

  1       2       3       3        HALT
  [a]ba ⇒ a[b]a ⇒ ab[a] ⇒ aba[Δ] ⇒
Since we reached a HALT state, the string on the input tape is accepted.
Note that if an input string has a as its second letter, then the TM
crashes, and so the string is not accepted.
This TM accepts the language of all strings over the alphabet Σ = {a, b} whose second letter is b.
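A deterministic TM of this kind is easy to simulate. The sketch below uses a representation of our own choosing (the program is a dictionary from (state, read character) to (new state, written character, direction)), and since a TM may loop forever the simulator simply gives up after a fixed number of steps. The dictionary at the end encodes the machine of this example.

  BLANK = "Δ"

  def run_tm(program, halt_states, w, start=1, max_steps=10_000):
      """Simulate a deterministic TM on input w: 'accept', 'crash', or 'no answer'."""
      tape = dict(enumerate(w, start=1))       # cell i holds the first letter, and so on
      state, head = start, 1
      for _ in range(max_steps):
          if state in halt_states:
              return "accept"
          ch = tape.get(head, BLANK)
          if (state, ch) not in program:       # no arc for this letter: the TM crashes
              return "crash"
          state, write, move = program[(state, ch)]
          tape[head] = write
          head += 1 if move == "R" else -1
          if head < 1:                         # moved left from cell i: the TM crashes
              return "crash"
      return "no answer"                       # may be looping forever

  # The TM of this example: accept exactly the strings whose second letter is b.
  prog = {(1, "a"): (2, "a", "R"), (1, "b"): (2, "b", "R"),
          (2, "b"): (3, "b", "R"),
          (3, "a"): (3, "a", "R"), (3, "b"): (3, "b", "R"), (3, BLANK): (4, BLANK, "R")}
  print(run_tm(prog, {4}, "aba"), run_tm(prog, {4}, "aa"))   # accept crash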
Example: Consider the following TM with input alphabet Σ = {a, b} and tape alphabet Γ = {a, b}:

  [State diagram: START 1 loops to 1 on (b,b,R) and on (Δ,Δ,R); 1 goes to 2 on (a,a,R); 2 goes back to 1 on (b,b,R); 2 goes to HALT 3 on (a,a,R)]
Consider processing the word baab on the TM.

  The first cell on the TAPE contains b, and so upon reading this, the TM writes b in cell i, moves the tape head to the right to cell ii, and loops back to state 1.

  The second cell on the TAPE contains a, and so upon reading this, the TM moves to state 2, writes a in cell ii, and moves the tape head to the right to cell iii.

  The third cell on the TAPE contains a, and so upon reading this, the TM writes a in cell iii, moves the tape head to the right to cell iv, and moves to state 3, which is a HALT state.
The TM now halts, and so the string is accepted. Note that the
input tape still has a letter b that has not been read.
Consider processing on the TM the word bba.

  Each of the first two b's results in the TM looping back to state 1 and moving the tape head to the right one cell.

  The third letter a makes the TM go to state 2 and moves the tape head to the right one cell.

  The fourth cell of the TAPE has a blank, and so the TM then crashes. Thus, bba is not accepted.
Consider processing on the TM the word bab.
Note that the first letter b results in the TM looping back to state 1 and moving the tape head to the right one cell.
The tape head then reads the a in the second cell, which causes the
TM to move to state 2 and moves the tape head to the right one
cell.
The tape head then reads the b in the third cell, which causes the
TM to move back to state 1 and moves the tape head to the right
one cell.
The fourth cell of the TAPE has a blank, and so the TM returns
to state 1, and the tape head moves one cell to the right.
All of the other cells on the TAPE are blank, and so the TM will
keep looping back to state 1 forever.
Since the TM never reaches a HALT state, the string bab is not
accepted.
In general, we can divide the set of all possible strings into three sets:
1. Strings that contain the substring aa, which are accepted by the
TM since the TM will reach a HALT state.
2. Strings that do not contain substring aa and that end in a. For
these strings, the TM crashes, and so they are not accepted.
3. Strings that do not contain substring aa and that do not end in
a. For these strings, the TM loops forever, and so they are not
accepted.
Note: The videotaped lecture contains an error about this point.

  Let S1 be the set of strings that do not contain the substring aa and that do not end in a.

  Let S2 be the set of strings that do not contain the substring aa and that end in b.

  In the videotaped lecture, I said that S2 is the set of strings for which the TM loops forever, but actually, S1 is the set of strings for which the TM loops forever.

  Note that S1 ≠ S2 since Λ ∈ S1 but Λ ∉ S2.
This TM accepts the language having regular expression (a + b)*aa(a + b)*.
Definition: Every Turing machine T over the alphabet Σ divides the set of input strings into three classes:

1. ACCEPT(T) is the set of all strings w ∈ Σ* that lead T to a HALT state.
2. REJECT(T) is the set of all strings in Σ* on which T crashes.
3. LOOP(T) is the set of all strings in Σ* on which T loops forever.

Example: For the TM in the previous example, ACCEPT(T) is the language defined by the regular expression (a + b)*aa(a + b)*, REJECT(T) is the set of strings in Σ* that do not contain aa and end in a, and LOOP(T) is the set of strings in Σ* that do not contain aa and do not end in a.

Definition: A language L over an alphabet Σ is called recursively enumerable if there is a TM T that accepts every word in L and either rejects or loops forever on every word in L′; i.e.,

  accept(T) = L,
  reject(T) + loop(T) = L′.

In other words, the class of languages that are accepted by a TM is exactly those languages that are recursively enumerable.

Definition: A language L over an alphabet Σ is called recursive if there is a TM T that accepts every word in L and rejects every word in L′; i.e.,

  accept(T) = L,
  reject(T) = L′,
  loop(T) = ∅.
23.2 Church-Turing Thesis
There is an effective procedure to solve a decision problem if and only if there is a Turing machine that halts for all input strings and solves the problem.
23.3 Encoding of Turing Machines
Can take any pictorial representation of a TM and represent it as two tables
of information.
Example: For the following TM

  [State diagram: the TM from the previous example — START 1 loops to 1 on (b,b,R) and (Δ,Δ,R), 1 goes to 2 on (a,a,R), 2 goes back to 1 on (b,b,R), 2 goes to HALT 3 on (a,a,R)]

we can represent it as the following tables:

  State  Start?  Halt?
  1      1       0
  2      0       0
  3      0       1

  From  To  Read  Write  Move
  1     1   Δ     Δ      R
  1     1   b     b      R
  1     2   a     a      R
  2     1   b     b      R
  2     3   a     a      R
Remarks:
We can do this encoding for any TM. We call this an encoded Turing
machine.
The encoding can be written as just a string of characters.

For example, we can write the above encoding as

  110200301%11ΔΔR11bbR12aaR21bbR23aaR

where we use the % to denote where the first table ends and the second one begins.

The textbook converts the above string into a string of a's and b's, which we won't do.

Thus, we can represent any TM as a string of characters, which we can think of as a program.

We can use the encoded TM as an input string to another TM, just as a C++ program is an input string to a C++ compiler, which itself is just a program.

In particular, a copy of a program may be passed to itself as input.

For our above example, the string

  110200301%11ΔΔR11bbR12aaR21bbR23aaR

is rejected by the TM since it crashes on the first letter.
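This flattening is mechanical. The Python fragment below is only an illustration of the idea (the helper name and the exact tuple layout are our own, modeled on the two tables above):

  def encode_tm(states, transitions):
      """states: (name, start?, halt?) triples; transitions: (from, to, read, write, move)."""
      table1 = "".join(f"{name}{int(s)}{int(h)}" for name, s, h in states)
      table2 = "".join("".join(str(x) for x in t) for t in transitions)
      return table1 + "%" + table2

  states = [(1, True, False), (2, False, False), (3, False, True)]
  trans = [(1, 1, "Δ", "Δ", "R"), (1, 1, "b", "b", "R"), (1, 2, "a", "a", "R"),
           (2, 1, "b", "b", "R"), (2, 3, "a", "a", "R")]
  print(encode_tm(states, trans))   # 110200301%11ΔΔR11bbR12aaR21bbR23aaR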
23.4 Non-Recursively Enumerable Language
Theorem 64 Not all languages are recursively enumerable.
Proof.
Let LN be the set of strings w that are encoded TMs for which w is not accepted by its own TM.

  For example, the string 110200301%11ΔΔR11bbR12aaR21bbR23aaR is in LN since it was not accepted by its own TM.

We will prove by contradiction that LN is not recursively enumerable.

  Suppose that LN is recursively enumerable.

  Then there exists a TM TN for LN.

  Let P be the encoded TM of TM TN.

  There are 2 possibilities: either TM TN accepts P or TM TN doesn't accept P.

  If TN accepts P, then P ∉ LN since LN consists of strings w that are encoded TMs such that w is not accepted by its own TM. But this is a contradiction since the TM TN is only supposed to accept those strings in LN.

  If TN doesn't accept P, then P ∈ LN. But this is a contradiction since TN should accept P since P ∈ LN.

  Therefore, LN is not recursively enumerable.
23.5 Universal Turing Machine
Definition: A universal Turing machine (UTM) is a TM that can be fed as input a string composed of 2 parts:
1. The first is any encoded TM P, followed by a marker, say $.
2. The second part is a string w called the data.
The UTM reads the input, and then simulates P with input w.
Theorem 65 UTMs exist.
Remarks about UTMs
The reason that UTMs are important is that they allow one to write programs; i.e., UTMs are programmable, just like real computers.

We don't have to build a new Turing machine for each problem.

For a proof of Theorem 65, see pp. 554–557 of Cohen.
23.6 Halting Problem
Theorem 69 There is no TM that can accept any encoded TM P and any
input string w for P and always decide correctly whether P halts on w; i.e.,
the halting problem cannot be decided by a TM.
Basic Idea:
Define the halting function H(P, w), where
  P is the encoding of a program (i.e., an encoded Turing machine),
  w is the intended input for P.
Let H(P, w) = yes if P halts on input w.
Let H(P, w) = no if P does not halt on input w.
Assume that a program computing H(P, w) exists.
Construct a program Q(P) with input P:
1. x = H(P, P)
2. While x = yes, goto step 2.
Now run program Q with input P = Q.
Suppose Q(Q) halts. Then H(Q, Q) = yes, but then Q is stuck in the infinite loop and so it doesn't halt.
Suppose that Q(Q) doesn't halt. Then H(Q, Q) = no, but then Q skips the loop, and so Q(Q) in fact halts.
Therefore H(P, w) cannot exist.
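The self-referential argument can be written down almost verbatim as code. The sketch below is purely hypothetical: halts() stands for the assumed program computing H, which is exactly what the argument shows cannot exist.

  def halts(program_source, input_data):
      """The assumed halting decider H(P, w); the argument shows it cannot exist."""
      raise NotImplementedError("no such decider can exist")

  def Q(P):
      """The program Q(P) from the basic idea above."""
      if halts(P, P):      # step 1: x = H(P, P)
          while True:      # step 2: if x = yes, loop forever
              pass
      # otherwise Q halts immediately

  # Running Q on (the source of) Q itself is contradictory:
  #   if Q(Q) halts, then halts(Q, Q) = yes, so Q loops forever -- contradiction;
  #   if Q(Q) does not halt, then halts(Q, Q) = no, so Q halts -- contradiction.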
Proof. (of Theorem 69)
Suppose there is a TM, call it H, to solve the halting problem; i.e., H
works as follows:
Recall that all TMs take a TAPE loaded with an input string.
Our TM H takes as its input an encoded TM P and an input string
w to be used with P.
So we have to specify how P and w can be specied as an input
string to H.
We do this by taking P and first concatenating it with a special
character, say #, and then concatenating this with the input string
w. We use the # to mark the end of the encoded TM and the
beginning of the input string.
Thus, we now have a single long string P#w.
If we feed the string P#w into H, then
if P halts on w, then H prints yes somewhere on the TAPE.
if P does not halt on w, then H prints no somewhere on the
TAPE.
See p. 449 of the textbook to see how to print characters on
the TAPE.
Now suppose that we create another encoded TM Q that takes an en-
coded TM P as input and uses H as a subroutine as follows:
Since P is the input, the TAPE initially contains P.
First modify the TAPE so that it now contains P#P. (See p. 449
of the textbook to see how this can be done.)
Then run H using input P#P.
if TM H prints yes on input P#P, then loop forever;
if TM H prints no on input P#P, then halt.
Now run Q with input P = Q.
Suppose Q halts on input Q.
This means that H prints no on input Q#Q.
But this means that the encoded TM Q does not halt on input Q,
which is a contradiction.
Suppose Q does not halt on input Q.
This means that H prints yes on input Q#Q.
But this means that the encoded TM Q halts on input Q, which
again is a contradiction.
Therefore, H cannot exist.
23.7 Does TM Accept Λ?

Theorem 70 There is no TM that can decide for every encoded TM T whether or not T accepts the word Λ; i.e., the blank-tape problem for TMs is undecidable.
Proof.
We will prove this by contradiction.
Suppose that there is a TM, call it B, that can decide for every encoded TM T whether or not T accepts the word Λ; i.e., whether T halts when it starts with a blank tape.

  Function B(T), where T is the encoding of a program (i.e., an encoded Turing machine):
    B(T) = yes if T halts on input Λ.
    B(T) = no if T does not halt on input Λ.
Define a new program M(P, w), with input P and w, where P is any
encoded Turing machine and w is any input string:
First construct a new program Pw that starts with a blank input tape and works as follows:

  First Pw writes w on the input tape.

  Then Pw positions the tape head back to the beginning of the tape.

  Finally Pw simulates program P with w on the input tape.

Call B(Pw), and return M(P, w) = B(Pw).
Since we started Pw with a blank tape, we can apply program B to Pw to see if it halts.

Clearly, Pw will halt on a blank tape if and only if P halts on w.

  If Pw halts on a blank tape (i.e., if B(Pw) = yes), then P halts on w.

  If Pw does not halt on a blank tape (i.e., if B(Pw) = no), then P does not halt on w.

Note that M(P, w) solves the halting problem.

But the halting problem is undecidable, and so B cannot exist.
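The reduction in this proof can also be phrased as hypothetical code: from an assumed blank-tape decider B we could build a halting decider M, contradicting Theorem 69. Every name below is a stand-in for a machine that we have just argued cannot exist or whose construction is only described informally.

  def B(T):
      """Assumed blank-tape decider: does the encoded TM T halt when started on a blank tape?
      Theorem 70 shows such a decider cannot exist."""
      raise NotImplementedError("no such decider can exist")

  def make_Pw(P, w):
      """Build (a description of) the program Pw: write w on the blank tape,
      rewind the tape head to cell i, then behave exactly like P."""
      return "write " + repr(w) + " on the tape; rewind to cell i; then run " + str(P)

  def M(P, w):
      """Would decide whether P halts on w -- i.e., would solve the halting problem."""
      return B(make_Pw(P, w))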
23.8 Does TM Accept Any Words?
Theorem 71 There is no TM that can decide for every encoded TM T whether
or not T accepts any words at all; i.e., the emptiness problem for TMs is un-
decidable.
Proof.
We will prove this by contradiction.
Suppose that there is a TM, call it N, that can decide for every encoded TM T whether or not T accepts any words at all; i.e., whether the language L of T satisfies L ≠ ∅.

  Function N(T), where T is the encoding of a program (i.e., an encoded Turing machine):
    N(T) = yes if T accepts a language L ≠ ∅.
    N(T) = no if T accepts the language L = ∅.
Define a new program E(P), with input P, which is any encoded Turing machine:

  First construct a new program P′ . . .

Suppose E(P) = yes. Then

  N(P′) = yes.
  This implies P′ . . . = no.
  This implies P′ . . . is ∅.
  But since P′ . . .
Chapter 24

Review

1. Sets of strings

   intersection   S1 ∩ S2 = { w : w ∈ S1 and w ∈ S2 }
   product        S1S2 = { w = w1w2 : w1 ∈ S1, w2 ∈ S2 }
   subtraction    S1 − S2 = { w : w ∈ S1, w ∉ S2 }
   Kleene star    S* = { w = w1w2 · · · wn : n ≥ 0, wi ∈ S for i = 1, 2, . . . , n }
   complement     S′ = { w ∈ Σ* : w ∉ S }
                  (S′)′ = S
2. Regular expressions
3. FA = (K, Σ, δ, s, F), where
   K is a finite set of states
   Σ is the alphabet
   δ : K × Σ → K is the transition function
   s is the initial state
   F is the set of final states.
4. TG = (K, Σ, δ, S, F), where
   K is a finite set of states
   Σ is the alphabet
   δ ⊆ K × Σ* × K is the transition relation
   S is the set of initial states
   F is the set of final states.
5. Kleene's Theorem
   Any language that can be defined by a
     regular expression
     FA
     TG
   can be defined by all three methods.
   Given FAs for L1 and L2, we can construct FAs for
     L1 + L2
     L1L2
     L1*
   Algorithm for generating a regular expression from an FA.
Nondeterminism
6. FA with output
Moore machine
Mealy machine
These are equivalent.
7. Regular languages
   If L1 and L2 are regular languages, then so are
     L1 + L2
     L1L2
     L1*
     L1′
     L1 ∩ L2
8. Nonregular languages
Pumping lemma.
9. Decidability
   Can tell if two FAs, FA1 and FA2, generate the same language by checking if either of the following accepts any words:
     FA1 ∩ FA2′
     FA1′ ∩ FA2
   There are effective procedures to decide if
     an FA accepts a finite or infinite language
     a regular expression generates an infinite language
     an FA has language ∅
     a regular expression generates language ∅
10. CFG
    CFG G = (Σ, Γ, R, S), where
      Σ is the finite set of terminals, i.e., the alphabet
      Γ is a finite set of nonterminals
      R is the finite set of productions, each of the form N → U with N ∈ Γ and U ∈ (Σ + Γ)*.
    If a CFG is a regular grammar, then the CFL is a regular language. (Can do this by converting the regular grammar into a TG.)
    Chomsky Normal Form: A CFG G is in CNF if every production N → U in G has U ∈ ΓΓ + Σ.
12. PDA
Every regular language L is accepted by some PDA.
13. CFG = PDA
14. CFLs
    If L1 and L2 are CFLs, then so are
      L1 + L2
      L1L2
      L1*
    However, CFLs are not closed under intersection or complement; i.e., there are examples of CFLs L1 and L2 such that L1 ∩ L2 is not context-free, and there are examples of CFLs L such that L′ is not context-free.
15. Decidability for CFLs
Membership is decidable for CFLs; i.e., for any CFG G and string
w, can decide if G generates w (using CYK algorithm).
16. Turing Machines
    The following problems are undecidable:
      The halting problem
      Whether an arbitrary TM halts on a blank tape
      Whether an arbitrary TM accepts any words
      Whether an arbitrary TM accepts a finite or infinite language.