
Machines and their languages (G51MAL)

Lecture notes
Spring 2003
Thorsten Altenkirch
April 23, 2004

Contents
1 Introduction
  1.1 Examples on syntax
  1.2 What is this course about?
  1.3 Applications
  1.4 History
    1.4.1 The Chomsky Hierarchy
    1.4.2 Turing machines
  1.5 Languages

2 Finite Automata
  2.1 Deterministic finite automata
    2.1.1 What is a DFA?
    2.1.2 The language of a DFA
  2.2 Nondeterministic Finite Automata
    2.2.1 What is an NFA?
    2.2.2 The language accepted by an NFA
    2.2.3 The subset construction

3 Regular expressions
  3.1 What are regular expressions?
  3.2 The meaning of regular expressions
  3.3 Translating regular expressions to NFAs
  3.4 Summing up

4 Showing that a language is not regular
  4.1 The pumping lemma
  4.2 Applying the pumping lemma

5 Context free grammars
  5.1 What is a context-free grammar?
  5.2 The language of a grammar
  5.3 More examples
  5.4 Parse trees
  5.5 Ambiguity

6 Pushdown Automata
  6.1 What is a Pushdown Automaton?
  6.2 How does a PDA work?
  6.3 The language of a PDA
  6.4 Deterministic PDAs
  6.5 Context free grammars and push-down automata

7 How to implement a recursive descent parser
  7.1 What is an LL(1) grammar?
  7.2 How to calculate First and Follow
  7.3 Constructing an LL(1) grammar
  7.4 How to implement the parser
  7.5 Beyond LL(1) - use LR(1) generators

8 Turing machines and the rest
  8.1 What is a Turing machine?
  8.2 Grammars and context-sensitivity
  8.3 The halting problem
  8.4 Back to Chomsky
1 Introduction

Most references refer to the course text [HMU01]. Please note that the 2nd edition is quite different from the 1st one, which appeared in 1979 and is a classical reference on the subject. I have also been using [Sch99] for these notes, but note that this book is written in German. The online version of this text contains some hyperlinks to webpages which contain additional information.

1.1 Examples on syntax

In PR1 and PR2 you are learning the language JAVA. Which of the following programs are syntactically correct, i.e. will be accepted by the JAVA compiler without error messages?

Hello-World.java

public class Hello-World {
  public static void main(String argc[3]) {
    System:out.println(Hello World);
  }
}

A.java

class A {
  class B {
    void C () {
      {} ; {{}}
    }
  }
}

I hope that you are able to spot all the errors in the first program. It may actually be surprising, but the 2nd (strange looking) program is correct. How do we know whether a program is syntactically correct? We would hope that this doesn't depend on the compiler we are using.

1.2 What is this course about?

1. Mathematical models of computation, such as:
   Finite automata,
   Pushdown automata,
   Turing machines.
2. How to specify formal languages?
   Regular expressions,
   Context free grammars,
   Context sensitive grammars.
3. The relation between 1. and 2.

1.3 Applications

Regular expressions.
Regular expressions are a convenient way to express patterns (e.g. for search). There are a number of tools which use regular expressions:

  grep  pattern matching program (UNIX)
  sed   stream editor (UNIX)
  lex   a generator for lexical analyzers (UNIX)

Grammars for programming languages.
The appendix of [GJSB00] contains a context free grammar specifying the syntax of JAVA. YACC is a tool which can be used to generate a C program (a parser) from a grammar. Parser generators now also exist for other languages, like Java CUP for Java.

Specifying protocols.
Section 2.1 (pp. 38-45) of the course text contains a simple example of how a protocol for electronic cash can be specified and analyzed using finite automata.

1.4 History

1.4.1 The Chomsky Hierarchy

[Diagram: the Chomsky hierarchy as nested classes of languages, together with the machines recognizing them: all languages ⊇ Type 0 (recursively enumerable) languages (Turing machines) ⊇ decidable languages ⊇ Type 1 (context sensitive) languages ⊇ Type 2 (context free) languages (pushdown automata) ⊇ Type 3 (regular) languages (finite automata).]

Noam Chomsky introduced the Chomsky hierarchy, which classifies grammars and languages. This hierarchy can be amended by different types of machines (or automata) which recognize the appropriate class of languages. Chomsky is also well known for his unusual views on society and politics.

1.4.2 Turing machines

Alan Turing (1912-1954) introduced an abstract model of computation, which we call Turing machines, to give a precise definition of which problems can be solved by a computer. All the machines we introduce can be viewed as restricted versions of Turing machines. I recommend Andrew Hodges' biography "Alan Turing: the Enigma".

1.5 Languages

In this course we will use the terms language and word differently than in everyday language:

A language is a set of words.
A word is a sequence of symbols.

This leaves us with the question: what is a symbol? The answer is: anything, but it has to come from an alphabet Σ, which is a finite set. A common (and important) instance is Σ = {0, 1}.

More mathematically we say: given an alphabet Σ we define the set Σ* as the set of words (or sequences) over Σ: the empty word ε ∈ Σ*, and given a symbol x ∈ Σ and a word w ∈ Σ* we can form a new word xw ∈ Σ*. These are all the ways elements of Σ* can be constructed (this is called an inductive definition). E.g. in the example Σ = {0, 1}, typical elements of Σ* are 0010, 00000000, ε. Note that we only write ε if it appears on its own; e.g. instead of 0ε we just write 0. It is also important to realize that although there are infinitely many words, each word has a finite length.

An important operation on Σ* is concatenation. Confusingly, this is denoted by an invisible operator: given w, v ∈ Σ* we can construct a new word wv ∈ Σ* simply by concatenating the two words. We can define this operation by primitive recursion:

  εv = v
  (xw)v = x(wv)
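As an aside (my own sketch, not part of the original notes), the primitive recursion above can be transcribed directly into JAVA, representing words as Strings:

import java.util.*;

// Words are Strings: "" is the empty word epsilon, and a non-empty
// word xw is its first symbol x followed by the rest w.
public class Concat {
    // concat follows the primitive recursion above:
    //   epsilon v = v        (xw)v = x(wv)
    static String concat(String w, String v) {
        if (w.isEmpty()) return v;                       // epsilon v = v
        return w.charAt(0) + concat(w.substring(1), v);  // (xw)v = x(wv)
    }

    public static void main(String[] args) {
        System.out.println(concat("00", "10")); // prints 0010
    }
}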

A language L is a set of words, hence L ⊆ Σ* or equivalently L ∈ P(Σ*). Here are some informal examples of languages:

The set {0010, 00000000, ε} is a language over Σ = {0, 1}. This is an example of a finite language.

The set of words with odd length over Σ = {1}.

The set of words which contain the same number of 0s and 1s is a language over Σ = {0, 1}.

The set of words which contain the same number of 0s and 1s modulo 2 (i.e. both are even or both are odd) is a language over Σ = {0, 1}.

The set of palindromes using the English alphabet, i.e. words which read the same forwards and backwards like abba. This is a language over Σ = {a, b, ..., z}.

The set of correct Java programs. This is a language over the set of UNICODE characters (which correspond to numbers between 0 and 2^16 - 1).

The set of programs which, if executed on a Windows machine, print the text "Hello World!" in a window. This is a language over Σ = {0, 1}.

2 Finite Automata

Finite automata correspond to a computer with a fixed finite amount of memory. We will introduce deterministic finite automata (DFAs) first and then move to nondeterministic finite automata (NFAs). An automaton will accept certain words (sequences of symbols of a given alphabet Σ) and reject others. The set of accepted words is called the language of the automaton. We will show that the class of languages which are accepted by DFAs and NFAs is the same.

2.1 Deterministic finite automata

2.1.1 What is a DFA?

A deterministic finite automaton (DFA) A = (Q, Σ, δ, q0, F) is given by:

1. A finite set of states Q,
2. A finite set of input symbols Σ,
3. A transition function δ ∈ Q × Σ → Q,
4. An initial state q0 ∈ Q,
5. A set of accepting states F ⊆ Q.

As an example consider the following automaton

  D = ({q0, q1, q2}, {0, 1}, δD, q0, {q2})

where

  δD = {((q0, 0), q1), ((q0, 1), q0), ((q1, 0), q1), ((q1, 1), q2), ((q2, 0), q2), ((q2, 1), q2)}

The DFA may be more conveniently represented by a transition table:

          0    1
  → q0    q1   q0
    q1    q1   q2
  * q2    q2   q2

The table represents the function δ, i.e. to find the value of δ(q, x) we have to look at the row labelled q and the column labelled x. The initial state is marked by an → and all final states are marked by *.

Yet another, optically more inspiring, alternative are transition diagrams:

[Diagram: q0 loops on 1 and goes to q1 on 0; q1 loops on 0 and goes to q2 on 1; q2 loops on 0,1. There is an arrow into q0 and q2 has a double ring.]

There is an arrow into the initial state and all final states are marked by double rings. If δ(q, x) = q' then there is an arrow from state q to q' which is labelled x.

We write Σ* for the set of words (i.e. sequences) over the alphabet Σ. This includes the empty word which is written ε. I.e.

  {0, 1}* = {ε, 0, 1, 00, 01, 10, 11, 000, ...}

2.1.2 The language of a DFA

To each DFA A we associate a language L(A) ⊆ Σ*. To see whether a word w ∈ L(A) we put a marker in the initial state, and when reading a symbol we forward the marker along the edge marked with this symbol. When we are in an accepting state at the end of the word then w ∈ L(A), otherwise w ∉ L(A).

In the example above we have that 0 ∉ L(D), 101 ∈ L(D) and 110 ∉ L(D). Indeed, we have

  L(D) = {w | w contains the substring 01}

To be more precise we give a formal definition of L(A). First we define the extended transition function δ̂ ∈ Q × Σ* → Q. Intuitively, δ̂(q, w) = q' if, starting from state q, we end up in state q' when reading the word w. Formally, δ̂ is defined by primitive recursion:

  δ̂(q, ε) = q                  (1)
  δ̂(q, xw) = δ̂(δ(q, x), w)     (2)

Here xw stands for a non-empty word whose first symbol is x and whose rest is w. E.g. if we are told that xw = 010 then this entails that x = 0 and w = 10. w may be empty, i.e. xw = 0 entails x = 0 and w = ε.

As an example we calculate δ̂D(q0, 101) = q2:

  δ̂D(q0, 101) = δ̂D(δD(q0, 1), 01)   by (2)
              = δ̂D(q0, 01)          because δD(q0, 1) = q0
              = δ̂D(δD(q0, 0), 1)    by (2)
              = δ̂D(q1, 1)           because δD(q0, 0) = q1
              = δ̂D(δD(q1, 1), ε)    by (2)
              = δ̂D(q2, ε)           because δD(q1, 1) = q2
              = q2                  by (1)

Using δ̂ we may now define L(A) formally:

  L(A) = {w | δ̂(q0, w) ∈ F}

Hence we have that 101 ∈ L(D) because δ̂D(q0, 101) = q2 and q2 ∈ FD.
2.2 Nondeterministic Finite Automata

2.2.1 What is an NFA?

Nondeterministic finite automata (NFAs) have transition functions which may assign several or no states to a given state and an input symbol. They accept a word if there is any possible sequence of transitions from one of the initial states to one of the final states. It is important to note that although NFAs have a nondeterministic transition function, it can always be determined whether or not a word belongs to its language (w ∈ L(A)). Indeed, we shall see that every NFA can be translated into a DFA which accepts the same language.

Here is an example of an NFA C which accepts all words over Σ = {0, 1} s.t. the symbol before the last is 1.

[Diagram: q0 loops on 0,1 and goes to q1 on 1; q1 goes to q2 on 0,1; q2 has a double ring.]

A nondeterministic finite automaton (NFA) A = (Q, Σ, δ, S, F) is given by:

1. A finite set of states Q,
2. A finite set of input symbols Σ,
3. A transition function δ ∈ Q × Σ → P(Q),
4. A set of initial states S ⊆ Q,
5. A set of accepting states F ⊆ Q.

The differences to DFAs are that we have a set of start states instead of a single one, and the type of the transition function. As an example we have that

  C = ({q0, q1, q2}, {0, 1}, δC, {q0}, {q2})

where δC is given by

  δC    0      1
  q0   {q0}   {q0, q1}
  q1   {q2}   {q2}
  q2   {}     {}

Note that we diverge here slightly from the definition in the book, which uses a single initial state instead of a set of initial states. Doing so means that we can avoid introducing ε-NFAs (see [HMU01], section 2.5).
2.2.2 The language accepted by an NFA

To see whether a word w is accepted by an NFA (w ∈ L(A)) we may have to use several markers. Initially we put a marker on each of the initial states. Then each time we read a symbol we look at all the markers: we remove the old markers and put markers on all the states which are reachable via an arrow marked with the current input symbol (this may include a state which was marked previously). Thus we may have to use several markers, but it may also happen that all markers disappear (if no appropriate arrows exist). In this case the word is not accepted. If at the end of the word any of the final states has a marker on it then the word is accepted.

E.g. consider the word 100 (which is not accepted by C). Initially we have

[Diagram: a marker on q0.]

After reading 1 we have to use two markers because there are two arrows from q0 which are labelled 1:

[Diagram: markers on q0 and q1.]

Now after reading 0 the automaton has still got two markers, one of them in an accepting state:

[Diagram: markers on q0 and q2.]

However, after reading the 2nd 0 the second marker disappears because there is no edge leaving q2, and we have:

[Diagram: a marker on q0 only.]

which is not accepting because no marker is in the accepting state.

To specify the extended transition function for NFAs we use a generalisation of the union operation on sets. We define ∪ to be the union of a (finite) set of sets:

  ∪{A1, A2, ..., An} = A1 ∪ A2 ∪ ... ∪ An

In the special cases of the empty set of sets and a one-element set of sets we define:

  ∪{} = {}        ∪{A} = A

As an example

  ∪{{1}, {2, 3}, {1, 3}} = {1} ∪ {2, 3} ∪ {1, 3} = {1, 2, 3}

Actually, we may define ∪ by comprehension, which also extends the operation to infinite sets of sets (although we don't need this here):

  ∪B = {x | ∃A ∈ B. x ∈ A}

We now define δ̂ ∈ P(Q) × Σ* → P(Q) with the intention that δ̂(S, w) is the set of states which are marked after having read w, starting with the initial markers given by S.

  δ̂(S, ε) = S                            (3)
  δ̂(S, xw) = δ̂(∪{δ(q, x) | q ∈ S}, w)    (4)

As an example we calculate δ̂C({q0}, 100), which is {q0} as we already know from playing with markers.

  δ̂C({q0}, 100) = δ̂C(∪{δC(q, 1) | q ∈ {q0}}, 00)    by (4)
               = δ̂C(δC(q0, 1), 00)
               = δ̂C({q0, q1}, 00)
               = δ̂C(∪{δC(q, 0) | q ∈ {q0, q1}}, 0)  by (4)
               = δ̂C(δC(q0, 0) ∪ δC(q1, 0), 0)
               = δ̂C({q0} ∪ {q2}, 0)
               = δ̂C({q0, q2}, 0)
               = δ̂C(∪{δC(q, 0) | q ∈ {q0, q2}}, ε)  by (4)
               = δ̂C(δC(q0, 0) ∪ δC(q2, 0), ε)
               = δ̂C({q0} ∪ {}, ε)
               = {q0}                                by (3)

Using the extended transition function we define the language of an NFA as

  L(A) = {w | δ̂(S, w) ∩ F ≠ {}}

This shows that 100 ∉ L(C) because

  δ̂C({q0}, 100) ∩ FC = {q0} ∩ {q2} = {}

2.2.3 The subset construction

DFAs can be viewed as a special case of NFAs, i.e. those for which there is precisely one start state, S = {q0}, and the transition function always returns one-element sets (i.e. δ(q, x) = {q'} for all q ∈ Q and x ∈ Σ).

Below we show that for every NFA we can construct a DFA which accepts the same language. This shows that NFAs aren't more powerful than DFAs. However, in some cases NFAs need a lot fewer states than the corresponding DFA, and they are easier to construct.

Given an NFA A = (Q, Σ, δ, S, F) we construct the DFA

  D(A) = (P(Q), Σ, δD(A), S, FD(A))

where

  δD(A)(S, x) = ∪{δ(q, x) | q ∈ S}
  FD(A) = {S ⊆ Q | S ∩ F ≠ {}}

The basic idea of this construction (the subset construction) is to define a DFA whose states are sets of states of the NFA. A final state of the DFA is a set which contains at least one final state of the NFA. The transitions just follow the active set of markers, i.e. a state S ∈ P(Q) corresponds to having markers on all q ∈ S, and when we follow the arrow labelled x we get the set of states which are marked after reading x.

As an example let us consider the NFA C above. We construct a DFA D(C)

  D(C) = (P({q0, q1, q2}), {0, 1}, δD(C), {q0}, FD(C))

with δD(C) given by

    δD(C)            0           1
    {}               {}          {}
    {q0}             {q0}        {q0, q1}
    {q1}             {q2}        {q2}
  * {q2}             {}          {}
    {q0, q1}         {q0, q2}    {q0, q1, q2}
  * {q0, q2}         {q0}        {q0, q1}
  * {q1, q2}         {q2}        {q2}
  * {q0, q1, q2}     {q0, q2}    {q0, q1, q2}

and FD(C) is the set of all the states marked with * above, i.e.

  FD(C) = {{q2}, {q0, q2}, {q1, q2}, {q0, q1, q2}}

Looking at the transition diagram:

[Diagram: the transition diagram of D(C) on all eight subset states.]

we note that some of the states ({}, {q1}, {q2}, {q1, q2}) cannot be reached from the initial state, which means that they can be omitted without changing the language. Hence we obtain the following automaton:

[Diagram: the reachable part of D(C), with states {q0}, {q0, q1}, {q0, q2}, {q0, q1, q2}.]

We still have to convince ourselves that the DFA D(A) accepts the same language as the NFA A, i.e. we have to show that L(A) = L(D(A)). As a lemma we show that the extended transition functions coincide:

Lemma 2.1

  δ̂D(A)(S, w) = δ̂A(S, w)

The results of both functions are sets of states of the NFA A: for the left hand side because the states of D(A) are sets of states of A, and for the right hand side because the extended transition function on NFAs returns sets of states.

Proof: We show this by induction over the length of the word w; let's write |w| for the length of a word.

|w| = 0: Then w = ε and we have

  δ̂D(A)(S, ε) = S            by (1)
              = δ̂A(S, ε)     by (3)

|w| = n + 1: Then w = xv with |v| = n.

  δ̂D(A)(S, xv) = δ̂D(A)(δD(A)(S, x), v)       by (2)
               = δ̂A(δD(A)(S, x), v)          ind. hyp.
               = δ̂A(∪{δA(q, x) | q ∈ S}, v)  definition of δD(A)
               = δ̂A(S, xv)                   by (4)

□

We can now use the lemma to show

Theorem 2.2

  L(A) = L(D(A))

Proof:

  w ∈ L(A)
  ⟺ δ̂A(S, w) ∩ FA ≠ {}       definition of L(A) for NFAs
  ⟺ δ̂D(A)(S, w) ∩ FA ≠ {}    Lemma 2.1
  ⟺ δ̂D(A)(S, w) ∈ FD(A)      definition of FD(A)
  ⟺ w ∈ L(D(A))              definition of L(A) for DFAs

□

Corollary 2.3 NFAs and DFAs recognize the same class of languages.

Proof: We have noticed that DFAs are just a special case of NFAs. On the other hand, the subset construction introduced above shows that for every NFA we can find a DFA which recognizes the same language. □
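As an aside (a sketch of mine, not from the notes), the subset construction can be run on the fly in JAVA: a state of D(C) is a Set<Integer> of NFA states, and one DFA step is the union prescribed by δD(A):

import java.util.*;

// A sketch of D(C) for the NFA C above; NFA states q0,q1,q2 are 0,1,2.
public class SubsetC {
    // deltaC(q, x): the transition table of the NFA C
    static Set<Integer> deltaC(int q, char x) {
        if (q == 0) return x == '1' ? Set.of(0, 1) : Set.of(0);
        if (q == 1) return Set.of(2);
        return Set.of(); // q2 has no outgoing transitions
    }

    // One step of D(C): the union of deltaC(q, x) over all q in S
    static Set<Integer> step(Set<Integer> s, char x) {
        Set<Integer> next = new HashSet<>();
        for (int q : s) next.addAll(deltaC(q, x));
        return next;
    }

    // w is accepted iff the final set of markers contains q2
    static boolean accepts(String w) {
        Set<Integer> s = Set.of(0); // initial markers S = {q0}
        for (char x : w.toCharArray()) s = step(s, x);
        return s.contains(2);
    }

    public static void main(String[] args) {
        System.out.println(accepts("100")); // false
        System.out.println(accepts("110")); // true: symbol before last is 1
    }
}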

3 Regular expressions

Given an alphabet Σ, a language is a set of words L ⊆ Σ*. So far we were able to describe languages either by using set theory (i.e. enumeration or comprehension) or by an automaton. In this section we shall introduce regular expressions as an elegant and concise way to describe languages. We shall see that the languages definable by regular expressions are precisely the same as those accepted by deterministic or nondeterministic finite automata. These languages are called regular languages or (according to the Chomsky hierarchy) Type 3 languages.

As already mentioned in the introduction, regular expressions are used to define patterns in programs such as grep. grep gets as an argument a regular expression and then filters out all those lines from a file which match the regular expression, where matching means that the line contains a substring which is in the language assigned to the regular expression. It is interesting to note that even in the case when we search for a specific word (this is a special case of a regular expression) programs like grep are more efficient than a naive implementation of word search.

To find out more about grep have a look at the UNIX manual page and play around with grep. Note that the syntax grep uses is slightly different from the one we use here. grep also uses some convenient shorthands which are not relevant for a theoretical analysis of regular expressions because they do not extend the class of languages.

3.1 What are regular expressions?

We assume as given an alphabet Σ (e.g. Σ = {a, b, c, ..., z}) and define the syntax of regular expressions (over Σ):

1. ∅ is a regular expression.
2. ε is a regular expression.
3. For each x ∈ Σ, x is a regular expression. E.g. in the example all small letters are regular expressions. We use boldface to emphasize the difference between the symbol a and the regular expression a.
4. If E and F are regular expressions then E + F is a regular expression.
5. If E and F are regular expressions then EF (i.e. just one after the other) is a regular expression.
6. If E is a regular expression then E* is a regular expression.
7. If E is a regular expression then (E) is a regular expression.

These are all regular expressions.

Here are some examples of regular expressions:

  hallo
  hallo + hello
  h(a + e)llo
  a*b*
  (ε + b)(ab)*(ε + a)

As in arithmetic there are some conventions how to read regular expressions:

* binds stronger than sequencing and +. E.g. we read ab* as a(b*). We have to use parentheses to enforce the other reading, (ab)*.

Sequencing binds stronger than +. E.g. we read ab + cd as (ab) + (cd). To enforce another reading we have to use parentheses, as in a(b + c)d.

3.2 The meaning of regular expressions

We now know what regular expressions are, but what do they mean?

For this purpose we shall first define an operation on languages called the Kleene star. Given a language L ⊆ Σ* we define

  L* = {w0 w1 ... w(n-1) | n ∈ N ∧ ∀i < n. wi ∈ L}

Intuitively, L* contains all the words which can be formed by concatenating an arbitrary number of words in L. This includes the empty word, since the number may be 0.

As an example consider L = {a, ab} ⊆ {a, b}*:

  L* = {ε, a, ab, aab, aba, aaab, aaba, ...}

You should notice that we use the same symbol * as in Σ*, but there is a subtle difference: Σ is a set of symbols whereas L is a set of words.

Alternatively (and more abstractly) one may describe L* as the least language (wrt ⊆) which contains L and the empty word and is closed under concatenation:

  w ∈ L* ∧ v ∈ L* ⟹ wv ∈ L*

We now define the semantics of regular expressions: to each regular expression E over Σ we assign a language L(E) ⊆ Σ*. We do this by induction over the definition of the syntax:

1. L(∅) = {}
2. L(ε) = {ε}
3. L(x) = {x} where x ∈ Σ.
4. L(E + F) = L(E) ∪ L(F)
5. L(EF) = {wv | w ∈ L(E) ∧ v ∈ L(F)}
6. L(E*) = L(E)*
7. L((E)) = L(E)

Subtle points: in 1. the symbol ∅ may be used as a regular expression (as in L(∅)) or as the empty set (∅ = {}). Similarly, ε in 2. may be a regular expression or a word, and * in 6. may be used to construct regular expressions or as an operation on languages. Which alternative we mean only becomes clear from the context; there is no generally agreed mathematical notation to make this difference explicit. (This is different in programming, e.g. in JAVA we use "..." to signal that we mean things literally.)

Let us now calculate what the examples of regular expressions from the previous section mean, i.e. what are the languages they define:

hallo
Let's just look at L(ha). We know from 3:
  L(h) = {h}
  L(a) = {a}
Hence by 5:
  L(ha) = {wv | w ∈ {h} ∧ v ∈ {a}} = {ha}
Continuing the same reasoning we obtain:
  L(hallo) = {hallo}

hallo + hello
From the previous point we know that:
  L(hallo) = {hallo}
  L(hello) = {hello}
Hence by using 4 we get:
  L(hallo + hello) = {hallo} ∪ {hello} = {hallo, hello}

h(a + e)llo
Using 3 and 4 we know
  L(a + e) = {a, e}
Hence using 5 we obtain:
  L(h(a + e)llo) = {uvw | u ∈ L(h) ∧ v ∈ L(a + e) ∧ w ∈ L(llo)}
                 = {uvw | u ∈ {h} ∧ v ∈ {a, e} ∧ w ∈ {llo}}
                 = {hallo, hello}

a*b*
Let us introduce the following notation:
  w^i = ww...w   (i times)
Now using 6 we know that
  L(a*) = {w0 w1 ... w(n-1) | n ∈ N ∧ ∀i < n. wi ∈ L(a)}
        = {w0 w1 ... w(n-1) | n ∈ N ∧ ∀i < n. wi ∈ {a}}
        = {a^n | n ∈ N}
and hence using 5 we conclude
  L(a*b*) = {uv | u ∈ L(a*) ∧ v ∈ L(b*)}
          = {uv | u ∈ {a^n | n ∈ N} ∧ v ∈ {b^m | m ∈ N}}
          = {a^n b^m | m, n ∈ N}
I.e. L(a*b*) is the set of all words which start with a (possibly empty) sequence of a's followed by a (possibly empty) sequence of b's.

(ε + b)(ab)*(ε + a)
Let's analyze the parts:
  L(ε + b) = {ε, b}
  L((ab)*) = {(ab)^i | i ∈ N}
  L(ε + a) = {ε, a}
Hence, we have
  L((ε + b)(ab)*(ε + a)) = {u(ab)^i v | u ∈ {ε, b} ∧ i ∈ N ∧ v ∈ {ε, a}}
In English: L((ε + b)(ab)*(ε + a)) is the set of (possibly empty) sequences of interchanging a's and b's.
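As an aside (not in the notes): the example expressions can be tried out with JAVA's java.util.regex package. Its syntax differs from the course notation: alternation E + F is written E|F, and the empty word is written as an empty alternative, while * works as in the notes.

import java.util.regex.Pattern;

// Testing the example expressions from this section with java.util.regex.
public class RegexExamples {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("h(a|e)llo", "hello")); // true
        System.out.println(Pattern.matches("a*b*", "aabbb"));      // true
        // (epsilon + b)(ab)*(epsilon + a): epsilon is an empty alternative
        System.out.println(Pattern.matches("(b|)(ab)*(a|)", "bababa")); // true
    }
}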

3.3 Translating regular expressions to NFAs

Theorem 3.1 For each regular expression E we can construct an NFA N(E) s.t. L(N(E)) = L(E), i.e. the automaton accepts the language described by the regular expression.

Proof: We do this again by induction on the syntax of regular expressions:

1. N(∅):

[Diagram: a single non-final state with no transitions.]

which will reject everything (it has got no final states) and hence

  L(N(∅)) = {} = L(∅)

2. N(ε):

[Diagram: a single state which is both initial and final, with no transitions.]

This automaton accepts the empty word but rejects everything else, hence:

  L(N(ε)) = {ε} = L(ε)

3. N(x):

[Diagram: an initial state with one transition labelled x into a final state.]

This automaton only accepts the word x, hence:

  L(N(x)) = {x} = L(x)

4. N(E + F):
We merge the diagrams for N(E) and N(F) into one:

[Diagram: N(E) and N(F) placed side by side, together forming N(E + F).]

I.e. given

  N(E) = (QE, Σ, δE, SE, FE)
  N(F) = (QF, Σ, δF, SF, FF)

we use the disjoint union operation on sets (see the MCS lecture notes [Alt01], section 4.1) and define

  QE+F = QE + QF
  δE+F((0, q), x) = {(0, q') | q' ∈ δE(q, x)}
  δE+F((1, q), x) = {(1, q') | q' ∈ δF(q, x)}
  SE+F = SE + SF
  FE+F = FE + FF
  N(E + F) = (QE+F, Σ, δE+F, SE+F, FE+F)

The disjoint union just signals that we are not going to identify states, even if they accidentally happen to have the same name.

Just thinking of the game with markers you should be able to convince yourself that

  L(N(E + F)) = L(N(E)) ∪ L(N(F))

Moreover, to show that

  L(N(E + F)) = L(E + F)

we are allowed to assume that

  L(N(E)) = L(E)
  L(N(F)) = L(F)

That is what is meant by induction over the syntax of regular expressions. Now putting everything together:

  L(N(E + F)) = L(N(E)) ∪ L(N(F))
              = L(E) ∪ L(F)
              = L(E + F)

5. N(EF):
We want to put the two automata N(E) and N(F) in series. We do this by connecting the final states of N(E) with the initial states of N(F) in a way explained below.

[Diagram: N(E) followed by N(F), forming N(EF). In this diagram only one initial and one final state of each of the automata is depicted, although there may be several of them.]

Here is how we construct N(EF) from

  N(E) = (QE, Σ, δE, SE, FE)
  N(F) = (QF, Σ, δF, SF, FF)

The states of N(EF) are the disjoint union of the states of N(E) and N(F):

  QEF = QE + QF

The transition function of N(EF) contains all the transitions of N(E) and N(F) (as for N(E + F)), and for each state q of N(E) which has a transition to a final state of N(E) we add a transition with the same label to all the initial states of N(F):

  δEF((0, q), x) = {(0, q') | q' ∈ δE(q, x)} ∪ {(1, q'') | δE(q, x) ∩ FE ≠ {} ∧ q'' ∈ SF}
  δEF((1, q), x) = {(1, q') | q' ∈ δF(q, x)}

The initial states of N(EF) are the initial states of N(E), and also the initial states of N(F) if there is an initial state of N(E) which is also a final state:

  SEF = {(0, q) | q ∈ SE} ∪ {(1, q) | q ∈ SF ∧ SE ∩ FE ≠ {}}

The final states of N(EF) are the final states of N(F):

  FEF = {(1, q) | q ∈ FF}

We now set

  N(EF) = (QEF, Σ, δEF, SEF, FEF)

I hope that you are able to convince yourself that

  L(N(EF)) = {uv | u ∈ L(N(E)) ∧ v ∈ L(N(F))}

and hence we can reason

  L(N(EF)) = {uv | u ∈ L(N(E)) ∧ v ∈ L(N(F))}
           = {uv | u ∈ L(E) ∧ v ∈ L(F)}
           = L(EF)

6. N(E*):
We construct N(E*) from N(E) by merging initial and final states of N(E) in a way similar to the previous construction, and we add a new state which is initial and final.

[Diagram: N(E) extended with a new state ⋆, forming N(E*).]

Given

  N(E) = (QE, Σ, δE, SE, FE)

we construct N(E*). We add one extra state ⋆:

  QE* = QE + {⋆}

N(E*) inherits all transitions from N(E), and for each state which has an arrow into a final state labelled x we also add an arrow to all the initial states labelled x:

  δE*((0, q), x) = {(0, q') | q' ∈ δE(q, x)} ∪ {(0, q') | δE(q, x) ∩ FE ≠ {} ∧ q' ∈ SE}

The initial states of N(E*) are the initial states of N(E) and ⋆:

  SE* = {(0, q) | q ∈ SE} ∪ {(1, ⋆)}

The final states of N(E*) are the final states of N(E) and ⋆:

  FE* = {(0, q) | q ∈ FE} ∪ {(1, ⋆)}

We define

  N(E*) = (QE*, Σ, δE*, SE*, FE*)

We claim that

  L(N(E*)) = {w0 w1 ... w(n-1) | n ∈ N ∧ ∀i < n. wi ∈ L(N(E))}

since we can run through the automaton an arbitrary number of times. The new state also allows us to accept the empty sequence. Hence:

  L(N(E*)) = {w0 w1 ... w(n-1) | n ∈ N ∧ ∀i < n. wi ∈ L(N(E))}
           = L(N(E))*
           = L(E)*
           = L(E*)

7. N((E)) = N(E), i.e. using brackets does not change anything.

□

As an example we construct N(a*b*). First we construct N(a):

[Diagram: N(a), a two-state automaton with one transition labelled a.]

Now we have to apply the *-construction and we obtain:

[Diagram: N(a*).]

N(b*) is just the same, and we get

[Diagram: N(b*).]

and now we have to serialize the two automata and we get:

[Diagram: N(a*b*), the serial composition of N(a*) and N(b*).]

Now, you may observe that this automaton, though correct, is unnecessarily complicated, since we could have just used

[Diagram: a minimal two-state automaton for a*b*.]

However, we shall not be concerned with minimality at the moment.

3.4 Summing up

From the previous section we know that a language given by a regular expression is also recognized by an NFA. What about the other way: can a language recognized by a finite automaton (DFA or NFA) also be described by a regular expression? The answer is yes:

Theorem 3.2 (Theorem 3.4, page 91 of [HMU01]) Given a DFA A there is a regular expression R(A) which recognizes the same language, L(A) = L(R(A)).

We omit the proof (which can be found in [HMU01] on pp. 91-93). However, we conclude:

Corollary 3.3 Given a language L, the following are equivalent:

1. L is given by a regular expression.
2. L is the language accepted by an NFA.
3. L is the language accepted by a DFA.

Proof: We have that 1. ⟹ 2. by Theorem 3.1. We know that 2. ⟹ 3. by the subset construction (2.2.3), and 3. ⟹ 1. by Theorem 3.2. □

As indicated in the introduction: the languages which are characterized by any of the three equivalent conditions are called regular languages or type-3 languages.

4 Showing that a language is not regular

Regular languages are languages which can be recognized by a computer with finite (i.e. fixed) memory. Such a computer corresponds to a DFA. However, there are many languages which cannot be recognized using only finite memory; a simple example is the language

  L = {0^n 1^n | n ∈ N}

i.e. the language of words which start with a number of 0s followed by the same number of 1s. Note that this is different from L(0*1*), which is the language of a sequence of 0s followed by a sequence of 1s where the numbers need not be identical (and which we know to be regular because it is given by a regular expression).

Why can L not be recognized by a computer with fixed finite memory? Assume we have 32 megabytes of memory, that is we have 32 * 1024 * 1024 * 8 = 268435456 bits. Such a computer corresponds to an enormous DFA with 2^268435456 states (imagine you have to draw the transition diagram). However, the computer can only count up to 2^268435456: if we feed it any more 0s in the beginning it will get confused! Hence, you need an unbounded amount of memory to recognize L.

We shall now show a general theorem called the pumping lemma which allows us to prove that a certain language is not regular.

4.1 The pumping lemma

Theorem 4.1 Given a regular language L, there is a number n ∈ N such that all words w ∈ L which are longer than n (|w| ≥ n) can be split into three words w = xyz s.t.

1. y ≠ ε
2. |xy| ≤ n
3. for all k ∈ N we have xy^k z ∈ L.

Proof: For a regular language L there exists a DFA A s.t. L = L(A). Let us assume that A has got n states. Now if A accepts a word w with |w| ≥ n it must have visited a state q twice:

[Diagram: a path through the DFA; x leads to q, y is a loop from q back to q, z leads from q to a final state.]

We choose q s.t. it is the first cycle, hence |xy| ≤ n. We also know that y is non-empty (otherwise there is no cycle). Now consider what happens if we feed a word of the form xy^i z to the automaton, i.e. instead of y it contains an arbitrary number of repetitions of y, including the case i = 0, i.e. y is just left out. The automaton has to accept all such words, and hence xy^i z ∈ L. □

4.2 Applying the pumping lemma

Theorem 4.2 The language L = {0^n 1^n | n ∈ N} is not regular.

Proof: Assume L were regular. We will show that this leads to a contradiction using the pumping lemma.

By the pumping lemma there is an n such that we can split each word which is longer than n such that the properties given by the pumping lemma hold. Consider 0^n 1^n ∈ L; this is certainly longer than n. We have that xyz = 0^n 1^n and we know that |xy| ≤ n, hence y can only contain 0s, and since y ≠ ε it must contain at least one 0. Now according to the pumping lemma xy^0 z ∈ L, but this cannot be the case because it contains at least one 0 less but the same number of 1s as 0^n 1^n. Hence, our assumption that L is regular must have been wrong. □

It is easy to see that the language

  {1^n | n is even}

is regular (just construct the appropriate DFA or use a regular expression). However, what about

  {1^n | n is a square}

where by saying n is a square we mean that there is a k ∈ N s.t. n = k^2? We may try as we like: there is no way to find out whether we have got a square number of 1s by only using finite memory. And indeed:

Theorem 4.3 The language L = {1^n | n is a square} is not regular.

Proof: We apply the same strategy as above. Assume L is regular; then there is a number n such that we can split all longer words according to the pumping lemma. Let's take w = 1^(n^2); this is certainly long enough. By the pumping lemma we know that we can split w = xyz s.t. the conditions of the pumping lemma hold. In particular we know that

  1 ≤ |y| ≤ |xy| ≤ n

Using the 3rd condition we know that

  xyyz ∈ L

that is, |xyyz| is a square. However, we know that

  n^2 = |w| = |xyz| < |xyyz|          since 1 ≤ |y|
  |xyyz| = |xyz| + |y| ≤ n^2 + n      since |y| ≤ n
         < n^2 + 2n + 1 = (n + 1)^2

To summarize, we have

  n^2 < |xyyz| < (n + 1)^2

That is, |xyyz| lies between two subsequent squares. But then it cannot be a square itself, and hence we have a contradiction to xyyz ∈ L. We conclude L is not regular. □

Given a word w we write w^R for the word read backwards, i.e. (abc)^R = cba. Formally this can be defined as

  ε^R = ε
  (xw)^R = w^R x

We use this to define the language of even length palindromes

  Lpali = {w w^R | w ∈ Σ*}

I.e. for Σ = {a, b} we have abba ∈ Lpali. Using the intuition that finite automata can only use finite memory it should be clear that this language is not regular, because one has to remember the first half of the word to check whether the 2nd half is the same word read backwards. Indeed, we can show:

Theorem 4.4 Given Σ = {a, b} we have that Lpali is not regular.

Proof: We use the pumping lemma. We assume that Lpali is regular. Now given a pumping number n we construct w = a^n b b a^n ∈ Lpali; this word is certainly longer than n. From the pumping lemma we know that there is a splitting of the word w = xyz s.t. |xy| ≤ n, and hence y may only contain a's, and since y ≠ ε at least one. We conclude that xz ∈ Lpali where xz = a^m b b a^n with m < n. However, this word cannot be an even length palindrome: reading it backwards gives a^n b b a^m, which differs from a^m b b a^n because m < n. Hence our assumption that Lpali is regular must be wrong. □

The proof works for any alphabet with at least 2 different symbols. However, if Σ contains only one symbol, as in Σ = {1}, then Lpali is the language of an even number of 1s and this is regular: Lpali = L((11)*).

5 Context free grammars

We will now introduce context free grammars (CFGs) as a formalism to define languages. CFGs are more general than regular expressions, i.e. there are more languages definable by CFGs (called type-2 languages). We will define the corresponding notion of automata, the pushdown automata (PDAs), later.

5.1 What is a context-free grammar?

A context-free grammar G = (V, Σ, S, P) is given by:

A finite set V of variables or nonterminal symbols.

A finite set Σ of symbols or terminal symbols. We assume that the sets V and Σ are disjoint.

A start symbol S ∈ V.

A finite set P ⊆ V × (V ∪ Σ)* of productions. A production (A, α), where A ∈ V and α ∈ (V ∪ Σ)* is a sequence of terminals and variables, is written as A → α.

As an example we define a grammar for the language of arithmetical expressions over a (using only + and *), i.e. elements of this language are a + (a * a) or (a + a) * (a + a). However, words like a + +a or )(a are not in the language. We define G = ({E, T, F}, {(, ), a, +, *}, E, P) where P is given by:

  P = {E → T,
       E → E + T,
       T → F,
       T → T * F,
       F → a,
       F → (E)}

To save space we may combine all the rules with the same left hand side, i.e. we write

  P = {E → T | E + T
       T → F | T * F
       F → a | (E)}

5.2 The language of a grammar

How do we check whether a word w is in the language of the grammar? We start with the start symbol E and use productions to replace nonterminal symbols until we have no nonterminal symbols left; this is called a derivation. I.e. in the example G:

  E ⇒G E + T
    ⇒G T + T
    ⇒G F + T
    ⇒G a + T
    ⇒G a + F
    ⇒G a + (E)
    ⇒G a + (T)
    ⇒G a + (T * F)
    ⇒G a + (F * F)
    ⇒G a + (a * F)
    ⇒G a + (a * a)

Note that ⇒G here stands for the relation "derives in one step" and has nothing to do with implication. In the example we have always replaced the leftmost nonterminal symbol (hence it is called a leftmost derivation), but this is not necessary.

Given any grammar G = (V, Σ, S, P) we define the relation "derives in one step":

  ⇒G ⊆ (V ∪ Σ)* × (V ∪ Σ)*
  αAβ ⇒G αγβ  iff  A → γ ∈ P

The relation "derives" is defined as (⇒G* is the transitive-reflexive closure of ⇒G):

  ⇒G* ⊆ (V ∪ Σ)* × (V ∪ Σ)*
  α0 ⇒G* αn  iff  α0 ⇒G α1 ⇒G ... ⇒G αn

This includes the case α ⇒G* α, because n can be 0.

We now say that the language of a grammar L(G) is given by all words (over Σ) which can be derived in any number of steps, i.e.

  L(G) = {w ∈ Σ* | S ⇒G* w}

A language which can be given by a context-free grammar is called a context-free language (CFL).

5.3 More examples

Some of the languages which we have shown not to be regular are actually context-free.

The language {0^n 1^n | n ∈ N} is given by the following grammar:

  G = ({S}, {0, 1}, S, {S → ε | 0S1})

Also the language of palindromes

  {w w^R | w ∈ {a, b}*}

turns out to be context-free:

  G = ({S}, {a, b}, S, {S → ε | aSa | bSb})

We also note that

Theorem 5.1 All regular languages are context-free.

We don't give a proof here; the idea is that regular expressions can be translated into (special) context-free grammars, e.g. a*b* can be translated into

  G = ({A, B}, {a, b}, A, {A → aA | B, B → bB | ε})

(Extensions of) context free grammars are used in computer linguistics to describe real languages. As an example consider

  Σ = {the, dog, cat, which, bites, barks, catches}

and the grammar G = ({S, N, NP, VI, VT, VP}, Σ, S, P) where

  S → NP VP
  N → cat | dog
  NP → the N | NP which VP
  VI → barks | bites
  VT → bites | catches
  VP → VI | VT NP

which allows us to derive interesting sentences like

  the dog which catches the cat which bites barks

An important example of context-free languages is the syntax of programming languages. We have already mentioned the appendix of [GJSB00], which uses a formalism slightly different from the one introduced here. Another example is the toy language used in the compilers course ([Bac02], see Mini-Triangle Concrete and Abstract Syntax).

However, note that not all syntactic aspects of programming languages are captured by the context free grammar, e.g. the fact that a variable has to be declared before it is used and the type correctness of expressions are not captured.

5.4 Parse trees

With each derivation we also associate a derivation tree (or parse tree), which shows the structure of the derivation. As an example consider the tree associated with the derivation of a + (a * a) given before:

[Diagram: the parse tree for a + (a * a), with root E, branching via E + T, T, F down to the leaves a + ( a * a ).]

The top of the tree (called its root) is labelled with the start symbol, the other nodes are labelled with nonterminal symbols and the leaves are labelled by terminal symbols. The word which is derived can be read off from the leaves of the tree. The important property of a parse tree is that an internal node together with its children corresponds to a production in the grammar.

5.5 Ambiguity

We say that a grammar G is ambiguous if there is a word w ∈ L(G) for which there is more than one parse tree. This is usually a bad thing because it entails that there is more than one way to interpret a word (i.e. it leads to semantical ambiguity).

As an example consider the following alternative grammar for arithmetical expressions: we define G' = ({E}, {(, ), a, +, *}, E, P') where P' is given by:

  P' = {E → E + E | E * E | a | (E)}

This grammar is shorter and requires only one variable instead of three. Moreover it generates the same language, i.e. we have L(G) = L(G'). But it is ambiguous: consider a + a * a; we have the following parse trees:

[Diagram: two parse trees for a + a * a, one grouping the expression as (a + a) * a, the other as a + (a * a).]

Each parse tree corresponds to a different way to read the expression, i.e. the first one corresponds to (a + a) * a and the second one to a + (a * a). Depending on which one is chosen, an expression like 2 + 2 * 3 may evaluate to 12 or to 8. Informally, we agree that * binds more than + and hence the 2nd reading is the intended one.

This is actually achieved by the first grammar, which only allows the 2nd reading:

[Diagram: the unique parse tree for a + a * a in the grammar G.]

6 Pushdown Automata

We will now consider a new notion of automata: Pushdown Automata (PDAs). PDAs are finite automata with a stack, i.e. a data structure which can be used to store an arbitrary number of symbols (hence, in a sense, PDAs have an infinite set of states) but which can only be accessed in a last-in-first-out (LIFO) fashion. The languages which can be recognized by PDAs are precisely the context free languages.

6.1 What is a Pushdown Automaton?

A Pushdown Automaton P = (Q, Σ, Γ, δ, q0, Z0, F) is given by the following data:

A finite set Q of states,

A finite set Σ of symbols (the alphabet),

A finite set Γ of stack symbols,

A transition function

  δ ∈ Q × (Σ ∪ {ε}) × Γ → Pfin(Q × Γ*)

Here Pfin(A) are the finite subsets of a set A, i.e. this can be defined as

  Pfin(A) = {X | X ⊆ A ∧ X is finite}

An initial state q0 ∈ Q,

An initial stack symbol Z0 ∈ Γ,

A set of final states F ⊆ Q.

As an example we consider a PDA P0 which recognizes the language of even length palindromes over Σ = {0, 1}: L = {w w^R | w ∈ {0, 1}*}. Intuitively, this PDA pushes the input symbols on the stack until it guesses that it is in the middle, and then it compares the input with what is on the stack, popping off symbols from the stack as it goes. If it reaches the end of the input precisely at the time when the stack is empty, it accepts.

  P0 = ({q0, q1, q2}, {0, 1}, {0, 1, #}, δ, q0, #, {q2})

where δ is given by the following equations:

  δ(q0, 0, #) = {(q0, 0#)}
  δ(q0, 1, #) = {(q0, 1#)}
  δ(q0, 0, 0) = {(q0, 00)}
  δ(q0, 1, 0) = {(q0, 10)}
  δ(q0, 0, 1) = {(q0, 01)}
  δ(q0, 1, 1) = {(q0, 11)}
  δ(q0, ε, #) = {(q1, #)}
  δ(q0, ε, 0) = {(q1, 0)}
  δ(q0, ε, 1) = {(q1, 1)}
  δ(q1, 0, 0) = {(q1, ε)}
  δ(q1, 1, 1) = {(q1, ε)}
  δ(q1, ε, #) = {(q2, ε)}
  δ(q, x, z) = {}   everywhere else

To save space we may abbreviate this by writing:

  δ(q0, x, z) = {(q0, xz)}
  δ(q0, ε, z) = {(q1, z)}
  δ(q1, x, x) = {(q1, ε)}
  δ(q1, ε, #) = {(q2, ε)}
  δ(q, x, z) = {}   everywhere else

where q ∈ Q, x ∈ Σ, z ∈ Γ. We obtain the previous table by expanding all the possibilities for q, x, z.

We draw the transition diagram of P0 by labelling each transition with a triple x, Z, γ with x ∈ Σ ∪ {ε}, Z ∈ Γ, γ ∈ Γ*:

[Diagram: q0 with a loop labelled x, z, xz; an arrow labelled ε, z, z from q0 to q1; q1 with a loop labelled x, x, ε; an arrow labelled ε, #, ε from q1 to the final state q2.]

6.2 How does a PDA work?

At any time the state of the computation of a PDA P = (Q, Σ, Γ, δ, q0, Z0, F) is given by:

the state q ∈ Q the PDA is in,

the input string w ∈ Σ* which still has to be processed,

the contents of the stack γ ∈ Γ*.

Such a triple (q, w, γ) ∈ Q × Σ* × Γ* is called an Instantaneous Description (ID). We define a relation ⊢P ⊆ ID × ID between IDs which describes how the PDA can change from one ID to the next one. Since PDAs in general are nondeterministic, this is a relation (not a function), i.e. there may be more than one possibility. There are two possibilities for ⊢P:

1. (q, xw, zγ) ⊢P (q', w, αγ) if (q', α) ∈ δ(q, x, z)
2. (q, w, zγ) ⊢P (q', w, αγ) if (q', α) ∈ δ(q, ε, z)

In the first case the PDA reads an input symbol and consults the transition function δ to calculate a possible new state q' and a sequence of stack symbols α which replaces the current symbol z on the top of the stack. In the second case the PDA ignores the input and silently moves into a new state and modifies the stack as above. The input is unchanged.

Consider the word 0110. What are possible sequences of IDs for P0 starting with (q0, 0110, #)?

  (q0, 0110, #) ⊢P0 (q0, 110, 0#)    1. with (q0, 0#) ∈ δ(q0, 0, #)
                ⊢P0 (q0, 10, 10#)    1. with (q0, 10) ∈ δ(q0, 1, 0)
                ⊢P0 (q1, 10, 10#)    2. with (q1, 1) ∈ δ(q0, ε, 1)
                ⊢P0 (q1, 0, 0#)      1. with (q1, ε) ∈ δ(q1, 1, 1)
                ⊢P0 (q1, ε, #)       1. with (q1, ε) ∈ δ(q1, 0, 0)
                ⊢P0 (q2, ε, ε)       2. with (q2, ε) ∈ δ(q1, ε, #)

We write (q, w, γ) ⊢P* (q', w', γ') if the PDA can move from (q, w, γ) to (q', w', γ') in a (possibly empty) sequence of moves. Above we have shown that (q0, 0110, #) ⊢P0* (q2, ε, ε).

However, this is not the only possible sequence of IDs for this input. E.g. the PDA may just guess the middle wrong:

  (q0, 0110, #) ⊢P0 (q0, 110, 0#)    1. with (q0, 0#) ∈ δ(q0, 0, #)
                ⊢P0 (q1, 110, 0#)    2. with (q1, 0) ∈ δ(q0, ε, 0)

We have shown (q0, 0110, #) ⊢P0* (q1, 110, 0#). Here the PDA gets stuck: there is no successor ID of (q1, 110, 0#).

If we start with a word which is not in the language L (like 0011) then the automaton will always get stuck before reaching a final state.
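As an aside (my own sketch, not from the notes), the nondeterminism of P0 can be simulated in JAVA by backtracking: at each ID we try both the reading move and the silent (ε) move, and accept if any sequence of moves succeeds:

// A sketch of the PDA P0; the stack is a String whose first character
// is the top symbol, and i is the position in the remaining input.
public class PdaP0 {
    // accepts: can P0 reach the final state q2 with all input read?
    static boolean accepts(int q, String w, int i, String stack) {
        if (q == 0) {
            // rule 1: push the next input symbol, staying in q0...
            boolean push = i < w.length()
                && accepts(0, w, i + 1, w.charAt(i) + stack);
            // ...or the epsilon move: guess the middle, go to q1
            return push || accepts(1, w, i, stack);
        }
        if (q == 1) {
            // pop if the input symbol matches the top of the stack
            if (i < w.length() && !stack.isEmpty()
                    && w.charAt(i) == stack.charAt(0))
                return accepts(1, w, i + 1, stack.substring(1));
            // epsilon move to q2: only possible when # is on top;
            // accept iff the input has been consumed completely
            return stack.equals("#") && i == w.length();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts(0, "0110", 0, "#")); // true
        System.out.println(accepts(0, "0011", 0, "#")); // false
    }
}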

6.3 The language of a PDA

There are two ways to define the language of a PDA P = (Q, Σ, Γ, δ, q0, Z0, F) (L(P) ⊆ Σ*), because there are two notions of acceptance:

Acceptance by final state:

  L(P) = {w | (q0, w, Z0) ⊢P* (q, ε, γ) ∧ q ∈ F}

That is, the PDA accepts the word w if there is any sequence of IDs starting from (q0, w, Z0) and leading to (q, ε, γ), where q ∈ F is one of the final states. Here it doesn't play a role what the contents γ of the stack are at the end.
In our example the PDA P0 would accept 0110 because (q0, 0110, #) ⊢P0* (q2, ε, ε) and q2 ∈ F. Hence we conclude 0110 ∈ L(P0). On the other hand, since there is no successful sequence of IDs starting with (q0, 0011, #), we know that 0011 ∉ L(P0).

Acceptance by empty stack:

  L(P) = {w | (q0, w, Z0) ⊢P* (q, ε, ε)}

That is, the PDA accepts the word w if there is any sequence of IDs starting from (q0, w, Z0) and leading to (q, ε, ε); in this case the final state plays no role.
If we specify a PDA for acceptance by empty stack we will leave out the set of final states F and just use P = (Q, Σ, Γ, δ, q0, Z0). Our example automaton P0 also works if we leave out F and use acceptance by empty stack.

We can always turn a PDA which uses one acceptance method into one which uses the other. Hence, both acceptance criteria specify the same class of languages.

6.4 Deterministic PDAs

We have introduced PDAs as nondeterministic machines which may have several alternatives how to continue. We now define Deterministic Pushdown Automata (DPDAs) as those which never have a choice. To be precise, we say that a PDA P = (Q, Σ, Γ, δ, q0, Z0, F) is deterministic (is a DPDA) iff

  |δ(q, x, z)| + |δ(q, ε, z)| ≤ 1   for all q ∈ Q, x ∈ Σ, z ∈ Γ

Remember that |X| stands for the number of elements in a finite set X. That is: a DPDA may get stuck, but it never has any choice.

In our example the automaton P0 is not deterministic, e.g. we have δ(q0, 0, #) = {(q0, 0#)} and δ(q0, ε, #) = {(q1, #)}, and hence |δ(q0, 0, #)| + |δ(q0, ε, #)| = 2.

Unlike the situation for finite automata, there is in general no way to translate a nondeterministic PDA into a deterministic one. Indeed, there is no DPDA which recognizes the language L! Nondeterministic PDAs are more powerful than deterministic PDAs.

However, we can define a similar language L' over Σ = {0, 1, $} which can be recognized by a deterministic PDA:

  L' = {w$w^R | w ∈ {0, 1}*}

That is, L' contains palindromes with a marker $ in the middle, e.g. 01$10 ∈ L'. We define a DPDA P1 for L':

  P1 = ({q0, q1, q2}, {0, 1, $}, {0, 1, #}, δ', q0, #, {q2})

where δ' is given by:

  δ'(q0, x, z) = {(q0, xz)}   x ∈ {0, 1}
  δ'(q0, $, z) = {(q1, z)}
  δ'(q1, x, x) = {(q1, ε)}
  δ'(q1, ε, #) = {(q2, ε)}
  δ'(q, x, z) = {}   everywhere else

Here is its transition graph:

[Diagram: q0 with a loop labelled x, z, xz; an arrow labelled $, z, z from q0 to q1; q1 with a loop labelled x, x, ε; an arrow labelled ε, #, ε from q1 to the final state q2.]

We can check that this automaton is deterministic. In particular, the 3rd and 4th lines cannot overlap because # is not an input symbol.

Different to PDAs in general, the two acceptance methods are not equivalent for DPDAs: acceptance by final state makes it possible to define a bigger class of languages. Hence, we shall always use acceptance by final state for DPDAs.

6.5 Context free grammars and pushdown automata

Theorem 6.1 For a language L the following are equivalent:

1. L is given by a CFG G, L = L(G).
2. L is the language of a PDA P, L = L(P).

To summarize: Context Free Languages (CFLs) can be described by a Context Free Grammar (CFG) and can be processed by a pushdown automaton.

We will here only show how to construct a PDA from a grammar; the other direction is shown in [HMU01] (6.3.2, pp. 241).

Given a CFG G = (V, Σ, S, P), we define a PDA

  P(G) = ({q0}, Σ, V ∪ Σ, δ, q0, S)

where δ is defined as follows:

  δ(q0, ε, A) = {(q0, α) | A → α ∈ P}   for all A ∈ V
  δ(q0, a, a) = {(q0, ε)}               for all a ∈ Σ

We haven't given a set of final states because we use acceptance by empty stack. Yes, we use only one state!

Take as an example G = ({E, T, F}, {(, ), a, +, *}, E, P) where

  P = {E → T | E + T
       T → F | T * F
       F → a | (E)}

We define

  P(G) = ({q0}, {(, ), a, +, *}, {E, T, F, (, ), a, +, *}, δ, q0, E)

with

  δ(q0, ε, E) = {(q0, T), (q0, E + T)}
  δ(q0, ε, T) = {(q0, F), (q0, T * F)}
  δ(q0, ε, F) = {(q0, a), (q0, (E))}
  δ(q0, (, () = {(q0, ε)}
  δ(q0, ), )) = {(q0, ε)}
  δ(q0, a, a) = {(q0, ε)}
  δ(q0, +, +) = {(q0, ε)}
  δ(q0, *, *) = {(q0, ε)}

How does P(G) accept a + (a*a)? (In each ID below, the 2nd component is the remaining input and the 3rd is the stack.)

  (q0, a + (a*a), E) ⊢ (q0, a + (a*a), E+T)
                     ⊢ (q0, a + (a*a), T+T)
                     ⊢ (q0, a + (a*a), F+T)
                     ⊢ (q0, a + (a*a), a+T)
                     ⊢ (q0, + (a*a), +T)
                     ⊢ (q0, (a*a), T)
                     ⊢ (q0, (a*a), F)
                     ⊢ (q0, (a*a), (E))
                     ⊢ (q0, a*a), E))
                     ⊢ (q0, a*a), T))
                     ⊢ (q0, a*a), T*F))
                     ⊢ (q0, a*a), F*F))
                     ⊢ (q0, a*a), a*F))
                     ⊢ (q0, *a), *F))
                     ⊢ (q0, a), F))
                     ⊢ (q0, a), a))
                     ⊢ (q0, ), ))
                     ⊢ (q0, ε, ε)

Hence a + (a*a) ∈ L(P(G)).

This example hopefully already illustrates the general idea:

  w ∈ L(G)  ⟺  S ⇒ ... ⇒ w  ⟺  (q0, w, S) ⊢ ... ⊢ (q0, ε, ε)  ⟺  w ∈ L(P(G))

The automaton we have constructed is very nondeterministic: whenever we have a choice between different rules, the automaton may silently choose one of the alternatives.

7 How to implement a recursive descent parser

A parser is a program which processes input defined by a context-free grammar. The translation given in the previous section is not very useful in the design of such a program because of the nondeterminism. Here I show how for a certain class of grammars this nondeterminism can be eliminated, and using the example of arithmetical expressions I will show how a JAVA program can be constructed which parses and evaluates expressions.

7.1 What is an LL(1) grammar?

The basic idea of a recursive descent parser is to use the current input symbol to decide which alternative to choose. Grammars which have the property that it is possible to do this are called LL(1) grammars.

First we introduce an end marker $. For a given G = (V, Σ, S, P) we define the augmented grammar G$ = (V', Σ', S', P') where

  V' = V ∪ {S'} where S' is chosen s.t. S' ∉ V,
  Σ' = Σ ∪ {$} where $ is chosen s.t. $ ∉ Σ ∪ V,
  P' = P ∪ {S' → S$}

The idea is that

  L(G$) = {w$ | w ∈ L(G)}

Now for each symbol A ∈ V' ∪ Σ' we define

  First(A) = {a ∈ Σ' | A ⇒* aα}
  Follow(A) = {a ∈ Σ' | S' ⇒* αAaβ}

i.e. First(A) is the set of terminal symbols with which a word derived from A may start, and Follow(A) is the set of symbols which may occur directly after A. We use the augmented grammar to have a marker for the end of the word.

For each production A → α ∈ P we define the set Lookahead(A → α), which is the set of symbols which indicate that we are in this alternative:

  Lookahead(A → B1 B2 ... Bn) = ∪{First(Bi) | ∀k. 1 ≤ k < i ⟹ Bk ⇒* ε}
                                ∪ (Follow(A)  if B1 B2 ... Bn ⇒* ε, else ∅)

We now say a grammar G is LL(1) iff for each pair of productions A → α, A → β ∈ P with α ≠ β it is the case that

  Lookahead(A → α) ∩ Lookahead(A → β) = ∅

7.2 How to calculate First and Follow

We have to determine whether A ⇒* ε. If there are no ε-productions we know that the answer is always negative; otherwise:

If A → ε ∈ P we know that A ⇒* ε.

If A → B1 B2 ... Bn ∈ P where all Bi are nonterminal symbols and Bi ⇒* ε for all 1 ≤ i ≤ n, then we also know A ⇒* ε.

We calculate First in a similar fashion:

First(a) = {a} if a ∈ Σ.

If A → B1 B2 ... Bn ∈ P and there is an i ≤ n s.t. ∀k. 1 ≤ k < i ⟹ Bk ⇒* ε, then we add First(Bi) to First(A).

And for Follow:

$ ∈ Follow(S), where S is the original start symbol.

If there is a production A → αBβ then everything in First(β) is in Follow(B).

If there is a production A → αBβ with β ⇒* ε then everything in Follow(A) is also in Follow(B).
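As an aside (a sketch of mine, not from the notes), nullability and the First sets can be computed by applying these rules repeatedly until nothing changes (a fixpoint iteration). The sketch below does this for the grammar G' of the next section, renaming E' to Q and T' to U, since JAVA identifiers have no primes:

import java.util.*;

// Computing nullability and First sets by fixpoint iteration.
// Nonterminals: E, Q (= E'), T, U (= T'), F; everything else is terminal.
public class FirstSets {
    static String[][] prods = {
        {"E", "TQ"}, {"Q", "+TQ"}, {"Q", ""},
        {"T", "FU"}, {"U", "*FU"}, {"U", ""},
        {"F", "a"},  {"F", "(E)"}
    };

    public static void main(String[] args) {
        Map<Character, Set<Character>> first = new HashMap<>();
        Set<Character> nullable = new HashSet<>();
        for (String[] p : prods) first.putIfAbsent(p[0].charAt(0), new HashSet<>());
        boolean changed = true;
        while (changed) {               // iterate until a fixpoint is reached
            changed = false;
            for (String[] p : prods) {
                char a = p[0].charAt(0);
                boolean allNullable = true;
                for (char b : p[1].toCharArray()) {
                    // for a terminal b, First(b) = {b}
                    Set<Character> fb = first.containsKey(b)
                        ? first.get(b) : Set.of(b);
                    changed |= first.get(a).addAll(fb);
                    // stop after the first symbol that cannot derive epsilon
                    if (!(first.containsKey(b) && nullable.contains(b))) {
                        allNullable = false; break;
                    }
                }
                if (allNullable) changed |= nullable.add(a);
            }
        }
        // e.g. First(E) = {a, (}, First(Q) = {+}, First(U) = {*}
        System.out.println(first);
    }
}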

7.3 Constructing an LL(1) grammar

Let's have a look at the grammar G for arithmetical expressions again: G = ({E, T, F}, {(, ), a, +, *}, E, P) where

  P = {E → T | E + T
       T → F | T * F
       F → a | (E)}

We don't need the Follow sets at the moment, because the empty word doesn't occur in the grammar. For the nonterminal symbols we have

  First(F) = {a, (}
  First(T) = {a, (}
  First(E) = {a, (}

and now it is easy to see that most of the Lookahead sets agree, e.g.

  Lookahead(E → T) = {a, (}
  Lookahead(E → E + T) = {a, (}
  Lookahead(T → F) = {a, (}
  Lookahead(T → T * F) = {a, (}
  Lookahead(F → a) = {a}
  Lookahead(F → (E)) = {(}

Hence the grammar G is not LL(1).

However, luckily there is an alternative grammar G' which defines the same language: G' = ({E, E', T, T', F}, {(, ), a, +, *}, E, P') where

  P' = {E → T E'
        E' → +T E' | ε
        T → F T'
        T' → *F T' | ε
        F → a | (E)}

Since we have ε-productions we do need the Follow sets.

  First(E) = First(T) = First(F) = {a, (}
  First(E') = {+}
  First(T') = {*}
  Follow(E) = Follow(E') = {), $}
  Follow(T) = Follow(T') = {+, ), $}
  Follow(F) = {+, *, ), $}

Now we calculate the Lookahead sets:

  Lookahead(E → T E') = {a, (}
  Lookahead(E' → +T E') = {+}
  Lookahead(E' → ε) = Follow(E') = {), $}
  Lookahead(T → F T') = {a, (}
  Lookahead(T' → *F T') = {*}
  Lookahead(T' → ε) = Follow(T') = {+, ), $}
  Lookahead(F → a) = {a}
  Lookahead(F → (E)) = {(}

Hence the grammar G' is LL(1).

7.4

try {
curr=st.nextToken().intern();
} catch( NoSuchElementException e) {
curr=null;
}
}
We also implement a convenience method error(String) to report an error
and terminate the program.
Now we can translate all productions into methods using the Lookahead sets to
determine which alternative to choose. E.g. we translate
E 0 +T E 0 | 
into (using E1 for E 0 to follow JAVA rules):
static void parseE1() {
if (curr=="+") {
next();
parseT();
parseE1();
} else if(curr==")" || curr=="$" ) {
} else {
error("Unexpected :"+curr);
}
The basic idea is to

How to implement the parser

We can now implement a parser - one way would be to construct a deterministic


PDA. However, using JAVA we can implement the parser using recursion - here
the internal JAVA stack plays the role of the stack of the PDA.
First of all we have to separate the input into tokens, which are the terminal symbols of our grammar. To keep things simple I assume that tokens are separated by blanks, i.e. one has to type
( a + a ) * a
for (a+a)*a. This has the advantage that we can use java.util.StringTokenizer. In a real implementation tokenizing is usually done by using finite automata.
I don't want to get lost in JAVA details; in the main program I read a line and produce a tokenizer:
String line = in.readLine();
st = new StringTokenizer(line + " $");
The tokenizer st and the current token are static variables, and the convenience method next assigns the next token to curr:
static StringTokenizer st;
static String curr;

static void next() {
    try {
        // intern() guarantees that equal tokens are the same object,
        // so tokens can later be compared with ==
        curr = st.nextToken().intern();
    } catch (NoSuchElementException e) {
        curr = null;
    }
}
We also implement a convenience method error(String) to report an error and terminate the program.
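The notes do not spell out the body of error; a minimal sketch (my own, assuming we simply print the message and abort) could be:

static void error(String msg) {
    System.out.println(msg);   // report the problem ...
    System.exit(1);            // ... and terminate the program
}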

The basic idea is to
Translate each occurrence of a terminal symbol into a test that this symbol has been read and a call of next().
Translate each nonterminal symbol into a call of the method with the same name.
If you have to decide between different productions use the lookahead sets to determine which one to use.
If you find that there is no way to continue call error().
Now we can translate all productions into methods, using the Lookahead sets to determine which alternative to choose. E.g. we translate
E′ → +T E′ | ε
into (using E1 for E′ to follow JAVA rules):
static void parseE1() {
    if (curr == "+") {                          // Lookahead(E' -> +TE') = {+}
        next();
        parseT();
        parseE1();
    } else if (curr == ")" || curr == "$") {    // Lookahead(E' -> eps) = {), $}
        // nothing to do for the empty production
    } else {
        error("Unexpected :" + curr);
    }
}
We initiate the parsing process by calling next() to read the first symbol and then calling parseE(). If after processing parseE() we are at the end marker, the parsing has been successful:
next();
parseE();
if (curr == "$") {
    System.out.println("OK ");
} else {
    error("End expected");
}
The complete parser can be found at
http://www.cs.nott.ac.uk/~txa/g51mal/ParseE0.java.
Actually, we can be a bit more realistic and turn the parser into a simple evaluator by:
Replace a by any integer, i.e. we use
Integer.valueOf(curr).intValue();
to translate the current token into a number. JAVA will raise an exception if this fails.
Calculate the value of the expression read, i.e. we have to change the method interfaces:
static int parseE()
static int parseE1(int x)
static int parseT()
static int parseT1(int x)
static int parseF()

The idea behind parseE1 and parseT1 is to pass the result calculated so far and leave it to the method to incorporate the missing part of the expression, i.e. in the case of parseE1:
static int parseE1(int x) {
    if (curr == "+") {
        next();
        int y = parseT();
        return parseE1(x + y);   // fold the new summand into the result so far
    } else if (curr == ")" || curr == "$") {
        return x;                // end of the sum: x is the final value
    } else {
        error("Unexpected :" + curr);
        return x;
    }
}
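The remaining methods are analogous and can be found in the complete program below; as an illustration, here is a plausible sketch of parseF (my reconstruction, not quoted from the notes):

static int parseF() {
    if (curr == "(") {            // F -> (E)
        next();
        int x = parseE();
        if (curr == ")") {
            next();
            return x;
        }
        error("Expected ) :" + curr);
        return 0;
    } else {                      // F -> a, where a is any integer literal
        int x = Integer.valueOf(curr).intValue();
        next();
        return x;
    }
}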
Here is the complete program with evaluation:
http://www.cs.nott.ac.uk/~txa/g51mal/ParseE.java.
We can run the program and observe that it handles precedence of operators and brackets properly:
[txa@jacob misc]$ java ParseE
3 + 4 * 5
OK 23
[txa@jacob misc]$ java ParseE
( 3 + 4 ) * 5
OK 35

7.5 Beyond LL(1) - use LR(1) generators

The restriction to LL(1) has a number of disadvantages: in many cases a natural (and unambiguous) grammar like G has to be changed. There are some cases where this is actually impossible, i.e. although the language is deterministic there is no LL(1) grammar for it.
Luckily, there is a more powerful approach, called LR(1). LL(1) proceeds from top to bottom when we are looking at the parse tree, hence it is called top-down parsing. In contrast, LR(1) proceeds from bottom to top, i.e. it tries to construct the parse tree from the bottom upwards.
The disadvantage of LR(1) and the related approach LALR(1) (which is slightly less powerful but much more efficient) is that it is very hard to construct LR-parsers by hand. Hence there are automated tools which take the grammar as input and produce a parser as output. One of the first of these parser generators was YACC for C. Nowadays one can find parser generators for many languages, such as JAVA CUP for Java [Hud99] and Happy for Haskell [Mar01].


8 Turing machines and the rest

A Turing machine (TM) is a generalization of a PDA which uses a tape instead of a stack. Turing machines are an abstract version of a computer; they have been used to define formally what is computable. There are a number of alternative approaches to formalizing the concept of computability (e.g. the λ-calculus, or μ-recursive functions, ...) but they can all be shown to be equivalent. That this is the case for any reasonable notion of computation is called the Church-Turing Thesis.
On the other side there is a generalization of context-free grammars called phrase structure grammars or just grammars. Here we allow several symbols on the left hand side of a production, e.g. we may define the context in which a rule is applicable. Languages definable by grammars correspond precisely to the ones which may be accepted by a Turing machine, and those are called Type 0 languages or the recursively enumerable languages (or semidecidable languages).
Turing machines behave differently from the previous machine classes we have seen: they may run forever, without stopping. To say that a language is accepted by a Turing machine means that the TM will stop in an accepting state for each word which is in the language. However, if the word is not in the language the Turing machine may stop in a non-accepting state or loop forever. In this case we can never be sure whether the given word is in the language, i.e. the Turing machine doesn't decide the word problem.
We say a language is recursive (or decidable) if there is a TM which will always stop. There are type 0 languages which are not recursive: the most famous one is the halting problem, the language of encodings of Turing machines together with an input on which they halt.
There is no type of grammar which captures all recursive languages (and for theoretical reasons there cannot be one). However, there is a subset of the recursive languages called the context-sensitive languages, which are given by context-sensitive grammars: these are the grammars where the left hand side of a production is at most as long as the right hand side. Context-sensitive languages correspond to linear bounded TMs, i.e. those TMs which only use a tape whose length is given by a linear function of the length of the input.

8.1 What is a Turing machine?

A Turing machine M = (Q, Σ, Γ, δ, q0, B, F) is given by the following data:
A finite set Q of states,
A finite set Σ of symbols (the alphabet),
A finite set Γ of tape symbols s.t. Σ ⊆ Γ. This is the case because we use the tape also for the input.
A transition function
δ ∈ Q × Γ → {stop} ∪ Q × Γ × {L, R}
The transition function defines how the machine behaves if it is in state q and the symbol on the tape is x. If δ(q, x) = stop then the machine stops; otherwise, if δ(q, x) = (q′, y, d), the machine gets into state q′, writes y on the tape (replacing x) and moves left if d = L or right if d = R.
An initial state q0 ∈ Q,
The blank symbol B ∈ Γ but B ∉ Σ. In the beginning only a finite section of the tape containing the input is not blank.
A set of final states F ⊆ Q.
In [HMU01] the transition function is defined without the stop option as δ ∈ Q × Γ → Q × Γ × {L, R}. However, they allow δ to be undefined, which corresponds to our function returning stop.
This defines deterministic Turing machines; for non-deterministic TMs we change the transition function to
δ ∈ Q × Γ → P(Q × Γ × {L, R})
Here stop corresponds to returning the empty set. As for finite automata (and unlike for PDAs) there is no difference in the strength of deterministic and non-deterministic TMs.
As for PDAs we define instantaneous descriptions ID for Turing machines. We have ID = Γ* × Q × Γ*, where (l, q, r) ∈ ID means that the TM is in state q, left of the head the non-blank part of the tape is l, and starting with the head itself and all the non-blank symbols to the right is r.
We define the next state relation ⊢M similarly as for PDAs:
1. (l, q, xr) ⊢M (ly, q′, r) if δ(q, x) = (q′, y, R)
2. (lz, q, xr) ⊢M (l, q′, zyr) if δ(q, x) = (q′, y, L)
3. (l, q, ε) ⊢M (ly, q′, ε) if δ(q, B) = (q′, y, R)
4. (ε, q, xr) ⊢M (ε, q′, Byr) if δ(q, x) = (q′, y, L)
The cases 3 and 4 are only needed to deal with the situation where we have reached the end of the (non-blank part of the) tape.
We say that a TM M accepts a word if it goes into an accepting state, i.e. the language of a TM is defined as
L(M) = {w ∈ Σ* | (ε, q0, w) ⊢*M (l, q′, r) ∧ q′ ∈ F}
I.e. the TM stops automatically if it goes into an accepting state. However, it may also stop in a non-accepting state if δ returns stop; in this case the word is rejected.
A TM M decides a language if it accepts it and it never loops (in the negative case).
To illustrate this we define a TM M which accepts the language L = {aⁿbⁿcⁿ | n ∈ N}; this is a language which cannot be recognized by a PDA or be defined by a CFG.
We define M = (Q, Σ, Γ, δ, q0, ␣, F), where ␣ denotes the blank symbol, by
Q = {q0, q1, q2, q3, q4, q5, q6}
Σ = {a, b, c}
Γ = Σ ∪ {X, Y, Z, ␣}
F = {q6}
and δ is given by
δ(q0, ␣) = (␣, q6, R)
δ(q0, a) = (X, q1, R)
δ(q1, a) = (a, q1, R)
δ(q1, Y) = (Y, q1, R)
δ(q1, b) = (Y, q2, R)
δ(q2, b) = (b, q2, R)
δ(q2, Z) = (Z, q2, R)
δ(q2, c) = (Z, q3, R)
δ(q3, ␣) = (␣, q5, L)
δ(q3, c) = (c, q4, L)
δ(q4, Z) = (Z, q4, L)
δ(q4, b) = (b, q4, L)
δ(q4, Y) = (Y, q4, L)
δ(q4, a) = (a, q4, L)
δ(q4, X) = (X, q0, R)
δ(q5, Z) = (Z, q5, L)
δ(q5, Y) = (Y, q5, L)
δ(q5, X) = (X, q6, R)
δ(q, x) = stop everywhere else

The machine replaces an a by X (q0), then looks for the first b and replaces it by Y (q1), then looks for the first c and replaces it by a Z (q2). If there are more c's left it moves left to the next a (q4) and repeats the cycle. Otherwise it checks whether there are no a's and b's left (q5) and if so goes into an accepting state (q6).
Graphically, the machine can be represented by the following transition diagram, where the edges are labelled by (read-symbol, write-symbol, move-direction):
[Transition diagram of M: states q0, ..., q6 with the transitions of the table above as labelled edges, e.g. an edge a,X,R from q0 to q1, loops a,a,R and Y,Y,R on q1, and an edge X,X,R from q5 to q6.]

E.g. consider the sequence of IDs on aabbcc:


(ε, q0, aabbcc) ⊢ (X, q1, abbcc)
⊢ (Xa, q1, bbcc)
⊢ (XaY, q2, bcc)
⊢ (XaYb, q2, cc)
⊢ (XaYbZ, q3, c)
⊢ (XaYb, q4, Zc)
⊢ (XaY, q4, bZc)
⊢ (Xa, q4, YbZc)
⊢ (X, q4, aYbZc)
⊢ (ε, q4, XaYbZc)
⊢ (X, q0, aYbZc)
⊢ (XX, q1, YbZc)
⊢ (XXY, q1, bZc)
⊢ (XXYY, q2, Zc)
⊢ (XXYYZ, q2, c)
⊢ (XXYYZZ, q3, ε)
⊢ (XXYYZ, q5, Z)
⊢ (XXYY, q5, ZZ)
⊢ (XXY, q5, YZZ)
⊢ (XX, q5, YYZZ)
⊢ (X, q5, XYYZZ)
⊢ (XX, q6, YYZZ)


We see that M accepts aabbcc. Since M never loops it does actually decide L.
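To connect the formal definition with something executable, here is a small Java sketch (my own, not part of the notes) that simulates exactly this machine; the map delta transcribes the table above, with '_' standing for the blank ␣:

import java.util.*;

public class TMSim {
    static final Map<String, String> delta = new HashMap<>();
    static void rule(String qx, String out) { delta.put(qx, out); }

    public static void main(String[] args) {
        // Keys are "state,symbol", values are "state',write,direction".
        rule("0,_", "6,_,R"); rule("0,a", "1,X,R");
        rule("1,a", "1,a,R"); rule("1,Y", "1,Y,R"); rule("1,b", "2,Y,R");
        rule("2,b", "2,b,R"); rule("2,Z", "2,Z,R"); rule("2,c", "3,Z,R");
        rule("3,_", "5,_,L"); rule("3,c", "4,c,L");
        rule("4,Z", "4,Z,L"); rule("4,b", "4,b,L"); rule("4,Y", "4,Y,L");
        rule("4,a", "4,a,L"); rule("4,X", "0,X,R");
        rule("5,Z", "5,Z,L"); rule("5,Y", "5,Y,L"); rule("5,X", "6,X,R");

        StringBuilder tape = new StringBuilder(args.length > 0 ? args[0] : "aabbcc");
        if (tape.length() == 0) tape.append('_');   // empty input: tape is all blank
        String q = "0";                             // initial state q0
        int pos = 0;                                // head position
        while (!q.equals("6")) {                    // q6 is the only accepting state
            String step = delta.get(q + "," + tape.charAt(pos));
            if (step == null) {                     // delta says stop: reject
                System.out.println("reject");
                return;
            }
            String[] s = step.split(",");
            q = s[0];
            tape.setCharAt(pos, s[1].charAt(0));
            pos += s[2].equals("R") ? 1 : -1;
            if (pos == tape.length()) tape.append('_');     // extend tape with blanks
            if (pos < 0) { tape.insert(0, '_'); pos = 0; }  // (not reached by this M)
        }
        System.out.println("accept");
    }
}

Running java TMSim aabbcc prints accept, while e.g. java TMSim aabbc is rejected in state q5.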

8.2 Grammars and context-sensitivity

Grammars G = (V, Σ, S, P) are defined as context-free grammars before, with the only difference that there may be several symbols on the left hand side of a production, i.e. P ⊆ (V ∪ Σ)+ × (V ∪ Σ)*. Here (V ∪ Σ)+ means that at least one symbol has to be present. The relation derives ⇒G (and ⇒*G) is defined as before:
αγβ ⇒G αγ′β if γ → γ′ ∈ P, where α, β ∈ (V ∪ Σ)*
and as before the language of G is defined as
L(G) = {w ∈ Σ* | S ⇒*G w}
We say that a grammar is context-sensitive (or type 1) if the left hand side of a production is at most as long as the right hand side, that is for each α → β ∈ P we have |α| ≤ |β|. (In particular a context-sensitive grammar can never derive the empty word, since left hand sides are non-empty.)
Here is an example of a context-sensitive grammar: G = (V, Σ, S, P) with L(G) = {aⁿbⁿcⁿ | n ∈ N ∧ n ≥ 1}, where
V = {S, B, C}
Σ = {a, b, c}
P = {S → aSBC
     S → aBC
     aB → ab
     CB → BC
     bB → bb
     bC → bc
     cC → cc}
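For example, aabbcc can be derived as follows: first build up the nested structure, then use CB → BC to sort the B's before the C's, and finally turn nonterminals into terminals from left to right:
S ⇒ aSBC ⇒ aaBCBC ⇒ aaBBCC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc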

We present without proof:

Theorem 8.1 For a language L the following are equivalent:
1. L is accepted by a Turing machine M, i.e. L = L(M).
2. L is given by a grammar G, i.e. L = L(G).

Theorem 8.2 For a language L the following are equivalent:
1. L is accepted by a Turing machine M, i.e. L = L(M), such that the length of the tape is bounded by a linear function in the length of the input, i.e. |l| + |r| ≤ f(|w|) where f(x) = ax + b with a, b ∈ N.
2. L is given by a context-sensitive grammar G, i.e. L = L(G).

8.3 The halting problem

Turing showed that there are languages which are accepted by a TM (i.e. type 0 languages) but which are undecidable. The technical details of this construction are quite involved but the basic idea is quite simple and closely related to Russell's paradox, which we have seen in MCS.
Let's fix a simple alphabet Σ = {0, 1}. As computer scientists we are well aware that everything can be coded up in bits, and hence we accept that there is an encoding of TMs in binary, i.e. given a TM M we write ⌈M⌉ ∈ {0, 1}* for its binary encoding. We assume that the encoding contains its length s.t. we know when subsequent input on the tape starts.
Now we define the following language:
Lhalt = {⌈M⌉w | M halts on input w}
It is easy (although the details are quite daunting) to define a TM which accepts this language: we just simulate M and accept if M stops.
However, Turing showed that there is no TM which decides this language. To see this, let us assume that there is a TM H which decides Lhalt. Now using H we construct a new TM F which is a bit obnoxious: F on input x runs H on xx. If H says yes then F goes into a loop, otherwise (H says no) F stops.
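In Java-like pseudocode F would look as follows; of course this is purely hypothetical, since the method H assumed here cannot actually exist:

// Assumed decider for L_halt: H(x) returns true iff the TM encoded by the
// initial part of x halts on the remaining input. No such method exists!
static boolean H(String x) { /* hypothetical */ return true; }

static void F(String x) {
    if (H(x + x)) {         // would the machine encoded by x halt on input x?
        while (true) { }    // then be obnoxious and loop forever
    }                       // otherwise stop immediately
}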
The question is: what happens if we run F on ⌈F⌉? Let us assume it terminates; then H applied to ⌈F⌉⌈F⌉ returns yes, and hence we must conclude that F on ⌈F⌉ loops. On the other hand, if F with input ⌈F⌉ loops, then H applied to ⌈F⌉⌈F⌉ will stop and reject, and hence we have to conclude that F on ⌈F⌉ stops.
Either way we arrive at a contradiction, and hence we must conclude that our assumption that there is a TM H which decides Lhalt is false. We say Lhalt is undecidable.
We have shown that a Turing machine cannot decide whether a program (for a Turing machine) halts. Maybe we could find a more powerful programming language which overcomes this problem? It turns out that all computational formalisms (i.e. programming languages) which can actually be implemented are equal in power and can be simulated by each other; this observation is called the Church-Turing thesis because it was first formulated by Alonzo Church and Alan Turing in the 1930s.

8.4 Back to Chomsky

At the end of the course we should have another look at the Chomsky hierarchy, which classifies languages based on subclasses of grammars, or equivalently by the different types of automata which recognize them:

All languages
⊃ Type 0 or recursively enumerable languages (Turing machines)
⊃ decidable languages
⊃ Type 1 or context-sensitive languages (linear bounded Turing machines)
⊃ Type 2 or context-free languages (pushdown automata)
⊃ Type 3 or regular languages (finite automata)

We have worked our way from the bottom to the top of the hierarchy: starting with finite automata, i.e. computation with a fixed amount of memory, via pushdown automata (finite automata with a stack) to Turing machines (finite automata with a tape). Correspondingly we have introduced different grammatical formalisms: regular expressions, context-free grammars and grammars.
Note that the inclusions between the levels are strict: {aⁿbⁿ | n ∈ N} is level 2 but not level 3; {aⁿbⁿcⁿ | n ∈ N} is level 1 but not level 2; and the halting problem is level 0 but not level 1.
We could have gone the other way: starting with Turing machines and grammars and then introducing restrictions on them, i.e. Turing machines which only use their tape as a stack, and Turing machines which never use the tape apart from reading the input. Again, correspondingly, we can define restrictions on the grammar side: first introduce context-free grammars, and then grammars where all productions are of the form A → aB or A → a, with A, B nonterminal symbols and a a terminal symbol. These grammars correspond precisely to regular expressions (I leave this as an exercise; an example follows below).
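As an example (mine, not from the notes): the grammar with the productions S → aS | b is of this restricted form and generates the same language as the regular expression a*b; e.g. aab is derived by S ⇒ aS ⇒ aaS ⇒ aab.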

I believe that Chomsky introduced his hierarchy as a classification of grammars and that the relation to automata was only observed a bit later. This is maybe the reason why he introduced the Type 1 level, which is not so interesting from an automata point of view (unless you are into computational complexity, i.e. resource use; here, linear use of memory). It is also the reason why, on the other hand, the decidable languages do not constitute a level: there is no corresponding grammatical formalism (we can even prove this).

References

[Alt01] Thorsten Altenkirch. Mathematics for Computer Scientists (G51MCS) lecture notes. www, 2001.
[Bac02] Roland Backhouse. Compilers (G52CMP) lecture notes. www, 2002.
[GJSB00] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java Language Specification. Sun Microsystems, Inc., 2nd edition, 2000.
[HMU01] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 2nd edition, 2001.
[Hud99] Scott E. Hudson. CUP parser generator for Java. www, 1999.
[Mar01] Simon Marlow. Happy - the parser generator for Haskell. www, 2001.
[Sch99] Uwe Schöning. Theoretische Informatik - kurzgefaßt. Spektrum Akademischer Verlag, 3. Auflage, 1999.
