Week-2 Lecture 2 Lexical Analysis
Lecture: 3
Lexical Analysis (Part 2)
Finite State Automata (FSAs)
AKA “Finite State Machines”, “Finite Automata”, “FA”
A recognizer for a language is a program that takes as input a
string x and answers “yes” if x is a sentence of the language and
“no” otherwise.
A regular expression is compiled into a recognizer by
constructing a generalized transition diagram called a finite
automaton.
One start state
Many final states
Each state is labeled with a state name
Directed edges, labeled with symbols
Two types
Deterministic (DFA)
Non-deterministic (NFA)
Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) is a
mathematical model that consists of
1. A set of states S
2. A set of input symbols Σ (the input alphabet)
3. A transition function that maps state/symbol pairs
to a set of states
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no
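The five components listed above can be written down directly as a data structure. The following is a minimal sketch, not a definitive implementation: the class name `NFA`, the field names, the `EPSILON` marker, and the helper methods are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

EPSILON = None  # hypothetical marker for a null (ε) move


@dataclass
class NFA:
    states: set      # 1. the set of states S
    alphabet: set    # 2. the set of input symbols
    delta: dict      # 3. maps (state, symbol) -> set of next states
    start: int       # 4. the start state s0
    finals: set      # 5. the set of final states F, a subset of S

    def is_well_formed(self):
        """Check that every component only refers to known states and symbols."""
        transitions_ok = all(
            s in self.states
            and (a in self.alphabet or a is EPSILON)
            and targets <= self.states
            for (s, a), targets in self.delta.items()
        )
        return (
            self.start in self.states
            and self.finals <= self.states
            and transitions_ok
        )

    def move(self, current, symbol):
        """Return all states reachable from the set `current` on one `symbol`."""
        reached = set()
        for s in current:
            reached |= self.delta.get((s, symbol), set())
        return reached
```

Because the transition function returns a set of states, `move` works on a set of current states rather than a single state; that is the only place where nondeterminism shows up in this sketch.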
Example: NFA for (a|b)*abb
S = { 0, 1, 2, 3 }
Σ = { a, b }
s0 = 0
F = { 3 }
Transition diagram:
start -> 0 --a--> 1 --b--> 2 --b--> 3
State 0 also has edges to itself labeled a and b
ε (null) moves possible: i --ε--> j
Switch state but do not use any input symbol
Transition Table
state   input a    input b
0       { 0, 1 }   { 0 }
1       --         { 2 }
2       --         { 3 }
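The transition table can be run directly by tracking the set of states the NFA could be in after each input symbol. This is a minimal sketch; the names `DELTA`, `START`, `FINALS`, and `nfa_accepts` are hypothetical, and the table entries are the ones given for states 0, 1, and 2 in the example.

```python
# Transition table from the example: (state, symbol) -> set of next states.
# Missing entries (shown as "--" in the table) mean "no move".
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINALS = {3}


def nfa_accepts(text):
    """Answer True ("yes") if the NFA accepts `text`, False ("no") otherwise."""
    current = {START}
    for ch in text:
        following = set()
        for state in current:
            following |= DELTA.get((state, ch), set())
        current = following
    # Accept if at least one possible current state is final.
    return bool(current & FINALS)
```

For example, on input `abb` the set of possible states goes {0}, {0, 1}, {0, 2}, {0, 3}, and since 3 is final the answer is yes.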
Deterministic Finite Automata
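[Figure: DFA transition diagram with states 0 to 3; main path start -> 0 --a--> 1 --b--> 2 --b--> 3, plus further edges labeled a and b]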
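A DFA has exactly one next state per state/symbol pair, so simulation needs only a single current state. The sketch below assumes the standard textbook DFA for (a|b)*abb; only the main path 0, 1, 2, 3 is recoverable from the slide, so the remaining edges in `DFA_DELTA` are an assumption, and the names `DFA_DELTA` and `dfa_accepts` are hypothetical.

```python
# Assumed DFA for (a|b)*abb: (state, symbol) -> the single next state.
# Edges other than 0 -a-> 1, 1 -b-> 2, 2 -b-> 3 are an assumption.
DFA_DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
DFA_START = 0
DFA_FINALS = {3}


def dfa_accepts(text):
    """Run the DFA over `text`; one table lookup per input symbol."""
    state = DFA_START
    for ch in text:
        if (state, ch) not in DFA_DELTA:
            return False  # symbol outside the alphabet {a, b}
        state = DFA_DELTA[(state, ch)]
    return state in DFA_FINALS
```

Each input symbol costs one dictionary lookup, which is why DFA recognition is fast compared with tracking a set of states.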
What Language is Accepted?
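[Figure: transition diagram with states 0 to 3; main path start -> 0 --a--> 1 --b--> 2 --b--> 3, plus further edges labeled a and b]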
DFA vs NFA
Both DFAs and NFAs are recognizers of regular sets.
But a time-space tradeoff exists:
DFAs are faster recognizers
DFAs can also be much bigger than the equivalent NFA
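The size side of the tradeoff can be observed with a small experiment. The sketch below uses the subset construction, a technique these slides do not cover, to count the DFA states reachable from an NFA. The example language ("the n-th symbol from the end is a") and the names `nth_from_end_nfa` and `count_dfa_states` are assumptions chosen for illustration.

```python
def nth_from_end_nfa(n):
    """NFA with n + 1 states for strings over {a, b} whose n-th symbol
    from the end is 'a'. Returns (delta, start, finals)."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for s in range(1, n):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta, 0, {n}


def count_dfa_states(delta, start, alphabet=("a", "b")):
    """Count the sets of NFA states reachable by the subset construction.
    Each reachable set becomes one DFA state (an empty set, if reached,
    is counted as the dead state)."""
    start_set = frozenset({start})
    seen = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        for sym in alphabet:
            following = frozenset(
                t for s in current for t in delta.get((s, sym), ())
            )
            if following not in seen:
                seen.add(following)
                work.append(following)
    return len(seen)
```

Under these assumptions, the NFA grows by one state each time n increases by one, while the count returned by `count_dfa_states` doubles.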
Converting Regular Expressions to NFAs
Thompson’s Construction
The empty string ε is a regular expression denoting {ε}
start -> i --ε--> f
a is a regular expression denoting {a} for any a in Σ
start -> i --a--> f
Converting Regular Expressions to NFAs
If P and Q are regular expressions with NFAs Np and Nq:
P | Q (union)
start -> i --ε--> [ Np ] --ε--> f
         i --ε--> [ Nq ] --ε--> f
PQ (concatenation)
start -> i [ Np ][ Nq ] f   (Np followed directly by Nq)
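Converting Regular Expressions to NFAs
Q* (closure)
start -> i --ε--> [ Nq ] --ε--> f
ε edge from i to f (skip Nq: zero occurrences)
ε edge from the end of Nq back to its start (repeat Nq)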
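The construction rules (the two base cases, union, concatenation, and closure) can be sketched as functions that each return a small NFA fragment with one start state and one final state. This is a hedged illustration rather than the lecture's own code: every name here (`Fragment`, `symbol`, `union`, `concat`, `star`, `accepts`, and so on) is hypothetical, and concatenation joins the two fragments with an ε edge instead of merging states, which is a simplification.

```python
import itertools

EPS = None                  # label used for ε (null) moves
_ids = itertools.count()    # source of fresh state numbers


class Fragment:
    """NFA fragment: one start state, one final state, a list of edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges


def _two_new_states():
    return next(_ids), next(_ids)


def epsilon():
    i, f = _two_new_states()
    return Fragment(i, f, [(i, EPS, f)])


def symbol(a):
    i, f = _two_new_states()
    return Fragment(i, f, [(i, a, f)])


def union(p, q):
    i, f = _two_new_states()
    extra = [(i, EPS, p.start), (i, EPS, q.start),
             (p.final, EPS, f), (q.final, EPS, f)]
    return Fragment(i, f, p.edges + q.edges + extra)


def concat(p, q):
    # Simplification: link p to q with an ε edge rather than merging states.
    return Fragment(p.start, q.final,
                    p.edges + q.edges + [(p.final, EPS, q.start)])


def star(q):
    i, f = _two_new_states()
    extra = [(i, EPS, q.start), (q.final, EPS, f),
             (i, EPS, f), (q.final, EPS, q.start)]
    return Fragment(i, f, q.edges + extra)


def eps_closure(states, edges):
    """All states reachable from `states` using only ε edges."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(frag, text):
    current = eps_closure({frag.start}, frag.edges)
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = eps_closure(moved, frag.edges)
    return frag.final in current


# (a|b)*abb built bottom-up from the rules.
ab_star = star(union(symbol("a"), symbol("b")))
nfa_abb = concat(concat(concat(ab_star, symbol("a")), symbol("b")), symbol("b"))
```

Calling `accepts(nfa_abb, "babb")` then plays the role of the recognizer: it answers yes or no for one input string.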
NFA Construction
RE: (a|b)*abb
H.W.: Construct an NFA for the following RE:
(ab*c) | (a(b|c*))
NFA Construction
id → letter ( letter | digit )*
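The identifier pattern needs only two states: a start state that requires a letter, and an accepting state that loops on letters and digits. The sketch below assumes that "letter" means the ASCII letters a-z and A-Z and that "digit" means 0-9; the slide does not define either class, and the function name `is_identifier` is hypothetical.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: ASCII letters only
DIGITS = set(string.digits)           # assumption: 0-9


def is_identifier(text):
    """Recognize letter ( letter | digit )* with a two-state automaton."""
    state = 0  # 0 = start, 1 = accepting (a leading letter has been read)
    for ch in text:
        if state == 0 and ch in LETTERS:
            state = 1
        elif state == 1 and (ch in LETTERS or ch in DIGITS):
            state = 1
        else:
            return False  # no transition for this symbol
    return state == 1
```

The empty string is rejected because the pattern requires at least one leading letter.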