Lexical Analysis
• Kleene Closure
– $L^* = \bigcup_{i \ge 0} L^i$
• Positive Closure
– $L^+ = \bigcup_{i \ge 1} L^i$
– Regular Expressions
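The closure operations can be tried out on a small finite language. The following is only a sketch: the helper names `concat`, `power`, and `closure_up_to` are made up for illustration, and because the closures are infinite unions, the code computes only the union up to a chosen bound n.

```python
def concat(l1, l2):
    """Concatenation of two languages: every x in l1 followed by every y in l2."""
    return {x + y for x in l1 for y in l2}


def power(lang, i):
    """L^i, where L^0 is the language holding only the empty string."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def closure_up_to(lang, n, positive=False):
    """Finite approximation: union of L^i for i from 0 (or 1 if positive) to n."""
    first = 1 if positive else 0
    out = set()
    for i in range(first, n + 1):
        out |= power(lang, i)
    return out


L = {"a", "bc"}
print(sorted(power(L, 2)))                       # ['aa', 'abc', 'bca', 'bcbc']
print("" in closure_up_to(L, 3))                 # True: L^0 contributes the empty string
print("" in closure_up_to(L, 3, positive=True))  # False: the union starts at i = 1
```

The only difference between the two closures is whether the union starts at i = 0 or i = 1, which is why the empty string belongs to the positive closure only when it already belongs to L.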
Regular Expression
• Notation for representing Tokens
• Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
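The identifier definition maps fairly directly onto a regular-expression library. The sketch below uses Python's `re` module and builds the pattern from named fragments, mirroring the way `id` is built from `letter` and `digit`. The names `ID_RE` and `is_identifier` are illustrative and not part of the slides.

```python
import re

# Regular definitions written as Python regex fragments.
letter = r"[A-Za-z]"
digit = r"[0-9]"
# id -> letter (letter | digit)*
ident = rf"{letter}(?:{letter}|{digit})*"

ID_RE = re.compile(ident)


def is_identifier(s):
    """True if the whole string s matches the id definition."""
    return ID_RE.fullmatch(s) is not None


print(is_identifier("count1"))  # True
print(is_identifier("1count"))  # False: must start with a letter
```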
[Figure: diagram in which two components each report errors and both connect to the Symbol Table]
Attributes of Tokens
<id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>
• Each pair consists of a token and its tokenval (token attribute), and is passed on to the Parser
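A small scanner that produces (token, tokenval) pairs like the ones listed above might look as follows. This is a sketch under assumptions: the input string `y := 31 + 28*x` is a guess at a source line that would yield those tokens, and `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical names. Tokens without an attribute carry `None` as their tokenval.

```python
import re

# (token name, pattern) pairs; the order matters only where patterns overlap.
TOKEN_SPEC = [
    ("num", r"[0-9]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("op", r"[+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, tokenval) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                              # whitespace produces no token
        if kind == "num":
            tokens.append(("num", int(lexeme)))   # attribute: numeric value
        elif kind == "id":
            tokens.append(("id", lexeme))         # attribute: the name itself
        elif kind == "op":
            tokens.append((lexeme, None))         # operator is its own token
        else:
            tokens.append((kind, None))
    return tokens


for pair in tokenize("y := 31 + 28*x"):
    print(pair)
```

In a real compiler the attribute of an `id` token would more likely be a symbol-table entry than the raw lexeme; the string is used here only to keep the sketch short.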
$s^0 = \varepsilon$
$s^i = s^{i-1}s$ for $i > 0$
note that $s\varepsilon = \varepsilon s = s$
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
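• Regular definitions are not recursive: a name may be used only in definitions that come after its own
• Shorthand notation:
r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z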
• Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
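The shorthand operators carry over almost unchanged to Python's `re` syntax, so the `num` definition can be checked against a few sample strings. This is an illustrative sketch; `NUM_RE` and `is_num` are made-up names, and the literal dot has to be escaped because `.` is a wildcard in `re`.

```python
import re

# num -> digit+ (. digit+)? ( E (+ | -)? digit+ )?
NUM_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?")


def is_num(s):
    """True if the whole string s matches the num definition."""
    return NUM_RE.fullmatch(s) is not None


for sample in ["31", "3.14", "6.02E+23", "1E5", ".5", "1.", "1E"]:
    print(sample, is_num(sample))
```

The last three samples are rejected: the definition requires at least one digit before the dot, after the dot, and after the exponent marker.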
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
[Figure: regular expressions → NFA → DFA, with one step labeled "Optional"]
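Nondeterministic Finite Automata
• An NFA is a 5-tuple (S, Σ, δ, s0, F) where
– S is a finite set of states
– Σ is the input alphabet
– δ is the transition function, mapping a state and an input symbol to a set of states
– s0 ∈ S is the start state
– F ⊆ S is the set of accepting states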
Transition Graph
• An NFA can be diagrammatically
represented by a labeled directed graph
called a transition graph
S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
[Figure: transition graph start → 0 --a--> 1 --b--> 2 --b--> 3, with loops labeled a and b on state 0]
Transition Table
• The mapping δ of an NFA can be represented in a transition table

δ(0,a) = {0,1}
δ(0,b) = {0}
δ(1,b) = {2}
δ(2,b) = {3}

State   Input a   Input b
0       {0, 1}    {0}
1                 {2}
2                 {3}
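The transition table can be used directly to simulate the NFA by tracking the set of states it could be in after each input symbol. The sketch below hard-codes the table from this slide, with start state 0 and accepting state 3. It assumes there are no ε-moves (none appear in the table), and the names `DELTA`, `move`, and `accepts` are illustrative.

```python
# Transition table: (state, input symbol) -> set of next states.
# Missing entries mean "no move".
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def move(states, symbol):
    """All states reachable from any state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= DELTA.get((s, symbol), set())
    return result


def accepts(text):
    """Simulate the NFA on text by tracking the set of possible states."""
    current = {START}
    for ch in text:
        current = move(current, ch)
        if not current:
            return False          # no state left: the input is rejected
    return bool(current & ACCEPTING)


print(accepts("aabb"))  # True
print(accepts("abba"))  # False
```

With this table the automaton accepts exactly the strings over {a, b} that end in abb.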
Subset construction
DFA
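A minimal sketch of the subset construction, applied to the NFA with δ(0,a) = {0,1}, δ(0,b) = {0}, δ(1,b) = {2}, δ(2,b) = {3}, start state 0 and accepting state 3. It assumes the NFA has no ε-moves, so the ε-closure step of the general algorithm is left out. Each DFA state is a `frozenset` of NFA states, and all names are hypothetical.

```python
from collections import deque

NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}


def subset_construction(delta, start, accepting, alphabet):
    """Build a DFA whose states are sets of NFA states (no epsilon-moves assumed)."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    work = deque([start_set])
    while work:
        current = work.popleft()
        for sym in alphabet:
            target = frozenset(
                t for s in current for t in delta.get((s, sym), ())
            )
            if not target:
                continue                      # no move: leave the entry undefined
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, seen, dfa_delta, dfa_accepting


start, states, table, finals = subset_construction(NFA_DELTA, 0, {3}, "ab")
for (src, sym), dst in sorted(table.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(src), sym, "->", sorted(dst))
```

For this NFA the construction reaches four DFA states: {0}, {0,1}, {0,2}, and {0,3}, of which only {0,3} is accepting.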
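[Figure: NFA fragments N(r), each entered at a start state i and left at an accepting state f]
a: start → i --a--> f
r1 | r2: start → i, which branches into N(r1) and N(r2); both lead to f
r1r2: start → i, then N(r1) followed by N(r2), ending in f
r*: start → i, then N(r), ending in f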
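The construction of N(r) from the structure of r can be sketched in code. The version below follows the usual Thompson-style rules and makes several assumptions that are not in the slides: regular expressions are given as nested tuples rather than parsed from text, ε-edges are stored with the label `None`, and concatenation links the two fragments with an ε-edge instead of merging states. All names are hypothetical.

```python
import itertools


class NFA:
    def __init__(self):
        self.edges = {}                 # state -> list of (label, target); None = epsilon
        self._ids = itertools.count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))


def build(nfa, node):
    """Return (i, f): start and accepting state of the fragment for node."""
    kind = node[0]
    if kind == "sym":                   # single symbol a
        i, f = nfa.new_state(), nfa.new_state()
        nfa.add_edge(i, node[1], f)
    elif kind == "alt":                 # r1 | r2
        i, f = nfa.new_state(), nfa.new_state()
        for sub in node[1:]:
            si, sf = build(nfa, sub)
            nfa.add_edge(i, None, si)
            nfa.add_edge(sf, None, f)
    elif kind == "cat":                 # r1 r2
        i, mid1 = build(nfa, node[1])
        mid2, f = build(nfa, node[2])
        nfa.add_edge(mid1, None, mid2)
    elif kind == "star":                # r*
        i, f = nfa.new_state(), nfa.new_state()
        si, sf = build(nfa, node[1])
        nfa.add_edge(i, None, si)
        nfa.add_edge(sf, None, f)
        nfa.add_edge(sf, None, si)      # go around again
        nfa.add_edge(i, None, f)        # zero occurrences
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return i, f


def eps_closure(nfa, states):
    """All states reachable from `states` using epsilon-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in nfa.edges[s]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(nfa, start, final, text):
    current = eps_closure(nfa, {start})
    for ch in text:
        moved = {t for s in current for label, t in nfa.edges[s] if label == ch}
        current = eps_closure(nfa, moved)
    return final in current


# a*b+ written with r+ = rr*:  a* (b b*)
A, B = ("sym", "a"), ("sym", "b")
regex = ("cat", ("star", A), ("cat", B, ("star", B)))
nfa = NFA()
start, final = build(nfa, regex)
print(accepts(nfa, start, final, "aab"))   # True
print(accepts(nfa, start, final, "aba"))   # False
```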
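a { action1 }
abb { action2 }
a*b+ { action3 }
[Figure: one NFA per pattern: 1 --a--> 2; 3 --a--> 4 --b--> 5 --b--> 6; 7 --b--> 8, with an a loop on 7 and a b loop on 8. In the combined NFA, a new start state 0 leads to states 1, 3, and 7.]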
Input: a a b a
{0,1,3,7} --a--> {2,4,7} --a--> {7} --b--> {8} --a--> none
action3 (state 8)
Must find the longest match:
Continue until no further moves are possible
When last state is accepting: execute action
Input: a b b a
{0,1,3,7} --a--> {2,4,7} --b--> {5,8} --b--> {6,8} --a--> none
action2 (state 6), action3 (state 8)
When two or more accepting states are reached, the first action given in the Lex specification is executed
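Both rules, longest match and first-listed action on a tie, can be sketched by simulating the combined NFA for the patterns a, abb, and a*b+. The state numbers follow the slides. The ε-edges from state 0 to states 1, 3, and 7 are an assumption about how the combined start state is wired, and the names `EDGES`, `ACCEPT`, and `next_token` are hypothetical.

```python
EPS = None  # label used for epsilon-edges

# Combined NFA: state -> list of (label, target).
EDGES = {
    0: [(EPS, 1), (EPS, 3), (EPS, 7)],
    1: [("a", 2)],
    3: [("a", 4)],
    4: [("b", 5)],
    5: [("b", 6)],
    7: [("a", 7), ("b", 8)],
    8: [("b", 8)],
}
# Accepting state -> (position of the rule in the specification, action).
ACCEPT = {2: (0, "action1"), 6: (1, "action2"), 8: (2, "action3")}


def closure(states):
    """Add every state reachable through epsilon-edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in EDGES.get(s, []):
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def step(states, ch):
    moved = {t for s in states for label, t in EDGES.get(s, []) if label == ch}
    return closure(moved)


def next_token(text, pos=0):
    """Return (lexeme, action) for the longest match starting at pos, or None."""
    current = closure({0})
    last = None
    i = pos
    while i < len(text) and current:      # continue until no further moves
        current = step(current, text[i])
        i += 1
        hits = [ACCEPT[s] for s in current if s in ACCEPT]
        if hits:
            # Remember the most recent accepting point; on a tie the rule
            # listed first in the specification wins.
            last = (text[pos:i], min(hits)[1])
    return last


print(next_token("aaba"))  # ('aab', 'action3')
print(next_token("abba"))  # ('abb', 'action2')
```

A full scanner would call `next_token` again from the end of the returned lexeme; that loop is omitted here.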
Example DFA
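[Figure: DFA with states 0 to 3 and the path start → 0 --a--> 1 --b--> 2 --b--> 3, plus further edges labeled a and b leading back to earlier states]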
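[Figure: two DFAs side by side, one with states A, B, C, D, E and one with states A, B, D, E; both contain the path start → A --a--> B --b--> D --b--> E, with further edges labeled a and b]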