CD ch2

SECTION 1.1
LEXICAL ANALYSIS- INTRODUCTION
LEXICAL ANALYZER
• The lexical analyzer reads the source program character by character to produce tokens.
• Normally a lexical analyzer does not return the whole list of tokens in one shot; it returns one token each time the parser asks for one.
[Figure: source program → Lexical Analyzer → (token) → Parser; the parser sends "get next token" requests to the lexical analyzer; both use the Symbol Table]
ROLES OF THE LEXICAL ANALYSER
The lexical analyzer performs the following tasks:
• Reads input characters from the source program
• Identifies tokens and records them in the symbol table
• Removes white space and comments from the source program
• Correlates error messages with the source program
• Expands macros if any are found in the source program

TOKENS, LEXEMES AND PATTERNS
• Token: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants
• Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
• Pattern: A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.
TOKENS, LEXEMES AND PATTERNS
Token (element of a kind)   Lexeme             Pattern
ID                          x  y  n_0          letter followed by letters and digits
NUM                         -123  1.456e-5     any numeric constant
IF                          if                 if
LPAREN                      (                  (
LITERAL                     "Hello"            any string of characters (except ") between " and "
• Regular expressions are widely used to specify patterns.

EXAMPLE
#include <stdio.h>
int maximum(int x, int y){
// This will compare 2 numbers

Tokens Generated
Lexeme     Token
int        Keyword
maximum    Identifier
(          Operator
int        Keyword
x          Identifier
,          Operator
int        Keyword
y          Identifier
)          Operator
{          Operator

Non-Tokens
Type                      Examples
Comment                   // This will compare 2 numbers
Pre-processor directive   #include <stdio.h>
Whitespace                \n \b \t
TERMINOLOGY OF LANGUAGES
• Alphabet: a finite set of symbols (for example, the ASCII characters)
• String:
  • A finite sequence of symbols over an alphabet
  • "Sentence" and "word" are also used to mean string
  • ε is the empty string
  • |s| is the length of string s
• Language: a set of strings over some fixed alphabet
  • ∅, the empty set, is a language
  • {ε}, the set containing only the empty string, is a language
  • The set of well-formed C programs is a language
  • The set of all possible identifiers is a language
• Operators on strings:
  • Concatenation: xy represents the concatenation of strings x and y
OPERATIONS ON LANGUAGES
• Concatenation: L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
• Union: L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
• Exponentiation: L^0 = {ε}, L^1 = L, L^2 = LL
• Kleene closure: L* = ∪ (i = 0 to ∞) L^i
• Positive closure: L+ = ∪ (i = 1 to ∞) L^i
EXAMPLE
• L1 = {a,b,c,d}, L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
• L1^3 = all strings of length three over a,b,c,d
• L1* = all strings over a,b,c,d, including the empty string
• L1+ = all strings over a,b,c,d, excluding the empty string

REGULAR EXPRESSIONS
• We use regular expressions to describe the tokens of a programming language.
• A regular expression is built up from simpler regular expressions (using defining rules).
• Each regular expression denotes a language.
• A language denoted by a regular expression is called a regular set.
REGULAR EXPRESSIONS (RULES)
Regular expressions over alphabet Σ
Reg. Expr      Language it denotes
ε              {ε}
a ∈ Σ          {a}
(r1) | (r2)    L(r1) ∪ L(r2)
(r1) (r2)      L(r1) L(r2)
(r)*           (L(r))*
(r)            L(r)
• (r)+ = (r)(r)*
• (r)? = (r) | ε
REGULAR EXPRESSIONS (CONT.)
• We may remove parentheses by using precedence rules:
  • * has the highest precedence
  • concatenation comes next
  • | has the lowest precedence
• ab*|c means (a(b)*)|(c)
• Ex:
  • Σ = {0,1}
  • 0|1 => {0,1}
  • (0|1)(0|1) => {00,01,10,11}
  • 0* => {ε,0,00,000,0000,....}
  • (0|1)* => all strings of 0s and 1s, including the empty string
REGULAR DEFINITIONS
• Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions and use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1,d2,...,di-1}, that is, the basic symbols together with the previously defined names.
REGULAR DEFINITIONS (CONT.)
• Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
• If we write the regular expression for identifiers without using regular definitions, the expression becomes complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
• Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit+
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
NOTATIONAL SHORTHAND
• The following shorthands are often used:
r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z
• Examples:
digit → [0-9]
digits → digit+
optional_fraction → (. digits)?
optional_exponent → ( E (+ | -)? digit+ )?
num → digits optional_fraction optional_exponent
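The regular definition for num can be checked by hand with a small matcher. The following is a minimal sketch in C, not a generated scanner; the function names match_digits and is_unsigned_num are hypothetical and chosen only for this illustration. Each optional part is tried and, if it does not match, simply treated as absent.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* digits -> digit+ : consume one or more digits starting at s[i].
   Returns the index just past the last digit, or -1 if s[i] is not a digit. */
static int match_digits(const char *s, int i)
{
    if (!isdigit((unsigned char)s[i]))
        return -1;
    while (isdigit((unsigned char)s[i]))
        i++;
    return i;
}

/* num -> digits optional_fraction optional_exponent
   Returns 1 if the whole string s matches num, 0 otherwise.
   (Hypothetical helper written for this example.) */
int is_unsigned_num(const char *s)
{
    assert(s != NULL);

    int i = match_digits(s, 0);              /* digits */
    if (i < 0)
        return 0;

    if (s[i] == '.') {                       /* optional_fraction -> (. digits)? */
        int j = match_digits(s, i + 1);
        if (j >= 0)
            i = j;                           /* otherwise the fraction is absent */
    }

    if (s[i] == 'E') {                       /* optional_exponent -> (E (+|-)? digit+)? */
        int k = i + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        int j = match_digits(s, k);
        if (j >= 0)
            i = j;                           /* otherwise the exponent is absent */
    }

    return s[i] == '\0';                     /* nothing may be left over */
}
```

For example, is_unsigned_num("1.5E-3") returns 1, while is_unsigned_num("1.") returns 0 because the fraction requires at least one digit after the dot.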
RECOGNITION OF TOKENS
• e.g.
Grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
Regular Definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter (letter | digit)*
num → digits optional_fraction optional_exponent
Assumptions
delim → blank | tab | newline
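TRANSITION DIAGRAMS
relop → < | <= | <> | > | >= | =
state 0 (start): on < go to 1; on = go to 5; on > go to 6
state 1: on = go to 2; on > go to 3; on other go to 4
state 2: return(relop, LE)
state 3: return(relop, NE)
state 4 *: return(relop, LT)
state 5: return(relop, EQ)
state 6: on = go to 7; on other go to 8
state 7: return(relop, GE)
state 8 *: return(relop, GT)
(* marks a state reached on "other": the input must be retracted by one character.)

id → letter ( letter | digit )*
state 9 (start): on letter go to 10
state 10: on letter or digit stay in 10; on other go to 11
state 11 *: return(gettoken(), install_id())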
TRANSITION DIAGRAMS: CODE
token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

/* Decides the next start state to check */
int fail()
{ forward = token_beginning;
  switch (start) {
  case 0: start = 9; break;
  case 25: recover(); break;
  default: /* error */
  }
  return start;
}
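The fragment above leaves most cases elided. As a rough, self-contained illustration of the same two transition diagrams (relop and id), the sketch below hard-codes them with if/else instead of a state variable. The enum values and the function name next_token are assumptions made for this example and are not taken from the slides; "retracting" is modelled by simply not consuming the lookahead character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum token { TOK_LT, TOK_LE, TOK_EQ, TOK_NE, TOK_GT, TOK_GE,
             TOK_ID, TOK_END, TOK_ERROR };

/* Scan one token starting at *input and advance *input past its lexeme. */
enum token next_token(const char **input)
{
    assert(input != NULL && *input != NULL);
    const char *p = *input;
    enum token tok;

    while (*p == ' ' || *p == '\t' || *p == '\n')    /* state 0: skip delim */
        p++;

    if (*p == '\0') {
        tok = TOK_END;
    } else if (*p == '<') {                          /* state 1 */
        p++;
        if (*p == '=')      { p++; tok = TOK_LE; }   /* state 2 */
        else if (*p == '>') { p++; tok = TOK_NE; }   /* state 3 */
        else                { tok = TOK_LT; }        /* state 4: lookahead not consumed */
    } else if (*p == '=') {                          /* state 5 */
        p++;
        tok = TOK_EQ;
    } else if (*p == '>') {                          /* state 6 */
        p++;
        if (*p == '=')      { p++; tok = TOK_GE; }   /* state 7 */
        else                { tok = TOK_GT; }        /* state 8: lookahead not consumed */
    } else if (isalpha((unsigned char)*p)) {         /* state 9 -> 10 */
        p++;
        while (isalnum((unsigned char)*p))           /* state 10: letter or digit */
            p++;
        tok = TOK_ID;                                /* state 11 */
    } else {
        p++;                                         /* skip the unexpected character */
        tok = TOK_ERROR;
    }

    *input = p;
    return tok;
}
```

Calling next_token repeatedly on the string "x1 <= y" yields TOK_ID, TOK_LE, TOK_ID and then TOK_END.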
THE LEX AND FLEX SCANNER GENERATORS
• Lex and its newer cousin flex are scanner generators.
• They systematically translate regular definitions into C source code for efficient scanning.
• The generated code is easy to integrate into C applications.
CREATING A LEXICAL ANALYZER WITH LEX AND FLEX
lex source program (lex.l) → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
LEX SPECIFICATION
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
REGULAR EXPRESSIONS IN LEX
x          match the character x
\.         match the character .
"string"   match the contents of the string of characters
.          match any character except newline
^          match the beginning of a line
$          match the end of a line
[xyz]      match one character x, y, or z (use \ to escape -)
[^xyz]     match any character except x, y, and z
[a-z]      match one of a to z
r*         closure (match zero or more occurrences)
r+         positive closure (match one or more occurrences)
r?         optional (match zero or one occurrence)
r1 r2      match r1 then r2 (concatenation)
r1|r2      match r1 or r2 (union)
(r)        grouping
r1/r2      match r1 when followed by r2
{d}        match the regular expression defined by d
STAR OPERATION (KLEENE CLOSURE)
a* = {a^0, a^1, a^2, a^3, a^4, ...} = {ε, a, aa, aaa, aaaa, ...}
Important Characteristics
➢ The exponent under * ranges from 0 upwards without bound, i.e. the set a* includes {a^0, a^1, a^2, a^3, a^4, a^5, ...}.
➢ a^0 means zero occurrences of a, and this is represented by ε.
➢ * is represented in a finite automaton by a loop on the state concerned; if the exponent is 3, i.e. a^3, the loop iterates 3 times.
➢ If the exponent is 0, i.e. a^0, the loop does not iterate at all.
[Figure: m/c for a*]

POSITIVE CLOSURE
a+ = {a^1, a^2, a^3, a^4, ...} = {a, aa, aaa, aaaa, ...}
Important Characteristics
➢ The exponent under + ranges from 1 upwards without bound, i.e. the set a+ includes {a^1, a^2, a^3, a^4, a^5, ...}.
➢ There is no a^0 move, i.e. ε is not part of this set.
➢ The exponent starts from 1, i.e. at least one a must occur, which can be followed by 0 or more a's.
➢ Please remember: a+ = a.a*
[Figure: m/c for a+: q0 --a--> qf, with a self-loop on a at qf]
CONCATENATION OPERATION
Concatenation means joining (a.b).
Important Note: a.b ≠ b.a, i.e. the order of joining changes the design of the automaton.
[Figure: m/c for a: q0 --a--> qf; m/c for b: q0 --b--> qf]
[Figure: m/c for a.b: q0 --a--> q1 --b--> qf; m/c for b.a: q0 --b--> q1 --a--> qf]

OR OPERATION
[Figure: m/c for a: q0 --a--> qf; m/c for b: q0 --b--> qf]
NFA for a+b (a/b)
[Figure: m/c for a/b: q0 --a--> qf and q0 --b--> qf]
SECTION 1.2
INTRODUCTION TO FINITE AUTOMATA
FINITE AUTOMATA
Automata means machines.
A finite automaton is a 5-tuple:
M = (Q, Σ, δ, q0, F)
Q    A finite set of states
Σ    A finite input alphabet
δ    A transition function
q0   The initial/starting state; q0 is in Q
F    A set of final/accepting states, which is a subset of Q
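To make the 5-tuple concrete, here is a minimal sketch in C that stores δ as a table and runs it over an input string. The struct layout, the name dfa_accepts, and the example machine ENDS_IN_AB (a deterministic automaton over Σ = {a, b} accepting strings that end in "ab") are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STATES  3           /* |Q| */
#define NUM_SYMBOLS 2           /* Sigma = {a, b}; index 0 = a, 1 = b */

struct dfa {
    int delta[NUM_STATES][NUM_SYMBOLS];   /* transition function */
    int start;                            /* q0 */
    int accepting[NUM_STATES];            /* accepting[q] != 0 iff q is in F */
};

/* Hypothetical example machine: accepts strings over {a, b} ending in "ab".
   state 0: no progress, state 1: last symbol was a, state 2: last two were ab */
static const struct dfa ENDS_IN_AB = {
    .delta     = { {1, 0}, {1, 2}, {1, 0} },
    .start     = 0,
    .accepting = { 0, 0, 1 }
};

/* Returns 1 if the automaton m accepts input, 0 otherwise. */
int dfa_accepts(const struct dfa *m, const char *input)
{
    assert(m != NULL && input != NULL);
    int state = m->start;
    for (const char *p = input; *p != '\0'; p++) {
        if (*p != 'a' && *p != 'b')
            return 0;                     /* symbol outside Sigma: reject */
        state = m->delta[state][*p - 'a'];
    }
    return m->accepting[state] != 0;
}
```

With this table, dfa_accepts(&ENDS_IN_AB, "aab") returns 1 and dfa_accepts(&ENDS_IN_AB, "abb") returns 0.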
TYPES OF AUTOMATA
There are two types of finite Automata:
➢ Deterministic Finite Automata (DFA)
➢ Non-deterministic finite Automata (NFA)

DETERMINISTIC FINITE AUTOMATA
A Deterministic Finite Automaton is a machine where, corresponding to every input symbol of Σ, there can be only one output (transition) from every state.
[Figure: DFA with states q0, q1, q2 and qf over Σ = {a, b}]
Here Σ = {a, b} and at every state there is one O/P from ‘a’ and one O/P from ‘b’. None of the states has more than one output corresponding to a or b.
NON-DETERMINISTIC FINITE AUTOMATA
A Non-Deterministic Finite Automaton is a machine where, corresponding to a single input symbol of Σ (a, b), there can be more than one output (transition) from a particular state.
[Figure: NFA with states q0, q1, q2 and qf over Σ = {a, b}]
Here state q0 has two moves on ‘a’, one to q1 and the other to q2; likewise state q2 has two moves on ‘b’, one to q1 and another to qf.
TYPES OF NFA
There are two types of NFA:
i. NFA without ε-move
ii. NFA with ε-move

NFA WITH Ε-MOVE
Consider the following NFA; here there is an ε-move from q1.
[Figure: NFA with states q0, q1 and qf over Σ = {a, b}, including an ε-move out of q1]
DIFFERENCE BETWEEN DFA AND NFA
Deterministic Finite Automata
• A Deterministic Finite Automaton is a machine where, corresponding to every input symbol of Σ, there can be only one output from every state.
• A DFA will not have ε-moves.
Non-Deterministic Finite Automata
• A Non-Deterministic Finite Automaton is a machine where, corresponding to a single input symbol of Σ (a, b), there can be more than one output from a particular state.
• An NFA can have ε-moves.
SECTION 1.3
THOMPSON’S CONSTRUCTION
We have three operations on regular expressions:
i) Star operation
ii) Concatenation
iii) OR operation
For each operation there are defined rules to build an NFA with ε-moves.
Thompson’s Construction for Star Operation
a* = {ε, a, aa, aaa, aaaa, ...}
[Figure: NFA for a*: a single final state qf with a self-loop on a]
NFA for a* using Thompson’s Construction:
q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 (loop back) and q0 --ε--> qf (bypass)
Paths through this NFA:
Only ε: q0→qf
Single a: q0→q1→q2→qf
Two a’s: q0→q1→q2→q1→q2→qf
N a’s: q0→q1→q2→ ... →q1→q2→qf, where the a-move q1→q2 is taken N times and the ε-move q2→q1 is taken N−1 times, for any N from 2 upwards
THOMPSON’S CONSTRUCTION FOR CONCATENATION OPERATION
NFA for a: q0 --a--> qf
NFA for b: q0 --b--> qf
NFA for ab using Thompson’s Construction:
q0 --a--> q1 --b--> qf
THOMPSON’S CONSTRUCTION FOR OR OPERATION
NFA for a: q0 --a--> qf
NFA for b: q0 --b--> qf
NFA for a+b (a/b) using Thompson’s Construction:
q0 --ε--> q1 --a--> q2 --ε--> qf
q0 --ε--> q3 --b--> q4 --ε--> qf
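THOMPSON’S CONSTRUCTION FOR aa*b
Question 1
Thompson’s for a: q0 --a--> qf
Thompson’s for b: q0 --b--> qf
Thompson’s for a*: q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 and q0 --ε--> qf
Thompson’s Construction for aa*b:
q0 --a--> q1 --ε--> q2 --a--> q3 --ε--> q4 --b--> qf, with q3 --ε--> q2 (repeat) and q1 --ε--> q4 (skip)
NFA without Thompson’s:
q0 --a--> q1 --b--> qf, with a self-loop on a at q1

THOMPSON’S CONSTRUCTION FOR a*b(a/b)
Question 2
Thompson’s for a*: q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 and q0 --ε--> qf
Thompson’s for b: q0 --b--> qf
Thompson’s for a/b: q0 --ε--> q1 --a--> q2 --ε--> qf and q0 --ε--> q3 --b--> q4 --ε--> qf
Thompson’s Construction for a*b(a/b):
q0 --ε--> q1 --a--> q2 --ε--> q3 --b--> q4, with q2 --ε--> q1 (repeat) and q0 --ε--> q3 (skip)
q4 --ε--> q5 --a--> q6 --ε--> qf
q4 --ε--> q7 --b--> q8 --ε--> qf
NFA without Thompson’s:
q0 --b--> q1 --a,b--> qf, with a self-loop on a at q0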

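THOMPSON’S CONSTRUCTION FOR (a/b/c)
Question 3
[Figure: first attempt, with three ε-moves leaving q0, one each towards the a, b and c branches]
Three outgoing ε-moves from a single state are not allowed.
Final Output
[Figure: final NFA for a/b/c with states q0 to q8 and qf, built by nesting two OR constructions so that no state has more than two outgoing ε-moves]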
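THOMPSON’S CONSTRUCTION FOR ab(a/b)*
Question 4
Thompson’s for a: q0 --a--> qf
Thompson’s for b: q0 --b--> qf
Thompson’s for a/b: q0 --ε--> q1 --a--> q2 --ε--> qf and q0 --ε--> q3 --b--> q4 --ε--> qf
Thompson’s for (a/b)*:
q0 --ε--> q1, q1 --ε--> q2 --a--> q3 --ε--> q6, q1 --ε--> q4 --b--> q5 --ε--> q6, q6 --ε--> qf, with q6 --ε--> q1 (repeat) and q0 --ε--> qf (skip)
Thompson’s Construction for ab(a/b)*:
q0 --a--> q1 --b--> q2 --ε--> q3
q3 --ε--> q4 --a--> q5 --ε--> q8
q3 --ε--> q6 --b--> q7 --ε--> q8
q8 --ε--> qf, with q8 --ε--> q3 (repeat) and q2 --ε--> qf (skip)
NFA without Thompson’s:
q0 --a--> q1 --b--> qf, with a self-loop on a,b at qf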
SECTION 1.4
SUBSET CONSTRUCTION
HOW TO WORK WITH Ε-CLOSURE FUNCTION
Steps for the ε-closure function:
➢ The first step is to take the ε-closure of the start state, e.g. if the start state is 0, take ε-closure(0).
➢ ε-closure(n) includes the set of all states that can be reached from state n without consuming any input, i.e. through ε-moves only.
➢ Most important: the ε-closure of a state includes that state itself, i.e. ε-closure(n) contains n.
SUBSET CONSTRUCTION FOR (a/b)*ab
NFA from Thompson’s construction (states 0 to 9; 0 is the start state, 9 is the final state):
ε-moves: 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7
on a: 2→3, 7→8
on b: 4→5, 8→9

Start with the start state: state 0
ε-closure(0) = {0,1,2,4,7} = A

(A, a) = move({0,1,2,4,7}, a)
       = move(0,a) ∪ move(1,a) ∪ move(2,a) ∪ move(4,a) ∪ move(7,a)
       = Φ ∪ Φ ∪ {3} ∪ Φ ∪ {8}
       then ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7} ∪ {8} = {1,2,3,4,6,7,8} = B

(A, b) = move({0,1,2,4,7}, b)
       = move(0,b) ∪ move(1,b) ∪ move(2,b) ∪ move(4,b) ∪ move(7,b)
       = Φ ∪ Φ ∪ Φ ∪ {5} ∪ Φ
       then ε-closure(5) = {1,2,4,5,6,7} = C

(B, a) = move({1,2,3,4,6,7,8}, a)
       = move(1,a) ∪ move(2,a) ∪ move(3,a) ∪ move(4,a) ∪ move(6,a) ∪ move(7,a) ∪ move(8,a)
       = Φ ∪ {3} ∪ Φ ∪ Φ ∪ Φ ∪ {8} ∪ Φ
       then ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B (as computed for (A, a))

(B, b) = move({1,2,3,4,6,7,8}, b)
       = move(1,b) ∪ move(2,b) ∪ move(3,b) ∪ move(4,b) ∪ move(6,b) ∪ move(7,b) ∪ move(8,b)
       = Φ ∪ Φ ∪ Φ ∪ {5} ∪ Φ ∪ Φ ∪ {9}
       then ε-closure(5) ∪ ε-closure(9) = {1,2,4,5,6,7} ∪ {9} = {1,2,4,5,6,7,9} = D

(C, a) = move({1,2,4,5,6,7}, a)
       = Φ ∪ {3} ∪ Φ ∪ Φ ∪ Φ ∪ {8}
       then ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B

(C, b) = move({1,2,4,5,6,7}, b)
       = Φ ∪ Φ ∪ {5} ∪ Φ ∪ Φ ∪ Φ
       then ε-closure(5) = {1,2,4,5,6,7} = C

(D, a) = move({1,2,4,5,6,7,9}, a)
       = Φ ∪ {3} ∪ Φ ∪ Φ ∪ Φ ∪ {8} ∪ Φ
       then ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B

(D, b) = move({1,2,4,5,6,7,9}, b)
       = Φ ∪ Φ ∪ {5} ∪ Φ ∪ Φ ∪ Φ ∪ Φ
       then ε-closure(5) = {1,2,4,5,6,7} = C

Transition table:
State                   a   b
A = {0,1,2,4,7}         B   C
B = {1,2,3,4,6,7,8}     B   D
C = {1,2,4,5,6,7}       B   C
D = {1,2,4,5,6,7,9}     B   C

Final Output
[Figure: DFA with A --a--> B, A --b--> C, B --a--> B, B --b--> D, C --a--> B, C --b--> C, D --a--> B, D --b--> C]
➢ State A is the start state, since set A contains state 0, which is the start state of the NFA built with Thompson’s construction.
➢ D is the final state, since set D contains state 9, which is the final state of the NFA built with Thompson’s construction.
Ε-CLOSURE(T)
push all states of T onto stack
initialize ϵ-closure(T) to T
while (stack is not empty) do
begin
    pop t, the top element, off stack;
    for (each state u with an edge from t to u labelled ϵ) do
    begin
        if (u is not in ϵ-closure(T)) do
        begin
            add u to ϵ-closure(T)
            push u onto stack
        end
    end
end
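The stack-based procedure above can be sketched in C as follows. This is an illustrative sketch only: sets of states are stored as bitmasks, the ε-edges are those of the (a/b)*ab NFA used in the worked example (0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7), and the names BIT, eps and eps_closure are assumptions made for this example.

```c
#include <assert.h>

#define N_STATES 10
#define BIT(n)   (1u << (n))        /* bit n stands for NFA state n */

/* eps[t]: set of states reachable from t by a single ε-edge
   (ε-edges of the (a/b)*ab NFA from the worked example). */
static const unsigned eps[N_STATES] = {
    [0] = BIT(1) | BIT(7),
    [1] = BIT(2) | BIT(4),
    [3] = BIT(6),
    [5] = BIT(6),
    [6] = BIT(1) | BIT(7),
};

/* Returns ε-closure(T), where T is a set of NFA states given as a bitmask. */
unsigned eps_closure(unsigned T)
{
    assert(T < BIT(N_STATES));

    int stack[N_STATES];            /* each state is pushed at most once */
    int top = 0;
    unsigned closure = T;           /* initialize ε-closure(T) to T */

    for (int s = 0; s < N_STATES; s++)      /* push all states of T */
        if (T & BIT(s))
            stack[top++] = s;

    while (top > 0) {
        int t = stack[--top];               /* pop t off the stack */
        for (int u = 0; u < N_STATES; u++) {
            if ((eps[t] & BIT(u)) && !(closure & BIT(u))) {
                closure |= BIT(u);          /* add u to the closure */
                stack[top++] = u;           /* and explore it later */
            }
        }
    }
    return closure;
}
```

For instance, eps_closure(BIT(0)) yields the set {0,1,2,4,7}, which is state A of the worked example.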
CONVERTING A NFA INTO A DFA (SUBSET CONSTRUCTION)
put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
while (there is an unmarked state S1 in DS) do
begin
    mark S1
    for each input symbol a do
    begin
        S2 ← ε-closure(move(S1,a))
        if (S2 is not in DS) then
            add S2 into DS as an unmarked state
        transfunc[S1,a] ← S2
    end
end
• ε-closure({s0}) is the set of all states accessible from s0 by ε-transitions.
• move(S1,a) is the set of states to which there is a transition on a from a state s in S1.
• A state S in DS is an accepting state of the DFA if a state s in S is an accepting state of the NFA.
• The start state of the DFA is ε-closure({s0}).
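As a rough illustration of the algorithm, the sketch below runs the subset construction on the (a/b)*ab NFA from the worked example. It is a sketch under stated assumptions: state sets are bitmasks, "marking" is modelled by processing DS in order of discovery, the closure is computed by simple repetition rather than with a stack, and all identifiers (eps, step, closure, move, dstates, dtran, subset_construction) are names chosen for this example.

```c
#include <assert.h>

#define N_NFA   10
#define N_SYM   2                   /* symbol 0 = a, symbol 1 = b */
#define MAX_DFA 16
#define BIT(n)  (1u << (n))

/* ε-edges and labelled edges of the (a/b)*ab NFA from the worked example */
static const unsigned eps[N_NFA] = {
    [0] = BIT(1) | BIT(7), [1] = BIT(2) | BIT(4),
    [3] = BIT(6), [5] = BIT(6), [6] = BIT(1) | BIT(7),
};
static const unsigned step[N_NFA][N_SYM] = {
    [2] = { BIT(3), 0 }, [4] = { 0, BIT(5) },
    [7] = { BIT(8), 0 }, [8] = { 0, BIT(9) },
};

static unsigned closure(unsigned set)
{
    unsigned prev;
    do {                            /* repeat until no new state is added */
        prev = set;
        for (int s = 0; s < N_NFA; s++)
            if (set & BIT(s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

static unsigned move(unsigned set, int c)
{
    unsigned out = 0;
    for (int s = 0; s < N_NFA; s++)
        if (set & BIT(s))
            out |= step[s][c];
    return out;
}

unsigned dstates[MAX_DFA];          /* DS: each DFA state is a set of NFA states */
int      dtran[MAX_DFA][N_SYM];     /* transfunc: index of the target DFA state */

/* Builds DS and transfunc; returns the number of DFA states. */
int subset_construction(void)
{
    int n = 0;
    dstates[n++] = closure(BIT(0));             /* start state, unmarked */
    for (int s1 = 0; s1 < n; s1++) {            /* states below s1 are marked */
        for (int c = 0; c < N_SYM; c++) {
            unsigned s2 = closure(move(dstates[s1], c));
            int j = 0;
            while (j < n && dstates[j] != s2)   /* is S2 already in DS? */
                j++;
            if (j == n) {
                assert(n < MAX_DFA);
                dstates[n++] = s2;              /* add S2 as unmarked */
            }
            dtran[s1][c] = j;
        }
    }
    return n;
}
```

Under these assumptions the function finds four DFA states, discovered in the order A, B, C, D of the worked example, and only D contains NFA state 9.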
SECTION 1.5
RE TO DFA THROUGH SYNTAX TREE METHOD OR DIRECT METHOD
CONVERTING REGULAR EXPRESSIONS DIRECTLY TO DFAS
• Important state
• We may convert a regular expression into a DFA without creating an NFA first.
• First we augment the given regular expression by concatenating it with a special symbol #.
r ➔ (r)# augmented regular expression
• Then we create a syntax tree for this augmented regular expression.
• In this syntax tree, all alphabet symbols (plus # and the empty string) of the augmented regular expression are on the leaves, and all inner nodes are the operators of that augmented regular expression.
• Then each alphabet symbol (plus #) is numbered (position numbers).
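FROM REGULAR EXPRESSION TO DFA DIRECTLY: SYNTAX TREE OF (a/b)*abb#
(position numbers of the leaves are shown in parentheses)
concatenation
├─ concatenation
│  ├─ concatenation
│  │  ├─ concatenation
│  │  │  ├─ * (closure)
│  │  │  │  └─ | (alternation)
│  │  │  │     ├─ a (1)
│  │  │  │     └─ b (2)
│  │  │  └─ a (3)
│  │  └─ b (4)
│  └─ b (5)
└─ # (6)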
ANNOTATING THE TREE
• nullable(n): true if the subtree at node n generates a language that includes the empty string
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
• followpos(i): the set of positions that can follow position i in the tree
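FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATING THE TREE
Leaf ε:
  nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅
Leaf i:
  nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}
Or-node | with children c1 and c2:
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n) = lastpos(c1) ∪ lastpos(c2)
Cat-node • with children c1 and c2:
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Star-node * with child c1:
  nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)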
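SYNTAX TREE OF (a/b)*abb#
Annotated nodes, from the root down (only the * node is nullable):
Node                          firstpos      lastpos
cat (root, right child #)     {1, 2, 3}     {6}
  # (6)                       {6}           {6}
cat (right child b, 5)        {1, 2, 3}     {5}
  b (5)                       {5}           {5}
cat (right child b, 4)        {1, 2, 3}     {4}
  b (4)                       {4}           {4}
cat (right child a, 3)        {1, 2, 3}     {3}
  a (3)                       {3}           {3}
* (nullable)                  {1, 2}        {1, 2}
|                             {1, 2}        {1, 2}
  a (1)                       {1}           {1}
  b (2)                       {2}           {2}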
FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE
(a/b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6
Node   followpos
1      {1, 2, 3}
2      {1, 2, 3}
3      {4}
4      {5}
5      {6}
6      -
FROM RE TO DFA DIRECTLY
(a/b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6
Node   Symbol Name   followpos
1      a             {1, 2, 3}
2      b             {1, 2, 3}
3      a             {4}
4      b             {5}
5      b             {6}
6      #             -

Let {1,2,3} = A
(A, a) = ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
(A, b) = ({1,2,3}, b) = followpos(2) = {1,2,3} = A
(B, a) = ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
(B, b) = ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
(C, a) = ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
(C, b) = ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3,6} = D
(D, a) = ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
(D, b) = ({1,2,3,6}, b) = followpos(2) = {1,2,3} = A

State   a   b
A       B   A
B       B   C
C       B   D
D       B   A
FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE
[Figure: resulting DFA. start → {1,2,3} --a--> {1,2,3,4} --b--> {1,2,3,5} --b--> {1,2,3,6}; {1,2,3} loops on b; a from every state leads to {1,2,3,4}; b from {1,2,3,6} returns to {1,2,3}]
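DIFFERENT DFA’S FOR (a/b)*abb
DFA with five states (E is the final state):
State   a   b
A       B   C
B       B   D
C       B   C
D       B   E
E       B   C
[Figure: transition diagram of the five-state DFA]

DFA with four states (states are the position sets {1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,6}):
State   a   b
A       B   A
B       B   C
C       B   D
D       B   A
[Figure: transition diagram of the four-state DFA]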
FOLLOWPOS
for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do
ALGORITHM
s0 := firstpos(root), where root is the root of the syntax tree
Dstates := {s0}, and s0 is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
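A compact way to try the algorithm is to feed it the followpos table of (a/b)*abb# computed earlier. The C sketch below does this with position sets stored as bitmasks. The identifiers symbol_at, followpos, dstates, dtran and build_dfa are assumptions made for this example, and an empty U is recorded as -1, which is a choice of this sketch rather than part of the algorithm.

```c
#include <assert.h>

#define N_POS   6
#define N_SYM   2                   /* symbol 0 = a, symbol 1 = b */
#define MAX_DFA 16
#define BIT(p)  (1u << (p))         /* bit p stands for position p (1-based) */

/* Symbol at each position of (a/b)*abb#; -1 marks the end symbol #.
   Index 0 is unused because positions start at 1. */
static const int symbol_at[N_POS + 1] = { -1, 0, 1, 0, 1, 1, -1 };

/* followpos table from the example */
static const unsigned followpos[N_POS + 1] = {
    0,
    BIT(1) | BIT(2) | BIT(3),       /* followpos(1) */
    BIT(1) | BIT(2) | BIT(3),       /* followpos(2) */
    BIT(4),                         /* followpos(3) */
    BIT(5),                         /* followpos(4) */
    BIT(6),                         /* followpos(5) */
    0                               /* followpos(6) is empty */
};

unsigned dstates[MAX_DFA];          /* Dstates: sets of positions */
int      dtran[MAX_DFA][N_SYM];     /* Dtran: target state index, -1 if U is empty */

/* Builds the DFA starting from s0 = firstpos(root); returns the state count. */
int build_dfa(unsigned firstpos_root)
{
    int n = 0;
    dstates[n++] = firstpos_root;
    for (int t = 0; t < n; t++) {               /* states below t are marked */
        for (int a = 0; a < N_SYM; a++) {
            unsigned u = 0;
            for (int p = 1; p <= N_POS; p++)    /* positions in T labelled a */
                if ((dstates[t] & BIT(p)) && symbol_at[p] == a)
                    u |= followpos[p];
            if (u == 0) {
                dtran[t][a] = -1;               /* no transition */
                continue;
            }
            int j = 0;
            while (j < n && dstates[j] != u)    /* is U already in Dstates? */
                j++;
            if (j == n) {
                assert(n < MAX_DFA);
                dstates[n++] = u;               /* add U as unmarked */
            }
            dtran[t][a] = j;
        }
    }
    return n;
}
```

Called as build_dfa(BIT(1) | BIT(2) | BIT(3)), the sketch produces four states whose transitions agree with the A, B, C, D table derived by hand above; the state containing position 6 is the accepting one.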
SECTION 1.6
MINIMIZATION OF DFA
Question 1
MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Figure: DFA with states A, B, C, D and E over Σ = {a, b}; its transitions are listed in the transition table below]

USING FINAL AND NON-FINAL STATES
Divide the entire set of states into two subsets: the set of final states and the set of non-final states.
Consider each subset as a separate entity and check whether it needs to be split further or whether its states can be combined.

DFA MINIMIZATION USING PARTITIONING METHOD
Draw the transition table corresponding to the given DFA (→ marks the start state, * the final state):
State   a   b
→ A     B   C
  B     B   D
  C     B   C
  D     B   E
* E     B   C

Divide the states into two subsets, final and non-final:
Set of non-final states (NF): {A,B,C,D}
Set of final states (F): {E}

Check O/P of all clubbed states (A,B,C,D) with Σ=a:
All four states go to B, which lies within {A,B,C,D}. NO SPLIT.
NF = {A,B,C,D}, F = {E}

Check O/P of all clubbed states (A,B,C,D) with Σ=b:
Split into two, since {A,B,C} go to states within {A,B,C,D} while state D goes to state {E}.
NF = ({A,B,C}, {D}), F = {E}

Check O/P of all clubbed states (A,B,C) with Σ=a:
NO SPLIT.
NF = ({A,B,C}, {D})

Check O/P of all clubbed states (A,B,C) with Σ=b:
Split into two, since {A,C} go to state {C} while {B} goes to state {D}, which is already separated.
NF = ({A,C}, {B}, {D})

Check O/P of all clubbed states (A,C) with Σ=a:
NO SPLIT. Both A and C go to state B, which is already separated.
NF = ({A,C}, {B}, {D})

Check O/P of all clubbed states (A,C) with Σ=b:
NO SPLIT. Both A and C go to the same group {A,C} on Σ=b.
NF = ({A,C}, {B}, {D})
Since the subset {A,C} remains a single combined group until the end, both states are joined together as a single state.

Transition table of the minimized DFA:
State    a   b
→ A,C    B   A,C
  B      B   D
  D      B   E
* E      B   A,C

Final Output
[Figure: the original DFA with states A, B, C, D, E next to the minimized DFA with states {A,C}, B, D, E]
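The partitioning steps of Question 1 can be replayed mechanically. The following C sketch is one possible rendering of the idea, not the only way to implement it: it starts from the final/non-final split and keeps splitting until the number of groups stops changing. The table delta encodes the Question 1 DFA (A..E as 0..4, symbol 0 = a, 1 = b); the names group, equivalent and minimize are hypothetical.

```c
#include <assert.h>

#define N_STATES 5                  /* A=0, B=1, C=2, D=3, E=4 */
#define N_SYM    2                  /* symbol 0 = a, symbol 1 = b */

/* Transition table of the DFA in Question 1 */
static const int delta[N_STATES][N_SYM] = {
    {1, 2},     /* A: a->B, b->C */
    {1, 3},     /* B: a->B, b->D */
    {1, 2},     /* C: a->B, b->C */
    {1, 4},     /* D: a->B, b->E */
    {1, 2}      /* E: a->B, b->C */
};
static const int accepting[N_STATES] = { 0, 0, 0, 0, 1 };

int group[N_STATES];                /* group[s]: index of the group holding s */

/* s and t stay together if they are in the same group and, on every
   symbol, move to states of the same group. */
static int equivalent(int s, int t)
{
    if (group[s] != group[t])
        return 0;
    for (int c = 0; c < N_SYM; c++)
        if (group[delta[s][c]] != group[delta[t][c]])
            return 0;
    return 1;
}

/* Refines the partition until it is stable; returns the number of groups. */
int minimize(void)
{
    int count = 2, prev;
    for (int s = 0; s < N_STATES; s++)
        group[s] = accepting[s];    /* initial split: non-final (0) / final (1) */

    do {
        int next[N_STATES];
        prev = count;
        count = 0;
        for (int s = 0; s < N_STATES; s++) {
            int t = 0;
            while (t < s && !equivalent(s, t))  /* look for an earlier partner */
                t++;
            next[s] = (t < s) ? next[t] : count++;
        }
        assert(count <= N_STATES);
        for (int s = 0; s < N_STATES; s++)
            group[s] = next[s];
    } while (count != prev);        /* stop when no group was split */

    return count;
}
```

On this table the sketch ends with four groups, and A and C share a group, in line with the hand-computed result {A,C}, {B}, {D}, {E}.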
Question 2
MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Figure: DFA with states A, B, C, D, E, F, G and H over Σ = {a, b}; its transitions are listed in the transition table below]

Draw the transition table corresponding to the given DFA (→ marks the start state, * the final state):
State   a   b
→ A     B   F
  B     G   C
* C     A   C
  D     C   G
  E     H   F
  F     C   G
  G     G   E
  H     G   C

Divide the states into two subsets, final and non-final:
Set of non-final states (NF): {A,B,D,E,F,G,H}
Set of final states (F): {C}

Check O/P of all clubbed states (A,B,D,E,F,G,H) with Σ=a:
Split into two, since {A,B,E,G,H} go to states within their own set while {D,F} go to state {C}.
NF = {A,B,E,G,H}, {D,F}

Check O/P of all clubbed states (A,B,E,G,H) with Σ=a:
NO SPLIT.

Check O/P of all clubbed states (A,B,E,G,H) with Σ=b:
A and E go to F (in {D,F}), B and H go to C, and G goes to E, so the group splits into three.
NF = {A,E}, {G}, {B,H}, {D,F}

Check O/P of all clubbed states (A,E) with Σ=a: NO SPLIT.
Check O/P of all clubbed states (A,E) with Σ=b: NO SPLIT.
Check O/P of all clubbed states (B,H) with Σ=a: NO SPLIT.
Check O/P of all clubbed states (B,H) with Σ=b: NO SPLIT.
Check O/P of all clubbed states (D,F) with Σ=a: NO SPLIT.
Check O/P of all clubbed states (D,F) with Σ=b: NO SPLIT.
NF = {A,E}, {G}, {B,H}, {D,F}

DFA MINIMIZATION USING PARTITIONING METHOD
Transition table of the minimized DFA:
State    a     b
→ A,E    B,H   D,F
  B,H    G     C
* C      A,E   C
  D,F    C     G
  G      G     A,E

Final Output
[Figure: the original DFA with states A to H next to the minimized DFA with states {A,E}, {B,H}, C, {D,F}, G]
THANKS
