
Lexi Cal

Lexical Analysis

Find the FIRST and FOLLOW sets

• S -> ABCD | ε
• A -> a | ε
• B -> bA
• C -> a | ε
• D -> d
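
One way to check a hand-worked answer to this exercise is a small fixed-point computation. The sketch below is only an illustration: the grammar encoding (an empty tuple for the ε-production), the `$` end marker, and all function names are assumptions made here, not part of the slides.

```python
EPS = "ε"
END = "$"  # assumed end-of-input marker

# Grammar from the exercise; an empty tuple stands for the ε-production.
GRAMMAR = {
    "S": [("A", "B", "C", "D"), ()],
    "A": [("a",), ()],
    "B": [("b", "A")],
    "C": [("a",), ()],
    "D": [("d",)],
}
START = "S"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, using the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the string was empty
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, productions in GRAMMAR.items():
            for body in productions:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:  # repeat until no FOLLOW set grows
        changed = False
        for head, productions in GRAMMAR.items():
            for body in productions:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

Under these assumptions the sketch gives FIRST(S) = {a, b, ε} and FOLLOW(A) = {a, b, d}; A picks up a and d because it ends the production B -> bA and therefore inherits FOLLOW(B).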
• The task of analyzing syntax is divided into two parts.
– Lexical analysis: deals with small-scale language constructs, such as names and numeric literals.
– Syntax analysis: deals with large-scale constructs, such as expressions, statements, and program units.

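• Reasons for separating lexical analysis from syntax analysis:
– Simplicity: each part is less complex on its own.
– Efficiency: the lexical analyzer can be optimized separately.
– Portability: the machine-dependent parts are confined to the lexical analyzer, so the syntax analyzer can stay machine independent.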
Lexical Analyzer
• A pattern matcher for character strings
• Its job is to find a substring of a given string of characters that matches a given character pattern
Lexical Analyzer
• Token
• Pattern
• Lexeme

printf("Total = %d\n", score);

– id: printf and score (lexemes)
– literal: "Total = %d\n" (lexeme)
• Keyword
• Operator
• Identifiers
• Constant - Numbers & Literals
• Punctuation Symbols – Left & Right
parenthesis, comma and semicolon.
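
To make the token / pattern / lexeme distinction concrete, here is a minimal sketch that pairs each token name with a regular-expression pattern and reports the lexemes found in the printf statement. The token names, the patterns, and the `tokenize` helper are assumptions chosen for this illustration, not a full C lexer.

```python
import re

# (token name, pattern) pairs; the names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("literal", r'"(?:\\.|[^"\\])*"'),      # double-quoted string literal
    ("number", r"\d+"),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("punctuation", r"[(),;]"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (token, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"no token pattern matches at offset {pos}")
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


statement = 'printf("Total = %d\\n", score);'
for token, lexeme in tokenize(statement):
    print(f"{token:12} {lexeme}")
```

Each pattern describes the set of lexemes that belong to one token; printf and score are two different lexemes of the same token id.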
Lexical Analyzer Responsibilities
• Lexical analyzer [Scanner]
– Scan input
– Remove white spaces
– Remove comments
– Manufacture tokens
– Generate lexical errors
– Pass token to parser

Two processes
• Scanner: deletion of comments and compaction of consecutive whitespace characters into one.
• Lexical analysis: producing tokens from the output of the scanner.
Attributes for Tokens
E = M * C **2

<id, pointer to symbol table>

<id, pointer to symbol table>
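
The statement above can be turned into (token, attribute) pairs along the following lines. This is a sketch under stated assumptions: the operator token names (`assign_op`, `mult_op`, `exp_op`), the use of a dictionary index as the "pointer to symbol table", and the function name are all hypothetical choices for illustration.

```python
import re

# One alternative per token class; leading whitespace is skipped by \s*.
PATTERN = re.compile(r"\s*(?:(?P<id>[A-Za-z]\w*)|(?P<number>\d+)|(?P<op>\*\*|[=*]))")
OP_NAMES = {"=": "assign_op", "*": "mult_op", "**": "exp_op"}  # assumed token names


def tokens_with_attributes(source, symbol_table):
    """Return (token, attribute) pairs; identifiers are entered into symbol_table."""
    result = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        lexeme = match.group(kind)
        if kind == "id":
            # The attribute is the identifier's slot in the symbol table.
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            result.append(("id", index))
        elif kind == "number":
            result.append(("number", int(lexeme)))
        else:
            result.append((OP_NAMES[lexeme], None))  # operators carry no attribute
        pos = match.end()
    return result


table = {}
pairs = tokens_with_attributes("E = M * C **2", table)
print(pairs)
print(table)
```

Identifiers carry a reference into the symbol table, the number carries its integer value, and the operators need no attribute because the token name already identifies them.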
Lexical Errors
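• It is very hard for the lexical analyzer (LA) to tell that there is an error in the code without the aid of the other components.

• Example: fi(a==f(x))

• The LA cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid identifier lexeme, the LA returns it as an id token and leaves the error to a later phase.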
Lexical Errors
• However, in some situations the LA is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.

• The simplest recovery strategy is “panic mode” recovery: delete successive characters from the remaining input until the LA can find a well-formed token.
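
Panic-mode recovery can be sketched as a scanning loop that, when no pattern matches, skips characters until some pattern matches again. The token patterns and the function name below are assumptions for illustration only.

```python
import re

# A deliberately small token set (assumed for this sketch).
TOKEN = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>==|[=+\-*/()])")


def scan_with_panic_mode(source):
    """Return (tokens, errors); errors are (offset, deleted_text) pairs."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match:
            tokens.append((match.lastgroup, match.group()))
            pos = match.end()
            continue
        # Panic mode: delete characters until a token can start again.
        start = pos
        while (pos < len(source)
               and not source[pos].isspace()
               and not TOKEN.match(source, pos)):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors


tokens, errors = scan_with_panic_mode("x = 3 @# + y")
print(tokens)
print(errors)
```

The deleted characters are recorded so that an error message can be produced while scanning continues with the rest of the input.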
Other Possible recovery actions
• Delete one character from the remaining input.

• Insert a missing character into the remaining input.

• Replace a character by another character.

• Transpose two adjacent characters.

Tricky problems in Token recognition

DO index variable = start, end, step


Or equivalent

do 100 n=2,10,1
100 nfac=nfac*n
Tricky problems in Token recognition

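• Assignment: DO 5 I = 1.25 (blanks are not significant in fixed-form Fortran, so this assigns 1.25 to the variable DO5I)
• DO loop: DO 5 I = 1,25 (only the comma marks this as a loop, so the LA must look ahead past the = before it can decide)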
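
The lookahead needed to separate the two statements can be sketched as follows. This is a simplification under stated assumptions: it strips blanks first (as fixed-form Fortran allows), it treats any comma after the = as the loop marker, and it would be fooled by commas inside parentheses; the function and key names are hypothetical.

```python
import re

# DO <label> <variable> = <start> , <rest>   (after blanks are removed)
DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")


def classify_do_statement(statement):
    """Classify a statement that begins with DO as a loop or an assignment."""
    compact = statement.replace(" ", "").upper()  # blanks carry no meaning
    match = DO_LOOP.fullmatch(compact)
    if match:
        label, variable, start, rest = match.groups()
        return "do_loop", {
            "label": label,
            "variable": variable,
            "bounds": [start] + rest.split(","),
        }
    # No comma after '=': the whole left-hand side is one identifier.
    target, _, value = compact.partition("=")
    return "assignment", {"target": target, "value": value}


print(classify_do_statement("DO 5 I = 1.25"))
print(classify_do_statement("DO 5 I = 1,25"))
```

The point of the sketch is that the decision cannot be made when DO is first seen; the scanner has to read well past the = sign before it knows which tokens the earlier characters form.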
Input Buffering
• Two-buffer scheme
Input Buffering
• Examining ways of speeding up the reading of the source program
– With a single buffer, the lexeme still being processed can be overwritten when the buffer is reloaded.
– The two-buffer scheme handles large lookahead safely.
Buffer Pairs
• Two buffers of the same size, say 4096, are alternately reloaded.
• Two pointers to the input are maintained:
– Pointer lexeme_Begin marks the beginning of the current lexeme.
– Pointer forward scans ahead until a pattern match is found.
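
A buffer pair with the two pointers can be sketched roughly as below. The class and method names are assumptions for illustration, the pointers are kept as absolute input offsets for simplicity, and the sketch only works while a lexeme is no longer than one buffer half.

```python
import io


class BufferPair:
    """Two alternately reloaded halves with lexeme_begin and forward pointers."""

    def __init__(self, stream, size=4096):
        self.stream, self.size = stream, size
        self.halves = [stream.read(size), ""]
        self.starts = [0, size]  # absolute offset of each half's first character
        self.active = 0          # index of the half that forward is scanning
        self.lexeme_begin = 0
        self.forward = 0

    def advance(self):
        """Return the character at forward and move past it; '' at end of input."""
        offset = self.forward - self.starts[self.active]
        if offset == len(self.halves[self.active]):
            chunk = self.stream.read(self.size)
            if not chunk:
                return ""
            other = 1 - self.active  # reload the half that is not in use
            self.halves[other] = chunk
            self.starts[other] = self.forward
            self.active = other
            offset = 0
        self.forward += 1
        return self.halves[self.active][offset]

    def take_lexeme(self):
        """Return the text from lexeme_begin up to forward and start a new lexeme."""
        parts = []
        for i in (1 - self.active, self.active):  # older half first
            lo = max(self.lexeme_begin - self.starts[i], 0)
            hi = min(self.forward - self.starts[i], len(self.halves[i]))
            if lo < hi:
                parts.append(self.halves[i][lo:hi])
        self.lexeme_begin = self.forward
        return "".join(parts)


buffers = BufferPair(io.StringIO("count = count + 1"), size=4)
for _ in range(5):
    buffers.advance()
print(buffers.take_lexeme())
```

With a half size of 4, the lexeme count straddles both halves; it survives because reloading one half leaves the other half, where the lexeme began, untouched.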
Regular Expression
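• Regular expressions describe all the languages that can be built from these operators (union, concatenation, and closure) applied to the symbols of some alphabet.

• Regular expressions are built recursively out of smaller regular expressions by applying the construction rules.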
Specification of Patterns for Tokens:
• An alphabet Σ is a finite set of symbols
• A string s is a finite sequence of symbols from Σ
– |s| denotes the length of string s
– ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ

Specification of Patterns for Tokens: String
• The concatenation of two strings x and y is denoted by xy
• The exponentiation of a string s is defined by

s^0 = ε (the empty string: a string of length zero)

s^i = s^(i-1) s for i > 0

Note that sε = εs = s

Recognition of Tokens
Transition Diagrams
• Patterns are converted into stylized flowcharts called transition diagrams.

• Scanning uses the two pointers lexeme_Begin and forward.

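Transition diagram for relop (* marks a state that retracts forward by one character):

state 0 (start): on < go to 1; on = go to 5; on > go to 6
state 1: on = go to 2; on > go to 3; on other go to 4
state 2: return(relop, LE)
state 3: return(relop, NE)
state 4 *: return(relop, LT)
state 5: return(relop, EQ)
state 6: on = go to 7; on other go to 8
state 7: return(relop, GE)
state 8 *: return(relop, GT)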
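
A transition diagram like the one for relop translates almost directly into code: one branch per state, and a retraction of forward in the states reached on "other". The sketch below assumes the usual edge labels (<, =, >) for this diagram; the function name and the return convention are choices made here for illustration.

```python
def next_relop(text, pos=0):
    """Follow the relop diagram from pos.

    Returns ((token, attribute), new_pos), or None if no relop starts at pos.
    """
    state = 0
    forward = pos
    while True:
        # The empty string plays the role of "other" at end of input.
        ch = text[forward] if forward < len(text) else ""
        if state == 0:
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("relop", "EQ"), forward + 1   # state 5
            elif ch == ">":
                state = 6
            else:
                return None                           # not a relational operator
        elif state == 1:
            if ch == "=":
                return ("relop", "LE"), forward + 1   # state 2
            if ch == ">":
                return ("relop", "NE"), forward + 1   # state 3
            return ("relop", "LT"), forward           # state 4: retract
        elif state == 6:
            if ch == "=":
                return ("relop", "GE"), forward + 1   # state 7
            return ("relop", "GT"), forward           # state 8: retract
        forward += 1


for sample in ["<=", "<>", "<a", "=", ">=", ">b"]:
    print(sample, next_relop(sample))
```

In the two retracting states the character that was read as "other" is not consumed, so the returned position points back at it and the next token starts there.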
Two More...
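id:
state 9 (start): on letter go to 10
state 10: on letter or digit stay in 10; on other go to 11
state 11 *: accepting state

delim:
state 28 (start): on delim go to 29
state 29: on delim stay in 29; on other go to 30
state 30 *: accepting state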

RE to Automata
• The DFA for a(b|c)*
Example #2: Applying Minimization
Example # 4
• Minimize the following DFA:

[Figure: state diagram of the DFA to be minimized, with transitions labeled a and b]
From Regular Expression to DFA Directly

• The “important states” of an NFA are those with a non-ε out-transition; that is, if move({s}, a) ≠ ∅ for some a, then s is an important state
• The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a))

From Regular Expression to DFA Directly
• Augment the regular expression r with a
special end symbol # to make accepting states
important: the new expression is r#
• Construct a syntax tree for r#
• Traverse the tree to construct functions
nullable, firstpos, lastpos, and followpos

From Regular Expression to DFA Directly:
Syntax Tree of (a|b)*abb#


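[Figure: syntax tree of (a|b)*abb#. The leaves a, b, a, b, b, # carry position numbers 1 to 6. The closure (*) node sits above the alternation a | b, and concatenation nodes join the remaining leaves from left to right.]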
From Regular Expression to DFA Directly:
Annotating the Tree
• nullable(n): true if the subtree at node n generates a language that includes the empty string

• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n

• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n

• followpos(i): the set of positions that can follow position i in the tree
From Regular Expression to DFA Directly:
Annotating the Tree
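Rules for nullable(n), firstpos(n), and lastpos(n), by kind of node n:

• Leaf labeled ε
– nullable(n) = true
– firstpos(n) = ∅
– lastpos(n) = ∅
• Leaf with position i
– nullable(n) = false
– firstpos(n) = {i}
– lastpos(n) = {i}
• Or-node with children c1 | c2
– nullable(n) = nullable(c1) or nullable(c2)
– firstpos(n) = firstpos(c1) ∪ firstpos(c2)
– lastpos(n) = lastpos(c1) ∪ lastpos(c2)
• Cat-node with children c1 c2
– nullable(n) = nullable(c1) and nullable(c2)
– firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
– lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• Star-node with child c1
– nullable(n) = true
– firstpos(n) = firstpos(c1)
– lastpos(n) = lastpos(c1)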
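
The rules for nullable, firstpos, and lastpos amount to a single post-order walk of the syntax tree. Below is a minimal sketch for (a|b)*abb#; the nested-tuple encoding of the tree and the name `annotate` are assumptions made for this example.

```python
# Syntax tree of (a|b)*abb# as nested tuples (an assumed encoding):
#   ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2), ("star", c1)
STAR = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
TREE = ("cat",
        ("cat",
         ("cat",
          ("cat", STAR, ("leaf", "a", 3)),
          ("leaf", "b", 4)),
         ("leaf", "b", 5)),
        ("leaf", "#", 6))


def annotate(node):
    """Return (nullable, firstpos, lastpos) for the subtree rooted at node."""
    kind = node[0]
    if kind == "leaf":
        _, symbol, position = node
        if symbol == "ε":
            return True, frozenset(), frozenset()
        return False, frozenset({position}), frozenset({position})
    if kind == "star":
        _, first, last = annotate(node[1])
        return True, first, last
    n1, f1, l1 = annotate(node[1])
    n2, f2, l2 = annotate(node[2])
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    raise ValueError(f"unknown node kind: {kind}")


nullable, firstpos, lastpos = annotate(TREE)
print(nullable, sorted(firstpos), sorted(lastpos))
```

For the root this yields nullable = false, firstpos = {1, 2, 3}, and lastpos = {6}; the star node is the only nullable node in the tree.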
From Regular Expression to DFA Directly:
Syntax Tree of (a|b)*abb#

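Annotated tree, from the root down (firstpos on the left of each node, lastpos on the right):

firstpos     node                            lastpos
{1, 2, 3}    cat (root)                      {6}
{6}          leaf #, position 6              {6}
{1, 2, 3}    cat                             {5}
{5}          leaf b, position 5              {5}
{1, 2, 3}    cat                             {4}
{4}          leaf b, position 4              {4}
{1, 2, 3}    cat                             {3}
{3}          leaf a, position 3              {3}
{1, 2}       star (the only nullable node)   {1, 2}
{1, 2}       or                              {1, 2}
{1}          leaf a, position 1              {1}
{2}          leaf b, position 2              {2}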
From Regular Expression to DFA Directly:

for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node then
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do
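
The followpos loop can be folded into the same post-order walk that computes firstpos and lastpos. The sketch below does this for (a|b)*abb#; the tuple encoding of the tree and the function name are assumptions, and ε-leaves are left out because this expression has none.

```python
from collections import defaultdict

# Syntax tree of (a|b)*abb# as nested tuples (an assumed encoding).
A1, B2 = ("leaf", "a", 1), ("leaf", "b", 2)
TREE = ("cat",
        ("cat",
         ("cat",
          ("cat", ("star", ("or", A1, B2)), ("leaf", "a", 3)),
          ("leaf", "b", 4)),
         ("leaf", "b", 5)),
        ("leaf", "#", 6))


def fill_followpos(node, followpos):
    """Post-order walk: returns (nullable, firstpos, lastpos), fills followpos."""
    kind = node[0]
    if kind == "leaf":
        position = node[2]
        followpos[position]  # touch the entry so every position is listed
        return False, {position}, {position}
    if kind == "star":
        _, first, last = fill_followpos(node[1], followpos)
        for i in last:                 # star rule: lastpos(n) -> firstpos(n)
            followpos[i] |= first
        return True, first, last
    n1, f1, l1 = fill_followpos(node[1], followpos)
    n2, f2, l2 = fill_followpos(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for i in l1:                       # cat rule: lastpos(c1) -> firstpos(c2)
        followpos[i] |= f2
    return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)


followpos = defaultdict(set)
root_info = fill_followpos(TREE, followpos)
for position in sorted(followpos):
    print(position, sorted(followpos[position]))
```

Positions 1 and 2 receive {1, 2} from the star rule and {3} from the concatenation with the leaf at position 3, which is why both end up with {1, 2, 3}.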

From Regular Expression to DFA Directly:
s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
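
The construction loop can be sketched in Python as follows, using the followpos values and leaf symbols of (a|b)*abb# as fixed inputs. The names are assumptions for this illustration, and unlike the pseudocode the sketch simply omits transitions whose target set U is empty.

```python
from collections import deque

# Inputs for (a|b)*abb#: symbol at each position and its followpos set.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
START = frozenset({1, 2, 3})  # firstpos(root)
ALPHABET = ("a", "b")
END_POSITION = 6              # position of the end marker #


def build_dfa(start, followpos, symbol_at, alphabet):
    """Return (dstates, dtran); states are frozensets of positions."""
    dstates = [start]
    dtran = {}
    unmarked = deque([start])
    while unmarked:
        state = unmarked.popleft()  # mark T
        for symbol in alphabet:
            # U = union of followpos(p) for p in T whose symbol is `symbol`
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at[p] == symbol))
            if not target:
                continue            # no transition on this symbol
            if target not in dstates:
                dstates.append(target)
                unmarked.append(target)
            dtran[state, symbol] = target
    return dstates, dtran


dstates, dtran = build_dfa(START, FOLLOWPOS, SYMBOL_AT, ALPHABET)
accepting = [s for s in dstates if END_POSITION in s]
for (state, symbol), target in dtran.items():
    print(sorted(state), symbol, "->", sorted(target))
```

A state is accepting exactly when it contains the position of the end marker #, which is the reason the expression is augmented before the tree is built.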

From Regular Expression to DFA Directly:

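Node    followpos
1       {1, 2, 3}
2       {1, 2, 3}
3       {4}
4       {5}
5       {6}
6       -

Resulting DFA (start state {1, 2, 3}; accepting state {1, 2, 3, 6}):

State           on a            on b
{1, 2, 3}       {1, 2, 3, 4}    {1, 2, 3}
{1, 2, 3, 4}    {1, 2, 3, 4}    {1, 2, 3, 5}
{1, 2, 3, 5}    {1, 2, 3, 4}    {1, 2, 3, 6}
{1, 2, 3, 6}    {1, 2, 3, 4}    {1, 2, 3}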
Thank You
