Lesson 10
Overview of Previous Lesson(s)
Overview
Symbol tables are data structures that are used by compilers to
hold information about source-program constructs.
A token is a pair consisting of a token name and an optional
attribute value.
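A minimal sketch of a token as a (name, attribute) pair. The class name Token and the sample token names (id, number, semi) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    """A token: a token name plus an optional attribute value."""
    name: str
    attribute: Optional[Union[int, float, str]] = None


# Hypothetical tokens a lexer might return.
id_token = Token("id", "count")    # attribute: the identifier's lexeme
number_token = Token("number", 60)  # attribute: the numeric value
semi_token = Token("semi")          # no attribute needed
```

The attribute defaults to None, which models the "optional" part of the definition.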
Some definitions:
Def: A language over an alphabet is a countable set of strings over
the alphabet.
Ex: The set of all grammatical English sentences with five, eight, or
twelve words is a language over ASCII.
Def: A prefix of string s is any string obtained by removing zero or
more symbols from the end of s.
Ex: ban, banana, and ε are prefixes of banana.
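The definition can be checked with a short sketch. The function name prefixes is an assumption; the empty Python string stands for ε.

```python
def prefixes(s: str) -> list[str]:
    """Return every prefix of s, from the empty string (ε) up to s itself."""
    # Removing k symbols from the end of s leaves s[:len(s) - k].
    return [s[:i] for i in range(len(s) + 1)]


banana_prefixes = prefixes("banana")
```

A string of length n has n + 1 prefixes, so banana has seven, among them ban, banana, and ε.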
Operations on Languages:
Let L be the set of letters {A, ..., Z, a, ..., z} and D the set of digits {0, ..., 9}.
L ∪ D is the set of letters and digits: 62 strings of length one, each of
which is either one letter or one digit.
LD is the set of 520 strings of length two, each consisting of one letter
followed by one digit.
L^4 is the set of all 4-letter strings.
L* is the set of all strings of letters, including ε, the empty string.
L(L ∪ D)* is the set of all strings of letters and digits beginning with a
letter.
D+ is the set of all strings of one or more digits.
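These operations can be tried on small sets. The sketch below assumes L is the 52 upper- and lowercase ASCII letters and D is the ten digits (the sizes that give 520 strings for LD). The helper names concat, power, and closure_up_to are hypothetical, and the closure is cut off at a maximum length because L* itself is infinite.

```python
import string

L = set(string.ascii_letters)  # assumed: 52 letters
D = set(string.digits)         # assumed: 10 digits


def concat(A, B):
    """Concatenation AB: every string of A followed by every string of B."""
    return {a + b for a in A for b in B}


def power(A, n):
    """A^n: A concatenated with itself n times; A^0 is {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result


def closure_up_to(A, max_len):
    """Finite slice of the Kleene closure A*: powers 0 through max_len."""
    result = set()
    for n in range(max_len + 1):
        result |= power(A, n)
    return result


union = L | D                                     # L ∪ D
LD = concat(L, D)                                 # LD
two_letter = power(L, 2)                          # L^2, a small stand-in for L^4
short_ids = concat(L, closure_up_to(L | D, 1))    # slice of L(L ∪ D)*
digit_strings = closure_up_to(D, 2) - {""}        # slice of D+
```

L^2 is used instead of L^4 only to keep the set small (L^4 would hold over seven million strings).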
A regular expression is a sequence of characters that forms a
search pattern, mainly for use in pattern matching with strings.
Ex. Let Σ = {a, b}
(a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of
length two over the alphabet Σ.
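The claim can be checked by brute force with Python's re module. This is only a sketch: the helper name language_up_to is hypothetical, and it tests every string over Σ up to a chosen length against the pattern.

```python
import re
from itertools import product

SIGMA = "ab"
PATTERN = re.compile(r"(a|b)(a|b)")


def language_up_to(max_len):
    """Collect every string over SIGMA of length <= max_len that the
    pattern matches in full."""
    matches = set()
    for n in range(max_len + 1):
        for chars in product(SIGMA, repeat=n):
            s = "".join(chars)
            if PATTERN.fullmatch(s):
                matches.add(s)
    return matches


length_two = language_up_to(4)
```

fullmatch is used so that the whole string must match, not just a prefix of it.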
TODAY’S LESSON
Contents
Recognition of Tokens
Transition Diagrams
Recognition of Reserved Words and Identifiers
Recognizing Whitespace
Recognizing Numbers
Finite Automata
NFA
Transition Tables
Recognition of Tokens
Now we see how to build a piece of code that examines the input
string and finds a prefix that is a lexeme matching one of the
patterns.
Our current goal is to perform the lexical analysis needed for the
following grammar.
Recall that the terminals are the tokens and the nonterminals
produce terminals.
A regular definition for the terminals is
We also want the lexer to remove whitespace, so we define a new
token
ws → ( blank | tab | newline )+
where blank, tab, and newline are symbols used to represent the
corresponding ASCII characters.
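One way a lexer might apply the ws definition is to skip over a run of whitespace before looking for the next token. A minimal sketch, assuming blank, tab, and newline are the characters ' ', '\t', and '\n'. The function name skip_ws is hypothetical.

```python
import re

# ws -> ( blank | tab | newline )+
WS = re.compile(r"[ \t\n]+")


def skip_ws(text, pos):
    """Return the position of the first non-whitespace character at or
    after pos. No token is returned for ws; it is simply dropped."""
    m = WS.match(text, pos)
    return m.end() if m else pos
```

For example, skip_ws("  \t\nif", 0) moves past the four whitespace characters and stops at the i of if.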
Our goal for the lexical analyzer is summarized below:
Transition Diagram
As an intermediate step in the construction of a lexical analyzer, we
first convert patterns into stylized flowcharts, called "transition
diagrams".
Edges are directed from one state of the transition diagram to
another.
Each edge is labeled by a symbol or set of symbols.
Some important conventions:
The double circles represent accepting or final states at which point a
lexeme has been found. There is often an action to be done (e.g.,
returning the token), which is written to the right of the double circle.
If we have moved one (or more) characters too far in finding the token,
one (or more) stars are drawn at the accepting state to show that the
input must be retracted by that many characters.
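The conventions above can be sketched in code. The example assumes a token for the relational operators <, <=, <>, =, >, >=. The state numbers and the attribute names LT, LE, NE, EQ, GT, GE are illustrative choices, not taken from a diagram in this lesson. States marked "starred" have read one character too many and retract it.

```python
def relop(text, pos):
    """Simulate a transition diagram for relational operators.

    Returns (attribute, next_pos) on success, or None if no relational
    operator starts at pos.
    """
    state, forward = 0, pos
    while True:
        # Read the next character; "" stands for end of input.
        c = text[forward] if forward < len(text) else ""
        forward += 1  # following an edge consumes the character
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return "EQ", forward
            elif c == ">":
                state = 2
            else:
                return None  # not a relational operator
        elif state == 1:
            if c == "=":
                return "LE", forward
            if c == ">":
                return "NE", forward
            return "LT", forward - 1  # starred state: retract one character
        else:  # state == 2
            if c == "=":
                return "GE", forward
            return "GT", forward - 1  # starred state: retract one character
```

For the input <a the lexeme is just <, so the position returned points back at a, which is the effect of the star.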
Recognition of Reserved Words and Identifiers
Recognizing keywords and identifiers presents a problem.
The transition diagram below corresponds to the regular definition
given previously.
Two questions arise:
We will use the method of installing the keywords into the
identifier table before any invocation of the lexer.
The table entry will indicate that the entry is a keyword.
installID() checks if the lexeme is already in the table. If it is not
present, the lexeme is installed as an id token. In either case a
pointer to the entry is returned.
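A minimal sketch of this scheme. The keyword list (if, then, else), the class name SymbolTable, the entry layout, and the helper get_token are all assumptions for illustration. Only the behavior of installID() comes from the text above. A reference to the entry's dict plays the role of the pointer.

```python
KEYWORDS = ("if", "then", "else")  # assumed reserved words


class SymbolTable:
    def __init__(self):
        self.entries = {}
        # Keywords are installed before the lexer is ever invoked,
        # and their entries record that they are keywords.
        for kw in KEYWORDS:
            self.entries[kw] = {"lexeme": kw, "token": kw}

    def install_id(self, lexeme):
        """Install lexeme as an id if it is not already present.
        In either case return the table entry."""
        if lexeme not in self.entries:
            self.entries[lexeme] = {"lexeme": lexeme, "token": "id"}
        return self.entries[lexeme]

    def get_token(self, lexeme):
        """Hypothetical helper: the token name recorded for lexeme."""
        return self.install_id(lexeme)["token"]


table = SymbolTable()
first = table.install_id("count")
second = table.install_id("count")
```

Because keywords are already in the table, looking up if finds the keyword entry and never creates an id entry for it.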
Recognizing Whitespace
Recognizing Numbers
The transition diagram for token number
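A hand-coded recognizer in the spirit of such a diagram might look like the sketch below. It assumes number means digits, then an optional fraction (. digits), then an optional exponent (E, optional sign, digits). That definition is an assumption, since the regular definition is not reproduced here. The function name scan_number is hypothetical.

```python
DIGITS = "0123456789"


def scan_number(text, pos):
    """Return (lexeme, next_pos) for the longest number starting at pos,
    or None if no digit is found there."""

    def digits(i):
        # Advance past a run of digits starting at i.
        while i < len(text) and text[i] in DIGITS:
            i += 1
        return i

    end = digits(pos)
    if end == pos:
        return None  # a number must start with at least one digit

    # Optional fraction: '.' followed by at least one digit.
    if end < len(text) and text[end] == ".":
        frac_end = digits(end + 1)
        if frac_end > end + 1:
            end = frac_end

    # Optional exponent: 'E', optional sign, at least one digit.
    if end < len(text) and text[end] == "E":
        k = end + 1
        if k < len(text) and text[k] in "+-":
            k += 1
        exp_end = digits(k)
        if exp_end > k:
            end = exp_end

    return text[pos:end], end
```

If a fraction or exponent turns out to be incomplete (as in 42. or 7E), the scanner falls back to the last position where a complete number ended, which corresponds to retracting input in the diagram.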
Finite Automata
Finite automata are like the graphs in transition diagrams but they
simply decide if an input string is in the language (generated by our
regular expression).
Finite automata are recognizers; they simply say "yes" or "no" about
each possible input string.
In a deterministic finite automaton (DFA), if you know the next symbol and
the current state, the next state is determined. That is, the execution is
deterministic, hence the name.
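Deterministic execution can be sketched as a single table lookup per input symbol. The four-state automaton below is an assumed example that accepts strings of a's and b's ending in abb. Its state numbering and the name dfa_accepts are illustrative.

```python
# (current state, next symbol) -> the one next state
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
DFA_START = 0
DFA_ACCEPTING = {3}


def dfa_accepts(s):
    """Answer yes (True) or no (False) for the input string s."""
    state = DFA_START
    for ch in s:
        if (state, ch) not in DFA:
            return False  # symbol outside the alphabet
        state = DFA[(state, ch)]  # exactly one choice: deterministic
    return state in DFA_ACCEPTING
```

At every step there is exactly one state to move to, so no searching or backtracking is needed.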
Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) consists of:
An NFA is basically a flow chart like the transition diagrams we
have already seen.
Ex: The transition graph for an NFA recognizing the language of
regular expression (a|b)*abb
This example describes all strings of a's and b's ending in the particular
string abb.
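One common way to run such an NFA is to track the set of all states it could be in. The sketch below assumes a four-state graph for (a|b)*abb: state 0 loops on a and b, and the edges a, b, b lead from 0 through 1 and 2 to the accepting state 3. The edge list and the name nfa_accepts are illustrative.

```python
# Each edge of the transition graph: (source, label, destination).
NFA_EDGES = [
    (0, "a", 0), (0, "b", 0),  # loop: stay in the (a|b)* part
    (0, "a", 1),               # guess that the final abb starts here
    (1, "b", 2),
    (2, "b", 3),
]
NFA_START = 0
NFA_ACCEPTING = {3}


def nfa_accepts(s):
    """Follow every possible path at once by tracking a set of states."""
    current = {NFA_START}
    for ch in s:
        current = {dst for (src, label, dst) in NFA_EDGES
                   if src in current and label == ch}
    # Accept if at least one path ended in an accepting state.
    return bool(current & NFA_ACCEPTING)
```

State 0 has two edges labeled a, which is what makes this automaton nondeterministic. This graph has no ε edges, so no ε handling is needed here.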
Transition Tables
A transition table is an equivalent way to represent an NFA: for each
state s and input symbol x (and ε), it gives the set of successor
states that x leads to from s.
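A sketch of such a table as a nested dictionary, again assuming the four-state NFA for (a|b)*abb. The layout and the helper names successors, move, and format_table are illustrative. An empty set means there is no edge for that state and symbol.

```python
EPS = "ε"

# Rows are states; columns are input symbols plus ε.
TABLE = {
    0: {"a": {0, 1}, "b": {0},   EPS: set()},
    1: {"a": set(),  "b": {2},   EPS: set()},
    2: {"a": set(),  "b": {3},   EPS: set()},
    3: {"a": set(),  "b": set(), EPS: set()},
}


def successors(state, symbol):
    """The table entry: states that symbol leads to from state."""
    return TABLE[state][symbol]


def move(states, symbol):
    """Union of the table entries for a whole set of states."""
    result = set()
    for s in states:
        result |= successors(s, symbol)
    return result


def format_table():
    """Render the table as text lines, one row per state."""
    def cell(entry):
        return "{" + ",".join(str(q) for q in sorted(entry)) + "}" if entry else "∅"

    lines = ["state  a      b      ε"]
    for state in sorted(TABLE):
        row = TABLE[state]
        lines.append(f"{state:<6} {cell(row['a']):<6} {cell(row['b']):<6} {cell(row[EPS])}")
    return lines
```

The table makes lookups easy, but it takes space for every state and symbol pair even when most entries are empty.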