Ch41
CHAPTER FOUR
Context-free grammars
Like regular expressions, context-free grammars describe sets of strings, i.e., languages.
In addition, a context-free grammar defines structure on the strings in the language it describes.
A language is defined over some alphabet, for example the set of tokens produced by a lexer or the set
of alphanumeric characters.
The symbols in the alphabet are called terminals.
A context-free grammar recursively defines several sets of strings.
Each set is denoted by a name, which is called a nonterminal. The set of nonterminals is disjoint from
the set of terminals.
One of the nonterminals is chosen to denote the language described by the grammar. This is called
the start symbol of the grammar.
The sets are described by a number of productions. Each production describes some of the possible strings that
are contained in the set denoted by a nonterminal. A production has the form
N -> X1 ... Xn
where N is a nonterminal and X1 ... Xn are zero or more symbols, each of which is either a terminal or a
nonterminal.
In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production
rule is of the form
V→w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).
A formal grammar is considered "context free" when its production rules can be applied regardless of the
context of a nonterminal.
No matter which symbols surround the nonterminal, the single nonterminal on the left-hand side
can always be replaced by the right-hand side.
Derivation:
1) Whenever we have a nonterminal, we can replace it by the right-hand side of any production in which
the nonterminal appears on the left-hand side.
2) We can do this anywhere in a sequence of symbols (terminals and nonterminals) and repeat doing so
until we have only terminals left.
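The two derivation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function names `apply_production` and `is_sentence`, and the small grammar S -> aB, B -> bB | (empty) are chosen here for the example and are not prescribed by the text.

```python
# Hypothetical representation: the dict keys are the nonterminals,
# every other symbol is treated as a terminal.
GRAMMAR = {
    "S": [["a", "B"]],
    "B": [["b", "B"], []],  # [] is the empty right-hand side
}

def apply_production(form, position, rhs, grammar=GRAMMAR):
    """Replace the nonterminal at `position` in `form` by the right-hand side `rhs`."""
    symbol = form[position]
    if symbol not in grammar:
        raise ValueError(f"{symbol!r} is not a nonterminal")
    if list(rhs) not in grammar[symbol]:
        raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
    return form[:position] + list(rhs) + form[position + 1:]

def is_sentence(form, grammar=GRAMMAR):
    """A derivation is finished when only terminals are left."""
    return all(symbol not in grammar for symbol in form)

form = ["S"]
form = apply_production(form, 0, ["a", "B"])  # S   => aB
form = apply_production(form, 1, ["b", "B"])  # aB  => abB
form = apply_production(form, 2, [])          # abB => ab
print("".join(form), is_sentence(form))       # prints: ab True
```

The replacement is allowed at any position that holds a nonterminal, which is exactly what makes the grammar context-free: the neighbours of the symbol are never inspected.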
Chapter 4 Syntax Analysis Compiler Design
Derivation:
Grammar rules determine the legal strings of token symbols by means of derivations.
A derivation begins with a single structure name and ends with a string of token symbols.
At each step in a derivation, a single replacement is made using one choice from a grammar rule.
exp -> exp op exp | (exp) | number
op -> + | - | *
Derivation steps use a different arrow (=>) from the arrow meta-symbol (->) in the grammar rules, because
grammar rules define, whereas derivation steps construct by replacement.
The set of all strings of token symbols obtained by derivations from the exp symbol is the language
defined by the grammar of expressions:
L(G) = { s | exp =>* s }
where
(1) G represents the expression grammar
(2) s represents an arbitrary string of token symbols (sometimes called a sentence)
(3) The symbols =>* stand for a derivation consisting of a sequence of replacements as described earlier.
(The asterisk is used to indicate a sequence of steps, much as it indicates repetition in regular
expressions.)
(4) Grammar rules are sometimes called productions because they "produce" the strings in L(G) via
derivations.
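To make the definition of L(G) concrete, the following sketch enumerates every sentence of the expression grammar up to a given number of tokens. It is a minimal illustration, assuming a tuple-based encoding of the rules; the names `RULES` and `sentences` are hypothetical. Because no rule of this grammar has an empty right-hand side, a sentential form never gets shorter, so forms longer than the bound can be discarded safely.

```python
from collections import deque

# Assumed encoding of: exp -> exp op exp | (exp) | number ; op -> + | - | *
RULES = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}

def sentences(start, max_tokens, rules=RULES):
    """Return all token strings s with start =>* s and len(s) <= max_tokens."""
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is a sentence.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            result.add(form)
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_tokens and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return result

for sentence in sorted(" ".join(s) for s in sentences("exp", 3)):
    print(sentence)
```

Expanding only the leftmost nonterminal is enough here, since every sentence of a context-free grammar has a leftmost derivation.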
Example 3: A CFG for ab* = { a, ab, abb, abbb, abbbb, ... }
1. Terminals: ∑ = {a, b}
2. Nonterminals: N = {S, B}
3. Productions: P = {
S -> aB
B -> bB | ^
}
Here ^ denotes the empty string.
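As a quick check that the productions S -> aB, B -> bB | ^ generate the strings of ab*, the sketch below derives a followed by a chosen number of b's and compares each result with the regular expression. The helper name `derive_ab_star` is hypothetical, and plain string replacement is used only because each sentential form here contains a single nonterminal.

```python
import re

def derive_ab_star(num_b):
    """Derive a followed by num_b b's; return every sentential form on the way."""
    form = "S"
    steps = [form]
    form = form.replace("S", "aB")       # S => aB
    steps.append(form)
    for _ in range(num_b):
        form = form.replace("B", "bB")   # B => bB, once per b
        steps.append(form)
    form = form.replace("B", "")         # B => empty string
    steps.append(form)
    return steps

print(" => ".join(derive_ab_star(2)[:-1]), "=>", derive_ab_star(2)[-1])
for n in range(5):
    sentence = derive_ab_star(n)[-1]
    assert re.fullmatch(r"ab*", sentence)
```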
S -> abA
A -> aA | ^
S -> aBa
B -> aB | ^
Definition 4.1 Given a context-free grammar G with start symbol S, terminal symbols
T and productions P, the language L(G) that G generates is defined to be the
set of strings of terminal symbols that can be obtained by derivation from S using the productions P.
Grammar:
T -> R
T -> aTc
R -> ^
R -> RbR

Derivation of the string aabbbcc using the grammar:
T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbcc
=> aaRbRbcc
=> aaRbRbRbcc
=> aaRbbRbcc
=> aabbRbcc
=> aabbbcc

Leftmost derivation of the string aabbbcc using the grammar:
T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbRbRcc
=> aabRbRcc
=> aabRbRbRcc
=> aabbRbRcc
=> aabbbRcc
=> aabbbcc
Example
1) A context-free grammar
S -> aSb
S -> ^
A derivation
Another derivation
S => aSa => abSba => abaSaba => abaaba
Definition: Context-Free Grammars
Derivation Order
1) A leftmost derivation: a derivation in which the leftmost nonterminal is replaced at each step in
the derivation. Corresponds to the preorder numbering of the internal nodes of its associated parse
tree.
2) A rightmost derivation: a derivation in which the rightmost nonterminal is replaced at each step
in the derivation. Corresponds to the reverse of the postorder numbering of the internal nodes of its
associated parse tree.
Example 1:
1. S -> AB
2. A -> aaA
3. A -> ^
4. B -> Bb
5. B -> ^
Leftmost derivation (rules applied, in order: 1, 2, 3, 4, 5):
S => AB => aaAB => aaB => aaBb => aab
Rightmost derivation (rules applied, in order: 1, 4, 5, 2, 3):
S => AB => ABb => Ab => aaAb => aab
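The difference between the two derivation orders can be reproduced with a short sketch that applies a sequence of rule numbers to either the leftmost or the rightmost nonterminal. The rule numbering follows Example 1 (1: S -> AB, 2: A -> aaA, 3: A -> empty, 4: B -> Bb, 5: B -> empty); the function name `derive` and the data layout are assumptions made for illustration.

```python
# Rule number -> (left-hand side, right-hand side); "" is the empty string.
RULES = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}
NONTERMINALS = {"S", "A", "B"}

def derive(rule_numbers, leftmost=True, start="S"):
    """Apply the rules in order to the leftmost (or rightmost) nonterminal."""
    form = start
    forms = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[i]!r}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

print(" => ".join(derive([1, 2, 3, 4, 5], leftmost=True)))
# prints: S => AB => aaAB => aaB => aaBb => aab
print(" => ".join(derive([1, 4, 5, 2, 3], leftmost=False)))
# prints: S => AB => ABb => Ab => aaAb => aab
```

Both orders use the same five rules and end in the same string; only the order in which the nonterminals are rewritten differs.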
Example 2:
S -> aAB
A -> bBb
B -> A | ^
Leftmost derivation:
S -> AB    A -> aaA | ^    B -> Bb | ^
S => AB
S => AB => aaAB
Ambiguity:
The syntax tree adds structure to the string that it derives. It is this structure that we exploit in the later phases
of the compiler.
For compilation, we do the derivation backwards: we start with a string and want to produce a
syntax tree. This process is called syntax analysis or parsing.
Even though the order of derivation does not matter when constructing a syntax tree, the choice of
production for each nonterminal does.
Obviously, different choices can lead to different strings being derived, but it may also happen that
several different syntax trees can be built for the same string.
When a grammar permits several different syntax trees for some strings we call the grammar
ambiguous.
If our only use of grammar is to describe sets of strings, ambiguity is not a problem. However, when
we want to use the grammar to impose structure on strings, the structure had better be the same every
time.
Hence, it is a desirable feature for a grammar to be unambiguous. In most (but not all) cases, an
ambiguous grammar can be rewritten as an unambiguous grammar that generates the same set of
strings, or external rules can be applied to decide which of the many possible syntax trees is the
"right one".
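Ambiguity can be observed by counting syntax trees. The sketch below is a minimal illustration, assuming the classic expression grammar E -> E + E | E * E | (E) | a with single-character tokens; the function name `count_trees` is hypothetical. For each substring it adds up the trees produced by each possible top-level production, so a result greater than 1 means the grammar is ambiguous for that string.

```python
from functools import lru_cache

def count_trees(s):
    """Number of distinct syntax trees for s under E -> E + E | E * E | (E) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees that derive the substring s[i:j] from E.
        total = 0
        if j - i == 1 and s[i] == "a":                      # E -> a
            total += 1
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":  # E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                       # E -> E + E | E * E
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_trees("a+a*a"))    # prints: 2
print(count_trees("(a+a)*a"))  # prints: 1
```

The string a+a*a has two trees, one grouping it as a+(a*a) and one as (a+a)*a, while explicit parentheses leave only one.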
Example
E -> E + E | E * E | (E) | a
String: a + a * a
Leftmost derivation:
E => E + E => a + E => a + E * E => a + a * E => a + a * a
Rightmost derivation:
E => E + E => E + E * E => E + E * a => E + a * a => a + a * a
Two derivation trees for a + a * a:

      E                    E
    / | \                / | \
   E  +  E              E  *  E
   |   / | \          / | \   |
   a  E  *  E        E  +  E  a
      |     |        |     |
      a     a        a     a