
Chapter 4 Syntax Analysis Compiler Design

CHAPTER FOUR
Context-free grammars

• Like regular expressions, context-free grammars describe sets of strings, i.e., languages.
• Additionally, a context-free grammar also defines structure on the strings in the language it defines.
• A language is defined over some alphabet, for example the set of tokens produced by a lexer or the set of alphanumeric characters.
• The symbols in the alphabet are called terminals.
• A context-free grammar recursively defines several sets of strings.
• Each set is denoted by a name, which is called a nonterminal. The set of nonterminals is disjoint from the set of terminals.
• One of the nonterminals is chosen to denote the language described by the grammar. This is called the start symbol of the grammar.

The sets are described by a number of productions. Each production describes some of the possible strings that are contained in the set denoted by a nonterminal. A production has the form

N -> X1 ... Xn

where N is a nonterminal and X1 ... Xn are zero or more symbols, each of which is either a terminal or a nonterminal.

In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production
rule is of the form

V→w

where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).

A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal: it does not matter which symbols surround the nonterminal; the single nonterminal on the left-hand side can always be replaced by the right-hand side.
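To make the definition concrete, here is a minimal sketch (not from the text) of how a grammar can be represented in a program: a dictionary mapping each nonterminal to a list of its right-hand sides. The names GRAMMAR, START and is_terminal are illustrative choices, and the productions shown are those of the expression grammar used later in this chapter.

# Hypothetical representation of a CFG: each nonterminal maps to a list of
# right-hand sides, and each right-hand side is a list of symbols.
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op":  [["+"], ["-"], ["*"]],
}
START = "exp"                      # the start symbol of the grammar

NONTERMINALS = set(GRAMMAR)        # every other symbol is a terminal

def is_terminal(symbol):
    return symbol not in NONTERMINALS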

Derivation:

The basic idea of derivation is to consider productions as rewrite rules:

1) Whenever we have a nonterminal, we can replace it by the right-hand side of any production in which the nonterminal appears on the left-hand side.
2) We can do this anywhere in a sequence of symbols (terminals and nonterminals) and repeat doing so until we have only terminals left, as sketched below.
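The two rules above can be turned into a small program almost literally. The sketch below (illustrative only; derive and max_steps are invented names, not from the text) picks an arbitrary nonterminal in the current sequence of symbols and replaces it by the right-hand side of one of its productions, repeating until only terminals are left.

import random

def derive(grammar, start, max_steps=100):
    """Treat productions as rewrite rules: repeatedly replace some
    nonterminal by the right-hand side of one of its productions,
    until only terminals remain (or we give up after max_steps)."""
    form = [start]
    for _ in range(max_steps):
        spots = [i for i, sym in enumerate(form) if sym in grammar]
        if not spots:                      # only terminals left: done
            return form
        i = random.choice(spots)           # anywhere in the sequence
        rhs = random.choice(grammar[form[i]])
        form[i:i + 1] = rhs                # replace the nonterminal by the RHS
    return None                            # derivation did not finish in time

# Example with the ab* grammar of Example 3 below (S -> aB, B -> bB | ^):
# grammar = {"S": [["a", "B"]], "B": [["b", "B"], []]}
# print("".join(derive(grammar, "S")))     # prints a, ab, abb, ...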

Derivations and the language defined by a grammar

Grammar rules determine a "language", that is, the set of legal strings of tokens. For example, with the expression grammar given below,


(34-3)*42 corresponds to the legal string of seven tokens


(number - number ) * number

(34-3*42 is not a legal expression, because there is a left parenthesis that is not matched by a right parenthesis, and the second choice in the grammar rule for exp requires that parentheses be generated in pairs.

Derivation:
Grammar rules determine the legal strings of token symbols by means of derivations.

A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.

A derivation begins with a single structure name and ends with a string of token symbols.

At each step in a derivation, a single replacement is made using one choice from a grammar rule.
exp → exp op exp | ( exp ) | number
op → + | - | *

Figure 3.1 : a derivation


(1) exp => exp op exp                    [exp → exp op exp]
(2)     => exp op number                 [exp → number]
(3)     => exp * number                  [op → *]
(4)     => ( exp ) * number              [exp → ( exp )]
(5)     => ( exp op exp ) * number       [exp → exp op exp]
(6)     => ( exp op number ) * number    [exp → number]
(7)     => ( exp - number ) * number     [op → -]
(8)     => ( number - number ) * number  [exp → number]

Derivation steps use a different arrow (=>) from the arrow meta-symbol (→) in the grammar rules, because grammar rules define, while derivation steps construct by replacement.
The set of all strings of token symbols obtained by derivations from the exp symbol,

L(G) = { s | exp =>* s },

is the language defined by the grammar of expressions. In this definition:
(1) G represents the expression grammar
(2) s represents an arbitrary string of token symbols (sometimes called a sentence)
(3) The symbols =>* stand for a derivation consisting of a sequence of replacements as described earlier.
(The asterisk is used to indicate a sequence of steps, much as it indicates repetition in regular
expressions.)
(4) Grammar rules are sometimes called productions because they "produce" the strings in L(G) via
derivations.
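As a rough illustration of the definition L(G) = { s | exp =>* s }, the sketch below decides whether a given token string can be derived from the start symbol, by brute-force search over sentential forms. It assumes the grammar has no empty productions (true for the expression grammar), so sentential forms never shrink; the function name derives is an illustrative choice, not standard terminology.

from collections import deque

def derives(grammar, start, target):
    """Return True if start =>* target. Brute force: explore sentential
    forms breadth-first, pruning any form longer than the target.
    Only sound for grammars without empty productions."""
    target = tuple(target)
    queue, seen = deque([(start,)]), {(start,)}
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        if len(form) > len(target):          # forms can never shrink again
            continue
        for i, sym in enumerate(form):
            for rhs in grammar.get(sym, []): # try every production for sym
                new = form[:i] + tuple(rhs) + form[i + 1:]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return False

# exp_grammar = {"exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
#                "op":  [["+"], ["-"], ["*"]]}
# derives(exp_grammar, "exp",
#         ["(", "number", "-", "number", ")", "*", "number"])   # True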
Example 3: A CFG for ab* = { a, ab, abb, abbb, abbbb, . . . . }
1. Terminals: Σ = {a, b}
2. Nonterminals: N = {S, B}
3. Productions: P = {
       S -> aB
       B -> bB | ^
   }


DERIVATION of abbb using the CFG of example 3:

S => aB => abB => abbB => abbbB => abbb
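Because the grammar of Example 3 is so simple, its structure can be mirrored directly by two mutually recursive functions, one per nonterminal. This is only a sketch (the names match_S and match_B are invented for illustration), but it shows how each production corresponds to a case in the code.

def match_S(s):
    """S -> aB : the string must start with 'a' and the rest must match B."""
    return s[:1] == "a" and match_B(s[1:])

def match_B(s):
    """B -> ^ | bB : the empty string, or a 'b' followed by something matching B."""
    if s == "":
        return True
    return s[:1] == "b" and match_B(s[1:])

# match_S("abbb") -> True,  match_S("ba") -> False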


Most of the time only the set of productions is given explicitly for a CFG, and the terminals and nonterminals are understood from the context, as shown in Examples 4-6:

Example 4: A CFG for aba* = { ab, aba, abaa, abaaa, . . . . }

S -> abA
A -> aA | ^

DERIVATION of abaaa using the CFG of example 4:


S => abA => abaA => abaaA => abaaaA => abaaa

Example 5: A CFG for ab*a = { aa, aba, abba, abbba, . . . . }

S -> aBa
B -> bB | ^

DERIVATION of abbbba using the CFG of example 5:


S => aBa => abBa => abbBa => abbbBa => abbbbBa => abbbba

Example 6: A CFG for { aⁿcbⁿ : n > 0 } = { acb, aacbb, aaacbbb, . . . }

S -> aSb | acb

DERIVATION of aaacbbb using the CFG of example 6:


S => aSb => aaSbb => aaacbbb
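The language of Example 6 nests its a's and b's around a central c, and the recursion in S -> aSb | acb handles that nesting directly. A sketch of a recognizer that follows the grammar (match_acb is an invented name):

def match_acb(s):
    """Recognizer for { a^n c b^n : n > 0 } following S -> aSb | acb."""
    if s == "acb":                            # production S -> acb
        return True
    if len(s) >= 5 and s[0] == "a" and s[-1] == "b":
        return match_acb(s[1:-1])             # production S -> aSb
    return False

# match_acb("aaacbbb") -> True,  match_acb("aacb") -> False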

Definition 4.1 Given a context-free grammar G with start symbol S, terminal symbols T and productions P, the language L(G) that G generates is defined to be the set of strings of terminal symbols that can be obtained by derivation from S using the productions P.
TR
T aTc
R
R  RbR

Grammar 4.4: Example grammar

T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbcc
=> aaRbRbcc
=> aaRbRbRbcc
=> aaRbbRbcc
=> aabbRbcc
=> aabbbcc

Derivation of the string aabbbcc using Grammar 4.4

T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbRbRcc
=> aabRbRcc
=> aabRbRbRcc
=> aabbRbRcc
=> aabbbRcc
=> aabbbcc

Leftmost derivation of the string aabbbcc using Grammar 4.4
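A derivation like the one above can be replayed mechanically once the sequence of chosen right-hand sides is fixed. The sketch below (illustrative names; an empty right-hand side, ε, is written as the empty list) replays the leftmost derivation of aabbbcc using Grammar 4.4.

def leftmost_derivation(grammar, start, choices):
    """At each step replace the *leftmost* nonterminal by the next
    right-hand side listed in `choices`; return all sentential forms."""
    form, steps = [start], [[start]]
    for rhs in choices:
        i = next(i for i, sym in enumerate(form) if sym in grammar)
        form[i:i + 1] = rhs
        steps.append(list(form))
    return steps

# Grammar 4.4: T -> R | aTc,  R -> ε | RbR   (ε written as the empty list [])
g44 = {"T": [["R"], ["a", "T", "c"]], "R": [[], ["R", "b", "R"]]}
choices = [["a", "T", "c"], ["a", "T", "c"], ["R"],
           ["R", "b", "R"], ["R", "b", "R"], [],
           ["R", "b", "R"], [], [], []]
for step in leftmost_derivation(g44, "T", choices):
    print("".join(step))            # T, aTc, aaTcc, ..., aabbbcc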

Example

1) A context-free grammar:
S → aSb
S → ε

A derivation:

S => aSb => aaSbb => aabb

2) The same context-free grammar:
S → aSb
S → ε

Another derivation:

S => aSb => aaSbb => aaaSbbb => aaabbb

3) A context-free grammar:
S → aSa
S → bSb
S → ε

A derivation:

S => aSa => abSba => abba

4) A context-free grammar:
S → aSa
S → bSb
S → ε

Another derivation:

S => aSa => abSba => abaSaba => abaaba

Derivation Order

There are two types:

1) A leftmost derivation: a derivation in which the leftmost nonterminal is replaced at each step in
the derivation. Corresponds to the preorder numbering of the internal nodes of its associated parse
tree.

2) A rightmost derivation: a derivation in which the rightmost nonterminal is replaced at each step
in the derivation. Corresponds to the postorder numbering of the internal nodes of its associated
parse tree.

Example 1:

Given grammar:

1. S → AB
2. A → aaA
3. A → ε
4. B → Bb
5. B → ε

Leftmost derivation (the number at each step is the production applied):

S =>(1) AB =>(2) aaAB =>(3) aaB =>(4) aaBb =>(5) aab
Rightmost derivation (the number at each step is the production applied):

S =>(1) AB =>(4) ABb =>(5) Ab =>(2) aaAb =>(3) aab
Example 2:

S → aAB
A → bBb
B → A | ε

Leftmost derivation:

S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb

Rightmost derivation:

S => aAB => aA => abBb => abAb => abbBbb => abbbb
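The same replay idea works for rightmost derivations; only the choice of which nonterminal to rewrite changes. A sketch (again with invented names), applied to Example 2's rightmost derivation:

def rightmost_derivation(grammar, start, choices):
    """At each step replace the *rightmost* nonterminal by the next
    right-hand side listed in `choices`; return all sentential forms."""
    form, steps = [start], [[start]]
    for rhs in choices:
        i = max(i for i, sym in enumerate(form) if sym in grammar)
        form[i:i + 1] = rhs
        steps.append(list(form))
    return steps

# Example 2's grammar: S -> aAB,  A -> bBb,  B -> A | ε
g = {"S": [["a", "A", "B"]], "A": [["b", "B", "b"]], "B": [["A"], []]}
choices = [["a", "A", "B"], [], ["b", "B", "b"], ["A"], ["b", "B", "b"], []]
print(["".join(step) for step in rightmost_derivation(g, "S", choices)])
# ['S', 'aAB', 'aA', 'abBb', 'abAb', 'abbBbb', 'abbbb']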
Derivation Trees
We can draw a derivation as a tree:
• The root of the tree is the start symbol of the grammar, and whenever we rewrite a nonterminal we add as its children the symbols on the right-hand side of the production that was used.
• The leaves of the tree are terminals which, when read from left to right, form the derived string.
• If a nonterminal is rewritten using an empty production, an ε is shown as its child.
• This ε is also a leaf node, but is ignored when reading the string from the leaves of the tree.
• We can use a tree to illustrate how a string is derived from a CFG; a small sketch follows below.
• Definition: These trees are called syntax trees, parse trees, generation trees, production trees, or derivation trees.
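A derivation tree can be represented very simply in code, for instance as nested tuples (nonterminal, children) with terminals and ε as leaf strings. The sketch below uses an illustrative representation (not the book's) to read the derived string back from the leaves, ignoring ε, for the grammar S → AB, A → aaA | ε, B → Bb | ε used in the figure that follows.

def tree_leaves(node):
    """Read the derived string from the leaves, left to right, skipping ε."""
    if isinstance(node, str):                 # a leaf: a terminal or ε
        return "" if node == "ε" else node
    _, children = node                        # an internal (nonterminal) node
    return "".join(tree_leaves(child) for child in children)

# A derivation tree for aab with S -> AB, A -> aaA | ε, B -> Bb | ε:
tree = ("S", [("A", ["a", "a", ("A", ["ε"])]),
              ("B", [("B", ["ε"]), "b"])])
print(tree_leaves(tree))                      # -> aab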

Steps to create a derivation tree from a given CFG

Grammar: S → AB    A → aaA | ε    B → Bb | ε

(Figure: the derivation tree is built step by step, one production at a time, following the derivation S => AB => aaAB => ...)


Ambiguity:

The syntax tree adds structure to the string that it derives. It is this structure that we exploit in the later phases of the compiler.
For compilation, we do the derivation backwards:
• We start with a string and want to produce a syntax tree. This process is called syntax analysis or parsing.
• Even though the order of derivation does not matter when constructing a syntax tree, the choice of production for each nonterminal does.
• Obviously, different choices can lead to different strings being derived, but it may also happen that several different syntax trees can be built for the same string.
• When a grammar permits several different syntax trees for some strings, we call the grammar ambiguous.


• If our only use of the grammar is to describe sets of strings, ambiguity is not a problem. However, when we want to use the grammar to impose structure on strings, the structure had better be the same every time.
• Hence, it is a desirable feature for a grammar to be unambiguous. In most (but not all) cases, an ambiguous grammar can be rewritten to an unambiguous grammar that generates the same set of strings, or external rules can be applied to decide which of the many possible syntax trees is the "right one".

Example

E → E + E | E * E | (E) | a

Consider the string a + a * a.

A rightmost derivation:

E => E + E => E + E * E => E + E * a => E + a * a => a + a * a

A leftmost derivation:

E => E * E => E + E * E => a + E * E => a + a * E => a + a * a

(Figure: the two derivation trees for a + a * a. In one tree the root production is E → E + E, with the * grouped in the right subtree; in the other the root production is E → E * E, with the + grouped in the left subtree.)

The grammar E → E + E | E * E | (E) | a is ambiguous: the string a + a * a has two derivation trees.
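Using the same nested-tuple representation as in the earlier derivation-tree sketch, the two trees for a + a * a can be written out explicitly; both read back to the same string, which is exactly what makes the grammar ambiguous (the variable names are illustrative).

# Tree in which E -> E + E is applied at the root (the * groups to the right):
plus_at_root = ("E", [("E", ["a"]), "+",
                      ("E", [("E", ["a"]), "*", ("E", ["a"])])])

# Tree in which E -> E * E is applied at the root (the + groups to the left):
star_at_root = ("E", [("E", [("E", ["a"]), "+", ("E", ["a"])]), "*",
                      ("E", ["a"])])

# With tree_leaves from the earlier sketch, both trees yield the same string:
# tree_leaves(plus_at_root) == tree_leaves(star_at_root) == "a+a*a"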
