Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
50 views

Lecture#5 - Chap#2 (Syntax Directed Translator (Part-I) )

Here are the solutions to the tasks: a) The string aa+a* can be generated by the following derivation: S → SS+ → aS+ → aa+ → aa+a* b) The parse tree is: S / \ a S+ / \ a S* / \ a ε c) The language generated is {anbm | n,m>=0} (all strings with equal number of a's and b's, where b is + or *). This is because the grammar generates nested pairs of a's separated by + or *.

Uploaded by

ahmadghaffar033
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views

Lecture#5 - Chap#2 (Syntax Directed Translator (Part-I) )

Here are the solutions to the tasks: a) The string aa+a* can be generated by the following derivation: S → SS+ → aS+ → aa+ → aa+a* b) The parse tree is: S / \ a S+ / \ a S* / \ a ε c) The language generated is {anbm | n,m>=0} (all strings with equal number of a's and b's, where b is + or *). This is because the grammar generates nested pairs of a's separated by + or *.

Uploaded by

ahmadghaffar033
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 34

Lecture# 05

Compiler Construction
A Simple Syntax-Directed Translator,
(Part-I)

by Safdar Hussain

Topics
• Syntax Definition (Structure of Compiler, Tokens vs Terminals, Tree Terminology, Grammars,
Parse Trees, Derivation, Ambiguity, Associativity & Precedence, left-associative grammars,
right-associative grammars, Syntax Statements, Multiple CFG Tasks
Overview
• This chapter contains introductory material
• Building a simple compiler
– Syntax Definition
– Syntax-Directed Translation
– Parsing
– A Translator for Simple Expressions
– The Lexical Analyzer

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 2


Overview
Programming Language can be defined by describing
1. The syntax of the language
a. What its program looks like
b. We use CFG or BNF (Backus Naur Form)

2. The semantics of the language


a. What its program mean
b. Difficult to describe
c. Use informal descriptions and suggestive examples

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 3


The Entire Compilation Process
• Grammars for Syntax Definition
• Syntax-Directed Translation
• Parsing - Top Down & Predictive
• Pulling Together the Pieces
• The Lexical Analysis Process
• Symbol Table Considerations
• A Brief Look at Code Generation
• Concluding Remarks/Looking Ahead

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 4


The Structure of our Compiler

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 5


A model of a Compiler’s Front End

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 6


Grammars for Syntax Definition
• A Context-free Grammar (CFG) Is Utilized to Describe
the Syntactic Structure of a Language
• A CFG Is Characterized By:
Non-Terminals
1. A Set of Tokens or Terminal Symbols
list  list + digit
2. A Set of Non-terminals
list  list - digit
3. A Set of Production Rules list  digit
Each Rule Has the Form digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
NT  {T, NT}*

4. A Non-terminal Designated As the Start Symbol Terminals

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 7


Tokens Versus Terminals
• The lexical analyzer reads the characters of the
source program, groups them into lexically
meaningful units called lexemes, and produces as
output tokens representing these lexemes.
• The token names are abstract symbols that are used
by the parser for syntax analysis. Often, we shall call
these token names terminals, since they appear as
terminal symbols in the grammar for a programming
language.
• In syntax analysis, often the tokens and terminals are
used interchangeably (synonymously).

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 8


Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (=token)
or 
• Each interior node is labeled by a non-terminal
• If A  X1 X2 … Xn is a production, then node A has
children X1, X2, …, Xn where Xi is a (non)terminal or 
( denotes the empty string)

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 9


Tree Terminology

Thanks to Dragon Book

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 10


Grammars for Syntax Definition
Example CFG
list  list + digit
list  list - digit
list  digit
digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the “|” means OR)
(So we could have written
list  list + digit | list - digit |digit )

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 11


Example CFG (Function call)

call  id ( optparams )
optparams  params | 
params  params, param |param

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 12


Key Terms
 A string of tokens is a sequence of zero or more tokens.
 The string containing with zero tokens, written as , is
called empty string.
 A grammar derives strings by beginning with the start
symbol and repeatedly replacing the non terminal by
the right side of a production for that non terminal.
 The token strings that can be derived from the start
symbol form the language defined by the grammar.

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 13


Grammars are Used to Derive Strings:
Using the CFG defined on the earlier slide, we can
derive the string: 9 - 5 + 2 as follows:
list  list + digit P1 : list  list + digit
 list - digit + digit P2 : list  list - digit
 digit - digit + digit P3 : list  digit
 9 - digit + digit P4 : digit  9
 9 - 5 + digit P4 : digit  5
 9-5+2 P4 : digit  2

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 14


Grammars are Used to Derive Strings:
This derivation could also be represented via a Parse Tree
(parents on left, children on right)
list  list + digit list
 list - digit + digit
list + digit
 digit - digit + digit
- 2
 9 - digit + digit list digit
 9 - 5 + digit 5
digit
 9-5+2
9

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 15


Ambiguity
Consider the following Context Free Grammar:
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
With productions p =
string  string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more than one parse
tree generates the string 9-5+2

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 16


Ambiguity Continued
string string

string string string

string string string string string

9 - 5 + 2 9 - 5 + 2

First Parse Tree Second Parse Tree

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 17


Associativity of Operators
Left-associative operators have left-recursive productions:
left  left + term | term
String a+b+c has the same meaning as (a+b)+c

Right-associative operators have right-recursive


productions:
right  term = right | term
String a=b=c has the same meaning as a=(b=c)

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 18


Parse trees for left & right-associative
grammars

Parse tree for Left-associative grammar Parse tree for right-associative grammar

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 19


Precedence of Operators
Operators with higher precedence “bind more tightly”:
expr  expr + term | term
term  term * factor | factor BoDMAS Rule
factor  number | ( expr )

String 2+3*5 has the same meaning as 2+(3*5)

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 20


Syntax of Statements
stmt  id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts  stmt ; opt_stmts
|

Note: Opt stands for optional

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 21


More Complex Grammars
block  begin opt_stmts end
opt_stmts  stmt_list | 
stmt_list  stmt_list ; stmt | stmt

What is this grammar for ?


What does “” represent ?
What kind of production rule is this ?

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 22


Defining a Parse Tree
• A parse tree pictorially shows how the start symbol of a
grammar derives a string in the language.
• More formally, a parse tree for a CFG has the following
properties:
 Root is labeled with the Start Symbol
 Leaf Node is a Token or 
 Interior Node is a Non-Terminal
 If A  X1X2…Xn, Then A Is an Interior; X1X2…Xn Are Children
of A and May Be Non-Terminals or Tokens

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 23


Problem & Solution (CFG Tasks)
Consider the context-free grammar
SSS+|SS*|a
a) Show how the string aa+a* can be generated by this grammar.
b) Construct a parse tree for this string
c) What language does this grammar generate? Justify your
answer.

?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 24
Tasks Solution
Consider the context-free grammar
SSS+|SS*|a
a) Show how the string aa+a* can be generated by this grammar.
b) Construct a parse tree for this string
c) What language does this grammar generate? Justify your
answer.
Answers
(a)- string generation (b)- parse tree (c)- Grammar’s language
S  SS*
S SS+S* L = {Postfix expression
S aS+S* consisting of digits, plus and
S aa+S* multiple signs}
S aa+a*
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 25
Problem & Solution (CFG Tasks)
What language is generated by the following
grammars? In each case justify your answer.
a) S  0S1 | 01
b) S  +SS | -SS | a
c) S  S(S)S | Ԑ

?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 26
Tasks Solution
What language is generated by the following grammars? In each
case justify your answer.
a) S  0S1 | 01
b) S  +SS | -SS | a
c) S  S(S)S | Ԑ
Q# Languages
a L = {0n 1n | n>=1}

b L = {Prefix expression consisting of plus and minus signs}

c L = {Matched brackets of arbitrary arrangement and nesting, includes Ԑ}

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 27


Problem & Solution (CFG Tasks)
Consider the following grammars:
a) S  0S1 | 01
b) S  +SS | -SS | a
c) S  S(S)S | Ԑ
Which of the grammars are ambiguous?

?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 28
Tasks Solution
Consider the following grammars: Which of the grammars
are ambiguous?
Q# Grammar Ambiguous? c
a S  0S1 |01 No
b S  +SS | -SS | a No
For String = ()(), it can generate 2
trees

c S  S(S)S | Ԑ Yes

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 29


Problem & Solution (CFG Tasks)
Construct unambiguous context-free grammars for each of the
following languages. In each case show that your grammar is
correct.
a) Arithmetic expressions in postfix notation.
b) Left-associative lists of identifiers separated by commas.
c) Right-associative lists of identifiers separated by commas.

?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 30
Tasks Solution
Construct unambiguous context-free grammars for each of the
following languages. In each case show that your grammar is
correct.
a) Arithmetic expressions in postfix notation.
b) Left-associative lists of identifiers separated by commas.
c) Right-associative lists of identifiers separated by commas.
Q# Unambiguous CFGs
a E  E E op | num

b list  list , id |id

c list  id , list |id

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 31


Problem & Solution (CFG Task)
Construct a context-free grammar for roman numerals.

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 32


Task Hint
Decimal Roman Decimal Roman
1 I 11 XI

2 II 12 XII

3 III 13 XIII

4 IV 14 XIV

5 V 15 XV

6 VI 16 XVI

7 VII 17 XVII

8 VIII 18 XVIII

9 IX 19 XIX

10 X 20 XX

Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 33


The End

34

You might also like