Describing a Programming Language
Chapter 3
Rearranged by 14203100
Language and Sentence
• A language is a set of strings of characters from some alphabet.
• The strings of a language are called sentences or statements. The
smallest syntactic units of a sentence are called lexemes, which include
numeric literals, operators and special words. Each group of lexemes is
represented by a name, or token.
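As a rough illustration of the difference between lexemes and tokens, the sketch below splits a statement into (token, lexeme) pairs. The token names (IDENT, INT_LIT and so on) and the sample statement are assumptions made for this example; they are not defined in the chapter.

```python
import re

# Hypothetical token names, each paired with a pattern for its lexemes.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN_OP", r"="),
    ("ADD_OP", r"\+"),
    ("MULT_OP", r"\*"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs; whitespace is dropped."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs

for token, lexeme in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:8} {token}")
```

Here index and count are two different lexemes that belong to the same token, IDENT.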
Chapter 3: Describing a Programming Language
Syntax and Semantics
• Syntax: the set of rules for grammar and spelling that specify which
sequences of symbols form a correctly structured program.
• Semantics: the meaning of those expressions, statements and program
units.
Regular Expression
• Each regular expression (RE) corresponds to a regular language.
The regular expressions over an alphabet Σ are the smallest set of expressions including
c      where c ∈ Σ
A+B    where A, B are REs over Σ (union)
AB     where A, B are REs over Σ (concatenation)
A*     where A is an RE over Σ (zero or more repetitions)
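The three operations can be tried out with Python's re module. This is only a sketch: Python writes union as A|B rather than A+B (in Python, + means "one or more"), and the alphabet {a, b} and the sample patterns are assumptions chosen for illustration.

```python
import re

def in_language(pattern, s):
    """True if the whole string s belongs to the language of the pattern."""
    return re.fullmatch(pattern, s) is not None

union = r"a|b"          # A+B in the slide's notation: either a or b
concat = r"ab"          # AB: a followed by b
star = r"a*"            # A*: zero or more a's
combined = r"(a|b)*ab"  # any string over {a, b} that ends in ab

print(in_language(union, "b"))        # True
print(in_language(concat, "ab"))      # True
print(in_language(star, ""))          # True: zero repetitions are allowed
print(in_language(combined, "bbab"))  # True
print(in_language(combined, "aba"))   # False: does not end in ab
```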
Backus-Naur Form (BNF)
• BNF is a metalanguage that is used to describe programming languages.
BNF uses abstractions for syntactic structures. For example, a Java
assignment statement might be represented by the abstraction <assign>,
and the definition of <assign> can be given by
<assign> → <var> = <expression>
The abstractions in a BNF grammar are often called non-terminal symbols,
or simply non-terminals, and the lexemes and tokens of the rules are
called terminal symbols, or simply terminals.
• A grammar is a finite nonempty set of rules.
• A rule has a left-hand side (LHS), which is a single non-terminal, and a
right-hand side (RHS), which can consist of more than one symbol.
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
BNF Rules
• A rule is recursive if its LHS appears in its RHS.
• The following rules illustrate how recursion is used to describe lists:
<ident_list> → identifier
| identifier, <ident_list>
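The recursive rule maps directly onto a recursive function. The sketch below assumes the input has already been split into a list of tokens and uses Python's str.isidentifier as a stand-in for the identifier terminal; both are assumptions made for this example.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # second alternative: identifier , <ident_list> (the recursive case)
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["a", ",", "b", ",", "c"]))  # True
print(is_ident_list(["a", ","]))                 # False: a list cannot end in a comma
```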
Derivation
• Derivation is the process of generating a sentence by repeated application of
rules, starting from the start symbol. A leftmost derivation is one in which
the leftmost nonterminal in each sentential form is the one that is expanded.
<program>
=> <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
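The derivation above can be replayed mechanically. The following sketch stores the example grammar as a Python dictionary and expands the leftmost nonterminal at every step; the dictionary layout and the function name leftmost_derivation are assumptions made for this illustration.

```python
# The example grammar: each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in choices.

    Each entry is the index of the alternative to apply. Returns the list
    of sentential forms, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # position of the leftmost nonterminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

The last sentential form contains no nonterminals, so it is a sentence of the language: a = b + const.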
Grammars and Derivations
Parse Tree
• A parse tree is the hierarchical representation of a derivation. Every
internal node of a parse tree is labeled with a non-terminal symbol.
Every leaf is labeled with a terminal symbol.
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
Parse Tree of A=B*(A+C)
Show a parse tree and a leftmost derivation
for A := A * (B + (C * A))
<assign> => <id> := <expr>
         => A := <expr>
         => A := <id> * <expr>
         => A := A * <expr>
         => A := A * ( <expr> )
         => A := A * ( <id> + <expr> )
         => A := A * ( B + <expr> )
         => A := A * ( B + ( <expr> ) )
         => A := A * ( B + ( <id> * <expr> ) )
         => A := A * ( B + ( C * <expr> ) )
         => A := A * ( B + ( C * <id> ) )
         => A := A * ( B + ( C * A ) )
A = A + (B + (C * A))
<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> + <expr>
         => A = A + <expr>
         => A = A + ( <expr> )
         => A = A + ( <id> + <expr> )
         => A = A + ( B + <expr> )
         => A = A + ( B + ( <expr> ) )
         => A = A + ( B + ( <id> * <expr> ) )
         => A = A + ( B + ( C * <expr> ) )
         => A = A + ( B + ( C * <id> ) )
         => A = A + ( B + ( C * A ) )
Ambiguity
• A grammar is ambiguous if and only if it generates a sentential form that has two or
more distinct parse trees. For example, the following grammar generates two
distinct parse trees for the sentence const - const / const.
<expr> → <expr> <op> <expr> | const
<op> → / | -
Parse tree 1 (the subtraction is lower in the tree):
<expr>
  <expr>
    <expr>
      const
    <op>
      -
    <expr>
      const
  <op>
    /
  <expr>
    const
Parse tree 2 (the division is lower in the tree):
<expr>
  <expr>
    const
  <op>
    -
  <expr>
    <expr>
      const
    <op>
      /
    <expr>
      const
Removing Ambiguity
• Associativity: when an expression includes two operators that have the same
precedence, for example A / B * C, a rule is required to specify which
operator is applied first. This rule is known as associativity. In a grammar,
a left-recursive rule gives left associativity.
• Precedence: when an expression includes two different operators, for example x +
y * z, the grammar can assign them different levels. An operator that is
generated lower in the parse tree has precedence over one generated higher up,
so here the multiplication operator is generated lower in the tree than the
addition operator.
• ambiguous:
<expr> -> <expr> + <expr> | const
• unambiguous:
<expr> -> <expr> + <term> | <term>
<term> -> <term> / const | const
Ambiguous Grammar Example
Prove that the following grammar is ambiguous:
<S> -> <A>
<A> -> <A> + <A> | <id>
<id> -> a | b | c
There are two different parse trees for many expressions, for example, a + b + c
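One way to check the claim is to count the parse trees directly. The sketch below counts the ways <A> can derive a token sequence under the rule <A> -> <A> + <A> | <id>; the function name and the splitting strategy are assumptions made for this illustration, not a general parsing algorithm.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of <A> -> <A> + <A> | <id> for a token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of distinct ways <A> derives tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] in ("a", "b", "c") else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                # <A> + <A>: left operand tokens[i:k], right operand tokens[k+1:j]
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a + b".split()))      # 1
print(count_parse_trees("a + b + c".split()))  # 2, so the grammar is ambiguous
```

The two trees for a + b + c correspond to the groupings (a + b) + c and a + (b + c).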
Unambiguous Grammar for if-then-else
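<if_stmt> -> if <logic_expr> then <stmt>
           | if <logic_expr> then <stmt> else <stmt>
Together with <stmt> -> <if_stmt>, this rule is ambiguous: in a nested if, the
else can attach to either then. An unambiguous grammar separates matched from
unmatched statements:
<stmt> -> <matched> | <unmatched>
<matched> -> if <logic_expr> then <matched> else <matched>
           | any non-if statement
<unmatched> -> if <logic_expr> then <stmt>
             | if <logic_expr> then <matched> else <unmatched>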
Extended BNF
• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed inside parentheses and separated
via vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside braces { }
<ident> → letter {letter|digit}
BNF and EBNF
BNF
<expr> → <expr> + <term>
       | <expr> - <term>
       | <term>
<term> → <term> * <factor>
       | <term> / <factor>
       | <factor>
EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
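The EBNF form translates almost line for line into a recursive-descent parser: each { ... } repetition becomes a while loop. The sketch below evaluates expressions while parsing them. The rule for <factor> (an integer or a parenthesized expression) is an assumption, since the slides do not define it, and the tokenizer is deliberately minimal.

```python
import re

def evaluate(text):
    """Parse and evaluate text using the EBNF rules for <expr> and <term>."""
    # Simple tokenizer: any other characters (such as spaces) are skipped.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():
        # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        # Assumed rule (not in the slides): <factor> -> integer | ( <expr> )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result

print(evaluate("8 - 3 - 2"))  # 3: the loop applies operators left to right
print(evaluate("2 + 3 * 4"))  # 14: * is handled one level lower, in <term>
```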
Extensions in EBNF
• Three extensions are commonly included in the various versions of
EBNF.
• The first extension denotes an optional part of an RHS, which is
delimited by brackets.
• The second extension is the use of braces in an RHS to indicate
that the enclosed part can be repeated indefinitely or left out
altogether.
• And the third extension deals with multiple-choice options.
Some Examples
Consider the grammar given below
<pop> ::= [ <bop> , <pop> ] | <bop>
<bop> ::= <boop> | ( <pop> )
<boop> ::= x | y | z
(a) What are the nonterminal symbols?
<pop> <bop> <boop>
(b) What are the terminal symbols?
[ ] , ( ) x y z
(c) What is the start symbol?
<pop>
(d) Draw a parse tree for the sentence (x).
(e) Draw a parse tree for the sentence [(x),[y,x]].
Describe, in English the language defined by the following grammar.
<S> -> <A> <B> <C>
<A> -> a <A> | a
<B> -> b <B> | b
<C> -> c <C> | c
<A> will generate one or more consecutive a's
<B> will generate one or more consecutive b's
<C> will generate one or more consecutive c's
So <A><B><C> will generate
One or more a's followed by one or more b's followed by one or more c's
E.g. aaaaabbbccccccc
Also <S> is start symbol for this grammar.
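Because this particular language is regular, the English description can be checked against a regular expression. This is a sketch using Python's re module; the function name in_abc_language is an assumption made for this example.

```python
import re

# One or more a's, then one or more b's, then one or more c's.
ABC = re.compile(r"a+b+c+")

def in_abc_language(s):
    """True if s is generated by <S> -> <A> <B> <C>."""
    return ABC.fullmatch(s) is not None

print(in_abc_language("aaaaabbbccccccc"))  # True
print(in_abc_language("ac"))               # False: at least one b is required
print(in_abc_language("abca"))             # False: nothing may follow the c's
```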
Which of the following sentences are in the language generated by this grammar?
a. baab
b. bbbab
c. bbaaaaa
d. bbaab
<A> will generate one or more consecutive b's
<B> will generate one or more consecutive a's
So <A> a <B> b will generate
one or more b's followed by one a followed by one or more a's followed by a b
which is the same as
one or more b's followed by two or more a's followed by a b
Which matches a and d
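The four candidates can be checked against that description with a regular expression. The sketch reads the description as: one or more b's, then one a, then one or more a's, then a single b. The function name is an assumption made for this example.

```python
import re

# <A> a <B> b: b's, then a single a, then one or more a's, then one b.
PATTERN = re.compile(r"b+aa+b")

def in_language(s):
    return PATTERN.fullmatch(s) is not None

for label, sentence in [("a", "baab"), ("b", "bbbab"), ("c", "bbaaaaa"), ("d", "bbaab")]:
    print(label, sentence, in_language(sentence))
```

Sentence b fails because it has only one a, and sentence c fails because it does not end in b.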
Consider the following grammar
<S> -> a <S> c <B> | <A> | b
<A> -> c <A> | c
<B> -> d | <A>
Which of the following sentences are in the language generated by this grammar?
a. abcd
b. acccbd
c. acccbcc
d. acd
e. accc
<A> generates one or more c's
<B> will generate either One d or a string of one or more c's
So S will generate
a <S> c {d | c's} | c's | b
which matches a and e
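The answer can be double-checked with a small brute-force membership test. The sketch below tries every way of splitting the string among the symbols of a rule. It relies on two properties of this particular grammar (no rule derives the empty string, and there is no left recursion), so it is not a general-purpose parser; the names GRAMMAR, derives and in_language are assumptions made for this example.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("a", "S", "c", "B"), ("A",), ("b",)],
    "A": [("c", "A"), ("c",)],
    "B": [("d",), ("A",)],
}

@lru_cache(maxsize=None)
def derives(symbols, s):
    """True if the sequence of grammar symbols can derive exactly the string s."""
    if not symbols:
        return s == ""
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:
        # terminal: it must match the next character of s
        return s.startswith(first) and derives(rest, s[len(first):])
    # nonterminal: try every split point and every alternative.
    # Each remaining symbol needs at least one character of its own.
    for cut in range(1, len(s) - len(rest) + 1):
        prefix, suffix = s[:cut], s[cut:]
        if derives(rest, suffix) and any(derives(alt, prefix) for alt in GRAMMAR[first]):
            return True
    return False

def in_language(s):
    return derives(("S",), s)

candidates = ["abcd", "acccbd", "acccbcc", "acd", "accc"]
print([s for s in candidates if in_language(s)])
```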
Describing a Programming Language
• Describing Tokens (using regular expressions)
• Describing Syntax (using BNF / CFG)
• Describing Semantics
  • Static Semantics (using attribute grammars)
  • Dynamic Semantics (possible ways: operational, axiomatic and denotational)
Static Semantics
• Static semantics covers categories of language rules that are only
indirectly related to the meaning of programs during execution.
Many static semantic rules of a language state its type constraints, and
static semantics can be described using an attribute grammar, which is an
extension of a context-free grammar.
Attributes
• An attribute is a specification that defines a property of an object,
element or file. It may also refer to or set the specific value for a given
instance.
There are two main types of attribute.
• Synthesized attributes: values are computed from the attributes of the children nodes.
For a node P with children c1, c2, c3, c4:
Synthesized of P = f(c1, c2, c3, c4)
• Inherited attributes: values are computed from attributes of the siblings and
parent of the node.
For a node P with children S1, S2, S3, S4:
Inherited of S4 = f(P, S1, S2, S3)
Example of Attributes
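Consider a node A with children D, E and F.
• b is a synthesized attribute of A if its value at A is computed from D, E and F.
• b is an inherited attribute of D if its value at D is computed from A, E and F.
Synthesized attributes are naturally computed bottom-up; inherited attributes are
naturally computed top-down.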
The Attributes for the Non terminals
• Actual_type: a synthesized attribute associated with the non-terminals
<var> and <expr>. In the case of an expression, it is
determined from the actual types of the child nodes.
• Expected_type: an inherited attribute associated with the non-terminal
<expr>, determined by the type of the variable on the left side of the assignment.
Rules for Type Checking
• Let us consider the following attribute grammar.
<assign> → <var> = <expr>
<expr> → <var> + <var>
| <var>
<var> → A | B | C
The syntax and static semantics of this assignment statement are as
follows:
- The only variable names are A, B and C.
- The right side of the assignment can be either a variable or an
expression of a variable added to another variable.
Rules for Type Checking (cont’)
- A variable can be int or real.
- When there are two variables on the right side of an assignment, they
need not be the same type.
- The type of the expression when the operand types are not the same is
always real.
- When they are the same, the expression type is that of the operands.
- The type of the left side of the assignment must match the type of the
right side.
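These rules can be sketched as a small type checker. The symbol table contents, the function names and the representation of the right side as a list of variable names are all assumptions made for this illustration; lookup plays the role of a function that returns a variable's declared type.

```python
# Hypothetical symbol table: variable name -> declared type.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}

def lookup(name):
    """Return the declared type of a variable."""
    return SYMBOL_TABLE[name]

def expr_actual_type(operands):
    """Synthesized attribute: actual type of <expr> -> <var> + <var> | <var>."""
    types = [lookup(v) for v in operands]
    if len(types) == 1 or types[0] == types[1]:
        return types[0]  # same types: the expression has the operand type
    return "real"        # mixed int and real operands: the result is always real

def check_assign(target, operands):
    """Check <assign> -> <var> = <expr>; True if the types match."""
    expected_type = lookup(target)            # inherited by <expr> from <var>
    actual_type = expr_actual_type(operands)  # synthesized from the children
    return actual_type == expected_type

print(check_assign("A", ["A", "B"]))  # True: real + int is real, and A is real
print(check_assign("B", ["A", "C"]))  # False: real + int is real, but B is int
```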
The look-up function looks up a given variable name in the symbol table and
returns the variable’s type.
The flow of Attributes in the Tree
Dynamic Semantics
Operational Semantics
• In operational semantics, certain properties of a program, such as
correctness, safety or security, are verified by constructing proofs from
logical statements about its execution and procedures. For example,
operational semantics was used to describe the semantics of PL/I.
• Operational semantics is classified into two categories:
• Structural operational semantics, or small-step semantics, formally
describes how the individual steps of a computation take place in a
computer-based system.
• Natural semantics, or big-step semantics, describes how the overall
results of the executions are obtained.
• Example of operational semantics.
C statement:
for (expr1; expr2; expr3) {
...
}
Meaning
expr1;
loop: if expr2 == 0 goto out
...
expr3;
goto loop
out: . . .
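The lower-level meaning can be traced in Python by keeping the program state in a dictionary. This is only a sketch: Python has no goto, so a while loop with break stands in for the loop and out labels, and the function name run_for and the sample loop are assumptions made for this example.

```python
def run_for(state, expr1, expr2, expr3, body):
    """Execute a C-style for loop by following the lower-level translation."""
    expr1(state)                # expr1;
    while True:                 # loop:
        if expr2(state) == 0:   #   if expr2 == 0 goto out
            break
        body(state)             #   ...
        expr3(state)            #   expr3;
                                #   goto loop
    return state                # out: ...

# Corresponds to: for (i = 0; i < 4; i++) { total += i; }
final = run_for(
    {"i": 0, "total": 0},
    expr1=lambda s: s.update(i=0),
    expr2=lambda s: int(s["i"] < 4),
    expr3=lambda s: s.update(i=s["i"] + 1),
    body=lambda s: s.update(total=s["total"] + s["i"]),
)
print(final)  # {'i': 4, 'total': 6}
```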
Axiomatic Semantics
• Axiomatic semantics is an approach based on predicate calculus to
proving the correctness of computer programs.
• The meaning of a statement is given by assertions and by rules that relate them.
Assertions: pre- and postconditions: {P} statement {Q}
{b > 0}
a=b+1
{a > 1}
• Axioms (logical statements that are assumed to be true) or inference rules,
for example the rule of consequence:
{P} S {Q}, P' ⇒ P, Q ⇒ Q'
--------------------------
{P'} S {Q'}
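A triple such as {b > 0} a = b + 1 {a > 1} can be spot-checked by running the statement on sample states. The sketch below is a test over a finite range of integers, not a proof, and the helper names holds and assign are assumptions made for this example.

```python
def holds(pre, stmt, post, inputs):
    """Test {pre} stmt {post} on sample states; a test, not a proof."""
    for state in inputs:
        if pre(state):
            after = stmt(dict(state))  # run the statement on a copy of the state
            if not post(after):
                return False
    return True

def assign(state):
    # the statement a = b + 1
    state["a"] = state["b"] + 1
    return state

samples = [{"a": 0, "b": b} for b in range(-50, 51)]

# The triple from the slide.
print(holds(lambda s: s["b"] > 0, assign, lambda s: s["a"] > 1, samples))   # True
# Rule of consequence: a stronger precondition and a weaker postcondition.
print(holds(lambda s: s["b"] > 5, assign, lambda s: s["a"] > 0, samples))   # True
# A precondition that is too weak: b = 0 gives a = 1, which is not > 1.
print(holds(lambda s: s["b"] >= 0, assign, lambda s: s["a"] > 1, samples))  # False
```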
Denotational Semantics
• Denotational semantics is an approach of formalizing the meaning of
programming languages by constructing mathematical objects that
describe the meaning of expressions from the languages.
• Denotational semantics is based on recursive function theory and was
originally developed by Scott and Strachey in 1970. It is the most
abstract semantics description method.
• The state of a program is the values of all its current variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}
Example: We use a very simple language construct, character string
representations of binary numbers, to introduce the denotational
method. The syntax of such binary numbers can be described by the
following grammar rules:
<bin_num> → '0'
| '1'
| <bin_num> '0'
| <bin_num> '1'
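A denotational description attaches one mapping equation to each grammar rule. The usual choice for this grammar maps '0' to 0, '1' to 1, <bin_num> '0' to twice the value of <bin_num>, and <bin_num> '1' to twice that value plus one. The sketch below writes that mapping as a recursive function; the name m_bin is an assumption made for this example.

```python
def m_bin(s):
    """Map a binary numeral (a string of '0' and '1') to the integer it denotes."""
    if s == "0":
        return 0
    if s == "1":
        return 1
    if len(s) > 1 and s[-1] in "01":
        # <bin_num> '0' denotes 2 * M(<bin_num>); <bin_num> '1' adds 1 to that
        return 2 * m_bin(s[:-1]) + (1 if s[-1] == "1" else 0)
    raise ValueError(f"not a binary numeral: {s!r}")

print(m_bin("110"))  # 6
```

The recursion follows the shape of the parse tree: the meaning of a numeral is built from the meaning of its left part and its last digit.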
Static vs Dynamic Semantics
• Static semantics is more about the legal forms of programs (syntax rather
than semantics) and is only indirectly related to the meaning of the
programs during execution. Many static semantic rules of a language state its
type constraints.
• Dynamic semantics describes the meaning of the programs.
Programmers need to know precisely what the statements of a language
do. Compiler writers often determine the semantics of a language for which
they are writing compilers from English descriptions.