Grammar and Parse Trees (Syntax) : What Makes A Good Programming Language?
(note to instructor, have a video to load under 1st CFG Grammar Example)
[Figure: Creation Order of Language X]
Implementers shape how the code is formed, but are handcuffed by the Language
Syntax vs. semantics
Syntax
o the form or structure of the expressions, statements, and program units
Semantics
o the meaning of the expressions, statements, and program units
Syntax and semantics provide a language’s definition
o Users of a language definition
Other language designers
Implementers
Programmers (the users of the language)
Both are closely related
In a well-designed language, you should be able to read a statement (syntax)
and understand what it will do (semantics)
Terms in syntax
Language
o set of sentences, combination of keywords
Sentence
o a string of characters over some alphabet
o a line of syntax
Lexeme
o lowest level syntactic unit of a language (e.g., *, sum, begin)
o Numerical limits, operators, special words, etc…
o a program is a string of lexemes
Tokens
o a category of lexemes (e.g., identifier)
o words in a syntax
Lexemes
o are read in and recognized by a Scanner
Scanner described below (images)
o that Scanner then places that lexeme into a Token category
Tokens
o lexemes broken down into the categories
reserved words or keywords
an identifier cannot have the same spelling as a reserved word, so this
is illegal:
int else;
identifiers
names of variables, methods, classes, etc…
Operators and special symbols
+, -, /, etc…
Literals or constants
Values placed in equations or hard coded digits
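The lexeme-to-token step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RESERVED set is a small hypothetical subset, only integer literals are handled, and the category names are chosen to mirror the four categories listed above rather than any real compiler.

```python
import re

# Hypothetical subset of reserved words, for illustration only.
RESERVED = {"int", "if", "else", "while", "return"}

TOKEN_PATTERN = re.compile(r"""
    (?P<literal>\d+)                    # integer literals only, for brevity
  | (?P<word>[A-Za-z_]\w*)              # reserved word or identifier
  | (?P<operator>\+\+|--|[-+*/=;(),])   # operators and special symbols
  | (?P<skip>\s+)                       # whitespace is discarded
""", re.VERBOSE)

def scan(source):
    """Return a list of (token_category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "word":
            # The same pattern matches both; the lookup decides the category.
            kind = "reserved" if lexeme in RESERVED else "identifier"
        tokens.append((kind, lexeme))
    return tokens

print(scan("sum = sum + 42;"))
```

Scanning "int else;" with this sketch yields two reserved-word tokens and no identifier, which is why a parser would later reject it as a declaration.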
Where the scanner is in the entire process
Unary Operators
operators that act upon a single operand, set around a value
o prefix or postfix around the value
o 3 – (-2), x++
C family of Unary Operators
Increment: ++x, x++      Positive: +x
Decrement: --x, x--      Negative: -x
Address: &x              One's complement: ~x
Indirection: *x          Logical negation: !x
Grammars
a type of language generator, originally meant to describe the syntax of
natural languages
a grammar can have nested, recursive, self-similar branches in its syntax
trees
o so it can handle nested structures well
Context-free grammars can be implemented as a state automaton with a stack
(a pushdown automaton)
o the stack is used to represent the nesting level of the syntax
one portion may have to wait until another portion is solved
Two grammar classes
o Context-free (CF)
o Regular
Both used to describe the syntax of programming languages
1st CFG Grammar Example
EXP -> NUM | ( EXP OP EXP )
OP -> + | - | * | /
NUM -> DIGIT | DIGIT NUM
DIGIT -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | -1 | -2 | …
Can a 4-digit value be covered by this grammar?
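One way to explore the question is to let the recursive rule NUM -> DIGIT NUM run until the base case NUM -> DIGIT ends it. The sketch below is an assumption-laden illustration: derive_num is a hypothetical helper, and it only uses the single-digit alternatives 0 through 9 of DIGIT, ignoring the negative alternatives in the rule.

```python
DIGITS = set("0123456789")  # assumption: only the single-digit alternatives

def derive_num(text):
    """Return the sentential forms from NUM down to text, or None if
    text cannot be derived from NUM."""
    if not text or any(ch not in DIGITS for ch in text):
        return None
    steps = ["NUM"]
    prefix = ""
    for ch in text[:-1]:
        steps.append(prefix + "DIGIT NUM")  # NUM -> DIGIT NUM (recursive case)
        prefix += ch + " "
        steps.append(prefix + "NUM")        # DIGIT -> ch
    steps.append(prefix + "DIGIT")          # NUM -> DIGIT (base case)
    steps.append(prefix + text[-1])         # DIGIT -> last character
    return steps

for form in derive_num("1234"):
    print("=>", form)
```

Each extra digit costs one more use of the recursive rule, so the length of the number is not limited by the grammar.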
Context-free language and grammar (CFGs)
Context Free Grammar (CFG) Example
a CFG for the language of all palindromes using the letters a and b
What other sentences would match the grammar above/below? (Create 3 more
using the example below, with 7 or more characters)
Sentence (with a parse tree proving it fits): aabaa
S → P
P → ε // think of epsilon as a “null”
P → a
P → b
P → aPa
P → bPb
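The palindrome rules translate almost line for line into a recursive check. The following is a minimal sketch, with derives and derivation as hypothetical helper names; it can be used to test candidate sentences for the exercise.

```python
def derives(s):
    """True if P =>* s under P -> ε | a | b | aPa | bPb."""
    if any(ch not in "ab" for ch in s):
        return False
    if len(s) <= 1:            # P -> ε, P -> a, P -> b
        return True
    if s[0] == s[-1]:          # P -> aPa or P -> bPb
        return derives(s[1:-1])
    return False

def derivation(s):
    """Sentential forms from S to s, or None if s is not in the language."""
    if not derives(s):
        return None
    forms = ["S", "P"]
    left, right, rest = "", "", s
    while len(rest) >= 2:
        # Peel one matching letter off each end: P -> aPa or P -> bPb.
        left, right, rest = left + rest[0], rest[-1] + right, rest[1:-1]
        forms.append(left + "P" + right)
    forms.append(left + rest + right)  # finish with P -> ε, a, or b
    return forms

print(" => ".join(derivation("aabaa")))
# S => P => aPa => aaPaa => aabaa
```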
Parse Trees
graphical representation showing the hierarchical syntax structure of the
sentence of the language they define
or (I like better) a hierarchical representation of a derivation
must know
o the grammar
or may need many parse trees to solve what the grammar is
o the sentence you are striving for
Terms in Grammar
Grammar
o a finite non-empty set of rules
abstractions
o also called non-terminals
o can have 2 or more distinct definitions/representations …
uses the | to separate the alternative definitions
Terms in a Grammar
sentence
o entire line in a rule
o and final solution (line of syntax)
usually from a derivation (later)
o Sebesta book uses both!!!
BNF description/grammar
o generation device for defining languages
o collection of rules.
non-terminals
o part of a sentence that can be further broken down
o in normal form (non-BNF), UPPER CASE
terminal
o part of a sentence that cannot be broken down any further
o in normal form (non-BNF), lower case
start symbol
o special non-terminal symbol, the very highest, non-reduced symbol in
the grammar
Backus – Naur form (BNF)
a notation for writing CFGs
invented by John Backus for Algol 58
widely used notation
uses “abstractions” (non-terminals) as a general notation for describing syntax
structures
The meaning of “list” in Grammar
not the array or list data structure!!
items of the same data type
A “list” of items
int a, b, c, d, e, f;
uses recursion in the Grammar in order to replicate the same “code” over and
over
o like in recursion, the rule must have a “base case” in order to stop
an alternative that does not call the rule itself again
o please notice a “,” is used to separate the items of the list
General Grammar setup
some obvious (hopefully) things
o <program> is the starting point of a syntax
or some rule that has an obvious starting name
o special words or lexemes are in bold and mixed throughout the sentence
Derivations in General
are what you JUST used to work out which syntax did NOT fit
o in a more formal, step-by-step way
a sequence of rule applications from the given Grammar, ending with a
sentence
to solve
o given TARGET SYNTAX, GRAMMAR and Derivation Order
order covered in a minute
o always start with the start symbol in the grammar and its rule
o then use the other rules to get to the target syntax
the results
o the symbol “=>” means derives
o notice in each line of the derivation, only one abstraction (non-terminal)
is replaced
o each line is derived from the line before (above)
o each line is called a sentential form
Derivation Order
the order in which the non-terminals of a sentential form are replaced
Leftmost
o in each line of the derivation, the leftmost non-terminal is replaced
using a rule in the Grammar
Rightmost (another order option)
o rightmost non-terminal is replaced first
o creates different sentential forms along the way (the final sentence is the
same)
In order (another order option)
** notice in line 2, you have two options, but only the left was replaced
Derivation order has (or at least should have) no effect on the language
generated by the grammar
only by trying ALL of the different rule choices do you get every sentence in
the language
o if we did not have a target syntax
o which in reality would be impossible (super huge!) to list in full, but
you can at least get a good feel for it
Solving Derivations and First Example
to solve
o given TARGET SYNTAX, GRAMMAR and Derivation Order
o always start with the start symbol in the grammar and its rule
o then use the other rules to get to the target syntax
but you CANNOT change derivation orders (left, right, etc…) within the same
derivation
o either the whole derivation is left, right, or inorder
Exercise #1
Given Grammar
<sentence> -> <subject> <predicate>
<subject> -> <article> <noun>
<predicate> -> <verb> <direct-object>
<direct-object> -> <article> <noun>
<article> -> THE | A
<noun> -> MAN | DOG
<verb> -> BITES | PETS
Leftmost Derivation w/ Target Syntax: A DOG PETS A DOG
<sentence> =>
Answer:
Exercise #2
Given Grammar
<sentence> -> <subject> <predicate>
<subject> -> <article> <noun>
<predicate> -> <verb> <direct-object>
<direct-object> -> <article> <noun>
<article> -> THE | A
<noun> -> MAN | DOG
<verb> -> BITES | PETS
Rightmost Derivation (rightmost!!) w/ Target Syntax: THE MAN BITES A DOG
Answer:
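To check a hand-written derivation, the leftmost strategy can be sketched as a small backtracking search. This is an illustrative sketch, not a general parser: GRAMMAR and leftmost_derivation are hypothetical names, the search relies on this grammar having no ε rules, and the demo deliberately uses a sentence that is not one of the exercise targets.

```python
GRAMMAR = {
    "<sentence>": [["<subject>", "<predicate>"]],
    "<subject>": [["<article>", "<noun>"]],
    "<predicate>": [["<verb>", "<direct-object>"]],
    "<direct-object>": [["<article>", "<noun>"]],
    "<article>": [["THE"], ["A"]],
    "<noun>": [["MAN"], ["DOG"]],
    "<verb>": [["BITES"], ["PETS"]],
}

def leftmost_derivation(form, target):
    """Return the sentential forms leading from form to target, always
    replacing the leftmost non-terminal, or None if target is unreachable."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            break                      # i is the leftmost non-terminal
    else:
        return [form] if form == target else None
    # Terminals already produced must match; with no ε rules a form
    # can never shrink, so an overlong form is a dead end.
    if form[:i] != target[:i] or len(form) > len(target):
        return None
    for rhs in GRAMMAR[symbol]:
        rest = leftmost_derivation(form[:i] + rhs + form[i + 1:], target)
        if rest is not None:
            return [form] + rest
    return None

steps = leftmost_derivation(["<sentence>"], "THE MAN PETS THE DOG".split())
for step in steps:
    print("=>", " ".join(step))
```

A rightmost version would scan the form from the right instead and compare the terminals at the end of the form with the end of the target.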
Using the JFlap tool
www.jflap.org
o free to download
o used to draw a parse tree
o used to test a grammar in many ways
installation
o fill out form
o find “JFLAP_Thin.jar” and download to desktop. No installation
needed.
how to create parse trees in JFlap
Ambiguity
where a sentence can be represented by more than one parse tree
o NOT DERIVATION!!!
o this is bad!!!
o could be that
the parse trees from the leftmost and rightmost derivations do not match!!
several leftmost derivations give trees that don’t match!!
why does this matter??
o mathematically seems the same
o just try programming this!!
In proving ambiguity you are TRYING TO MISMATCH with the SAME
TARGET Syntax
o just make sure your syntax is correct first!!
2nd Ambiguous Grammar Example
Grammar
E → E + E
E → E * E
E → i
Solution(s) for: 3 + 4 * 5
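Why the two trees matter becomes obvious once they are evaluated. The sketch below enumerates every parse tree this grammar allows for a token list and computes each result. It assumes the tokens are well formed (operands and operators alternating), treats i as any integer, and uses the hypothetical names parse_trees and evaluate.

```python
def parse_trees(tokens):
    """Yield every parse tree for tokens under E -> E+E | E*E | i."""
    if len(tokens) == 1:
        yield tokens[0]                      # E -> i
        return
    for k in range(1, len(tokens) - 1, 2):   # try each operator as the root
        for left in parse_trees(tokens[:k]):
            for right in parse_trees(tokens[k + 1:]):
                yield (left, tokens[k], right)

def evaluate(tree):
    """Solve a tree bottom up: the leaves first, the root last."""
    if not isinstance(tree, tuple):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

for tree in parse_trees([3, "+", 4, "*", 5]):
    print(tree, "=", evaluate(tree))
# (3, '+', (4, '*', 5)) = 23
# ((3, '+', 4), '*', 5) = 35
```

Same sentence, two trees, two different answers: that is what makes the ambiguity a programming problem and not just a notational one.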
Fixing Ambiguity – well kinda
sadly, there is no “procedure” to fix
cannot be done automatically
more of a trial and error
but USUALLY it includes
o more rules
o better precedence order (see below)
o not as many “|” in a single rule
Proving Ambiguity within a Grammar
start with a Grammar and a legit sentence
o USE ALL RULES WITHIN THE GIVEN GRAMMAR!!!
try both a left and right derivation
o if the trees EXACTLY match, grammar could be good
Greedy means the Grammar is crap
Well, really means crap or educational example
The better the grammar the less you worry about greedy
<S> -> <A>
<A> -> <A> + <A> | <id>
<id> -> a | b | c
1. Create another target syntax string that works with the given grammar above
a. Instead of x + y + z, try something new using a, b, c
2. Try leftmost and rightmost greedy parse trees to prove ambiguity on that new
target string
3. Do the same thing (greedy) with the grammar below:
<binary-string> -> 0
| 1
| <binary-string> <binary-string>
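For the binary-string grammar, a quick way to see how ambiguous it is would be to count the distinct parse trees for a given string. The sketch below is one possible approach, with count_trees as a hypothetical helper: every split point of the string is a different way to apply the third alternative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(s):
    """Number of distinct parse trees for s under
    <binary-string> -> 0 | 1 | <binary-string> <binary-string>."""
    if not s or any(ch not in "01" for ch in s):
        return 0
    if len(s) == 1:
        return 1                       # <binary-string> -> 0 or -> 1
    # Each split point k picks a different pair of subtrees under the root.
    return sum(count_trees(s[:k]) * count_trees(s[k:]) for k in range(1, len(s)))

for text in ["1", "10", "101", "1010"]:
    print(text, count_trees(text))
```

Any string of three or more digits already has more than one tree, so a single mismatching pair is enough to prove ambiguity.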
Remember how to solve trees?
remember it’s bottom up
leaves-ish are solved first
tree = recursive, remember the last valid recursive call REALLY gets solved
first
we use this idea for precedence below
(((5 + 2) * 5) + 3)
1. In the left tree, what portion of the equation is completed first? Why?
2. What would the answer be for either of these?
3. What would the equations below look like as a tree??? Use our normal
understanding of (PEMDAS) to create the tree.
a. (((3 + 4) * 5) *6)
b. 6 + 128 * 34
c. 2 ^ 6 * 12 + 5
Answers:
Setting up Operator Precedence in a Grammar
setting the order of operations in a grammar
can assign different levels or precedence for operators in the grammar design
operators lower(est) on the parse tree must be completed/solved first
o left parse tree from example below is what we want
Solving by precedence
How would we solve 3 + (5 * 9 + 2) using a tree?
Unambiguous grammar for Operator (+ and *) Precedence
<assign> -> <id> = <expr>
<expr> -> <expr> + <term>
| <term>
<term> -> <term> * <factor>
| <factor>
<factor> -> ( <expr> )
| <id>
<id> -> A | B | C
What other operators should be deeper or even with <factor>?
*** notice that JUST to get to <factor> (which has the highest precedence) it
takes many rules!! <expr> -> <term> -> <factor>
1. Draw the simple parse trees for: (ignore left or rightmost for now)
a. A = C + B which level (starting at 0) does the + reside?
b. A = C * B which level does the * reside?
Answer:
2. Using the grammar above, which ALWAYS will come first, (top down)
<expr> or <term>?
Derivation for A = B + C * A
<assign> -> <id> = <expr>
<expr> -> <expr> + <term>
| <term>
<term> -> <term> * <factor>
| <factor>
<factor> -> ( <expr> )
| <id>
<id> -> A | B | C
(leftmost)                          (rightmost)
<assign> => <id> = <expr>           <assign> => <id> = <expr>
=> A = <expr>                       => <id> = <expr> + <term>
=> A = <expr> + <term>              => <id> = <expr> + <term> * <factor>
=> A = <term> + <term>              => <id> = <expr> + <term> * <id>
=> A = <factor> + <term>            => <id> = <expr> + <term> * A
=> A = <id> + <term>                => <id> = <expr> + <factor> * A
=> A = B + <term>                   => <id> = <expr> + <id> * A
=> A = B + <term> * <factor>        => <id> = <expr> + C * A
=> A = B + <factor> * <factor>      => <id> = <term> + C * A
=> A = B + <id> * <factor>          => <id> = <factor> + C * A
=> A = B + C * <factor>             => <id> = <id> + C * A
=> A = B + C * <id>                 => <id> = B + C * A
=> A = B + C * A                    => A = B + C * A
Parse Tree for A = B + C * A
(left and right!!)
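The shape of that tree can also be produced by a small hand-written parser for the precedence grammar. This is a sketch under assumptions: parse_assign is a hypothetical name, trees are shown as nested tuples, and the left-recursive rules for <expr> and <term> are replaced by loops (a common substitution, since a recursive-descent parser cannot call itself on a left-recursive rule without looping forever). The loops still group to the left.

```python
def parse_assign(text):
    """Parse '<id> = <expr>' and return the tree as nested tuples."""
    for ch in "=+*()":
        text = text.replace(ch, f" {ch} ")
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def ident():                     # <id> -> A | B | C
        tok = take()
        if tok not in ("A", "B", "C"):
            raise SyntaxError(f"expected A, B or C, got {tok!r}")
        return tok

    def factor():                    # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            return inner
        return ident()

    def term():                      # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            take("*")
            node = (node, "*", factor())
        return node

    def expr():                      # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            take("+")
            node = (node, "+", term())
        return node

    target = ident()
    take("=")
    tree = (target, "=", expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse_assign("A = B + C * A"))
# ('A', '=', ('B', '+', ('C', '*', 'A')))
```

The * ends up deeper in the tuple than the +, which matches the idea that operators lower on the tree are solved first.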
1. Try creating parse trees for these:
a. A = (B + C) * A
b. A = (B * C) * A
Answers:
Associativity in General
applies not only when you have to deal with * and /
but to any operators of the same precedence
parse trees (leftmost and rightmost) SHOULD look exactly the same, AND
solve to the same equation because of associativity
BUT THE DERIVATIONS (leftmost and rightmost) will NOT LOOK the
same!!
remember, usually LEFT to RIGHT when using operators of the same
precedence.
o +, -, *, /, % are all evaluated left to right
o B+C+A
is really (B + C) + A
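For + the grouping does not change the result, but for - and / it does, which is why the left-to-right convention matters. A minimal sketch (the helper names left_to_right and right_to_left are hypothetical):

```python
from functools import reduce
import operator

def left_to_right(op, operands):
    """Group as ((a op b) op c) ..., the usual left-associative reading."""
    return reduce(op, operands)

def right_to_left(op, operands):
    """Group as a op (b op (c ...)), shown only for contrast."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))

print(left_to_right(operator.sub, [8, 3, 2]))   # (8 - 3) - 2 = 3
print(right_to_left(operator.sub, [8, 3, 2]))   # 8 - (3 - 2) = 7
print(left_to_right(operator.add, [8, 3, 2]))   # 13 either way
```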
Associative Rule Breakers!! Intro. to Recursion
Right to Left rule breakers!!!
o unary
o power (7 ** 8) // ( ** or ^ depending on the language)
o !
o ~ (bitwise complement)
Recursion
o rule calls itself
Left Recursion, call (symbol) to itself is physically LEFT of the operator
o Solves left hand side first!!
Right Recursion, call (symbol) to itself is physically RIGHT of the operator
o Solves right hand side first!!
Remember, the part of the tree that “dangles” the lowest is completed first!!
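Python's own ** operator happens to be right associative, so it makes a convenient check on the idea. The sketch below, with the hypothetical helper power_chain, evaluates a chain of powers from the right, the way a right-recursive rule would.

```python
def power_chain(operands):
    """Evaluate a ** b ** c ... from the right, as right recursion would."""
    *rest, result = operands
    for base in reversed(rest):
        result = base ** result   # the rightmost pair is solved first
    return result

print(power_chain([2, 3, 2]))   # 2 ** (3 ** 2) = 512
print(2 ** 3 ** 2)              # Python agrees: 512
print((2 ** 3) ** 2)            # the left-to-right grouping would give 64
```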
Recursion Example
Left Recursion                      Right Recursion
Precedence
| <term>                            | <exp>
<term> -> <term> * <factor>         <exp> -> ( <expr> )
| <factor>                          | <id>
1. Try A + (B * C) / D just to get used to this new grammar. Did it come out
right?
2. Then try A ** B ** C. Which portion was the lowest on the tree?
Answers:
FYI Section
Answers:
Extended (Updated) BNF (EBNF)
like anything else, some updates were made for convenience
only increased the readability/writability (for us!!)
there are other versions of the updates
3 most common updates
o optional part in RHS
Optional parts are placed in brackets [ ]
almost anything with |’s in reality is now replaced
if a symbol is unique to one of the rules, it may need to stay
fewer rules, or not as many lines to write
in C++:
<if_stmt> -> if( <expression> ) <statements> [else <statements> ]
Replaces
<if_stmt> -> if( <expression> ) <statements>
| if( <expression> ) <statements> else <statements>
o repeating
0 or more!!
use of braces in an RHS to indicate that the enclosed part can be
repeated indefinitely OR left out altogether
works great for lists!!
look for any recursion in the BNF form to be replaced
EBNF “Repeating” Update
<ident_list> -> identifier
| identifier , <ident_list>
o Multiple choice!!
choose a single element from a group
options are placed in ( )s and separated by |
notice the | count is the same, just now in one line
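The difference between the recursive BNF rule and the brace form shows up directly in code: recursion in the rule becomes a recursive call, braces become a loop. The sketch below assumes the usual brace form, <ident_list> -> identifier { , identifier }, as the EBNF counterpart; the function names are hypothetical, and Python's str.isidentifier stands in for the identifier token.

```python
def ident_list_bnf(tokens):
    """<ident_list> -> identifier | identifier , <ident_list>  (recursion)"""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True                                  # base case: one identifier
    return tokens[1] == "," and ident_list_bnf(tokens[2:])

def ident_list_ebnf(tokens):
    """<ident_list> -> identifier { , identifier }  (braces become a loop)"""
    if not tokens or not tokens[0].isidentifier():
        return False
    pos = 1
    while pos < len(tokens):                         # zero or more repetitions
        if (tokens[pos] != "," or pos + 1 >= len(tokens)
                or not tokens[pos + 1].isidentifier()):
            return False
        pos += 2
    return True

sample = ["a", ",", "b", ",", "c"]
print(ident_list_bnf(sample), ident_list_ebnf(sample))
```

Both functions accept the same lists; only the way the repetition is written changes.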
This new fangled EBNF thingy
the brackets, braces, and parentheses in the EBNF form are called
“metasymbols”
o notational tools
o not terminal symbols
issues
o in case the metasymbols are also terminal symbols in the language
the instance that are terminals are underlined or quoted
o loss of associativity
using the EBNF form of + above
it no longer implies the direction of associativity
this is fixed by using an EBNF syntax analyzer discussed later
A ::= a A | B
⇒ A ::= { a } B
Look for a common string that can be factored out with grouping and options.
A ::= a B | a
⇒ A ::= a [B]
Lupoli’s Strategy on BNFs to EBNF
<program> -> begin <stmt_list> end
<stmt_list> -> <stmt>
| <stmt> ; <stmt_list>
<stmt> -> <var> = <expression>
<var> -> A | B | C
<expression> -> <var> + <var>
| <var> - <var>
| <var>
3. Optionals, rules that are very similar, one extends from another
a. none in this case
Answers
#2
#3
Binary String Grammar Example
<binary-string> -> 0
| 1
| <binary-string> <binary-string>
1010111
Turning Equations into Trees
Why certain operations are not covered in grammar
Where order does not matter (covered)
3+4=
4+3=
3*4=
4*3=
4/2=
2/4=
Proving Order precedence works
A=C+B A=C*B
A = (B + C) * A A = (B * C) * A
A+(B*C)/D A ** B ** C