
2 Chomsky, Lexical Analysis and Parsing


Programming Languages

Lexical and Syntactic Analysis


• Chomsky Grammar Hierarchy
• Lexical Analysis – Tokenizing
• Syntactic Analysis – Parsing

• Hmm Concrete Syntax


• Hmm Abstract Syntax

Dr. Philip Cannata 1


Chomsky Hierarchy

• Regular grammar – used for tokenizing


• Context-free grammar (BNF) – used for parsing
• Context-sensitive grammar – not really used for
programming languages



Regular Grammar
• Simplest; least powerful
• Equivalent to:
– Regular expression (think of perl)
– Finite-state automaton
• Right regular grammar:
ω ∈ Terminal*,
A and B ∈ Nonterminal
A → ωB
A → ω
• Example:
Integer → 0 Integer | 1 Integer | ... | 9 Integer |
0 | 1 | ... | 9
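
The Integer grammar above can be read directly as a recognizer: each production consumes one digit and then either stops or continues with Integer. The sketch below is only an illustration; the function names `is_integer` and `is_integer_regex` are hypothetical, and the regular expression `[0-9]+` is the assumed equivalent of the grammar.

```python
import re

DIGITS = "0123456789"

def is_integer(s: str) -> bool:
    """Recognize Integer -> d Integer | d, one terminal per step."""
    if not s or s[0] not in DIGITS:
        return False
    rest = s[1:]
    # Integer -> d          (stop here), or
    # Integer -> d Integer  (continue with the remaining input)
    return rest == "" or is_integer(rest)

def is_integer_regex(s: str) -> bool:
    """The same language written as a regular expression."""
    return re.fullmatch(r"[0-9]+", s) is not None

for text in ["352", "0", "", "3a5"]:
    print(repr(text), is_integer(text), is_integer_regex(text))
```

Both functions should agree on every input, which is the sense in which regular grammars and regular expressions are equivalent.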



Regular Grammar

• Less powerful than context-free grammars


• The following is not a regular language
{ aⁿ bⁿ | n ≥ 1 }
i.e., cannot balance: ( ), { }, begin end
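
As an informal illustration (not a proof), recognizing { aⁿ bⁿ | n ≥ 1 } requires remembering how many a's were seen, and n is unbounded, so no fixed number of states suffices. The sketch below uses an integer counter for that memory; the name `is_anbn` is hypothetical.

```python
def is_anbn(s: str) -> bool:
    """Accept strings of the form a^n b^n with n >= 1."""
    count = 0
    i = 0
    # Count the leading a's. The counter can grow without bound,
    # which is exactly what a finite-state automaton cannot do.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Cancel one a for every b that follows.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if all input was consumed and the counts matched.
    return i == len(s) and count == 0

for text in ["ab", "aaabbb", "aab", "abb", "ba", ""]:
    print(repr(text), is_anbn(text))
```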



Regular Expressions

x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M|N         M or N
MN          M followed by N
M*          zero or more occurrences of M
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
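
Most of this notation carries over to Python's `re` module, which is used here only as a convenient way to try the operators. The pattern names and the helper `matches` are hypothetical. The `{ name }` form is lex-style and has no direct equivalent in `re`, so it is not shown.

```python
import re

# Character classes and M* : a letter followed by letters or digits.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# M? and M|N and M+ : an optional sign followed by one or more digits.
SIGNED = re.compile(r"(\+|-)?[0-9]+")
# Escaped character: \. matches a literal dot, not "any single character".
FLOAT = re.compile(r"[0-9]+\.[0-9]*")

def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """True when the whole text is in the language of the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(IDENT, "a2i"))    # letter first, then letters or digits
print(matches(IDENT, "2ai"))    # starts with a digit
print(matches(SIGNED, "-42"))
print(matches(FLOAT, "3.14"))
```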



Finite State Automaton for Identifiers

(S, a2i$) ├ (I, 2i$)
          ├ (I, i$)
          ├ (I, $)
          ├ (F, )

Thus: (S, a2i$) ├* (F, )
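
The configuration sequence above can be reproduced with a small simulator. The automaton diagram is not part of the text, so the transition rules below are an assumption inferred from the trace: S moves to I on a letter, I stays in I on a letter or digit, and I moves to the final state F on the end marker `$`. The names `step` and `trace` are hypothetical.

```python
import string

LETTERS = string.ascii_letters
DIGITS = string.digits

def step(state, ch):
    """Assumed transition function; returns None when no move exists."""
    if state == "S" and ch in LETTERS:
        return "I"
    if state == "I" and (ch in LETTERS or ch in DIGITS):
        return "I"
    if state == "I" and ch == "$":
        return "F"
    return None

def trace(text):
    """Return (list of configurations, accepted?) for the given input."""
    state, rest = "S", text
    configs = [(state, rest)]
    while rest:
        state = step(state, rest[0])
        if state is None:
            return configs, False       # stuck: no transition applies
        rest = rest[1:]
        configs.append((state, rest))
    return configs, state == "F"

configs, accepted = trace("a2i$")
for state, rest in configs:
    print(f"({state}, {rest})")
print("accepted:", accepted)
```

Each printed pair is one configuration (state, remaining input), matching the ├ steps on the slide.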



Deterministic Finite State Automaton Examples



Context-Free Grammar

Production:
α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and the right-hand
side is a string of nonterminals and/or terminals
(possibly empty).



Context-Sensitive Grammar

Production:
α → β, where |α| ≤ |β|
α, β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side can be a string of
terminals and nonterminals



Syntax

• The syntax of a programming language is a precise
description of all its grammatically correct programs.
• Precise syntax was first used with Algol 60, and has been
used ever since.
• Three levels:
– Lexical syntax - all the basic symbols of the language
(names, values, operators, etc.)
– Concrete syntax - rules for writing expressions,
statements and programs.
– Abstract syntax - internal representation of the program,
favoring content over form.



Grammars
Grammars: Metalanguages used to define the concrete syntax of a
language.

Backus Normal Form – Backus Naur Form (BNF)


• Stylized version of a context-free grammar (cf. Chomsky hierarchy)
• First used to define syntax of Algol 60
• Now used to define syntax of most major languages
Production:
α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals
and/or terminals (possibly empty).
• Example
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9



Extended BNF (EBNF)

Additional metacharacters
{ } a series of zero or more
( ) must pick one from a list
[ ] pick none or one from a list

Example
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]

EBNF is no more powerful than BNF, but its production rules are often simpler
and clearer.
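
One practical reason EBNF rules read more simply is that they map almost mechanically onto code: `{ ... }` becomes a loop and `[ ... ]` becomes an `if`. The sketch below parses `Expression -> Term { ( + | - ) Term }`, assuming for brevity that a Term is a single digit (the slide does not define Term here). The function name `parse_expression` and the tuple-based tree are hypothetical choices.

```python
DIGITS = "0123456789"

def parse_expression(text):
    """Parse Expression -> Term { (+ | -) Term } into a nested tuple."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def parse_term():
        # Assumption: a Term is a single digit.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in DIGITS:
            value = int(tokens[pos])
            pos += 1
            return value
        raise SyntaxError(f"expected a digit at position {pos}")

    tree = parse_term()
    # The EBNF braces { ... } become a while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, parse_term())   # fold to the left
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression("5 - 4 + 3"))
```

Because each new operator wraps the tree built so far, the loop produces a left-nested result, so `5 - 4 + 3` groups as `(5 - 4) + 3`.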

Javacc EBNF
( … )* a series of zero or more
( … )+ a series of one or more
[ … ] optional
For more details, see Chapter 2 of
Programming Language Pragmatics, Third Edition, by Michael L. Scott.



Instance of a Programming Language:
int main ()
{
    return 0 ;
}

Internal Parse Tree

Program (abstract syntax):
  Function = main; Return type = int
    params =
    Block:
      Return:
        Variable: return#main, LOCAL addr=0
        IntValue: 0

Abstract Syntax



Now we’ll focus on the internal parse tree



Parse Trees

Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree for 352 as an Integer
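
A rough way to see the shape of this tree: with the left-recursive rule `Integer → Integer Digit`, every additional digit wraps the tree built so far, so the tree grows down its left side. The sketch below builds that tree as nested tuples; `parse_integer` and the tuple layout are hypothetical.

```python
DIGITS = "0123456789"

def parse_integer(s: str):
    """Build the parse tree for Integer -> Digit | Integer Digit."""
    if not s or any(c not in DIGITS for c in s):
        raise ValueError(f"not an Integer: {s!r}")
    # Base case: Integer -> Digit
    tree = ("Integer", ("Digit", s[0]))
    # Each further digit applies Integer -> Integer Digit,
    # with the tree so far as the left child.
    for c in s[1:]:
        tree = ("Integer", tree, ("Digit", c))
    return tree

print(parse_integer("352"))
```

For `352`, the innermost Integer covers `3`, the next one covers `35`, and the root covers `352`.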



Arithmetic Expression Grammar

Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )

Parse of 5 - 4 + 3



Associativity and Precedence

• A grammar can be used to define associativity and
precedence among the operators in an expression.
E.g., + and - are left-associative operators in mathematics;
* and / have higher precedence than + and - .
• Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )



Associativity and Precedence
Parse of 4**2**3 + 5 * 6 + 7
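
The grouping implied by this grammar can be checked with a small recursive-descent evaluator, sketched below under some assumptions: the left-recursive rules for Expr and Term are rewritten as loops (which keeps them left-associative), Factor recurses on its right operand (which makes `**` right-associative), `/` is taken to be Python's true division, and characters the tokenizer does not recognize are silently skipped. The names `Parser` and `evaluate` are hypothetical.

```python
import operator
import re

TOKEN = re.compile(r"\*\*|[0-9]|[-+*/%()]")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "%": operator.mod}

class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                      # Expr -> Term { (+|-) Term }
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            value = OPS[op](value, self.term())
        return value

    def term(self):                      # Term -> Factor { (*|/|%) Factor }
        value = self.factor()
        while self.peek() in ("*", "/", "%"):
            op = self.advance()
            value = OPS[op](value, self.factor())
        return value

    def factor(self):                    # Factor -> Primary [ ** Factor ]
        base = self.primary()
        if self.peek() == "**":
            self.advance()
            return base ** self.factor() # recursion on the right operand
        return base

    def primary(self):                   # Primary -> 0 | ... | 9 | ( Expr )
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("4**2**3 + 5 * 6 + 7"))   # 4**(2**3) + (5*6) + 7
```

The expression groups as `((4 ** (2 ** 3)) + (5 * 6)) + 7`, that is 65536 + 30 + 7.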



Associativity and Precedence

Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -

Note: These relationships are shown by the structure
of the parse tree: highest precedence at the bottom,
and left-associativity on the left at each level.



Ambiguous Grammars

• A grammar is ambiguous if one of its strings has two
or more different parse trees.

• Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **

• Generates the same strings as the previous grammar, but is ambiguous



Ambiguous Grammars

Ambiguous Parse of 5 – 4 + 3
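
To make the ambiguity concrete, the sketch below enumerates every tree that `Expr -> Expr Op Expr | Integer` allows for a flat token list, by trying each operator as the root. It is an illustration only: parentheses are left out, only `+` and `-` are evaluated, and the names `parses` and `value` are hypothetical.

```python
def parses(tokens):
    """All parse trees for an alternating operand/operator token list."""
    if len(tokens) == 1:
        return [tokens[0]]                    # Expr -> Integer
    trees = []
    for i in range(1, len(tokens), 2):        # every operator may be the root
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def value(tree):
    """Evaluate a tree built from integers, '+' and '-'."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return value(left) + value(right) if op == "+" else value(left) - value(right)

for tree in parses([5, "-", 4, "+", 3]):
    print(tree, "=", value(tree))
```

The two trees correspond to `5 - (4 + 3)` and `(5 - 4) + 3`, which evaluate to -2 and 4, so the choice of parse tree changes the meaning.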



Dangling Else Ambiguous Grammars

IfStatement -> if ( Expression ) Statement |
               if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement

With which ‘if’ does the following ‘else’ associate?

if (x < 0)
if (y < 0) y = y - 1;
else y = 0;
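
The two readings of this fragment can be written out in Python, where indentation forces a choice and so removes the ambiguity. The function names are hypothetical. C and Java resolve the question with the first reading: the `else` goes with the nearest unmatched `if`.

```python
def else_binds_to_inner_if(x, y):
    """Reading 1: the else belongs to 'if (y < 0)'."""
    if x < 0:
        if y < 0:
            y = y - 1
        else:
            y = 0
    return y

def else_binds_to_outer_if(x, y):
    """Reading 2: the else belongs to 'if (x < 0)'."""
    if x < 0:
        if y < 0:
            y = y - 1
    else:
        y = 0
    return y

# With x = 1 and y = 5 the two readings disagree.
print(else_binds_to_inner_if(1, 5), else_binds_to_outer_if(1, 5))
```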



Hmm BNF (i.e., Concrete Syntax)

Program : {[ Declaration ]|retType Identifier Function | MyClass | MyObject}


Function : ( ) Block
MyClass: Class Identifier { {retType Identifier Function}Constructor {retType Identifier Function}}
MyObject: Identifier Identifier = create Identifier callArgs
Constructor: Identifier ([{ Parameter } ]) block
Declaration : Type Identifier [ [Literal] ]{ , Identifier [ [ Literal ] ] }
Type : int|bool| float | list |tuple| object | string | void
Statements : { Statement }
Statement : ; | Declaration| Block |ForEach| Assignment |IfStatement|WhileStatement|CallStatement|
ReturnStatement
Block : { Statements }
ForEach: for( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ]= Expression ;
Parameter : Type Identifier
IfStatement: if ( Expression ) Block [elseifStatement| Block ]
WhileStatement: while ( Expression ) Block



Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction {|| Conjunction }
Conjunction : Equality {&&Equality }
Equality : Relation [EquOp Relation ]
EquOp: == | !=
Relation : Addition [RelOp Addition ]
RelOp: <|<= |>|>=
Addition : Term {AddOp Term }
AddOp: + | -
Term : Factor {MulOp Factor }
MulOp: * | / | %
Factor : [UnaryOp]Primary
UnaryOp: - | !
Primary : callOrLambda|IdentifierOrArrayRef| Literal |subExpressionOrTuple|ListOrListComprehension|
ObjFunction
callOrLambda : Identifier callArgs|LambdaDef
callArgs : ([Expression |passFunc { ,Expression |passFunc}] )
passFunc : Identifier (Type Identifier { Type Identifier } )
LambdaDef : (\\ Identifier { ,Identifier } -> Expression)



Hmm BNF (i.e., Concrete Syntax)

IdentifierOrArrayRef : Identifier [ [Expression] ]


subExpressionOrTuple : ([ Expression [,[ Expression { , Expression } ] ] ] )
ListOrListComprehension: [ Expression {, Expression } ] | | Expression[<- Expression ] {, Expression[<-
Expression ] } ]
ObjFunction: Identifier . Identifier . Identifier callArgs
Identifier : (a |b|…|z| A | B |…| Z){ (a |b|…|z| A | B |…| Z )|(0 | 1 |…| 9)}
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat: 0 | 1 |…| 9 {0 | 1 |…| 9}.{0 | 1 |…| 9}
ClString: " {~["] } "



Associativity and Precedence for Hmm

Clite Operator    Associativity
Unary - !         none
* /               left
+ -               left
< <= > >=         none
== !=             none
&&                left
||                left



Hmm Parse Tree Example
z = x + 2 * y;



Now we’ll focus on the Abstract Syntax



Hmm Parse Tree
z = x + 2 * y;


Very Approximate Hmm Abstract Syntax

Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
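
One way to read these rules is as class definitions: each `=` rule with fields becomes a record, and each rule with `|` alternatives becomes a union of records. The sketch below covers only the part needed for `z = x + 2 * y`, using Python dataclasses. Operators are represented as plain strings, and the `evaluate` helper with its `env` dictionary is a hypothetical addition, not part of the abstract syntax on the slide.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Variable:            # Variable = String id
    id: str

@dataclass
class IntValue:            # IntValue = Integer intValue
    int_value: int

@dataclass
class Binary:              # Binary = Operator op; Expression term1, term2
    op: str
    term1: "Expression"
    term2: "Expression"

@dataclass
class Assignment:          # Assignment = Variable target; Expression source
    target: Variable
    source: "Expression"

Expression = Union[Variable, IntValue, Binary]

def evaluate(expr, env):
    """Hypothetical helper: evaluate an Expression given variable values."""
    if isinstance(expr, IntValue):
        return expr.int_value
    if isinstance(expr, Variable):
        return env[expr.id]
    left, right = evaluate(expr.term1, env), evaluate(expr.term2, env)
    return left + right if expr.op == "+" else left * right

# Abstract syntax for: z = x + 2 * y
tree = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)
print(tree)
print(evaluate(tree.source, {"x": 1, "y": 3}))
```

Unlike the parse tree, this structure has no nodes for Term, Factor, or punctuation; precedence is already encoded in how the Binary nodes nest.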



Hmm Abstract Syntax – Binary Example
z = x + 2 * y

Binary
├─ Operator: +
├─ Variable: x
└─ Binary
   ├─ Operator: *
   ├─ Value: 2
   └─ Variable: y