Lecture 8 Syntax Analysis
LECTURE OUTLINE
• SYNTAX ANALYSIS
• CONTEXT FREE GRAMMAR
• DERIVATIONS
• AMBIGUITY
• ASSOCIATIVITY
• PRECEDENCE
SYNTAX ANALYSIS
source program → lexical analyzer → tokens → syntax analyzer → parse tree → semantic analyzer → parse tree
SYNTAX ANALYSIS
• A PARSE TREE DEPICTS THE ASSOCIATIVITY AND PRECEDENCE OF OPERATORS. THE DEEPEST
SUB-TREE IS EVALUATED FIRST, SO THE OPERATOR IN THAT SUB-TREE TAKES PRECEDENCE OVER
THE OPERATORS IN ITS ANCESTOR NODES.
• A GRAMMAR DESCRIBES THE STRUCTURE OF PROGRAMMING CONSTRUCTS. FOR EXAMPLE, AN IF-ELSE
STATEMENT IS THE CONCATENATION OF THE KEYWORD IF, AN OPENING PARENTHESIS, AN
EXPRESSION, A CLOSING PARENTHESIS, A STATEMENT, THE KEYWORD ELSE, AND ANOTHER
STATEMENT.
• USING THE VARIABLE EXPR TO DENOTE AN EXPRESSION AND THE VARIABLE STMT TO DENOTE A
STATEMENT, THIS STRUCTURING RULE CAN BE EXPRESSED AS
STMT → IF ( EXPR ) STMT ELSE STMT
• IN WHICH THE ARROW MAY BE READ AS "CAN HAVE THE FORM." SUCH A RULE IS CALLED A
PRODUCTION. IN A PRODUCTION, LEXICAL ELEMENTS LIKE THE KEYWORD IF AND THE
PARENTHESES ARE CALLED TERMINALS.
• VARIABLES LIKE EXPR AND STMT REPRESENT SEQUENCES OF TERMINALS AND ARE CALLED
NONTERMINALS.
CONTEXT-FREE GRAMMAR
G = (Σ, N, P, S)
• Σ IS A FINITE SET OF TERMINALS
• N IS A FINITE SET OF NON-TERMINALS
• P IS A FINITE SET OF PRODUCTION RULES
• S IS THE START SYMBOL, A MEMBER OF N
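THE FOUR COMPONENTS OF A GRAMMAR CAN BE WRITTEN DOWN DIRECTLY AS DATA. THE PYTHON SKETCH
BELOW IS ONLY AN ILLUSTRATION: THE CLASS NAME Grammar, ITS FIELD NAMES, THE check METHOD,
AND THE EXTRA PRODUCTIONS stmt → ; AND expr → id ARE ASSUMPTIONS MADE FOR THE EXAMPLE,
NOT PART OF THE LECTURE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # finite set of terminals
    nonterminals: frozenset   # finite set of non-terminals
    productions: tuple        # pairs (head, body); body is a tuple of symbols
    start: str                # start symbol

    def check(self):
        """Return True if the four components are consistent, else raise ValueError."""
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for head, body in self.productions:
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not a non-terminal")
            for sym in body:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in body")
        return True


# The if-else rule: stmt -> if ( expr ) stmt else stmt
# (the two short productions below are hypothetical, added so every
# non-terminal has at least one terminating rule)
g = Grammar(
    terminals=frozenset({"if", "else", "(", ")", "id", ";"}),
    nonterminals=frozenset({"stmt", "expr"}),
    productions=(
        ("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt")),
        ("stmt", (";",)),
        ("expr", ("id",)),
    ),
    start="stmt",
)
```

CALLING g.check() CONFIRMS THAT EVERY SYMBOL USED IN A PRODUCTION BELONGS TO ONE OF THE
TWO SYMBOL SETS AND THAT THE START SYMBOL IS A NON-TERMINAL.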
EXAMPLE
• CONSIDER THE LANGUAGE OF PALINDROMES, WHICH CANNOT BE DESCRIBED BY A REGULAR
EXPRESSION. THAT IS, L = { W | W = W^R }, WHERE W^R IS THE REVERSE OF W, IS NOT A
REGULAR LANGUAGE. BUT IT CAN BE DESCRIBED BY A CFG, AS ILLUSTRATED BELOW:
G = ( V, Σ, P, S )
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ε | Z → 0Q0 | N → 1Q1 }
S = Q
HERE V IS THE SET OF NON-TERMINALS, AND N IS ONE OF ITS MEMBERS.
THIS GRAMMAR GENERATES THE PALINDROMES OVER { 0, 1 }, SUCH AS 1001, 11100111, 00100,
1010101, 11111, ETC. THE PRODUCTIONS Q → 0 AND Q → 1 SUPPLY THE MIDDLE SYMBOL OF
ODD-LENGTH PALINDROMES SUCH AS 00100.
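A SHORT PYTHON SKETCH SHOWS HOW THE PRODUCTIONS PEEL MATCHING SYMBOLS OFF BOTH ENDS OF A
STRING. IT ASSUMES THE SINGLE-SYMBOL PRODUCTIONS Q → 0 AND Q → 1 BELONG TO P, SINCE THE
ODD-LENGTH EXAMPLES (00100, 1010101, 11111) CANNOT BE DERIVED WITHOUT THEM. THE FUNCTION
NAME derives IS AN ASSUMPTION MADE FOR THE EXAMPLE.

```python
def derives(w: str) -> bool:
    """Return True if Q derives w, using Q -> 0Q0 | 1Q1 | 0 | 1 | epsilon.

    The non-terminals Z and N of the lecture grammar are folded into Q here,
    since Q -> Z -> 0Q0 and Q -> N -> 1Q1.
    """
    if any(c not in "01" for c in w):
        return False            # symbol outside the alphabet {0, 1}
    if len(w) <= 1:
        return True             # Q -> epsilon, Q -> 0, or Q -> 1
    if w[0] != w[-1]:
        return False            # neither 0Q0 nor 1Q1 can match both ends
    return derives(w[1:-1])     # strip the matched pair and continue with Q


samples = ["1001", "11100111", "00100", "1010101", "11111", "10", "0110a"]
results = {w: derives(w) for w in samples}
```

THE FIRST FIVE SAMPLES ARE THE STRINGS LISTED ABOVE AND ARE ACCEPTED; "10" AND "0110a" ARE
REJECTED.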
SYNTAX ANALYZERS
• A SYNTAX ANALYZER OR PARSER TAKES THE INPUT FROM A LEXICAL ANALYZER IN THE FORM
OF TOKEN STREAMS. THE PARSER ANALYZES THE SOURCE CODE (TOKEN STREAM) AGAINST THE
PRODUCTION RULES TO DETECT ANY ERRORS IN THE CODE. THE OUTPUT OF THIS PHASE IS A
PARSE TREE.
• THIS WAY, THE PARSER ACCOMPLISHES TWO TASKS: PARSING THE CODE WHILE LOOKING FOR
ERRORS, AND GENERATING A PARSE TREE AS THE OUTPUT OF THE PHASE.
• PARSERS ARE EXPECTED TO PARSE THE WHOLE CODE EVEN IF SOME ERRORS EXIST IN THE
PROGRAM. TO DO SO, PARSERS USE ERROR-RECOVERY STRATEGIES.
DERIVATION
LEFT-MOST DERIVATION
• IF THE LEFT-MOST NON-TERMINAL OF THE SENTENTIAL FORM IS REPLACED AT EVERY STEP, THE
DERIVATION IS CALLED A LEFT-MOST DERIVATION. A SENTENTIAL FORM DERIVED BY A LEFT-MOST
DERIVATION IS CALLED A LEFT-SENTENTIAL FORM.
RIGHT-MOST DERIVATION
• IF THE RIGHT-MOST NON-TERMINAL OF THE SENTENTIAL FORM IS REPLACED AT EVERY STEP, THE
DERIVATION IS CALLED A RIGHT-MOST DERIVATION. A SENTENTIAL FORM DERIVED BY A RIGHT-MOST
DERIVATION IS CALLED A RIGHT-SENTENTIAL FORM.
EXAMPLE
PRODUCTION RULES:
E → E + E
E → E * E
E → ID
INPUT STRING: ID + ID * ID
LEFT-MOST DERIVATION
THE LEFT-MOST DERIVATION IS:
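E ⇒ E * E
  ⇒ E + E * E
  ⇒ ID + E * E
  ⇒ ID + ID * E
  ⇒ ID + ID * ID
NOTICE THAT THE LEFT-MOST NON-TERMINAL IS ALWAYS REPLACED FIRST.
RIGHT-MOST DERIVATION
A RIGHT-MOST DERIVATION OF THE SAME STRING IS:
E ⇒ E + E
  ⇒ E + E * E
  ⇒ E + E * ID
  ⇒ E + ID * ID
  ⇒ ID + ID * ID
HERE THE RIGHT-MOST NON-TERMINAL IS ALWAYS REPLACED FIRST.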
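THE TWO DERIVATION ORDERS CAN BE REPRODUCED MECHANICALLY. THE PYTHON SKETCH BELOW APPLIES
A GIVEN SEQUENCE OF PRODUCTIONS TO EITHER THE LEFT-MOST OR THE RIGHT-MOST E IN THE
CURRENT SENTENTIAL FORM. THE NAMES RULES AND derive, THE RULE LABELS, AND THE LOWERCASE
TOKEN id ARE ASSUMPTIONS MADE FOR THE EXAMPLE.

```python
# Production bodies for the single non-terminal E, keyed by a short label.
RULES = {
    "E+E": ["E", "+", "E"],
    "E*E": ["E", "*", "E"],
    "id": ["id"],
}


def derive(choices, leftmost=True):
    """Apply the chosen productions one by one and return every sentential form."""
    form = ["E"]
    steps = [" ".join(form)]
    for name in choices:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = RULES[name]      # replace that one E by the rule body
        steps.append(" ".join(form))
    return steps


left = derive(["E*E", "E+E", "id", "id", "id"], leftmost=True)
right = derive(["E+E", "E*E", "id", "id", "id"], leftmost=False)

for step in left:
    print(step)
```

BOTH LISTS END IN id + id * id, BUT THE INTERMEDIATE SENTENTIAL FORMS DIFFER BECAUSE A
DIFFERENT NON-TERMINAL IS EXPANDED AT EACH STEP.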
IN A PARSE TREE:
• ALL LEAF NODES ARE TERMINALS.
• ALL INTERIOR NODES ARE NON-TERMINALS.
• READING THE LEAVES FROM LEFT TO RIGHT GIVES THE ORIGINAL INPUT STRING.
A PARSE TREE DEPICTS THE ASSOCIATIVITY AND PRECEDENCE OF OPERATORS.
THE DEEPEST SUB-TREE IS EVALUATED FIRST, SO THE OPERATOR IN THAT SUB-TREE TAKES
PRECEDENCE OVER THE OPERATORS IN ITS ANCESTOR NODES.
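THESE PROPERTIES CAN BE CHECKED ON A SMALL TREE. THE PYTHON SKETCH BELOW BUILDS ONE PARSE
TREE FOR id + id * id AND REPORTS ITS LEAVES AND THE DEPTH OF EACH OPERATOR. THE CLASS
Node AND THE HELPERS E, leaf, AND depth ARE ASSUMPTIONS MADE FOR THE EXAMPLE.

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

    def leaves(self):
        """Return the leaf symbols from left to right."""
        if not self.children:
            return [self.symbol]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def E(*children):
    return Node("E", children)      # interior node: a non-terminal


def leaf(symbol):
    return Node(symbol)             # leaf node: a terminal


def depth(node, symbol, d=0):
    """Depth of the first leaf labelled `symbol`, or None if it is absent."""
    if not node.children and node.symbol == symbol:
        return d
    for child in node.children:
        found = depth(child, symbol, d + 1)
        if found is not None:
            return found
    return None


# Parse tree in which * sits below +, i.e. id + (id * id)
tree = E(
    E(leaf("id")),
    leaf("+"),
    E(E(leaf("id")), leaf("*"), E(leaf("id"))),
)

print(" ".join(tree.leaves()))
print(depth(tree, "+"), depth(tree, "*"))
```

IN THIS TREE THE * LEAF LIES ONE LEVEL DEEPER THAN THE + LEAF, SO THE MULTIPLICATION IS
GROUPED FIRST.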
AMBIGUITY
• A GRAMMAR THAT PRODUCES MORE THAN ONE PARSE TREE FOR SOME SENTENCE IS SAID TO BE
AMBIGUOUS.
• PUT ANOTHER WAY, AN AMBIGUOUS GRAMMAR IS ONE THAT PRODUCES MORE THAN ONE LEFTMOST
DERIVATION OR MORE THAN ONE RIGHTMOST DERIVATION FOR THE SAME SENTENCE.
• FOR MOST PARSERS, IT IS DESIRABLE THAT THE GRAMMAR BE MADE UNAMBIGUOUS, FOR IF IT
IS NOT, WE CANNOT UNIQUELY DETERMINE WHICH PARSE TREE TO SELECT FOR A SENTENCE.
EXAMPLE
THE ARITHMETIC EXPRESSION GRAMMAR PERMITS TWO DISTINCT LEFTMOST DERIVATIONS FOR
THE SENTENCE ID + ID * ID:
E ⇒ E + E               E ⇒ E * E
  ⇒ ID + E                ⇒ E + E * E
  ⇒ ID + E * E            ⇒ ID + E * E
  ⇒ ID + ID * E           ⇒ ID + ID * E
  ⇒ ID + ID * ID          ⇒ ID + ID * ID
FIGURE: TWO PARSE TREES FOR ID + ID * ID
EXPLANATION
• THE PARSE TREE WITH + AT THE ROOT REFLECTS THE COMMONLY ASSUMED PRECEDENCE OF + AND *,
WHILE THE TREE WITH * AT THE ROOT DOES NOT.
• THAT IS, IT IS CUSTOMARY TO TREAT OPERATOR * AS HAVING HIGHER PRECEDENCE THAN +,
CORRESPONDING TO THE FACT THAT WE WOULD NORMALLY EVALUATE AN EXPRESSION LIKE
A + B * C AS A + (B * C), RATHER THAN AS (A + B) * C.
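THE AMBIGUITY CAN BE MADE CONCRETE BY ENUMERATING EVERY PARSE TREE THE GRAMMAR
E → E + E | E * E | ID ALLOWS FOR A TOKEN LIST. THE PYTHON SKETCH BELOW IS A BRUTE-FORCE
ILLUSTRATION, NOT A PRACTICAL PARSER. THE FUNCTION NAMES parses AND evaluate, THE TUPLE
REPRESENTATION OF TREES, AND THE USE OF THE NUMBERS 2, 3, 4 IN PLACE OF ID ARE ASSUMPTIONS
MADE FOR THE EXAMPLE.

```python
OPERATORS = ("+", "*")


def parses(tokens):
    """Return all parse trees for E -> E + E | E * E | operand.

    A tree is either a bare operand or a tuple (operator, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] not in OPERATORS else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPERATORS:
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Evaluate a tree bottom-up, so deeper operators are applied first."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = parses([2, "+", 3, "*", 4])
values = [evaluate(t) for t in trees]

for t, v in zip(trees, values):
    print(t, "=", v)
```

TWO TREES ARE FOUND: ONE GROUPS THE INPUT AS 2 + (3 * 4) AND THE OTHER AS (2 + 3) * 4, AND
THEY EVALUATE TO DIFFERENT VALUES, WHICH IS WHY A PARSER NEEDS A RULE FOR CHOOSING ONE.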
ASSOCIATIVITY
• IF AN OPERAND HAS OPERATORS ON BOTH SIDES, THE ASSOCIATIVITY OF THOSE OPERATORS
DECIDES WHICH OPERATOR TAKES THE OPERAND.
• IF THE OPERATION IS LEFT-ASSOCIATIVE, THE OPERAND IS TAKEN BY THE LEFT OPERATOR; IF
THE OPERATION IS RIGHT-ASSOCIATIVE, THE OPERAND IS TAKEN BY THE RIGHT OPERATOR.
EXAMPLE
• OPERATIONS SUCH AS ADDITION, MULTIPLICATION, SUBTRACTION, AND DIVISION ARE
LEFT-ASSOCIATIVE. IF THE EXPRESSION CONTAINS:
ID OP ID OP ID
IT WILL BE EVALUATED AS:
(ID OP ID) OP ID
FOR EXAMPLE, (ID + ID) + ID.
• OPERATIONS LIKE EXPONENTIATION ARE RIGHT-ASSOCIATIVE, I.E., THE SAME EXPRESSION WILL
BE EVALUATED AS:
ID OP (ID OP ID)
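THE EFFECT OF ASSOCIATIVITY SHOWS UP AS SOON AS THE OPERATOR IS NOT ASSOCIATIVE IN THE
MATHEMATICAL SENSE. THE PYTHON SKETCH BELOW GROUPS THE SAME OPERAND LIST BOTH WAYS FOR
SUBTRACTION AND EXPONENTIATION. THE HELPER NAMES fold_left AND fold_right AND THE SAMPLE
OPERANDS ARE ASSUMPTIONS MADE FOR THE EXAMPLE.

```python
from functools import reduce
import operator


def fold_left(op, operands):
    """Group as ((a op b) op c): left-associative evaluation."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Group as (a op (b op c)): right-associative evaluation."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


sub_left = fold_left(operator.sub, [10, 4, 3])     # (10 - 4) - 3
sub_right = fold_right(operator.sub, [10, 4, 3])   # 10 - (4 - 3)
pow_left = fold_left(operator.pow, [2, 3, 2])      # (2 ** 3) ** 2
pow_right = fold_right(operator.pow, [2, 3, 2])    # 2 ** (3 ** 2)

print(sub_left, sub_right, pow_left, pow_right)
```

PYTHON ITSELF TREATS - AS LEFT-ASSOCIATIVE AND ** AS RIGHT-ASSOCIATIVE, SO 10 - 4 - 3
MATCHES sub_left AND 2 ** 3 ** 2 MATCHES pow_right.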
PRECEDENCE
• IF TWO DIFFERENT OPERATORS SHARE A COMMON OPERAND, THE PRECEDENCE OF
OPERATORS DECIDES WHICH WILL TAKE THE OPERAND.
• FOR EXAMPLE, 2+3*4 CAN HAVE TWO DIFFERENT PARSE TREES, ONE CORRESPONDING TO (2+3)*4
AND ANOTHER CORRESPONDING TO 2+(3*4).
• BY SETTING PRECEDENCE AMONG OPERATORS, THIS AMBIGUITY CAN BE RESOLVED: SINCE * HAS
HIGHER PRECEDENCE THAN +, ONLY THE TREE FOR 2+(3*4) IS ACCEPTED.
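ONE COMMON WAY TO BUILD PRECEDENCE INTO A GRAMMAR IS TO USE ONE NON-TERMINAL PER
PRECEDENCE LEVEL, FOR EXAMPLE E → E + T | T, T → T * F | F, F → NUMBER | ( E ). THE PYTHON
SKETCH BELOW IS A SMALL RECURSIVE-DESCENT PARSER FOR THAT LAYERED GRAMMAR. THE GRAMMAR
ITSELF, THE FUNCTION NAMES, AND THE TUPLE REPRESENTATION OF TREES ARE ASSUMPTIONS MADE
FOR THE EXAMPLE, NOT PART OF THE LECTURE.

```python
import re


def tokenize(text):
    # Simplified tokenizer: keeps numbers, + * ( ), and skips anything else.
    return re.findall(r"\d+|[+*()]", text)


def parse(text):
    """Parse with E -> T {+ T}, T -> F {* F}, F -> number | ( E )."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'operand'}, got {tok!r}")
        pos += 1
        return tok

    def expr():                       # lowest precedence level: +
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())    # loop builds a left-associative tree
        return node

    def term():                       # higher precedence level: *
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def factor():                     # operands and parenthesised expressions
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        tok = take()
        if not tok.isdigit():
            raise SyntaxError(f"expected operand, got {tok!r}")
        return int(tok)

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("2+3*4"))
print(parse("(2+3)*4"))
```

BECAUSE * IS HANDLED ONE LEVEL BELOW +, THE INPUT 2+3*4 HAS EXACTLY ONE TREE, THE ONE FOR
2+(3*4); THE OTHER GROUPING IS ONLY REACHABLE WITH EXPLICIT PARENTHESES.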