UGC-NET Computer Science
CHAPTER 3
Evaluation of a parse tree always happens from the bottom up and from left to right.
Ambiguity :
A grammar is ambiguous if a sentence has more than one parse tree, i.e., more than one
leftmost (or rightmost) derivation of a sentence is possible.
Example : Given the grammar ( set of productions)
E -> E + E
E -> E * E
E -> id
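For example, the sentence id + id * id has two different leftmost derivations ( and hence two different parse trees ) under this grammar :
E => E + E => id + E => id + E * E => id + id * E => id + id * id   ( * binds tighter, i.e. id + (id * id) )
E => E * E => E + E * E => id + E * E => id + id * E => id + id * id   ( + binds tighter, i.e. (id + id) * id )
So the grammar is ambiguous.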
The usual precedence order from highest to lowest is : - (unary minus), * and /, + and -.
Golden rule : build the grammar from the lowest precedence to the highest; the operator with the lowest precedence is handled by the start symbol and therefore ends up nearest the root of the parse tree.
Goal -> Expr
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Factor
Factor -> -Primary | Primary
Primary -> id
Now the leftmost derivation for - id + id * id is :
Goal => Expr
Expr => Expr + Term
=> Term + Term
=> Factor + Term
=> - Primary + Term
=> - id + Term
=> - id + Term*Factor
=> -id + Factor*Factor
=> -id + Primary*Factor
=> -id + id * Factor
=> -id + id * Primary
=> - id + id * id
There are three new non-terminals ( Term, Factor, Primary ). You cannot have two parse trees for the above sentence using this grammar; it is unambiguous.
Parser : A program that, given a sentence, reconstructs a derivation for that sentence ---- if
done successfully, it “recognizes” the sentence. All parsers read their input left-to-right, but construct the parse tree differently.
Top-down parser : ( constructs the parse tree from the root down to the leaves )
1. At each step the parser guesses which production to apply next. This guess is called a 'prediction'.
2. If the guess is wrong then one needs to revert the guess and try again. This is called 'backtracking'.
Step 4 : If it matches then go to step 2, until the complete sentence is matched. Else it was a wrong guess; revert the derivation and go back to step 2 -- backtrack.
If the prediction matches the input string then there is no backtracking; otherwise the parser backtracks.
Some disadvantages of top-down parsing : two problems arise due to the possibility of backtracking.
a. Semantic analysis cannot be performed while making a prediction; the action must be delayed until the prediction is known to be part of a successful parse, i.e. until we know whether the prediction is correct or not.
b. A source string is known to be erroneous only after all predictions have failed. This makes backtracking very inefficient.
Based on prediction and backtracking, top-down parsers can be categorized into two categories :
1. Recursive-Descent Parsing ( RD ) - a top-down parser with backtracking
• Backtracking is needed (If a choice of a production rule does not work, we
backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used. Not efficient. Can be used
for quick and dirty parsing.
• At each derivation step it expands the RHS of a production from left to right.
• A grammar with right recursion is suitable for this and does not enter an infinite loop while making predictions.
• Why the name recursive-descent ?
The parser is recursive in nature ( recursive derivations ).
Descent because it works from the top down.
Example :
S -> aBc
B -> bc | b ( here it tries bc for B first and then b )
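To make the idea concrete, here is a minimal Python sketch of a backtracking recognizer for this grammar; the function names ( match_B, match_S ) and the way the input is passed as a plain string are illustrative assumptions, not something given in the text.

# Backtracking recursive-descent recognizer (a sketch) for:
#   S -> a B c
#   B -> b c | b

def match_B(s, pos):
    """Return the positions reachable after matching B, in the order the alternatives are tried."""
    results = []
    if s[pos:pos+2] == "bc":    # first guess: B -> b c
        results.append(pos + 2)
    if s[pos:pos+1] == "b":     # alternative used on backtracking: B -> b
        results.append(pos + 1)
    return results

def match_S(s):
    """Recognize S -> a B c, backtracking over the alternatives of B."""
    if not s.startswith("a"):
        return False
    for pos in match_B(s, 1):   # try each choice for B in turn
        if s[pos:] == "c":      # the rest of the input must match the trailing c
            return True         # this guess worked
        # otherwise fall through: backtrack and try the next alternative of B
    return False

print(match_S("abcc"))   # True  (uses B -> b c)
print(match_S("abc"))    # True  (found only after backtracking to B -> b)
print(match_S("abd"))    # False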
Example : leftmost derivation of <id> + <id> * <id> using the right-recursive grammar E => T + E | T, T => V * T | V, V => <id> :
E => T + E
=> V + E
=> <id> + E
=> <id> + T
=> <id> + V * T
=> <id> + <id> * T
=> <id> + <id> * V
=> <id> + <id> * <id>
For predictive parsing, for every right-recursive production rule the same non-terminal must also have a production rule for epsilon ( e ). With left recursion there is a chance of an infinite loop, which makes a correct prediction with k lookahead symbols impossible.
Given a left recursive grammar ( or right recursive grammar ), it can be converted into an equivalent LL(k) grammar, as shown below.
Left factoring : take the common prefix of alternative productions and form a new non-terminal for what follows it. With left factoring, each production ( i.e. each non-terminal ) becomes non-recursive or right recursive; if a production is right recursive then there is also a production for e ( epsilon ).
Examples : how to convert a left recursive grammar into an LL(k) grammar
E => E + T | T
T => T * V | V      ----> left recursive ( not suitable for any top-down parsing )
V => <id>
        |  ( replace left recursion by right recursion )
        v
E => T + E | T
T => V * T | V      ----> right recursive ( suitable for top-down recursive descent parsing )
V => <id>
        |  ( left factor the common prefixes )
        v
E => T E'
E' => + T E' | e     ( note that every recursive production also has a derivation to e (epsilon) )
T => V T'            ----> left-factored LL(k) grammar ( suitable for top-down predictive parsing )
T' => * V T' | e
V => <id>
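As an illustration, a predictive recursive-descent recognizer for this left-factored grammar could be sketched in Python as below; the token list, the helper names ( peek, eat ) and the use of the string "id" for <id> are assumptions made for the example, not part of the original material.

# Predictive recursive-descent recognizer for:
#   E -> T E'    E' -> + T E' | e    T -> V T'    T' -> * V T' | e    V -> id

tokens = ["id", "+", "id", "*", "id"]   # token stream for  id + id * id
pos = 0

def peek():
    return tokens[pos] if pos < len(tokens) else "$"

def eat(expected):
    global pos
    if peek() != expected:
        raise SyntaxError(f"expected {expected}, found {peek()}")
    pos += 1

def E():            # E  -> T E'
    T(); E_prime()

def E_prime():      # E' -> + T E' | e   (decided by looking at one token)
    if peek() == "+":
        eat("+"); T(); E_prime()
    # otherwise take the epsilon production and consume nothing

def T():            # T  -> V T'
    V(); T_prime()

def T_prime():      # T' -> * V T' | e
    if peek() == "*":
        eat("*"); V(); T_prime()

def V():            # V  -> id
    eat("id")

E()
assert peek() == "$"    # the whole input was consumed: the sentence is accepted
print("accepted")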
The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars
that contain left recursion.
LL(1) --> a predictive ( recursive descent ) parser can decide which production to apply by examining only the next 1 token of input.
• The predictive parser which uses the LL(1) grammar is known as LL(1) parser.
Something more about LL(1) parser
○ LL(1) means that
the input is processed left-to-right
a leftmost derivation is constructed
the method uses at most one lookahead token
○ An LL(1) parser is a table-driven parser for LL parsing ( Left-to-right scan, Leftmost derivation ).
○ The '1' in LL(1) indicates that the parser uses a look-ahead of one source symbol, i.e. the prediction is made by examining at most one input token.
Example : the left-factored expression grammar derived above ( E => T E', E' => + T E' | e, T => V T', T' => * V T' | e, V => <id> ) is an LL(1) grammar. A predictive parser for it is driven by a parsing table M and the following algorithm.
Predictive ( table-driven LL(1) ) parsing algorithm :
Input : a string w and a parsing table M for grammar G.
Output : a leftmost derivation of w if w is in L(G), an error indication otherwise.
Initially, the parser is in a configuration in which it has $S on the stack, with S, the start symbol, on top, and w$ in the input buffer; the input pointer ip points to the first symbol of w$.
Repeat
    Let X be the top stack symbol and a the symbol pointed to by ip.
    if X is a terminal or $
        if X = a
            Pop X from the stack and advance ip.
        else
            error()
        end if
    else {X is a nonterminal}
        if M[X, a] = X -> Y1 Y2 ... Yk
            Pop X from the stack and push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top.
            Output the production X -> Y1 Y2 ... Yk.
        else
            error()
        end if
    end if
until X = $ {stack is empty}
Parsing steps :
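Since the table is easiest to understand in action, here is a minimal Python sketch of the table-driven loop above. The table M is hand-derived for the left-factored expression grammar ( E => T E', E' => + T E' | e, T => V T', T' => * V T' | e, V => <id> ); the dictionary layout, token names and function name are illustrative assumptions. It prints the stack and the remaining input at each step, which is exactly the parsing-steps trace.

# Table-driven LL(1) parser sketch (table entries hand-derived for illustration).
M = {
    ("E",  "id"): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"],
    ("E'", "$"):  [],                 # E' -> epsilon
    ("T",  "id"): ["V", "T'"],
    ("T'", "+"):  [],                 # T' -> epsilon
    ("T'", "*"):  ["*", "V", "T'"],
    ("T'", "$"):  [],                 # T' -> epsilon
    ("V",  "id"): ["id"],
}
terminals = {"id", "+", "*", "$"}

def ll1_parse(tokens):
    stack = ["$", "E"]                # $ at the bottom, start symbol on top
    tokens = tokens + ["$"]
    ip = 0
    while True:
        X, a = stack[-1], tokens[ip]
        print(f"stack={stack}  input={tokens[ip:]}")
        if X == "$":                  # stack empty: done
            return a == "$"
        if X in terminals:
            if X == a:
                stack.pop(); ip += 1  # match: pop X and advance ip
            else:
                raise SyntaxError(f"expected {X}, found {a}")
        else:                         # X is a nonterminal: consult the table
            prod = M.get((X, a))
            if prod is None:
                raise SyntaxError(f"no rule for ({X}, {a})")
            stack.pop()
            stack.extend(reversed(prod))  # push Y1..Yk with Y1 on top

print(ll1_parse(["id", "+", "id", "*", "id"]))   # True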
Bottom-up parser : (Construct parse tree “bottom-up” --- from leaves to the root ) As the
name suggests, bottom-up parsing works in the opposite direction from top-down. A top-down parser begins with the start symbol at the top of the parse tree and works downward, applying productions in forward order until it gets to the terminal leaves. A bottom-up parse
starts with the string of terminals itself and builds from the leaves upward, working
backwards to the start symbol by applying the productions in reverse. Along the way, a
bottom-up parser searches for substrings of the working string that match the right side of
some production. When it finds such a substring, it reduces it, i.e., substitutes the left side
nonterminal for the matching right side. The goal is to reduce all the way up to the start
symbol and report a successful parse.
In general, bottom-up parsing algorithms are more powerful than top-down methods, but not
surprisingly, the constructions required are also more complex. It is difficult to write a
bottom-up parser by hand for anything but trivial grammars, but fortunately, there are
excellent parser generator tools like yacc that build a parser from an input specification.
Some features of bottom up parsing
○ Bottom-up parsing always constructs a rightmost derivation ( in reverse )
○ It attempts to build trees upward toward the start symbol.
○ More complex than top-down but efficient
Types of bottom up parser ( 2 types - shift reduce and precedence)
• Shift reduce parser
Shift-reduce parsing is the most commonly used and the most powerful of the bottom-up
techniques. It takes as input a stream of tokens and develops the list of productions used to
build the parse tree, but the productions are discovered in the reverse order of a top-down parser.
Like a table-driven predictive parser, a bottom-up parser makes use of a stack to keep track
of the position in the parse and a parsing table to determine what to do next.
To illustrate stack-based shift-reduce parsing, consider this simplified expression grammar:
S –> E
E –> T | E + T
T –> id | (E)
The shift-reduce strategy divides the string that we are trying to parse into two parts: an
undigested part and a semi-digested part.
The undigested part contains the tokens that are still to come in the input, and the semi-
digested part is put on a stack. If parsing the string v, it starts out completely undigested, so
the input is initialized to v, and the stack is initialized to empty. A shift-reduce parser
proceeds by taking one of three actions at each step:
○ Reduce: If we can find a rule A –> w, and if the contents of the stack are qw for
some q (q may be empty), then we can reduce the stack to qA. We are applying the
production for the nonterminal A backwards. There is also one special case: reducing the
entire contents of the stack to the start symbol with no remaining input means we have
recognized the input as a valid sentence (e.g., the stack contains just w, the input is
empty, and we apply S –> w). This is the last step in a successful parse. The w being
reduced is referred to as a handle.
○ Shift: If it is impossible to perform a reduction and there are tokens remaining in the
undigested input, then we transfer a token from the input onto the stack. This is called a
shift. For example, using the grammar above, suppose the stack contained ( and the input
contained id+id). It is impossible to perform a reduction on ( since it does not match
the entire right side of any of our productions. So, we shift the first character of the input
onto the stack, giving us (id on the stack and +id) remaining in the input.
○ Error: If neither of the two above cases apply, we have an error. If the sequence on the
stack does not match the right-hand side of any production, we cannot reduce. And if
shifting the next input token would create a sequence on the stack that cannot eventually
be reduced to the start symbol, a shift action would be futile. Thus, we have hit a dead
end where the next token conclusively determines the input cannot form a valid
sentence. This would happen in the above grammar on the input id+). The first id would
be shifted, then reduced to T and again to E, next + is shifted. At this point, the stack
contains E+ and the next input token is ). The sequence on the stack cannot be reduced,
and shifting the ) would create a sequence that is not viable, so we have an error.
The general idea is to read tokens from the input and push them onto the stack attempting to
build sequences that we recognize as the right side of a production. When we find a match,
we replace that sequence with the nonterminal from the left side and continue working our
way up the parse tree. This process builds the parse tree from the leaves upward, the inverse
of the top-down parser. If all goes well, we will end up moving everything from the input to
the stack and eventually construct a sequence on the stack that we recognize as a right-hand
side for the start symbol.
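For example, a shift-reduce parse of the input id + id with the grammar above ( S -> E, E -> T | E + T, T -> id | (E) ) proceeds as follows:

Stack        Remaining input    Action
(empty)      id + id $          shift id
id           + id $             reduce T -> id
T            + id $             reduce E -> T
E            + id $             shift +
E +          id $               shift id
E + id       $                  reduce T -> id
E + T        $                  reduce E -> E + T
E            $                  reduce S -> E
S            $                  accept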
LR parsers are table-driven shift-reduce parsers. One advantage of LR parsing is that the grammar does not need to be left-factored or have left recursion removed for LR parsing the way that LL parsing requires. The primary disadvantage is the amount of
work it takes to build the tables by hand, which makes it infeasible to hand-code an LR
parser for most grammars. Fortunately, there are LR parser generators that create the parser
from an unambiguous CFG specification. The parser tool does all the tedious and complex
work to build the necessary tables and can report any ambiguities or language constructs that
interfere with the ability to parse it using LR techniques. Rather than reading and shifting
tokens onto a stack, an LR parser pushes "states" onto the stack; these states describe what is
on the stack so far.
An LR parser uses two tables:
1. The action table : Action[s,a] tells the parser what to do when the state on top of the
stack is s and terminal a is the next input token. The possible actions are to shift a state onto
the stack, to reduce the handle on top of the stack, to accept the input, or to report an error.
2. The goto table : Goto[s,X] indicates the new state to place on top of the stack after a
reduction of the nonterminal X while state s is on top of the stack.
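To make the two tables concrete, here is a minimal Python sketch of the LR driver loop for the tiny grammar S -> S + id | id. The state numbers and the ACTION/GOTO entries are hand-built for this illustration ( they are not taken from the text ), and a reduce entry records the left-hand side together with the length of the right-hand side to pop.

# LR shift-reduce driver sketch with hand-built ACTION/GOTO tables.
ACTION = {
    (0, "id"): ("shift", 2),
    (1, "+"):  ("shift", 3),
    (1, "$"):  ("accept", None),
    (2, "+"):  ("reduce", ("S", 1)),   # S -> id       (pop 1 symbol)
    (2, "$"):  ("reduce", ("S", 1)),
    (3, "id"): ("shift", 4),
    (4, "+"):  ("reduce", ("S", 3)),   # S -> S + id   (pop 3 symbols)
    (4, "$"):  ("reduce", ("S", 3)),
}
GOTO = {(0, "S"): 1}

def lr_parse(tokens):
    stack = [0]                       # the stack holds states, not symbols
    tokens = tokens + ["$"]
    ip = 0
    while True:
        s, a = stack[-1], tokens[ip]
        act = ACTION.get((s, a))
        if act is None:
            raise SyntaxError(f"no action for state {s}, token {a}")
        kind, arg = act
        if kind == "shift":
            stack.append(arg); ip += 1
        elif kind == "reduce":
            lhs, rhs_len = arg
            del stack[len(stack) - rhs_len:]      # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])  # push GOTO[state, lhs]
        else:                                     # accept
            return True

print(lr_parse(["id", "+", "id"]))   # True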
LR Parser Types
There are three types of LR parsers: LR(k), simple LR(k), and lookahead LR(k)
( abbreviated LR(k), SLR(k), and LALR(k) ). The k identifies the number of tokens of
lookahead. We will usually only concern ourselves with 0 or 1 tokens of lookahead, but the
techniques do generalize to k > 1.
Here are some widely used LR parsers based on value of k.
○ LR(0) - No lookahead symbol
○ SLR(1) - Simple with one lookahead symbol
○ LALR(1) - Lookahead LR with one lookahead symbol; not as powerful as full LR(1) but simpler to implement
○ LR(1) - Canonical LR with one lookahead symbol. Almost all unambiguous CFGs can be parsed with LR(1). The drawback of adding the lookahead is that
the algorithm becomes somewhat more complex and the parsing table gets much, much
bigger. The full LR(1) parsing table for a typical programming language has many thousands
of states compared to the few hundred needed for LR(0). A compromise in the middle is
found in the two variants SLR(1) and LALR(1) which also use one token of lookahead but
employ techniques to keep the table as small as LR(0). SLR(k) is an improvement over
LR(0) but much weaker than full LR(k) in terms of the number of grammars for which it is
applicable. LALR(k) parses a larger set of languages than SLR(k) but not quite as many as
LR(k). LALR(1) is the method used by the yacc parser generator.
• Precedence parser
○ Simple precedence parser
○ Operator-precedence parser
○ Extended precedence parser
A compiler can be checked by compiling not only test programs but also the compiler itself; this is a comprehensive consistency check, as the compiler should be able to reproduce its own object code. Earlier versions are written for a subset of the language; the compiler then compiles itself and is incrementally completed.
1. Explain lexical analysis. Which tool can be used to generate the lexical analyzer ? Explain the tool briefly.
2. Explain the various tasks performed during lexical analysis. Also explain the relevance of regular expressions in lexical analysis.
3. What is a context free grammar ? Write down a CFG for the for loop of the 'C' language.
4. What is a symbol table ?
An essential function of a compiler is to record the identifiers used in a program and the relevant information about their attributes : type, scope and, in the case of a procedure or function name, the number and types of the arguments and the return type. A symbol table is a table containing a record for each identifier, with fields for the attributes of that identifier. This table is used by all the phases of the compiler, both to access the data and to report errors.
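A minimal sketch of such a table in Python, using a plain dictionary keyed by identifier name; the field names ( kind, type, scope, args, returns ) are only illustrative :

symbol_table = {}

def declare(name, kind, typ, scope, extra=None):
    # One record per identifier; later compiler phases look the entry up by name.
    symbol_table[name] = {"kind": kind, "type": typ, "scope": scope, **(extra or {})}

declare("count", "variable", "int", "global")
declare("area", "function", "float", "global",
        {"args": ["float"], "returns": "float"})

print(symbol_table["area"]["returns"])   # prints: float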
5. Generate parse trees for the following sentences, based on the standard arithmetic CFG :
a - b * c
a + b * c - d / ( e * f )
a + b * c - d + e - f / ( g + h )
a + b * c / d + e - f
a / b + c * d + e - f
9 * 7 + 5 - 2
Use the following grammar if none is given :
E => E + T | E - T | T
T => T * V | T / V | V
V => <id> | ( E )
a-b*c
First of all, find the starting prediction. ( Always start from the lowest-precedence operator in the input string and work towards the highest : the lowest-precedence operator is applied last, so it sits nearest the root of the parse tree. )
E => E - T
=> T-T
=> V-T
=> <id> - T
=> <id> - T*V
=> <id> - V*V
=> <id> - <id>*V
=> <id> - <id> * <id>
a + b * c - d / ( e * f )   ( there are two operators at the lowest precedence, + and -; since they are left-associative, choose the one at the rightmost side, i.e. -, as the root )
E => E - T
=> E + T - T
=> T + T - T
=> V + T - T
=> <id> + T - T
=> <id> + T * V - T
=> <id> + V * V - T
=> <id> + <id> * V - T
=> <id> + <id> * <id> - T
=> <id> + <id> * <id> - T/V
=> <id> + <id> * <id> - V/V
=> <id> + <id> * <id> - <id>/V
=> <id> + <id> * <id> - <id>/(E)
=> <id> + <id> * <id> - <id>/(T)
=> <id> + <id> * <id> - <id>/(T*V)
=> <id> + <id> * <id> - <id>/(V*V)
=> <id> + <id> * <id> - <id>/(<id>*V)
=> <id> + <id> * <id> - <id>/(<id>*<id>)