Compilers - Week 4
-With only one token of lookahead, if the parser knows that the next terminal
in the input stream is an int, it still can't tell which production to use,
because two production rules begin with int.
-Likewise, the parser can't decide on a production rule for E, since
both alternatives begin with the nonterminal T.
-To solve this problem, the grammar is left factored.
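As a rough illustration of left factoring, here is a minimal Python sketch. It assumes the grammar E -> T + E | T and T -> int * T | int | (E) that appears later in these notes. The function names and the convention of naming the new nonterminal by appending a prime are hypothetical, and the sketch performs a single factoring pass only (it does not factor the newly created nonterminals again).

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, alternatives):
    """One pass of left factoring for a single nonterminal.

    Alternatives are lists of symbols; [] stands for epsilon.
    New nonterminals are named nt', nt'', ... (a hypothetical convention).
    """
    # Group the alternatives by their first symbol.
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {nt: []}
    fresh = nt
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)      # nothing to factor here
            continue
        prefix = common_prefix(group)
        fresh += "'"
        result[nt].append(prefix + [fresh])
        result[fresh] = [alt[len(prefix):] for alt in group]
    return result

factored_E = left_factor("E", [["T", "+", "E"], ["T"]])
factored_T = left_factor("T", [["int", "*", "T"], ["int"], ["(", "E", ")"]])
```

After factoring, E has the single alternative T E' and the choice between "+ E" and epsilon is postponed to E', so one token of lookahead is enough to decide.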
Code:
Initialize stack = <S $> and next
repeat
    case stack of
        <X, rest> : if T[X, *next] = Y1 ... Yn
                        then stack <- <Y1 ... Yn rest>;
                        else error();
        <t, rest> : if t == *next++
                        then stack <- <rest>;
                        else error();
until stack == <>
Notes:
i- the stack initially contains the start nonterminal and the
dollar sign.
ii- the dollar sign marks the bottom of the stack and the end of the
input.
Note: whenever a nonterminal is on top of the stack, it is the
leftmost nonterminal that still has to be expanded.
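The table-driven loop above could be sketched in Python as follows. The grammar and the parsing table here are assumptions made for illustration (a left-factored expression grammar E -> T X, X -> + E | ε, T -> ( E ) | int Y, Y -> * T | ε); they are not taken from the notes. The name ll1_parse is hypothetical too.

```python
# Hypothetical LL(1) table: (nonterminal, lookahead) -> right-hand side.
# An empty list stands for an epsilon production; a missing key is an error.
TABLE = {
    ("E", "int"): ["T", "X"],
    ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],
    ("X", ")"): [],
    ("X", "$"): [],
    ("T", "int"): ["int", "Y"],
    ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],
    ("Y", "+"): [],
    ("Y", ")"): [],
    ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted, False on a parsing error."""
    tokens = list(tokens) + ["$"]      # $ marks the end of the input
    pos = 0
    stack = ["$", start]               # top of the stack is the end of the list
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:            # blank table entry
                return False
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends up on top
        else:
            if top != tokens[pos]:     # terminal on the stack must match input
                return False
            pos += 1
    return pos == len(tokens)
```

For example, ll1_parse(["int", "*", "int"]) is accepted, while ll1_parse(["int", "+"]) fails because the entry for E on $ is blank.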
Example:
If X is a terminal:
The First set of a terminal contains only the terminal
itself.
If X is a nonterminal:
i- and X -> ε, then ε ∈ First(X)
ii- and X goes to a sequence of other nonterminals
that can all derive ε, then ε ∈ First(X)
iii- and X goes to a sequence of other nonterminals
that can all derive ε, followed by another nonterminal or a
terminal α, then First(α) ⊆ First(X)
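The rules above can be turned into a fixed-point computation. The following Python sketch is one possible way to do it; the grammar used as an example (E -> T X, X -> + E | ε, T -> ( E ) | int Y, Y -> * T | ε) is an assumption for illustration, as is the dictionary representation of the grammar.

```python
EPS = "ε"

def first_sets(grammar):
    """grammar maps each nonterminal to a list of right-hand sides.

    Each right-hand side is a list of symbols; [] stands for epsilon.
    Any symbol that is not a key of grammar is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until no set grows
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    # First of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break            # later symbols cannot come first
                if all_nullable:         # every symbol (or none) derives epsilon
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

# Hypothetical example grammar.
GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
FIRST = first_sets(GRAMMAR)
```

The loop stops once a full pass adds nothing new, which has to happen because the sets can only grow and there are finitely many symbols.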
iii- follow sets
Follow(X) = { t | S ->* β X t δ }
We say that t is in the Follow set of X if there is some derivation in which
the terminal t appears immediately after the symbol X.
Consider the grammar:
S -> Xt
X -> AB
i- First(B) - {ε} ⊆ Follow(A)
ii- Follow(X) ⊆ Follow(B) where X is on the left of the
production, B is the rightmost terminal or nonterminal on
the right side of the production
S -> Xt -> ABt
iii- Follow(X) ⊆ Follow(A) if B ->* ε
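The three rules can be applied repeatedly until no Follow set changes. The Python sketch below does this for the grammar S -> Xt, X -> AB from the notes. The productions A -> a and B -> b | ε are added as assumptions so that the example is complete, and all function names are hypothetical.

```python
EPS, END = "ε", "$"

def first_of_string(symbols, first, grammar):
    """First set of a string of symbols, given the nonterminals' First sets."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                      # every symbol can derive epsilon
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)               # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue         # Follow is only defined for nonterminals
                    tail = first_of_string(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}   # rule i
                    if EPS in tail:      # rules ii and iii: the rest can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

# S -> X t and X -> A B are from the notes; A and B are assumed.
GRAMMAR = {"S": [["X", "t"]], "X": [["A", "B"]], "A": [["a"]], "B": [["b"], []]}
FOLLOW = follow_sets(GRAMMAR, "S")
```

With these assumed productions, B can derive ε, so Follow(A) picks up both First(B) - {ε} and Follow(X).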
Notes:
i- the dollar sign is in the follow set of the starting symbol.
ii- Epsilons never appear in follow sets, so follow sets are
just sets of terminals.
iii- If Follow(X) ⊆ Follow(E) and Follow(E) ⊆ Follow(X), then
Follow(E) = Follow(X).
iv- Blank entries represent parsing errors
       a        b          $
S               b / Sa
Here we have a multiply defined entry since for the nonterminal
S, we can obtain b in the first position by using either of these
two production rules.
-if S is our leftmost nonterminal, and b is our next input symbol,
this table doesn't tell us exactly what move to make. (it’s not
deterministic)
-if any entry in the table is multiply defined, then the grammar is
not LL(1).
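The check for multiply defined entries can be sketched in Python. The example assumes the grammar behind the table above is S -> Sa | b, and it takes the First set of each right-hand side and the Follow sets as hand-computed inputs (First(Sa) = {b}, First(b) = {b}, Follow(S) = {a, $}). The function names are hypothetical.

```python
from collections import defaultdict

EPS = "ε"

def build_table(productions, first_of_rhs, follow):
    """Fill T[A, t] with every production A -> alpha that applies on t."""
    table = defaultdict(list)
    for lhs, rhs in productions:
        rhs_first = first_of_rhs[(lhs, rhs)]
        lookaheads = rhs_first - {EPS}
        if EPS in rhs_first:             # alpha can vanish: use Follow(A)
            lookaheads |= follow[lhs]
        for t in lookaheads:
            table[(lhs, t)].append(rhs)
    return dict(table)

def conflicts(table):
    """Entries holding more than one production."""
    return {cell: rhss for cell, rhss in table.items() if len(rhss) > 1}

# Assumed grammar S -> S a | b, with hand-computed First and Follow sets.
PRODUCTIONS = [("S", ("S", "a")), ("S", ("b",))]
FIRST_OF_RHS = {("S", ("S", "a")): {"b"}, ("S", ("b",)): {"b"}}
FOLLOW = {"S": {"a", "$"}}

TABLE = build_table(PRODUCTIONS, FIRST_OF_RHS, FOLLOW)
CONFLICTS = conflicts(TABLE)
is_ll1 = not CONFLICTS
```

Here the only filled entry is T[S, b], and it holds both productions, so the grammar is reported as not LL(1).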
Grammars that are guaranteed to not be LL(1):
-Not left factored
-Left recursive
-Ambiguous
-grammars that require more than one token of look ahead
Notes:
-the above list is not complete.
-just because a grammar is left factored, not left
recursive, and unambiguous doesn't guarantee that it's
LL(1). The only way to know for sure is to construct
the parsing table and check that no entry is
multiply defined.
-The grammars that describe most programming
languages are not LL(1), since LL(1) grammars are too
weak to capture all of the interesting and
important constructs in commonly used
programming languages.
VII- bottom-up parsing
-Although bottom-up parsing builds on the ideas in top-down parsing,
it’s more general and just as efficient.
-an important advantage of bottom-up parsing is that it doesn't require
a left-factored grammar, although it still can't deal with
ambiguous grammars.
-bottom-up parsing reduces the input string of tokens to the start
symbol by inverting productions (reductions).
-When we do a reduction we replace the children (right hand side) of
some production by its left hand side (the parent).
An interesting fact:
A bottom-up parser traces a right-most derivation in reverse
(using reductions instead of productions).
VIII- shift-reduce parsing (shift and reduce are the only two moves used by a bottom-up parser)
Let αβω be a step of a bottom-up parse, and assume the next reduction is
by X -> β. Then ω is a string of terminals, because a bottom-up parser
traces a rightmost derivation in reverse, which means X has to be the
rightmost nonterminal; that is, there are no nonterminals to the right
of X.
Note:
-those terminal symbols represented by ω to the right of the
right most non-terminal are exactly the unexamined input.
-a vertical bar is placed between the examined substring of the
input -which contains both terminals and nonterminals- and the
unexamined substring of the input -which contains terminals
only.
i- shift moves
A shift move reads one token of input, which can be
represented by moving the vertical bar one token to the right.
ii- reduce moves
A reduce move applies an inverse production at the right end
of the string on the left of the vertical bar.
Note:
it turns out that the left string can be implemented by a
stack, and that's because we only do reduce operations
immediately to the left of the vertical bar, so it's always
some suffix of the string to the left of the vertical bar
where the reduction is happening.
In short:
A shift move pushes a terminal that has been read onto the stack,
while a reduce move pops some number of symbols - that
represent the right hand side of a production rule- off of the
stack, and pushes a nonterminal -that represents the left hand
side of the production rule- onto the stack.
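To make the stack view concrete, the Python sketch below replays a scripted sequence of shift and reduce moves and records each configuration as "stack | unexamined input". It does not decide which move to make; the move list is supplied by hand. The example assumes the input ( int ) and the grammar E -> T + E | T, T -> int * T | int | (E) used elsewhere in these notes. The function name run_moves is hypothetical.

```python
def run_moves(tokens, moves):
    """Replay moves and return one 'stack | input' snapshot per step.

    Each move is either the string "shift" or a pair (lhs, rhs) meaning
    "reduce by lhs -> rhs", where rhs is a list of symbols.
    """
    stack, rest = [], list(tokens)

    def snapshot():
        return f"{' '.join(stack)} | {' '.join(rest)}".strip()

    trace = [snapshot()]
    for move in moves:
        if move == "shift":
            stack.append(rest.pop(0))        # push the next input token
        else:
            lhs, rhs = move
            cut = len(stack) - len(rhs)
            if cut < 0 or stack[cut:] != list(rhs):
                raise ValueError("right-hand side is not on top of the stack")
            del stack[cut:]                  # pop the right-hand side
            stack.append(lhs)                # push the left-hand side
        trace.append(snapshot())
    return trace

MOVES = [
    "shift",                    # (
    "shift",                    # int
    ("T", ["int"]),
    ("E", ["T"]),
    "shift",                    # )
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
]
TRACE = run_moves(["(", "int", ")"], MOVES)
```

Every reduction touches only the top of the stack, which is why a plain list with append and slice deletion is enough here.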
Items of a production:
An item is a production with a “.” somewhere on the RHS.
Consider the following production:
T -> (E)
Possible items for this production are:
T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).
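Enumerating the items of a production amounts to placing the dot at every position of the right-hand side. A minimal Python sketch follows; the tuple representation (lhs, rhs, dot position) and the function names are assumptions made for illustration.

```python
def lr0_items(lhs, rhs):
    """All items of lhs -> rhs, one per possible dot position."""
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]

def show(item):
    """Render an item in the notation used above, e.g. T -> (.E)"""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"

ITEMS = [show(item) for item in lr0_items("T", ["(", "E", ")"])]
EPSILON_ITEMS = [show(item) for item in lr0_items("X", [])]
```

A production with n symbols on its right-hand side has n + 1 items, so an ε production has exactly one.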
Notes:
-the only item for a production X -> ε is X -> .
-items of this kind are referred to as LR(0) items.
-in any successful parse, what is on the stack has to always
be a prefix of the right hand side of some production or
productions.
Consider the input (int), and the following production rules:
E -> T + E | T
T -> int * T | int | (E)
-(E is a prefix of the RHS of T -> (E), and it will be
reduced after the next shift.
- The item T -> (E.) records the fact that we're working on the
production T -> (E): so far we've seen (E, and we're
hoping to see a ).
Notes:
-M is the machine and G is our grammar.
-S|$ means all the input is gone and we've reduced the
entire input to the start symbol.
-checking whether M rejects α is not necessary, since a parsing
error is reported if neither a shift nor a reduce move can
take place, which already implies that we will never form an
invalid stack.
-"M accepts α with items I" means that I is the set of items in the
DFA state that M reaches after reading α.
-If there is a conflict in the last step, the grammar is not SLR(k),
where k is the amount of lookahead (1 in practice).
😁
CHECK OUT THE TWO EXAMPLES WRITTEN IN THE NOTEBOOK
And with that, the compiler's season comes to an end! Good luck!