Context Free Grammar
nonterminal: N = { S }
Example productions:
aba*:
S -> abA
A -> aA | ^
ab*a:
S -> aBa
B -> bB | ^
S -> aSb | acb
Claim: L1 = L2.
Proof:
We first show that L2 ⊆ L1.
Consider a^n ∈ L2 for n >= 1. We can generate a^n by using the first production n times, and
then the second production.
We can generate ^ ∈ L2 by using the second production only.
Hence L2 ⊆ L1.
We now show that L1 ⊆ L2.
Since a is the only terminal, the CFG can only produce strings consisting solely of a's.
Thus, L1 ⊆ L2.
Note that:
Two types of arrows:
    -> used in statement of productions
    => used in derivation of string
In the above derivation of a^4, there were many unfinished stages that consisted of both
terminals and nonterminals. These are called working strings.
^ is neither a nonterminal (since it cannot be replaced with something else) nor a
terminal (since it disappears from the string).
Example: terminals: { a, b }
nonterminals: S
productions:
S -> aS
S -> bS
S -> a
S -> b
More compact notation:
S -> aS | bS | a | b
Can produce the string abbab as follows:
S => aS
=> abS
=> abbS
=> abbaS
=> abbab
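The derivation above follows a fixed pattern: each letter except the last is produced by S -> aS or S -> bS, and the last letter by S -> a or S -> b. A minimal Python sketch of that pattern follows; the function name `derive` is hypothetical and is not part of these notes.

```python
def derive(word):
    """Return the working strings of a derivation of `word` in S -> aS | bS | a | b."""
    if not word or any(ch not in "ab" for ch in word):
        raise ValueError("word must be a nonempty string over {a, b}")
    steps = ["S"]
    prefix = ""
    for i, ch in enumerate(word):
        prefix += ch
        last = (i == len(word) - 1)
        # S -> aS or S -> bS for every letter but the last;
        # S -> a or S -> b for the last letter.
        steps.append(prefix if last else prefix + "S")
    return steps


print(" => ".join(derive("abbab")))
# S => aS => abS => abbS => abbaS => abbab
```

Every working string before the last one ends in the single nonterminal S, which is why the derivation has exactly one step per letter.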
Let L1 be the CFL, and let L2 be the language generated by the
regular expression (a + b)+.
Claim: L1 = L2.
Proof:
First we show that L2 ⊆ L1.
Consider any string w ∈ L2.
Read letters of w from left to right.
For each letter read in, if it is not the last, then
Trees
Can use a tree to illustrate how a string is derived from a CFG.
Definition: These trees are called syntax trees, parse trees, generation trees,
production trees, or derivation trees.
Example: CFG:
terminals: a, b
nonterminals: S, A
productions:
S -> AAA | A
A -> AA | aA | Ab | a | b
String abaaba has the following derivation:
S => AAA
=> aAAA
=> abAA
=> abAbA
=> abaAbA
=> abaabA
=> abaaba
which corresponds to the following derivation tree:
          S
        / | \
       /  |  \
      A   A   A
     / \  |\  |
    a   A A b a
        | |\
        b a A
            |
            a
Example: CFG:
terminals: +, *, 0, 1, 2, ..., 9
nonterminals: S
productions:
S -> S + S | S * S | 0 | 1 | 2 | ... | 9
Consider the expression 2 * 3 + 4.
Ambiguous how to evaluate this:
Does this mean (2 * 3) + 4 = 10 or 2 * (3 + 4) = 14 ?
Can eliminate ambiguity by examining the two possible derivation trees
        S                S
      / | \            / | \
     S  +  S          S  *  S
   / | \   |          |   / | \
  2  *  3  4          2  3  +  4
Eliminate the Ss as follows:
      +              *
     / \            / \
    *   4          2   +
   / \                / \
  2   3              3   4
Note that we can construct a new notation for mathematical expressions:
start at top of tree
walk around tree keeping left hand touching tree
first time hit each terminal, print it out.
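The walk described above is a preorder traversal, and the notation it prints is prefix notation. The sketch below assumes a hypothetical encoding in which an operator node is a tuple `(operator, left, right)` and a digit is a one-character string; the names `prefix` and `evaluate` are illustrative only.

```python
def prefix(tree):
    """Walk the tree from the top, left to right, emitting each symbol on first visit."""
    if isinstance(tree, str):            # a leaf: a digit
        return [tree]
    op, left, right = tree               # an operator node
    return [op] + prefix(left) + prefix(right)


def evaluate(tree):
    """Evaluate an operator tree built from + and *."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r


tree1 = ("+", ("*", "2", "3"), "4")      # (2 * 3) + 4
tree2 = ("*", "2", ("+", "3", "4"))      # 2 * (3 + 4)
print(" ".join(prefix(tree1)), "=", evaluate(tree1))   # + * 2 3 4 = 10
print(" ".join(prefix(tree2)), "=", evaluate(tree2))   # * 2 + 3 4 = 14
```

The two trees give different prefix strings, so the prefix form needs no parentheses to tell the two readings of 2 * 3 + 4 apart.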
Definition: A CFG is ambiguous if for at least one string in its CFL there are two possible
derivations of the string that correspond to two different syntax trees.
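For the arithmetic grammar S -> S + S | S * S | 0 | ... | 9, the number of syntax trees of a given string can be counted directly, which gives a concrete check of the definition. This is a sketch under the assumption that the input is written without spaces; `count_trees` is a hypothetical helper name.

```python
from functools import lru_cache


def count_trees(expr):
    """Count the syntax trees of `expr` in S -> S+S | S*S | 0 | 1 | ... | 9."""
    @lru_cache(maxsize=None)
    def count(i, j):                     # trees deriving expr[i:j] from S
        if j - i == 1:
            return 1 if expr[i] in "0123456789" else 0
        total = 0
        for k in range(i + 1, j - 1):    # k: position of the operator at the root
            if expr[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(expr)) if expr else 0


print(count_trees("2*3+4"))   # 2
```

A count greater than 1 for some string shows that the grammar is ambiguous; a count of 1 for one string alone says nothing about the grammar as a whole.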
Example: PALINDROME
Terminals : a, b
Nonterminals : S
Productions :
S -> aSa | bSb | a | b | ^
Can generate the string babbab as follows:
S => bSb
=> baSab
=> babSbab
=> babbab
which has derivation tree:
      S
     /|\
    b S b
     /|\
    a S a
     /|\
    b S b
      |
      ^
Can show that this CFG is unambiguous.
Definition: For a given CFG, the total language tree is the tree with root S, whose
children are all the productions of S, whose second descendants are all the working
strings that can be constructed by applying one production to the leftmost nonterminal in
each of the children, and so on.
Example:
Terminals : a, b
Nonterminals : S, X
Productions :
S -> aX | Xa | aXbXa
X -> ba | ab
This CFG has total language tree as follows:
                          S
            /             |             \
          aX              Xa            aXbXa
         /  \            /  \          /     \
      aba    aab      baa    aba   ababXa    aabbXa
                                   /   \      /   \
                           ababbaa abababa aabbbaa aabbaba
The CFL is finite.
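The total language tree of this example can be generated by repeatedly expanding the leftmost nonterminal of each working string. The sketch below is one possible way to do that, assuming single-character nonterminals; the node limit is an added safeguard (not part of the definition), since the tree is infinite for many grammars.

```python
from collections import deque

PRODUCTIONS = {"S": ["aX", "Xa", "aXbXa"], "X": ["ba", "ab"]}


def total_language(productions, start="S", limit=1000):
    """Breadth-first walk of the total language tree; returns the words at its leaves."""
    words, queue, visited = [], deque([start]), 0
    while queue:
        visited += 1
        if visited > limit:              # safeguard against infinite trees
            raise RuntimeError("node limit reached")
        working = queue.popleft()
        # Position of the leftmost nonterminal, or None if only terminals remain.
        pos = next((i for i, ch in enumerate(working) if ch in productions), None)
        if pos is None:
            words.append(working)        # a leaf: a word of the language
            continue
        for rhs in productions[working[pos]]:
            queue.append(working[:pos] + rhs + working[pos + 1:])
    return words


words = total_language(PRODUCTIONS)
print(len(words), len(set(words)))   # 8 7
print(sorted(set(words)))
```

The tree has 8 leaves but only 7 distinct words, because aba is produced along two different branches.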
Example:
FA: [diagram not reproduced; start state S, final states A and C]
productions:
S -> aS | bA
A -> aC | bB | ^
B -> aB | bC
C -> aA | bB | ^
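The productions in this example have the shape one would get by writing one production N -> xM for each arc from state N to state M labeled x, plus N -> ^ for each final state N. The sketch below illustrates that construction. The transition table is an assumption: it is read back from the productions (with the last alternative of A and C taken to be the empty string ^), not taken from the FA diagram.

```python
def fa_to_grammar(transitions, finals):
    """One production N -> xM per arc N --x--> M, plus N -> ^ for each final state N."""
    productions = {}
    for state, arcs in transitions.items():
        rhs = [letter + target for letter, target in sorted(arcs.items())]
        if state in finals:
            rhs.append("^")
        productions[state] = rhs
    return productions


# Assumed transition table (state -> {letter: next state}).
TRANSITIONS = {
    "S": {"a": "S", "b": "A"},
    "A": {"a": "C", "b": "B"},
    "B": {"a": "B", "b": "C"},
    "C": {"a": "A", "b": "B"},
}

grammar = fa_to_grammar(TRANSITIONS, finals={"A", "C"})
for state, rhs in grammar.items():
    print(state, "->", " | ".join(rhs))
```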
Consider a CFG G = (E, Γ, R, S), where
E is the set of terminals,
Γ is the set of nonterminals,
S ∈ Γ is the starting nonterminal, and
R ⊆ Γ × (E + Γ)* is the set of productions,
where a production (N, U) ∈ R with N ∈ Γ and U ∈ (E + Γ)* is written as N -> U.
Definition: For a given CFG G = (E, Γ, R, S), W is a semiword if W ∈ E*Γ;
i.e., W is a string of terminals (maybe none) concatenated with exactly one nonterminal
(on the right).
Right RG:
Ni -> wNj, where w ∈ E*
Left RG:
Ni -> Njw, where w ∈ E*
A regular grammar can be either right regular or left regular, but not both. Each of Ni
and Nj is a single nonterminal, and they could be the same; that is, Ni = Nj is
allowed.
Theorem 22 If a CFG is a regular grammar, then the language generated by this CFG is
regular.
Proof.
We will prove the theorem by showing that there is a TG that accepts the language generated
by the CFG.
Suppose CFG is as follows:
N1 -> w1M1
N2 -> w2M2
...
Nn -> wnMn
Nn+1 -> wn+1
Nn+2 -> wn+2
...
Nn+m -> wn+m
where Ni and Mi are nonterminals (not necessarily distinct) and wi ∈ E* are strings of
terminals. Thus, wiMi is a semiword. At least one of the Ni is S. Assume that N1 = S.
Create a state of the TG for each nonterminal Ni and for each nonterminal Mj .
Also create a state +.
Make the state for nonterminal S the initial state of the transition graph.
Draw an arc labeled with wi from state Ni to state Mi if and only if there is a
production Ni -> wiMi.
Draw an arc labeled with wi from state Ni to state + if and only if there is a
production Ni -> wi.
Thus, we have created a TG.
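The construction in this proof can be sketched in a few lines. The encoding below is an assumption made for illustration: a production Ni -> wiMi is the triple `(Ni, wi, Mi)`, a production Ni -> wi is `(Ni, wi, None)`, and the sample grammar is S -> aB | bA | abA | baB, A -> abaA | bb, B -> baA | ab. The function names are hypothetical.

```python
FINAL = "+"


def grammar_to_tg(productions):
    """One arc per production: Ni --wi--> Mi, or Ni --wi--> + when there is no Mi."""
    return [(n, w, m if m is not None else FINAL) for n, w, m in productions]


def accepts(arcs, word, start="S"):
    """Search for a path from `start` to + whose arc labels spell out `word`."""
    stack, visited = [(start, 0)], set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in visited:
            continue
        visited.add((state, pos))
        if state == FINAL and pos == len(word):
            return True
        for src, label, dst in arcs:
            if src == state and word.startswith(label, pos):
                stack.append((dst, pos + len(label)))
    return False


RULES = [
    ("S", "a", "B"), ("S", "b", "A"), ("S", "ab", "A"), ("S", "ba", "B"),
    ("A", "aba", "A"), ("A", "bb", None),
    ("B", "ba", "A"), ("B", "ab", None),
]
tg = grammar_to_tg(RULES)
print(accepts(tg, "aab"), accepts(tg, "ab"))   # True False
```

For example, aab is accepted along the path S, B, + (arcs a and ab), which mirrors the derivation S => aB => aab.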
Remarks:
All regular languages can be generated by some regular grammar (Theorem 21).
Every regular grammar generates a regular language (Theorem 22).
A regular language may have many CFGs that generate it, and some of those
CFGs may not be regular grammars.
Example: CFG
productions:
S -> aB | bA | abA | baB
A -> abaA | bb
B -> baA | ab
Example: CFG:
S -> XY
X -> Yb | Xa | aa | YY
Y -> XbbX | ab
The word abbaaabbabab has the following derivation tree:
              S
          /       \
      X               Y
     / \           / / \ \
    Y   b       X   b   b   X
   / \         / \         / \
  a   b       X   a      Y     Y
             / \        / \   / \
            a   a      a   b a   b
Note that if we walk around the tree starting down the left branch of the root with our left
hand always touching the tree, then the order in which we first visit each nonterminal
corresponds to the order in which the nonterminals are replaced in the leftmost derivation (LMD).
This is true for any derivation in any CFG.
Theorem 27 Any word that can be generated by a given CFG by some derivation
also has a LMD.
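The idea behind the theorem can be illustrated by reading an LMD directly off a derivation tree: always expand the leftmost nonterminal of the working string using the children that node has in the tree. The sketch below assumes a hypothetical encoding in which a nonterminal node is a pair `(symbol, children)` and a terminal is a plain string; it uses the derivation tree of abbaaabbabab in the CFG S -> XY, X -> Yb | Xa | aa | YY, Y -> XbbX | ab.

```python
def label(node):
    """Symbol shown in the working string for a tree node."""
    return node if isinstance(node, str) else node[0]


def leftmost_derivation(tree):
    """Return the working strings of the LMD that corresponds to a derivation tree."""
    frontier = [tree]                    # the working string, as tree nodes
    steps = ["".join(label(n) for n in frontier)]
    while True:
        # Index of the leftmost nonterminal node, or None when only terminals remain.
        pos = next((i for i, n in enumerate(frontier) if not isinstance(n, str)), None)
        if pos is None:
            return steps
        frontier[pos:pos + 1] = frontier[pos][1]   # replace the node by its children
        steps.append("".join(label(n) for n in frontier))


TREE = ("S", [
    ("X", [("Y", ["a", "b"]), "b"]),
    ("Y", [
        ("X", [("X", ["a", "a"]), "a"]),
        "b", "b",
        ("X", [("Y", ["a", "b"]), ("Y", ["a", "b"])]),
    ]),
])
print(" => ".join(leftmost_derivation(TREE)))
```

Any derivation of a word determines such a tree, and the tree in turn determines exactly one leftmost derivation.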
Once a blank is encountered on the INPUT TAPE, all of the following cells also
contain blanks.
Read the TAPE one cell at a time, from left to right. Cannot go back.
START, ACCEPT, and REJECT states.
Once we enter either the ACCEPT or REJECT state, we cannot ever leave.
READ state to read an input letter from the INPUT TAPE.
Also, have an infinitely tall PUSHDOWN STACK, which has Last-In-First-Out
(LIFO) discipline.
Always start with the STACK empty.
The STACK can hold letters of the STACK alphabet (which can be the same as the input
alphabet) and blanks.
Figure 2: [diagram not reproduced]
Figure 3: A PDA [diagram not reproduced]
A pushdown automaton for { a^n b^n : n >= 0 } is given below in Figure 4. Note
that X is used as a stack symbol; that is, when an a is read from the input, X is
pushed onto the stack, and when a b is read from the input, X is popped off the
stack. Alternatively, a could have been used as the stack symbol.
Figure 4. A Pushdown Automaton
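The behavior described for this PDA (push X for each a, pop X for each b) can be simulated with an ordinary list as the STACK. This is a sketch of that behavior only, not a transcription of Figure 4; in particular, the `seen_b` flag is an assumption standing in for whatever states the figure uses to forbid an a after a b.

```python
def accepts_anbn(word):
    """Simulate a PDA for { a^n b^n : n >= 0 } that pushes X per a and pops X per b."""
    stack = []
    seen_b = False
    for ch in word:
        if ch == "a":
            if seen_b:                   # an a after a b: REJECT
                return False
            stack.append("X")            # PUSH X
        elif ch == "b":
            seen_b = True
            if not stack:                # POP on an empty STACK: REJECT
                return False
            stack.pop()                  # POP X
        else:
            return False                 # letter outside the input alphabet
    return not stack                     # input exhausted: ACCEPT only if STACK is empty


for w in ["", "ab", "aabb", "aab", "abb", "abab"]:
    print(repr(w), accepts_anbn(w))
```

The stack does the counting that a finite automaton cannot: its height after the a's is n, and each b must remove exactly one X.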
PDA:
Input alphabet E = {a, b, X}
Stack alphabet L = {a, b}
[Figure: PDA diagram with states START, READ1, READ2, PUSH a, PUSH b, POP1, POP2, POP3, and ACCEPT]