Theory of Computation - CSE 105 Context-Free Languages: Sample Problems and Solutions
Context-free Languages
Sample Problems and Solutions
Designing CFLs
Problem 1 Give a context-free grammar that generates the following language over $\{0, 1\}^*$: $L = \{w \mid w \text{ contains more 1s than 0s}\}$.
Idea: this is similar to the language where the number of 0s is equal to the number of 1s, except we must
ensure that we generate at least one 1, and we must allow an arbitrary number of 1s to be generated anywhere
in the derivation. The following grammar accomplishes this task:
$S \to S_1 1 S_1$
$S_1 \to 0 S_1 1 \mid 1 S_1 0 \mid S_1 S_1 \mid 1 S_1 \mid \varepsilon$
Proof of correctness: it should be clear that this grammar cannot generate any strings not in L. The production for S guarantees that any string contains at least one 1, and any time a 0 is generated, at least one additional 1 is generated with it. We must argue that the grammar generates all strings with more 1s than 0s. Write $N_c(x)$ for the number of occurrences of the symbol c in the string x. The productions for $S_1$ generate all strings z with $N_1(z) \ge N_0(z)$ (proven below). The production for S asserts that any string z in L can be written $z = x1y$ where $N_1(x) \ge N_0(x)$ and $N_1(y) \ge N_0(y)$. This is true: if z begins with a 1, we can write $z = \varepsilon 1 y$. If z begins with a 0, we can use a counter that is incremented by 1 for each 0 encountered and decremented by 1 for each 1 encountered. Since z contains more 1s than 0s, the counter ends at a negative value, so at some point it first becomes −1, and this must happen upon encountering a 1. Let x be the part of z before this 1 and y the part after it. Then $N_1(x) = N_0(x)$, because the counter was 0 just before the 1, and $N_1(y) \ge N_0(y)$, because while reading y the counter goes from −1 to its final value $N_0(z) - N_1(z) \le -1$. So this breakdown $z = x1y$ satisfies the requirements stated above.
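Now, to show that $S_1$ generates all strings z such that $N_1(z) \ge N_0(z)$, the same “counter” argument works, by induction on $|z|$. If $z = \varepsilon$, we use $S_1 \to \varepsilon$. If z begins with a 0, it must be of the form $z = 0x1y$ where $N_1(x) = N_0(x)$ and $N_1(y) \ge N_0(y)$. If, on the other hand, z begins with a 1, then either $z = 1x0y$ where $N_1(x) = N_0(x)$ and $N_1(y) \ge N_0(y)$, or $z = 1x$ where $N_1(x) \ge N_0(x)$. All of these cases are handled by the $S_1$ productions: $S_1 \to S_1 S_1$ splits off the prefix $0x1$ (derived with $S_1 \to 0 S_1 1$) or $1x0$ (derived with $S_1 \to 1 S_1 0$), and $S_1 \to 1 S_1$ handles $z = 1x$.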
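As a sanity check on this argument, and not a substitute for it, here is a minimal Python sketch. It decides whether S or $S_1$ derives a given string by recursing on the productions, then compares the answers with the counting definitions on every binary string of length at most 10. The function names are illustrative and do not come from the original solution.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derives_S1(z: str) -> bool:
    """True iff S1 =>* z for S1 -> 0 S1 1 | 1 S1 0 | S1 S1 | 1 S1 | epsilon."""
    if z == "":
        return True                                    # S1 -> epsilon
    if (len(z) >= 2 and (z[0], z[-1]) in {("0", "1"), ("1", "0")}
            and derives_S1(z[1:-1])):
        return True                                    # S1 -> 0 S1 1 | 1 S1 0
    if z[0] == "1" and derives_S1(z[1:]):
        return True                                    # S1 -> 1 S1
    # S1 -> S1 S1: a shortest derivation never splits off an empty piece,
    # so it suffices to try splits into two nonempty (hence shorter) pieces.
    return any(derives_S1(z[:k]) and derives_S1(z[k:]) for k in range(1, len(z)))

def derives_S(z: str) -> bool:
    """True iff S =>* z for S -> S1 1 S1: try every 1 as the middle symbol."""
    return any(ch == "1" and derives_S1(z[:k]) and derives_S1(z[k + 1:])
               for k, ch in enumerate(z))

for n in range(11):
    for bits in product("01", repeat=n):
        z = "".join(bits)
        assert derives_S1(z) == (z.count("1") >= z.count("0")), z
        assert derives_S(z) == (z.count("1") > z.count("0")), z
```

Memoization keeps the $S_1 \to S_1 S_1$ case cheap. Every recursive call is on a strictly shorter string, so the recursion always terminates.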
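Problem 2 Give a context-free grammar that generates the complement of the language $\{a^n b^n \mid n \ge 0\}$ over $\{a, b\}$.
Idea: we can break this language into the union of several simpler languages: $L = \{a^i b^j \mid i > j\} \cup \{a^i b^j \mid i < j\} \cup (a \cup b)^* b (a \cup b)^* a (a \cup b)^*$. That is, we take all strings of a's followed by b's in which the numbers of a's and b's differ, together with all strings not of the form $a^i b^j$.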
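First, we take the union of the CFGs for the three languages:
$S \to S_1 \mid S_2 \mid S_3$
For $\{a^i b^j \mid i > j\}$, we pair a's with b's and generate at least one extra a:
$S_1 \to a S_1 b \mid a S_1 \mid a$
Similarly for $\{a^i b^j \mid i < j\}$:
$S_2 \to a S_2 b \mid S_2 b \mid b$
Finally, $(a \cup b)^* b (a \cup b)^* a (a \cup b)^*$ is easily generated as follows:
$S_3 \to X b X a X$
$X \to a X \mid b X \mid \varepsilon$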
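The decomposition can be spot-checked mechanically. The Python sketch below is illustrative only, and its helper names are hypothetical. It tests membership in each of the three pieces with regular expressions and confirms that their union is exactly the set of strings not of the form $a^n b^n$, for every string over {a, b} of length at most 10.

```python
import re
from itertools import product

def is_anbn(w: str) -> bool:
    """True iff w = a^n b^n for some n >= 0."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def in_union(w: str) -> bool:
    """Membership in {a^i b^j | i > j}, {a^i b^j | i < j}, or (a|b)*b(a|b)*a(a|b)*."""
    m = re.fullmatch(r"(a*)(b*)", w)
    unequal_blocks = m is not None and len(m.group(1)) != len(m.group(2))
    b_before_a = re.fullmatch(r"[ab]*b[ab]*a[ab]*", w) is not None
    return unequal_blocks or b_before_a

for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert in_union(w) == (not is_anbn(w)), w
```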
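Problem 3 Give a context-free grammar for the language $\{a^i b^j c^k \mid i = j \text{ or } j = k\}$, the union of $A_1 = \{a^i b^j c^k \mid i = j\}$ and $A_2 = \{a^i b^j c^k \mid j = k\}$.
We take the union of grammars for $A_1$ and $A_2$:
$S \to S_1 \mid S_2$
For $A_1$, we simply ensure that the number of a's equals the number of b's:
$S_1 \to S_1 c \mid A \mid \varepsilon$
$A \to a A b \mid \varepsilon$
Similarly for $A_2$, we ensure that the number of b's equals the number of c's:
$S_2 \to a S_2 \mid B \mid \varepsilon$
$B \to b B c \mid \varepsilon$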
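Here is a minimal Python sketch that performs only a bounded check. Each function mirrors one variable of the grammar and recurses on strictly shorter strings, and the loop compares the grammar with the condition "i = j or j = k" on all strings over {a, b, c} of length at most 7. The names are illustrative.

```python
import re
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derives_A(w: str) -> bool:
    # A -> aAb | epsilon
    return w == "" or (len(w) >= 2 and w[0] == "a" and w[-1] == "b"
                       and derives_A(w[1:-1]))

@lru_cache(maxsize=None)
def derives_S1(w: str) -> bool:
    # S1 -> S1 c | A | epsilon
    return w == "" or derives_A(w) or (w.endswith("c") and derives_S1(w[:-1]))

@lru_cache(maxsize=None)
def derives_B(w: str) -> bool:
    # B -> bBc | epsilon
    return w == "" or (len(w) >= 2 and w[0] == "b" and w[-1] == "c"
                       and derives_B(w[1:-1]))

@lru_cache(maxsize=None)
def derives_S2(w: str) -> bool:
    # S2 -> a S2 | B | epsilon
    return w == "" or derives_B(w) or (w.startswith("a") and derives_S2(w[1:]))

def derives_S(w: str) -> bool:
    # S -> S1 | S2
    return derives_S1(w) or derives_S2(w)

def in_language(w: str) -> bool:
    """Definition: w = a^i b^j c^k with i == j or j == k."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i == j or j == k

for n in range(8):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert derives_S(w) == in_language(w), w
```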
Problem 4 Give a simple description of the language generated by the following grammar in English, then
use that description to give a CFG for the complement of that language.
$S \to a S b \mid b Y \mid Y a$
$Y \to b Y \mid a Y \mid \varepsilon$
Clearly, Y generates $(a \cup b)^*$. S, then, generates strings of the forms $a^n (a \cup b)^* a b^n$ and $a^n b (a \cup b)^* b^n$. Thus we can get strings $a^i b^j$ with $i > j$, and we can also get strings $a^i b^j$ with $i < j$, but we cannot get $a^i b^j$ with $i = j$. Furthermore, we can generate any string beginning with a b or ending with an a, and every string beginning with a and ending with b that is not of the form $a^i b^j$. This, then, is exactly the complement of the language $\{a^n b^n \mid n \ge 0\}$.
A grammar for the complement of this language (which is, of course, just $\{a^n b^n \mid n \ge 0\}$) is simply
$S \to a S b \mid \varepsilon$.
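The English description can be checked on short strings with the following Python sketch, which is illustrative and uses hypothetical helper names. `derives_S` follows the three productions for S and relies on the fact that Y derives every string over {a, b}. The loop confirms that S generates exactly the strings not of the form $a^n b^n$, up to length 12.

```python
from functools import lru_cache
from itertools import product

def derives_Y(w: str) -> bool:
    # Y -> bY | aY | epsilon derives every string over {a, b}.
    return all(ch in "ab" for ch in w)

@lru_cache(maxsize=None)
def derives_S(w: str) -> bool:
    # S -> aSb
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and derives_S(w[1:-1]):
        return True
    # S -> bY  or  S -> Ya
    return ((w.startswith("b") and derives_Y(w[1:])) or
            (w.endswith("a") and derives_Y(w[:-1])))

def is_anbn(w: str) -> bool:
    n = len(w) // 2
    return w == "a" * n + "b" * n

for n in range(13):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert derives_S(w) == (not is_anbn(w)), w
```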
Chomsky Normal Form (CNF)
Problem 5 Convert the following CFG into Chomsky normal form.
$A \to BAB \mid B \mid \varepsilon$
$B \to 00 \mid \varepsilon$
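Step 1: add a new start variable $S_0$.
$S_0 \to A$
$A \to BAB \mid B \mid \varepsilon$
$B \to 00 \mid \varepsilon$
Step 2: remove the ε-rule $B \to \varepsilon$, adding a new rule for each way of deleting occurrences of B from a right-hand side.
$S_0 \to A$
$A \to BAB \mid BA \mid AB \mid A \mid B \mid \varepsilon$
$B \to 00$
Step 3: remove the ε-rule $A \to \varepsilon$ in the same way (because of $S_0 \to A$, this adds $S_0 \to \varepsilon$), and drop the rule $A \to A$, which has no effect.
$S_0 \to A \mid \varepsilon$
$A \to BAB \mid BA \mid AB \mid BB \mid B$
$B \to 00$
Step 4: remove the unit rule $A \to B$ by adding $A \to 00$.
$S_0 \to A \mid \varepsilon$
$A \to BAB \mid BA \mid AB \mid BB \mid 00$
$B \to 00$
Step 5: remove the unit rule $S_0 \to A$ by copying the rules of A to $S_0$.
$S_0 \to BAB \mid BA \mid AB \mid BB \mid 00 \mid \varepsilon$
$A \to BAB \mid BA \mid AB \mid BB \mid 00$
$B \to 00$
Step 6: convert the remaining rules into the proper form. Introduce $A_1 \to AB$ to shorten $BAB$, and introduce $N_0 \to 0$ to replace the terminals in the rules with right-hand side $00$.
$S_0 \to B A_1 \mid BA \mid AB \mid BB \mid N_0 N_0 \mid \varepsilon$
$A \to B A_1 \mid BA \mid AB \mid BB \mid N_0 N_0$
$A_1 \to AB$
$B \to N_0 N_0$
$N_0 \to 0$
This grammar satisfies all the requirements for Chomsky Normal Form.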
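To double-check the conversion, the Python sketch below runs the standard CYK algorithm on the final grammar. It is illustrative, and the ASCII names S0, A1, and N0 stand for $S_0$, $A_1$, and $N_0$. The original grammar generates exactly $(00)^*$, so the CNF grammar should accept a string over {0, 1} exactly when the string consists of an even number of 0s. The loop checks this up to length 10.

```python
from itertools import product

# Final CNF grammar; "S0", "A1", "N0" are ASCII names for S_0, A_1, N_0.
BINARY = {  # rules X -> Y Z
    "S0": [("B", "A1"), ("B", "A"), ("A", "B"), ("B", "B"), ("N0", "N0")],
    "A": [("B", "A1"), ("B", "A"), ("A", "B"), ("B", "B"), ("N0", "N0")],
    "A1": [("A", "B")],
    "B": [("N0", "N0")],
}
TERMINAL = {"N0": "0"}  # rules X -> terminal
START = "S0"            # S0 -> epsilon is also a rule

def cyk(w: str) -> bool:
    """CYK membership test: table[i][l] holds the variables deriving w[i:i+l]."""
    n = len(w)
    if n == 0:
        return True                       # via S0 -> epsilon
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {x for x, t in TERMINAL.items() if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left, right = table[i][split], table[i + split][length - split]
                for x, rules in BINARY.items():
                    if any(y in left and z in right for y, z in rules):
                        table[i][length].add(x)
    return START in table[0][n]

for n in range(11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert cyk(w) == (set(w) <= {"0"} and n % 2 == 0), w
```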
Designing PDAs
Problem 6 Give an informal description and state diagram for a PDA recognizing the language $L = \{w \mid w = w^R\}$, that is, the language of palindromes.
This is fairly simple: we push the first half of w onto the stack, nondeterministically guess where its middle is, and then pop the stack while reading the second half of w, making sure that the second half matches what we pop off the stack. We must handle both the case where |w| is even and the case where |w| is odd.
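Here is the state diagram, written as a list of transitions. A label “x, y → z” means: read x from the input, pop y, and push z. State q3 is the accept state.
From q0 to q1: ε, ε → $
From q1 to q1: 0, ε → 0 and 1, ε → 1
From q1 to q2: ε, ε → ε (even |w|); 0, ε → ε and 1, ε → ε (skip the middle symbol of odd |w|)
From q2 to q2: 0, 0 → ε and 1, 1 → ε
From q2 to q3: ε, $ → ε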
From the start state q0, we push a $ onto the stack to mark its bottom. In state q1, we push the first half of w onto the stack, not including the middle symbol if |w| is odd. Then we nondeterministically guess where the middle occurs and move to state q2, either without consuming any input (if |w| is even) or by reading and ignoring the middle symbol (if |w| is odd). In state q2, we pop each stack symbol from the stack, ensuring that it matches the current input symbol. Finally, if all goes well, we reach the end of w with only the $ left on the stack, pop it, and accept in q3. Otherwise, every branch of the computation rejects.
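The state diagram can be exercised with a small simulator. The Python sketch below is illustrative. Its encoding is an assumption of the sketch rather than notation from the text: transitions are stored in a dictionary, the empty string stands for ε, and the stack is a string whose first character is the top. The sketch explores all configurations of the nondeterministic PDA breadth-first and checks that it accepts exactly the palindromes over {0, 1} up to length 10.

```python
from collections import deque
from itertools import product

EPS = ""  # the empty string plays the role of epsilon

# (state, input read, symbol popped) -> list of (next state, symbol pushed)
DELTA = {
    ("q0", EPS, EPS): [("q1", "$")],               # mark the stack bottom
    ("q1", "0", EPS): [("q1", "0"), ("q2", EPS)],  # push, or skip an odd middle
    ("q1", "1", EPS): [("q1", "1"), ("q2", EPS)],
    ("q1", EPS, EPS): [("q2", EPS)],               # guess the middle (even |w|)
    ("q2", "0", "0"): [("q2", EPS)],               # match the second half
    ("q2", "1", "1"): [("q2", EPS)],
    ("q2", EPS, "$"): [("q3", EPS)],               # bottom reached
}
ACCEPT = {"q3"}

def accepts(w: str) -> bool:
    """Breadth-first search over configurations (state, input position, stack)."""
    start = ("q0", 0, "")
    seen, queue = {start}, deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in ACCEPT:
            return True
        reads = [EPS] + ([w[pos]] if pos < len(w) else [])
        pops = [EPS] + ([stack[0]] if stack else [])
        for a in reads:
            for x in pops:
                for nxt, y in DELTA.get((state, a, x), []):
                    cfg = (nxt, pos + len(a), y + stack[len(x):])
                    if cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

for n in range(11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert accepts(w) == (w == w[::-1]), w
```

The search is finite because the stack only grows when an input symbol is read.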
Problem 7 Give an informal English description of a PDA for the language L, the complement of the language $\{a^n b^n \mid n \ge 0\}$.
A PDA for this language can be motivated by its CFG, the one from Problem 4:
$S \to a S b \mid b Y \mid Y a$
$Y \to b Y \mid a Y \mid \varepsilon$
Recall that this CFG generates strings of the form $a^n b (a \cup b)^* b^n$ or $a^n (a \cup b)^* a b^n$. All we have to do to accept strings of this form is to push the first n a's onto the stack in state q1, and nondeterministically switch to a new state q2 when that is done. At this point we have two branches:
1. If the next symbol is a b, we “flush” (read and discard) that input symbol, go to state q3, and then continue flushing the part of the string corresponding to $(a \cup b)^*$. We nondeterministically guess when this is done and move to state q4, which pops n b's corresponding to the number of a's that were pushed at the beginning of the string, finally switching to an accept state q5 if the correct number of b's was matched.
2. If the next symbol was not a b, on the other hand, we allow the machine to switch from q2 to q6 and nondeterministically “flush” the $(a \cup b)^*$ part of the string (in this case the input string must be of the form $a^n (a \cup b)^* a b^n$). It then consumes the a on the way to state q4, which as before pops n b's and accepts if everything matches correctly.
Problem 8 Convert the CFG $G_4$ given below to an equivalent PDA.
The CFG $G_4$ is:
$E \to E + T \mid T$
$T \to T \times F \mid F$
$F \to (E) \mid a$
If we use a shorthand notation that lets a PDA write an entire string to the stack in one step, this task reduces to forming transition rules that implement the productions in the grammar.
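Here is the PDA, written as a list of transitions. A label “x, y → z” means: read x from the input, pop y, and push z. A pushed string such as E$ has its leftmost symbol on top of the stack.
From qstart to qloop: ε, ε → E$
From qloop to qloop, one transition per production: ε, E → E+T; ε, E → T; ε, T → T×F; ε, T → F; ε, F → (E); ε, F → a
From qloop to qloop, one transition per terminal: a, a → ε; +, + → ε; ×, × → ε; (, ( → ε; ), ) → ε
From qloop to qaccept: ε, $ → ε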
The transitions for the rules of the grammar allow us to nondeterministically replace grammar nonterminals on the stack with their corresponding right-hand sides. The transitions for the terminals of the grammar (+, ×, (, ), a) allow matching of input symbols to grammar terminals. There will be an accepting path through the PDA on string w if and only if w can be generated by the grammar $G_4$.
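The following Python sketch simulates the loop state of this PDA directly. It is illustrative and uses x as an ASCII stand-in for ×. Nonterminals on top of the stack are expanded by every matching production, and terminals on top are matched against the input. $G_4$ has no ε-rules, so every symbol above the bottom-of-stack marker must eventually produce at least one input symbol. The search can therefore discard stacks that are longer than the remaining input, which keeps it finite despite the left-recursive rules.

```python
from collections import deque

# Productions of G4; "x" is an ASCII stand-in for the multiplication sign.
RULES = {"E": ["E+T", "T"], "T": ["TxF", "F"], "F": ["(E)", "a"]}

def pda_accepts(w: str) -> bool:
    """Simulate the loop state: the stack (top = index 0) holds the unmatched
    part of a sentential form above the bottom marker '$'."""
    start = (0, "E$")                         # qstart has pushed E$
    seen, queue = {start}, deque([start])
    while queue:
        pos, stack = queue.popleft()
        top = stack[0]
        if top == "$":
            if pos == len(w):                 # epsilon, $ -> epsilon into qaccept
                return True
            continue
        if top in RULES:                      # epsilon, A -> right-hand side
            successors = [(pos, rhs + stack[1:]) for rhs in RULES[top]]
        elif pos < len(w) and w[pos] == top:  # c, c -> epsilon: match a terminal
            successors = [(pos + 1, stack[1:])]
        else:
            successors = []
        for cfg in successors:
            # Every symbol above '$' yields at least one input symbol,
            # so prune stacks longer than the remaining input.
            if len(cfg[1]) - 1 <= len(w) - cfg[0] and cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False

for w in ["a", "a+a", "axa", "a+axa", "(a)", "(a+a)xa"]:
    assert pda_accepts(w), w
for w in ["", "+", "a+", "aa", "(a", "()", "xa"]:
    assert not pda_accepts(w), w
```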
Problem 9 Construct a PDA for the language of all non-palindromes over {a, b}.
We can use the PDA for recognizing palindromes to create a PDA for this language. To change the PDA accepting all palindromes into one that accepts all non-palindromes, we simply insist in the new machine that there is at least one inconsistency between the first and second halves of the input string w. So the new PDA can be essentially the same, except that when we are popping symbols off the stack and matching them with inputs, we must make sure that there is at least one a where there should have been a b, or vice versa.
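Here is the PDA, written as a list of transitions in which 0 and 1 play the roles of a and b. A label “x, y → z” means: read x from the input, pop y, and push z. State q4 is the accept state.
From q0 to q1: ε, ε → $
From q1 to q1: 0, ε → 0 and 1, ε → 1
From q1 to q2: ε, ε → ε; 0, ε → ε; 1, ε → ε
From q2 to q2 (symbols match): 0, 0 → ε and 1, 1 → ε
From q2 to q3 (first mismatch): 1, 0 → ε and 0, 1 → ε
From q3 to q3 (any symbols): 0, 0 → ε; 1, 1 → ε; 1, 0 → ε; 0, 1 → ε
From q3 to q4: ε, $ → ε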
We first mark the bottom of the stack with a $ and push the first half of the string onto the stack in q1 (except the middle symbol, if the string has odd length). We then guess nondeterministically where the middle of the string is and switch to state q2. If w is not a palindrome, then at some point there will be a mismatch between the symbol on top of the stack and the input symbol; when this happens, the machine can take a transition from q2 to q3. From then on, in q3, any match or mismatch of input symbols to stack symbols is allowed. Finally, the machine accepts when 1) the input is exhausted and 2) the stack is empty, that is, when it pops the $ and moves to q4.
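A short Python simulator can check this machine. It is illustrative: the tuple encoding of transitions is an assumption of the sketch, and 0 and 1 stand for a and b. The simulator searches all configurations depth-first and checks that the machine accepts exactly the non-palindromes over {0, 1} up to length 10.

```python
from itertools import product

EPS = ""  # the empty string plays the role of epsilon

# (state, read, pop, next state, push); 0 and 1 stand for a and b.
TRANSITIONS = [
    ("q0", EPS, EPS, "q1", "$"),
    ("q1", "0", EPS, "q1", "0"), ("q1", "1", EPS, "q1", "1"),  # push first half
    ("q1", EPS, EPS, "q2", EPS),                                 # even length
    ("q1", "0", EPS, "q2", EPS), ("q1", "1", EPS, "q2", EPS),    # skip odd middle
    ("q2", "0", "0", "q2", EPS), ("q2", "1", "1", "q2", EPS),    # symbols match
    ("q2", "1", "0", "q3", EPS), ("q2", "0", "1", "q3", EPS),    # first mismatch
    ("q3", "0", "0", "q3", EPS), ("q3", "1", "1", "q3", EPS),    # anything after
    ("q3", "1", "0", "q3", EPS), ("q3", "0", "1", "q3", EPS),
    ("q3", EPS, "$", "q4", EPS),                                 # accept
]

def accepts(w: str) -> bool:
    """Depth-first search over configurations (state, input position, stack)."""
    todo = [("q0", 0, "")]               # stack strings have their top at index 0
    seen = set(todo)
    while todo:
        state, pos, stack = todo.pop()
        if state == "q4" and pos == len(w):
            return True
        for src, read, pop, dst, push in TRANSITIONS:
            if src != state:
                continue
            if read and (pos >= len(w) or w[pos] != read):
                continue
            if pop and not stack.startswith(pop):
                continue
            cfg = (dst, pos + len(read), push + stack[len(pop):])
            if cfg not in seen:
                seen.add(cfg)
                todo.append(cfg)
    return False

for n in range(11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert accepts(w) == (w != w[::-1]), w
```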
Problem 11 Decide whether $L = \{x \in \{a, b\}^* \mid N_a(x) < N_b(x) < 2N_a(x)\}$ is a CFL and prove your answer.
L is context-free, so no pumping-lemma proof of the opposite can succeed. For example, take $s = a^{p+1} b^{2p+1} \in L$ with $p \ge 3$. Let v be the last a, let the middle piece x be empty, and let y be the first two b's. This gives $uv^i x y^i z = a^{p+i} b^{2p-1+2i}$, and since $p + i < 2p - 1 + 2i < 2(p + i)$, this string is in L for every $i \ge 0$.
To prove that L is context-free, observe that a string is in L exactly when each of its a's can be given a weight of 1 or 2, using at least one weight of each kind, so that the total weight equals the number of b's. Indeed, if the string has $n \ge 2$ a's, such weights can sum to exactly the values $n + 1, \dots, 2n - 1$, which are precisely the numbers of b's allowed by $n < N_b < 2n$; if $n \le 1$, neither condition can hold. A PDA can check this using its stack as a signed counter: above a bottom-of-stack marker it keeps a run of + symbols for a positive value or a run of − symbols for a negative value. On each a, the PDA nondeterministically adds 1 or 2 to the counter (adding 2 takes an extra ε-move). On each b, it subtracts 1. Its finite control records whether a weight of 1 and a weight of 2 have each been used. It accepts if, at the end of the input, the counter is 0 and both weights have been used. Hence L is a CFL.
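A nondeterministic counter machine for L can be checked numerically. This Python sketch is illustrative. It keeps the set of all configurations the machine could be in, where a configuration is the counter value plus two flags recording whether a weight of 1 and a weight of 2 have been guessed. Modeling the stack as an unbounded integer counter is an assumption of the sketch. Each a adds 1 or 2, each b subtracts 1, and the machine accepts when the counter is 0 and both weights were used. The loop compares this with the definition of L for every string over {a, b} up to length 12.

```python
from itertools import product

def in_L(x: str) -> bool:
    """Definition: N_a(x) < N_b(x) < 2 N_a(x)."""
    na, nb = x.count("a"), x.count("b")
    return na < nb < 2 * na

def counter_pda_accepts(x: str) -> bool:
    """Track every reachable (counter, used weight 1, used weight 2);
    the stack is modeled as an integer counter."""
    configs = {(0, False, False)}
    for ch in x:
        if ch == "a":   # guess a weight of 1 or 2 for this a
            configs = {(c + wt, used1 or wt == 1, used2 or wt == 2)
                       for (c, used1, used2) in configs for wt in (1, 2)}
        else:           # each b subtracts 1
            configs = {(c - 1, used1, used2) for (c, used1, used2) in configs}
    return (0, True, True) in configs

for n in range(13):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert counter_pda_accepts(x) == in_L(x), x
```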
Problem 12 Write a context-free grammar for the language $L = \{w\#x \mid w^R \text{ is a substring of } x \text{ for } w, x \in \{0, 1\}^*\}$.
Solution: Strings in this language share the property that they start with a string w followed by a #, followed by anything, followed by $w^R$, followed by anything. So we want strings of the form $w\#(0 \cup 1)^* w^R (0 \cup 1)^*$. Let A generate the $w\#(0 \cup 1)^* w^R$ part, and let B generate the final $(0 \cup 1)^*$ part. Thus we want derivations that proceed as follows:
$S \to AB$
$A \to 0A0 \mid 1A1 \mid \#B$
$B \to 0B \mid 1B \mid \varepsilon$
The recursion with nonterminal A ends only with the production $A \to \#B$, so A must generate a string whose beginning and end are mirror images of each other. Since B generates $(0 \cup 1)^*$, the nonterminal A generates all strings of the form $w\#(0 \cup 1)^* w^R$. Note that this also covers the case $w = \varepsilon$. Since A is followed by B in the production for the start variable S, the grammar generates all strings of the form $w\#(0 \cup 1)^* w^R (0 \cup 1)^*$.
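Here is a minimal Python sketch with hypothetical, illustrative names. It mirrors the productions for A and B, tries every split for $S \to AB$, and compares the grammar with the definition of L on all strings over {0, 1, #} of length at most 7.

```python
from functools import lru_cache
from itertools import product

def derives_B(w: str) -> bool:
    # B -> 0B | 1B | epsilon generates exactly the strings over {0, 1}.
    return all(ch in "01" for ch in w)

@lru_cache(maxsize=None)
def derives_A(w: str) -> bool:
    # A -> #B
    if w.startswith("#") and derives_B(w[1:]):
        return True
    # A -> 0A0 | 1A1
    return (len(w) >= 2 and w[0] in "01" and w[0] == w[-1]
            and derives_A(w[1:-1]))

def derives_S(w: str) -> bool:
    # S -> AB: try every split point between the A part and the B part.
    return any(derives_A(w[:k]) and derives_B(w[k:]) for k in range(len(w) + 1))

def in_language(s: str) -> bool:
    """Definition: s = w#x with w reversed a substring of x."""
    if s.count("#") != 1:
        return False
    w, x = s.split("#")
    return w[::-1] in x

for n in range(8):
    for symbols in product("01#", repeat=n):
        s = "".join(symbols)
        assert derives_S(s) == in_language(s), s
```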
Problem 13 Show that $D = \{xy \mid x, y \in \{0, 1\}^*, |x| = |y|, x \ne y\}$ is a context-free language.
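Solution: any string $z \in D$ must have even length, and its two halves must differ in at least one bit. This means z can be written $z = t0uv1w$ or $z = t1uv0w$ where $|t| = |v|$ and $|u| = |w|$. Here $t0u$ is the first half, $v1w$ is the second half, and the differing bits sit at the same position in each half. But this is the same as saying $z = t0vu1w$ or $z = t1vu0w$ where $|t| = |v|$ and $|u| = |w|$. In both forms, the two differing bits are exactly $|z|/2$ positions apart and all other bits are arbitrary, so only the total length of the stretch between them matters, not how that stretch is split. Formulated this way, z is an odd-length string with a 0 in its center followed by an odd-length string with a 1 in its center (or the other way around), and we can easily write a grammar for the language:
$S \to S_0 S_1 \mid S_1 S_0$
$S_0 \to X S_0 X \mid 0$
$S_1 \to X S_1 X \mid 1$
$X \to 0 \mid 1$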
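The rearranged form is easy to test. In this Python sketch, which is illustrative, `centered` recognizes $L(S_0)$ and $L(S_1)$, the odd-length strings with a given bit in the middle. The loop confirms that $S \to S_0 S_1 \mid S_1 S_0$ generates exactly D on all binary strings up to length 12.

```python
from itertools import product

def centered(w: str, bit: str) -> bool:
    """L(S0) / L(S1): odd-length binary strings with `bit` in the middle."""
    return len(w) % 2 == 1 and w[len(w) // 2] == bit

def derives_S(z: str) -> bool:
    # S -> S0 S1 | S1 S0: try every way of cutting z into two pieces.
    return any((centered(z[:k], "0") and centered(z[k:], "1")) or
               (centered(z[:k], "1") and centered(z[k:], "0"))
               for k in range(len(z) + 1))

def in_D(z: str) -> bool:
    """Definition: z = xy with |x| = |y| and x != y."""
    half = len(z) // 2
    return len(z) % 2 == 0 and z[:half] != z[half:]

for n in range(13):
    for bits in product("01", repeat=n):
        z = "".join(bits)
        assert derives_S(z) == in_D(z), z
```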
Problem 14 Let C be a context-free language and R be a regular language. Prove that the language C ∩ R
is context-free. Then use the above to show that the language given below is not a CFL.
$A = \{w \mid w \in \{a, b, c\}^* \text{ and } w \text{ contains equal numbers of a's, b's and c's}\}$
Solution: We have a CFL C and a regular language R, and we want to show that $C \cap R$ is context-free. Since C is given to be a CFL, we know that there exists a PDA, say $M_1$, that recognizes C. Since R is given to be regular, we have a DFA, say $M_2$, that recognizes R. To prove that $C \cap R$ is a CFL, we demonstrate a pushdown automaton, call it M, that recognizes $C \cap R$.
The proof is by construction. We construct M from $M_1$ and $M_2$. The construction is similar to the proof on pg. 45 of the text that the class of regular languages is closed under the union (or intersection) operation.
Let $M_1$ recognize C, where $M_1 = (Q_1, \Sigma, \Gamma_1, \delta_1, q_1, F_1)$.
Let $M_2$ recognize R, where $M_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)$.
Construct M to recognize $C \cap R$, where $M = (Q, \Sigma, \Gamma, \delta, q, F)$.
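1. $Q = \{(r_1, r_2) \mid r_1 \in Q_1 \text{ and } r_2 \in Q_2\}$
2. $\Sigma$, the input alphabet, is the same as that of $M_1$ and $M_2$.
3. $\Gamma = \Gamma_1$
4. For $(r_1, r_2) \in Q$, $a \in \Sigma$, and $x \in \Gamma_\varepsilon$, let $\delta((r_1, r_2), a, x) = \{((s_1, \delta_2(r_2, a)), y) \mid (s_1, y) \in \delta_1(r_1, a, x)\}$ and $\delta((r_1, r_2), \varepsilon, x) = \{((s_1, r_2), y) \mid (s_1, y) \in \delta_1(r_1, \varepsilon, x)\}$. That is, M runs $M_1$ and $M_2$ in step, and the DFA component stays put on ε-moves.
5. $q = (q_1, q_2)$
6. $F = \{(r_1, r_2) \mid r_1 \in F_1 \text{ and } r_2 \in F_2\}$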
Note that the above construction works only because one of the machines being simulated (the DFA $M_2$ above) does not need a stack. If we attempted to simulate two PDAs instead, we might need to maintain two stacks, and a PDA cannot do that.
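The product construction translates directly into code. The Python sketch below is illustrative, and its encodings are assumptions:

- A PDA is a triple (transition dictionary, start state, accept states). Its transition dictionary maps (state, input or ε, popped symbol or ε) to a set of (state, pushed symbol or ε) pairs.
- A DFA is a triple whose transition dictionary maps (state, input) to a state.

The product's states are pairs, and the product uses the PDA's stack. The DFA component advances only when the PDA reads an input symbol. As an example, the sketch intersects a PDA for $\{a^n b^n \mid n \ge 0\}$ with a DFA for strings containing an even number of a's.

```python
from collections import deque
from itertools import product

EPS = ""  # the empty string plays the role of epsilon

def product_pda(pda, dfa):
    """Combine a PDA M1 and a DFA M2 into a PDA M for the intersection.
    pda = (delta1, q1, F1), delta1[(r1, a, x)] = set of (s1, y);
    dfa = (delta2, q2, F2), delta2[(r2, a)] = s2 (a total function)."""
    delta1, q1, f1 = pda
    delta2, q2, f2 = dfa
    dfa_states = {r2 for (r2, _) in delta2}
    delta = {}
    for (r1, a, x), moves in delta1.items():
        for r2 in dfa_states:
            # The DFA component advances only when M1 reads an input symbol.
            s2 = r2 if a == EPS else delta2[(r2, a)]
            delta[((r1, r2), a, x)] = {((s1, s2), y) for (s1, y) in moves}
    final = {(g1, g2) for g1 in f1 for g2 in f2}
    return delta, (q1, q2), final      # the stack alphabet is still M1's

def accepts(machine, w: str) -> bool:
    """Breadth-first search over configurations (state, position, stack)."""
    delta, start, final = machine
    init = (start, 0, "")              # stack string, top at index 0
    seen, queue = {init}, deque([init])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in final:
            return True
        reads = [EPS] + ([w[pos]] if pos < len(w) else [])
        pops = [EPS] + ([stack[0]] if stack else [])
        for a in reads:
            for x in pops:
                for nxt, y in delta.get((state, a, x), ()):
                    cfg = (nxt, pos + len(a), y + stack[len(x):])
                    if cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Example: M1 recognizes {a^n b^n}; M2 accepts strings with an even number of a's.
ANBN = ({("q0", EPS, EPS): {("q1", "$")},
         ("q1", "a", EPS): {("q1", "a")},
         ("q1", EPS, EPS): {("q2", EPS)},
         ("q2", "b", "a"): {("q2", EPS)},
         ("q2", EPS, "$"): {("q3", EPS)}}, "q0", {"q3"})
EVEN_AS = ({("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
           "e", {"e"})
M = product_pda(ANBN, EVEN_AS)

for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        half = len(w) // 2
        expected = w == "a" * half + "b" * half and half % 2 == 0
        assert accepts(M, w) == expected, w
```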
Now, to show that the given language A is not a CFL, we assume that it is and derive a contradiction. Under this assumption we are guaranteed (by the part above) that intersecting A with any regular language yields a CFL. So if we find a regular language R and a language B that is not a CFL such that $A \cap R = B$, we have derived a contradiction. To see what R and B might be, think of each of the languages A, B and R as capturing “some property”. From the definition of A, this property is “equality” of the numbers of a's, b's and c's. For B, let's try the canonical example of a language that is not a CFL, namely $B = \{a^n b^n c^n \mid n \ge 0\}$. B has the property of “equality” as well as “order”: (zero or more) a's followed by (zero or more) b's followed by (zero or more) c's. Now it is easy to see what we want of R: R should have the property of “order”. This is $R = a^* b^* c^*$, which is regular. Indeed, $A \cap R = \{a^n b^n c^n \mid n \ge 0\} = B$, so B would have to be a CFL, but B is not context-free.
Since we have a contradiction, it must be that A is not a CFL.
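The key identity can be confirmed on short strings with a brute-force check. The check is illustrative: it tests finitely many strings, so it supports the argument but does not replace it. It confirms that a string over {a, b, c} of length at most 9 is in both A and $a^*b^*c^*$ exactly when it has the form $a^n b^n c^n$.

```python
import re
from itertools import product

def in_A(w: str) -> bool:   # equal numbers of a's, b's and c's
    return w.count("a") == w.count("b") == w.count("c")

def in_R(w: str) -> bool:   # R = a*b*c*
    return re.fullmatch(r"a*b*c*", w) is not None

def in_B(w: str) -> bool:   # B = {a^n b^n c^n | n >= 0}
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

for n in range(10):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert (in_A(w) and in_R(w)) == in_B(w), w
```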
Problem 15 Let $L_4 = \{w\#x \mid w \text{ is a substring of } x\}$. Show that $L_4$ is not a context-free language (CFL).
Solution: In order to show that $L_4$ is not a CFL, we proceed by contradiction. Assume $L_4$ is a CFL. Let p be the pumping length given by the pumping lemma and let $s = a^p b^p \# a^p b^p$. Because s is a member of $L_4$ and $|s| > p$, the pumping lemma says that s can be split into $uvxyz$ satisfying the following conditions:
1. for each $i \ge 0$, $uv^i x y^i z \in L_4$,
2. $|vy| > 0$, and
3. $|vxy| \le p$.
For convenience, we also write s as $L\#R$, where L and R stand for the strings to the left and to the right of #, respectively.
First of all, note that vy cannot contain #, since $uv^2xy^2z$ would then contain more than one # and would not be in $L_4$. Then, we can think of three possible cases for the string vy:
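• vy contains more symbols from R than from L.
In this case (pumping down), $uv^0xy^0z = L'\#R'$ where $|L'| > |R'|$. Therefore, $L'$ cannot be a substring of $R'$ and the entire string is not in $L_4$.
• vy contains more symbols from L than from R.
In this case (pumping up), $uv^2xy^2z = L'\#R'$ where again $|L'| > |R'|$, so the string is not in $L_4$.
• vy contains equally many symbols from L and from R.
Since $|vy| > 0$, both counts are positive, so v lies in L and y lies in R. Because $|vxy| \le p$, v lies inside the final block $b^p$ of L and y lies inside the initial block $a^p$ of R, with $|v| = |y| = m \ge 1$. Pumping up gives $uv^2xy^2z = a^p b^{p+m} \# a^{p+m} b^p$, and $a^p b^{p+m}$ is not a substring of $a^{p+m} b^p$, which contains only p b's.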
Because every possible way of splitting the input into uvxyz yields a contradiction, the initial assumption that $L_4$ is a CFL is false and the proof is complete.
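For small pumping lengths the case analysis can be confirmed exhaustively. This Python sketch is illustrative. It enumerates every split $s = uvxyz$ with $|vxy| \le p$ and $|vy| > 0$ and checks that pumping with $i = 0$ or $i = 2$ always leaves $L_4$. It only covers $p \le 5$, so it illustrates the proof rather than replacing it.

```python
def in_L4(s: str) -> bool:
    """w#x is in L4 iff there is exactly one # and w is a substring of x."""
    if s.count("#") != 1:
        return False
    w, x = s.split("#")
    return w in x

def every_split_fails(p: int) -> bool:
    """Check that each split s = uvxyz with |vxy| <= p and |vy| > 0 leaves L4
    when pumped with i = 0 or i = 2, for s = a^p b^p # a^p b^p."""
    s = "a" * p + "b" * p + "#" + "a" * p + "b" * p
    n = len(s)
    for i in range(n + 1):                          # vxy = s[i:l]
        for l in range(i, min(n, i + p) + 1):
            for j in range(i, l + 1):               # v = s[i:j]
                for k in range(j, l + 1):           # x = s[j:k], y = s[k:l]
                    u, v, x, y, z = s[:i], s[i:j], s[j:k], s[k:l], s[l:]
                    if not v and not y:
                        continue
                    if all(in_L4(u + v * t + x + y * t + z) for t in (0, 2)):
                        return False                # this split survives pumping
    return True

for p in range(1, 6):
    assert every_split_fails(p), p
```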