
Unit-III

REGULAR & CONTEXT-FREE LANGUAGES


Content of This Unit

A: Defining grammar, Chomsky hierarchy of Languages and Grammar; Ambiguous to Unambiguous CFG. (CO4)
B: Simplification of CFGs. (CO4)
C: Normal forms for CFGs, Pumping lemma for CFLs. (CO4)
Beyond Regular Languages
• We know many languages are not regular.

• So what happens to the languages which are not regular?


An Example
• Let us consider the language of palindromes. The language Lpal of
palindromes over Σ = {0, 1} is not a regular language.
• Lpal = { w | w is a binary palindrome }. Is Lpal regular?
• Proof (by contradiction, using the pumping lemma with constant n):
• Let w = 0n10n
• By the pumping lemma, w can be rewritten as xyz such that xykz is also in Lpal (for any
k≥0)
• But |xy|≤n and y≠ε
• ==> y consists of one or more 0's from the leading block
• ==> xykz will NOT be in Lpal for k=0 (it has fewer 0's before the 1 than after it)
• ==> Contradiction, so Lpal is not regular.
An Example
• There is a natural, recursive definition of when a string of 0's and 1's is in
Lpal.

• It starts with a basis saying that a few obvious strings are in Lpal.

• Then exploits the idea that if a string is a palindrome, it must begin and end
with the same symbol.

• Further, when the first and last symbols are removed, the resulting string
must also be a palindrome.
An Example
• Lpal can be inductively described as follows:
• BASIS: ε, 0, and 1 are palindromes.
• INDUCTION: If w is a palindrome, so are 0w0 and 1w1. No string is a
palindrome of 0's and 1's unless it follows from this basis and induction rule.
A context-free grammar for palindromes
• We will write the above rules formally as follows:
• P→ε
• P→0
• P→1
• P→0P0
• P→1P1
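The basis and induction rules above can be checked mechanically. A minimal Python sketch (the function name `in_lpal` is ours, not from the slides):

```python
def in_lpal(w):
    """Membership test for Lpal following the grammar P -> eps | 0 | 1 | 0P0 | 1P1."""
    if w in ("", "0", "1"):           # BASIS: eps, 0, 1 are palindromes
        return True
    if len(w) >= 2 and w[0] == w[-1]: # INDUCTION: strip matching ends, recurse
        return in_lpal(w[1:-1])
    return False

assert in_lpal("0110")
assert not in_lpal("10")
```

Note how the recursion mirrors the induction rule exactly: each call peels off one 0…0 or 1…1 pair.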
Grammars
Example
Chomsky hierarchy of Languages and
Grammar
• The classification of languages into four classes.

• The description of the grammar for each class.

• The machine that can recognize each grammar/language.


The Chomsky Hierarchy

Type  Language                    Grammar             Automaton

0     Recursively computable /    Unrestricted        Turing machine
      recursively enumerable
1     Context-sensitive           Context-sensitive   Linear bounded automaton
2     Context-free                Context-free        Nondeterministic pushdown
                                                      automaton (NPDA)
3     Regular                     Regular             Deterministic or nondeterministic
                                                      finite-state automaton (DFA, NFA)

Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0


Types of Grammars
Unrestricted Grammars
Example
Context-Sensitive Grammars
Example
Context-Free Grammars
Example
Regular Grammars
Example
The Chomsky Hierarchy
Context-Free Grammars
• There are four important components in a grammatical description of a language:
1. There is a finite set of symbols that form the strings of the language being defined. We
call this alphabet the terminals, or terminal symbols. This set was {0, 1} in the
palindrome example.
2. There is a finite set of variables, also called sometimes nonterminals or syntactic
categories. Each variable represents a language; i.e., a set of strings. In the palindrome
example, there was only one variable, P.
3. One of the variables represents the language being defined; it is called the start
variable/symbol. Other variables represent auxiliary classes of strings that are used to
help define the language of the start variable. In the palindrome example, P, the only
variable, is the start symbol.
4. There is a finite set of productions or rules that represent the recursive definition of a
language. Each production consists of:
a. A variable that is being (partially) defined by the production. This variable is often called the head of
the production.
b. The production symbol →.
c. A string of zero or more terminals and variables. This string, called the body of the production,
represents one way to form strings in the language of the variable of the head. In so doing, we leave
terminals unchanged and substitute for each variable of the body any string that is known to be in the
language of that variable.
Formal Definition of A Context-free Grammar
• A context-free grammar (CFG) is a 4-tuple (V, T, P, S), where
1. V is a finite set called the variables,
2. T is a finite set, disjoint from V, called the terminals,
3. P is a finite set of productions/rules, each rule consisting of a variable (the
head) and a string of variables and terminals (the body), and
4. S ∈ V is the start variable.
Context-Free Grammar (CFG)
• The grammar Gpal for the palindromes is represented by

• Gpal =({P}, {0, 1}, A, P)

• where A represents the set of five productions.


Example: CFG for { 0n1n | n ≥ 1}
• Productions:
S -> 01
S -> 0S1
• Basis: 01 is in the language.
• Induction: if w is in the language, then so is 0w1.
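The basis/induction reading of this grammar gives a direct way to build its strings; a small illustrative sketch (the helper name `gen` is ours):

```python
def gen(n):
    """Derive 0^n 1^n from S -> 01 | 0S1 by n-1 inductive steps."""
    s = "01"                 # basis: 01 is in the language
    for _ in range(n - 1):
        s = "0" + s + "1"    # induction: if w is in the language, so is 0w1
    return s

assert gen(3) == "000111"
```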

Example: A context-free grammar for simple
expressions
• We need two variables in this grammar: E represents expressions, and I
represents identifiers.
• G = ({E, I}, T, P, E), where T is the set of symbols {+, *, (, ), a, b, 0, 1}
and P is the following set of productions:
1. E→I
2. E→E+E
3. E→E*E
4. E→(E)
5. I→a
6. I→b
7. I→Ia
8. I→Ib
9. I→I0
10. I→I1
Derivations Using a Grammar
• We expand the start symbol using one of its productions (i.e., using a
production whose head is the start symbol).
• We further expand the resulting string by replacing one of the variables by the
body of one of its productions, and so on, until we derive a string consisting
entirely of terminals.
• The language of the grammar is all strings of terminals that we can obtain in
this way. This use of grammars is called derivation.
Derivations Using a Grammar
• The process of deriving strings by applying productions from head to
body requires the definition of a new relation symbol ⇒.

• Suppose G = (V, T, P, S) is a CFG. Let αAβ be a string of terminals
and variables, with A a variable. That is, α and β are strings in (V ∪
T)*, and A is in V.

• Let A→γ be a production of G. Then we say αAβ ⇒ αγβ. Notice that one
derivation step replaces any variable anywhere in the string by the
body of one of its productions.
Derivations Using a Grammar
• We may extend the ⇒ relationship to represent zero, one, or many
derivation steps, much as the transition function δ of a finite
automaton was extended to δ̂. For derivations, we use a * to denote
"zero or more steps," as follows:
• Basis: For any string α of terminals and variables, we say α ⇒* α. That is, any
string derives itself.
• Induction: If α ⇒* β and β ⇒ γ, then α ⇒* γ. That is, if α can become β by zero or more steps,
and one more step takes β to γ, then α can become γ.
Example:
• The inference that a*(a+b00) is in the language of variable E can be
reflected in a derivation of that string, starting with the string E. Here
is one such derivation:

• E⇒E*E⇒I*E⇒a*E⇒a*(E)⇒a*(E+E)⇒a*(I+E)⇒a*(a+E)⇒a*(a+I)⇒
a*(a+I0)⇒a*(a+I00)⇒a*(a+b00)
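Each derivation step replaces one variable occurrence by a production body. The sketch below (helper names are ours) replays a full derivation of a*(a+b00) and verifies every step against the expression grammar:

```python
# Productions of the expression grammar; E and I are the variables.
PROD = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}

def is_one_step(s, t):
    """True iff t follows from s by replacing one variable with a production body."""
    for i, ch in enumerate(s):
        if ch in PROD:
            for body in PROD[ch]:
                if s[:i] + body + s[i + 1:] == t:
                    return True
    return False

chain = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)", "a*(a+E)",
         "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]
assert all(is_one_step(s, t) for s, t in zip(chain, chain[1:]))
```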
Leftmost and Rightmost Derivations
• In order to restrict the number of choices we have in deriving a string,
it is often useful to require that at each step we replace the leftmost
variable by one of its production bodies. Such a derivation is called a
leftmost derivation, and we indicate that a derivation is leftmost by
using the relations ⇒lm and ⇒*lm for one or many steps, respectively.
• Similarly, it is possible to require that at each step the rightmost
variable is replaced by one of its bodies. If so, we call this derivation
rightmost and use the symbols ⇒rm and ⇒*rm to indicate one or many
rightmost derivation steps, respectively.
Example:
• The inference that a *(a+ b00) is in the language of variable E can be
reflected in a derivation of that string, starting with the string E.

Rightmost Derivation
Leftmost Derivation
Example: Leftmost Derivations
• Balanced-parentheses grammar:
S -> SS | (S) | ()
• S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
• Thus, S =>*lm (())()
• S => SS => S() => (S)() => (())() is a derivation, but not
a leftmost derivation.
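A leftmost derivation is easy to mechanize for this one-variable grammar; a small sketch (the helper name `leftmost_step` is ours):

```python
# Grammar: S -> SS | (S) | (); "S" is the only variable.
def leftmost_step(s, body):
    """Replace the leftmost occurrence of the variable S with a production body."""
    i = s.index("S")
    return s[:i] + body + s[i + 1:]

s = "S"
for body in ["SS", "(S)", "()", "()"]:
    s = leftmost_step(s, body)   # S => SS => (S)S => (())S => (())()
assert s == "(())()"
```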

Example: Rightmost Derivations
• Balanced-parentheses grammar:
S -> SS | (S) | ()
• S =>rm SS =>rm S() =>rm (S)() =>rm (())()
• Thus, S =>*rm (())()
• S => SS => SSS => S()S => ()()S => ()()() is neither a
rightmost nor a leftmost derivation.

The Language of a Grammar
Example:
• G has productions S -> ε and S -> 0S1.
• L(G) = {0n1n | n ≥ 0}.
Example
Sentential Forms

• If G = (V, T, P, S) is a CFG, any string α in (V ∪ T)* such that S ⇒* α is a
sentential form.
• If S ⇒*lm α, then α is a left-sentential form. For example, a*E is a
left-sentential form, since E ⇒lm E*E ⇒lm I*E ⇒lm a*E.
Parse Trees
• Parse trees are trees labeled by symbols of a
particular CFG.
• Leaves: labeled by a terminal or ε.
• Interior nodes: labeled by a variable.
• Children are labeled by the right side of a production
for the parent.
• Root: must be labeled by the start symbol.

Example: Parse Tree
S -> SS | (S) | ()
S

S S

( S ) ( )

( )

Yield of a Parse Tree
• The concatenation of the labels of the leaves in left-to-right order
• That is, in the order of a preorder traversal.
is called the yield of the parse tree.
• Example: the yield of the tree below is (())()

S

S S

( S ) ( )

( )
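The yield can be computed by exactly the preorder traversal just described; a sketch using nested tuples as a hypothetical parse-tree encoding (label first, then children; leaves are plain strings):

```python
# The tree above for (())(): root S -> SS; left S -> (S) with inner S -> ();
# right S -> ().
tree = ("S",
        ("S", "(", ("S", "(", ")"), ")"),
        ("S", "(", ")"))

def tree_yield(t):
    """Concatenate the leaf labels in left-to-right (preorder) order."""
    if isinstance(t, str):
        return t
    return "".join(tree_yield(c) for c in t[1:])

assert tree_yield(tree) == "(())()"
```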
Parse Trees, Left- and Rightmost
Derivations
• For every parse tree, there is a unique leftmost, and a unique
rightmost derivation.

1. If there is a parse tree with root labeled A and yield w, then A =>*lm w.
2. If A =>*lm w, then there is a parse tree with root A and yield w.

Ambiguous Grammars
• A CFG is ambiguous if there is a string in the language that is the yield
of two or more parse trees.
• Example: S -> SS | (S) | ()

Example – Two parse trees for ()()()
• Tree 1: root S -> SS; the left child expands again by S -> SS (then each S -> ()),
while the right child expands by S -> (). Grouping: (S S) S.
• Tree 2: root S -> SS; the left child expands by S -> (), while the right child
expands again by S -> SS (then each S -> ()). Grouping: S (S S).
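Ambiguity can also be observed by counting parse trees directly. The brute-force counter below (our own sketch, not from the slides) finds exactly the two trees for ()()():

```python
def count_trees(s):
    """Number of parse trees of the grammar S -> SS | (S) | () yielding s."""
    if len(s) < 2:
        return 0
    n = 1 if s == "()" else 0                       # rule S -> ()
    if s[0] == "(" and s[-1] == ")":
        n += count_trees(s[1:-1])                   # rule S -> ( S )
    for i in range(1, len(s)):                      # rule S -> S S, every split
        n += count_trees(s[:i]) * count_trees(s[i:])
    return n

assert count_trees("()()()") == 2    # the two trees shown above
assert count_trees("(())()") == 1    # only one tree: no ambiguity here
```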
Example
Example
Consider the sentential form E + E * E. It has two derivations from E:
E ⇒ E*E ⇒ E+E*E and E ⇒ E+E ⇒ E+E*E.
These correspond to different parse trees, so the expression grammar is ambiguous.
Removing Ambiguity From Grammars
We can remove ambiguity solely on the basis of the following two properties –

1. Precedence: If different operators are used, we will consider the precedence of the operators. The three
important characteristics are :
I. The level at which the production is present denotes the priority of the operator used.
II. The production at higher levels will have operators with less priority. In the parse tree, the nodes which
are at top levels or close to the root node will contain the lower priority operators.
III. The production at lower levels will have operators with higher priority. In the parse tree, the nodes
which are at lower levels or close to the leaf nodes will contain the higher priority operators.
2. Associativity: If operators of the same precedence appear in a production, then we have to consider
their associativity.
• If the associativity is left to right, then we introduce left recursion in the production. The parse
tree will also be left recursive and grow on the left side. +, -, *, / are left-associative operators.

• If the associativity is right to left, then we introduce right recursion in the productions. The parse
tree will also be right recursive and grow on the right side. ^ is a right-associative operator.
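A standard way to apply these two properties to the expression grammar is to layer it by precedence, e.g. E → E+T | T, T → T*F | F, F → (E) | digit. The evaluator below is an illustrative sketch of that layering (our own code, digits only): lower-precedence operators sit closer to the root, and the left-to-right loops realize left associativity. The '-' operator is included only to make associativity visible; it is not in the slide grammar.

```python
def evaluate(s):
    """Recursive-descent evaluator for a precedence-layered grammar:
    expr -> term ((+|-) term)*, term -> factor (* factor)*, factor -> (expr) | digit."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1          # consume "("
            v = expr()
            pos += 1          # consume ")"
            return v
        v = int(s[pos])       # single digit
        pos += 1
        return v

    def term():
        nonlocal pos
        v = factor()
        while peek() == "*":  # higher precedence: bound lower in the tree
            pos += 1
            v *= factor()
        return v

    def expr():
        nonlocal pos
        v = term()
        while peek() in ("+", "-"):
            op = s[pos]
            pos += 1
            v = v + term() if op == "+" else v - term()
        return v

    return expr()

assert evaluate("2+3*4") == 14    # * binds tighter than +
assert evaluate("8-2-2") == 4     # left associative: (8-2)-2, not 8-(2-2)
```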
Example

An unambiguous expression grammar


Ambiguity, Left- and Rightmost
Derivations
• If there are two different parse trees, they must
produce two different leftmost derivations.
• Conversely, two different leftmost derivations
produce different parse trees by the other part of
the proof.
• Likewise for rightmost derivations.

Inherently Ambiguous CFLs
• However, for some languages, it may not be possible to remove ambiguity.

• A CFL is said to be inherently ambiguous if every CFG that describes it is
ambiguous.
Example:
• L = { anbncmdm | n,m≥ 1} U {anbmcmdn | n,m≥ 1}
• L is inherently ambiguous
• Why? Strings of the form anbncndn belong to both parts of the union, and they
have two structurally different parse trees (one matching a's with b's and c's with
d's, the other matching a's with d's and b's with c's).
Closure Properties of CFL’s
• CFL’s are closed under union, concatenation, and Kleene closure.

• Also, under reversal, homomorphisms and inverse homomorphisms.

• But not under intersection or difference.

Closure of CFL’s Under Union
• Let L and M be CFL’s with grammars G and H,
respectively.
• Assume G and H have no variables in common.
• Names of variables do not affect the language.
• Let S1 and S2 be the start symbols of G and H.

Closure Under Union – (2)
• Form a new grammar for L ∪ M by combining all the symbols and
productions of G and H.
• Then, add a new start symbol S.
• Add productions S -> S1 | S2.

Closure Under Union – (3)
• In the new grammar, all derivations start with S.
• The first step replaces S by either S1 or S2.
• In the first case, the result must be a string in L(G) = L, and in the
second case a string in L(H) = M.

Closure of CFL’s Under Concatenation

• Let L and M be CFL’s with grammars G and H,


respectively.
• Assume G and H have no variables in common.
• Let S1 and S2 be the start symbols of G and H.

Closure Under Concatenation – (2)
• Form a new grammar for LM by starting with all symbols and
productions of G and H.
• Add a new start symbol S.
• Add production S -> S1S2.
• Every derivation from S results in a string in L followed by one in M.

Closure Under Star

• Let L have grammar G, with start symbol S1.


• Form a new grammar for L* by introducing to G a new
start symbol S and the productions S -> S1S | ε.
• A rightmost derivation from S generates a sequence of
zero or more S1’s, each of which generates some string in
L.

Closure of CFL’s Under Reversal

• If L is a CFL with grammar G, form a grammar for LR


by reversing the right side of every production.
• Example: Let G have S -> 0S1 | 01.
• The reversal of L(G) has grammar S -> 1S0 | 10.

Closure of CFL’s Under Homomorphism

• Let L be a CFL with grammar G.


• Let h be a homomorphism on the terminal
symbols of G.
• Construct a grammar for h(L) by replacing each
terminal symbol a by h(a).

Example: Closure Under Homomorphism

• G has productions S -> 0S1 | 01.


• h is defined by h(0) = ab, h(1) = ε.
• h(L(G)) has the grammar with productions S -> abS
| ab.
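The construction is easy to mechanize; a sketch (our own helper `apply_hom`, uppercase letters as variables) applied to this example:

```python
def apply_hom(prods, h):
    """Grammar for h(L): replace each terminal a in every body by h(a).
    Variables (not in h) are left unchanged."""
    return {A: ["".join(h.get(c, c) for c in b) for b in bodies]
            for A, bodies in prods.items()}

g = {"S": ["0S1", "01"]}            # L(G) = {0^n 1^n}
h = {"0": "ab", "1": ""}            # h(0) = ab, h(1) = eps
assert apply_hom(g, h) == {"S": ["abS", "ab"]}
```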

Nonclosure Under Intersection
• Unlike the regular languages, the class of CFL's is not
closed under intersection.
• We know that L1 = {0n1n2n | n ≥ 1} is not a CFL (use
the pumping lemma).
• However, L2 = {0n1n2i | n ≥ 1, i ≥ 1} is.
• CFG: S -> AB, A -> 0A1 | 01, B -> 2B | 2.
• So is L3 = {0i1n2n | n ≥ 1, i ≥ 1}.
• But L1 = L2 ∩ L3.

Nonclosure Under Difference
• We can prove something more general:
• Any class of languages that is closed under difference is closed under
intersection.
• Proof: L ∩ M = L – (L – M).
• Thus, if CFL’s were closed under difference, they would be closed
under intersection, but they are not.

Intersection with a Regular Language
• Intersection of two CFL’s need not be context free.
• But the intersection of a CFL with a regular
language is always a CFL.
• Proof involves running a DFA in parallel with a PDA,
and noting that the combination is a PDA.
• PDA’s accept by final state.

Simplification Of Context-free Grammars
• There are several ways in which one can restrict the format of
productions without reducing the generative power of context-free
grammars.
• If L is a nonempty context-free language then it can be generated by a
context-free grammar G with the following properties.
• Each variable and each terminal of G appears in the derivation of some word
in L.
• There are no productions of the form A -> B where A and B are variables.
Eliminating Useless Symbols
• Let G = (V, T, P, S) be a grammar. A symbol X is useful if there is a
derivation S ⇒* αXβ ⇒* w for some α, β, and w, where w is in T*. Otherwise X
is useless.

• A symbol is useful if it appears in some derivation of some terminal
string from the start symbol. Otherwise, it is useless.

• Eliminate all useless symbols by:
• Eliminating symbols that derive no terminal string.
• Eliminating unreachable symbols.
Eliminating Useless Symbols
• Lemma: Given a CFG G = (V, T, P, S) with L(G) ≠ ∅, we can
effectively find an equivalent CFG G' = (V', T, P', S) such that for
each A in V' there is some w in T* for which A ⇒* w.

• Algorithm:
• Step 1: Include in W1 all symbols that derive some terminal string, and initialize i=1.
• Step 2: Include in Wi+1 all symbols of Wi, plus every variable with a production whose body
consists only of terminals and symbols of Wi.
• Step 3: Increment i and repeat step 2 until Wi+1 = Wi.
• Step 4: Include all production rules whose symbols are all in Wi.
Eliminating Useless Symbols
• Lemma: Given a CFG G = (V, T, P, S) we can effectively find an
equivalent CFG G' = (V', T', P', S) such that for each X in V' ∪ T'
there exist α and β in (V' ∪ T')* for which S ⇒* αXβ.

• Algorithm:
• Step 1: Include the start symbol in Y1 and initialize i=1.
• Step 2: Include in Yi+1 all symbols of Yi, plus all symbols that can be derived from
variables in Yi, and include all production rules that have been applied.
• Step 3: Increment i and repeat step 2 until Yi+1 = Yi.
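Both lemmas are fixpoint computations; the sketch below (our own encoding: variables uppercase, terminals lowercase, bodies as strings) reproduces the worked example that follows:

```python
def eliminate_useless(prods, start):
    """Eliminate useless symbols: first drop non-generating symbols (Lemma 1),
    then drop unreachable ones (Lemma 2)."""
    # Lemma 1 fixpoint (the W sets): symbols deriving some terminal string.
    gen = set()
    while True:
        new = {a for a, bodies in prods.items()
               if any(all(c.islower() or c in gen for c in b) for b in bodies)}
        if new <= gen:
            break
        gen |= new
    p1 = {a: [b for b in bodies if all(c.islower() or c in gen for c in b)]
          for a, bodies in prods.items() if a in gen}
    # Lemma 2 fixpoint (the Y sets): symbols reachable from the start symbol.
    reach = {start}
    while True:
        new = {c for a in p1 if a in reach for b in p1[a] for c in b}
        if new <= reach:
            break
        reach |= new
    return {a: bodies for a, bodies in p1.items() if a in reach}

g = {"S": ["AC", "B"], "A": ["a"], "C": ["c", "BC"], "E": ["aA", "e"]}
assert eliminate_useless(g, "S") == {"S": ["AC"], "A": ["a"], "C": ["c"]}
```

E survives Lemma 1 (it generates terminal strings) but falls to Lemma 2, since it is unreachable from S; this matches the worked example below.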
Example
• Consider the grammar:
  S→AC|B
  A→a
  C→c|BC
  E→aA|e
• Apply first Lemma:
  T={a,c,e}, W1={A,C,E}
  W2={A,C,E,S}
  W3={A,C,E,S}
  G'=({A,C,E,S}, {a,c,e}, P, S)
  P: S→AC, A→a, C→c, E→aA|e
• Apply second Lemma:
  Y1={S}
  Y2={S, A, C}
  Y3={S, A, C, a, c}
  Y4={S, A, C, a, c}
  G''=({S, A, C}, {a, c}, P, S)
  P: S→AC, A→a, C→c
Example
• Consider the grammar Apply first Lemma:
• S→AB|a T={a}, W1={S, A}
W2={S, A}
• A→a
G’={(S, A), (a), P, S}
P: S→a, A→a

Apply Second Lemma:


Y1={S}
Y2={S}
G’’={(S), (a), P, S}
P: S→a
ε-Productions
• The elimination of productions of the form A→ε, which we call ε-
productions.
• Surely if ε is in L(G), we cannot eliminate all ε-productions from G,
but if ε is not in L(G), it turns out that we can.
• The method is to determine for each variable A whether A ⇒* ε. If so, we call
A nullable. We may replace each production B → X1X2… Xn by all
productions formed by striking out some subset of those Xi's that are
nullable, but we do not include B →ε, even if all Xi's are nullable.
Elimination of ε-Productions
• Algorithm:
• Step 1: To remove A→ε, look for all productions whose right side contains A.
• Step 2: For each such production, add versions with every combination of the
occurrences of A deleted (but do not add a production whose right side becomes ε).
• Step 3: Add the resultant productions to the grammar and delete A→ε.
• Example:
• S→ABAC, A→aA|ε, B→bB|ε, C→c
• Remove A→ε
• S→ABAC| BAC| ABC| BC, A→aA|a, B→bB|ε, C→c
• Remove B→ε
• S→ABAC| BAC| ABC| BC|AAC|AC|C, A→aA|a, B→bB|b, C→c
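The nullable computation and the subset expansion can be sketched as follows (our own encoding: ε is the empty string, variables uppercase); the assertions reproduce the example above:

```python
from itertools import combinations

def remove_epsilon(prods):
    """Remove A -> eps productions by expanding all subsets of nullable occurrences."""
    nullable = {a for a, bodies in prods.items() if "" in bodies}
    changed = True
    while changed:                        # a variable is also nullable if some body
        changed = False                   # consists entirely of nullable variables
        for a, bodies in prods.items():
            if a not in nullable and any(b and all(c in nullable for c in b)
                                         for b in bodies):
                nullable.add(a)
                changed = True
    new = {}
    for a, bodies in prods.items():
        out = set()
        for b in bodies:
            idx = [i for i, c in enumerate(b) if c in nullable]
            for k in range(len(idx) + 1):
                for drop in combinations(idx, k):
                    s = "".join(c for i, c in enumerate(b) if i not in drop)
                    if s:                 # never add an eps body
                        out.add(s)
        new[a] = sorted(out)
    return new

g = {"S": ["ABAC"], "A": ["aA", ""], "B": ["bB", ""], "C": ["c"]}
r = remove_epsilon(g)
assert set(r["S"]) == {"ABAC", "BAC", "AAC", "ABC", "AC", "BC", "C"}
```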
Elimination of Unit Production
• Any production rule of the form A→B where A, B ∈ Non terminals is
called Unit Production.

• Algorithm:
• Step 1: To eliminate A→B, add the production A→x to the grammar whenever
B→x occurs in the grammar (where x is any production body that is not a single
variable).
• Step 2: Delete A→B from the grammar.
• Step 3: Repeat from step 1 until all unit productions are eliminated.
Example
• Remove unit productions from the following grammar:
  S→XY
  X→a
  Y→Z|b
  Z→M
  M→N
  N→a
• After removing unit productions:
  S→XY
  X→a
  Y→a|b
  Z→a
  M→a
  N→a
• After removing unreachable symbols:
  S→XY
  X→a
  Y→a|b
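Eliminating unit productions can be sketched by closing over unit pairs (our own helper, with the grammar encoded as variable → list of bodies):

```python
def remove_units(prods):
    """Eliminate unit productions A -> B by pulling in the non-unit bodies of
    every B reachable from A through unit productions alone."""
    vars_ = set(prods)
    pairs = {a: {a} for a in vars_}        # pairs[A] = {B : A =>* B via unit rules}
    changed = True
    while changed:
        changed = False
        for a in vars_:
            for b in list(pairs[a]):
                for body in prods[b]:
                    if body in vars_ and body not in pairs[a]:
                        pairs[a].add(body)
                        changed = True
    return {a: sorted({body for b in pairs[a] for body in prods[b]
                       if body not in vars_})
            for a in vars_}

g = {"S": ["XY"], "X": ["a"], "Y": ["Z", "b"], "Z": ["M"], "M": ["N"], "N": ["a"]}
r = remove_units(g)
assert r["Y"] == ["a", "b"]    # Y -> Z -> M -> N -> a collapses to Y -> a
assert r["S"] == ["XY"]
```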
Normal forms for CFGs
• Normal forms: Productions are of a special form

• Chomsky Normal Form (CNF)

• Greibach Normal Form (GNF)


Chomsky Normal Form
Converting a CFG to a grammar in Chomsky
Normal Form
• Let G = (V, Σ, P, S) be a CFG. Below we give an algorithm to convert G
into a CFG in Chomsky Normal Form.

1. Removing S from RHS of rules: If S appears on the RHS of some rule,


add a new start variable S0 and the rule S0 → S.
2. Removing ε-rules: Pick an ε-rule A → ε and remove it (where A is not
the start variable).
• Now for every occurrence of A on the RHS of any rule, add a rule deleting that
occurrence of A. If A occurs multiple times on the RHS of a rule, then multiple rules might be
added.
• If we have the rule R → A, then add R → ε, unless the rule R → ε has been removed previously.
• Repeat until no more ε-rules remain, except possibly one involving the start variable.
Converting a CFG to a grammar in Chomsky
Normal Form
• Example: Suppose a grammar had the following rules:
• A →ε
• B → uAv
• C → u1Av1Aw1
• Then the grammar formed by removing the rule A → ε will have the
corresponding set of rules:
• B → uAv | uv
• C → u1Av1Aw1 | u1v1Aw1 | u1Av1w1 | u1v1w1
Converting a CFG to a grammar in Chomsky
Normal Form
3. Removing unit rules: Remove a rule A → B and for all rules of the
form B → u add the rule A → u, unless A → u is a unit rule that has
already been removed. Repeat until no more unit rules remain.
• Example: Suppose a grammar had the following rules:
• A→B
• B→u
• Then the grammar formed by removing the rule A → B will have the
corresponding set of rules:
• A → u
• B → u
Converting a CFG to a grammar in Chomsky
Normal Form
4. Shortening the RHS: Replace each rule A → u1u2…uk with k ≥ 3 by the rules
A → u1A1, A1 → u2A2, …, Ak-2 → uk-1uk, where A1, …, Ak-2 are new variables.
Converting a CFG to a grammar in Chomsky
Normal Form
5. Replacing certain terminals: For each terminal a that appears in a rule whose RHS
has length at least 2, introduce a new variable Ua with the rule Ua → a and replace
that occurrence of a by Ua.
An Example – CFG to CNF
Example
• Consider the CFG: S→aSb|ε, generating the language L={anbn|n≥0}. We will
construct a CNF grammar generating the language L-{ε}, i.e., {anbn|n≥1}.
• Solution:
• We first eliminate ε-productions (now generating the language {anbn|n≥1}) and get
S→aSb|ab.
• Introduce nonterminals A, B and replace these productions with S→ASB|AB,
A→a, B→b.
• Introduce nonterminal C and replace the only production S→ASB (which is not
an allowable form in CNF) with S→AC and C→SB.
• The final grammar in CNF is now
• S→AC | AB
• C→SB
• A→a
• B→b
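A grammar in CNF admits the classic CYK membership test. As an illustrative check (our own sketch), the code below runs CYK with the final grammar above and confirms that aabb is generated while abab is not:

```python
def cyk(w, prods, start):
    """CYK membership test for a CNF grammar.
    prods: list of (head, body) with body either a terminal or two variables."""
    n = len(w)
    tbl = [[set() for _ in range(n + 1)] for _ in range(n)]  # tbl[i][length]
    for i, c in enumerate(w):                                # substrings of length 1
        tbl[i][1] = {h for h, b in prods if b == c}
    for length in range(2, n + 1):                           # longer substrings
        for i in range(n - length + 1):
            for k in range(1, length):                       # split point
                for h, b in prods:
                    if (len(b) == 2 and b[0] in tbl[i][k]
                            and b[1] in tbl[i + k][length - k]):
                        tbl[i][length].add(h)
    return start in tbl[0][n]

g = [("S", "AC"), ("S", "AB"), ("C", "SB"), ("A", "a"), ("B", "b")]
assert cyk("aabb", g, "S")
assert not cyk("abab", g, "S")
```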
Greibach Normal Form (GNF)
• A CFG G = (V, T, R, S) is said to be in GNF if every production is of
the form A → aα, where a ∈ T and α ∈ V∗ , i.e., α is a string of zero or
more variables.
Converting a CFG to a grammar in Greibach
Normal Form (GNF)
An Example – CFG to GNF
Pumping lemma for CFLs
• Let L be a context-free language. Then there exists a constant n, depending
only on L, such that every z ∈ L with |z| ≥ n can be written as
z=uvwxy, subject to the following conditions:
1. |vwx|≤n. That is, the middle portion is not too long.
2. vx≠ε. Since v and x are the pieces to be "pumped," this condition says that at
least one of the strings we pump must not be empty.
3. For all i ≥ 0, uviwxiy is in L. That is, the two strings v and x may be
"pumped" any number of times, including 0, and the resulting string will still
be a member of L.
Example
• L1 = {anbncn| n ≥ 0}
• Given p, choose w = apbpcp
• Now for any partition w = uvxyz, set i = 2.
• We show that w’ = uv2xy2 is not in L1.
• Consider the string vxy. Since |vxy| ≤ p, vxy cannot contain all
three symbols; in particular, it cannot contain both a's and c's.
• Assume that it contains no c's. Since v and y cannot both be
empty, w' will have more a's or more b's than c's.
• Hence w' ∉ L1. The case when vxy contains no a's is analogous.
