
Context-Free Languages & Grammars (CFLs & CFGs): Reading: Chapter 5


Context-Free Languages &

Grammars (CFLs & CFGs)


READING: CHAPTER 5

1
Context-Free Languages
A grammar is a set of rules that determine whether a string belongs to a
language.
A grammar that supports natural, recursive notation is called a
“context-free grammar”.
The class of CFLs is strictly larger than the class of regular languages (those
recognized by finite state machines).
Applications:
◦ Parse trees, compilers
◦ XML
[Figure: nested sets. Regular languages (FA/RE) sit inside context-free languages (PDA/CFG).]

2
An Example
A palindrome is a word that reads the same forwards and backwards

◦ E.g., madam, redivider, malayalam, 010010010

Let L = { w | w is a binary palindrome}

Is L regular?
◦ No.

3
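An Example (Contd.)
Proof using the pumping lemma:
Assume L is regular, and let N be the pumping-lemma constant.
Let w = 0^N 1 0^N.

By the pumping lemma, w can be rewritten as xyz, with |xy| ≤ N and y ≠ ε,
such that x y^k z is also in L for any k ≥ 0.

Since |xy| ≤ N, y consists only of 0s from the first block:
w = xyz
  = (0^(N-i)) (0^i) (1 0^N), with i ≥ 1
If k = 2, then x y^2 z = (0^(N-i)) (0^i)^2 (1 0^N)
                       = (0^(N+i)) 1 (0^N)
This string is not a palindrome, so it is not in L. This contradicts the
pumping lemma, so L is not regular.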
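The contradiction can be checked mechanically for a small constant. The sketch below assumes a hypothetical value N = 5, and the helper names `is_palindrome` and `pumped_strings` are illustrative, not from the slides. It tries every split w = xyz with |xy| ≤ N and y non-empty, and confirms that pumping with k = 2 never yields a palindrome. It illustrates the argument for one N and is not a proof.

```python
def is_palindrome(s: str) -> bool:
    """True if s reads the same forwards and backwards."""
    return s == s[::-1]


def pumped_strings(n: int, k: int = 2):
    """For w = 0^n 1 0^n, yield x + y*k + z for every split w = xyz
    with |xy| <= n and y non-empty."""
    w = "0" * n + "1" + "0" * n
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[: xy_len - y_len]
            y = w[xy_len - y_len : xy_len]
            z = w[xy_len:]
            yield x + y * k + z


N = 5  # hypothetical pumping-lemma constant, chosen only for illustration
pumped = list(pumped_strings(N))

# Every pumped string has the form 0^(N+i) 1 0^N with i >= 1, so none is a palindrome.
print(all(not is_palindrome(p) for p in pumped))
```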

4
Context-Free Grammar:
Definition
A context-free grammar is G=(V,T,P,S), where:
◦ V: set of variables or non-terminals
◦ T: set of terminals (= alphabet ∪ {ε})
◦ P: set of productions, each of which is of the form
V ==> α1 | α2 | …
◦ where each αi is an arbitrary string of variables and terminals
◦ S: start variable

CFG for the language of binary palindromes:

G=({A},{0,1},P,A)
P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | ε
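As a rough illustration, the 4-tuple can be written down as plain Python data, and the five productions for A can be mirrored by a small recursive recognizer. The names `G_PAL` and `in_gpal` are hypothetical, and the empty string stands in for ε. This is a sketch tailored to this one grammar, not a general CFG parser.

```python
# G = (V, T, P, S) for binary palindromes; "" plays the role of ε.
G_PAL = (
    {"A"},                                   # V: variables
    {"0", "1"},                              # T: terminals
    {"A": ["0A0", "1A1", "0", "1", ""]},     # P: productions
    "A",                                     # S: start variable
)


def in_gpal(w: str) -> bool:
    """Recognizer that follows the productions of A directly."""
    # A ==> ε | 0 | 1
    if w in ("", "0", "1"):
        return True
    # A ==> 0A0 | 1A1: strip matching outer symbols and recurse on the middle
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return in_gpal(w[1:-1])
    return False


print(in_gpal("01110"), in_gpal("011"))
```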

5
CFG conventions
Terminal symbols <== a, b, c…
Non-terminal symbols <== A,B,C, …
Terminal or non-terminal symbols <== X,Y,Z
Terminal strings <== w, x, y, z
Arbitrary strings of terminals and non-terminals <==
α, β, γ, ..

6
Simple Expressions
E ==> E+E | E*E | (E) | F
F ==> aF | bF | 0F | 1F | a | b | 0 | 1

7
Simple Expressions…
G = (V,T,P,S)
◦ V = {E,F}
◦ T = {0,1,a,b,+,*,(,)}
◦ S = E
◦ P:
◦ E ==> E+E | E*E | (E) | F
◦ F ==> aF | bF | 0F | 1F | a | b | 0 | 1

8
How does the CFG for
palindromes work?
An input string belongs to the language (i.e., is
accepted) iff it can be generated by the CFG
G: A => 0A0 | 1A1 | 0 | 1 | ε
Example: w=01110
G can generate w as follows:
1. A => 0A0
2. => 01A10
3. => 01110
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would
allow us to generate the string.
2. At every step, substitute one variable
with one of its productions.
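The two steps above can be sketched in a few lines of Python. The function name `derive` and the `PRODUCTIONS` set are assumptions made for this illustration; each step names a variable and one of its bodies, and the function substitutes the first occurrence of that variable in the current string.

```python
# Productions of the palindrome grammar as (head, body) pairs; "" stands for ε.
PRODUCTIONS = {("A", "0A0"), ("A", "1A1"), ("A", "0"), ("A", "1"), ("A", "")}


def derive(start: str, steps):
    """Apply each (variable, body) step in turn and return every intermediate string."""
    forms = [start]
    current = start
    for variable, body in steps:
        if (variable, body) not in PRODUCTIONS:
            raise ValueError(f"not a production: {variable} => {body!r}")
        if variable not in current:
            raise ValueError(f"no {variable} left to substitute in {current!r}")
        current = current.replace(variable, body, 1)  # substitute one occurrence
        forms.append(current)
    return forms


forms = derive("A", [("A", "0A0"), ("A", "1A1"), ("A", "1")])
print(" => ".join(forms))  # A => 0A0 => 01A10 => 01110
```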
9
But the language of
palindromes…
is a CFL, because it supports recursive substitution
(in the form of a CFG)
This is because we can construct a “grammar”
like this:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
Same as: A => 0A0 | 1A1 | 0 | 1 | ε
Here A is the variable (non-terminal), 0 and 1 are terminals, and lines 1 to 5 are the productions.

10
General structure of a
Production
A ==> α1 | α2 | … | αk
Here A is the head, ==> denotes derivation, and α1 | α2 | … | αk is the body.

The above is the same as:

1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk

11
Example #2
Language of balanced parentheses
e.g., ()(((())))((()))….
CFG?
G:
S => (S) | SS | ε

How would you “interpret” the string “(((()))()())” using this grammar?
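One possible way to "interpret" such a string is to recover its nesting structure. The sketch below uses a hypothetical function `parse_balanced` that returns nested Python lists: each list is one S, and each element is the S found inside a `( … )` pair. The production S => SS is handled with a loop rather than recursion, which is a design choice of this sketch and not something the slide prescribes.

```python
def parse_balanced(s: str):
    """Return nested lists mirroring S => (S) | SS | ε, or raise ValueError."""
    pos = 0

    def parse_s():
        nonlocal pos
        groups = []
        # S => SS: keep reading "(S)" groups while the next symbol opens one.
        while pos < len(s) and s[pos] == "(":
            pos += 1                      # consume "("
            inner = parse_s()             # S => (S)
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1                      # consume ")"
            groups.append(inner)
        return groups                     # an empty list corresponds to S => ε

    tree = parse_s()
    if pos != len(s):
        raise ValueError(f"unexpected symbol at position {pos}")
    return tree


print(parse_balanced("(((()))()())"))
```

For the string on the slide, the outermost pair contains three groups: one nested three deep, followed by two empty pairs.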

12
Example #3
A grammar for L = {0^m 1^n | m ≥ n}

CFG? G:
S => 0S1 | A
A => 0A | ε
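A quick sanity check of this grammar is to enumerate every terminal string it generates up to some length and compare against the set definition. The sketch below assumes a hypothetical helper `language_up_to`, treats uppercase letters as variables, and always expands the leftmost variable. It terminates for this grammar because every sentential form contains at most one variable; it is not guaranteed to terminate for arbitrary grammars.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from start.
    Uppercase characters are variables; "" stands for ε."""
    seen = {start}
    result = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:                      # no variables left: a terminal string
            result.add(form)
            continue
        for body in productions[form[idx]]:  # expand the leftmost variable
            new = form[:idx] + body + form[idx + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


GRAMMAR = {"S": ["0S1", "A"], "A": ["0A", ""]}
generated = language_up_to(GRAMMAR, "S", 6)
expected = {"0" * m + "1" * n
            for m in range(7) for n in range(m + 1) if m + n <= 6}
print(generated == expected)
```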

13
Example #4
A program containing if-then(-else) statements
if Condition then Statement else Statement
(Or)
if Condition then Statement

CFG?

14
More examples
• L1 = {0^n | n ≥ 0}
• L2 = {0^n | n ≥ 1}
• L3 = {0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0}
• L4 = {0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1}

15
String membership
How do we decide whether a string belongs to the language defined by
a CFG?
1. Derivation
◦ Head to body
2. Recursive inference
◦ Body to head
Both are equivalent forms.
Example:
G: A => 0A0 | 1A1 | 0 | 1 | ε
◦ w = 01110
◦ Is w a palindrome?
A => 0A0
=> 01A10
=> 01110
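The derivation above goes from head to body. Recursive inference runs the other way: start from the strings the basis productions give directly, then repeatedly apply the bodies 0A0 and 1A1 to strings already known to be in the language of A. A minimal sketch, assuming a hypothetical function `infer_palindromes` and a length bound to keep the set finite:

```python
def infer_palindromes(max_len: int):
    """Body-to-head inference for A => 0A0 | 1A1 | 0 | 1 | ε, up to max_len."""
    known = {"", "0", "1"}          # basis: A => ε, A => 0, A => 1
    changed = True
    while changed:                  # repeat until no new string can be inferred
        changed = False
        for w in list(known):
            for c in "01":
                new = c + w + c     # if w is in L(A), then so is cwc (A => cAc)
                if len(new) <= max_len and new not in known:
                    known.add(new)
                    changed = True
    return known


inferred = infer_palindromes(5)
print("01110" in inferred)
```

Here 01110 is inferred in three rounds: 1 from the basis, then 111, then 01110, which is the derivation on the slide read in reverse.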

16
Generalization of derivation
◦ Derivation is head ==> body
◦ A ==> X (A derives X in a single step)
◦ A ==>*G X (A derives X in zero or more steps, using the productions of G)
◦ Transitivity:
If A ==>*G B, and B ==>*G C, then A ==>*G C

17
Context-Free Language
General Definition:
The language of a CFG, G=(V,T,P,S), denoted by
L(G), is the set of terminal strings that have a
derivation from the start variable S.
L(G) = { w in T* | S ==>*G w }

18
Left-most & Right-most
Derivation Styles
G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε
Derive the string a*(ab+10) from G:

Left-most derivation (always substitute the leftmost variable):
E
==> E * E
==> F * E
==> aF * E
==> a * E
==> a * (E)
==> a * (E + E)
==> a * (F + E)
==> a * (aF + E)
==> a * (abF + E)
==> a * (ab + E)
==> a * (ab + F)
==> a * (ab + 1F)
==> a * (ab + 10F)
==> a * (ab + 10)

Right-most derivation (always substitute the rightmost variable):
E
==> E * E
==> E * (E)
==> E * (E + E)
==> E * (E + F)
==> E * (E + 1F)
==> E * (E + 10F)
==> E * (E + 10)
==> E * (F + 10)
==> E * (aF + 10)
==> E * (abF + 10)
==> E * (ab + 10)
==> F * (ab + 10)
==> aF * (ab + 10)
==> a * (ab + 10)

E ==>*G a*(ab+10)
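The left-most discipline can be replayed step by step. In the sketch below the function name `leftmost_derivation` is hypothetical, sentential forms are written without spaces, and the empty string stands for ε. Each step supplies only a production body; the function finds the leftmost variable itself and rejects a body that does not belong to that variable.

```python
PRODUCTIONS = {
    "E": ["E+E", "E*E", "(E)", "F"],
    "F": ["aF", "bF", "0F", "1F", ""],   # "" stands for ε
}


def leftmost_derivation(start: str, bodies):
    """Replace the leftmost variable with each body in turn; return all forms."""
    forms = [start]
    for body in bodies:
        form = forms[-1]
        # Index of the leftmost variable (a character that has productions).
        idx = next(i for i, c in enumerate(form) if c in PRODUCTIONS)
        if body not in PRODUCTIONS[form[idx]]:
            raise ValueError(f"{form[idx]} => {body!r} is not a production")
        forms.append(form[:idx] + body + form[idx + 1:])
    return forms


steps = ["E*E", "F", "aF", "", "(E)", "E+E", "F", "aF", "bF", "",
         "F", "1F", "0F", ""]
forms = leftmost_derivation("E", steps)
print(forms[-1])  # a*(ab+10)
```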
19
Leftmost vs. Rightmost
derivations
Q1) For every leftmost derivation, there is a rightmost derivation,
and vice versa. True or False?
True - will use parse trees to prove this

Q2) Does every word generated by a CFG have a leftmost and a
rightmost derivation?
Yes – easy to prove (reverse direction)

Q3) Could there be words which have more than one leftmost (or
rightmost) derivation?
Yes – depending on the grammar

20
How to prove that your CFGs
are correct?
(USING INDUCTION)

21
CFG & CFL
Gpal: A => 0A0 | 1A1 | 0 | 1 | ε

Theorem: A string w in (0+1)* is in L(Gpal) if and only if w is a palindrome.
Proof:
◦ Use induction
◦ On string length for the IF part:
◦ Assume w is a palindrome and prove that it belongs to L(Gpal)
◦ On length of derivation for the ONLY IF part:
◦ Assume w is in L(Gpal) and prove that w is a palindrome

22
Parse Trees
Each derivation of a CFG can be expressed as a tree. A derivation
tree (parse tree) is an ordered rooted tree that graphically represents
how a string is derived from a CFG.
In a parse tree:
◦ Each internal node is labeled by a variable in V
◦ Each leaf is labeled by a terminal symbol (or ε)
◦ For a production A ==> X1X2…Xk, any internal node labeled A has
k children, labeled X1, X2, …, Xk from left to right
[Figure: node A with children X1 … Xi … Xk]

23
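Examples
[Figure: two parse trees. Reading a tree from the root down gives a derivation; reading it from the leaves up gives a recursive inference.]

Parse tree for a + 1
G: E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | 0 | 1 | a | b
Tree: root E has children E, +, E; the left E has child F with leaf a; the right E has child F with leaf 1.

Parse tree for 0110
G: A => 0A0 | 1A1 | 0 | 1 | ε
Tree: root A has children 0, A, 0; the inner A has children 1, A, 1; the innermost A derives ε.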
24
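Parse Trees, Derivations, and
Recursive Inferences
Production:
A ==> X1..Xi..Xk
[Figure: node A with children X1 … Xi … Xk; top-down is derivation, bottom-up is recursive inference.]
[Figure: cycle of equivalent representations: parse tree, left-most derivation, right-most derivation, derivation, recursive inference.]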
25
Interchangeability of different
CFG representations
Parse tree ==> left-most derivation
◦ DFS left to right
Parse tree ==> right-most derivation
◦ DFS right to left
Hence: left-most derivation <==> right-most derivation
Derivation ==> Recursive inference
◦ Reverse the order of productions
Recursive inference ==> Parse trees
◦ bottom-up traversal of parse tree
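The tree-to-derivation conversions can be sketched with a small tree type. Everything named here is an assumption for illustration: a node is a `(label, children)` pair, a leaf has `children` set to `None`, and `productions_in_order` lists the productions used in depth-first order, visiting children left to right for a left-most derivation or right to left for a right-most one.

```python
def leaf(symbol: str):
    """A leaf node holding a terminal (or "" for ε)."""
    return (symbol, None)


def tree_yield(node) -> str:
    """Concatenate the leaves from left to right."""
    label, children = node
    if children is None:
        return label
    return "".join(tree_yield(child) for child in children)


def productions_in_order(node, rightmost: bool = False):
    """Productions used by the left-most (or right-most) derivation of the tree."""
    label, children = node
    if children is None:
        return []
    body = "".join(child[0] for child in children)
    used = [(label, body)]
    for child in (reversed(children) if rightmost else children):
        used.extend(productions_in_order(child, rightmost))
    return used


# Parse tree for a+1 under E => E+E | F, F => a | 1 (a subset of the slide's grammar).
tree = ("E", [("E", [("F", [leaf("a")])]),
              leaf("+"),
              ("E", [("F", [leaf("1")])])])

print(tree_yield(tree))
print(productions_in_order(tree))
print(productions_in_order(tree, rightmost=True))
```

Both orders use the same five productions; only the order in which the two F subtrees are expanded differs.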

26
Connection between CFLs
and RLs

27
CFLs & Regular Languages
What kind of grammars result for regular languages?

A CFG is said to be right-linear if all the productions are of
one of the following two forms: A ==> wB (or) A ==> w
where:
• A and B are variables,
• w is a string of terminals

Theorem 1: Every right-linear CFG generates a regular language
Theorem 2: Every regular language has a right-linear grammar
Theorem 3: Left-linear CFGs also represent RLs
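The intuition behind Theorem 1 is that the variables of a right-linear grammar behave like the states of a nondeterministic finite automaton. The sketch below is one way to see this. The function `accepts` and the example grammar S => 0S | 1A, A => 1A | ε (which generates 0*1+) are hypothetical and not from the slides. Each production is stored as `(w, B)`, with `B` set to `None` for the form A ==> w.

```python
def accepts(productions, start: str, s: str) -> bool:
    """Simulate a right-linear grammar like an NFA over (variable, position) pairs."""
    frontier = {(start, 0)}
    seen = set(frontier)
    while frontier:
        next_frontier = set()
        for variable, pos in frontier:
            for w, target in productions[variable]:
                if not s.startswith(w, pos):
                    continue                      # terminals of this production do not match
                end = pos + len(w)
                if target is None:                # A ==> w: must consume the rest exactly
                    if end == len(s):
                        return True
                elif (target, end) not in seen:   # A ==> wB: move to "state" B
                    seen.add((target, end))
                    next_frontier.add((target, end))
        frontier = next_frontier
    return False


# Hypothetical right-linear grammar for 0*1+ ; "" stands for ε.
RIGHT_LINEAR = {
    "S": [("0", "S"), ("1", "A")],
    "A": [("1", "A"), ("", None)],
}

print(accepts(RIGHT_LINEAR, "S", "0011"), accepts(RIGHT_LINEAR, "S", "10"))
```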

28
Ambiguity in CFGs and CFLs

29
Ambiguity in CFGs
A CFG is said to be ambiguous if there exists a string which has more
than one left-most derivation or more than one right-most derivation

Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01

Input string: 00111
Can be derived in two ways

LM derivation #1:
S => AS
=> 0A1S
=> 0A11S
=> 00111S
=> 00111

LM derivation #2:
S => AS
=> A1S
=> 0A11S
=> 00111S
=> 00111
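The claim can be checked by counting parse trees, since each parse tree corresponds to exactly one left-most derivation. The sketch below is specialized to this one grammar (S ==> AS | ε, A ==> A1 | 0A1 | 01); the function name `count_parse_trees` is hypothetical. It counts trees over substrings with memoization, which avoids the infinite recursion that the left-recursive production A ==> A1 would cause in a naive top-down search.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Number of parse trees (= left-most derivations) of w from S."""

    @lru_cache(maxsize=None)
    def count_a(i: int, j: int) -> int:
        """Parse trees for A deriving w[i:j]."""
        if j - i < 2:
            return 0                               # every A-string has length >= 2
        total = 0
        if w[i:j] == "01":
            total += 1                             # A ==> 01
        if w[j - 1] == "1":
            total += count_a(i, j - 1)             # A ==> A1
        if w[i] == "0" and w[j - 1] == "1":
            total += count_a(i + 1, j - 1)         # A ==> 0A1
        return total

    @lru_cache(maxsize=None)
    def count_s(i: int) -> int:
        """Parse trees for S deriving w[i:]."""
        total = 1 if i == len(w) else 0            # S ==> ε
        for k in range(i + 2, len(w) + 1):         # S ==> AS, with A deriving w[i:k]
            total += count_a(i, k) * count_s(k)
        return total

    return count_s(0)


print(count_parse_trees("00111"))  # 2
```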

30
Why does ambiguity matter?
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
string = a * b + c
• LM derivation #1:
• E => E + E => E * E + E ==>* a * b + c
[Parse tree: root E + E; the left E expands to E * E with leaves a and b; the right E is c. Grouping: (a*b)+c]
• LM derivation #2:
• E => E * E => a * E => a * E + E ==>* a * b + c
[Parse tree: root E * E; the left E is a; the right E expands to E + E with leaves b and c. Grouping: a*(b+c)]
Values are different!
The calculated value depends on which
of the two parse trees is actually used.
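A small numeric example makes the difference concrete. In the sketch below the two parse trees are written as nested tuples, and the values a = 2, b = 3, c = 4 are hypothetical numbers chosen only for illustration; `evaluate` is likewise an illustrative helper.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a parse tree given as a name or a (left, op, right) tuple."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


tree_1 = (("a", "*", "b"), "+", "c")   # from LM derivation #1: (a*b)+c
tree_2 = ("a", "*", ("b", "+", "c"))   # from LM derivation #2: a*(b+c)
env = {"a": 2, "b": 3, "c": 4}         # hypothetical values

print(evaluate(tree_1, env), evaluate(tree_2, env))  # 10 14
```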

31
Removing Ambiguity in
Expression Evaluations
It MAY be possible to remove ambiguity for some
CFLs
◦ E.g., in a CFG for expression evaluation, by imposing rules
and restrictions such as precedence
◦ This would imply a rewrite of the grammar

Precedence: (), *, +

Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1

Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1
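To see that the unambiguous version (E => E + T | T, T => T * F | F, F => I | (E), I => a | b | c | 0 | 1) yields a single grouping, here is a minimal recursive-descent sketch. The function name `parse_expression` is hypothetical. The left-recursive productions E => E + T and T => T * F are implemented as loops, which is a common rewriting and an assumption of this sketch rather than something stated on the slide. The result is a nested tuple showing the grouping.

```python
def parse_expression(text: str):
    """Parse with E => E+T | T, T => T*F | F, F => I | (E), I => a|b|c|0|1."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_e():
        nonlocal pos
        node = parse_t()
        while peek() == "+":              # E => E + T, left-associative
            pos += 1
            node = (node, "+", parse_t())
        return node

    def parse_t():
        nonlocal pos
        node = parse_f()
        while peek() == "*":              # T => T * F, binds tighter than +
            pos += 1
            node = (node, "*", parse_f())
        return node

    def parse_f():
        nonlocal pos
        token = peek()
        if token == "(":                  # F => (E)
            pos += 1
            node = parse_e()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return node
        if token is not None and token in "abc01":   # F => I
            pos += 1
            return token
        raise ValueError(f"unexpected token {token!r}")

    tree = parse_e()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression("a * b + c"))
```

With this grammar, a * b + c can only group as (a*b)+c; the other grouping requires explicit parentheses.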

32
Inherently Ambiguous CFLs
However, for some languages, it may not be
possible to remove ambiguity

A CFL is said to be inherently ambiguous if every
CFG that describes it is ambiguous
Example:
◦ L = { a^n b^n c^m d^m | n,m ≥ 1 } ∪ { a^n b^m c^m d^n | n,m ≥ 1 }
◦ L is inherently ambiguous
◦ Why?
Input string: a^k b^k c^k d^k (it belongs to both halves of the union)

33
Example
For w = aabbccdd

34
Applications of CFLs & CFGs
Compilers use parsers for syntactic checking
Parsers can be expressed as CFGs
1. Balancing parentheses:
◦ B ==> BB | (B) | Statement
◦ Statement ==> …
2. If-then-else:
◦ S ==> SS | if Condition then Statement else Statement | if Condition then
Statement | Statement
◦ Condition ==> …
◦ Statement ==> …
3. C brace matching { … }
4. Pascal begin-end matching
5. YACC (Yet Another Compiler-Compiler)

35
More applications
Markup languages
◦ Nested Tag Matching
◦ HTML
◦ <html> …<p> … <a href=…> … </a> </p> … </html>

◦ XML
◦ <PC> … <MODEL> … </MODEL> .. <RAM> … </RAM> … </PC>

36
Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | … | z | A | B | .. | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>

Here, the left-hand side of each production denotes one non-terminal
(e.g., “Roll”, “Class”, etc.)
Those symbols on the right-hand side for which no productions (i.e.,
substitutions) are defined are terminals (e.g., ‘a’, ‘b’, ‘<’, ‘>’, “ROLL”,
etc.)

37
Summary
•Context-free grammars
•Context-free languages
•Productions, derivations, recursive inference, parse
trees
•Left-most & right-most derivations
•Ambiguous grammars
•Removing ambiguity
•CFL/CFG applications
• parsers, markup languages

38
