
CYK Algorithm

The CYK algorithm is used to parse strings and decide whether they are generated by a given context-free grammar in Chomsky normal form. It fills a table in which each cell holds the set of variables that produce one substring of the input. The top row is filled first, from the rules that produce single terminal symbols; every later cell is filled by combining pairs of cells that cover shorter substrings (the cells above it in the same column and the cells diagonally up and to the right) and collecting the variables whose rules produce those pairs. In this example, the algorithm fills the table for the string "aaabb" using the sample grammar provided.


CYK Algorithm: More Details
Jim Anderson (modified by Nathan Otterness)

S → AB
A → AA | AB | a
B → CC
C → b
x = aaabb (n = 5)

Start by filling in the “top” row of the table (row j = 1) with the variables that directly produce the terminal symbol at the corresponding position i.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2
j = 3
j = 4
j = 5

For example, the single symbol at position 5 of the string (b) is directly produced by variable C, through the rule C → b, so we put a C in the entry at column 5, row 1.
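
The top-row step can be sketched in Python. This is a minimal illustration, assuming the grammar's terminal rules are encoded as (variable, terminal) pairs and the table is a dict keyed by the slides' (i, j) cell coordinates; the names `TERMINAL_RULES` and `fill_top_row` are hypothetical, not part of the original slides.

```python
# Hypothetical encoding of the example grammar's terminal rules: (head, terminal).
TERMINAL_RULES = [("A", "a"), ("C", "b")]


def fill_top_row(x: str) -> dict[tuple[int, int], set[str]]:
    """Return the row-1 cells of the CYK table, keyed by (i, j) with j = 1.

    Positions are 1-based to match the slides' "cell i,j" notation.
    """
    table: dict[tuple[int, int], set[str]] = {}
    for i, symbol in enumerate(x, start=1):
        # Cell i,1 holds every variable X that has a rule X -> symbol.
        table[(i, 1)] = {head for head, term in TERMINAL_RULES if term == symbol}
    return table


row1 = fill_top_row("aaabb")
print([sorted(row1[(i, 1)]) for i in range(1, 6)])
# → [['A'], ['A'], ['A'], ['C'], ['C']]
```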

Fill in the subsequent rows by looking at the cells directly above the current cell (in the same column) and the cells diagonally up and to the right. (How to do this isn’t very clear using row 2 as an example, so I’ll wait until row 3 to go over it in more detail.)

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3
j = 4
j = 5
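
Row 2 is the simplest case of the combining step: a substring of length 2 can only be split into two single symbols, so cell i,2 depends on cells i,1 and i+1,1 alone. The sketch below assumes the binary rules are encoded as hypothetical (head, left, right) triples; it reproduces row 2 of the table, with A for "aa", ∅ for "ab" (no rule has right-hand side AC), and B for "bb".

```python
# Hypothetical encoding of the example grammar.
TERMINAL_RULES = [("A", "a"), ("C", "b")]  # X -> terminal
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]  # X -> Y Z

x = "aaabb"
n = len(x)

# Row 1: variables that derive each single symbol (1-based positions).
row1 = {i: {h for h, t in TERMINAL_RULES if t == x[i - 1]} for i in range(1, n + 1)}

# Row 2: cell i,2 covers x[i..i+1], which can only split as (i,1) + (i+1,1).
row2 = {}
for i in range(1, n):
    row2[i] = {
        head
        for head, left, right in BINARY_RULES
        if left in row1[i] and right in row1[i + 1]
    }

print([sorted(row2[i]) for i in range(1, n)])
# → [['A'], ['A'], [], ['B']]
```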

Say we want to fill in cell 1,3 (column 1, row 3) next.

We want to fill this cell with the set of variables that produce the substring of length j starting at position i; in this notation, cell i,j is the entry in column i, row j. Here i = 1 and j = 3, so we want the variables that produce aaa.
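
To make the coordinates concrete, here is a small helper, assuming 1-based positions as in the slides; `cell_substring` is a hypothetical name. It maps cell i,j to the substring it describes and rejects cells that would run past the end of the string (the table only has cells with i + j - 1 <= n, which is why row j has n - j + 1 entries).

```python
def cell_substring(x: str, i: int, j: int) -> str:
    """Substring covered by cell i,j: length j, starting at 1-based position i."""
    if not (i >= 1 and j >= 1 and i + j - 1 <= len(x)):
        raise ValueError(f"cell {i},{j} is outside the table for a string of length {len(x)}")
    # Convert the 1-based start position to a 0-based slice.
    return x[i - 1 : i - 1 + j]


x = "aaabb"
print(cell_substring(x, 1, 3))  # → aaa
print(cell_substring(x, 2, 3))  # → aab
print(cell_substring(x, 1, 5))  # → aaabb
```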

We are going to look for ways to produce the substring of length 3 starting at position 1 by concatenating two shorter strings. We know that the first of these shorter strings must start at the same position as the new, longer substring, and substrings starting at position 1 are produced by the variables in column 1.

We also know that, to get a substring of length 3, we need to choose the second of our two strings so that the combined string has length 3. A string produced by a variable in cell 1,1 has length 1. So, to get a string of length 3, we need to concatenate this first part with a second part that starts at position 2 and has length 2. Such strings are produced by the variables in cell 2,2 of the table.
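
The same reasoning applies to every cell: if the first part has length k (1 <= k <= j - 1) and starts at position i, the second part must start at position i + k and have length j - k. A hypothetical helper `splits` that lists these cell pairs, sketched under the slides' cell i,j convention:

```python
def splits(i: int, j: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """All (first cell, second cell) pairs whose substrings concatenate to cell i,j.

    The first part starts at position i with length k; the second part must then
    start at position i + k with length j - k, for k = 1, ..., j - 1.
    """
    return [((i, k), (i + k, j - k)) for k in range(1, j)]


print(splits(1, 3))  # → [((1, 1), (2, 2)), ((1, 2), (3, 1))]
print(splits(1, 5))
# → [((1, 1), (2, 4)), ((1, 2), (3, 3)), ((1, 3), (4, 2)), ((1, 4), (5, 1))]
```

A cell in row 1 has no splits (the list is empty), which is why row 1 is filled from terminal rules instead.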

So we now know that we can get a substring of length 3 starting at position 1 by concatenating two strings that are each produced by variable A (cells 1,1 and 2,2 both contain A). We therefore need to look for productions that produce two A’s concatenated together, that is, productions with right-hand side AA.

In this case, exactly one such production exists: A → AA. We therefore add the “producer” variable, A, to the set of variables in cell 1,3.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A
j = 4
j = 5
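
Finding the variables for one pair of cells amounts to checking every rule X → YZ against the two sets. A minimal sketch, assuming the binary rules are stored as (X, Y, Z) triples; `BINARY_RULES` and `producers` are hypothetical names:

```python
from itertools import product

# Hypothetical encoding of the example grammar's binary rules: (X, Y, Z) for X -> Y Z.
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]


def producers(left_cell: set[str], right_cell: set[str]) -> set[str]:
    """Variables X with a rule X -> Y Z, where Y is in left_cell and Z is in right_cell."""
    pairs = set(product(left_cell, right_cell))  # every candidate right-hand side YZ
    return {head for head, y, z in BINARY_RULES if (y, z) in pairs}


print(producers({"A"}, {"A"}))  # → {'A'}    (cells 1,1 and 2,2: A -> AA)
print(sorted(producers({"A"}, {"B"})))  # → ['A', 'S']
print(producers({"A"}, set()))  # → set()   (an empty cell pairs with nothing)
```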

We can continue checking the other ways to produce strings of length 3 starting at position 1. A string produced by a variable in cell 1,2 has length 2, so, to get a string of length 3, we need a string that starts at position 3 and has length 1. Such strings are produced by the variables in cell 3,1 of the table. In this case, however, cells 1,2 and 3,1 each contain only A, so we are just looking at two A’s again: the rule A → AA yields A, which cell 1,3 already contains.

Let’s do the same exercise for the next cell in the table, cell 2,3.

In this case, we want strings of length 3 starting at position 2 (the substring aab). A string produced by a variable in cell 2,1 starts at position 2 and has length 1. So, to get a string of length 3, we need a string that starts at position 3 and has length 2. Such strings are produced by the variables in cell 3,2 of the table.

No production can produce a variable in cell 2,1 followed by a variable in cell 3,2, simply because cell 3,2 doesn’t contain any variables.

Next, a string produced by a variable in cell 2,2 starts at position 2 and has length 2. So, to get a string of length 3, we need a string that starts at position 4 and has length 1. Such strings are produced by the variables in cell 4,1 of the table. Cell 2,2 contains A and cell 4,1 contains C, so we are looking for productions that produce an A followed by a C. However, there aren’t any.

We weren’t able to find any productions for this cell, so we indicate that the “set of variables producing substrings of length 3 starting at position 2” (cell 2,3) is empty.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅
j = 4
j = 5
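
Putting the last few steps together, a cell is the union of the contributions of all its splits, and it ends up empty when no split matches a rule. The sketch below (hypothetical names `fill_cell` and `table`, with rows 1 and 2 copied from the example, and an assumed (X, Y, Z) rule encoding) recomputes cells 1,3 and 2,3:

```python
# Hypothetical encoding of the example grammar's binary rules: (X, Y, Z) for X -> Y Z.
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]


def fill_cell(table: dict[tuple[int, int], set[str]], i: int, j: int) -> set[str]:
    """Union, over every split of cell i,j, of the variables X with X -> Y Z,
    Y in the first part's cell and Z in the second part's cell."""
    result: set[str] = set()
    for k in range(1, j):  # first part: length k starting at position i
        left = table[(i, k)]
        right = table[(i + k, j - k)]  # second part: length j - k starting at i + k
        for head, y, z in BINARY_RULES:
            if y in left and z in right:
                result.add(head)
    return result


# Rows 1 and 2 of the example table for x = aaabb, copied from the slides.
table = {
    (1, 1): {"A"}, (2, 1): {"A"}, (3, 1): {"A"}, (4, 1): {"C"}, (5, 1): {"C"},
    (1, 2): {"A"}, (2, 2): {"A"}, (3, 2): set(), (4, 2): {"B"},
}
print(fill_cell(table, 1, 3))  # → {'A'}
print(fill_cell(table, 2, 3))  # → set()
```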

Following the same pattern for cell 3,3 (the substring abb), we first pair cell 3,1 (A) with cell 4,2 (B), so we look for productions that produce an A followed by a B.

Two variables produce AB: S (through S → AB) and A (through A → AB). So we put both of these variables into the set in cell 3,3.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4
j = 5
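
Cell 3,3 gets two variables because two different rules share the right-hand side AB. When a grammar has many rules, one common design choice is to index the rules by right-hand side so that each pair of variables is a single lookup instead of a scan over all rules. A sketch, with `heads_for` as a hypothetical name and the (X, Y, Z) rule encoding as an assumption:

```python
from collections import defaultdict

# Hypothetical encoding of the example grammar's binary rules: (X, Y, Z) for X -> Y Z.
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]

# Index rules by right-hand side: (Y, Z) -> every X with X -> Y Z.
heads_for = defaultdict(set)
for head, y, z in BINARY_RULES:
    heads_for[(y, z)].add(head)

print(sorted(heads_for[("A", "B")]))  # → ['A', 'S']  (both go into cell 3,3)
print(sorted(heads_for[("A", "A")]))  # → ['A']
print(sorted(heads_for[("A", "C")]))  # → []  (no rule has right-hand side AC)
```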

The next pair of cells we check, cells 3,2 and 5,1, once again involves an empty cell (3,2), so we don’t add anything else to cell 3,3.

Now we move on to cell 1,4, corresponding to substrings of length 4 starting at position 1 (the substring aaab).

First, we check cells 1,1 and 2,3. Cell 1,1 holds the variables producing substrings of length 1 starting at position 1; cell 2,3 holds those producing substrings of length 3 starting at position 2. Cell 2,3 is empty (basically, this says that no variable in this grammar can produce the string aab), so we clearly won’t find any productions for this combination.
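
The claim that an empty cell 2,3 means no variable produces aab can be cross-checked by brute force for a grammar this small: enumerate, for every variable, all strings of each length it derives. This enumeration is only a sanity check (the number of strings can grow exponentially with length), not a replacement for CYK; `strings_by_length` is a hypothetical name, and the grammar encoding is an assumption.

```python
# Hypothetical encoding of the example grammar (Chomsky normal form).
TERMINAL_RULES = [("A", "a"), ("C", "b")]  # X -> terminal
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]  # X -> Y Z
VARIABLES = {"S", "A", "B", "C"}


def strings_by_length(max_len: int) -> dict[tuple[str, int], set[str]]:
    """(X, m) -> every string of length m that variable X derives."""
    lang = {(v, m): set() for v in VARIABLES for m in range(1, max_len + 1)}
    for head, t in TERMINAL_RULES:
        lang[(head, 1)].add(t)
    for m in range(2, max_len + 1):
        for head, y, z in BINARY_RULES:
            for k in range(1, m):  # y derives the first k symbols, z the remaining m - k
                for u in lang[(y, k)]:
                    for w in lang[(z, m - k)]:
                        lang[(head, m)].add(u + w)
    return lang


lang = strings_by_length(3)
print(any("aab" in lang[(v, 3)] for v in VARIABLES))  # → False
print(sorted(lang[("A", 3)]))  # → ['aaa', 'abb']
```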

Next, the combination of cells 1,2 and 3,2 also involves an empty set (cell 3,2), so we won’t find anything here.

Finally, cells 1,3 (A) and 4,1 (C) mean checking for productions that produce an A followed by a C, and there aren’t any.

So, cell 1,4 is empty.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4   ∅
j = 5
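
Cell 1,4 could be decided because every cell it reads covers a shorter substring and was therefore already filled. The hypothetical helper `fill_order` below lists the cells in the row-by-row order used in these slides and checks that property for n = 5:

```python
def fill_order(n: int) -> list[tuple[int, int]]:
    """Cells i,j in the order the slides fill them: row by row (j = 2, ..., n),
    left to right within a row. Row 1 comes straight from the terminal rules."""
    return [(i, j) for j in range(2, n + 1) for i in range(1, n - j + 2)]


order = fill_order(5)
print(order)
# → [(1, 2), (2, 2), (3, 2), (4, 2), (1, 3), (2, 3), (3, 3), (1, 4), (2, 4), (1, 5)]

# Every cell that a split of (i, j) reads is in row 1 or appears earlier in the order.
position = {cell: idx for idx, cell in enumerate(order)}
for i, j in order:
    for k in range(1, j):
        for dep in ((i, k), (i + k, j - k)):
            assert dep[1] == 1 or position[dep] < position[(i, j)]
```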

Next, we check cell 2,4 (the substring aabb). The first combination, cells 2,1 and 3,3, has two possible variables for the second part of the substring (S and A in cell 3,3), so we look for productions of AS and of AA, pairing the A from cell 2,1 with each variable in cell 3,3.

No production has right-hand side AS; however, A produces AA (A → AA), so we can add A to cell 2,4.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4   ∅     A
j = 5

Next, we pair cell 2,2 (A) with cell 4,2 (B) and look for productions that produce AB: both S and A do. Cell 2,4 already contains A, so we only need to add S.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4   ∅     S,A
j = 5
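
Because each cell is a set, adding A a second time has no effect, which is why only S needs to be added to cell 2,4. A sketch that traces each split's contribution and accumulates them with set union, assuming rows 1 to 3 are copied from the example and rules are encoded as (X, Y, Z) triples; `trace_cell` is a hypothetical name:

```python
# Hypothetical encoding of the example grammar's binary rules: (X, Y, Z) for X -> Y Z.
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]


def trace_cell(table, i, j):
    """Per split of cell i,j: (first cell, second cell, variables it contributes)."""
    steps = []
    for k in range(1, j):
        left, right = (i, k), (i + k, j - k)
        found = {h for h, y, z in BINARY_RULES if y in table[left] and z in table[right]}
        steps.append((left, right, found))
    return steps


# Rows 1-3 of the example table for x = aaabb, copied from the slides.
table = {
    (1, 1): {"A"}, (2, 1): {"A"}, (3, 1): {"A"}, (4, 1): {"C"}, (5, 1): {"C"},
    (1, 2): {"A"}, (2, 2): {"A"}, (3, 2): set(), (4, 2): {"B"},
    (1, 3): {"A"}, (2, 3): set(), (3, 3): {"S", "A"},
}

cell = set()
for left, right, found in trace_cell(table, 2, 4):
    print(left, right, sorted(found))
    cell |= found  # set union: adding A a second time changes nothing
print(sorted(cell))
# (2, 1) (3, 3) ['A']
# (2, 2) (4, 2) ['A', 'S']
# (2, 3) (5, 1) []
# ['A', 'S']
```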

Finally, the last combination, cells 2,3 and 5,1, involves an empty cell (2,3), so we know we won’t get any new productions from these two cells, and we’re done with cell 2,4.

Finally, we need to fill in the last cell, cell 1,5, corresponding to the string of length 5 starting at position 1 (in other words, the entire string). We’ll just work through the possibilities as before.

The first combination, cells 1,1 (A) and 2,4 (S, A), once again means looking for productions that produce either AS or AA. A produces AA, so we add A to cell 1,5.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4   ∅     S,A
j = 5   A

The next combination, cells 1,2 (A) and 3,3 (S, A), asks for the same pairs, AS or AA, that we just considered. It only yields A, which we have already added, so there is nothing new to add.

Next, cells 1,3 (A) and 4,2 (B) mean we look for productions that produce AB: S and A. We already added A to cell 1,5, so we only need to add S.

        a     a     a     b     b
i →     1     2     3     4     5
j = 1   A     A     A     C     C
j = 2   A     A     ∅     B
j = 3   A     ∅     S,A
j = 4   ∅     S,A
j = 5   S,A

And finally, the combination of cells 1,4 and 5,1 involves an empty cell (1,4), so it contributes no productions.

Now we’re finished filling out the table. To know whether the string is in the language generated by the CFG, we only need to check whether the start symbol S is in the last cell, cell 1,5.

In this case, cell 1,5 does contain the start symbol S, so the string aaabb is in the language generated by the grammar. Recall that this says the start symbol is able to produce the “substring” of length 5 starting at position 1, which is the whole input.
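
The whole procedure fits in one short function. This is a minimal sketch, not the slides' own code: it assumes a grammar in Chomsky normal form encoded as (variable, terminal) and (head, left, right) tuples, and the name `cyk` and its parameters are hypothetical. It fills the rows in the same order as the walkthrough and answers the final question by checking whether the start symbol is in cell 1,n.

```python
def cyk(x, terminal_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives x.

    table[(i, j)] is the set of variables deriving the substring of length j
    starting at 1-based position i, matching the slides' "cell i,j".
    """
    n = len(x)
    if n == 0:
        return False  # this sketch supports no S -> epsilon rule
    table = {}
    for i in range(1, n + 1):  # row 1: terminal rules
        table[(i, 1)] = {h for h, t in terminal_rules if t == x[i - 1]}
    for j in range(2, n + 1):  # rows 2..n, shortest substrings first
        for i in range(1, n - j + 2):
            cell = set()
            for k in range(1, j):  # split: length k at i, then length j - k at i + k
                left, right = table[(i, k)], table[(i + k, j - k)]
                cell |= {h for h, y, z in binary_rules if y in left and z in right}
            table[(i, j)] = cell
    return start in table[(1, n)]


# Hypothetical encoding of the example grammar.
TERMINAL_RULES = [("A", "a"), ("C", "b")]
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]

for s in ["aaabb", "aab", "abb", "ab"]:
    print(s, cyk(s, TERMINAL_RULES, BINARY_RULES))
# aaabb True
# aab False
# abb True
# ab False
```

With three nested loops over j, i, and k, plus a scan over the rules inside the innermost loop, the running time is O(n³ · |R|) for |R| binary rules.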
