CYK Algorithm
For example, the single symbol at position 5 in the string is directly produced by variable 𝐶, so we put a 𝐶 in the entry at column 5, row 1.
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2
3
4
5
Jim Anderson (modified by Nathan Otterness) 1
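The first row can be filled mechanically: for each position, collect every variable that directly produces the symbol there. A minimal Python sketch, assuming the example grammar used on the following slides, whose terminal productions are 𝐴 → 𝑎 and 𝐶 → 𝑏. The names `TERMINAL_RULES` and `first_row` are illustrative and do not come from the slides.

```python
# Terminal productions of the example grammar: A -> a, C -> b.
# (Assumed representation: terminal symbol -> set of variables producing it.)
TERMINAL_RULES = {"a": {"A"}, "b": {"C"}}

def first_row(x):
    """Return row 1 of the table: one set of variables per position in x."""
    return [set(TERMINAL_RULES.get(symbol, set())) for symbol in x]

print(first_row("aaabb"))  # [{'A'}, {'A'}, {'A'}, {'C'}, {'C'}]
```

A symbol with no terminal production would simply get an empty set in its column.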
CYK Algorithm: More Details
Fill in subsequent rows by looking at the cells directly above and the cells diagonally to the upper right. (How to do this isn’t very clear using row 2 as an example, so I’ll wait for row 3 to go over it in more detail.)
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶→𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
Say we want to fill in cell 1,3 (column 1, row 3) next.
We want to fill the highlighted cell (cell 1,3) with the set of variables that produce the substring of length 𝑗 that starts at position 𝑖 (in this case, 𝑖 = 1 and 𝑗 = 3, so we want variables that produce 𝑎𝑎𝑎).
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
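To make the meaning of a cell concrete, the substring that belongs to cell 𝑖,𝑗 can be computed directly. A small sketch, assuming 1-based positions as in the table; the helper name `substring_for_cell` is made up for illustration.

```python
def substring_for_cell(x, i, j):
    """Substring of x with length j starting at 1-based position i."""
    if i < 1 or j < 1 or i + j - 1 > len(x):
        raise ValueError("cell lies outside the triangular table")
    return x[i - 1 : i - 1 + j]

print(substring_for_cell("aaabb", 1, 3))  # aaa
print(substring_for_cell("aaabb", 2, 3))  # aab
```

The bounds check also shows why the table is triangular: row 𝑗 only has cells for positions 1 through 𝑛 − 𝑗 + 1.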
We are going to look for ways to produce a substring of length 3 starting at position 1 by concatenating two shorter strings. We know that the first of these shorter strings must start at the same position as the new, longer substring.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
Substrings starting at position 1 are produced by variables in column 1.
We know that, in order to have a substring of length 3, we need to choose the second of our two strings in such a way that the length of the combined string is 3.
A string produced by a variable in cell 1,1 has a length of 1. So, to get a string of length 3, we need to concatenate the first part with a second part that starts at position 2 and is of length 2. Such strings are produced by variables in cell 2,2 of the table.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
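The pairing of a first part with a matching second part can be enumerated systematically. A minimal sketch, assuming cells are written as (position, length) pairs like "cell 2,2" above; the generator name `splits` is hypothetical.

```python
def splits(i, j):
    """Yield ((i, k), (i + k, j - k)) for every way to cut a length-j substring in two."""
    for k in range(1, j):
        # First part: length k starting at i. Second part: the remaining j - k symbols.
        yield (i, k), (i + k, j - k)

print(list(splits(1, 3)))  # [((1, 1), (2, 2)), ((1, 2), (3, 1))]
```

For cell 1,3 this gives the pair of cells 1,1 and 2,2 discussed above, plus one more pair, cells 1,2 and 3,1.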
So, now we know that we can get a substring of length 3 starting at position 1 by concatenating two strings that are produced by variable 𝐴. So, we need to look for productions that produce two 𝐴’s concatenated together.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3
4
5
In this case, one such production exists: 𝐴 → 𝐴𝐴. We will therefore add the “producer” variable, 𝐴, to the set of variables in cell 1,3.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
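Looking up the “producer” variables for a pair of cells can be sketched as a scan over the two-variable productions. This is only an illustration: the list `BINARY_RULES` encodes the example grammar in an assumed (head, (left, right)) form, and `combine` is a made-up name.

```python
# Two-variable productions of the example grammar, as (head, (left, right)).
BINARY_RULES = [
    ("S", ("A", "B")),
    ("A", ("A", "A")),
    ("A", ("A", "B")),
    ("B", ("C", "C")),
]

def combine(left_cell, right_cell):
    """Variables X with a production X -> Y Z, where Y is in left_cell and Z in right_cell."""
    return {
        head
        for head, (y, z) in BINARY_RULES
        if y in left_cell and z in right_cell
    }

print(combine({"A"}, {"A"}))  # {'A'}
print(combine({"A"}, {"C"}))  # set()
print(combine({"A"}, set()))  # set()
```

An empty cell on either side can never contribute anything, which is why such combinations can be skipped by inspection.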
We can continue checking all of the possible ways to produce strings of length 3 starting at position 1.
A string produced by a variable in cell 1,2 has a length of 2. So, to get a string of length 3, we need a string that starts at position 3 and is of length 1. Such strings are produced by variables in cell 3,1 of the table. However, in this case, we’re just looking at two 𝐴’s again, so it’s not very interesting.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
Let’s do the same exercise for the next cell in the table, cell 2,3.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
In this case, we want strings starting at position 2 of length 3.
A string produced by a variable in cell 2,1 starts at position 2 and has length 1. So, to get a string of length 3, we need a string that starts at position 3 and is of length 2. Such strings are produced by variables in cell 3,2 of the table.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
No possible productions produce a variable in cell 2,1 followed by a variable in cell 3,2, simply because cell 3,2 doesn’t contain any variables.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
A string produced by a variable in cell 2,2 starts at position 2 and has length 2. So, to get a string of length 3, we need a string that starts at position 4 and is of length 1. Such strings are produced by variables in cell 4,1 of the table.
Now, we are looking for productions that produce an 𝐴 followed by a 𝐶. However, there aren’t any.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A
4
5
We weren’t able to find any productions for this cell, so we’ll just indicate that the “set of variables producing substrings of length 3 starting at position 2” is empty.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅
4
5
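Filling one cell amounts to repeating the two-cell check for every split and taking the union of whatever is found. A sketch under stated assumptions: the table is stored as a dict keyed by (position, length), rows 1 and 2 are copied from the table above, and `fill_cell` is a hypothetical helper name.

```python
# Two-variable productions of the example grammar, as (head, left, right).
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]

# Rows 1 and 2 of the table, keyed by (i, j) = (start position, length).
table = {
    (1, 1): {"A"}, (2, 1): {"A"}, (3, 1): {"A"}, (4, 1): {"C"}, (5, 1): {"C"},
    (1, 2): {"A"}, (2, 2): {"A"}, (3, 2): set(), (4, 2): {"B"},
}

def fill_cell(table, i, j):
    """Union, over every split length k, of the heads whose right-hand side matches."""
    result = set()
    for k in range(1, j):
        left, right = table[(i, k)], table[(i + k, j - k)]
        for head, y, z in BINARY_RULES:
            if y in left and z in right:
                result.add(head)
    return result

print(fill_cell(table, 1, 3))  # {'A'}
print(fill_cell(table, 2, 3))  # set()
```

The empty result for cell 2,3 is the ∅ entry in row 3 of the table.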
Following the same pattern, for cell 3,3, we’ll first look for productions that produce an 𝐴 (from cell 3,1) followed by a 𝐵 (from cell 4,2).
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅
4
5
There are two variables that produce 𝐴𝐵: 𝑆 and 𝐴. So, we’ll put both of these variables into the set in cell 3,3.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
The next pair of cells we check (cells 3,2 and 5,1) once again involves an empty cell, so we don’t add anything else to cell 3,3.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
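The cell-filling rule used so far can be summarized as a recurrence. The notation below is an assumption made for this summary and does not appear on the slides: T_{i,j} denotes the set in cell 𝑖,𝑗, x_i the symbol at position 𝑖, and k the length of the first part.

```latex
T_{i,1} = \{\, X \mid X \to x_i \,\}, \qquad
T_{i,j} = \bigcup_{k=1}^{j-1}
  \{\, X \mid X \to YZ,\; Y \in T_{i,k},\; Z \in T_{i+k,\,j-k} \,\}
  \quad (2 \le j \le n,\; 1 \le i \le n-j+1)

% Worked instance for cell 3,3 of the example:
T_{3,3}
  = \{\, X \mid X \to YZ,\; Y \in T_{3,1},\; Z \in T_{4,2} \,\}
    \cup \{\, X \mid X \to YZ,\; Y \in T_{3,2},\; Z \in T_{5,1} \,\}
  = \{S, A\} \cup \emptyset
  = \{S, A\}
```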
Now, we’ll move on to cell 1,4, corresponding to substrings of length 4 starting at position 1.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
First, we’ll check cells 1,1 and 2,3, one of which is empty, so we clearly won’t find any productions for this combination.
Reminder: cell 1,1 produces substrings of length 1 starting at position 1. Cell 2,3 produces substrings of length 3 starting at position 2 (its being empty is basically saying that no variable in this grammar can produce the string 𝑎𝑎𝑏).
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
Next, there’s another combination involving an empty set (cells 1,2 and 3,2), so we won’t find anything here.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
Finally, we’ll check for productions that produce an 𝐴 followed by a 𝐶 (cells 1,3 and 4,1), and there aren’t any.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4
5
So, cell 1,4 is empty.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅
5
Next, we’ll check cell 2,4. The first combination we check (cells 2,1 and 3,3) has two possible variables corresponding to the second part of the substring, so we’ll look for productions of 𝐴𝑆 and 𝐴𝐴 (using both possible variables in the second cell).
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅
5
Nothing produces 𝐴𝑆; however, 𝐴 produces 𝐴𝐴, so we can add 𝐴 to cell 2,4.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ A
5
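When a cell holds more than one variable, every pairing of a variable from the first cell with a variable from the second has to be tried. One way to sketch this is with an index from right-hand sides to their producers. The dict `PRODUCERS` and the function `heads_for` are illustrative names, and the index layout is an assumption rather than anything prescribed by the slides.

```python
from itertools import product

# Map each two-variable right-hand side (Y, Z) to the variables that produce it.
PRODUCERS = {
    ("A", "B"): {"S", "A"},
    ("A", "A"): {"A"},
    ("C", "C"): {"B"},
}

def heads_for(left_cell, right_cell):
    """Try every (Y, Z) pair drawn from the two cells and collect the producers."""
    found = set()
    for pair in product(left_cell, right_cell):
        found |= PRODUCERS.get(pair, set())
    return found

# Cell 2,1 holds {A} and cell 3,3 holds {S, A}: the candidate pairs are AA and AS.
print(sorted(product({"A"}, {"S", "A"})))  # [('A', 'A'), ('A', 'S')]
print(heads_for({"A"}, {"S", "A"}))        # {'A'}
```

Indexing by right-hand side replaces the scan over all productions with one lookup per candidate pair.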
Next, we’ll look for productions that produce 𝐴𝐵 (cells 2,2 and 4,2), and we see that 𝑆 and 𝐴 produce 𝐴𝐵. Cell 2,4 already contains 𝐴, so we only need to add the 𝑆.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5
Finally, the last pair of cells (2,3 and 5,1) includes an empty cell, so we know we won’t get any new variables from it, and we’re done with cell 2,4.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5
Finally, we need to fill in the last cell, corresponding to a string starting at position 1 of length 5. (In other words, the entire string.) We’ll just work through the possibilities like before.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5
Once again, we’re looking for productions that produce either 𝐴𝑆 or 𝐴𝐴 (cells 1,1 and 2,4). 𝐴 produces 𝐴𝐴, so add it to cell 1,5.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 A
The next combination (cells 1,2 and 3,3) again gives 𝐴𝑆 and 𝐴𝐴, which we have already considered, so we’ve already added the variables we need to.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 A
We next look for productions producing 𝐴𝐵 (cells 1,3 and 4,2). Both 𝑆 and 𝐴 produce 𝐴𝐵. We already added 𝐴 to cell 1,5, so we only need to add the 𝑆.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 S,A
And finally, we have a combination involving an empty cell (cell 1,4), so no productions come from this.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 S,A
So now we’re finished filling out the table. To know if the string is in the language produced by the CFG, we only need to see if the start symbol is in the last cell (cell 1,5).
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 S,A
In this case, cell 1,5 does contain the start symbol 𝑆, so the string 𝑎𝑎𝑎𝑏𝑏 is in the language produced by the grammar.
Recall that this is saying that the start symbol is able to produce the “substring” with a length of 5 starting at position 1.
𝑆 → 𝐴𝐵
𝐴 → 𝐴𝐴 | 𝐴𝐵 | 𝑎
𝐵 → 𝐶𝐶
𝐶 → 𝑏
𝑥 = 𝑎𝑎𝑎𝑏𝑏 (𝑛 = 5)
a a a b b
i→
1 2 3 4 5
j 1 A A A C C
2 A A ∅ B
3 A ∅ S,A
4 ∅ S,A
5 S,A
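Putting the steps together, the whole table-filling procedure can be sketched in a few lines of Python. This is a minimal sketch and not a definitive implementation. It assumes every production has either two variables or a single terminal on its right-hand side, as in the example grammar, and stores cells in a dict keyed by (position, length). The names `BINARY_RULES`, `TERMINAL_RULES`, `cyk_table` and `accepts` are made up for illustration.

```python
# Example grammar, split into two-variable and single-terminal productions.
BINARY_RULES = [("S", "A", "B"), ("A", "A", "A"), ("A", "A", "B"), ("B", "C", "C")]
TERMINAL_RULES = [("A", "a"), ("C", "b")]

def cyk_table(x):
    """Build the table as a dict keyed by (i, j): start position i, length j (1-based)."""
    n = len(x)
    table = {}
    # Row 1: variables that directly produce each symbol.
    for i in range(1, n + 1):
        table[(i, 1)] = {head for head, t in TERMINAL_RULES if t == x[i - 1]}
    # Rows 2..n: combine a first part of length k with a second part of length j - k.
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            cell = set()
            for k in range(1, j):
                left, right = table[(i, k)], table[(i + k, j - k)]
                cell |= {h for h, y, z in BINARY_RULES if y in left and z in right}
            table[(i, j)] = cell
    return table

def accepts(x, start="S"):
    """True if the start symbol appears in cell 1,n (only defined here for non-empty x)."""
    return bool(x) and start in cyk_table(x)[(1, len(x))]

table = cyk_table("aaabb")
for j in range(1, 6):
    print(j, [sorted(table[(i, j)]) for i in range(1, 6 - j + 1)])
print(accepts("aaabb"))  # True
print(accepts("aab"))    # False
```

The three nested loops over 𝑗, 𝑖 and 𝑘 mirror the order used in the walkthrough: row by row, left to right within a row, and split by split within a cell.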