
Week14 Chap7 String Algorithms

APPLIED ALGORITHMS
STRING PROCESSING ALGORITHM

CONTENTS

• Boyer Moore algorithm
• Rabin Karp algorithm
• KMP algorithm

BOYER MOORE ALGORITHM

T: a b a c b a c a c b a c
P: a c b a

• Slide the pattern (sample string) P from left to right
• Match: right to left
• Use preprocessing information to skip as many characters as possible
• Preprocessing the pattern P
  • Last[x]: the rightmost position at which the letter x appears in P (positions are counted from 1)
    Last[a] = 4, Last[b] = 3, Last[c] = 2

• When a mismatch occurs with the bad character x (a character of T), P is slid to the right by max{j - Last[x], 1} positions, where j is the current index on P (the mismatch position) when matching characters from right to left
• Example: in the first alignment the bad character is c and the mismatch position is j = 4, so P is slid max{4 - Last[c], 1} = max{4 - 2, 1} = 2 positions
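
The bad character rule above can be traced on the example strings with a short Python sketch. The function names `compute_last` and `bad_character_shifts` are illustrative, not from the slides; positions on P are 1-based to match the Last[x] values given above, and a character that does not occur in P is assumed to have Last[x] = 0.

```python
def compute_last(p):
    """Map each character of p to its rightmost 1-based position."""
    last = {}
    for pos, ch in enumerate(p, start=1):
        last[ch] = pos  # a later (more rightward) position overwrites an earlier one
    return last


def bad_character_shifts(t, p):
    """Return (s, j, x, shift) for every alignment that ends in a mismatch.

    s is the number of positions P has already been slid, j the 1-based
    mismatch index on P, x the bad character taken from T.
    """
    last = compute_last(p)
    n, m = len(t), len(p)
    steps = []
    s = 0
    while s <= n - m:
        j = m  # scan P from right to left
        while j >= 1 and t[s + j - 1] == p[j - 1]:
            j -= 1
        if j == 0:
            shift = 1  # full match: move on by one position
        else:
            x = t[s + j - 1]
            shift = max(j - last.get(x, 0), 1)  # assumption: Last[x] = 0 if x is absent
            steps.append((s, j, x, shift))
        s += shift
    return steps


LAST = compute_last("acba")
STEPS = bad_character_shifts("abacbacacbac", "acba")
print(LAST)      # {'a': 4, 'c': 2, 'b': 3}
print(STEPS[0])  # (0, 4, 'c', 2)
```

On these strings every mismatch happens at j = 4 with bad character c, so each mismatching alignment slides P by 2 positions.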

BOYER MOORE ALGORITHM

computeLast(p){
  for c = 0 to 255 do last[c] = 0;
  k = p.length();
  // scan from right to left so that the first position stored is the rightmost one
  for i = k-1 downto 0 do {
    if last[p[i]] == 0 then last[p[i]] = i;
  }
}

boyerMoore(P, T){
  computeLast(P);
  s = 0; cnt = 0;
  N = T.length(); M = P.length();
  while s <= N-M do {
    j = M-1;
    // match from right to left (indices start at 0 here)
    while j >= 0 && T[j+s] == P[j] do
      j = j - 1;
    if j == -1 then {
      cnt++; s = s + 1;
    }else{
      k = last[T[j+s]];
      s = s + (j - k > 1 ? j - k : 1);
    }
  }
  return cnt;
}
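
A runnable Python version of the pseudocode above might look like the sketch below. It is not a line-by-line translation: it returns the 0-based match positions instead of a count, and it assumes a value of -1 (rather than 0) for characters that do not occur in P, so that such a character lets P jump completely past it. Only the bad character rule is used.

```python
def compute_last(p):
    """Rightmost 0-based index of each character of p."""
    # Later indices overwrite earlier ones, so each entry ends up rightmost.
    return {ch: i for i, ch in enumerate(p)}


def boyer_moore(p, t):
    """Return the 0-based start positions of all occurrences of p in t."""
    last = compute_last(p)
    n, m = len(t), len(p)
    positions = []
    if m == 0 or m > n:
        return positions
    s = 0  # current shift of P over T
    while s <= n - m:
        j = m - 1
        while j >= 0 and t[s + j] == p[j]:  # compare right to left
            j -= 1
        if j == -1:
            positions.append(s)
            s += 1
        else:
            k = last.get(t[s + j], -1)  # assumption: -1 for characters not in P
            s += max(j - k, 1)
    return positions


print(boyer_moore("acba", "abacbacacbac"))  # [2, 7]
```

The count returned by the pseudocode corresponds to `len(boyer_moore(P, T))`.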

RABIN KARP ALGORITHM

• The Rabin-Karp algorithm converts strings to non-negative integers
• Each letter in the alphabet is represented by a non-negative integer less than d
• Convert the string P[1..M] to a non-negative integer:
    p = P[1]*d^{M-1} + P[2]*d^{M-2} + . . . + P[M]*d^{0}
• Match patterns by comparing the 2 corresponding code values:
  • If the two codes are different, the two corresponding strings are different
  • If the two codes are equal, we proceed to match each character
• Use the Horner scheme to speed up calculating the codes of the substrings of T
• With sliding position s, convert the substring T[s+1 .. s+M] to a number:
    T_{s} = T[s+1]*d^{M-1} + T[s+2]*d^{M-2} + . . . + T[s+M]*d^{0}
• With sliding position s+1, T_{s+1} can be calculated efficiently from T_{s} (previously calculated):
    T_{s+1} = (T_{s} - T[s+1]*d^{M-1})*d + T[s+M+1]

• Disadvantage
  • When M is large, converting strings to numbers takes considerable time
  • The numbers can overflow the basic data types of the programming language
• Solution: divide by Q and keep the remainder
  • When the 2 remainders are different, the 2 numeric values are different, so the 2 corresponding strings are also different
  • When the two remainders are equal, match each character in the traditional way
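
The sliding update of T_{s+1} from T_{s} follows directly from the definition of T_{s}. The short derivation below uses only the symbols d, M, T and Q from the slides; the numeric check at the end assumes d = 10 and digit values purely for illustration.

```latex
\begin{align*}
T_s &= \sum_{i=1}^{M} T[s+i]\, d^{M-i} \\
T_s - T[s+1]\, d^{M-1} &= \sum_{i=2}^{M} T[s+i]\, d^{M-i} \\
\bigl(T_s - T[s+1]\, d^{M-1}\bigr)\, d + T[s+M+1]
  &= \sum_{i=2}^{M} T[s+i]\, d^{M-i+1} + T[s+M+1]\, d^{0} \\
  &= \sum_{i=1}^{M} T[s+1+i]\, d^{M-i} = T_{s+1}
\end{align*}
% Because addition and multiplication are compatible with remainders,
% the same update can be carried out on remainders modulo Q:
\[
T_{s+1} \bmod Q =
  \Bigl(\bigl((T_s \bmod Q) - T[s+1]\, d^{M-1}\bigr)\, d + T[s+M+1]\Bigr) \bmod Q
\]
% Illustrative check with d = 10, M = 3 and the digit sequence 3 1 4 1 5:
% T_0 = 314, and T_1 = (314 - 3 \cdot 100) \cdot 10 + 1 = 141.
```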

RABIN KARP ALGORITHM

hashCode(p){
  c = 0;
  for i = 0 to p.length()-1 do {
    c = c*256 + p[i];
    c = c%Q;
  }
  return c;
}

hashCode(s, start, end){
  c = 0;
  for i = start to end do {
    c = c*256 + s[i];
    c = c%Q;
  }
  return c;
}

rabinKarp(P, T){
  cnt = 0; N = T.length(); M = P.length();
  d = 256;                              // base, the same constant as in hashCode
  e = 1;
  for i = 1 to M-1 do e = (e*d)%Q;      // e = d^{M-1} mod Q, reduced at every step to avoid overflow
  codeP = hashCode(P); codeT = hashCode(T,0,M-1);
  for s = 0 to N-M do {
    if(codeP == codeT){
      ok = true;
      for j = 0 to M-1 do if P[j] != T[j + s] then {
        ok = false; break;
      }
      if ok then cnt++;
    }
    if s < N-M then {                   // there is no next window after the last position
      t = T[s]*e; t = t%Q; t = (codeT - t + Q)%Q;   // + Q keeps the remainder non-negative
      codeT = (t*d + T[s+M])%Q;
    }
  }
  return cnt;
}
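
The pseudocode above can be tried out with the following Python sketch. The default modulus `q = 1_000_000_007` is an assumed value (the slides do not fix Q), and characters are encoded with `ord`, which stays below d = 256 only for ASCII or Latin-1 text; other input still gives correct results, because every code match is confirmed by a direct comparison.

```python
def rabin_karp(p, t, d=256, q=1_000_000_007):
    """Count the occurrences of p in t (overlapping ones included)."""
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return 0
    e = pow(d, m - 1, q)  # d^(M-1) mod Q
    code_p = code_t = 0
    for i in range(m):  # Horner scheme for P and for the first window of T
        code_p = (code_p * d + ord(p[i])) % q
        code_t = (code_t * d + ord(t[i])) % q
    cnt = 0
    for s in range(n - m + 1):
        # Equal remainders only suggest a match, so confirm it character by character.
        if code_p == code_t and t[s:s + m] == p:
            cnt += 1
        if s < n - m:
            # Drop T[s], shift by one digit, append T[s+M]; Python's % is never negative here.
            code_t = ((code_t - ord(t[s]) * e) * d + ord(t[s + m])) % q
    return cnt


print(rabin_karp("acba", "abacbacacbac"))  # 2
```

A very small modulus such as `q=7` produces many equal remainders for different strings, which is a convenient way to check that the character-by-character confirmation step is doing its job.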

KMP ALGORITHM

T: a b a c b a c a c b a c
P: a c b a

• Slide the pattern (sample string) P from left to right
• Match: left to right
• Use preprocessing information to skip as many characters as possible
• Preprocessing:
  • π[q]: length of the longest prefix of P that is also a strict suffix of the string P[1..q]

q     1 2 3 4 5 6 7 8
P     a b a b a b c a
π[q]  0 0 1 2 3 4 0 1
KMP ALGORITHM

• Slide the pattern P from left to right over T

computePI(P){
  pi[1] = 0;
  k = 0;
  for q = 2 to M do {
    // fall back to shorter prefixes until P[k+1] can extend the match
    while(k > 0 && P[k+1] != P[q])
      k = pi[k];
    if P[k+1] == P[q] then
      k = k + 1;
    pi[q] = k;
  }
}

kmp(P, T){
  q = 0;                       // number of characters of P matched so far
  for i = 1..N do {
    while q > 0 && P[q+1] != T[i]
      q = pi[q];
    if P[q+1] == T[i]
      q = q + 1;
    if(q == M){
      output(i-M+1);           // an occurrence of P starts at position i-M+1
      q = pi[q];
    }
  }
}
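
The preprocessing step can be checked against the table for P = abababca with a small Python sketch. The function name `compute_pi` and the unused padding entry `pi[0]` are choices made here so that `pi[q]` keeps the 1-based meaning used in the slides.

```python
def compute_pi(p):
    """Return a list pi of length len(p) + 1.

    pi[q] (for q = 1..len(p)) is the length of the longest prefix of p that
    is also a strict suffix of p[:q]; pi[0] is unused padding.
    """
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        # p[k] is P[k+1] and p[q - 1] is P[q] in the 1-based notation of the slides.
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


print(compute_pi("abababca")[1:])  # [0, 0, 1, 2, 3, 4, 0, 1]
```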

KMP ALGORITHM

• Slide the pattern P from left to right over T

i     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
T     a  a  a  b  a  b  a  b  c  a  c  a  b  a  b  a  b  a  b  c  a

q     1 2 3 4 5 6 7 8
P     a b a b a b c a
π[q]  0 0 1 2 3 4 0 1

• Init: q = 0
• i = 1, T[1] = P[0+1] → q = 1
• i = 2, T[2] ≠ P[1+1] → q = π[1] = 0; then T[2] = P[0+1] → q = 1
• i = 3, T[3] ≠ P[1+1] → q = π[1] = 0; then T[3] = P[0+1] → q = 1
• i = 4, T[4] = P[1+1] → q = q + 1 = 2
• i = 5, T[5] = P[2+1] → q = q + 1 = 3
• i = 6, T[6] = P[3+1] → q = q + 1 = 4
• i = 7, T[7] = P[4+1] → q = q + 1 = 5
• i = 8, T[8] = P[5+1] → q = q + 1 = 6
• i = 9, T[9] = P[6+1] → q = q + 1 = 7
• i = 10, T[10] = P[7+1] → q = q + 1 = 8, FOUND (output i - M + 1 = 3), then q = π[q] = π[8] = 1
• i = 11, T[11] ≠ P[1+1] → q = π[q] = π[1] = 0

KMP ALGORITHM

computePi(p){
  pi[1] = 0;
  int k = 0;
  for q = 2 to p.length()-1 do {
    while(k > 0 && p[k+1] != p[q]) do
      k = pi[k];
    if (p[k+1] == p[q]) then k = k + 1;
    pi[q] = k;
  }
}

kmp(P, T){
  P = "-" + P; T = "-" + T;    // pad both strings so that the real characters start at index 1
  computePi(P);
  cnt = 0;
  N = T.length()-1;
  M = P.length()-1;
  q = 0;
  for i = 1 to N do {
    while(q > 0 and P[q+1] != T[i]) do
      q = pi[q];
    if(P[q+1] == T[i]) then
      q = q + 1;
    if(q == M) then {
      cnt += 1; q = pi[q];
    }
  }
  return cnt;
}
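
The complete search can be sketched in Python as follows. Instead of padding the strings with "-", this version keeps Python's 0-based strings and shifts the indices; it also returns the 1-based match positions together with the value of q after each text position, which is an addition for illustration, where the pseudocode returns only the count.

```python
def kmp_search(p, t):
    """Return (positions, states).

    positions: 1-based start positions of the occurrences of p in t.
    states: value of q after processing T[i], for i = 1..len(t),
            recorded before q is reset on a full match.
    """
    m, n = len(p), len(t)
    if m == 0:
        return [], []
    # Preprocessing: pi[q] for q = 1..m (pi[0] is unused padding).
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    # Search: q is the number of characters of p currently matched.
    positions, states = [], []
    q = 0
    for i in range(1, n + 1):
        while q > 0 and p[q] != t[i - 1]:  # P[q+1] != T[i] in 1-based notation
            q = pi[q]
        if p[q] == t[i - 1]:
            q += 1
        states.append(q)
        if q == m:
            positions.append(i - m + 1)
            q = pi[q]  # keep the longest border so overlapping matches are found
    return positions, states


POSITIONS, STATES = kmp_search("abababca", "aaabababcacababababca")
print(POSITIONS)  # [3, 14]
```

On the example text from the trace, the first eleven recorded states are 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 0, and the count returned by the pseudocode corresponds to `len(POSITIONS)`.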

THANK YOU !
