String Searching Algorithm

This document discusses several string searching algorithms: Naive, Knuth-Morris-Pratt, Shift-OR, Boyer-Moore, Boyer-Moore-Horspool, and Karp-Rabin. It explains the basic ideas and provides examples for each algorithm.
Copyright
© Attribution Non-Commercial (BY-NC)

String Searching Algorithm

 Advisor: Prof. 黃三益
 Team members: 9142639 蔡嘉文
9142642 高振元
9142635 丁康迪
String Searching Algorithm
 Outline:
 The Naive Algorithm
 The Knuth-Morris-Pratt Algorithm
 The SHIFT-OR Algorithm
 The Boyer-Moore Algorithm
 The Boyer-Moore-Horspool Algorithm
 The Karp-Rabin Algorithm
 Conclusion
String Searching Algorithm
 Preliminaries:
 n: the length of the text
 m: the length of the pattern (string)
 c: the size of the alphabet
 C_n: the expected number of comparisons performed by an algorithm while searching for the pattern in a text of length n
The Naive Algorithm
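/* Search pat[1..m] in text[1..n]; arrays are 1-indexed as in the slides. */
void naivesearch(const char text[], int n, const char pat[], int m)
{
    int i, j, k, lim;
    lim = n - m + 1;
    for (i = 1; i <= lim; i++) {   /* try every alignment */
        k = i;
        for (j = 1; j <= m && text[k] == pat[j]; j++) k++;
        if (j > m) Report_match_at_position(i);   /* the match starts at i */
    }
}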
The Naive Algorithm (cont.)
 The idea is to try to match every substring of length m in the text against the pattern, which takes O(mn) comparisons in the worst case.
The Knuth-Morris-Pratt Algorithm
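/* Search pat[1..m] in text[1..n]; pat[m+1] must hold a character that
   never occurs in the text (e.g. the string terminator). */
void kmpsearch(const char text[], int n, const char pat[], int m)
{
    int j, k;
    int next[Max_Pattern_Size];
    initnext(pat, m+1, next);   /* preprocess pattern: build the next table */
    j = k = 1;
    do {                        /* search */
        if (j == 0 || text[k] == pat[j]) { k++; j++; }
        else j = next[j];
        if (j > m) {
            Report_match_at_position(k-m);
            j = next[j];        /* resume after a match using next[m+1] */
        }
    } while (k <= n);
}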
The Knuth-Morris-Pratt Algorithm (cont.)
 The pattern is preprocessed to obtain a table that gives the next position in the pattern to be processed after a mismatch, so the text pointer never moves backward.
 Ex:
position:  1  2  3  4  5  6  7  8  9 10 11
pattern:   a  b  r  a  c  a  d  a  b  r  a
next[j]:   0  1  1  0  2  0  2  0  1  1  0
text:      a  b  r  a  c  a  f ...
 The mismatch at position 7 (d against f) sets j = next[7] = 2, so the search resumes by comparing pat[2] with the same text character.
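The search slide calls initnext without showing it. A minimal sketch of one way to build the table is given here, keeping the 1-indexed convention of the slides. The name init_next, its signature, and the use of the '\0' terminator as the sentinel at pat[m+1] are assumptions made for illustration, not the original authors' code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill next[1..m+1] for the 1-indexed pattern pat[1..m].
 * Assumption: pat[m+1] holds a sentinel that differs from every pattern
 * character (the '\0' terminator of a C string works).
 */
void init_next(const char *pat, int m, int *next)
{
    int i = 1, j = 0;

    assert(pat != NULL && next != NULL && m >= 1);
    next[1] = 0;
    while (i <= m) {
        if (j == 0 || pat[i] == pat[j]) {
            i++;
            j++;
            /* If the same character would be retried, skip further back. */
            next[i] = (pat[i] != pat[j]) ? j : next[j];
        } else {
            j = next[j];
        }
    }
}
```

Passing a string with one dummy leading character, such as " abracadabra" with m = 11, makes pat[1] the first pattern character. For that input the sketch fills next[1..11] with the values in the table above.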
The Shift-Or Algorithm
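 The main idea is to represent the state of the search as a number.
 $\mathrm{State} = S_1 \cdot 2^0 + S_2 \cdot 2^1 + \cdots + S_m \cdot 2^{m-1}$, where $S_i$ is 0 if the last i text characters read match the first i pattern characters, and 1 otherwise.
 For every symbol x of the alphabet, $T_x = \delta(pat_1 = x) \cdot 2^0 + \delta(pat_2 = x) \cdot 2^1 + \cdots + \delta(pat_m = x) \cdot 2^{m-1}$, where $\delta(C)$ is 0 if the condition C is true, and 1 otherwise.
 After reading a text character x, the new state is (State << 1) | T[x], truncated to m bits.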
The Shift-Or Algorithm (cont.)
 Ex: let {a,b,c,d} be the alphabet and ababc the pattern.
T[a]=11010, T[b]=10101, T[c]=01111, T[d]=11111
Each value is written with the bit for pattern position m leftmost.
The initial state is 11111.
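The table in this example can be reproduced with a few lines of C. This is a sketch under stated assumptions: build_masks is a hypothetical name, the pattern is an ordinary 0-indexed C string, and the pattern is assumed to be shorter than 32 characters so that each mask fits in an unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Build the Shift-Or table: bit i-1 of T[x] is 0 when the i-th pattern
 * character equals x, and 1 otherwise. Only the low m bits are used.
 * Assumption: 1 <= m < 32.
 */
void build_masks(const char *pat, unsigned long T[UCHAR_MAX + 1])
{
    size_t m = strlen(pat), i;
    unsigned long all_ones;

    assert(m >= 1 && m < 32);
    all_ones = (1UL << m) - 1UL;
    for (i = 0; i <= UCHAR_MAX; i++) T[i] = all_ones;   /* no occurrence */
    for (i = 0; i < m; i++)
        T[(unsigned char)pat[i]] &= ~(1UL << i);        /* clear bit i */
}
```

For the pattern ababc this yields T['a'] = 11010, T['b'] = 10101, T['c'] = 01111, and 11111 for every other character, when printed as 5-bit binary numbers.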
The Shift-Or Algorithm (cont.)
 Pattern: ababc
 Text:    a     b     d     a     b     a     b     c
 T[x]:    11010 10101 11111 11010 10101 11010 10101 01111
 State:   11110 11101 11111 11110 11101 11010 10101 01111
 For example, the state 10101 means that in the current
position we have two partial matches to the left, of
lengths two and four, respectively.
 The match at the end of the text is indicated by the
value 0 in the leftmost bit of the state of the search.
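The trace above can be turned into a small search routine. The sketch below is illustrative only: shift_or_search is a hypothetical name, strings are 0-indexed C strings, and the pattern is assumed to be shorter than 32 characters. It reports a match as soon as the leftmost of the m state bits becomes 0.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Shift-Or search: returns the 0-based start of the first match, or -1.
 * Assumption: 1 <= m < 32 so the state fits in an unsigned long.
 */
long shift_or_search(const char *text, const char *pat)
{
    unsigned long T[UCHAR_MAX + 1];
    size_t n = strlen(text), m = strlen(pat), i;
    unsigned long ones, state, match_bit;

    assert(m >= 1 && m < 32);
    ones = (1UL << m) - 1UL;
    match_bit = 1UL << (m - 1);

    /* Preprocessing: one mask per character, 0 bits mark occurrences. */
    for (i = 0; i <= UCHAR_MAX; i++) T[i] = ones;
    for (i = 0; i < m; i++) T[(unsigned char)pat[i]] &= ~(1UL << i);

    state = ones;                            /* initial state: all ones */
    for (i = 0; i < n; i++) {
        state = ((state << 1) | T[(unsigned char)text[i]]) & ones;
        if ((state & match_bit) == 0)        /* leftmost bit is 0: match */
            return (long)(i + 1 - m);
    }
    return -1;
}
```

With the text abdababc and the pattern ababc, the state sequence is the one shown in the State row, and the routine stops at the final c.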
The Boyer-Moore Algorithm
 Compare the pattern with the text from right to left
 On a mismatch, shift by the larger of two heuristics:
 match heuristic
compute the dd table for the pattern
 occurrence heuristic
compute the d table for the pattern
The Boyer-Moore Algorithm (cont.)
[Figure: match shift]
The Boyer-Moore Algorithm (cont.)
[Figure: occurrence shift]
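The d table of the occurrence heuristic is simple enough to sketch in C. This follows one common formulation, in which d[x] is the distance from the rightmost occurrence of x in the pattern to the end of the pattern; it is not necessarily the exact definition on the original slide. The name init_d and the 1-indexed pattern with a dummy character at index 0 are assumptions.

```c
#include <assert.h>
#include <limits.h>

/*
 * Occurrence table for the 1-indexed pattern pat[1..m]:
 * d[x] = m - (rightmost position of x in pat), or m when x does not occur.
 */
void init_d(const char *pat, int m, int d[UCHAR_MAX + 1])
{
    int k;

    assert(m >= 1);
    for (k = 0; k <= UCHAR_MAX; k++) d[k] = m;      /* x not in pattern */
    for (k = 1; k <= m; k++)
        d[(unsigned char)pat[k]] = m - k;           /* later k overwrites */
}
```

For abracadabra (m = 11) this gives d[a] = 0, d[r] = 1, d[b] = 2, d[d] = 4, d[c] = 6, and 11 for every other character. Because d[a] is 0, the occurrence heuristic alone may not advance the search, which is one reason the two heuristics are combined.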
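The Boyer-Moore Algorithm (cont.)
k = m;
while (k <= n) {
    j = m;
    while (j > 0 && text[k] == pat[j]) {   /* compare right to left */
        j--; k--;
    }
    if (j == 0) {
        report_match_at_position(k+1);
        k += m + 1;   /* conservative: slide the pattern one position right */
    }
    else k += max(d[(unsigned char)text[k]], dd[j]);
}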
The Boyer-Moore Algorithm (cont.)
 Example

T : xyxabraxyzabracadabra
P : abracadabra

mismatch, compute a shift
The Boyer-Moore-Horspool Algorithm
 A simplification of the Boyer-Moore algorithm
 Compares the pattern from left to right
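The Boyer-Moore-Horspool Algorithm (cont.)
/* MAX_ALPHABET_SIZE is an assumed constant: the number of distinct character codes */
for (k = 0; k < MAX_ALPHABET_SIZE; k++) d[k] = m+1;   /* default shift */
for (k = 1; k <= m; k++) d[(unsigned char)pat[k]] = m+1-k;
pat[m+1] = CHARACTER_NOT_IN_THE_TEXT;   /* sentinel stops the inner loop */
lim = n-m+1;
for (k = 1; k <= lim; k += d[(unsigned char)text[k+m]])   /* text[n+1] must be readable */
{
    i = k;
    for (j = 1; text[i] == pat[j]; j++) i++;
    if (j == m+1) report_match_at_position(k);
}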
The Boyer-Moore-Horspool Algorithm (cont.)
 Example:

T:xyzabraxyzabracadabra
P:abracadabra
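To follow this example step by step, a self-contained variant of the search can be run on T and P. The sketch below mirrors the slide code in shifting on the text character just after the current window, but it is an adaptation rather than the original: bmh_search is a hypothetical name, strings are 0-indexed C strings, and an explicit length check replaces the sentinel.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Returns the 0-based index of the first occurrence of pat in text, or -1. */
long bmh_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    size_t d[UCHAR_MAX + 1];
    size_t k, j;

    assert(m >= 1);
    if (m > n) return -1;

    /* Shift table: m+1 by default, m-k for the rightmost pat[k] (0-based). */
    for (k = 0; k <= UCHAR_MAX; k++) d[k] = m + 1;
    for (k = 0; k < m; k++) d[(unsigned char)pat[k]] = m - k;

    /* text[k+m] is at most text[n], the '\0' terminator, so it is readable. */
    for (k = 0; k + m <= n; k += d[(unsigned char)text[k + m]]) {
        for (j = 0; j < m && text[k + j] == pat[j]; j++)
            ;                                /* compare left to right */
        if (j == m) return (long)k;
    }
    return -1;
}
```

On the example, the window starts at 0-based offsets 0, 3, and 10: the first shift is d[b] = 3, the second is d[c] = 7, and the third window is the match.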
The Karp-Rabin Algorithm
 Use hashing
 Compute the signature of each possible m-character substring of the text
 Check whether it equals the signature of the pattern
 Signature function: h(k) = k mod q, where q is a large prime
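The method is only practical because the signature of one window can be derived from the previous one in constant time. A sketch of that recurrence is given here, treating each m-character window starting at position i as a number in radix d. The symbols d and h_i are notation assumed for this illustration.

```latex
\begin{aligned}
h_i     &= \Bigl(\sum_{j=1}^{m} \mathrm{text}[i+j-1]\, d^{\,m-j}\Bigr) \bmod q,\\
h_{i+1} &= \bigl((h_i - \mathrm{text}[i]\, d^{\,m-1})\, d + \mathrm{text}[i+m]\bigr) \bmod q .
\end{aligned}
```

The factor $d^{\,m-1} \bmod q$ is computed once during preprocessing. Equal signatures only indicate a potential match, so each one still has to be verified character by character.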
The Karp-Rabin Algorithm (cont.)
/* Search pat[1..m] in text[1..n], with 0 < m <= n.
   D (bits per radix shift) and Q (a large prime) are assumed to be defined elsewhere. */
void rksearch(const char text[], int n, const char pat[], int m)
{
    int h1, h2, dM, i, j;
    dM = 1;
    for (i = 1; i < m; i++) dM = (dM << D) % Q;   /* dM = d^(m-1) mod Q, d = 2^D */
    h1 = h2 = 0;
    for (i = 1; i <= m; i++) {   /* signatures of the pattern and of text[1..m] */
        h1 = ((h1 << D) + pat[i]) % Q;
        h2 = ((h2 << D) + text[i]) % Q;
    }
The Karp-Rabin Algorithm (cont.)
    for (i = 1; i <= n-m+1; i++) {   /* search */
        if (h1 == h2) {   /* potential match */
            for (j = 1; j <= m && text[i-1+j] == pat[j]; j++);   /* check */
            if (j > m)   /* true match */
                Report_match_at_position(i);
        }
        /* update the signature of the text; text[n+1] must be readable */
        h2 = (h2 + (Q << D) - text[i]*dM) % Q;
        h2 = ((h2 << D) + text[i+m]) % Q;
    }
}
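The slide code leaves D and Q undefined and relies on int arithmetic not overflowing. A self-contained sketch with concrete choices is given here for experimentation. The names kr_search, RADIX, and PRIME, the radix 256, the modulus 1000003, and the use of unsigned long long are all assumptions for illustration, not values from the original slides.

```c
#include <assert.h>
#include <string.h>

#define RADIX 256ULL       /* assumed radix d: one byte per character */
#define PRIME 1000003ULL   /* assumed prime modulus q */

/* Returns the 0-based start of the first occurrence of pat in text, or -1. */
long kr_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat), i;
    unsigned long long hp = 0, ht = 0, dm = 1;

    assert(m >= 1);
    if (m > n) return -1;

    for (i = 1; i < m; i++) dm = (dm * RADIX) % PRIME;   /* d^(m-1) mod q */
    for (i = 0; i < m; i++) {                            /* initial signatures */
        hp = (hp * RADIX + (unsigned char)pat[i]) % PRIME;
        ht = (ht * RADIX + (unsigned char)text[i]) % PRIME;
    }
    for (i = 0; ; i++) {
        /* Equal signatures may be a collision, so verify the characters. */
        if (hp == ht && memcmp(text + i, pat, m) == 0)
            return (long)i;
        if (i + m >= n) break;                           /* last window done */
        /* Drop text[i] from the front, then append text[i+m]. */
        ht = (ht + PRIME - ((unsigned char)text[i] * dm) % PRIME) % PRIME;
        ht = (ht * RADIX + (unsigned char)text[i + m]) % PRIME;
    }
    return -1;
}
```

Adding PRIME before the subtraction keeps the intermediate value non-negative, which plays the same role as the Q << D term in the slide code.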
Conclusions
 Tests: random patterns, random text, and English text
 Best overall: the Boyer-Moore-Horspool algorithm
 Drawback: preprocessing time and space (depend on alphabet and/or pattern size)
 Small pattern: the Shift-Or algorithm
 Large alphabet: the Knuth-Morris-Pratt algorithm
 Others: the Boyer-Moore algorithm
 "Don't care" symbols: the Shift-Or algorithm
