Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
28 views

18 String Matching - KMP Algorithm

Uploaded by

ctxxfx5ykz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views

18 String Matching - KMP Algorithm

Uploaded by

ctxxfx5ykz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 30

Design and Analysis of Algorithms

String Matching Problem


(Knuth-Morris-Pratt Algorithm)

Dr. D. P. Acharjya
Professor, SCOPE
SJT Annex – 201 E
2/28/2024 Dr. D. P. Acharjya 1
String Matching Problem
 Finding all occurrences of a pattern (String) in a text
leads to string matching problem.
 It arises frequently in text editing programs where text
is a document, and the string searched for is a
particular word.
 String-matching algorithms are also used to search for
particular patterns in DNA sequences.
 For Example:

2/28/2024 Dr. D. P. Acharjya 2


Mathematical Formulation
 Assume that the text is an array T[1  n] of length n.
The pattern is an array P[1  m] of length m. It is to be
noted that m ≤ n.

 It is to be noted that the elements of P and T are


characters drawn from a finite alphabet Σ. For example,
Σ = {0, 1} or Σ = {a, b, , z}.

 The character arrays P and T are often called strings of


characters.
2/28/2024 Dr. D. P. Acharjya 3
Continued…
 The pattern P occurs with shift s in text T (or pattern P
occurs beginning at position s + 1 in text T) if
0 ≤ s ≤ n - m and T[s+1  s+m] = P[1  m]
 It means that T[s + j] = P[j], for 1 ≤ j ≤ m.
 If P occurs with shift s in T, then it is a valid shift
otherwise, s is an invalid shift.
 The string-matching problem is the problem of finding
all valid shifts with which a given pattern P occurs in
a given text T.

2/28/2024 Dr. D. P. Acharjya 4


Prefix Function
 The prefix function π for a pattern encapsulates
knowledge about how the pattern matches against
shifts of itself.
 This information can be used to avoid testing useless
shifts in the naïve pattern-matching algorithm.
 Simultaneously, it avoid the precomputation of δ
(transition function of finite automaton) for a string-
matching automaton.
 It minimizes the total computation in comparison with
finite automaton string matching.

2/28/2024 Dr. D. P. Acharjya 5


Knuth-Morris-Pratt Algorithm
KMP-MATCHER(T, P)
1. n ← Length[T]
2. m ← Length[P]
3. π ← COMPUTE PREFIX FUNCTION(P)
4. q ← 0
5. for i ← 1 to n
6. do while q > 0 and P[q + 1] ≠ T[i]
7. do q ← π[q]
8. if P[q + 1] = T[i]
9. then q ← q + 1
10. if q = m
11. then print “Pattern occurs with shift” (i – m)
12. q ← π[q]
2/28/2024 Dr. D. P. Acharjya 6
Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
1. m ← Length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P[k + 1] ≠ P[q]
6. do k ← π[k]
7. if P[k +1] = P[q]
8. then k ← k+1
9. π[q] ← k
10. Return π
2/28/2024 Dr. D. P. Acharjya 7
Computing Time
 The for loop at step 4 of compute prefix function is
computed (m-1) times. We associate a potential of k
with the current state k of the algorithm. This potential
has an initial value of 0.
 The only other line that affects k is line 8, which
increases k by at most one during each execution of
the for loop body. Since the number of outer-loop
iterations is (m - 1), the computing time for compute
prefix function is Θ(m).
 The total actual worst-case running time is Θ(m). It is
considered as pre-processing time.
2/28/2024 Dr. D. P. Acharjya 8
Continued…
 The for loop at step 5 of KMP algorithm is computed
n - times. We associate a potential of q with the
current state q of the algorithm. This potential has an
initial value of 0.
 The line that affects q is line 9, which increases q by at
most one during each execution of the for loop body.
 Since the number of outer-loop iterations is n, the
computing time for KMP algorithm is Θ(n).
 The total computing time for the algorithm is: Θ(m) +
Θ(n).
 The algorithm computes the match in linear time.
2/28/2024 Dr. D. P. Acharjya 9
Numerical Illustration
 Consider a text T =
bacbabababacaca, find all the
valid shifts for the pattern P =
ababaca.
 Compute Prefix Function
 m = Length[P] = 7
  (1) = 0
 k=0
 q=2
 k > 0 and P[1]P[2] (False)
2/28/2024 Dr. D. P. Acharjya 10
Continued …
 q=2
 k > 0 and P[1]P[2] (False)
 P[1] = P[2] (False)
 [2] = k = 0
 q=3
 k > 0 and P[1]P[3] (False)
 P[1] = P[3] (True)
 k=k+1=1
 [3] = 1

2/28/2024 Dr. D. P. Acharjya 11


Continued …
 q=4
 k=1>0 and P[2]P[4] (False)
 P[k+1=2] = P[4] (True)
 k=k+1=2
 [4] = 2
 q=5
 k=2>0 and P[3]P[5] (False)
 P[k+1=3] = P[5] (True)
 k=k+1=3
 [5] = 3
2/28/2024 Dr. D. P. Acharjya 12
Continued …
 q=6
 k=3>0 and P[4]P[6] (True)
 k = [3] = 1
 k=1>0 and P[2]P[6] (True)
 k = [1] = 0
 k=0>0 and P[1]P[6] (False)
 P[k+1=1] = P[6] (False)
 [6] = 0 (k)

2/28/2024 Dr. D. P. Acharjya 13


Continued …
 q=7
 k=0>0 and P[1]P[7] (False)
 P[k+1=1] = P[7] (True)
 k = k+1 =1
 [7] = 1 (k)

2/28/2024 Dr. D. P. Acharjya 14


Continued …
 n = Length [T] = 15
 m = Length[P] = 7
 Compute 

 q=0
 For i = 1 to 15

2/28/2024 Dr. D. P. Acharjya 15


Continued …
 For q = 0 and i = 1
 q >0 and P[1]  T[1] (False)
 P[1] = T[1] (False)
 q  m (7)

2/28/2024 Dr. D. P. Acharjya 16


Continued …
 For q = 0 and i = 2
 q >0 and P[1]  T[2] (False)
 P[1] = T[2] (True)
 q = q +1 = 1
 q (1)  m (7)

2/28/2024 Dr. D. P. Acharjya 17


Continued …
 For q = 1 and i = 3
 q=1>0 and P[2]T[3](True)
 q = [q =1] = 0
 q=0>0 and P[1]T[3](False)
 P[1] = T[3] (False)
 q (0)  m (7)

2/28/2024 Dr. D. P. Acharjya 18


Continued …
 For q = 0 and i = 4
 q>0 and P[1]T[4](False)
 P[1] = T[4] (False)
 q (0)  m (7)

2/28/2024 Dr. D. P. Acharjya 19


Continued …
 For q = 0 and i = 5
 q>0 and P[1]T[5](False)
 P[1] = T[5] (True)
 q = q + 1 =1
 q (1)  m (7)

2/28/2024 Dr. D. P. Acharjya 20


Continued …
 For q = 1 and i = 6
 q>0 and P[2]T[6](False)
 P[2] = T[6] (True)
 q = q + 1 =2
 q (2)  m (7)

2/28/2024 Dr. D. P. Acharjya 21


Continued …
 For q = 2 and i = 7
 q>0 and P[3]T[7](False)
 P[3] = T[7] (True)
 q = q + 1 =3
 q (3)  m (7)

2/28/2024 Dr. D. P. Acharjya 22


Continued …
 For q = 3 and i = 8
 q>0 and P[4]T[8](False)
 P[4] = T[8] (True)
 q = q + 1 =4
 q (4)  m (7)

2/28/2024 Dr. D. P. Acharjya 23


Continued …
 For q = 4 and i = 9
 q>0 and P[5]T[9](False)
 P[5] = T[9] (True)
 q = q + 1=5
 q (5)  m (7)

2/28/2024 Dr. D. P. Acharjya 24


Continued …
 For q = 5 and i = 10
 q>0 and P[6]T[10](True)
 q = [q =5] = 3
 q>0 and P[4]T[10](False)
 P[4] = T[10] (True)
 q = q + 1=4
 q (4)  m (7)

2/28/2024 Dr. D. P. Acharjya 25


Continued …
 For q = 4 and i = 11
 q>0 and P[5]T[11](False)
 P[5] = T[11] (True)
 q = q + 1=5
 q (5)  m (7)

2/28/2024 Dr. D. P. Acharjya 26


Continued …
 For q = 5 and i = 12
 q>0 and P[6]T[12](False)
 P[6] = T[12] (True)
 q = q + 1=6
 q (6)  m (7)

2/28/2024 Dr. D. P. Acharjya 27


Continued …
 For q = 6 and i = 13
 q>0 and P[7]T[13](False)
 P[7] = T[13] (True)
 q = q + 1=7
 q (7) = m (7)
 Pattern with shift 13 – 7 = 6
 q = [q =7] = 1

2/28/2024 Dr. D. P. Acharjya 28


Continued …
 For q = 1 and i = 14
 q>0 and P[2]T[14](True)
 q = [q=1] = 0
 P[2] = T[14] (False)
 q (0) = m (7)

2/28/202 Dr. D. P. Acharjya 29


4
Continued …
 For q = 0 and i = 15
 q>0 and P[1]T[15](False)
 P[1] = T[15] (True)
 q (0) = m (7) (False)
 Process Terminates

2/28/2024 Dr. D. P. Acharjya 30

You might also like