Lecture 13: String Processing
Lê Sỹ Vinh
Computational Science and Engineering
Email: vinhls@vnu.edu.vn
Outline
• String matching
• Regular expressions
String
• A string is an array of characters.
For example: S = “Matching is a string algorithm”
Problem: Given a short string P (the pattern) and a long string S (the text), determine
whether the pattern P appears in the text S.
Example:
• S = “Hello to string algorithms”
• P = “algorithm”
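For reference, Python's built-in string operations already answer this question for the example above. The `in` operator gives the yes/no answer, and `str.find` returns the first starting index, or -1 if there is none. The slides that follow look at how such a search can be carried out.

```python
S = "Hello to string algorithms"
P = "algorithm"

# The `in` operator answers the yes/no question from the problem statement.
print(P in S)      # True
# str.find returns the index of the first occurrence of P, or -1 if P is absent.
print(S.find(P))   # 16
```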
Naïve string matching
Scan the text S from beginning to end. At each position, check whether the pattern P
occurs starting at that position.
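Complexity: O(mn), where m is the length of P and n is the length of S. Each of the
n − m + 1 starting positions may require up to m character comparisons.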
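The sketch below shows one way to write the naïve method. It is not the slide's own pseudocode. The function name `naive_search` is hypothetical, and it returns every match position rather than only a yes/no answer. The outer loop tries each of the n − m + 1 starting positions. The inner loop compares up to m characters, which is where the O(mn) bound comes from.

```python
def naive_search(S: str, P: str) -> list[int]:
    """Return every index i such that P occurs in S starting at position i."""
    n, m = len(S), len(P)
    matches = []
    # Try each of the n - m + 1 possible starting positions in the text.
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and S[i + j] == P[j]:
            j += 1
        if j == m:
            matches.append(i)
    return matches


print(naive_search("Hello to string algorithms", "algorithm"))  # [16]
```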
Knuth-Morris-Pratt (KMP) Algorithm
Idea: Whenever a mismatch occurs, shift the pattern as far as possible, using information
precomputed from P itself, so that characters of S already matched are never compared again.
Complexity: O(m+n)
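Below is a minimal sketch of KMP in the common "prefix function" (failure table) formulation. The slide's own presentation may differ. The function names `prefix_function` and `kmp_search` are hypothetical. pi[i] is the length of the longest proper prefix of P[0..i] that is also a suffix of it. On a mismatch, the search falls back to pi instead of moving backwards in S. Building pi takes O(m) time and scanning S takes O(n), which gives O(m+n) in total.

```python
def prefix_function(P: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of P[:i+1] that is also its suffix."""
    pi = [0] * len(P)
    k = 0
    for i in range(1, len(P)):
        # Fall back to shorter borders until P[i] can extend one (or none is left).
        while k > 0 and P[i] != P[k]:
            k = pi[k - 1]
        if P[i] == P[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(S: str, P: str) -> list[int]:
    """Return every starting index of P in S in O(m + n) time."""
    if not P:
        return list(range(len(S) + 1))
    pi = prefix_function(P)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(S):
        # On a mismatch, shift the pattern using pi instead of re-scanning S.
        while k > 0 and c != P[k]:
            k = pi[k - 1]
        if c == P[k]:
            k += 1
        if k == len(P):
            matches.append(i - len(P) + 1)
            k = pi[k - 1]  # keep going so overlapping matches are also found
    return matches


print(prefix_function("ababaca"))      # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababca", "abab"))  # [0, 2]
```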
Exercises on strings
• Given a string, write an algorithm to determine all
duplicate words in the string.
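Here is one possible solution sketch for the exercise, under two assumptions of ours. First, a "word" is a maximal run of letters, digits, or underscores (the regex \w+). Second, words are compared case-insensitively. The function name `duplicate_words` is hypothetical. It counts every word and then reports each repeated word once, in order of first appearance.

```python
import re
from collections import Counter


def duplicate_words(text: str) -> list[str]:
    """Return words occurring more than once, in order of first appearance.

    Assumptions: a word is a run of \\w characters; comparison ignores case.
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    seen = set()
    result = []
    for w in words:
        # Report each duplicated word only once.
        if counts[w] > 1 and w not in seen:
            seen.add(w)
            result.append(w)
    return result


print(duplicate_words("The cat saw the dog and the dog ran"))  # ['the', 'dog']
```

A hash-based counter keeps the running time linear in the length of the text. Sorting the words would also find duplicates, but in O(k log k) time for k words.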
Symbol  Meaning                          Example
\d      Any digit, short for [0-9]       /\d\d/ => “01”, “02” … “99”
\D      A non-digit, short for [^0-9]    /c\Dt/ => “cat”, “cut”, but not “c4t”
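The two table rows can be checked with Python's re module. `re.fullmatch` requires the whole string to match the pattern. One caveat: for str patterns, Python's \d also matches non-ASCII Unicode decimal digits unless the re.ASCII flag is passed. \d is exactly [0-9] only for ASCII input or with that flag.

```python
import re

# \d matches one digit.
print(bool(re.fullmatch(r"\d\d", "07")))   # True
print(bool(re.fullmatch(r"\d\d", "7a")))   # False

# \D matches one non-digit character.
for word in ["cat", "cut", "c4t"]:
    print(word, bool(re.fullmatch(r"c\Dt", word)))
# cat True
# cut True
# c4t False
```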
Regular expression for an email
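The slide's own pattern is not reproduced here. The sketch below is one possible simplified email pattern, written for illustration only; the name `EMAIL_RE` is hypothetical. It accepts a local part, an "@", one or more dot-separated domain labels, and a top-level domain of at least two letters. It deliberately does not implement the full RFC 5322 address grammar.

```python
import re

# Simplified, illustrative pattern (not RFC 5322 complete):
#   local part: letters, digits, and . _ % + -
#   domain: one or more "label." groups followed by a 2+ letter top-level domain
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

for s in ["vinhls@vnu.edu.vn", "user.name+tag@example.com", "no-at-sign.com", "a@b"]:
    print(s, bool(EMAIL_RE.fullmatch(s)))
```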
Regular expression for a URL
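Here is a simplified, illustrative URL pattern, which is not necessarily the one used on the slide. The name `URL_RE` is hypothetical. It accepts only the http and https schemes, a host made of dot-separated labels, an optional port, and an optional path or query. Hosts without a dot, such as "localhost", and IP-address hosts are rejected by this sketch. The general URI syntax is defined in RFC 3986.

```python
import re

# Simplified, illustrative pattern; adjacent raw strings are concatenated.
URL_RE = re.compile(
    r"https?://"                          # scheme: http or https
    r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"   # host name, e.g. www.example.com
    r"(?::\d{1,5})?"                      # optional port, e.g. :8080
    r"(?:/[^\s]*)?"                       # optional path, query, fragment
)

for s in ["https://www.example.com", "http://example.org:8080/docs/index.html?q=1",
          "ftp://example.com", "https://localhost"]:
    print(s, bool(URL_RE.fullmatch(s)))
```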
Regular expression for an IP address
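A naive pattern such as \d{1,3}(\.\d{1,3}){3} would also accept values like "999.1.1.1". The sketch below assumes dotted-decimal IPv4 and restricts each octet to the range 0 to 255. It also rejects leading zeros such as "01". The names `OCTET` and `IPV4_RE` are hypothetical, and IPv6 is not covered.

```python
import re

# One octet: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four octets separated by dots; {{3}} becomes {3} inside the f-string.
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

for s in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.2.3.4"]:
    print(s, bool(IPV4_RE.fullmatch(s)))
```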
Regular expression for a variable name
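This sketch assumes C-style identifier rules, since the slide does not say which language's rules apply. A name starts with a letter or underscore, followed by any number of letters, digits, or underscores. The name `IDENT_RE` is hypothetical. The pattern is ASCII-only and does not exclude reserved keywords such as "int".

```python
import re

# Assumed C-style rule: [letter or _] followed by [letters, digits, _]*.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

for s in ["count", "_tmp1", "maxValue", "2ndItem", "my-var"]:
    print(s, bool(IDENT_RE.fullmatch(s)))
```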