Chapter 3 - String Processing
https://sites.google.com/a/quest.edu.pk/dr-irfana-memon/lecture-slides
NO TOPIC CLO
01 A General Overview CLO1
02 Introduction to Data Structures and Algorithm CLO1
03 String Processing CLO1
04 Abstract Data Types CLO1
05 Linked list CLO1
06 Stack and Queue CLO1
07 Recursion CLO1
08 Complexity Analysis CLO2
09 Sorting and Searching techniques CLO2
10 Trees CLO2
11 Graph CLO3
12 P & NP CLO3
String Processing
• Computer terminology usually uses the term “string” for
a sequence of characters rather than the term “word”.
• Therefore, many texts use the expression “string processing”
instead of “word processing”.
• This chapter discusses how such data are stored and
processed by the computer.
• Each programming language has a character set that is used
to communicate with the computer. It usually includes the
following:
• Alphabet: A, B, C, D, ..., Z
• Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
• Special characters: +, -, /, *, ^, &, %, =, etc.
• Record-Oriented
In fixed-length storage, each line of print is viewed as a record,
where all records have the same length, i.e. each record
accommodates the same number of characters. Assume our
records have length 80 unless otherwise stated.
• Suppose the input consists of a program. Using a record-oriented,
fixed-length storage medium, the input data will appear in memory
as a sequence of 80-character records, one per line of the program.
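A minimal Python sketch of the idea above, assuming the record length of 80 stated in the slides. The helper name `to_records` and the sample program lines are hypothetical and only serve as illustration. Each input line is padded with blanks so that every record has exactly the same length.

```python
RECORD_LENGTH = 80  # record length assumed in the slides

def to_records(lines, record_length=RECORD_LENGTH):
    """Pad each line with blanks so every record has the same length."""
    records = []
    for line in lines:
        if len(line) > record_length:
            raise ValueError("line does not fit in one record")
        records.append(line.ljust(record_length))
    return records

# Hypothetical input: three lines of a small program
program = ["PROGRAM DEMO", "      PRINT *, 'HELLO'", "      END"]
records = to_records(program)
print([len(r) for r in records])  # every record has 80 characters
```

Short lines carry many trailing blanks, which is the price paid for uniform records.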
1. Fixed-Length Storage (Advantages and Disadvantages)
• Advantages
– The ease of accessing data from any given record
– The ease of updating data in any given record (as long as the
length of the new data does not exceed the record length)
• Disadvantages
– Time is wasted reading an entire record if most of the storage
consists of blank spaces.
– Certain records may require more space than available.
– When the correction consists of more or fewer characters than
the original text, changing a misspelled word requires the entire
record to be changed.
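The trade-offs listed above can be sketched in Python, assuming all records are kept in one flat string of 80-character records. The names `read_record` and `update_record` are hypothetical helpers, not part of the slides.

```python
RECORD_LENGTH = 80  # record length assumed in the slides

def read_record(storage, i):
    """Direct access: record i starts at offset i * RECORD_LENGTH."""
    start = i * RECORD_LENGTH
    return storage[start:start + RECORD_LENGTH]

def update_record(storage, i, new_text):
    """Rewrite record i; fails if the new data exceeds the record length."""
    if len(new_text) > RECORD_LENGTH:
        raise ValueError("new data exceeds the record length")
    start = i * RECORD_LENGTH
    return (storage[:start] + new_text.ljust(RECORD_LENGTH)
            + storage[start + RECORD_LENGTH:])

storage = "X = 1".ljust(RECORD_LENGTH) + "Y = 2".ljust(RECORD_LENGTH)
storage = update_record(storage, 1, "Y = 25")   # easy: the new data fits

# Blanks that are stored and read along with the useful data of record 1
wasted = RECORD_LENGTH - len(read_record(storage, 1).rstrip())
print(wasted)
```

Access and update need only simple offset arithmetic, but the whole record is rewritten even for a one-character correction, and most of each record here is blank.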
2. Variable-Length Storage with Fixed Maximum
• Variable
– Static variable: the length is defined before the program is
executed and cannot change throughout the program.
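One way to picture variable-length storage with a fixed maximum is the sketch below. It is an illustration under assumptions, not the representation used in the slides: the class name `BoundedString` and its methods are hypothetical. The maximum is fixed when the object is created, while the current length may vary up to that maximum.

```python
class BoundedString:
    """A string whose length may vary up to a maximum fixed at creation."""

    def __init__(self, max_length, text=""):
        self.max_length = max_length  # fixed for the lifetime of the object
        self.text = ""
        self.assign(text)

    def assign(self, text):
        """Store new text; reject it if it exceeds the fixed maximum."""
        if len(text) > self.max_length:
            raise ValueError("text exceeds the fixed maximum length")
        self.text = text

    def __len__(self):
        return len(self.text)  # current length, at most max_length

name = BoundedString(10, "snow")
name.assign("snowball")        # allowed: 8 characters fit in a maximum of 10
print(len(name), name.max_length)
```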
• Indexing (find())
– Indexing refers to finding the position at which a string
appears in the text.
• Concatenation (concat(string1, string2))
– String concatenation is the operation of joining two character
strings end to end. For example, the strings "snow" and "ball" may
be concatenated to give "snowball".
• Replacement (replace(pos1, len1, string, pos2, len2))
– Replacing one string in the text by another.
• Insertion (insert())
– Inserting a string in the middle of the text.
• Deletion (Erase(position-FirstChar, length))
– Deleting a string from the text.
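The operations above can be tried out in Python. The function names in the slides are not Python names; the sketch below uses Python's built-in string methods and slicing as stand-ins, and the sample strings other than "snow" and "ball" are made up for illustration.

```python
text = "snow"

# Indexing: position of a substring (0-based in Python; -1 if absent)
position = "snowball".find("ball")           # 4

# Concatenation: join two strings end to end
joined = text + "ball"                       # "snowball"

# Replacement: replace one string in the text by another
replaced = joined.replace("snow", "foot")    # "football"

# Insertion: Python strings are immutable, so build a new one by slicing
inserted = joined[:4] + "-" + joined[4:]     # "snow-ball"

# Deletion: remove `length` characters starting at position `start`
start, length = 4, 4
deleted = joined[:start] + joined[start + length:]   # "snow"

print(position, joined, replaced, inserted, deleted)
```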
• Given strings T (text) and P (pattern), the pattern matching
problem consists of finding a substring of T equal to P.
• T: “the rain in spain stays mainly on the plain”
• P: “n th”
• We assume that the length of the pattern does not exceed the
length of the text.
• Applications:
– Text editors
– Web search engines (e.g. Google)
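The example text and pattern above can be checked with Python's built-in `str.find`, used here only as a convenient stand-in for a pattern matcher. Note that Python positions are 0-based.

```python
T = "the rain in spain stays mainly on the plain"
P = "n th"

# The pattern is assumed not to be longer than the text
assert len(P) <= len(T)

index = T.find(P)                     # 0-based position of the first match, or -1
print(index)                          # 32
print(T[index:index + len(P)] == P)   # True: the substring of T equals P
```

The match spans the end of "on" and the start of "the".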
The Brute Force Algorithm
2. A text T and a pattern P are stored as arrays with one character per element.
Write an algorithm and draw a flowchart that replaces every occurrence of P in T by Q.
T = each
P = a
Q = e
S = length of T
R = length of P
[Find the index in T where pattern P starts]
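1. [Initialize] Set K = 1 and MAX = S - R + 1
2. Repeat Steps 3 to 5 while K <= MAX
3. Repeat for L = 1 to R: [Test each character of P]
If P[L] != T[K+L-1], then: Go to Step 5.
[End of inner loop]
4. [Success] Set INDEX = K, and go to Step 7
5. Set K = K + 1
[End of Step 2 outer loop]
6. [Failure] Set INDEX = 0 and go to Step 11
7. Set M = 1
[Loop to replace P by Q]
8. Repeat Steps 9 to 10 while M <= R
9. Set T[INDEX] = Q[M]
10. Set INDEX = INDEX + 1 and M = M + 1
[End of Step 8 loop]
11. Exit.
• Note: as written, these steps replace only the first occurrence of P
and assume that Q has the same length as P (R characters). To replace
every occurrence, set K = INDEX after Step 10 and return to Step 2
instead of exiting.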
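The steps above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the function names `find_index` and `replace_every` are hypothetical, positions are kept 1-based to mirror the steps, Q is assumed to have the same length as P, and the search is restarted after each replacement so that every occurrence is handled.

```python
def find_index(T, P, start=1):
    """Brute-force search: 1-based INDEX of P in T at or after start, else 0."""
    S, R = len(T), len(P)
    MAX = S - R + 1
    K = start
    while K <= MAX:
        for L in range(1, R + 1):
            # P[L] versus T[K+L-1] in the 1-based notation of the steps
            if P[L - 1] != T[K + L - 2]:
                break                     # mismatch: try the next K
        else:
            return K                      # all R characters matched
        K += 1
    return 0                              # failure

def replace_every(T, P, Q):
    """Replace every occurrence of P in T by Q (assumes len(Q) == len(P))."""
    if len(P) == 0 or len(P) != len(Q):
        raise ValueError("P must be non-empty and Q must have the same length")
    chars = list(T)                       # one character per array element
    INDEX = find_index(chars, P)
    while INDEX != 0:
        for M in range(1, len(P) + 1):
            chars[INDEX - 1] = Q[M - 1]   # T[INDEX] = Q[M]
            INDEX += 1
        INDEX = find_index(chars, P, start=INDEX)  # continue after the match
    return "".join(chars)

print(replace_every("each", "a", "e"))    # eech
```

In the worst case the search compares about R characters for each of the S - R + 1 starting positions.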
Wish you good luck