13 Python Ch05 ORC
The re.match() function tries to match the pattern (the regular expression to be matched) against the
beginning of a string. The flags argument is optional; some flag values are listed in Table 6.4. To specify
more than one flag, combine them with the bitwise OR operator, as in re.I | re.M. If re.match() finds a
match, it returns a match object; otherwise it returns None.
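Table 6.4 Different values of flags
Flag    Description
re.I    Case-insensitive matching
re.M    Makes ^ and $ match at the start and end of each line, not only of the whole string
re.X    Ignores whitespace and comments in the pattern (verbose mode)
re.U    Interprets letters according to the Unicode character set (the default for str patterns in Python 3)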
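The effect of combining flags can be seen in a short sketch. The sample text and variable names below are illustrative assumptions, not part of the original program.

```python
import re

# Hypothetical two-line sample text used only for this illustration.
text = "first line\nSECOND line"

# Without flags: matching is case-sensitive and ^ anchors only at the
# start of the whole string, so nothing is found.
no_flags = re.findall(r"^second", text)

# re.I ignores case; re.M lets ^ match at the start of every line.
with_flags = re.findall(r"^second", text, flags=re.I | re.M)

print(no_flags)    # []
print(with_flags)  # ['SECOND']
```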
import re
string = "She sells sea shells on the sea shore"
pattern1 = "sells"
if re.match(pattern1, string):
    print("Match Found")
else:
    print(pattern1, "is not present in the string")
pattern2 = "She"
if re.match(pattern2, string):
    print("Match Found")
else:
    print(pattern2, "is not present in the string")
OUTPUT
sells is not present in the string
Match Found
In the above program, ‘sells’ is present in the string, yet the output says it was not found. This is
because the re.match() function finds a match only at the beginning of the string. Since the word ‘sells’
occurs in the middle of the string, re.match() returns None.
Note On success, the match() function returns an object representing the match; otherwise it returns None.
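As a minimal sketch of what the returned match object offers, the example below reads the matched text and its position. The variable name result is an assumption made for illustration.

```python
import re

string = "She sells sea shells on the sea shore"

# re.match() only looks at the beginning of the string.
result = re.match("She", string)
if result is not None:
    print(result.group())  # the matched text: She
    print(result.span())   # (start, end) indices: (0, 3)

# "sells" does not occur at the beginning, so None is returned.
print(re.match("sells", string))  # None
```

Comparing the result with None (or using it directly in an if statement, as the program above does) is the usual way to test whether a match was found.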
The syntax is similar to that of the match() function. The search() function searches for the first
occurrence of the pattern anywhere within a string, with optional flags. If the search is successful, a
match object is returned; otherwise None is returned.
import re
string = "She sells sea shells on the sea shore"
pattern = "sells"
if re.search(pattern, string):
    print("Match Found")
else:
    print(pattern, "is not present in the string")
OUTPUT
Match Found
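The difference between the two functions can be sketched side by side. The example below assumes the same sample string and shows that search() reports where the first occurrence was found, while match() fails because the pattern is not at the start.

```python
import re

string = "She sells sea shells on the sea shore"

# search() scans the whole string and stops at the first occurrence.
found = re.search("sea", string)
if found:
    print(found.group(), found.start(), found.end())  # sea 10 13

# match() checks only the beginning of the string, so it finds nothing here.
at_start = re.match("sea", string)
print(at_start)  # None
```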
According to the syntax, the sub() function replaces occurrences of the pattern in string with repl.
All occurrences are replaced unless a max value is provided (this argument is named count in Python's
re module), in which case at most that many are replaced. The function returns the modified string.
import re
string = "She sells sea shells on the sea shore"
pattern = "sea"
repl = "ocean"
new_string = re.sub(pattern, repl, string, count=1)
print(new_string)
OUTPUT
She sells ocean shells on the sea shore
In the above program, only the first occurrence was replaced, not all of them, because we provided 1
as the value of max.
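For comparison, a short sketch of the default behaviour when no limit is given. The variable names replace_all and replace_one are assumptions for illustration; the limit is passed by its keyword name, count.

```python
import re

string = "She sells sea shells on the sea shore"

# With no limit, every occurrence of the pattern is replaced.
replace_all = re.sub("sea", "ocean", string)

# With count=1, only the first occurrence is replaced.
replace_one = re.sub("sea", "ocean", string, count=1)

print(replace_all)  # She sells ocean shells on the ocean shore
print(replace_one)  # She sells ocean shells on the sea shore
```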
import re
pattern = r"[a-zA-Z]+ \d+"
matches = re.findall(pattern, "LXI 2013, VXI 2015, VDI 20104, Maruti Suzuki Cars in India")
for match in matches:
    print(match, end=" ")
OUTPUT
LXI 2013 VXI 2015 VDI 20104
Note The re.findall() function returns a list of all substrings that match a pattern.
In the above code, the regular expression, pattern = r"[a-zA-Z]+ \d+", finds all substrings that consist
of one or more letters, followed by a space, followed by one or more digits.
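A minimal sketch of how each part of this pattern matters. The three test strings are illustrative assumptions chosen so that one satisfies the pattern and two do not.

```python
import re

pattern = r"[a-zA-Z]+ \d+"

# Letters, a space, then digits: the whole substring matches.
full_match = re.findall(pattern, "VDI 20104")

# Letters and a space, but no digits follow, so nothing matches.
no_digits = re.findall(pattern, "Maruti Suzuki")

# Letters and digits, but no space between them, so nothing matches.
no_space = re.findall(pattern, "VDI2015")

print(full_match)  # ['VDI 20104']
print(no_digits)   # []
print(no_space)    # []
```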
The finditer() function is similar to the findall() function, but instead of returning a list of matched
strings, it returns an iterator of match objects. Each match object can be used to print the index of the
match in the given string.
import re
pattern = r"[a-zA-Z]+ \d+"
matches = re.finditer(pattern, "LXI 2013, VXI 2015, VDI 20104, Maruti Suzuki Cars available with us")
for match in matches:
    print("Match found at starting index :", match.start())
    print("Match found at ending index :", match.end())
    print("Match found at starting and ending index :", match.span())
OUTPUT
Match found at starting index : 0
Match found at ending index : 8
Match found at starting and ending index : (0, 8)
Match found at starting index : 10
Match found at ending index : 18
Match found at starting and ending index : (10, 18)
Match found at starting index : 20
Match found at ending index : 29
Match found at starting and ending index : (20, 29)
Note that the start() method returns the starting index of the match in the given string. Similarly,
the end() method returns the ending index of the match, which is one past its last character. Another
method, span(), returns the starting and ending indices of the match as a tuple.
Note The match objects returned by the search(), match(), and finditer() functions have start()
and end() methods that return the starting and ending indices of the match.
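The contrast between findall() and finditer() can be sketched as follows. The shortened sample text and the names strings and spans are assumptions made for illustration.

```python
import re

pattern = r"[a-zA-Z]+ \d+"
text = "LXI 2013, VXI 2015"

# findall() returns plain strings, which carry no position information.
strings = re.findall(pattern, text)

# finditer() yields match objects, so the positions are available.
spans = [m.span() for m in re.finditer(pattern, text)]

print(strings)  # ['LXI 2013', 'VXI 2015']
print(spans)    # [(0, 8), (10, 18)]
```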