Regular Expression and Substituting
By Muhammad Farsan
What is a Regular Expression?
A regular expression (RegEx) is a sequence of characters that defines a
search pattern. For example:
^a...s$
The above code defines a RegEx pattern. The pattern is: any
five-letter string starting with a and ending with s.
Python has a module named re to work with regular expressions. Here's an
example:
import re

pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Here, we used the re.match function to look for pattern at the beginning of
test_string. The method returns a match object if the search is successful.
If not, it returns None.
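To see both outcomes of re.match side by side, here is a minimal sketch that runs the same pattern against a few strings. The test strings are hypothetical and chosen only for illustration.

```python
import re

pattern = r'^a...s$'  # 'a', then any three characters, then 's'

# Hypothetical test strings, not taken from the original example
candidates = ['abyss', 'alias', 'abs', 'An abacus']

# re.match returns a match object on success and None otherwise,
# so bool() turns each result into True or False
results = {text: bool(re.match(pattern, text)) for text in candidates}

for text, matched in results.items():
    print(text, '->', 'match' if matched else 'no match')
```

Only the five-character strings that start with a and end with s are reported as matches.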
Specify Pattern Using RegEx:
To specify regular expressions, metacharacters are used. In the
above example, ^ and $ are metacharacters.
Metacharacters:
Metacharacters are characters that are interpreted in a special way by
a RegEx engine. Here's a list of metacharacters:
[] . ^ $ * + ? {} () \ |
[] – Square brackets:
Square brackets specify a set of characters you wish to match.
For example, [abc] will match if the string you are trying to match contains
any of a, b or c.
You can also specify a range of characters using - inside square
brackets.
o [a-e] is the same as [abcde].
o [1-4] is the same as [1234].
o [0-39] is the same as [01239].
You can complement (invert) the character set by using the caret ^ symbol
at the start of a square bracket.
o [^abc] means any character except a or b or c.
o [^0-9] means any non-digit character.
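The character sets listed above can be checked with a short sketch. It uses re.findall, which returns every non-overlapping match as a list; the sample string is an assumption made for illustration.

```python
import re

sample = 'a1b2c3 xyz'  # hypothetical sample text

in_set = re.findall(r'[abc]', sample)        # any of a, b or c
in_range = re.findall(r'[1-4]', sample)      # same as [1234]
non_digits = re.findall(r'[^0-9]', sample)   # anything except a digit
mixed = re.findall(r'[0-39]', '0123456789')  # same as [01239]

print(in_set)      # ['a', 'b', 'c']
print(in_range)    # ['1', '2', '3']
print(non_digits)  # ['a', 'b', 'c', ' ', 'x', 'y', 'z']
print(mixed)       # ['0', '1', '2', '3', '9']
```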
. – Period:
A period matches any single character (except newline '\n').
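A minimal sketch of the period in use, with made-up input strings. It shows that each period consumes exactly one character and that a newline is not matched by default.

```python
import re

# Two periods match two characters at a time
pairs = re.findall(r'..', 'abcd')

# A period matches '-' but does not match a newline
same_line = re.search(r'a.b', 'a-b')
across_newline = re.search(r'a.b', 'a\nb')

print(pairs)                       # ['ab', 'cd']
print(same_line is not None)       # True
print(across_newline is not None)  # False
```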
^ – Caret:
The caret symbol ^ is used to check if a string starts with a certain
character.
$ – Dollar:
The dollar symbol $ is used to check if a string ends with a certain character.
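The caret and dollar anchors can be tried out as follows. This is only a sketch; the input words are hypothetical.

```python
import re

# ^a succeeds only when the string starts with 'a'
starts_abc = bool(re.search(r'^a', 'abc'))
starts_bac = bool(re.search(r'^a', 'bac'))

# a$ succeeds only when the string ends with 'a'
ends_formula = bool(re.search(r'a$', 'formula'))
ends_cab = bool(re.search(r'a$', 'cab'))

print(starts_abc, starts_bac)  # True False
print(ends_formula, ends_cab)  # True False
```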
+ – Plus:
The plus symbol + matches one or more occurrences of the pattern to its
left.
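As a quick illustration of the plus symbol, the sketch below tests the pattern ma+n, which requires at least one a between m and n. The pattern and the words are assumptions chosen for the example.

```python
import re

pattern = r'ma+n'  # 'm', one or more 'a', then 'n'

# Hypothetical inputs
words = ['mn', 'man', 'maaan', 'main']
plus_results = {word: bool(re.search(pattern, word)) for word in words}

print(plus_results)
# {'mn': False, 'man': True, 'maaan': True, 'main': False}
```

'mn' fails because there is no a at all, and 'main' fails because the a is not followed directly by n.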
Python RegEx
import re
Example:
# Program to extract numbers from a string
import re
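One way to extract numbers from a string is sketched below. It assumes re.findall is the intended function and uses a made-up input string; re.findall returns a list of all non-overlapping matches.

```python
import re

# Hypothetical input string
string = 'hello 12 hi 89. Howdy 34'

# \d+ matches one or more consecutive digits
pattern = r'\d+'

numbers = re.findall(pattern, string)
print(numbers)  # ['12', '89', '34']
```

If the pattern is not found, re.findall returns an empty list.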
re.split()
The re.split method splits the string where there is a match and returns a
list of strings where the splits have occurred.
Example:
import re
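A minimal sketch of re.split, assuming we want to split at every run of digits. The input string is hypothetical.

```python
import re

# Hypothetical input string
string = 'Twelve:12 Eighty nine:89.'

# Split wherever one or more digits occur
pattern = r'\d+'

parts = re.split(pattern, string)
print(parts)  # ['Twelve:', ' Eighty nine:', '.']
```

If the pattern is not found, re.split returns a list containing the original string.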
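re.sub()
The syntax of re.sub() is:
re.sub(pattern, replace, string)
The method returns a string in which the matched occurrences are replaced
with the content of the replace variable.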
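Example:
# Program to remove all whitespace
import re

# multiline string
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)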
re.subn()
re.subn() is similar to re.sub() except that it returns a tuple of 2
items containing the new string and the number of substitutions
made.
Example:
# Program to remove all whitespace
import re

# multiline string
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.subn(pattern, replace, string)
print(new_string)

# Output: ('abc12de23f456', 4)
re.search()
The re.search() method takes two arguments: a pattern and a string. The
method looks for the first location where the RegEx pattern produces a match
with the string. If the search is successful, re.search() returns a match
object; if not, it returns None.
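import re

string = "Python is fun"

# assumed pattern for illustration: 'Python' at the beginning of the string
match = re.search(r'\APython', string)

if match:
    print("pattern found inside the string")
else:
    print("pattern not found")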
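The difference between re.search and re.match can be sketched with the same string: re.search scans the whole string for the first match, while re.match only succeeds if the pattern matches at the very beginning. The pattern 'fun' is an assumption chosen for illustration.

```python
import re

string = 'Python is fun'

# re.search finds 'fun' even though it is at the end of the string
found_anywhere = re.search(r'fun', string)

# re.match only looks at the beginning, so it returns None here
at_start_only = re.match(r'fun', string)

print(found_anywhere.span())  # (10, 13)
print(at_start_only)          # None
```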
You can list the methods and attributes of a match object using the dir()
function. Some of the commonly used methods and attributes of match
objects are:
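match.group()
The group() method returns the part of the string where there is a match.
import re

string = '39801 356, 2102 1111'

# Three-digit number followed by a space followed by a two-digit number
pattern = r'(\d{3}) (\d{2})'

# match variable contains a match object
match = re.search(pattern, string)

if match:
    print(match.group())
else:
    print("pattern not found")

# Output: 801 35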
Here, the match variable contains a match object.
Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can get
the part of the string matched by each of these parenthesized subgroups. Here's how:
>>> match.group(1)
'801'
>>> match.group(2)
'35'
>>> match.group(1, 2)
('801', '35')
>>> match.groups()
('801', '35')
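match.start(), match.end() and match.span()
The start() function returns the index where the matched substring starts.
Similarly, end() returns the index where the matched substring ends.
>>> match.start()
2
>>> match.end()
8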
The span() function returns a tuple containing the start and end indices of
the matched part.
>>> match.span()
(2, 8)
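The indices returned by span() can be used to slice the original string, which gives back the same text as group(). The sketch below assumes the string and pattern from the match.group() example, written without the stray spaces.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

# Slicing with the span reproduces the matched text
start, end = match.span()
matched_text = string[start:end]
print(matched_text)  # 801 35

# span() also accepts a group number
group_spans = (match.span(1), match.span(2))
print(group_spans)   # ((2, 5), (6, 8))
```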
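match.re and match.string
The re attribute of a match object returns the regular expression object
that produced the match. Similarly, the string attribute returns the string
that was passed in.
>>> match.re
re.compile('(\\d{3}) (\\d{2})')
>>> match.string
'39801 356, 2102 1111'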
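Because match.re is a compiled pattern object and match.string is the original input, the two can be combined to run the same pattern again. This is a minimal sketch, assuming the string and pattern from the match.group() example without the stray spaces.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

# The pattern text is available on the compiled object
pattern_text = match.re.pattern
print(pattern_text)  # (\d{3}) (\d{2})

# Reuse the compiled pattern on the original string to find every match;
# with two groups, findall returns a list of tuples
all_pairs = match.re.findall(match.string)
print(all_pairs)     # [('801', '35'), ('102', '11')]
```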