python_reg_expressions
python_reg_expressions
http://www.tuto rialspo int.co m/pytho n/pytho n_re g _e xpre ssio ns.htm Co pyrig ht © tuto rials po int.co m
A regular expression is a special sequence of characters that helps you match or find other string s or sets of
string s, using a specialized syntax held in a pattern. Reg ular expressions are widely used in UNIX world.
T he module re provides full support for Perl-like reg ular expressions in Python. T he re module raises the
exception re.error if an error occurs while compiling or using a reg ular expression.
We would cover two important functions, which would be used to handle reg ular expressions. But a small thing
first: T here are various characters, which would have special meaning when they are used in reg ular expression.
T o avoid any confusion while dealing with reg ular expressions, we would use Raw String s as r'expression'.
string T his is the string , which would be searched to match the pattern at the
beg inning of string .
flag s You can specify different flag s using bitwise OR (|). T hese are modifiers,
which are listed in the table below.
T he re.match function returns a matc h object on success, None on failure. We would use group(num) or
groups() function of matc h object to g et matched expression.
g roup(num=0) T his method returns entire match (or specific subg roup num)
g roups() T his method returns all matching subg roups in a tuple (empty if there
weren't any)
Example:
#!/usr/bin/python
import re
if matchObj:
print "matchObj.group() : ", matchObj.group()
print "matchObj.group(1) : ", matchObj.group(1)
print "matchObj.group(2) : ", matchObj.group(2)
else:
print "No match!!"
When the above code is executed, it produces following result:
string T his is the string , which would be searched to match the pattern anywhere in
the string .
flag s You can specify different flag s using bitwise OR (|). T hese are modifiers,
which are listed in the table below.
T he re.search function returns a matc h object on success, None on failure. We would use group(num) or
groups() function of matc h object to g et matched expression.
g roup(num=0) T his method returns entire match (or specific subg roup num)
g roups() T his method returns all matching subg roups in a tuple (empty if there
weren't any)
Example:
#!/usr/bin/python
import re
if matchObj:
print "matchObj.group() : ", matchObj.group()
print "matchObj.group(1) : ", matchObj.group(1)
print "matchObj.group(2) : ", matchObj.group(2)
else:
print "No match!!"
Example:
#!/usr/bin/python
import re
No match!!
search --> matchObj.group() : dogs
Syntax:
T his method replaces all occurrences of the RE pattern in string with repl, substituting all occurrences unless
max provided. T his method would return modified string .
Example:
Following is the example:
#!/usr/bin/python
import re
re.L Interprets words according to the current locale. T his interpretation affects the
alphabetic g roup (\w and \W), as well as word boundary behavior (\b and \B).
re.M Makes $ match the end of a line (not just the end of the string ) and makes ^ match
the start of any line (not just the start of the string ).
re.U Interprets letters according to the Unicode character set. T his flag affects the
behavior of \w, \W, \b, \B.
re.X Permits "cuter" reg ular expression syntax. It ig nores whitespace (except inside
a set [] or when escaped by a backslash) and treats unescaped # as a comment
marker.
Following table lists the reg ular expression syntax that is available in Python:
. Matches any sing le character except newline. Using m option allows it to match
newline as well.
a| b Matches either a or b.
(?: re) Groups reg ular expressions without remembering matched text.
(?#...) Comment.
(?! re) Specifies position using pattern neg ation. Doesn't have a rang e.
\S Matches nonwhitespace.
REGULAR-EXPRESSION EXAMPLES
Literal characters:
Character classes:
Example Desc ription
Repetition Cases:
Backreferences:
T his matches a previously matched g roup ag ain:
(['"])[^\1]*\1 Sing le or double-quoted string . \1 matches whatever the 1st g roup matched . \2
matches whatever the 2nd g roup matched, etc.
Alternatives:
Anchors:
T his needs to specify match position.