regular exp
A regular expression (RegEx) is a special sequence of characters that defines a search pattern for finding a
string or set of strings. It can detect the presence or absence of text by matching it against a particular
pattern, and it can also split a pattern into one or more sub-patterns. Python provides the re module to
support regular expressions. One of its core functions is re.search(), which takes a regular
expression and a string and returns either the first match or None.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
import re
RegEx in Python
When you have imported the re module, you can start using regular expressions:
Example-1:
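Search the string to see if it starts with "My" and ends with "nagendra":
import re
a = "My name is nagendra"  # sample string (assumed; the original assignment is missing)
x = re.search("^My.*nagendra$", a)
if x:
    print("Match found")
else:
    print("No match")
output:
Match found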
Example-2:
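import re
s = "My name is nagendra"  # sample string (assumed; the original assignment is missing)
print(re.search('nagendra', s))
output:
<re.Match object; span=(11, 19), match='nagendra'>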
RegEx Functions
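The re module offers a set of functions that allow us to search a string for a match:
Function    Description
match       Returns a Match object if the pattern matches at the beginning of the string
search      Returns a Match object if there is a match anywhere in the string
findall     Returns a list containing all matches
split       Returns a list where the string has been split at each match
sub         Replaces the matches with the text of your choice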
The match() function checks for a match only at the beginning of the string, and returns a Match object
if the pattern matches there.
Example-1:
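import re
a = "my name is nagendra, my age is 25"  # sample string (assumed; reconstructed from outputs later in these notes)
x = re.match("my", a)
print(x)
Output:
<re.Match object; span=(0, 2), match='my'>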
Example-2:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.match("The", a)
print(x)
Output:
None
The search() function searches the whole string for a match and returns a Match object if there is one.
If there is more than one match, only the first occurrence is returned.
Match Object
A Match object contains information about the search and its result.
Note: If there is no match, None is returned instead of a Match object.
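As a small illustration of what a Match object holds, the sketch below reads a few of its standard members: span(), group() and string. The sample text and the variable names are assumptions made for this example, not part of the original notes.

```python
import re

# Sample text is an assumption made for illustration.
text = "my name is nagendra"

m = re.search("nagendra", text)
span = m.span()      # (start, end) indexes of the match
matched = m.group()  # the substring that matched
source = m.string    # the string that was passed to search()
print(span, matched)

# A failed search returns None, so test the result before using it.
missing = re.search("rahul", text)
print(missing is None)
```

Checking for None first avoids an AttributeError when calling span() or group() on a failed search.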
Example-1:
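import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.search("nagendra", a)  # pattern assumed; the original search call is missing
print(x)
Output:
<re.Match object; span=(11, 19), match='nagendra'>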
Example-2:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.search("24", a)
print(x)
Output:
None
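The difference between match() and search() is easiest to see when the pattern occurs somewhere other than the start of the string. The following is a minimal sketch with an assumed sample string:

```python
import re

text = "my name is nagendra"  # assumed sample string

at_start = re.match("name", text)   # "name" is not at position 0
anywhere = re.search("name", text)  # search() scans the whole string

print(at_start)          # None
print(anywhere.start())  # 3
```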
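The findall() function returns a list containing all matches.
Example-1:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.findall("my", a)  # matching is case-sensitive, so the pattern "My" would return []
print(x)
Output:
['my', 'my']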
The split() function returns a list where the string has been split at each match:
Example-1:
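import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.split("nagendra", a)
print(x)
Output:
['my name is ', ', my age is 25']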
The sub() function replaces the matches with the text of your choice:
Example-1:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.sub("nagendra", "rahul", a)
print(x)
Output:
my name is rahul, my age is 25
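By default sub() replaces every match. The re module also accepts an optional count argument that limits the number of replacements; it is not covered in the text above, so treat the following as a side note with an assumed sample string:

```python
import re

text = "my name is nagendra, my age is 25"  # assumed sample string

all_replaced = re.sub("my", "his", text)          # replace every match
first_only = re.sub("my", "his", text, count=1)   # replace only the first match

print(all_replaced)
print(first_only)
```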
Meta Characters
Metacharacters are characters that have a special meaning inside a regular expression. They are used
throughout the functions of the re module. Below is the list of metacharacters.
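[ ] (Square Brackets)
Square brackets represent a character class: a set of characters that we wish to match. For example,
the character class [abc] matches any single a, b, or c. A range can be written with a hyphen, so [a-m]
matches any lowercase letter from a to m.
We can also invert the character class using the caret (^) symbol. For example, [^abc] matches any
single character except a, b, or c.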
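example-1:
import re
a = "my name is nagendra"  # sample string (assumed; chosen to agree with the output shown)
print(re.findall("[a-m]", a))
output:
['m', 'a', 'm', 'e', 'i', 'a', 'g', 'e', 'd', 'a']
example-2:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
print(re.findall("[0-9]", a))
output:
['2', '5']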
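An inverted character class can be sketched the same way. The sample string below is an assumption for illustration; the first pattern collects everything that is not a digit, the second uses an ordinary class for comparison.

```python
import re

text = "my age is 25"  # assumed sample string

non_digits = re.findall("[^0-9]", text)  # ^ inside [] inverts the class
vowels = re.findall("[aeiou]", text)     # ordinary class for comparison

print(non_digits)
print(vowels)  # ['a', 'e', 'i']
```

Note that the caret only inverts the class when it is the first character inside the brackets.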
^ (Caret)
The caret (^) symbol matches the beginning of the string, i.e. it checks whether
the string starts with the given character(s). For example:
• ^B checks whether the string starts with B, such as Btech, Ball, BOX, etc.
• ^BTECH checks whether the string starts with BTECH, such as BTECH
example-1:
import re
a = 'Btech hyderabad'
result = re.match('^Btech', a)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
$ (Dollar)
The dollar ($) symbol matches the end of the string, i.e. it checks whether the string ends with the given
character(s). For example:
• s$ checks for a string that ends with s, such as geeks, ends, s, etc.
• ks$ checks for a string that ends with ks, such as marks, ks, etc.
example-1:
import re
a = 'Btech'
result = re.search('h$', a)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
. (Dot)
The dot (.) symbol matches any single character except the newline character (\n). For example:
• a.b matches a string that has any one character in place of the dot, such as acb, adb, arb,
a1b, etc.
• a..b matches a string that has any two characters in place of the dots, such as acrb,
adhb, arfb, a12b, etc.
example-1:
import re
a = "hello hyderabad"
x = re.findall("he..o", a)
print(x)
output:
['hello']
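The newline exception can be checked directly. In this sketch the test string is an assumption; re.DOTALL is a standard flag of the re module, not discussed in the text above, that makes the dot match newlines as well.

```python
import re

text = "acb a1b a\nb"  # assumed sample; the last "a?b" is split by a newline

same_line = re.findall("a.b", text)               # dot skips the newline
with_dotall = re.findall("a.b", text, re.DOTALL)  # dot also matches \n

print(same_line)
print(with_dotall)
```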
| (Or)
The or symbol (|) works as the or operator: it checks whether the pattern before or after the
symbol is present in the string. For example, btech|mtech matches either btech or mtech.
Example-1:
import re
a = "btech and mtech"  # sample string (assumed; chosen to agree with the output shown)
x = re.findall("btech|mtech", a)
print(x)
output:
['btech', 'mtech']
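Example-2:
import re
a = "btech hyderabad"  # sample string (assumed; chosen to agree with the output shown)
x = re.findall("btech|mtech", a)  # call assumed; the original line is missing
print(x)
output:
['btech']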
? (Question Mark)
The question mark symbol ? matches zero or one occurrence of the pattern to its left.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             woman    1 match
example-1:
import re
a = "i am a man"
x = re.findall("ma?n", a)
print(x)
output:
['man']
example-2:
import re
a = "i am a maaaan"
x = re.findall("ma?n", a)
print(x)
output:
[]
(The output is empty because a is repeated more than once, while ? matches only zero or
one occurrence of the pattern to its left.)
* (Star)
The star symbol * matches zero or more occurrences of the pattern to its left.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             woman    1 match
example-1:
import re
a = "i am a maaan"
x = re.findall("ma*n", a)
print(x)
output:
['maaan']
+ (Plus)
The plus symbol + matches one or more occurrences of the pattern to its left.
Expression   String   Matched?
ma+n         man      1 match
             maaan    1 match
             woman    1 match
example-1:
import re
a = "i am a maaan"
x = re.findall("ma+n", a)
print(x)
output:
['maaan']
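To compare the three quantifiers side by side, the sketch below runs ma?n, ma*n and ma+n against the same words used in the tables above. The results dictionary and the loop are illustrative scaffolding, not part of the original notes.

```python
import re

words = ["mn", "man", "maaan", "woman"]

# Map each pattern to the words in which it finds a match.
results = {}
for pattern in ("ma?n", "ma*n", "ma+n"):
    results[pattern] = [w for w in words if re.search(pattern, w)]
    print(pattern, results[pattern])
```

Only ma*n accepts all four words: ma?n rejects maaan (more than one a) and ma+n rejects mn (no a at all).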
{ } (Braces)
Consider the pattern {n,m}. It means at least n and at most m repetitions of the pattern to its left.
Example-1:
import re
a = "my name is nagendra, my age is 25"  # sample string (assumed)
x = re.findall("my{1,3}", a)
print(x)
In the pattern my{1,3} the quantifier applies only to the y, so it matches an m followed by one to
three y characters. The sample string contains "my" twice, so ['my', 'my'] is printed. To repeat the
whole sequence "my", group it first: (my){1,3}.
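The scope of the braces can be checked with a short sketch. The sample string is an assumption, and finditer() is used for the grouped pattern because findall() would return only the contents of the group rather than the whole match.

```python
import re

text = "my mymy myyy"  # assumed sample string

# {1,3} applies to the single character y.
single = re.findall("my{1,3}", text)

# {1,3} applies to the whole group (my).
grouped = [m.group() for m in re.finditer("(my){1,3}", text)]

print(single)
print(grouped)
```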
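( ) (Group)
Parentheses group sub-patterns so that a quantifier or the or symbol applies to the whole group. For
example, (a|b|c)xz matches a, b, or c followed by xz.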
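Besides grouping, parentheses also capture the text they match, which can then be read from the Match object. The following is a minimal sketch with an assumed sample string and pattern:

```python
import re

text = "my age is 25"  # assumed sample string

m = re.search("(age) is ([0-9]+)", text)
whole = m.group(0)   # the entire match
label = m.group(1)   # text captured by the first pair of parentheses
number = m.group(2)  # text captured by the second pair
both = m.groups()    # all captured groups as a tuple

print(whole, label, number)
```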
Special Sequences
Special sequences are backslash escapes that make commonly used patterns easier to write. Some,
such as \b, do not match an actual character but specify the location in the search string where the
match must occur; others, such as \d, stand for a whole class of characters.
\b Matches at a word boundary, i.e. at the beginning or end of a word. \b(string) checks for the
beginning of a word and (string)\b checks for the end of a word.
\B The opposite of \b: matches only at a position that is not a word boundary.
\d Matches any decimal digit; for ASCII text this is equivalent to the class [0-9]
\D Matches any non-digit character; for ASCII text this is equivalent to the class [^0-9]
\s Matches any whitespace character
\S Matches any non-whitespace character
\w Matches any word character (letter, digit, or underscore)
\W Matches any non-word character
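The boundary sequences are easier to follow with a concrete string. In the sketch below the sample text is an assumption; the patterns are written as raw strings (r"...") because in an ordinary Python string "\b" means a backspace character rather than a word boundary.

```python
import re

text = "my age is 25, agent 007"  # assumed sample string

starts = re.findall(r"\bage", text)     # "age" at the start of a word
whole = re.findall(r"\bage\b", text)    # "age" as a complete word
inside = re.findall(r"age\B", text)     # "age" followed by more word characters
digits = re.findall(r"\d", text)        # every decimal digit

print(starts, whole, inside)
print(digits)
```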
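Program:
import re
a = "my name is nagendra, my age is 25"  # sample string (reconstructed from the outputs below)
c = re.findall(r"\w", a)  # every word character
d = re.findall(r"\W", a)  # every non-word character
f = re.findall(r"\D", a)  # every non-digit character
h = re.findall(r"\S", a)  # every non-whitespace character
print(r"output of \w is", c)
print(r"output of \W is", d)
print(r"output of \D is", f)
print(r"output of \S is", h)
Output:
output of \w is ['m', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', 'm', 'y', 'a', 'g', 'e', 'i', 's', '2', '5']
output of \W is [' ', ' ', ' ', ',', ' ', ' ', ' ', ' ']
output of \D is ['m', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', ',', ' ', 'm', 'y', ' ', 'a', 'g', 'e', ' ', 'i', 's', ' ']
output of \S is ['m', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', ',', 'm', 'y', 'a', 'g', 'e', 'i', 's', '2', '5']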