Unit-3 - Regular Expression

Regular expressions (REs) are strings that use special symbols to find and extract patterns in data. The re module in Python allows working with REs. Some key points:
- REs can search, match, find, and split data according to patterns specified using symbols like *, +, ?, etc.
- Methods like search(), match(), findall(), split(), sub() in the re module are used to apply REs to strings and extract results.
- Examples show how to use REs to search for strings starting with 'm' and length 3, split strings, and replace substrings.

Uploaded by

Vatsal Bhalani

Chapter: 3 Regular Expression

Topic 1: Regular Expressions – REs and Python:

We often need to extract specific information from a body of data. For example, we may
want to know how many people contacted us through Gmail in the last month, the phone
numbers of the employees in a company whose names start with 'A', or the dates of birth of
the patients admitted to a hospital for treatment of hypertension.

To get such information, we first have to search the data. Once we find the required
information, we have to extract it for further use. Regular expressions are useful for
performing both operations.

Regular Expressions

• A regular expression is a string that contains special symbols and characters used to find
and extract the required information from the given data.
• A regular expression helps us to search, match, find and split information as per our
requirements.
• A regular expression is also simply called a regex. Regular expressions are available not
only in Python but also in many other languages such as Java, Perl and AWK.
• Python provides the re module, whose name stands for regular expressions.
• This module contains functions such as compile(), search(), match(), findall() and split(),
which are used to find information in the available data.
• So, when we write a regular expression, we should import the re module as:
import re

• Regular expressions are nothing but strings containing characters and special symbols.
A simple regular expression may look like this:
reg = r'm\w\w'

Meaning: the letter m followed by any two word characters (letters, digits or underscore),
that is, a 3-character string starting with m.

Sequence Characters in RE

• Some of the special sequences beginning with '\' represent predefined sets of characters
that are often useful, such as the set of digits, the set of letters, or the set of anything that
isn't whitespace.
• The following predefined special sequences are a subset of those available. The equivalent
classes shown apply to ASCII text.
\d
Matches any decimal digit; this is equivalent to the class [0-9].
\D
Matches any non-digit character; this is equivalent to the class [^0-9].
\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S
Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w
Matches any alphanumeric character or underscore; this is equivalent to the class [a-zA-Z0-9_].
\W
Matches any character that is not alphanumeric or underscore; this is equivalent to the class [^a-zA-Z0-9_].
\A
Matches only at the start of the string.
\b
Matches the empty string, but only at the beginning or end of a word.
\B
Matches the empty string, but only when it is not at the beginning or end of a word. This
means that r'py\B' matches the 'py' in 'python', 'py3' and 'py2', but not in 'py', 'py.' or 'py!'.
\B is just the opposite of \b.
\Z
Matches only at the end of the string.
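
The special sequences listed above can be tried out on a short string. The following is a minimal sketch; the sample string and the variable names are made up for illustration and are not part of the original notes.

```python
import re

# Hypothetical sample text chosen to exercise several special sequences.
sample = "Room 42, py3 python py."

digits = re.findall(r'\d', sample)         # every single decimal digit
numbers = re.findall(r'\d+', sample)       # runs of digits
words = re.findall(r'\w+', sample)         # runs of word characters
pieces = re.split(r'\s', sample)           # split at each whitespace character
inside = re.findall(r'py\B', sample)       # 'py' NOT followed by a word boundary
alone = re.findall(r'\bpy\b', sample)      # 'py' as a complete word
at_start = re.search(r'\ARoom', sample)    # only matches at the start of the string
at_end = re.search(r'py\.\Z', sample)      # only matches at the end of the string

print(digits)    # ['4', '2', '3']
print(numbers)   # ['42', '3']
print(words)     # ['Room', '42', 'py3', 'python', 'py']
print(pieces)    # ['Room', '42,', 'py3', 'python', 'py.']
print(inside)    # ['py', 'py']  (from 'py3' and 'python')
print(alone)     # ['py']        (from 'py.')
print(at_start is not None, at_end is not None)   # True True
```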
Special Characters and Pattern Matching

Python allows you to specify a pattern to match when searching for a substring contained in a
string. This pattern is known as a regular expression. For example, if you want to find a
North American-style phone number in a string, you can define a pattern consisting of three
digits, followed by a dash (-), and followed by four more digits.
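
The phone-number pattern described above (three digits, a dash, four digits) could be written as follows. This is only a sketch: the message text and the names phone_pattern, message and numbers are assumptions made for the example, and the \b anchors are an addition that keeps the pattern from matching inside longer runs of digits.

```python
import re

# Three digits, a dash, four digits; \b keeps the match from starting or
# ending in the middle of a longer number.
phone_pattern = re.compile(r'\b\d{3}-\d{4}\b')

# Hypothetical input text.
message = "Call 555-1234 or 555-9876 before noon."

numbers = phone_pattern.findall(message)
print(numbers)   # ['555-1234', '555-9876']
```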

In a regular expression, certain characters have special meanings. The table below lists these
special characters and their meanings.

Character        Meaning                                                   Example

*                Zero or more occurrences of the preceding character       ab*c matches ac, abc, abbc, and so on
+                One or more occurrences of the preceding character        ab+c matches abc, abbc, and so on
?                Zero or one occurrence of the preceding character         ab?c matches ac and abc
.                Any single character (except a newline, by default)       a.*c matches any substring starting with a and ending with c
[chars]          Any one character inside the brackets                     a[bB]c matches abc and aBc
[char1-char2]    Any one character in a range of characters                a[a-z]c matches a, followed by any lower-case letter, followed by c
[^chars]         Any one character not inside the brackets                 a[^bB]c matches a, followed by any character other than b or B, followed by c
[^char1-char2]   Any one character not in a range of characters            a[^a-z]c matches a, followed by any character that is not a lower-case letter, followed by c
{num}            An exact number of occurrences of a character             ab{3}c matches abbbc
{num1,num2}      Between num1 and num2 occurrences of a character          ab{1,3}c matches abc, abbc and abbbc
|                Either of two alternatives                                abc|aBc matches abc or aBc
^                The start of the string only                              ^abc matches abc in abcd, but does not match abc in dabc
$                The end of the string only                                abc$ matches abc in dabc, but does not match abc in abcd
^(?!str)         The start of a string that does not begin with str        ^(?!abc) matches at the start of xyz, but not at the start of abcd
^(?!str1|str2)   The start of a string that begins with neither string     ^(?!abc|def) matches at the start of xyz, but not at the start of abcd or defg

If you use multiple special characters in a pattern, you can use parentheses () to specify the
order in which the characters are to be interpreted.
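
Several rows of the table, and the effect of parentheses, can be checked with a short script. This is a minimal sketch: the helper full() and the candidate strings are hypothetical, and re.fullmatch() is used so that a candidate counts only when the whole string matches the pattern.

```python
import re

def full(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = full(r'ab*c', ['ac', 'abc', 'abbc', 'adc'])
negated = full(r'a[^a-z]c', ['abc', 'aBc', 'a1c'])
counted = full(r'ab{1,3}c', ['ac', 'abc', 'abbbc', 'abbbbc'])
either = full(r'abc|aBc', ['abc', 'aBc', 'aBC'])

# Parentheses change what the + applies to:
# ab+c repeats only b, while (ab)+c repeats the whole group ab.
plain = full(r'ab+c', ['abc', 'ababc', 'abbc'])
grouped = full(r'(ab)+c', ['abc', 'ababc', 'abbc'])

# Negative lookahead: accept only strings that do not begin with abc.
not_abc = [s for s in ['abcdef', 'xyz'] if re.match(r'^(?!abc)\w+', s)]

print(star)      # ['ac', 'abc', 'abbc']
print(negated)   # ['aBc', 'a1c']
print(counted)   # ['abc', 'abbbc']
print(either)    # ['abc', 'aBc']
print(plain)     # ['abc', 'abbc']
print(grouped)   # ['abc', 'ababc']
print(not_abc)   # ['xyz']
```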

How to write RE Program:

• The first step is to define the RE,
i.e. reg = r'm\w\w'
It matches a 3-character string starting with the letter m.
• The next step is to compile this RE using the compile() method of the re module,
i.e. prog = re.compile(r'm\w\w')
Now prog represents an object that contains the RE.
• The next step is to run this expression on the string 'text' using the search() method or
the match() method.
text = 'cat mat man mom'
# This is the string on which the RE will act.
res = prog.search(text) # searching for the RE in text
• The result is stored in the res object, and we can display it by calling the group() method
on the object:
print(res.group())
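
Put together, the steps above form the following small script. It is a sketch rather than a definitive program; the second pattern with \b and the longer sample string are additions for illustration, showing that r'm\w\w' on its own can also match inside a longer word.

```python
import re

# Steps 1 and 2: define the RE and compile it.
prog = re.compile(r'm\w\w')

# Step 3: run it on a string.
text = 'cat mat man mom'
res = prog.search(text)

# Step 4: search() returns None when nothing matches, so check before using group().
if res is not None:
    print(res.group())          # mat

# Without word boundaries the pattern also matches inside longer words.
inside = prog.search('hammer').group()
print(inside)                   # mme

# Adding \b on both sides restricts the match to complete 3-letter words.
whole_words = re.findall(r'\bm\w\w\b', 'cat mat man mom hammer moment')
print(whole_words)              # ['mat', 'man', 'mom']
```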

The following functions of the re module are used with regular expressions:

The match() method looks for the pattern only at the beginning of the string. If a match
is found, it returns a match object that contains the matched string; otherwise it returns
None. We can access the string from the returned object using the group() method.

The search() method scans the string from beginning to end and returns a match object
for the first occurrence of the pattern; otherwise it returns None. We can use the group()
method to retrieve the string from the object returned by this method.

The findall() method scans the string from beginning to end and returns all
occurrences of the pattern in the form of a list. If no match is found, it returns an
empty list. We can retrieve the resulting strings from the list using a for loop.

The split() method splits the string wherever the regular expression matches and returns
the resulting pieces as a list. If the pattern does not match anywhere, the list contains
the whole string as its only element. We can retrieve the pieces from the list using a for loop.

The sub() method substitutes (or replaces) new strings in place of the substrings that match
the pattern. After substitution, the modified string is returned by this method; the original
string is not changed.
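
The behaviour of the five functions, including what each returns when nothing matches, can be compared side by side. This is a minimal sketch using made-up strings and variable names.

```python
import re

pattern = r'm\w\w'

# match(): only succeeds when the pattern is found at the very start.
at_start = re.match(pattern, 'mat man mom')
not_at_start = re.match(pattern, 'cat mat')

# search(): finds the first occurrence anywhere in the string.
first = re.search(pattern, 'cat mat man')

# findall(): a list of all occurrences, or an empty list.
all_found = re.findall(pattern, 'mat man mom')
none_found = re.findall(pattern, 'cat dog')

# split(): the pieces between matches; the whole string if nothing matches.
pieces = re.split(r',\s*', 'a, b,c')
unsplit = re.split(r',', 'abc')

# sub(): returns a new string with the matches replaced.
replaced = re.sub(r'sunday', 'Monday', 'Today is sunday')

print(at_start.group())   # mat
print(not_at_start)       # None
print(first.group())      # mat
print(all_found)          # ['mat', 'man', 'mom']
print(none_found)         # []
print(pieces)             # ['a', 'b', 'c']
print(unsplit)            # ['abc']
print(replaced)           # Today is Monday
```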

Program 1: A Python program that creates a regular expression to search for a string
starting with m and having a total of 3 characters, using the search() method.

import re
reg = r'm\w\w'
text = 'cat mat man mom'
prog = re.compile(reg)        # compile the RE once
res = prog.search(text)
print(res.group())            # mat

# it is not required to compile it again

text = "frormet"
res = prog.search(text)
print(res.group())            # met
# instead of compiling and then running the RE, we can write a single statement
text = 'name nest abc nmc'
result = re.search(r'n\w\w', text)
print(result.group())         # nam

Note: here the statement

result = re.search(r'n\w\w', text)
is equivalent to
prog = re.compile(r'n\w\w')
result = prog.search(text)
so in place of these two statements we can use a single statement.

Program 2: A Python program that creates a regular expression to search for strings
starting with m and having a total of 3 characters, using the findall() method.

# search() returns only the first match, so use findall() to get all of them
import re
text = 'cat mat man mom'
prog = re.compile(r'm\w\w')
res = prog.findall(text)
print("findall() returns a list")
print(res)   # printed as a list: ['mat', 'man', 'mom']

# we can also use a for loop

print("Using for loop")
for r in res:
    print(r)

Program 3: A Python program that creates a regular expression to search for a string
starting with m and having a total of 3 characters, using the match() method.

# match() returns a match object only if the pattern is found at the start of the string;
# otherwise it returns None
import re
text = 'mat man mom'
res = re.match(r'm\w\w', text)
res1 = re.match(r'n\w\w', text)
print(res.group())   # mat
print(res1)          # None

Program 4: A Python program that creates a regular expression to split a string into pieces
wherever one or more non-alphanumeric characters are found.

# split() method
import re
text = "this: is; python's book"
res = re.split(r'\W+', text)
print("Split at any run of characters that are not alphanumeric")
print(res)   # ['this', 'is', 'python', 's', 'book']

text = "this is python*s book"

res = re.split(r'\*', text)   # \* matches a literal asterisk
print("Split at *")
print(res)   # ['this is python', 's book']

Program 5: A Python program that creates a regular expression to replace a string with a
new string.

# REs are also used to find and replace words. For this, the sub() method is used.
# Syntax: sub(regular expression, new string, string)
import re
text = "Today is sunday"
res = re.sub(r'sunday', 'Monday', text)
print(res)   # Today is Monday

Repetition
Things get more interesting when you use +, * and ? to specify repetition in the pattern:
• + -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
• * -- 0 or more occurrences of the pattern to its left
• ? -- 0 or 1 occurrences of the pattern to its left
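
The difference between the three repetition symbols shows up clearly when the same text is searched with each of them. A minimal sketch, with a made-up sample string:

```python
import re

# Hypothetical sample: p followed by zero, one, two and three i's.
text = 'p pi pii piii'

one_or_more = re.findall(r'pi+', text)    # needs at least one i
zero_or_more = re.findall(r'pi*', text)   # the i's are optional, any number
zero_or_one = re.findall(r'pi?', text)    # at most one i is taken

print(one_or_more)    # ['pi', 'pii', 'piii']
print(zero_or_more)   # ['p', 'pi', 'pii', 'piii']
print(zero_or_one)    # ['p', 'pi', 'pi', 'pi']
```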

Program 6: A Python program that creates a regular expression to retrieve all the words
starting with a given letter.

# program to retrieve all words starting with a

import re
text = 'an apple a day keeps the doctor away'
res = re.findall(r'a[\w]*', text)

for r in res:
    print(r)
# But there is a problem in the output: 'ay' (from 'day') is not a word, yet it is returned,
# because the pattern can also match in the middle of a word. To solve this, use \b.
# using \b
res = re.findall(r'\ba[\w]*\b', text)
print("\nUsing \\b")
for r in res:
    print(r)

Program 7: A Python program that creates regular expressions to retrieve words by their
starting character, length and position in the string.

import re
Str1 = '1 2 3 one 1st two three four five six seven 8eight 9nine 10ten as on in'
print("String is : " + Str1)

print("\nWords starting with a digit")

res = re.findall(r'\d[\w]*', Str1)
for r in res:
    print(r)

print("\nAll the words having exactly three characters")

res = re.findall(r'\b\w{3}\b', Str1)
print(res)

print("\nWords having a length of at least 3 characters")

res = re.findall(r'\b\w{3,}\b', Str1)
print(res)

print("\nWords having a length of 2, 3 or 4 characters")

res = re.findall(r'\b\w{2,4}\b', Str1)
print(res)

print("\nRetrieve only the standalone single digits")

res = re.findall(r'\b\d\b', Str1)
print(res)

print("\nLast word of the string, if it starts with i")

res = re.findall(r'i[\w]*\Z', Str1)
print(res)

print("\nFirst word of the string, if it starts with 1")

res = re.findall(r'\A1[\w]*', Str1)
print(res)

Output:

Program 7: Program using special characters

import re
str1 = "hello world"
print(str1)
print("\nSearch at start r'^He'")
res = re.search(r'^He', str1, re.IGNORECASE)
if res:
    print("String starts with 'He'")
else:
    print("String does not start with 'He'")

print("\nSearch at end r'World$'")

res = re.search(r'World$', str1)
if res:
    print("String ends with 'World'")
else:
    print("String does not end with 'World'")

# For 'World' the else part is printed, because re.IGNORECASE is not used in the second search
Output:

Using RE on Files

• We can use REs not only on strings, but also on files, where a large amount of data is available.
• The following programs illustrate how we can apply an RE to a file.

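Program 8: A Python program that uses an RE to read email IDs from a text file.

Text file: mail.txt

Code to retrieve email ids:

import re
# raw string, so that the backslashes in the Windows path are not treated as escape sequences
f = open(r'E:\Varsha\LJIET\Python\Notes\mails.txt', 'r')
lst = []
for line in f:
    res = re.findall(r'\S+@\S+', line)   # list of the email IDs found on this line
    lst.append(res)
for i in lst:
    print(i)
f.close()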

Output:

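Program 9: A Python program that retrieves data from a file using REs and then writes the
data to another file.

Text File : Salary

Program:
import re
# raw string, so that the backslashes in the Windows path are not treated as escape sequences
f1 = open(r'E:\Varsha\LJIET\Python\Notes\salary.txt', 'r')
f2 = open('newfile.txt', 'w')
print("ID\tSalary")
for line in f1:
    res1 = re.search(r'\d{4}', line)            # first run of 4 digits (the ID)
    res2 = re.search(r'\d{4,}\.\d{2}', line)    # 4 or more digits, a literal dot, 2 digits (the salary)
    if res1 and res2:                           # skip lines that do not contain both
        print(res1.group() + "\t" + res2.group())
        f2.write(res1.group() + "\t" + res2.group() + "\n")
f1.close()
f2.close()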

Output:

Text file : newfile
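
Because the input files used by Programs 8 and 9 are not reproduced here, the same idea can be tried with a self-contained sketch that first writes its own sample file. Everything in it is an assumption made for illustration: the sample lines, the temporary file and the variable names. It uses the stricter pattern [\w.-]+@[\w.-]+ instead of \S+@\S+, so that a trailing comma is not captured as part of an address, and it opens the files with "with" so that they are closed automatically.

```python
import os
import re
import tempfile

# Hypothetical sample data; NOT the contents of the original mails.txt.
sample_lines = [
    "From alice@example.com Mon Jan 1\n",
    "no address on this line\n",
    "Cc: bob@example.org, carol@example.net\n",
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'mails.txt')

    # Create the sample file.
    with open(path, 'w') as f:
        f.writelines(sample_lines)

    # Read it back line by line and collect every address found.
    emails = []
    with open(path, 'r') as f:
        for line in f:
            emails.extend(re.findall(r'[\w.-]+@[\w.-]+', line))

print(emails)
# ['alice@example.com', 'bob@example.org', 'carol@example.net']
```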

Explain the parameters of the match function of RE.

This function attempts to match the RE pattern at the beginning of the string, with optional flags.

Here is the syntax for this function −

re.match(pattern, string, flags=0)

Here is the description of the parameters −
No.  Parameter  Description
1    pattern    This is the regular expression to be matched.
2    string     This is the string that is checked for a match of the pattern at its beginning.
3    flags      You can combine different flags using bitwise OR (|). These are modifiers, which are listed in the table below.
re Flags: Many functions and methods of the re module (such as search() and match()) take an
optional argument called flags. These flags modify the meaning of the given regex
pattern. The flags available in Python include:

Flag    What the flag does

re.M    Makes ^ and $ match at the beginning and end of each line
re.I    Ignores case
re.S    Makes . match any character, including a newline
re.U    Makes \w, \W, \b, \B follow Unicode rules
re.L    Makes \w, \W, \b, \B follow the current locale
re.X    Allows whitespace and comments inside the regex
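
The effect of the most commonly used flags can be seen by running the same pattern with and without each flag. This is a minimal sketch; the two-line sample text and the variable names are made up for illustration.

```python
import re

text = "Hello\nworld"   # hypothetical two-line sample

# re.M: ^ matches at the start of every line, not only at the start of the string.
first_only = re.findall(r'^\w+', text)
each_line = re.findall(r'^\w+', text, re.M)

# re.I: letter case is ignored.
case_sensitive = re.search(r'hello', text)
case_ignored = re.search(r'hello', text, re.I)

# re.S: the dot also matches the newline character.
dot_default = re.search(r'Hello.world', text)
dot_all = re.search(r'Hello.world', text, re.S)

# re.X: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # three digits
    -       # a dash
    \d{4}   # four digits
""", re.X)
found_phone = phone.search('call 555-1234').group()

# Flags are combined with bitwise OR.
combined = re.findall(r'^WORLD', text, re.M | re.I)

print(first_only, each_line)               # ['Hello'] ['Hello', 'world']
print(case_sensitive, case_ignored.group())  # None Hello
print(dot_default, dot_all is not None)    # None True
print(found_phone)                         # 555-1234
print(combined)                            # ['world']
```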

The re.match function returns a match object on success and None on failure. We use the
group(num) or groups() method of the match object to get the matched expression.
No.  Match object method  Description
1    group(num=0)         Returns the entire match (or the specific subgroup num)
2    groups()             Returns all matching subgroups in a tuple (empty if there weren't any)
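
The relationship between group(), group(num) and groups() can be sketched on a small pattern with two parenthesised subgroups. The sample string and the variable names below are made up for illustration.

```python
import re

# Two subgroups: the first word and the second word.
m = re.match(r'(\w+) (\w+)', 'Isaac Newton, physicist')

whole = m.group()        # same as m.group(0): the entire match
first = m.group(1)       # text captured by the first pair of parentheses
second = m.group(2)      # text captured by the second pair
all_groups = m.groups()  # tuple of all captured subgroups

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.match(r'\w+', 'Isaac').groups()

# On failure match() returns None, so there is no object to call group() on.
failed = re.match(r'(\d+)', 'Isaac')

print(whole)        # Isaac Newton
print(first)        # Isaac
print(second)       # Newton
print(all_groups)   # ('Isaac', 'Newton')
print(no_groups)    # ()
print(failed)       # None
```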
Program: Demonstrate the use of the group() and groups() methods with re.match():
import re

line = "Cats are smarter than dogs"

# (.*) captures everything before " are "; (.*?) captures the next word (non-greedy)
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
    print("matchObj.groups() : ", matchObj.groups())
else:
    print("No match!!")

Output:

Explain the parameters of the search function of RE.

This function searches for first occurrence of RE pattern within string with optional flags.

Here is the syntax for this function −

re.search(pattern, string, flags=0)
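
The difference between re.match() and re.search() is easiest to see with a pattern that occurs only later in the string. A minimal sketch, with variable names chosen for illustration:

```python
import re

line = 'Cats are smarter than dogs'

# match() looks only at the beginning of the string, so this fails.
from_match = re.match(r'dogs', line)

# search() scans the whole string and finds the first occurrence.
from_search = re.search(r'dogs', line)

# Anchoring the pattern with ^ makes search() behave like match() here.
anchored = re.search(r'^dogs', line)

print(from_match)            # None
print(from_search.group())   # dogs
print(from_search.span())    # (22, 26)
print(anchored)              # None
```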

The re.search function returns a match object on success and None on failure. We use the
group(num) or groups() method of the match object to get the matched expression.
No.  Match object method  Description
1    group(num=0)         Returns the entire match (or the specific subgroup num)
2    groups()             Returns all matching subgroups in a tuple (empty if there weren't any)
Program: Demonstrate the use of the group() and groups() methods with re.search() and the use
of the flag re.I:
import re

line = "Cats are smarter than dogs"

# (.*) captures everything before " are "; (.*?) captures the next word (non-greedy)
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
    print("searchObj.groups() : ", searchObj.groups())
else:
    print("Nothing found!!")

Output:

Program – Example that shows the use of re module

File: ReAllFunctionDemo.py

# Example of \w+ and ^ expressions

import re

data = "ljiet32,education is fun"

r1 = re.findall(r"^\w+", data)
print(r1)

# Example of \s expression in re.split function

print(re.split(r'\s', 'we are splitting the words'))
print(re.split(r's', 'split the words'))

# Using re.match on each element of a list

words = ["ljiet32 get", "ljiet32 give", "ljiet Selenium"]

for element in words:
    # two words: the first starting with l, the second starting with g
    z = re.match(r"(l\w+)\W(g\w+)", element)
    if z:
        print(z.groups())

# Using re.search to look for each pattern in a text

patterns = ['software testing', 'ljiet32']

text = 'software testing is fun?'
for pattern in patterns:
    print('Looking for "%s" in "%s" ->' % (pattern, text), end=' ')
    if re.search(pattern, text):
        print('found a match!')
    else:
        print('no match')

# Using re.findall to extract email addresses

abc = 'ljiet32@google.com, careerljiet32@hotmail.com, users@yahoomail.com'
emails = re.findall(r'[\w\.-]+@[\w\.-]+', abc)
for email in emails:
    print(email)

# Example of re.M or Multiline Flags

data = """ljiet32
careerljiet32
selenium"""
k1 = re.findall(r"^\w", data)                 # only the first line starts the string
k2 = re.findall(r"^\w", data, re.MULTILINE)   # ^ matches at the start of every line
print(k1)
print(k2)
Output:
