Regular Exp

Regular expressions are strings used to search, find, and extract information from data by looking for patterns of characters. They allow users to perform operations like matching strings, searching strings, finding all occurrences of strings, splitting strings, and replacing strings. Common regular expression methods in Python include match(), search(), findall(), split(), and sub(). The document then provides examples of using regular expressions in Python to extract information from text and HTML data.

Uploaded by

MAYUR SAKULE

Nageswarao Datatechs, Hyd

REGULAR EXPRESSIONS

A regular expression is a string of ordinary characters and special symbols that describes a pattern.
We use it to search the given data and to match, find, extract and split the information we need.
A regular expression is also simply called a regex.

Regular expressions are used to perform the following important operations:

□ Matching strings
□ Searching for strings
□ Finding all strings
□ Splitting a string into pieces
□ Replacing strings

The following methods of the ‘re’ module are used with regular expressions:

□ match() method searches only at the beginning of the string. If a matching string is found there, it
returns a match object that contains the resultant string; otherwise it returns None. We can access
the string from the returned object using the group() method.

□ search() method searches the string from beginning till the end and returns a match object for the
first occurrence of the matching string; otherwise it returns None. We can use the group() method to
retrieve the string from the object returned by this method.

□ findall() method searches the string from beginning till the end and returns all occurrences of
the matching string in the form of a list object. If the matching strings are not found, then it
returns an empty list. We can retrieve the resultant strings from the list using a for loop.

□ split() method splits the string wherever the regular expression matches and returns the resultant
pieces as a list. If the regular expression does not match anywhere, the list contains the original
string as its only element. We can retrieve the resultant string pieces from the list using a for loop.

□ sub() method substitutes (or replaces) new strings in the place of the matching strings. The original
string is not modified; the string obtained after substitution is returned by this method.
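
The behaviour of the five methods described above can be sketched on one sample string. The string and the variable names are made up for illustration; only the calls to the re module come from the text.

```python
import re

# Hypothetical sample data, chosen only for illustration.
sample = 'man sun mop run'

# match() looks only at the beginning of the string.
m = re.match(r'm\w\w', sample)
first_at_start = m.group() if m else None      # 'man'

# search() scans the whole string and stops at the first hit.
s = re.search(r's\w\w', sample)
first_anywhere = s.group() if s else None      # 'sun'

# findall() returns every non-overlapping hit as a list.
all_hits = re.findall(r'm\w\w', sample)        # ['man', 'mop']

# split() cuts the string wherever the pattern matches.
pieces = re.split(r'\s+', sample)              # ['man', 'sun', 'mop', 'run']

# sub() returns a new string; the original is left unchanged.
replaced = re.sub(r'sun', 'son', sample)       # 'man son mop run'

# When nothing matches: None, an empty list, or the untouched string.
no_match = re.match(r'x', sample)              # None
no_hits = re.findall(r'x', sample)             # []
no_split = re.split(r'x', sample)              # ['man sun mop run']

print(first_at_start, first_anywhere, all_hits, pieces, replaced)
```

Checking the result of match() and search() before calling group() avoids an error when the pattern is not found.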

Sequence characters in regular expressions

Sequence characters match only one character in the string. Let us list out the sequence characters
which are used in regular expressions along with their meanings in Table 1.

Character    Its description

\d           Any digit (0 to 9)
\D           Any non-digit
\s           White space. Ex: \t\n\r\f\v
\S           Non-white space character
\w           Any alphanumeric (A to Z, a to z, 0 to 9) or underscore
\W           Any character not matched by \w
\b           Word boundary (the position between a word character and a non-word character)
\A           Matches only at start of the string
\Z           Matches only at end of the string

Each of the sequence characters from \d to \W represents a single character matched in the string. For
example ‘\w’ indicates any one alphanumeric character. Suppose we write it as [\w]*. Here ‘*’ represents
0 or more repetitions. Hence [\w]* represents 0 or more alphanumeric characters. The characters \b, \A
and \Z do not consume a character; they match a position in the string.
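
As a rough illustration of the sequence characters listed above, the following sketch applies several of them to one short line. The sample line and variable names are assumptions made for this example.

```python
import re

# Hypothetical sample line, used only to illustrate the sequence characters.
sample = 'Room 42, floor 7'

digits = re.findall(r'\d', sample)        # each single digit: ['4', '2', '7']
non_digits = re.findall(r'\D+', sample)   # runs of non-digits: ['Room ', ', floor ']
words = re.findall(r'\w+', sample)        # ['Room', '42', 'floor', '7']
gaps = re.findall(r'\s', sample)          # the three spaces

# \b is a zero-width word boundary, so 'floor' is not found inside 'floors'.
not_whole_word = re.search(r'\bfloor\b', 'two floors') is None   # True

# \A and \Z anchor the pattern to the start and end of the string.
starts_with_room = re.search(r'\ARoom', sample) is not None      # True
ends_with_seven = re.search(r'7\Z', sample) is not None          # True

print(digits, non_digits, words, len(gaps))
```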

Example 1. A regular expression to search for strings starting with m and having total 3 characters using
search() method.

import re
str = 'man sun mop run'
result = re.search(r'm\w\w', str)
if result: # if result is not None
    print(result.group())

Output:
man

Example 2. A regular expression to search for strings starting with m and having total 3 characters using
findall() method.

import re
str = 'man sun mop run'
result = re.findall(r'm\w\w', str)
print(result)
Output:
['man', 'mop']

Example 3. A regular expression using match() method to search for strings starting with m and having
total 3 characters.

import re
str = 'man sun mop run'
result = re.match(r'm\w\w', str)
print(result.group())
Output:
man

Example 4. A regular expression using match() method to search for strings starting with m and having
total 3 characters, when the given string does not begin with m.

import re
str = 'sun man mop run'
result = re.match(r'm\w\w', str)
print(result)
Output:
None
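
Example 4 gives None because match() looks only at the start of the string. A minimal sketch of one way to guard against calling group() on None is shown here; first_match() is a hypothetical helper written for this illustration and is not part of the re module.

```python
import re

def first_match(pattern, text, anchored=False):
    """Return the matched text, or None when there is no match.

    Hypothetical helper: anchored=True uses re.match(), otherwise re.search().
    """
    finder = re.match if anchored else re.search
    result = finder(pattern, text)
    return result.group() if result else None

line = 'sun man mop run'
print(first_match(r'm\w\w', line, anchored=True))   # None
print(first_match(r'm\w\w', line))                  # man
```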

Example 5. A regular expression to split a string into pieces where one or more numeric characters are
found.

import re
str = 'gopi 2222 vinay 9988 subba rao 89898'
res = re.split(r'\d+\b', str)
print(res)
Output:
['gopi ', ' vinay ', ' subba rao ', '']
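
The pieces returned in Example 5 still carry surrounding spaces and a trailing empty string. One possible way to tidy them, shown only as a sketch, is to strip each piece and drop the empty ones, or to let the pattern consume the white space around the digits.

```python
import re

line = 'gopi 2222 vinay 9988 subba rao 89898'
pieces = re.split(r'\d+\b', line)

# Strip surrounding spaces and drop empty pieces.
names = [p.strip() for p in pieces if p.strip()]
print(names)   # ['gopi', 'vinay', 'subba rao']

# Alternative: let the pattern also consume the white space around the digits.
names_alt = [p for p in re.split(r'\s*\d+\s*', line) if p]
print(names_alt)
```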

Example 6. A regular expression to replace a string with a new string.

import re
str = 'Kumbhmela will be conducted at Ahmedabad in India.'
res = re.sub(r'Ahmedabad', 'Allahabad', str)
print(res)
Output:
Kumbhmela will be conducted at Allahabad in India.
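
When the string to be replaced occurs several times, the count argument of sub() and the related subn() function may be useful. The sample sentence below is made up for this sketch.

```python
import re

# Hypothetical sample string with a repeated word.
line = 'cat bat cat rat cat'

# count limits how many occurrences are replaced (0, the default, means all).
first_only = re.sub(r'cat', 'dog', line, count=1)

# subn() returns the new string together with the number of replacements.
new_line, how_many = re.subn(r'cat', 'dog', line)

print(first_only)            # dog bat cat rat cat
print(new_line, how_many)    # dog bat dog rat dog 3
print(line)                  # the original string is unchanged
```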

Quantifiers in regular expressions

In regular expressions, some characters specify how many times the preceding expression may be repeated
in the string. Such characters are called ‘quantifiers’.

Character    Its description

+            1 or more repetitions of the preceding regexp
*            0 or more repetitions of the preceding regexp
?            0 or 1 repetitions of the preceding regexp
{m}          Exactly m occurrences
{m,n}        From m to n occurrences. m defaults to 0, n to infinity. Written without a space after the comma.
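
The effect of each quantifier can be checked with a small sketch. fits() is a hypothetical helper built on re.fullmatch() for this illustration, and the test strings are assumptions.

```python
import re

def fits(pattern, text):
    """Hypothetical helper: True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r'ab+', 'abbb'), fits(r'ab+', 'a'))                 # True False
print(fits(r'ab*', 'a'))                                       # True
print(fits(r'colou?r', 'color'), fits(r'colou?r', 'colour'))   # True True
print(fits(r'\d{4}', '2001'), fits(r'\d{4}', '201'))           # True False
print(fits(r'\d{1,2}', '7'), fits(r'\d{1,2}', '123'))          # True False
print(fits(r'\d{2,}', '12345'))                                # True
print(fits(r'\d{,2}', ''))                                     # True
```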

Example 7. A regular expression to find all words starting with ‘an’ or ‘ak’.

import re
str = 'anil akhil anant arun arati arundhati abhijit ankur'
res = re.findall(r'a[nk][\w]*', str)
print(res)
Output:
['anil', 'akhil', 'anant', 'ankur']

Example 8. A regular expression to retrieve date of births from a string.

import re
str = 'Vijay 20 1-5-2001, Rohit 21 22-10-1990, Sita 22 15-09-2000'
res = re.findall(r'\d{2}-\d{2}-\d{4}', str)
print(res)
Output:
['22-10-1990', '15-09-2000']

NOTE: Try the following regex in the above example:

res = re.findall(r'\d{1,2}-\d{1,2}-\d{4}', str)
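
The difference between the two expressions can be seen by running them side by side on the string of Example 8. The variable names in this sketch are chosen only for illustration.

```python
import re

line = 'Vijay 20 1-5-2001, Rohit 21 22-10-1990, Sita 22 15-09-2000'

# Exactly two digits for day and month: single-digit dates are missed.
strict = re.findall(r'\d{2}-\d{2}-\d{4}', line)

# One or two digits for day and month: all three dates are found.
relaxed = re.findall(r'\d{1,2}-\d{1,2}-\d{4}', line)

print(strict)    # ['22-10-1990', '15-09-2000']
print(relaxed)   # ['1-5-2001', '22-10-1990', '15-09-2000']
```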

Special characters in regular expressions

Characters with special significance shown in the following Table can be used in regular expressions.

Character    Its description

\            Escapes the special meaning of the character that follows
.            Matches any character except new line
^            Matches beginning of a string
$            Matches ending of a string
[...]        Denotes a set of possible characters. Ex: [6b-d] matches any of the characters ‘6’, ‘b’, ‘c’ or ‘d’
[^...]       Matches every character except the ones inside brackets. Ex: [^a-c6] matches any character except ‘a’, ‘b’, ‘c’ or ‘6’
(...)        Matches the regular expression inside the parentheses and the result can be captured
R|S          Matches either regex R or regex S
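
A few of the special characters from the table are tried out in the following sketch. All sample strings are assumptions made for this illustration.

```python
import re

# An unescaped dot matches any character; \. matches only a literal dot.
loose = re.findall(r'\d\d.\d\d', '50.00 50x00')    # ['50.00', '50x00']
exact = re.findall(r'\d\d\.\d\d', '50.00 50x00')   # ['50.00']

# [6b-d] accepts '6', 'b', 'c' or 'd'; [^a-c6] accepts everything else but a, b, c, 6.
in_set = re.findall(r'[6b-d]', 'a6bcde')           # ['6', 'b', 'c', 'd']
not_in_set = re.findall(r'[^a-c6]', 'a6bcde')      # ['d', 'e']

# Parentheses capture parts of the match; | chooses between alternatives.
m = re.search(r'(\w+)@(\w+)', 'user@host')
parts = m.groups() if m else ()                    # ('user', 'host')
drinks = re.findall(r'Tea|Coffee', 'Coffee or Tea')   # ['Coffee', 'Tea']

print(loose, exact, in_set, not_in_set, parts, drinks)
```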

Example 9. A regular expression to check whether a given string starts with ‘He’ or not.

import re
str = "Hello World"
res = re.search(r"^He", str)
if res:
    print("String starts with 'He'")
else:
    print("String does not start with 'He'")
Output:
String starts with 'He'

Retrieving information from an HTML file

Let us see how to apply regular expressions on an HTML file and retrieve the necessary information. As an
example, let us take an HTML file that contains some items for breakfast and their prices in the form of a
table, as shown here:

<!-- breakfast.html -->
<html>
<table border=2>
<tr align="center"><td>1</td> <td>Roti</td> <td>50.00</td></tr>
<tr align="center"><td>2</td> <td>Chapatti</td> <td>55.75</td></tr>
<tr align="center"><td>3</td> <td>Dosa</td> <td>48.00</td></tr>
<tr align="center"><td>4</td> <td>Idly</td> <td>25.00</td></tr>
<tr align="center"><td>5</td> <td>Vada</td> <td>38.90</td></tr>
<tr align="center"><td>6</td> <td>Coffee</td> <td>20.00</td></tr>
<tr align="center"><td>7</td> <td>Tea</td> <td>15.00</td></tr>
</table>
</html>

When we open this file in a browser, the rows are displayed as a table of item numbers, item names and prices.

Let us assume that this file is available in our computer at the path: F:\py\breakfast.html. To
open this file, we have to use urlopen() method of urllib.request module in Python. So, we have to use
the following code:

import urllib.request
f = urllib.request.urlopen(r'file:///f|py\breakfast.html')

Observe the raw string passed to urlopen() method. It contains the path of the .html file, as:

file:///f|py\breakfast.html

The first word ‘file:///’ indicates file URL scheme that is used to refer to files in the local computer
system. The next word ‘f|py’ indicates the drive name ‘f’ and the sub directory ‘py’. In this, we have the
file breakfast.html. Once this file is open, we can read the data using read() method, as:

text = f.read()

But the read() method returns the data of the HTML file in the form of a byte string. Hence, we have to
decode it into a normal string using decode() method, as:

str = text.decode()
Now, we have the string ‘str’. We have to retrieve the required information from this string using a
regular expression. Suppose we want to retrieve only the item name and price. We can write:

r'<td>\w+</td>\s<td>(\w+)</td>\s<td>(\d\d\.\d\d)</td>'

Please observe that the preceding expression contains three sub-expressions: the first one is \w+,
the second one is (\w+) and the third one is (\d\d\.\d\d). They are embedded in the tags <td> and </td>.
So, the information which is in between the tags is searched. In the third one, the dot is escaped as \.
so that it matches only a decimal point and not any other character.

The first \w+ indicates that we are searching for a word (item number). The next \w+ is written inside
parentheses (). The parentheses indicate that the text matched by the regular expression written inside
them will be captured. So, (\w+) captures the item name and the next (\d\d\.\d\d) captures the item
price. If we use findall() method to retrieve the information, it returns a list that contains one tuple
for every matching row, and each tuple holds these two captured strings. For example, the first row gives
‘Roti’ and ‘50.00’, which are stored in the list as the tuple ('Roti', '50.00').
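
The way findall() collects the captured groups can be tried without any file, as in this sketch. It keeps two rows of the breakfast table in an inline string (the variable names are assumptions) and writes the decimal point as \. so that only a literal dot is accepted.

```python
import re

# Two rows copied from the breakfast table, kept inline for illustration.
html = ('<tr align="center"><td>1</td> <td>Roti</td> <td>50.00</td></tr>\n'
        '<tr align="center"><td>2</td> <td>Chapatti</td> <td>55.75</td></tr>\n')

# Two groups in the pattern, so findall() returns a list of 2-tuples.
pattern = r'<td>\w+</td>\s<td>(\w+)</td>\s<td>(\d\d\.\d\d)</td>'
rows = re.findall(pattern, html)
print(rows)   # [('Roti', '50.00'), ('Chapatti', '55.75')]
```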

PROGRAMS
40. To retrieve item name and its price from an HTML file using a regular expression.

# web scraping using regex

import re
import urllib.request
# open the html file using urlopen() method
f = urllib.request.urlopen(r'file:///f|py\breakfast.html')
# read data from the file object into text string
text = f.read()
# convert the byte string into normal string
str = text.decode()
# apply regular expression on the string
# here \s is for a white space character and \. is for the decimal point
result = re.findall(r'<td>\w+</td>\s<td>(\w+)</td>\s<td>(\d\d\.\d\d)</td>', str)
# display result
print(result)
# display the items of the result
for item, price in result:
    print('Item= %-15s Price= %-10s' %(item, price))
# close the file
f.close()
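
The program above depends on a file at a fixed location on drive F. As a hedged variant that can be run on any machine, the following sketch writes a two-row stand-in for breakfast.html into a temporary directory and builds the file URL with Path.as_uri(). The sample content and names are assumptions made for this illustration.

```python
import re
import tempfile
import urllib.request
from pathlib import Path

# A two-row stand-in for breakfast.html (hypothetical content).
SAMPLE = ('<html><table border=2>\n'
          '<tr align="center"><td>1</td> <td>Roti</td> <td>50.00</td></tr>\n'
          '<tr align="center"><td>7</td> <td>Tea</td> <td>15.00</td></tr>\n'
          '</table></html>\n')

with tempfile.TemporaryDirectory() as folder:
    page = Path(folder) / 'breakfast.html'
    page.write_text(SAMPLE, encoding='utf-8')
    # as_uri() turns the absolute path into a file:// URL.
    with urllib.request.urlopen(page.as_uri()) as f:
        html = f.read().decode('utf-8')

result = re.findall(r'<td>\w+</td>\s<td>(\w+)</td>\s<td>(\d\d\.\d\d)</td>', html)
for item, price in result:
    print('Item= %-15s Price= %-10s' % (item, price))
```

Using the with statement closes the file object automatically, so a separate call to close() is not needed.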
