17_Regular Expression

The document provides an overview of Regular Expressions (RegEx), explaining their purpose, syntax, and functions in Python's re module. Key functions such as match(), search(), findall(), split(), and sub() are discussed, along with their differences and applications in text processing. Understanding RegEx is essential for data scientists and developers for efficient data mining and text manipulation.

Uploaded by

Arif Ahmad

Regular Expressions

Learning objectives
• What is a Regular Expression?
• Metacharacters
• match() function
• search() function
• re.match() vs re.search()
• findall() function
• split() function
• sub() function
What is a Regular Expression?
• A regular expression (RegEx) is a special sequence of characters that
helps you match or find other strings or sets of strings, using a
specialized syntax held in a pattern.

• Regular expressions are widely used in the UNIX world, for example in
tools such as grep, sed, and awk.

• The re module provides full support for regular expressions in Python.

• The re module raises the exception re.error if an error occurs while
compiling or using a regular expression, for example when a pattern
contains an unmatched parenthesis.
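A minimal sketch of catching re.error when a pattern is invalid. The helper name try_compile is hypothetical and exists only for this illustration. The msg and pos attributes are part of the standard re.error exception.

```python
import re

def try_compile(pattern):
    """Hypothetical helper: compile pattern, or return None if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error reports what went wrong and where in the pattern it happened
        print(f"Invalid pattern {pattern!r}: {exc.msg} at position {exc.pos}")
        return None

ok = try_compile(r"\d+")          # a valid pattern compiles normally
bad = try_compile(r"(unclosed")   # the missing ')' makes re.compile() raise re.error
```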
• As a data scientist or developer, a solid understanding of regex makes
many data mining and text mining tasks much easier.

• It is extremely useful for extracting information from text sources such
as files, logs, spreadsheets, or documents.

• When working with regular expressions, the first thing to recognize is
that everything is essentially a character. We write patterns that match a
specific sequence of characters, also referred to as a string.
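Every character in a pattern is matched literally unless it is a metacharacter, so a plain word works as a pattern on its own. To match a character such as "." literally, you have to escape it, and re.escape() can do this for you. The sample sentence below is made up for illustration.

```python
import re

text = "Version 3.14 was released; version 3x14 was not."

# Plain letters match themselves literally (both "Version" and "version" contain it)
print(re.findall("ersion", text))

# Unescaped, "." matches ANY character, so "3.14" also matches "3x14"
loose = re.findall("3.14", text)
print(loose)

# re.escape() backslash-escapes special characters, so "." becomes literal
literal = re.escape("3.14")
exact = re.findall(literal, text)
print(literal, exact)
```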
• For instance, a regular expression can tell a program to search for
specific text in a string and then print out the result.

• Regular expressions (regex) are essentially text patterns that you can
use to automate searching for and replacing elements within strings of
text.

• This makes cleaning and working with text-based data sets much easier,
saving you the trouble of searching through mountains of text by hand.
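Here is a small illustration of automated searching and replacing. The sketch pulls dates out of a few log lines and then masks them. Both the log lines and the YYYY-MM-DD date format are assumptions made for this example.

```python
import re

# Hypothetical log lines, used only for illustration
log = """2024-01-15 ERROR disk full
2024-01-16 INFO backup complete
2024-02-01 ERROR network timeout"""

DATE = r"\d{4}-\d{2}-\d{2}"   # assumed date format: YYYY-MM-DD

# Search: collect every date in the text
dates = re.findall(DATE, log)
print(dates)

# Search with a condition: dates on ERROR lines only.
# re.MULTILINE lets ^ match at the start of each line.
error_dates = re.findall(rf"^({DATE}) ERROR", log, flags=re.MULTILINE)
print(error_dates)

# Replace: mask every date in the text
masked = re.sub(DATE, "XXXX-XX-XX", log)
print(masked)
```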
Metacharacters
• Metacharacters are characters that have a special meaning inside a
pattern. Understanding them is essential, because every function in the
re module interprets patterns through them.
• The most common metacharacters in the re module are:

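\ : Escapes the character following it: gives it a special meaning (e.g. \d for a digit) or removes its special meaning (e.g. \. for a literal dot)
[] : Represents a character class, i.e. a set of characters to match (e.g. [aeiou]); a leading ^ inside [] negates the class
^ : Matches the beginning of the string
$ : Matches the end of the string
. : Matches any character except newline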
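? : Matches zero or one occurrence of the preceding RE
| : Matches either the RE on its left or the RE on its right (e.g. cat|dog)
* : Matches any number of occurrences of the preceding RE (including 0 occurrences)
+ : Matches one or more occurrences of the preceding RE
{} : Specifies the number of occurrences of the preceding RE to match (e.g. a{2,4})
() : Groups REs together and captures the matched text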
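The table above can be tried out with one small pattern per metacharacter. Each pattern is passed to re.search(), which returns the first match. The sample strings are made up for illustration.

```python
import re

# (pattern, text) pairs, each exercising one or two metacharacters
examples = [
    (r"^The", "The rain"),              # ^  anchors at the start of the string
    (r"Spain$", "rain in Spain"),       # $  anchors at the end of the string
    (r"h.t", "hat hit hot"),            # .  any character except newline
    (r"[aeiou]", "rhythm and blues"),   # [] character class: first vowel
    (r"colou?r", "color colour"),       # ?  zero or one 'u'
    (r"cat|dog", "hotdog"),             # |  either alternative
    (r"ab*", "a ab abbb"),              # *  zero or more 'b'
    (r"ab+", "a ab abbb"),              # +  one or more 'b'
    (r"\d{3}", "call 5551234"),         # {} exactly three digits (\d = digit)
    (r"(ha)+", "hahaha!"),              # () groups 'ha' so + repeats the group
]

results = []
for pattern, text in examples:
    m = re.search(pattern, text)
    found = m.group() if m else None
    results.append(found)
    print(f"{pattern!r:12} in {text!r:20} -> {found!r}")
```

Note that r"ab*" matches just "a" here: * allows zero occurrences, and search() stops at the first position where the pattern succeeds.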
match() function
• This function checks for a match only at the beginning of the string. It
attempts to match the regular expression pattern against the string, with
optional flags.

• Here is the syntax for this function:

re.match(pattern, string, flags=0)

• Here is the description of the parameters:

• pattern: The regular expression to be matched.
• string: The string whose beginning is matched against the pattern.
• flags (optional): Flags that modify the behavior of the regex. They can be
a combination of various constants, such as re.IGNORECASE, joined with |.
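A short sketch of re.match() with and without the optional flags argument. The sample string is made up for illustration.

```python
import re

text = "python is fun"

# match() only looks at the start of the string and is case-sensitive by default
no_match = re.match(r"Python", text)
print(no_match)

# re.IGNORECASE makes the comparison case-insensitive
m = re.match(r"Python", text, flags=re.IGNORECASE)
print(m.group())

# Several flags can be combined with the | operator
m2 = re.match(r"python .* fun$", text, flags=re.IGNORECASE | re.DOTALL)
print(m2 is not None)
```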
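• The re.match() function returns a match object on success and None on failure.
• Example: a simple use of match().
import re
line = "Learning Data Science"
# (.*) is a group that captures everything before " Data"
matchObj = re.match(r'(.*) Data', line)
print(matchObj)            # <re.Match object; span=(0, 13), match='Learning Data'>
print(matchObj.group(1))   # Learning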
• The r prefix (as in r'portal') stands for raw, not regex. A raw string
differs slightly from a regular string: it does not interpret the \
character as an escape character.
• This matters because the regular expression engine uses the \ character
for its own escaping purposes.
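The word-boundary escape \b shows why the raw prefix matters. In a normal Python string literal, "\b" is the backspace character, so the regex engine never receives a word boundary. The sample text is made up for illustration.

```python
import re

text = "a word here"

# "\b" in a normal literal is ONE character (backspace);
# r"\b" keeps both the backslash and the b
print(len("\b"), len(r"\b"))

# Without r, the pattern looks for backspace characters and finds nothing
plain = re.search("\bword\b", text)
print(plain)

# With r, \b reaches the regex engine as a word boundary
raw = re.search(r"\bword\b", text)
print(raw)

# The alternative to a raw string is doubling every backslash
doubled = re.search("\\bword\\b", text)
print(doubled is not None)
```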
search() function
• The search() function scans the whole string for a match and returns a
Match object if there is one, or None otherwise.
• If there is more than one match, only the first occurrence is returned.
• Optional flags modify how the pattern is matched.
• Here is the syntax for this function:

re.search(pattern, string, flags=0)

• pattern: The regular expression pattern to search for.
• string: The input string in which to search for the pattern.
• flags (optional): Flags that modify the behavior of the regex. They can be
a combination of various constants, such as re.IGNORECASE.
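A small sketch showing that search() returns only the first occurrence. It uses re.findall(), covered later in this section, for comparison, and also passes a flag. The sample sentence is made up for illustration.

```python
import re

text = "Order 66 shipped 3 boxes on day 120"

# search() stops at the FIRST place where the pattern matches
m = re.search(r"\d+", text)
print(m.group(), m.span())

# For comparison, findall() collects every match
all_numbers = re.findall(r"\d+", text)
print(all_numbers)

# With re.IGNORECASE, "order" also finds "Order"
m_ci = re.search(r"order", text, re.IGNORECASE)
print(m_ci.start())
```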
Example: search for "apple" as a whole word, so that "pineapple" does not
count.
import re
# Input string
text = "I have an apple and a pineapple."
# \b marks a word boundary, so "apple" inside "pineapple" cannot match
match = re.search(r'\bapple\b', text)
# Check if a match is found
if match:
    print("Pattern found:", match.group())
    print("Start index:", match.start())
    print("End index:", match.end())
else:
    print("Pattern not found.")
# Check whether the string starts with "The" and ends with "Spain"
import re
txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)
print(x)   # <re.Match object; span=(0, 17), match='The rain in Spain'>

# Example: when there is no match, search() returns None
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)   # None
re.match() vs re.search()
• The two functions differ in where they look for a match:

• re.match() checks for a match only at the beginning of the string.
• re.search() searches for a match anywhere in the string.

• Return value: re.match() returns a match object if the pattern is found at
the beginning of the string; otherwise, it returns None.

• re.search() returns a match object if the pattern is found anywhere in
the string; otherwise, it returns None.
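import re

substring = 'Science'
string = 'You are learning Data Science with Python Programming.'

# Use of re.search(): finds 'Science' in the middle of the string
print(re.search(substring, string, re.IGNORECASE))
# <re.Match object; span=(22, 29), match='Science'>

# Use of re.match(): fails because the string does not start with 'Science'
print(re.match(substring, string, re.IGNORECASE))
# None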
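The difference also shows up with the re.MULTILINE flag. For search(), the flag lets ^ match at the start of every line. match() still only tries the very beginning of the whole string, even in MULTILINE mode. A minimal sketch with a made-up two-line string:

```python
import re

text = "first line\nSecond line"

# search() with re.MULTILINE: ^ may match right after a newline
s = re.search(r"^Second", text, re.MULTILINE)
print(s)

# match() ignores line starts and only tries position 0 of the string
m = re.match(r"Second", text, re.MULTILINE)
print(m)
```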
The findall() function
• The findall() function returns a list containing all matches. The list
contains the matches in the order they are found. If no matches
are found, an empty list is returned.
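A short sketch of findall(). The result depends on whether the pattern contains groups: with no groups you get the whole matches, with one group you get that group's text, and with several groups you get tuples. The e-mail pattern is deliberately simplified for illustration and is not a real address validator.

```python
import re

text = "Contact: alice@example.com, bob@example.org"

# No groups: the full matched strings
emails = re.findall(r"\w+@\w+\.\w+", text)
print(emails)

# One group: only the captured part of each match
users = re.findall(r"(\w+)@", text)
print(users)

# Several groups: one tuple per match
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(pairs)

# No match: an empty list, not None
nothing = re.findall(r"\d+", text)
print(nothing)
```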
The split() function
The split() function returns a list where the string has been split at
each match:
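Example:
import re
txt = "The rain in Spain"
# \s matches any whitespace character (space, tab, newline, ...)
x = re.split(r"\s", txt)
print(x)              # ['The', 'rain', 'in', 'Spain']
# For plain whitespace, str.split() gives the same result here
print(txt.split())    # ['The', 'rain', 'in', 'Spain']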
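re.split() can do more than str.split(). A pattern such as \s+ treats a run of separators as one, the maxsplit argument limits how many splits are made, and a capturing group keeps the separators in the result. A minimal sketch with made-up strings:

```python
import re

txt = "The rain  in Spain"   # note the double space

# \s+ treats any run of whitespace as a single separator
parts = re.split(r"\s+", txt)
print(parts)

# maxsplit limits the number of splits; the remainder stays in the last item
first_split = re.split(r"\s+", txt, maxsplit=1)
print(first_split)

# A capturing group in the pattern keeps the separators in the result
with_seps = re.split(r"(,|;)", "a,b;c")
print(with_seps)
```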
Example: print the lines of a text file that contain the word "better".
import re
# Load the text file and read it line by line
file_path = "zen_of_python.txt"
with open(file_path, 'r') as file:
    content = file.readlines()

# Define a regular expression to find lines containing the word "better".
# re.IGNORECASE makes the search case-insensitive.
pattern = re.compile(r'\bbetter\b', re.IGNORECASE)

# The search method checks whether the pattern occurs in each line.
matching_lines = [line.strip() for line in content if pattern.search(line)]

# Print matching lines
print("Lines containing the word 'better':")
for line in matching_lines:
    print(line)
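The example above needs a zen_of_python.txt file on disk. Below is a self-contained variant that inlines a few lines of the Zen of Python instead of reading the file. It also shows that a compiled pattern carries its flag, so pattern.search() needs no extra arguments.

```python
import re

# A few lines of the Zen of Python, inlined in place of the file
content = [
    "Beautiful is better than ugly.\n",
    "Explicit is better than implicit.\n",
    "Readability counts.\n",
    "Errors should never pass silently.\n",
]

# Compile once and reuse; re.IGNORECASE is stored in the pattern object
pattern = re.compile(r"\bbetter\b", re.IGNORECASE)

matching_lines = [line.strip() for line in content if pattern.search(line)]
print(matching_lines)
```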
Example: find all five-letter words in the file with findall().
import re
file_path = "zen_of_python.txt"
with open(file_path, 'r') as file:
    content = file.read()

# Define a regular expression to find all 5-letter words.
# \w{5} matches exactly 5 word characters (letters, digits, or underscores);
# \b on both sides stops it from matching part of a longer word.
pattern = re.compile(r'\b\w{5}\b')

# findall() returns all non-overlapping matches of the pattern in a list.
five_letter_words = pattern.findall(content)

print("Five-letter words in the text:")
print(five_letter_words)
print(len(five_letter_words))
The sub() function
The sub() function replaces every match of a pattern with a replacement
string: re.sub(pattern, repl, string, count=0, flags=0)
Example:
import re
file_path = "zen_of_python.txt"
with open(file_path, 'r') as file:
    content = file.read()

# Define a regular expression to find the word "than"
pattern = r'\bthan\b'

# re.sub() replaces all occurrences of the pattern with "then".
updated_content = re.sub(pattern, 'then', content)

# The updated text is written to a new file to preserve the original.
output_file_path = "zen_of_python_updated.txt"
with open(output_file_path, 'w') as file:
    file.write(updated_content)

print(f"Replaced 'than' with 'then' in the text. Updated file saved to {output_file_path}.")
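A few more sub() options, sketched on an inline string rather than the file. The count argument limits the number of replacements. Backreferences such as \1 reuse captured groups, and a function can compute each replacement. The related re.subn() also returns the number of substitutions made.

```python
import re

text = "Beautiful is better than ugly. Explicit is better than implicit."

# count limits how many replacements are made (0, the default, means all)
once = re.sub(r"\bbetter\b", "nicer", text, count=1)
print(once)

# Backreferences: \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+) is better than (\w+)", r"\2 < \1", text)
print(swapped)

# The replacement can be a function that receives each match object
shouted = re.sub(r"\b\w{4}\b", lambda m: m.group().upper(), text)
print(shouted)

# subn() returns the new string together with the number of substitutions
new_text, n = re.subn(r"\bthan\b", "then", text)
print(new_text, n)
```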
By now you should have learnt:
• What is a Regular Expression?
• Metacharacters
• match() function
• search() function
• re.match() vs re.search()
• findall() function
• split() function
• sub() function
