Lecture 6 Re Basics

Uploaded by mudassirsabri45

Regular Expressions (RegEx)

• A regular expression (RegEx) is a special sequence of characters that defines a search pattern used to find a string or set of
strings. It can detect the presence or absence of text by matching it against a particular
pattern, and it can also split a string into one or more substrings.

• Python provides the re module, which supports the use of regular expressions. Its most commonly used
function, re.search(), takes a regular expression and a string and returns a Match object for
the first match, or None if there is no match.

• A regular expression is a special sequence of characters that helps you match or find
other strings or sets of strings, using a specialized syntax held in a pattern. Regular
expressions are widely used in the UNIX world.

• The Python module re provides full support for Perl-like regular expressions in
Python. The re module raises the exception re.error if an error occurs while
compiling or using a regular expression.

• We will cover two important functions used to handle regular
expressions. But first, a small point: various characters have a
special meaning when they are used in a regular expression. To avoid confusion
when dealing with regular expressions, we use raw strings written as r'expression'.

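A small sketch of why raw strings matter, and of the re.error exception mentioned above. The sample text and variable names are made up for illustration; only re.search, re.compile, re.error and its msg attribute come from the re module itself.

```python
import re

text = "the cat sat"

# A raw string keeps backslashes as-is: r"\n" is two characters, "\n" is one newline
raw_len = len(r"\n")
normal_len = len("\n")

# With the r prefix, \b reaches the regex engine and means "word boundary"
with_raw = re.search(r"\bcat", text)
# Without it, Python turns "\b" into a backspace character (\x08),
# so the pattern looks for a literal backspace and finds nothing
without_raw = re.search("\bcat", text)

# An invalid pattern (unbalanced parenthesis) raises re.error
try:
    re.compile(r"(abc")
    error_message = None
except re.error as err:
    error_message = err.msg

print(raw_len, normal_len)
print(with_raw)
print(without_raw)
print(error_message)
```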
Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly
specialized programming language embedded inside Python and made available through the re
module. Using this little language, you specify the rules for the set of possible strings that you
want to match; this set might contain English sentences, or e-mail addresses, or TeX
commands, or anything you like. You can then ask questions such as “Does this string match the
pattern?”, or “Is there a match for the pattern anywhere in this string?”. You can also use REs to
modify a string or to split it apart in various ways.
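The questions in the paragraph above map roughly onto different functions of the re module. The sketch below is illustrative only; the sample strings are invented for the example.

```python
import re

text = "Order 66 shipped"

# "Is there a match for the pattern anywhere in this string?" -> re.search()
anywhere = re.search(r"\d+", text)

# re.match() only tries to match at the very beginning of the string
at_start = re.match(r"\d+", text)

# "Does this string match the pattern?" (the whole string) -> re.fullmatch()
whole = re.fullmatch(r"\d+", "2024")

# Modifying a string, and splitting it apart
masked = re.sub(r"\d", "#", text)
parts = re.split(r"[,;]\s*", "a, b;c")

print(anywhere.group(), at_start, whole.group())
print(masked, parts)
```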

Regular expression patterns are compiled into a series of bytecodes which are then executed by
a matching engine written in C. For advanced use, it may be necessary to pay careful attention to
how the engine will execute a given RE, and write the RE in a certain way in order to produce
bytecode that runs faster. Optimization isn’t covered in this document, because it requires that
you have a good understanding of the matching engine’s internals.

The regular expression language is relatively small and restricted, so not all possible string
processing tasks can be done using regular expressions. There are also tasks that can be done
with regular expressions, but the expressions turn out to be very complicated. In these cases,
you may be better off writing Python code to do the processing; while Python code will be
slower than an elaborate regular expression, it will also probably be more understandable.

import re
s = "Welcome to Artificial intelligence"
res = re.search(r"\D{3} t",s)
print(res.group())

ome t

# This code prints the starting index and the ending index of the
# substring "characters".
import re

s = 'a special sequence of characters that uses a characters search pattern to find a string or set of strings'

match = re.search(r'characters', s)
# Here the r prefix (r'characters') stands for "raw".
# A raw string differs slightly from a regular string: it does not
# interpret the \ character as an escape character.
# This matters because the regular expression engine uses the \
# character for its own escaping purposes.

print(match)
print('Start Index:', match.start())
print('End Index:', match.end())
print('span Index:', match.span())

<re.Match object; span=(22, 32), match='characters'>

Start Index: 22
End Index: 32
span Index: (22, 32)

Match Object
A Match object contains all the information about the search and its result. If no match is
found, None is returned instead. Let’s see some of the commonly used methods and
attributes of the Match object.

Getting the string and the regex

The match.re attribute returns the compiled regular expression (a Pattern object whose .pattern
attribute holds the original pattern string), and the match.string attribute returns the string
that was searched.

Getting index of matched object

• start() method returns the starting index of the matched substring
• end() method returns the index just past the last character of the matched substring (the end index is exclusive)
• span() method returns a tuple containing the starting and the ending index of the
matched substring

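A short sketch of the attributes and methods listed above, reusing the lecture's sample sentence with a made-up pattern (in\w+).

```python
import re

s = "Welcome to Artificial intelligence"
m = re.search(r"in\w+", s)

print(m.re)          # the compiled Pattern object
print(m.re.pattern)  # the original pattern string
print(m.string)      # the string that was searched
print(m.group())     # the matched text
# end() is one past the last matched character, so s[m.start():m.end()] is the match
print(m.start(), m.end(), m.span())
```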
#!/usr/bin/python
import re

string = "Cats are smarter than dogs"

matchObj = re.match(r'(.*) are (.*?) .*', string, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() : Cats are smarter than dogs

matchObj.group(1) : Cats
matchObj.group(2) : smarter

import re

s = "Welcome to Artificial intelligence"

# here res is the match object

res = re.search(r"\bArti", s)

print(res.start())
print(res.end())
print(res.span())

11
15
(11, 15)

Metacharacters
• \ Drops the special meaning of the character following it.
• [] Represents a character class.
• ^ Matches the beginning.
• $ Matches the end.
• . Matches any character except a newline.
• | Means OR (matches either of the patterns separated by it).
• ? Matches zero or one occurrence.
• * Matches any number of occurrences (including 0 occurrences).
• + Matches one or more occurrences.
• {} Indicates the number of occurrences of the preceding regex to match.
• () Encloses a group of regex.
\ Backslash
• The backslash (\) makes sure that the character following it is not treated in a special way. This can be
considered a way of escaping metacharacters.
• For example, if you search for the dot (.) in a string, you will find that the dot
is treated as a special character, because it is one of the metacharacters (as shown in the
list above).
• In this case, we put a backslash (\) just before the dot (.) so that it loses its
special meaning. See the example below for a better understanding.
# A Python program to demonstrate escaping a metacharacter with re.search().
import re
s = 'Artificial .Intelligence'

# without using \
match = re.search(r'.', s)
print(match)

# using \
match = re.search(r'\.', s)
print(match)

<re.Match object; span=(0, 1), match='A'>

<re.Match object; span=(11, 12), match='.'>

import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments

num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits

num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

Phone Num : 2004-959-559

Phone Num : 2004959559

import re

string = '39801 356, 2102 1111'

# Three-digit number, followed by a space, followed by a two-digit number
pattern = r'(\d{3}) (\d{2})'

# match variable contains a Match object.
match = re.search(pattern, string)
if match:
    print(match.group())
else:
    print("pattern not found")

801 35

[] Square Brackets
• Square brackets ([]) represent a character class consisting of a set of characters
that we wish to match. For example, the character class [abc] will match any single
a, b, or c.

• We can also specify a range of characters using - (a hyphen) inside the square brackets. For
example,

• [0-3] is the same as [0123]

• [a-c] is the same as [abc]

• We can also invert the character class using the caret (^) symbol. For example,

• [^0-3] means any character except 0, 1, 2, or 3

• [^a-c] means any character except a, b, or c
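A minimal sketch, on made-up strings, that checks the character-class examples above with re.findall().

```python
import re

text = "a1 b2 c3 d4"

letters_abc = re.findall(r"[abc]", text)   # any single a, b or c
digits_0_3 = re.findall(r"[0-3]", text)    # range: same as [0123]
not_abc = re.findall(r"[^a-c\s]", text)    # anything except a-c and whitespace

# [^0-3] is not limited to digits: letters and spaces match too
not_0_3 = re.findall(r"[^0-3]", "x0 y5")

print(letters_abc, digits_0_3, not_abc, not_0_3)
```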

^ Caret
• The caret (^) symbol matches the beginning of the string, i.e. it checks whether the string
starts with the given character(s). For example –

• ^g will check if the string starts with g such as geeks, globe, girl, g, etc.

• ^ge will check if the string starts with ge such as geeks, geeksforgeeks, etc.
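A short sketch of the caret examples above, using the same words plus one made-up non-match. It also shows, as a side note, that with the re.MULTILINE (re.M) flag used earlier in this lecture, ^ matches at the start of every line.

```python
import re

words = ["geeks", "globe", "girl", "g", "forge"]

starts_with_g = [w for w in words if re.search(r"^g", w)]
starts_with_ge = [w for w in words if re.search(r"^ge", w)]

# With re.MULTILINE, ^ also matches right after each newline
lines = "one two\nthree four"
first_words = re.findall(r"^\w+", lines, re.MULTILINE)
only_first = re.findall(r"^\w+", lines)

print(starts_with_g, starts_with_ge, first_words, only_first)
```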

$ Dollar
• The dollar ($) symbol matches the end of the string, i.e. it checks whether the string ends
with the given character(s). For example –

• s$ will check for a string that ends with s, such as geeks, ends, s, etc.
• ks$ will check for a string that ends with ks, such as geeks, geeksforgeeks, ks, etc.
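The dollar examples above can be checked with a small sketch; "geek" is a made-up word added as a non-match.

```python
import re

words = ["geeks", "ends", "s", "ks", "geek"]

ends_with_s = [w for w in words if re.search(r"s$", w)]
ends_with_ks = [w for w in words if re.search(r"ks$", w)]

print(ends_with_s, ends_with_ks)
```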

. Dot
• The dot (.) symbol matches any single character except the newline character (\n).
For example –

• a.b will check for a string that contains any character in place of the dot, such
as acb, acbd, abbb, etc.

• .. will check whether the string contains at least 2 consecutive characters (other than newlines)
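A minimal sketch of the dot rules above. The extra candidates "ab" and "a\nb" are made up to show a non-match, and re.DOTALL is the module flag that lets the dot match a newline.

```python
import re

candidates = ["acb", "acbd", "abbb", "ab", "a\nb"]

# '.' stands for exactly one character, but not a newline by default
dot_matches = [c for c in candidates if re.search(r"a.b", c)]

# re.DOTALL lets '.' match a newline as well
dotall_match = re.search(r"a.b", "a\nb", re.DOTALL)

# '..' needs at least two characters in a row
two_chars = [c for c in ["a", "ab", "abc"] if re.search(r"..", c)]

print(dot_matches, dotall_match is not None, two_chars)
```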

| Or
• The or symbol (|) works as the OR operator: it checks whether the pattern before
or after the | symbol is present in the string. For example –

• a|b will match any string that contains a or b such as acd, bcd, abcd, etc.
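A short sketch of alternation; "xyz" and the animal sentence are invented inputs.

```python
import re

candidates = ["acd", "bcd", "abcd", "xyz"]
either = [c for c in candidates if re.search(r"a|b", c)]

# | also works between longer alternatives
animals = re.findall(r"cat|dog", "cat, dog, bird")

print(either, animals)
```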

? Question Mark
• The question mark (?) checks whether the character (or group) immediately before it in the regex occurs
once or not at all. For example –

• ab?c will be matched for the strings ac, acb, dabc but will not be matched for abbc
because there are two b's. Similarly, it will not be matched for abdc because b is not
followed by c.
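The question-mark examples above, checked with re.search() in a short sketch:

```python
import re

candidates = ["ac", "acb", "dabc", "abbc", "abdc"]
# b? allows at most one b between a and c
optional_b = [c for c in candidates if re.search(r"ab?c", c)]
print(optional_b)
```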

* Star
• The star (*) symbol matches zero or more occurrences of the regex preceding it.
For example –

• ab*c will be matched for the string ac, abc, abbbc, dabc, etc. but will not be matched
for abdc because b is not followed by c.
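The star examples above can be verified the same way:

```python
import re

candidates = ["ac", "abc", "abbbc", "dabc", "abdc"]
# b* allows zero or more b's between a and c
star_matches = [c for c in candidates if re.search(r"ab*c", c)]
print(star_matches)
```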
+ Plus
• Plus (+) symbol matches one or more occurrences of the regex preceding the +
symbol. For example –

• ab+c will be matched for the string abc, abbc, dabc, but will not be matched for ac,
abdc because there is no b in ac and b is not followed by c in abdc.
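A small sketch of the plus examples above, with a side-by-side check on "ac" to show how + differs from *:

```python
import re

candidates = ["abc", "abbc", "dabc", "ac", "abdc"]
plus_matches = [c for c in candidates if re.search(r"ab+c", c)]

# Unlike ab*c, ab+c requires at least one 'b', so "ac" is rejected
star_accepts_ac = re.search(r"ab*c", "ac") is not None
plus_accepts_ac = re.search(r"ab+c", "ac") is not None

print(plus_matches, star_accepts_ac, plus_accepts_ac)
```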

{m,n} Braces
• Braces match from m to n repetitions (both inclusive) of the preceding regex. For
example –

• a{2,4} will be matched for the strings aaab, baaaac, gaad, but will not be matched for
strings like abc, bc because there is only one a or no a in those cases.
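A minimal sketch of the brace quantifier. The "aaaaaa" input is made up to show that braces are greedy: they take as many repetitions as allowed, up to n.

```python
import re

candidates = ["aaab", "baaaac", "gaad", "abc", "bc"]
two_to_four = [c for c in candidates if re.search(r"a{2,4}", c)]

# Greedy: the first match takes 4 a's, leaving 2 for the next match
runs = re.findall(r"a{2,4}", "aaaaaa")

print(two_to_four, runs)
```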

() Group
• Group symbol is used to group sub-patterns. For example –

• (a|b)cd will match for strings like acd, abcd, gacd, etc.
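The group example above in a short sketch. "cd" is an invented non-match, and the last lines show that a group also captures the text it matched.

```python
import re

candidates = ["acd", "abcd", "gacd", "cd"]
grouped = [c for c in candidates if re.search(r"(a|b)cd", c)]

# group(0) is the whole match, group(1) is what (a|b) matched
m = re.search(r"(a|b)cd", "abcd")
print(grouped, m.group(0), m.group(1))
```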

Special Sequences (sequence, description, example)

• \A Matches if the string begins with the given pattern. Example: \Afor matches "for geeks" and "for the world".
• \b Matches at the beginning or end of a word. \b(string) checks for the beginning of a word and
(string)\b checks for the end of a word. Example: \bge matches "geeks" and "get".
• \B The opposite of \b, i.e. the given regex must not be at the start or end of a word. Example: \Bge matches
"together" and "forge".
• \d Matches any decimal digit; with the re.ASCII flag this is equivalent to the set class [0-9] (by default, other Unicode digits also match). Example: \d matches "123" and "gee1".
• \D Matches any non-digit character (the opposite of \d). Example: \D matches "geeks" and
"geek1".
• \s Matches any whitespace character. Example: \s matches "gee ks" and "a bc a".
• \S Matches any non-whitespace character. Example: \S matches "a bd" and "abcd".
• \w Matches any word character (letter, digit or underscore); with the re.ASCII flag this is equivalent to the class [a-zA-Z0-9_] (by default, Unicode letters and digits also match). Example: \w matches
"123" and "geeKs4".
• \W Matches any non-word character (the opposite of \w). Example: \W matches ">$" and "gee<>".
• \Z Matches if the string ends with the given regex. Example: ab\Z matches "abcdab" and "abababab".
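A sketch that runs every example from the special-sequence list through re.search(). The non-matching strings and the accented sample are made up to illustrate \A, \B and the Unicode behaviour of \w.

```python
import re

# Example patterns from the list above, each paired with strings it should match
examples = {
    r"\Afor": ["for geeks", "for the world"],
    r"\bge": ["geeks", "get"],
    r"\Bge": ["together", "forge"],
    r"\d": ["123", "gee1"],
    r"\D": ["geeks", "geek1"],
    r"\s": ["gee ks", "a bc a"],
    r"\S": ["a bd", "abcd"],
    r"\w": ["123", "geeKs4"],
    r"\W": [">$", "gee<>"],
    r"ab\Z": ["abcdab", "abababab"],
}

# Collect any (pattern, sample) pair that unexpectedly fails to match
failures = [(p, s) for p, samples in examples.items()
            for s in samples if re.search(p, s) is None]

# Non-matches for contrast
no_a = re.search(r"\Afor", "geeks for")  # 'for' is not at the start
no_b = re.search(r"\Bge", "geeks")       # 'ge' starts a word here

# By default \w is Unicode-aware; re.ASCII restricts it to [a-zA-Z0-9_]
unicode_word = re.findall(r"\w", "\u00e91_")
ascii_word = re.findall(r"\w", "\u00e91_", re.ASCII)

print(failures, no_a, no_b, unicode_word, ascii_word)
```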

re.findall()
• Return all non-overlapping matches of pattern in string, as a list of strings. The string is
scanned left-to-right, and matches are returned in the order found.
# A Python program to demonstrate working of
# findall()
import re

# A sample text string where the regular expression
# is searched.
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""

# A sample regular expression to find digits.
# \d matches any decimal digit (like the set class [0-9]);
# \d+ matches a run of one or more digits.
regex = r'\d+'

match = re.findall(regex, string)

print(match)

# This example is contributed by Ayush Saluja.

['123456789', '987654321']
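The "non-overlapping" part of the definition above is easy to miss. The sketch below uses invented strings to show it, and also shows findall() returning only the group text when the pattern has one capturing group.

```python
import re

# Matches never overlap: after 'aa' is found, scanning resumes after it
pairs_4 = re.findall(r"aa", "aaaa")
pairs_3 = re.findall(r"aa", "aaa")

# With one capturing group, findall returns the group's text, not the whole match
first_numbers = re.findall(r"(\d+)-\d+", "10-20 30-40")

print(pairs_4, pairs_3, first_numbers)
```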

re.compile()
• Regular expressions are compiled into pattern objects, which have methods for various
operations such as searching for pattern matches or performing string substitutions.
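A minimal sketch of a compiled pattern used both for searching and for substitution, as described above. The date format and sample sentence are assumptions chosen for illustration.

```python
import re

# Compile once, then reuse the Pattern object's methods
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
text = "Released on 2023-10-02."

m = date_pattern.search(text)
# sub() rewrites each match; \3, \2 and \1 refer to the captured groups
reordered = date_pattern.sub(r"\3/\2/\1", text)

print(m.group(1), reordered)
```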
# The regular expression module is imported.
import re

# compile() creates a regular expression pattern object.
# The character class [a-e] is equivalent to [abcde];
# the caret in [^a-e] inverts it, so this pattern matches
# any character that is NOT 'a', 'b', 'c', 'd' or 'e'.
p = re.compile('[^a-e]')

# findall() searches for the regular expression
# and returns a list of all matches
print(p.findall("Aye, said Mr. Gibenson Stark"))

['A', 'y', ',', ' ', 's', 'i', ' ', 'M', 'r', '.', ' ', 'G', 'i', 'n',
's', 'o', 'n', ' ', 'S', 't', 'r', 'k']

# The regular expression module is imported.
import re

# compile() creates a regular expression pattern object.
# Inside a character class, $ loses its special meaning,
# so [$s] matches a literal '$' or a lowercase 's'.
p = re.compile('[$s]')

# findall() searches for the regular expression
# and returns a list of all matches
print(p.findall("Aye, said Mr. Gibenson Stark"))

['s', 's']

# The regular expression module is imported.
import re

# compile() creates a regular expression pattern object.
# [^s] matches any character except a lowercase 's'.
p = re.compile('[^s]')

# findall() searches for the regular expression
# and returns a list of all matches
print(p.findall("Aye, said Mr. Gibenson Stark"))

['A', 'y', 'e', ',', ' ', 'a', 'i', 'd', ' ', 'M', 'r', '.', ' ', 'G',
'i', 'b', 'e', 'n', 'o', 'n', ' ', 'S', 't', 'a', 'r', 'k']

Understanding the Output:

• If the caret is dropped and the plain class [a-e] is used on the same string, findall() returns ['e', 'a', 'd', 'b', 'e', 'a'].
• There, the first occurrence is ‘e’ in “Aye” and not ‘A’, because matching is case sensitive.
• The next occurrence is ‘a’ in “said”, then ‘d’ in “said”, followed by ‘b’ and ‘e’ in “Gibenson”;
the last ‘a’ matches in “Stark”.
• The metacharacter backslash ‘\’ has a very important role, as it signals various special sequences. If
the backslash is to be used without its special meaning as a metacharacter, use ‘\\’.
Example 2: The set class [\s,.] will match any
whitespace character, ‘,’, or ‘.’
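A small sketch of the [\s,.] set class and of escaping a backslash, as described above. The sample strings are made up; the Windows-style path is only there to contain one literal backslash.

```python
import re

# Inside a class, '.' and ',' are literal; \s still means any whitespace
p = re.compile(r"[\s,.]")
separators = p.findall("Hi, there. Bye")

# To match a literal backslash, escape it: the regex \\ (raw string r"\\")
path = "C:\\temp"  # the string C:\temp, containing one backslash
backslashes = re.findall(r"\\", path)

print(separators, backslashes)
```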
import re

# \d is equivalent to [0-9].
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

# \d+ matches a run of one or more digits from [0-9]
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

['1', '1', '4', '1', '8', '8', '6']

['11', '4', '1886']

import re

# \w is equivalent to [a-zA-Z0-9_] for ASCII text.
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

# \w+ matches a run of one or more word characters.
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he \
said *** in some_language."))

# \W matches non-word characters (anything other than letters, digits and _).
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))

['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_', 'l',
'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in',
'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']

import re

# '*' matches zero or more occurrences
# of the preceding character.
p = re.compile('ab*')
print(p.findall("ababbaabbb"))

['ab', 'abb', 'a', 'abbb']

Understanding the Output:
• Our RE is ab*, which matches an ‘a’ followed by any number of ‘b’s, starting from 0.
• Output ‘ab’ is valid because it is a single ‘a’ followed by a single ‘b’.
• Output ‘abb’ is valid because it is a single ‘a’ followed by 2 ‘b’s.
• Output ‘a’ is valid because it is a single ‘a’ followed by 0 ‘b’s.
• Output ‘abbb’ is valid because it is a single ‘a’ followed by 3 ‘b’s.

# A Python program to demonstrate re.search() with capture groups.
import re

# Let's use a regular expression to match a date string
# in the form of a month name followed by a day number
regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "I was born on June 24")

if match is not None:
    # We reach here when the expression "([a-zA-Z]+) (\d+)"
    # matches the date string.

    # This will print "Match at index 14, 21": the match starts
    # at index 14 and ends just before index 21.
    print("Match at index %s, %s" % (match.start(), match.end()))

    # We use the group() method to get the full match and the
    # captured groups. The groups contain the matched values.
    # In particular:
    # match.group(0) always returns the fully matched string
    # match.group(1), match.group(2), ... return the capture
    # groups in order from left to right in the input string
    # match.group() is equivalent to match.group(0)

    # So this will print "June 24"
    print("Full match: %s" % (match.group(0)))

    # So this will print "June"
    print("Month: %s" % (match.group(1)))

    # So this will print "24"
    print("Day: %s" % (match.group(2)))

else:
    print("The regex pattern does not match.")

Match at index 14, 21

Full match: June 24
Month: June
Day: 24
