regular exp

Regular Expressions (RegEx) are sequences of characters that define search patterns for strings, allowing for matching, splitting, and replacing text. Python's built-in 're' module provides functions such as search, match, findall, split, and sub to work with RegEx. Meta characters and special sequences are used to enhance pattern matching capabilities in strings.

Uploaded by

sr0935364
Copyright
© © All Rights Reserved

Regular Expressions

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It can be used
to check whether a string contains text matching the pattern, to find every match, and to split or
replace text wherever the pattern matches. Python supports regular expressions through the built-in re
module. Its most basic function is re.search(), which takes a regular expression and a string and
returns the first match, or None if there is no match.

RegEx Module

Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

import re

RegEx in Python

When you have imported the re module, you can start using regular expressions:

Example-1:

Search the string to see if it starts with "My" and ends with "nagendra":

import re

a = "My name is nagendra"

x = re.search("^My.*nagendra$",a)

if x:
    print("YES! We have a match!")
else:
    print("No match")

output:

YES! We have a match!

Example-2:

import re

s = 'my name is nagendra'

print(re.search('nagendra', s))

output:

<re.Match object; span=(11, 19), match='nagendra'>

RegEx Functions

The re module offers a set of functions for searching a string for a match:

Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or more matches with a string
match      Returns a Match object if there is a match at the start of the string

The match() Function

The match() function checks for a match only at the beginning of the string and returns a Match
object if the pattern matches there. Otherwise it returns None.

Example-1:

import re

a = "my name is nagendra, my age is 24"

x = re.match("my", a)

print(x)

Output:

<re.Match object; span=(0, 2), match='my'>

Example-2:

import re

a = "my name is nagendra, my age is 24"

x = re.match("The", a)

print(x)

Output:

None
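
The difference between match() and search() is easiest to see by running both with a pattern that occurs in the middle of the string. The following is a minimal sketch that reuses the example string from above; the variable names are only illustrative.

```python
import re

a = "my name is nagendra, my age is 24"

# match() only tries the pattern at position 0 of the string.
at_start = re.match("nagendra", a)
print(at_start)            # None, because the string does not start with "nagendra"

# search() scans forward and reports the first place where the pattern fits.
anywhere = re.search("nagendra", a)
print(anywhere.span())     # (11, 19)
```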

The search() Function

The search() function scans the string for a match and returns a Match object if there is one.

If there is more than one match, only the first occurrence is returned.

Match Object

A Match Object is an object containing information about the search and the result.

Note: If there is no match, the value None will be returned, instead of the Match Object.

Example-1:
import re

a = "my name is nagendra, my age is 24"

x = re.search("my", a)  # here, x is a Match object for the first "my"

print(x)

Output:

<re.Match object; span=(0, 2), match='my'>

Example-2:

import re

a = "my name is nagendra, my age is 24"

x = re.search("24", a)

print(x)

Output:

<re.Match object; span=(31, 33), match='24'>
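
The printed Match object already shows the span and the matched text, and the same information can be read through its group(), start(), end() and span() methods. The sketch below assumes the same example string and also shows why the result should be checked for None before those methods are called.

```python
import re

a = "my name is nagendra, my age is 24"

x = re.search("24", a)

if x is not None:
    matched_text = x.group()   # the text that matched
    position = x.span()        # (start, end) indexes of the match
    print(matched_text)        # 24
    print(position)            # (31, 33)
    print(x.start(), x.end())  # 31 33

# A failed search returns None, so test the result before calling methods on it.
y = re.search("xyz", a)
print(y is None)               # True
```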

The findall() Function

The findall() function returns a list containing all matches.

Example-1:

import re

a = "my name is nagendra, my age is 24"

x = re.findall("my", a)

print(x)

Output:

['my', 'my']
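
As a small additional sketch (the patterns are chosen only for illustration), findall() returns the matches in the order they are found, and it returns an empty list rather than None when nothing matches.

```python
import re

a = "my name is nagendra, my age is 24"

found = re.findall("is", a)
print(found)        # ['is', 'is']

nothing = re.findall("xyz", a)
print(nothing)      # []
```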

The split() Function

The split() function returns a list where the string has been split at each match:

Example-1:

import re

a = "my name is nagendra, my age is 24"

x = re.split("nagendra", a)

print(x)

Output:

['my name is ', ', my age is 24']
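
re.split() also accepts an optional maxsplit argument that limits how many splits are made. The sketch below, using the same example string, splits on a space at most twice and then on a comma followed by a space.

```python
import re

a = "my name is nagendra, my age is 24"

# Stop after two splits; the rest of the string stays in the last item.
first_two = re.split(" ", a, maxsplit=2)
print(first_two)    # ['my', 'name', 'is nagendra, my age is 24']

# Split at every ", " in the string.
parts = re.split(", ", a)
print(parts)        # ['my name is nagendra', 'my age is 24']
```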

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example-1:

import re

a = "my name is nagendra, my age is 24"

x = re.sub("nagendra","rahul", a)

print(x)

Output:

my name is rahul, my age is 24
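
By default sub() replaces every match. Its optional count argument limits the number of replacements, as this short sketch shows (the replacement word "your" is only an example).

```python
import re

a = "my name is nagendra, my age is 24"

# Replace every "my".
all_replaced = re.sub("my", "your", a)
print(all_replaced)     # your name is nagendra, your age is 24

# Replace only the first "my".
first_replaced = re.sub("my", "your", a, count=1)
print(first_replaced)   # your name is nagendra, my age is 24
```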

Meta Characters

Meta characters are characters that have a special meaning in a regular expression. They are used in
the patterns passed to the functions of the re module. Below is the list of meta characters.

Meta Character   Description
\                Drops the special meaning of the character following it
[]               Represents a character class
^                Matches the beginning of the string
$                Matches the end of the string
.                Matches any character except newline
|                Means OR (matches either of the patterns separated by it)
?                Matches zero or one occurrence
*                Matches any number of occurrences (including 0 occurrences)
+                Matches one or more occurrences
{}               Indicates the number of occurrences of the preceding regex to match
()               Encloses a group of regex
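
To illustrate the backslash entry in the list above, here is a minimal sketch in which an unescaped dot matches any character while an escaped dot matches only a literal ".". The sample text is made up for the example; re.escape() is the re module's helper for escaping a literal string.

```python
import re

text = "version 3.11 or 3x11"

# Unescaped: the dot stands for any character, so "3x11" also matches.
loose = re.findall(r"3.11", text)
print(loose)                # ['3.11', '3x11']

# Escaped: the backslash drops the special meaning of the dot.
strict = re.findall(r"3\.11", text)
print(strict)               # ['3.11']

# re.escape() adds the backslashes for us.
escaped = re.escape("3.11")
print(escaped)              # 3\.11
```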

[ ] (Square Brackets)

Square brackets ([ ]) represent a character class: a set of characters, any one of which may match.
For example, the character class [abc] will match any single a, b, or c.

We can also specify a range of characters using - inside the square brackets. For example,

• [0-3] is the same as [0123]

• [a-c] is the same as [abc]

We can also invert the character class by placing the caret (^) symbol first inside the brackets. For example,

• [^0-3] means any character except 0, 1, 2, or 3

• [^a-c] means any character except a, b, or c

example-1:

import re

a= "my name is nagendra"

print(re.findall("[a-m]", a))

output:

['m', 'a', 'm', 'e', 'i', 'a', 'g', 'e', 'd', 'a']

example-2:

import re

a= "my name is nagendra, my age is 25"

print(re.findall("[0-9]", a))

output:

['2', '5']
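
The inverted character class described above can be tried the same way. This sketch is only an illustration: the first pattern keeps everything that is not a lowercase letter or a space, and the second keeps everything except the characters 0 to 3.

```python
import re

a = "my name is nagendra, my age is 25"

# Anything that is NOT a lowercase letter and NOT a space.
others = re.findall("[^a-z ]", a)
print(others)       # [',', '2', '5']

# Anything except 0, 1, 2, or 3.
not_low = re.findall("[^0-3]", "0123456")
print(not_low)      # ['4', '5', '6']
```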

^ (Caret)

The caret (^) symbol matches the beginning of the string, i.e. it checks whether the string starts
with the given character(s) or not. For example:

• ^B will check if the string starts with B, such as Btech, Ball, BOX, etc.

• ^BTECH will check if the string starts with BTECH, such as BTECH HYDERABAD, BTECH AIML, BTECH CSE, etc.

example-1

import re

a = 'Btech hyderabad'

result = re.match('^Btech', a)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

output: Search successful.
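
The effect of the caret is clearer with search(), which would otherwise look anywhere in the string. A minimal sketch on the same example string:

```python
import re

a = 'Btech hyderabad'

starts_with_btech = bool(re.search('^Btech', a))
starts_with_hyd = bool(re.search('^hyderabad', a))
contains_hyd = bool(re.search('hyderabad', a))

print(starts_with_btech)   # True
print(starts_with_hyd)     # False, "hyderabad" is present but not at the start
print(contains_hyd)        # True
```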

$ (Dollar)

The dollar ($) symbol matches the end of the string, i.e. it checks whether the string ends with the
given character(s) or not. For example:

• s$ will check for a string that ends with s, such as geeks, ends, etc.

• ks$ will check for a string that ends with ks, such as marks, ks, etc.

example-1:

import re

a = 'Btech'

result = re.search('h$', a)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

output: Search successful.
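
The caret and the dollar can be combined so that the pattern has to cover the whole string. The sketch below checks a few made-up sample strings against ^Btech$.

```python
import re

candidates = ["Btech", "Btech hyderabad", "my Btech"]

# True only when the string is "Btech" from start to end.
results = {s: bool(re.search("^Btech$", s)) for s in candidates}

for s, ok in results.items():
    print(s, ok)
# Btech True
# Btech hyderabad False
# my Btech False
```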

. (Dot)

The dot (.) symbol matches any single character except the newline character (\n). For example:

• a.b will match a string that has any one character in place of the dot, such as acb, adb, arb,
a1b, etc.

• .. will check if the string contains at least 2 characters. For example, a..b will match a string
that has any two characters in place of the dots, such as acrb, adhb, arfb, a12b, etc.

example-1:

import re

a= "hello hyderabad"

x = re.findall("he..o", a)

print(x)

output: ['hello']
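
The newline exception can be checked directly. In this sketch the sample text is invented for the example, and re.DOTALL is the re module flag that makes the dot match a newline as well.

```python
import re

text = "acb a1b a\nb ab"

# "a\nb" is skipped because the dot does not match the newline,
# and "ab" is skipped because the dot needs one character between a and b.
default = re.findall("a.b", text)
print(default)      # ['acb', 'a1b']

# With re.DOTALL the dot matches the newline too.
with_flag = re.findall("a.b", "a\nb", flags=re.DOTALL)
print(with_flag)    # ['a\nb']
```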

| (Or)

The | symbol works as an OR operator: it checks whether the pattern before it or the pattern after it
is present in the string. For example:

• btech|mtech will match any string that contains btech or mtech.

Example-1:

import re

a= "i am from btech and i am from mtech "

x = re.findall("btech|mtech", a)

print(x)

output:

['btech', 'mtech']

Example-2:

import re

a= "i am nagendra and i am from BTECH"

x = re.findall("BTECH|MTECH", a)

print(x)

output:

['BTECH']
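
One detail worth checking by hand is how far the | operator reaches: it separates everything on its left from everything on its right. The sketch below is only an illustration with a made-up string. The pattern b|mtech means "b" or "mtech", whereas a character class keeps the choice to a single character.

```python
import re

a = "btech mtech"

# Alternation covers the whole pattern on each side: "b" OR "mtech".
whole_sides = re.findall("b|mtech", a)
print(whole_sides)      # ['b', 'mtech']

# A character class limits the choice to the first letter only.
first_letter = re.findall("[bm]tech", a)
print(first_letter)     # ['btech', 'mtech']
```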

? (Question Mark)

The question mark symbol ? matches zero or one occurrence of the pattern to its left.

Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maaan    No match (more than one a character)
             main     No match (a is not followed by n)
             woman    1 match

example-1:

import re
a= "i am a man"

x = re.findall("ma?n", a)

print(x)

output:

['man']

example-2:

import re

a= "i am a maaaan"

x = re.findall("ma?n", a)

print(x)

output:

[]

The output is empty because a is repeated more than once, and the question mark symbol ? matches
only zero or one occurrence of the pattern to its left.

* (Star)

The star symbol * matches zero or more occurrences of the pattern to its left.

Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match

example-1:

import re

a= "i am a maaan"

x = re.findall("ma*n", a)

print(x)

output: ['maaan']

+ (Plus)

The plus symbol + matches one or more occurrences of the pattern to its left.

Expression   String   Matched?
ma+n         mn       No match (no a character)
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match

example-1:

import re

a= "i am a maaan"

x = re.findall("ma+n", a)

print(x)

output: ['maaan']
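
The three tables for ?, * and + can be reproduced with one short loop. This is a minimal sketch that runs each expression against the same five strings used in the tables and collects the strings that match.

```python
import re

words = ["mn", "man", "maaan", "main", "woman"]

matched = {}
for pattern in ["ma?n", "ma*n", "ma+n"]:
    # Keep the words in which the pattern is found somewhere.
    matched[pattern] = [w for w in words if re.search(pattern, w)]
    print(pattern, matched[pattern])

# ma?n ['mn', 'man', 'woman']
# ma*n ['mn', 'man', 'maaan', 'woman']
# ma+n ['man', 'maaan', 'woman']
```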

{ } (Braces)

Consider the pattern {n,m}. This means at least n, and at most m, repetitions of the pattern to its left.

Example-1:

import re

a= "my name is nagendra, my age is 25"

x = re.findall("my{1,3}", a)

print(x)

output: ['my', 'my']

In the pattern my{1,3}, the braces apply only to the character immediately before them, so it matches
an m followed by one to three y characters (my, myy, or myyy). In the example above, "my" occurs twice
in the string, so findall returns it twice.
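
To see what the braces actually repeat, compare the pattern with and without parentheses. The test strings below are invented for this sketch: without a group only the y is repeated, while with a group the whole "my" is repeated.

```python
import re

# The braces apply to the single character y.
only_y = re.search("my{1,3}", "myyyyy").group()
print(only_y)        # myyy

# With parentheses the braces apply to the whole group "my".
whole_my = re.search("(my){1,3}", "mymymymy").group()
print(whole_my)      # mymymy

# A single number means exactly that many repetitions.
pairs = re.findall("a{2}", "a aa aaa")
print(pairs)         # ['aa', 'aa']
```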

( ) (Group)

Parentheses are used to group sub-patterns, so that a quantifier or the | operator can apply to the
whole group.
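
Groups also capture the text they match, which can then be read from the Match object. The following sketch is an illustration on the example sentence used earlier; the pattern itself is an assumption made for the example.

```python
import re

a = "my name is nagendra, my age is 24"

pattern = "my (name|age) is ([a-z0-9]+)"

x = re.search(pattern, a)
print(x.group(0))    # my name is nagendra   (the whole match)
print(x.group(1))    # name                  (first group)
print(x.group(2))    # nagendra              (second group)

# With groups in the pattern, findall returns one tuple of groups per match.
pairs = re.findall(pattern, a)
print(pairs)         # [('name', 'nagendra'), ('age', '24')]
```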

List of special sequences

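A special sequence is a backslash followed by a character. Some special sequences, such as \A, \b, \B
and \Z, do not match an actual character in the string; instead they specify the location in the
search string where the match must occur. Others, such as \d, \s and \w, match a whole class of
characters. They make it easier to write commonly used patterns.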

Special Sequence   Description
\A                 Matches only at the start of the string
\b                 Matches at a word boundary. \b(string) checks for the beginning of a word and
                   (string)\b checks for the end of a word
\B                 The opposite of \b: matches only where there is no word boundary
\d                 Matches any decimal digit; for ASCII text this is equivalent to the class [0-9]
\D                 Matches any non-digit character; for ASCII text this is equivalent to the class [^0-9]
\s                 Matches any whitespace character
\S                 Matches any non-whitespace character
\w                 Matches any alphanumeric character or underscore; for ASCII text this is
                   equivalent to the class [a-zA-Z0-9_]
\W                 Matches any character that is not alphanumeric or an underscore
\Z                 Matches only at the end of the string

Program:

import re

a = "my name is nagendra, my age is 25"

# Raw strings (r"...") keep the backslashes in the patterns intact.
b = re.findall(r"\Amy", a)
c = re.findall(r"\w", a)
d = re.findall(r"\W", a)
e = re.findall(r"\d", a)
f = re.findall(r"\D", a)
g = re.findall(r"\s", a)
h = re.findall(r"\S", a)
i = re.findall(r"\bna", a)
j = re.findall(r"ra\b", a)

print(r"output of \A is", b)
print(r"output of \w is", c)
print(r"output of \W is", d)
print(r"output of \d is", e)
print(r"output of \D is", f)
print(r"output of \s is", g)
print(r"output of \S is", h)
print(r"output of \b is", i)
print(r"output of \b is", j)

Output:

output of \A is ['my']
output of \w is ['m', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', 'm', 'y', 'a', 'g', 'e', 'i', 's', '2', '5']
output of \W is [' ', ' ', ' ', ',', ' ', ' ', ' ', ' ']
output of \d is ['2', '5']
output of \D is ['m', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', ',', ' ', 'm', 'y', ' ', 'a', 'g', 'e', ' ', 'i', 's', ' ']
output of \s is [' ', ' ', ' ', ' ', ' ', ' ', ' ']
output of \S is ['m', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'n', 'a', 'g', 'e', 'n', 'd', 'r', 'a', ',', 'm', 'y', 'a', 'g', 'e', 'i', 's', '2', '5']
output of \b is ['na', 'na']
output of \b is ['ra']
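
The word-boundary patterns are written as raw strings (r"...") for a reason: in an ordinary Python string "\b" is the backspace character, not the two characters \ and b. The sketch below, on a shortened version of the example string, shows the difference and adds one example each for \B and \Z.

```python
import re

a = "my name is nagendra"

# An ordinary string turns \b into one backspace character.
print(len("\b"), len(r"\b"))    # 1 2

no_raw = re.findall("\bna", a)
print(no_raw)                   # [], the text contains no backspace

raw = re.findall(r"\bna", a)
print(raw)                      # ['na', 'na'], from "name" and "nagendra"

# \B matches only inside a word: the "ge" in "nagendra".
inside = re.findall(r"\Bge", a)
print(inside)                   # ['ge']

# \Z matches only at the end of the string.
at_end = re.findall(r"ra\Z", a)
print(at_end)                   # ['ra']
```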
