Regular Expressions in Python
The power of regular expressions is that they can specify patterns, not just
fixed characters. Many of the examples in this article come from
Google's Python Course (https://developers.google.com/edu/python/regular-expressions).
Basic patterns
a, X, 9
ordinary characters just match themselves exactly.
. ^ $ * + ? { [ ] \ | ( )
meta-characters with special meanings (see below)
. (a period)
matches any single character except newline '\n'
\w
matches a "word" character: a letter, digit or underscore [a-zA-Z0-9_].
It only matches a single character, not a whole word.
\W
matches any non-word character.
\w+
matches one or more word characters in a row
\b
boundary between word and non-word
\s
matches a single whitespace character: space, newline, return, tab or form
feed [ \n\r\t\f]
\S
matches any non-whitespace character.
\t, \n, \r
tab, newline, return
\D
matches anything but a digit
\d
matches a decimal digit [0-9]
\d{1,5}
matches a run of 1 to 5 digits.
{n}
matches exactly n repetitions of whatever precedes it; for example, \d{5}
matches 5 digits in a row
^
match the start of the string
$
match the end of the string
*
matches 0 or more repetitions
?
matches 0 or 1 occurrences of whatever precedes it
If you are unsure whether a character has special meaning, such as '@', you can
put a backslash in front of it, \@, to make sure it is treated just as a character.
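A few of the basic patterns above can be tried out in a short script. This is only a sketch: the sample strings and variable names are made up for illustration and do not come from the original article.

```python
import re

# Made-up sample text for this sketch
text = "Order 12345 shipped to zip 90210 on day 7"

# \d{5} matches exactly five digits in a row
five_digit_runs = re.findall(r"\d{5}", text)

# \w+ matches runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)

# ^ anchors the match at the start of the string, $ at the end
starts_with_order = re.search(r"^Order", text) is not None
ends_with_digit = re.search(r"\d$", text) is not None

# ? makes the preceding character optional (0 or 1 occurrences)
spellings = re.findall(r"colou?r", "color and colour")

print(five_digit_runs)    # ['12345', '90210']
print(len(words))         # 9
print(starts_with_order)  # True
print(ends_with_digit)    # True
print(spellings)          # ['color', 'colour']
```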
re.findall
findall() is probably the single most powerful function in the re module,
and we will use it in this script.
In the example below we create a string containing text with several email
addresses.
We then create a variable (emails) that will hold a list of all the email
strings that were found.
Lastly, we use a for loop so that we can do something with each email string
that is found.
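A minimal sketch of those three steps could look like this. The sample text and the email pattern are assumptions chosen for illustration; the pattern is deliberately simple and is not a full email validator.

```python
import re

# Sample text (made up for this sketch) containing two email addresses
text = "purple alice@google.com, blah monkey bob@abc.com blah dishwasher"

# [\w.-]+ matches a run of word characters, dots and hyphens;
# the pattern needs one such run on each side of the @
emails = re.findall(r"[\w.-]+@[\w.-]+", text)

for email in emails:
    # do something with each email string that was found
    print(email)
```

Run as written, this prints alice@google.com and bob@abc.com, one per line.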
(If you want to read more about file handling in Python, we have written a
'Cheat Sheet' that you can find here (https://www.pythonforbeginners.com/cheatsheet/python-file-handling/))
import re

# Open the file; the with statement closes it again when the block ends
with open('test.txt', 'r') as f:
    # Feed the file text into findall(); it returns a list of all the found strings
    strings = re.findall(r'some pattern', f.read())
re.search
The re.search() method takes a regular expression pattern and a string and
searches for that pattern within the string.
The syntax is re.search(pattern, string, flags=0), where:
pattern
regular expression to be matched.
string
the string to search; the pattern may match anywhere in it.
re.search() looks for the first occurrence of the pattern within the string, with
optional flags. It returns a match object if the pattern is found and None otherwise.
It is common to put an 'r' at the start of the pattern string. This designates
a Python "raw" string, which passes backslashes through without change and is
very handy for regular expressions.
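The difference a raw string makes is easiest to see with \b. The sketch below uses made-up sample strings; it compares the same pattern written with and without the 'r' prefix.

```python
import re

# In a normal string, "\b" is a single backspace character
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + b, which re reads as a word boundary
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

text = "the cat sat"
normal_match = re.search(normal, text)  # looks for literal backspace characters
raw_match = re.search(raw, text)        # looks for the whole word 'cat'

print(normal_match)       # None
print(raw_match.group())  # cat
```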
The following example searches for the pattern 'word:' followed by a 3 letter word.
The code match = re.search(pat, str) stores the search result in a variable
named "match".
Then the if-statement tests the match: if it is true, the search succeeded and
match.group() is the matching text (e.g. 'word:cat').
If the match is false (None), the search did not succeed, and there is no matching text.
if match:
    print('found', match.group())
else:
    print('did not find')
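Putting the description above together, a self-contained sketch might look like the following. The helper name find_word and the two sample strings are hypothetical, added here so that both the "found" and the "did not find" branches can be seen.

```python
import re

def find_word(text):
    """Return the text matched by 'word:' plus three word characters, or None."""
    pat = r"word:\w\w\w"
    match = re.search(pat, text)
    if match:
        # The search succeeded; group() is the matching text
        return match.group()
    # The search did not succeed; there is no matching text
    return None

for sample in ["an example word:cat!!", "no match here"]:
    result = find_word(sample)
    if result:
        print("found", result)
    else:
        print("did not find")
```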
As you can see in the example below, I have used the | operator, which searches for any of
the patterns I specify.
import re
programming = ["Python", "Perl", "PHP", "C++"]
pat = "^B|^P|i$|H$"
# Test each language name against the pattern, ignoring case
for lang in programming:
    if re.search(pat, lang, re.IGNORECASE):
        print(lang, "FOUND")
    else:
        print(lang, "NOT FOUND")
Python FOUND
Perl FOUND
PHP FOUND
C++ NOT FOUND
re.sub
The re.sub() function in the re module can be used to replace substrings.
import re
text = "Python for beginner is a very cool website"
# re.sub() returns a new string; store it in text2 so it can be printed
text2 = re.sub("cool", "good", text)
print(text2)
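The first argument to re.sub() can be any regular expression, not just fixed text. As a small sketch (the sample strings are made up), the pattern \s+ collapses runs of whitespace, and the optional count argument limits how many replacements are made.

```python
import re

# Collapse every run of whitespace (spaces, tabs) into a single space
messy = "Python   for \t beginners"
tidy = re.sub(r"\s+", " ", messy)
print(tidy)  # Python for beginners

# count=1 replaces only the first match
first_only = re.sub(r"o", "0", "cool tool", count=1)
print(first_only)  # c0ol tool
```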
re.compile
With the re.compile() function we can compile a pattern into a pattern object,
which has methods for various operations such as searching for pattern matches
or performing string substitutions.
The first example checks if the input from the user contains only letters,
spaces or periods (no digits).
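import re
# Matches any character that is NOT a letter, whitespace or a period
name_check = re.compile(r"[^A-Za-z\s.]")
name = input("Please, enter your name: ")
while name_check.search(name):
    print("Please enter your name correctly!")
    name = input("Please, enter your name: ")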
The second example checks if the input from the user contains only numbers,
parentheses, spaces or hyphens (no letters).
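# Matches any character that is NOT a digit, whitespace, hyphen or parenthesis
phone_check = re.compile(r"[^0-9\s\-()]")
phone = input("Please, enter your phone: ")
while phone_check.search(phone):
    print("Please enter your phone correctly!")
    phone = input("Please, enter your phone: ")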
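The same two compiled patterns can also be wrapped in small functions, which makes them easy to try without typing input. The function names is_valid_name and is_valid_phone are hypothetical, introduced only for this sketch.

```python
import re

# Compile each pattern once and reuse the pattern objects
name_check = re.compile(r"[^A-Za-z\s.]")
phone_check = re.compile(r"[^0-9\s\-()]")

def is_valid_name(name):
    """True if name contains only letters, whitespace and periods."""
    return name_check.search(name) is None

def is_valid_phone(phone):
    """True if phone contains only digits, whitespace, hyphens and parentheses."""
    return phone_check.search(phone) is None

print(is_valid_name("John Q. Public"))   # True
print(is_valid_name("R2D2"))             # False
print(is_valid_phone("(555) 123-4567"))  # True
print(is_valid_phone("555-CALL"))        # False
```

Note that both checks only reject forbidden characters, so an empty string passes; a stricter check would also require a minimum length.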
The last example extracts the domain from an email address with the pattern
@[\w.]+, which reads as follows:
@
scan till you see this character
[\w.]
a set of characters to potentially match: \w is all alphanumeric characters
(and underscore), and the trailing period . adds to that set of characters.
+
one or more of the previous set.
Because this regex matches the period character and every alphanumeric
after an @, it'll match email domains even in the middle of sentences.
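import re
# example input; any text containing an email address will do
s = "Please write to someone@gmail.com for details"
domain = re.search(r"@[\w.]+", s)
print(domain.group())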
outputs:
@gmail.com
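One caveat with this pattern is worth seeing in action: because the period is part of the character set, a full stop that ends a sentence is swallowed too. The sketch below uses a made-up sentence, collects every domain with findall(), and strips the trailing period afterwards.

```python
import re

# Made-up sample text; the second address is followed by a full stop
text = "Write to alice@gmail.com or bob@example.org. Thanks!"

domains = re.findall(r"@[\w.]+", text)
print(domains)  # ['@gmail.com', '@example.org.']

# Remove a trailing period picked up from the end of a sentence
cleaned = [d.rstrip(".") for d in domains]
print(cleaned)  # ['@gmail.com', '@example.org']
```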
More Reading
https://developers.google.com/edu/python/regular-expressions
http://www.doughellmann.com/PyMOTW/re/
http://www.daniweb.com/ (http://www.daniweb.com/software-development/python/tutorials/238544/simple-regex-tutorial#)
© Python For Beginners (https://www.pythonforbeginners.com) 2012-2017