Implementation of Regular Expression
Implementation: Regex in Python
By,
Sadiya Fatima Khwaja (3178)
Overview
• Introduction
• The re module
• Implementation
Introduction
• Regular Expressions (regex) are compact expressions that describe
the languages that finite automata accept (the regular languages). They
are a concise way of representing such languages.
• A regular expression can also be defined as a sequence of characters
that specifies a pattern to be matched against strings.
• Regular Expressions are helpful in a wide range of text processing
tasks and string processing in general, where the data does not
have to be textual.
• For example:
• Data validation,
• Data scraping (particularly web scraping),
• Simple parsing
• Building syntax highlighting systems, among a variety of other tasks.
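As an illustration of the data validation use case, here is a minimal sketch. It assumes a hypothetical rule that a valid ID is exactly four digits; the rule and the helper name is_valid_id are made up for this example and are not from the slides.

```python
import re

# Hypothetical rule (assumption): a valid ID is exactly four decimal digits.
ID_PATTERN = re.compile(r"[0-9]{4}")

def is_valid_id(text):
    """Return True if the whole string consists of exactly four digits."""
    # fullmatch() succeeds only if the pattern covers the entire string.
    return ID_PATTERN.fullmatch(text) is not None

print(is_valid_id("3178"))   # True
print(is_valid_id("31a8"))   # False
print(is_valid_id("31789"))  # False
```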
The re module in Python:
• Regex functionality in Python resides in a module named re.
• How to import re:
import re
• The re module contains many functions that help us to search a string for a match.

Function       Description
re.findall()   Finds and returns all matching occurrences in a list
re.compile()   Regular expressions are compiled into pattern objects
re.split()     Split string by the occurrences of a character or a pattern
re.sub()       Replaces all occurrences of a character or pattern with a replacement string
re.escape()    Escapes special characters
re.search()    Searches for first occurrence of character or pattern
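The functions listed above can be tried out with a short sketch like the following. The sample text and the patterns are illustrative assumptions, not part of the original slides.

```python
import re

text = "apple, banana; cherry"  # sample input (assumption)

# re.findall(): all non-overlapping matches, returned as a list
words = re.findall(r"[a-z]+", text)

# re.compile(): build a reusable pattern object
word_re = re.compile(r"[a-z]+")
same_words = word_re.findall(text)

# re.split(): split on a comma or semicolon followed by optional whitespace
parts = re.split(r"[,;]\s*", text)

# re.sub(): replace every separator with " | "
joined = re.sub(r"[,;]\s*", " | ", text)

# re.escape(): escape special characters so a string is matched literally
escaped = re.escape("a.b")

# re.search(): first match anywhere in the string (None if there is none)
first = re.search(r"b[a-z]+", text)

print(words)          # ['apple', 'banana', 'cherry']
print(parts)          # ['apple', 'banana', 'cherry']
print(joined)         # apple | banana | cherry
print(escaped)        # a\.b
print(first.group())  # banana
```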
Metacharacters Supported by the re Module
• The following table briefly summarizes some of the metacharacters
supported by the re module. Some characters serve more than one purpose:

Character(s)   Meaning
.              Matches any single character except newline
^              • Anchors a match at the start of a string
               • Complements a character class
$              Anchors a match at the end of a string
+              Matches one or more repetitions
\              • Strips a metacharacter of its special meaning (escapes it)
               • Introduces a special character class
               • Introduces a grouping backreference
[]             Specifies a character class
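Each metacharacter in the table can be checked with a one-line experiment. The sketch below uses made-up sample strings; only standard re functions are used.

```python
import re

# .  matches any single character except a newline
dot_hit = re.search(r"c.t", "cat") is not None
dot_newline = re.search(r"c.t", "c\nt") is not None

# ^  anchors at the start of the string; inside [] it complements the class
starts_with_py = re.search(r"^Py", "Python") is not None
non_digits = re.findall(r"[^0-9]", "a1b2")

# $  anchors at the end of the string
ends_with_on = re.search(r"on$", "Python") is not None

# +  one or more repetitions
runs = re.findall(r"a+", "caaandy a")

# \  escapes a metacharacter, introduces a special class (\d),
#    or refers back to an earlier group (\1)
literal_dots = re.findall(r"\.", "a.b.c")
digits = re.findall(r"\d", "a1b2")
doubled = re.search(r"(\w)\1", "hello").group()

# [] specifies a character class
vowels = re.findall(r"[aeiou]", "regex")

print(dot_hit, dot_newline)          # True False
print(starts_with_py, ends_with_on)  # True True
print(non_digits)                    # ['a', 'b']
print(runs)                          # ['aaa', 'a']
print(literal_dots)                  # ['.', '.']
print(digits)                        # ['1', '2']
print(doubled)                       # ll
print(vowels)                        # ['e', 'e']
```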
Implementation: code
import re
• \.: This matches a literal dot. The backslash (\) is used to escape the dot
(.), as the dot is a special character in regex that matches any single
character except a newline.
• [a-zA-Z0-9-.]+: One or more alphanumeric characters, dots, or hyphens.
• $: End of the string.
2. Compile the regex pattern: Compiling the pattern can improve
performance, especially if it will be used multiple times.
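3. Match the pattern: The match() method checks whether the pattern matches
at the beginning of the string. Because the pattern ends with $, a
successful match must also extend to the end of the string;
re.fullmatch() makes this whole-string requirement explicit.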
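The steps above can be sketched as follows. The complete pattern is not shown in the slides, so the one below is a hypothetical pattern built only from the pieces described above, plus an assumed leading group of alphanumerics and hyphens. The names PATTERN and is_valid are also illustrative.

```python
import re

# 1. Define the pattern (hypothetical; the slides do not give the full pattern).
#    [a-zA-Z0-9-]+   one or more alphanumerics or hyphens (assumed prefix)
#    \.              a literal dot
#    [a-zA-Z0-9-.]+  one or more alphanumerics, dots, or hyphens
#    $               end of the string
PATTERN = r"[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

# 2. Compile the pattern once so the pattern object can be reused.
compiled = re.compile(PATTERN)

# 3. match() starts at the beginning of the string; the trailing $
#    requires the match to reach the end.
def is_valid(text):
    return compiled.match(text) is not None

print(is_valid("example.com"))       # True
print(is_valid("mail.example.org"))  # True
print(is_valid("no_dot"))            # False
print(is_valid("bad domain.com"))    # False
```

Note that compiled.search("bad domain.com") would succeed, because search() may start anywhere and finds "domain.com"; match() rejects the string because matching must begin at the first character.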
Thank you