
Regular Expression Back References in Python
Backreferences in regular expressions allow us to reuse a previously captured group inside the same regex pattern. This is useful when we want to match repeated patterns in strings.
What are Backreferences?
A backreference is a reference, inside a regular expression, to a previously captured group. When parentheses "()" are used in a regex pattern, a capturing group is formed. Each group is assigned a number, counted by its opening parenthesis from left to right, starting at 1. We can refer to a captured group later in the same pattern by writing a backslash followed by the group number, for example \1. A backreference matches the exact text the group captured, not the group's pattern.
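As a small illustration of group numbering, the sketch below uses two groups and refers back to each one by its number. The pattern and the sample text are made up for this illustration.

```python
import re

# Group 1 captures the first character, group 2 the second.
# \1\2 then requires the same two characters again, in the same order.
pattern = re.compile(r'(\w)(\w)\1\2')

match = pattern.search("xxababyy")
first = match.group(1)   # text captured by group 1
second = match.group(2)  # text captured by group 2
whole = match.group(0)   # the entire match

print(first, second, whole)  # a b abab
```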
Basic Syntax
Here is the basic syntax we can use to define a backreference -
- (\w+): captures a word as the first group.
- \1: refers to the text matched by the first captured group.
So, backreferences simplify patterns that contain repeated text. They can also be used to match structures such as repeated phrases and identifiers, and to find duplicates in a given string.
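One common use of this idea is matching text enclosed in a pair of identical quote characters. The following is a minimal sketch, with a pattern and sample sentence invented for illustration: the first group captures the opening quote, and \1 insists that the closing quote is the same character.

```python
import re

# Group 1 captures the opening quote character (single or double).
# \1 then requires the very same character as the closing quote.
pattern = re.compile(r'''(['"])(.*?)\1''')

text = """He said "it's fine" and then 'done'."""

# group(2) holds the text between the matching quotes.
quoted = [m.group(2) for m in pattern.finditer(text)]
print("Quoted:", quoted)  # Quoted: ["it's fine", 'done']
```

Because the closing quote must equal the opening one, the apostrophe inside "it's" does not end the first quoted string.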
Example 1
This program checks a given string for words that appear twice in a row. After the group (\w+) captures a word, \1 matches the same word again after some whitespace. The \b anchors make sure only whole words are compared. Because the pattern has a single group, re.findall returns a list of the repeated words.
import re

# Pattern to find repeated words
pattern = r'\b(\w+)\s+\1\b'
text = "This is is just a test test string"

# Find all matches
matches = re.findall(pattern, text)
print("Repeated Words:", matches)
Here is the output of the above program -
Repeated Words: ['is', 'test']
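Backreferences can also be used in the replacement string of re.sub. As a hedged sketch that goes one step beyond the program above, the following removes the duplicated words instead of listing them; the variable name cleaned is chosen only for this illustration.

```python
import re

text = "This is is just a test test string"

# (\s+\1\b)+ matches one or more extra copies of the captured word;
# the replacement \1 keeps a single copy.
cleaned = re.sub(r'\b(\w+)(\s+\1\b)+', r'\1', text)
print(cleaned)  # This is just a test string
```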
Example 2
In this example, we search for pairs of identical strings separated by a single space. The regex captures a run of letters and then requires the same text to appear again right after the space. re.findall returns each duplicated string once.
import re

# Pattern to find duplicate strings
pattern = r'([a-zA-Z]+) \1'
text = "hello hello world world example example"

# Find all matches
matches = re.findall(pattern, text)
print("Duplicated Strings:", matches)
Below is the result of the above program -
Duplicated Strings: ['hello', 'world', 'example']
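The pattern in this example has no word boundaries, so it can also match across parts of different words. The sketch below, using a sample sentence chosen for illustration, compares the pattern with and without \b.

```python
import re

text = "this is fine"

# Without \b the group can start in the middle of a word:
# the "is" at the end of "this" is followed by " is".
loose = re.findall(r'([a-zA-Z]+) \1', text)

# With \b on both sides only whole words are compared.
strict = re.findall(r'\b([a-zA-Z]+) \1\b', text)

print("Without boundaries:", loose)   # Without boundaries: ['is']
print("With boundaries:", strict)     # With boundaries: []
```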
Example 3
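This example finds simple hex color codes that appear twice in a row. The regex captures six hexadecimal digits after a "#", and \1 requires the same six digits to follow after some whitespace and another "#". Note that the pattern detects repeated codes; it does not validate a single color code on its own.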
import re

# Pattern to find hex color codes that are repeated back to back
pattern = r'#([0-9A-Fa-f]{6})\s+#\1'
text = "#AFAFAF #AFAFAF this is not a color #123456 #123456"

# Find all matches
matches = re.findall(pattern, text)
print("Valid Hex Colors:", matches)
This will produce the following result -
Valid Hex Colors: ['AFAFAF', '123456']
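A backreference compares the captured text exactly, so the same color written in a different letter case is not treated as a repeat. The sketch below, with an input string made up for illustration, shows the difference the re.IGNORECASE flag makes.

```python
import re

text = "#afafaf #AFAFAF"
pattern = r'#([0-9A-Fa-f]{6})\s+#\1'

# By default \1 must match the captured text exactly, including case.
exact = re.findall(pattern, text)

# With re.IGNORECASE the backreference is compared case-insensitively.
relaxed = re.findall(pattern, text, re.IGNORECASE)

print("Exact:", exact)      # Exact: []
print("Relaxed:", relaxed)  # Relaxed: ['afafaf']
```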
Example 4
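This program attempts to search for palindromic patterns with the regex (\w+)(?=\1). The pattern captures a sequence of word characters and uses a lookahead to check whether the same sequence immediately repeats. A repeated sequence such as "abab" is not a palindrome, and none of the words in the sample text contains one, so the result is an empty list. A backreference replays the captured text in the same order, which means it cannot reverse a word of arbitrary length.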
import re

# Pattern to find a sequence that is immediately followed by itself
pattern = r'(\w+)(?=\1)'
text = "madam racecar level deified not a palindrome"

# Find all matches
matches = re.findall(pattern, text)
print("Palindromic Patterns:", matches)
This will lead to the following outcome -
Palindromic Patterns: []
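The empty result shows that this pattern does not detect palindromes. Backreferences can express a palindrome only for a fixed length, by listing the groups in reverse order. The sketch below shows one such pattern for five-letter words, followed by a plain Python comparison that works for any length. Both are illustrations and not part of the original program; the names five_letter and any_length are hypothetical.

```python
import re

text = "madam racecar level deified not a palindrome"

# Fixed length only: \2\1 replays the first two letters in reverse order,
# so this pattern matches five-letter palindromes and nothing longer.
five_letter = [m.group(0) for m in re.finditer(r'\b(\w)(\w)\w\2\1\b', text)]

# Any length: compare each word with its reverse in plain Python.
any_length = [w for w in re.findall(r'\w+', text)
              if len(w) > 1 and w == w[::-1]]

print("Five-letter palindromes:", five_letter)
# Five-letter palindromes: ['madam', 'level']
print("Palindromes of any length:", any_length)
# Palindromes of any length: ['madam', 'racecar', 'level', 'deified']
```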