Python - All substrings Frequency in String
Last Updated : 16 May, 2023
Given a string, extract all of its unique substrings along with the frequency of each.
Input : test_str = "ababa"
Output : {'a': 3, 'ab': 2, 'aba': 2, 'abab': 1, 'ababa': 1, 'b': 2, 'ba': 2, 'bab': 1, 'baba': 1}
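Explanation : Each distinct substring is mapped to the number of times it occurs, counting overlapping occurrences (for example, 'aba' occurs at index 0 and at index 2).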
Input : test_str = "GFGF"
Output : {'G': 2, 'GF': 2, 'GFG': 1, 'GFGF': 1, 'F': 2, 'FG': 1, 'FGF': 1}
Explanation : 'G', 'GF' and 'F' each occur twice; every other substring occurs once.
Method #1: Using count() method
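First, extract all the substrings with a list comprehension. Then use the count() method to find the frequency of each substring, store it in a dictionary, and print the dictionary. Note that count() counts only non-overlapping occurrences, so for substrings that can overlap with themselves (such as 'aba' in "abababa") this method reports a lower frequency than the other methods in this article.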
Python3
# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using count()
# initializing string
test_str = "abababa"
# printing original string
print("The original string is : " + str(test_str))
# list comprehension to extract substrings
temp = [test_str[idx: j] for idx in range(len(test_str)) for j in range(idx + 1, len(test_str) + 1)]
# loop to extract final result of frequencies
d = dict()
for i in temp:
    d[i] = test_str.count(i)
# printing result
print("Extracted frequency dictionary : " + str(d))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 2, 'abab': 1, 'ababa': 1, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 1, 'baba': 1, 'babab': 1, 'bababa': 1}
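The lower counts for 'aba', 'abab', 'bab' and similar substrings in this output come from str.count() skipping overlapping matches. A minimal sketch of an overlap-aware count is shown below; the helper name count_overlapping is hypothetical and not part of the original code. It uses str.find() and advances one character at a time.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Move forward by a single character so overlapping matches are found.
        start = text.find(sub, start + 1)
    return count


test_str = "abababa"
print(test_str.count("aba"))               # non-overlapping count
print(count_overlapping(test_str, "aba"))  # overlapping count
```

Replacing test_str.count(i) with a helper like this should bring Method #1 in line with the other methods, at the cost of an extra scan per substring.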
Method #2: Using loop + list comprehension
This method combines a list comprehension with a loop. We first extract all the substrings using a list comprehension, and then a loop increments the frequency of each substring every time it appears in that list. Because every slice is counted, overlapping occurrences are included.
Python3
# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using loop + list comprehension
# initializing string
test_str = "abababa"
# printing original string
print("The original string is : " + str(test_str))
# list comprehension to extract substrings
temp = [test_str[idx: j] for idx in range(len(test_str))
        for j in range(idx + 1, len(test_str) + 1)]
# loop to extract final result of frequencies
res = {}
for sub in temp:
    # first occurrence starts the count at 1; later ones increment it
    if sub not in res:
        res[sub] = 1
    else:
        res[sub] += 1
# printing result
print("Extracted frequency dictionary : " + str(res))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}
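The same counting can also be done with collections.Counter from the standard library, which takes care of the "first time vs. already seen" bookkeeping. The snippet below is a minimal sketch of that alternative and is not part of the original method; it assumes the same test string.

```python
from collections import Counter

test_str = "abababa"
n = len(test_str)

# Feed every slice test_str[i:j] to Counter through a generator expression,
# so no intermediate list of substrings is built.
res = Counter(test_str[i:j] for i in range(n) for j in range(i + 1, n + 1))

print("Extracted frequency dictionary : " + str(dict(res)))
```

Counter is a dict subclass, so res can be used like the dictionary built by the loop above.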
Method #3: Using list comprehension
This is yet another way in which this task can be performed. Here, the loop iterates directly over the nested list comprehension that extracts the substrings, and a conditional expression updates the frequency in a single statement.
Python3
# Python3 code to demonstrate working of
# All substrings Frequency in String
# Using list comprehension
# initializing string
test_str = "abababa"
# printing original string
print("The original string is : " + str(test_str))
# list comprehension to extract substrings and frequency
res = dict()
for ele in [test_str[idx: j] for idx in range(len(test_str)) for j in range(idx + 1, len(test_str) + 1)]:
    res[ele] = 1 if ele not in res else res[ele] + 1
# printing result
print("Extracted frequency dictionary : " + str(res))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'ab': 3, 'aba': 3, 'abab': 2, 'ababa': 2, 'ababab': 1, 'abababa': 1, 'b': 3, 'ba': 3, 'bab': 2, 'baba': 2, 'babab': 1, 'bababa': 1}
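Time Complexity: O(n^3) in the worst case. There are O(n^2) substrings, and slicing and hashing each one costs up to O(n).
Auxiliary Space: O(n^3) in the worst case, since up to O(n^2) distinct substrings of length up to n are stored.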
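As a quick sanity check on these counts: a string of length n has n(n+1)/2 substrings when repeated ones are counted separately, so the frequencies in the dictionary should add up to that number. The sketch below checks this for the test string; the variable names total and expected are chosen here for illustration.

```python
test_str = "abababa"
n = len(test_str)

# Count every slice test_str[i:j], overlapping occurrences included.
res = {}
for i in range(n):
    for j in range(i + 1, n + 1):
        sub = test_str[i:j]
        res[sub] = res.get(sub, 0) + 1

total = sum(res.values())    # sum of all frequencies
expected = n * (n + 1) // 2  # number of (start, end) pairs
print(total, expected)       # 28 28
```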
Method #4: Using regex + findall() method
Step by step Algorithm:
- Initialize a dictionary 'd' to store substring frequencies.
- Loop over every substring length i in range(1, len(test_str)+1).
- For each i, find all substrings of length i, including overlapping ones, using re.findall() with the lookahead pattern (?=(.{i})).
- For each substring 'sub', increment its frequency in the dictionary 'd'.
- Print the dictionary 'd' with the substring frequencies.
Python3
import re
# initializing string
test_str = "abababa"
# printing original string
print("The original string is : " + str(test_str))
# using regex to count substring frequencies
d = {}
for i in range(1, len(test_str)+1):
    for sub in re.findall('(?=(.{'+str(i)+'}))', test_str):
        d[sub] = d.get(sub, 0) + 1
# printing result
print("Extracted frequency dictionary : " + str(d))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}
Time complexity: O(n^3) in the worst case, where n is the length of the input string. The two nested loops make O(n^2) match attempts, and each captured substring is up to n characters long.
Auxiliary Space: O(n^3) in the worst case, since the dictionary can hold up to O(n^2) distinct substrings of length up to n.
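One caveat with this pattern is that '.' does not match a newline by default, so substrings that span a line break would be missed. A minimal sketch that handles such input by passing re.DOTALL is shown below; the function name substring_freq_regex is hypothetical and not part of the original code.

```python
import re


def substring_freq_regex(text):
    """Count all substrings via a lookahead; re.DOTALL lets '.' match newlines."""
    freq = {}
    for size in range(1, len(text) + 1):
        # The lookahead consumes nothing, so matches may overlap.
        pattern = re.compile(rf"(?=(.{{{size}}}))", re.DOTALL)
        for sub in pattern.findall(text):
            freq[sub] = freq.get(sub, 0) + 1
    return freq


print(substring_freq_regex("a\nb"))
```

For strings without newlines, such as "abababa", this should give the same counts as the code above.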
Method #5: Using a sliding window with a dictionary
Step-by-step approach:
- Initialize an empty dictionary freq_dict to keep track of the substring frequencies.
- Initialize a variable n to the length of the given string test_str.
- Loop over i in range(n):
  - Set window_size to i + 1.
  - Loop over each start index j in range(n - window_size + 1):
    - Set substring to test_str[j:j+window_size], the window of length window_size starting at index j.
    - If substring is already in freq_dict, increment its value by 1. Otherwise, add it to freq_dict with a value of 1.
- Print freq_dict.
Python3
test_str = "abababa"
print("The original string is : " + str(test_str))
# using sliding window with a dictionary to count substring frequencies
freq_dict = {}
n = len(test_str)
for i in range(n):
    window_size = i + 1
    for j in range(n - window_size + 1):
        substring = test_str[j:j+window_size]
        freq_dict[substring] = freq_dict.get(substring, 0) + 1
# printing result
print("Extracted frequency dictionary : " + str(freq_dict))
Output
The original string is : abababa
Extracted frequency dictionary : {'a': 4, 'b': 3, 'ab': 3, 'ba': 3, 'aba': 3, 'bab': 2, 'abab': 2, 'baba': 2, 'ababa': 2, 'babab': 1, 'ababab': 1, 'bababa': 1, 'abababa': 1}
Time complexity: O(n^3). The nested loops over window_size and the start index run O(n^2) times in total, and each slice costs up to O(n).
Auxiliary space: O(n^3) in the worst case, since the dictionary can store up to O(n^2) distinct substrings of length up to n.
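Since the cost grows quickly with the string length, it can help to cap the window size when only short substrings are needed. The sketch below wraps the sliding window in a function with an optional max_len parameter; both the function name substring_freq and the max_len parameter are assumptions made for illustration and are not part of the original code.

```python
from collections import defaultdict


def substring_freq(text, max_len=None):
    """Sliding-window substring counts, optionally limited to windows of max_len."""
    n = len(text)
    limit = n if max_len is None else min(max_len, n)
    freq = defaultdict(int)
    for window_size in range(1, limit + 1):
        # Slide a window of this size across every valid start index.
        for start in range(n - window_size + 1):
            freq[text[start:start + window_size]] += 1
    return dict(freq)


print(substring_freq("abababa", max_len=2))
# {'a': 4, 'b': 3, 'ab': 3, 'ba': 3}
```

With max_len fixed to a constant k, the number of windows is at most k * n, so the work grows linearly with the string length.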