Java Program to Implement Levenshtein Distance Computing Algorithm
Last Updated: 21 Feb, 2022
The Levenshtein distance, also called the edit distance, is the minimum number of single-character operations required to transform one string into another.
Three types of operations are allowed, each applied to one character at a time:
- Replace a character.
- Delete a character.
- Insert a character.
Examples:
Input: str1 = "glomax", str2 = "folmax"
Output: 3
str1 is converted to str2 by replacing 'g' with 'o', deleting the second 'o', and inserting 'f' at the beginning. There is no way to do it with fewer than three edits.
Input: s1 = "GIKY", s2 = "GEEKY"
Output: 2
s1 is converted to s2 by inserting 'E' right after 'G', and replacing 'I' with 'E'.
This problem can be solved in two ways:
- Using Recursion.
- Using Dynamic Programming.
Method 1: Recursive Approach
Consider an example. Given two strings s1 = "sunday" and s2 = "saturday", we want to convert "sunday" into "saturday" with the minimum number of edits.
- Let i and j be the lengths of the prefixes of s1 and s2 currently under consideration (string indices start at 1).
- Pick i = 2 and j = 4, i.e. the prefixes are 'su' and 'satu' respectively. The rightmost characters can be aligned in three different ways.
- Possible Case 1 (Match): Align the characters 'u' and 'u'. They are equal, so no edit is required. We are left with the subproblem i = 1 and j = 3, so we proceed to find Levenshtein distance(i-1, j-1).
- Possible Case 2 (Deletion): Align the rightmost character of the first string with no character of the second string. This costs one deletion. We are left with the subproblem i = 1 and j = 4, so we proceed to find Levenshtein distance(i-1, j).
- Possible Case 3 (Insertion): Align the rightmost character of the second string with no character of the first string. This costs one insertion. We are left with the subproblem i = 2 and j = 3, so we proceed to find Levenshtein distance(i, j-1).
- The character inserted into the first string is assumed to be the same as the rightmost character of the second string.
- Possible Case 4 (Replacement): If the rightmost characters were different, we would still align them, at the cost of one substitution. We would again be left with the subproblem i = 1 and j = 3, so we would proceed to find Levenshtein distance(i-1, j-1).
- The replacement character in the first string is assumed to be the same as the rightmost character of the second string.
- Cases 1 and 4 are mutually exclusive (the rightmost characters either match or they do not), so at each step we take the minimum over three options: match or replace, delete, and insert.
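The cases above can be sketched directly in terms of the prefix lengths i and j. This is only an illustrative sketch: the class name `LevenshteinPrefixSketch` and the method `distance` are assumptions made for this example and are not part of the article's implementation, which works through the strings from the left instead of from the right.

```java
// Illustrative sketch only: names are hypothetical, chosen for this example.
class LevenshteinPrefixSketch {

    // Distance between the first i characters of s1 and the first j of s2.
    static int distance(String s1, String s2, int i, int j) {
        // An empty prefix of s1 can only be converted by j insertions.
        if (i == 0) {
            return j;
        }
        // An empty prefix of s2 can only be reached by i deletions.
        if (j == 0) {
            return i;
        }

        // Cases 1 and 4: align the two rightmost characters.
        // The cost is 0 on a match and 1 when a replacement is needed.
        int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
        int alignBoth = distance(s1, s2, i - 1, j - 1) + cost;

        // Case 2: delete the rightmost character of the s1 prefix.
        int delete = distance(s1, s2, i - 1, j) + 1;

        // Case 3: insert the rightmost character of the s2 prefix.
        int insert = distance(s1, s2, i, j - 1) + 1;

        return Math.min(alignBoth, Math.min(delete, insert));
    }

    public static void main(String[] args) {
        String s1 = "sunday";
        String s2 = "saturday";
        // Prints 3
        System.out.println(distance(s1, s2, s1.length(), s2.length()));
    }
}
```

Passing indices instead of calling `substring` avoids creating a new string at every step, but the number of recursive calls is still exponential.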
Recursive Implementation:
Java
// Java implementation of recursive Levenshtein distance
// calculation
import java.util.*;

class LevenshteinDistanceRecursive {

    static int compute_Levenshtein_distance(String str1,
                                            String str2)
    {
        // If str1 is empty, the only conversion with the
        // minimum number of operations is to insert all
        // characters of str2 into str1.
        if (str1.isEmpty()) {
            return str2.length();
        }

        // If str2 is empty, the only conversion with the
        // minimum number of operations is to remove all
        // characters of str1.
        if (str2.isEmpty()) {
            return str1.length();
        }

        // Match or replace: compare the first characters
        // (cost 0 if equal, 1 otherwise) and recurse on
        // the remaining substrings.
        int replace = compute_Levenshtein_distance(
                          str1.substring(1), str2.substring(1))
                      + NumOfReplacement(str1.charAt(0), str2.charAt(0));

        // Insert: the first character of str2 is inserted
        // into str1, then recurse on the rest of str2.
        int insert = compute_Levenshtein_distance(
                         str1, str2.substring(1)) + 1;

        // Delete: the first character of str1 is removed,
        // then recurse on the rest of str1.
        int delete = compute_Levenshtein_distance(
                         str1.substring(1), str2) + 1;

        // Return the minimum of the three operations.
        return minm_edits(replace, insert, delete);
    }

    // Returns 0 if the two characters are equal
    // and 1 if a replacement is needed.
    static int NumOfReplacement(char c1, char c2)
    {
        return c1 == c2 ? 0 : 1;
    }

    // Receives the costs of the different operations
    // and returns the minimum value among them.
    static int minm_edits(int... nums)
    {
        return Arrays.stream(nums).min().orElse(
            Integer.MAX_VALUE);
    }

    // Driver Code
    public static void main(String[] args)
    {
        String s1 = "glomax";
        String s2 = "folmax";

        System.out.println(compute_Levenshtein_distance(s1, s2));
    }
}
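Time Complexity: Exponential. Every call on two non-empty strings branches into three recursive calls, and the recursion depth can reach m + n, so the number of calls is bounded by O(3^(m+n)), where m and n are the lengths of the first and second strings.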
Method 2: Dynamic Programming Approach
If we draw the recursion tree of the above solution, we can see that the same subproblems are computed again and again. Dynamic Programming applies in exactly this situation: the solution of each subproblem is stored the first time it is computed and reused afterwards.
- The memoized version follows the top-down approach: we first break the problem into subproblems, then calculate and store their values as the recursion returns.
- We can also solve this problem bottom-up: we solve the smallest subproblems first, then build the solutions of larger subproblems from them.
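The top-down (memoized) variant is not implemented in this article, so here is a minimal sketch of what it could look like. The class name `LevenshteinMemoSketch`, the helper `solve`, and the use of -1 as the "not yet computed" marker are all assumptions made for this example.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, chosen for this example.
class LevenshteinMemoSketch {

    static int distance(String s1, String s2) {
        // memo[i][j] caches the distance between the first i characters
        // of s1 and the first j characters of s2; -1 means "not computed".
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(s1, s2, s1.length(), s2.length(), memo);
    }

    private static int solve(String s1, String s2, int i, int j, int[][] memo) {
        // Base cases: one of the prefixes is empty.
        if (i == 0) {
            return j;
        }
        if (j == 0) {
            return i;
        }
        // Reuse a stored answer instead of recomputing it.
        if (memo[i][j] != -1) {
            return memo[i][j];
        }

        int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
        int best = Math.min(
            solve(s1, s2, i - 1, j - 1, memo) + cost,      // match or replace
            Math.min(solve(s1, s2, i - 1, j, memo) + 1,    // delete
                     solve(s1, s2, i, j - 1, memo) + 1));  // insert

        memo[i][j] = best;
        return best;
    }

    public static void main(String[] args) {
        // Prints 3
        System.out.println(distance("glomax", "folmax"));
    }
}
```

Because each pair (i, j) is solved at most once, this sketch does O(m*n) work, the same as the bottom-up table, at the cost of recursion depth up to m + n.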
Dynamic Programming Implementation (Optimised approach)
Java
// Java implementation of Levenshtein distance calculation
// Using Dynamic Programming (Optimised solution)
import java.util.*;

class LevenshteinDistanceDP {

    static int compute_Levenshtein_distanceDP(String str1,
                                              String str2)
    {
        // A 2-D matrix to store previously calculated
        // answers of subproblems in order
        // to obtain the final answer.
        // dp[i][j] is the distance between the first i
        // characters of str1 and the first j of str2.
        int[][] dp = new int[str1.length() + 1][str2.length() + 1];

        for (int i = 0; i <= str1.length(); i++) {
            for (int j = 0; j <= str2.length(); j++) {

                // If str1 is empty, the only conversion
                // with the minimum number of operations
                // is to insert all characters of str2.
                if (i == 0) {
                    dp[i][j] = j;
                }

                // If str2 is empty, the only conversion
                // with the minimum number of operations
                // is to remove all characters of str1.
                else if (j == 0) {
                    dp[i][j] = i;
                }

                else {
                    // Find the minimum among the three
                    // operations below.
                    dp[i][j] = minm_edits(
                        dp[i - 1][j - 1]
                            + NumOfReplacement(str1.charAt(i - 1),
                                               str2.charAt(j - 1)), // replace
                        dp[i - 1][j] + 1,  // delete
                        dp[i][j - 1] + 1); // insert
                }
            }
        }

        return dp[str1.length()][str2.length()];
    }

    // Returns 0 if the two characters are equal
    // and 1 if a replacement is needed.
    static int NumOfReplacement(char c1, char c2)
    {
        return c1 == c2 ? 0 : 1;
    }

    // Receives the costs of the different operations
    // and returns the minimum value among them.
    static int minm_edits(int... nums)
    {
        return Arrays.stream(nums).min().orElse(
            Integer.MAX_VALUE);
    }

    // Driver Code
    public static void main(String[] args)
    {
        String s1 = "glomax";
        String s2 = "folmax";

        System.out.println(compute_Levenshtein_distanceDP(s1, s2));
    }
}
Time Complexity: O(m*n), where m is the length of the first string and n is the length of the second string, because each of the (m+1)*(n+1) cells is filled in constant time.
Auxiliary Space: O(m*n), as the matrix used in the above implementation has dimensions (m+1)*(n+1).
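When only the distance is needed, and not the sequence of edits, the full matrix is not required: each row of the table depends only on the row directly above it. The following sketch keeps just two rows, which reduces the auxiliary space to O(n). It is a common variation and not part of the implementation above; the class name `LevenshteinTwoRowSketch` is an assumption made for this example.

```java
// Illustrative sketch only: names are hypothetical, chosen for this example.
class LevenshteinTwoRowSketch {

    static int distance(String s1, String s2) {
        int n = s2.length();

        // prev holds row i-1 of the DP table, curr holds row i.
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];

        // Row 0: converting the empty string into the first j
        // characters of s2 takes j insertions.
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s1.length(); i++) {
            // Column 0: converting the first i characters of s1
            // into the empty string takes i deletions.
            curr[0] = i;

            for (int j = 1; j <= n; j++) {
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                    prev[j - 1] + cost,           // match or replace
                    Math.min(prev[j] + 1,         // delete
                             curr[j - 1] + 1));   // insert
            }

            // The row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // After the final swap, prev holds the last computed row.
        return prev[n];
    }

    public static void main(String[] args) {
        // Prints 3
        System.out.println(distance("glomax", "folmax"));
    }
}
```

The time complexity stays O(m*n). The trade-off is that the table can no longer be traced back to recover which edits were made.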
Applications:
- Spell Checkers.
- Speech Recognition.
- DNA Analysis.
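As a rough illustration of the spell-checker use case, the sketch below suggests the dictionary word with the smallest edit distance to a misspelled input. The class name `SpellSuggestSketch`, the method `suggest`, and the tiny dictionary are all hypothetical and chosen only for this example; a real spell checker would typically use a much larger word list and additional ranking signals.

```java
// Illustrative sketch only: names and data are hypothetical.
class SpellSuggestSketch {

    // Bottom-up Levenshtein distance, using the same recurrence
    // as the dynamic programming approach.
    static int distance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            for (int j = 0; j <= b.length(); j++) {
                if (i == 0) {
                    dp[i][j] = j;
                } else if (j == 0) {
                    dp[i][j] = i;
                } else {
                    int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                    dp[i][j] = Math.min(
                        dp[i - 1][j - 1] + cost,
                        Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Returns the dictionary word closest to the input, or null if the
    // dictionary is empty. On a tie, the earliest word wins.
    static String suggest(String word, String[] dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = distance(word, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] dictionary = { "monday", "sunday", "saturday" };
        // Prints sunday
        System.out.println(suggest("sundey", dictionary));
    }
}
```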