Class LevenshteinDistance


  • @PublicAPI(stability=UNCOMMITTED,
               mayInstantiate=false,
               mayExtend=false,
               mayInvoke=true)
    public final class LevenshteinDistance
    extends Object
    This class provides an implementation of the Levenshtein distance algorithm, which may be used to determine the minimum number of changes required to transform one string into another. For the purpose of this algorithm, a change is defined as replacing one character with another, inserting a new character, or removing an existing character.

    The basic algorithm works as follows for a source string S of length X and a target string T of length Y:
    1. Create a matrix M with dimensions of X+1, Y+1.
    2. Fill the first row with sequentially-increasing values 0 through X.
    3. Fill the first column with sequentially-increasing values 0 through Y.
    4. Create a nested loop iterating over the characters in the strings. In the outer loop, iterate through characters in S from 0 to X-1 using an iteration counter I. In the inner loop, iterate through the characters in T from 0 to Y-1 using an iterator counter J. Calculate the following three things and place the smallest value in the matrix in row I+1 column J+1:
      • One more than the value in the matrix position immediately to the left (i.e., 1 + M[I][J+1]).
      • One more than the value in the matrix position immediately above (i.e., 1 + M[I+1][J]).
      • Define a value V to be zero if S[I] is the same as T[J], or one if they are different. Add that value to the value in the matrix position immediately above and to the left (i.e., V + M[I][J]).
    5. The Levenshtein distance value, which is the least number of changes needed to transform the source string into the target string, will be the value in the bottom right corner of the matrix (i.e., M[X][Y]).


    Note that this is a completely "clean room" implementation, developed from a description of the algorithm, rather than copying an existing implementation. Doing it in this way eliminates copyright and licensing concerns associated with using an existing implementation.
    • Method Detail

      • calculate

        public static int calculate​(String source,
                                    String target)
        Calculates the Levenshtein distance between the provided string values.
        Parameters:
        source - The source string to compare. It must not be null.
        target - The target string to compare. It must not be null.
        Returns:
        The minimum number of changes required to turn the source string into the target string.