I tried to find a method to compare strings according to their similarity, and first came across the Levenshtein distance which defines the distance (or degree of similarity) between two strings as the minimum number of additions, deletions, and substitutions of single characters needed to transform one string into the other.![]()
I found this implementation of the Levenshtein algorithm in T-SQL, but noted a couple of errors:
First, the function returns a VARCHAR result, where you would expect an INT.
Next, due to the restriction of parameters and variables to VARCHAR(50) and VARCHAR(100), only strings with a limited number of characters could be compared. (The code may have been written before the introduction of VARCHAR(MAX)).
Furthermore, the distance matrix is stored in an array of CHAR, which only allows for a maximum difference of 255 characters to be handled correctly.
The Levenshtein algorithm requires a function to find the minimum of 3 integers:
create function [dbo].[min3](@a int, @b int, @c int) returns int as begin declare @min int set @min = @a if @b < @min set @min = @b if @c < @min set @min = @c return @min end
And this is the code:
CREATE FUNCTION [dbo].[LEVENSHTEIN]( @s NVARCHAR(MAX), @t NVARCHAR(MAX) )
/*
Levenshtein Distance Algorithm: TSQL Implementation
by Joseph Gama
http://www.merriampark.com/ldtsql.htm
Returns the Levenshtein Distance between strings s1 and s2.
Original developer: Michael Gilleland http://www.merriampark.com/ld.htm
Translated to TSQL by Joseph Gama
Fixed by Herbert Oppolzer / devio
as described in http://devio.wordpress.com/2010/09/07/calculating-levenshtein-distance-in-tsql
*/
RETURNS INT AS
BEGIN
DECLARE @d NVARCHAR(MAX), @LD INT, @m INT, @n INT, @i INT, @j INT,
@s_i NCHAR(1), @t_j NCHAR(1),@cost INT
--Step 1
SET @n = LEN(@s)
SET @m = LEN(@t)
SET @d = REPLICATE(NCHAR(0),(@n+1)*(@m+1))
IF @n = 0
BEGIN
SET @LD = @m
GOTO done
END
IF @m = 0
BEGIN
SET @LD = @n
GOTO done
END
--Step 2
SET @i = 0
WHILE @i <= @n BEGIN
SET @d = STUFF(@d,@i+1,1,NCHAR(@i)) --d(i, 0) = i
SET @i = @i+1
END
SET @i = 0
WHILE @i <= @m BEGIN
SET @d = STUFF(@d,@i*(@n+1)+1,1,NCHAR(@i)) --d(0, j) = j
SET @i = @i+1
END
--Step 3
SET @i = 1
WHILE @i <= @n BEGIN
SET @s_i = SUBSTRING(@s,@i,1)
--Step 4
SET @j = 1
WHILE @j <= @m BEGIN
SET @t_j = SUBSTRING(@t,@j,1)
--Step 5
IF @s_i = @t_j
SET @cost = 0
ELSE
SET @cost = 1
--Step 6
SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,
NCHAR(dbo.MIN3(
UNICODE(SUBSTRING(@d,@j*(@n+1)+@i-1+1,1))+1,
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i+1,1))+1,
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i-1+1,1))+@cost)
))
SET @j = @j+1
END
SET @i = @i+1
END
--Step 7
SET @LD = UNICODE(SUBSTRING(@d,@n*(@m+1)+@m+1,1))
done:
RETURN @LD
END

[...] String Functions While searching for appropriate string comparison functions in TSQL, I came across these pages implementing a couple [...]
[...] Longest Common Subsequence in TSQL When trying to calculate the similarity of strings, the Levenshtein Distance comes up as one way to solve this problem, as it counts the number of additions, deletions and [...]