# Calculating Levenshtein Distance in TSQL

I tried to find a method to compare strings according to their similarity, and first came across the Levenshtein distance which defines the distance (or degree of similarity) between two strings as the minimum number of additions, deletions, and substitutions of single characters needed to transform one string into the other. I found this implementation of the Levenshtein algorithm in T-SQL, but noted a couple of errors:

First, the function returns a VARCHAR result, where you would expect an INT.

Next, due to the restriction of parameters and variables to VARCHAR(50) and VARCHAR(100), only strings with a limited number of characters could be compared. (The code may have been written before the introduction of VARCHAR(MAX)).

Furthermore, the distance matrix is stored in an array of CHAR, which only allows for a maximum difference of 255 characters to be handled correctly.

The Levenshtein algorithm requires a function to find the minimum of 3 integers:

```create function [dbo].[min3](@a int, @b int, @c int)
returns int as
begin
declare @min int
set @min = @a
if @b < @min set @min = @b
if @c < @min set @min = @c
return @min
end
```

And this is the code:

```CREATE FUNCTION [dbo].[LEVENSHTEIN]( @s NVARCHAR(MAX), @t NVARCHAR(MAX) )
/*
Levenshtein Distance Algorithm: TSQL Implementation
by Joseph Gama
http://www.merriampark.com/ldtsql.htm

Returns the Levenshtein Distance between strings s1 and s2.
Original developer: Michael Gilleland http://www.merriampark.com/ld.htm
Translated to TSQL by Joseph Gama

Fixed by Herbert Oppolzer / devio
as described in https://devio.wordpress.com/2010/09/07/calculating-levenshtein-distance-in-tsql
*/
RETURNS INT AS
BEGIN
DECLARE @d NVARCHAR(MAX), @LD INT, @m INT, @n INT, @i INT, @j INT,
@s_i NCHAR(1), @t_j NCHAR(1),@cost INT

--Step 1
SET @n = LEN(@s)
SET @m = LEN(@t)
SET @d = REPLICATE(NCHAR(0),(@n+1)*(@m+1))
IF @n = 0
BEGIN
SET @LD = @m
GOTO done
END
IF @m = 0
BEGIN
SET @LD = @n
GOTO done
END

--Step 2
SET @i = 0
WHILE @i <= @n BEGIN
SET @d = STUFF(@d,@i+1,1,NCHAR(@i))        --d(i, 0) = i
SET @i = @i+1
END

SET @i = 0
WHILE @i <= @m BEGIN
SET @d = STUFF(@d,@i*(@n+1)+1,1,NCHAR(@i))    --d(0, j) = j
SET @i = @i+1
END

--Step 3
SET @i = 1
WHILE @i <= @n BEGIN
SET @s_i = SUBSTRING(@s,@i,1)

--Step 4
SET @j = 1
WHILE @j <= @m BEGIN
SET @t_j = SUBSTRING(@t,@j,1)
--Step 5
IF @s_i = @t_j
SET @cost = 0
ELSE
SET @cost = 1
--Step 6
SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,
NCHAR(dbo.MIN3(
UNICODE(SUBSTRING(@d,@j*(@n+1)+@i-1+1,1))+1,
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i+1,1))+1,
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i-1+1,1))+@cost)
))
SET @j = @j+1
END
SET @i = @i+1
END

--Step 7
SET @LD = UNICODE(SUBSTRING(@d,@n*(@m+1)+@m+1,1))

done:
RETURN @LD
END
```

## 4 thoughts on “Calculating Levenshtein Distance in TSQL”

1. Pingback: TSQL String Functions « devioblog

2. Works like a charm, thanks

This site uses Akismet to reduce spam. Learn how your comment data is processed.