When trying to calculate the similarity of strings, the Levenshtein Distance comes up as one way to solve this problem, as it counts the number of additions, deletions and substitutions of characters to transform one string into the other.![]()
The inverse solution is to count the number of characters that are the same in both strings in the same order, known as the Longest Common Subsequence problem (LCS).
The code presented here uses the same technique as the Levenshtein algorithm in storing a two-dimensional array of integers as an NVARCHAR(MAX), i.e. a string of Unicode characters encoding the integers in a matrix.
First, we need a function Max2() to retrieve the maximum of two integers:
CREATE FUNCTION [dbo].[Max2](@a int, @b int) RETURNS INT AS BEGIN IF @a > @b RETURN @a RETURN @b END
and then the T-SQL version of the algorithm:
CREATE FUNCTION [dbo].[LCS]( @s NVARCHAR(MAX), @t NVARCHAR(MAX) )
RETURNS INT AS
BEGIN
DECLARE @d NVARCHAR(MAX), @LD INT, @m INT, @n INT, @i INT, @j INT,
@s_i NCHAR(1), @t_j NCHAR(1)
SET @n = LEN(@s)
IF @n = 0 RETURN 0
SET @m = LEN(@t)
IF @m = 0 RETURN 0
SET @d = REPLICATE(NCHAR(0),(@n+1)*(@m+1))
SET @i = 1
WHILE @i <= @n BEGIN
SET @s_i = SUBSTRING(@s,@i,1)
SET @j = 1
WHILE @j <= @m BEGIN
SET @t_j = SUBSTRING(@t,@j,1)
IF @s_i = @t_j
SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,
NCHAR(UNICODE(
SUBSTRING(@d, (@j-1)*(@n+1)+@i-1+1, 1)
)+1))
ELSE
SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,NCHAR(dbo.Max2(
UNICODE(SUBSTRING(@d,@j*(@n+1)+@i-1+1,1)),
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i+1,1)))))
SET @j = @j+1
END
SET @i = @i+1
END
SET @LD = UNICODE(SUBSTRING(@d,@n*(@m+1)+@m+1,1))
RETURN @LD
END
