## Calculating the Length of the Longest Common Subsequence in TSQL

When trying to calculate the similarity of strings, the Levenshtein Distance comes up as one way to solve this problem, as it counts the number of additions, deletions and substitutions of characters to transform one string into the other. The inverse solution is to count the number of characters that are the same in both strings in the same order, known as the Longest Common Subsequence problem (LCS).

The code presented here uses the same technique as the Levenshtein algorithm in storing a two-dimensional array of integers as an NVARCHAR(MAX), i.e. a string of Unicode characters encoding the integers in a matrix.

First, we need a function Max2() to retrieve the maximum of two integers:

```CREATE FUNCTION [dbo].[Max2](@a int, @b int)
RETURNS INT AS
BEGIN
IF @a > @b
RETURN @a
RETURN @b
END```

and then the T-SQL version of the algorithm:

```CREATE FUNCTION [dbo].[LCS]( @s NVARCHAR(MAX), @t NVARCHAR(MAX) )
RETURNS INT AS
BEGIN
DECLARE @d NVARCHAR(MAX), @LD INT, @m INT, @n INT, @i INT, @j INT,
@s_i NCHAR(1), @t_j NCHAR(1)

SET @n = LEN(@s)
IF @n = 0 RETURN 0

SET @m = LEN(@t)
IF @m = 0 RETURN 0

SET @d = REPLICATE(NCHAR(0),(@n+1)*(@m+1))

SET @i = 1
WHILE @i <= @n BEGIN
SET @s_i = SUBSTRING(@s,@i,1)

SET @j = 1
WHILE @j <= @m BEGIN
SET @t_j = SUBSTRING(@t,@j,1)

IF @s_i = @t_j

SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,
NCHAR(UNICODE(
SUBSTRING(@d, (@j-1)*(@n+1)+@i-1+1, 1)
)+1))

ELSE

SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,NCHAR(dbo.Max2(
UNICODE(SUBSTRING(@d,@j*(@n+1)+@i-1+1,1)),
UNICODE(SUBSTRING(@d,(@j-1)*(@n+1)+@i+1,1)))))

SET @j = @j+1
END
SET @i = @i+1
END

SET @LD = UNICODE(SUBSTRING(@d,@n*(@m+1)+@m+1,1))
RETURN @LD
END```

This site uses Akismet to reduce spam. Learn how your comment data is processed.