Multiple String REPLACEs with T-SQL CTEs

I had to solve the problem to update data in a table of NVARCHAR strings where several substrings had to be replaced with their respective counterparts, or translations.

Of course the easy way is to apply as many REPLACE functions as you need, but why not try a more generic way?

The string table in its simplest form looks like this:

CREATE TABLE #Data (
	data NVARCHAR(50)
)

INSERT INTO #Data (data) VALUES ('banana')
INSERT INTO #Data (data) VALUES ('apples')

and these are the translations:

CREATE TABLE #T (
    old	NVARCHAR(50),
    new NVARCHAR(50)
)

INSERT INTO #T VALUES('p', 'Q')
INSERT INTO #T VALUES('s', 'Z')
INSERT INTO #T VALUES('a', 'A')

I came up with the following CTE statement:

The CTE stores the original strings, their intermediate translations, and their final translations. A level counter is used to find the entry with the highest count of translations for each string, indicating the final result.

The CTE starts with the original data in all three columns, and level set to zero.

From the original data, all REPLACE results of all translatable texts are calculated. Replacing strings are compared using a case-sensitive collation:

WITH CTE (org, calc, DATA, level) AS
(
    SELECT data, data, data, 0
    FROM	#Data

    UNION ALL

    SELECT CTE.org, CTE.data,
        CONVERT(NVARCHAR(50), REPLACE(CTE.data, #T.old, #T.new)), CTE.level + 1
    FROM	CTE
    INNER JOIN #T ON CTE.data LIKE '%' + #T.old + '%'
        COLLATE Latin1_General_CS_AS
)
SELECT DISTINCT org, data, level
FROM CTE
WHERE level =
    (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)

Finally, clean up the temp tables:

DROP TABLE #Data
DROP TABLE #T

This code returns the expected result:

org data level
apples AQQleZ 3
banana bAnAnA 1

3 Responses to Multiple String REPLACEs with T-SQL CTEs

  1. […] String REPLACEs with T-SQL CTEs (2) My algorithm on replacing multiple strings in an SQL Server table works in principle, but soon after publishing the article, I noticed that it […]

  2. Nathan Skerl says:

    You can also try a set based approach. I use this often whenever I find myself coding nested replace calls:

    update d
    set data = replace(d.data, t.old, t.new)
    from #Data d
    cross
    apply #T t
    where charindex(t.old, d.data) > 0;

  3. Nathan says:

    using your example,

    set @data = ‘apples’

    select @data = replace(@data, old, new)
    from @t
    where charindex(old, @data) > 0;

    select @data

    You get the idea 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: