Transliterating Cyrillic into Latin alphabet with T-SQL

After sketching my CTE-based multi-REPLACE procedure, I wanted to test in on a real-life scenario. First example is a translation table to convert the cyrillic alphabet into Latin.

First we need to define the data and the translation tables using a case-sensitive cyrillic collation with COLLATE Cyrillic_General_CS_AS:

CREATE TABLE #T (
    level int identity(1,1),
    old    NVARCHAR(50) COLLATE Cyrillic_General_CS_AS,
    new    NVARCHAR(50) COLLATE Cyrillic_General_CS_AS
)

CREATE TABLE #Data (
    data    NVARCHAR(50) COLLATE Cyrillic_General_CS_AS
)

Next, we define the transliterations. See here for a list of standardized transliterations.

INSERT INTO #T VALUES(N'ый', 'y')

INSERT INTO #T VALUES(N'ЫЙ', 'Y')

INSERT INTO #T VALUES(N'а', 'a')
INSERT INTO #T VALUES(N'б', 'b')
INSERT INTO #T VALUES(N'в', 'v')
INSERT INTO #T VALUES(N'г', 'g')
INSERT INTO #T VALUES(N'д', 'd')
INSERT INTO #T VALUES(N'е', 'e')
INSERT INTO #T VALUES(N'ё', 'yo')
INSERT INTO #T VALUES(N'ж', 'zh')
INSERT INTO #T VALUES(N'з', 'z')
INSERT INTO #T VALUES(N'и', 'i')
INSERT INTO #T VALUES(N'й', 'y')
INSERT INTO #T VALUES(N'к', 'k')
INSERT INTO #T VALUES(N'л', 'l')
INSERT INTO #T VALUES(N'м', 'm')
INSERT INTO #T VALUES(N'н', 'n')
INSERT INTO #T VALUES(N'о', 'o')
INSERT INTO #T VALUES(N'п', 'p')
INSERT INTO #T VALUES(N'р', 'r')
INSERT INTO #T VALUES(N'с', 's')
INSERT INTO #T VALUES(N'т', 't')
INSERT INTO #T VALUES(N'у', 'u')
INSERT INTO #T VALUES(N'ф', 'f')
INSERT INTO #T VALUES(N'х', 'kh')
INSERT INTO #T VALUES(N'ц', 'c')
INSERT INTO #T VALUES(N'ч', 'ch')
INSERT INTO #T VALUES(N'ш', 'sh')
INSERT INTO #T VALUES(N'щ', 'shch')
INSERT INTO #T VALUES(N'ъ', '')
INSERT INTO #T VALUES(N'ы', 'y')
INSERT INTO #T VALUES(N'ь', '''')
INSERT INTO #T VALUES(N'э', 'e')
INSERT INTO #T VALUES(N'ю', 'yu')
INSERT INTO #T VALUES(N'я', 'ya')

INSERT INTO #T VALUES(N'А', 'A')
INSERT INTO #T VALUES(N'Б', 'B')
INSERT INTO #T VALUES(N'В', 'V')
INSERT INTO #T VALUES(N'Г', 'G')
INSERT INTO #T VALUES(N'Д', 'D')
INSERT INTO #T VALUES(N'Е', 'E')
INSERT INTO #T VALUES(N'Ё', 'YO')
INSERT INTO #T VALUES(N'Ж', 'ZH')
INSERT INTO #T VALUES(N'З', 'Z')
INSERT INTO #T VALUES(N'И', 'I')
INSERT INTO #T VALUES(N'Й', 'Y')
INSERT INTO #T VALUES(N'К', 'K')
INSERT INTO #T VALUES(N'Л', 'L')
INSERT INTO #T VALUES(N'М', 'M')
INSERT INTO #T VALUES(N'Н', 'N')
INSERT INTO #T VALUES(N'О', 'O')
INSERT INTO #T VALUES(N'П', 'P')
INSERT INTO #T VALUES(N'Р', 'R')
INSERT INTO #T VALUES(N'С', 'S')
INSERT INTO #T VALUES(N'Т', 'T')
INSERT INTO #T VALUES(N'У', 'U')
INSERT INTO #T VALUES(N'Ф', 'F')
INSERT INTO #T VALUES(N'Х', 'KH')
INSERT INTO #T VALUES(N'Ц', 'C')
INSERT INTO #T VALUES(N'Ч', 'CH')
INSERT INTO #T VALUES(N'Ш', 'SH')
INSERT INTO #T VALUES(N'Щ', 'SHCH')
INSERT INTO #T VALUES(N'Ъ', '')
INSERT INTO #T VALUES(N'Ы', 'Y')
INSERT INTO #T VALUES(N'Ь', '''')
INSERT INTO #T VALUES(N'Э', 'E')
INSERT INTO #T VALUES(N'Ю', 'YU')
INSERT INTO #T VALUES(N'Я', 'YA')

Note that the order of INSERT statements is relevant to the transliterated result. That’s we multi-character replacements are stated first.

I chose some random words from the Russian Wikipedia as test cases:

INSERT INTO #Data (data) VALUES (N'Добро пожаловать')
INSERT INTO #Data (data) VALUES (N'Википедия')

INSERT INTO #Data (data) VALUES (N'День Конституции Российской Федерации')
INSERT INTO #Data (data) VALUES (N'География')
INSERT INTO #Data (data) VALUES (N'Америка: Канада (Квебек) — Мексика')
INSERT INTO #Data (data) VALUES (N'Философия')

Using the transliteration CTE, we get the following results:

Америка: Канада (Квебек) — Мексика Amerika: Kanada (Kvebek) — Meksika
Википедия Vikipediya
География Geografiya
День Конституции Российской Федерации Den’ Konstitucii Rossiyskoy Federacii
Добро пожаловать Dobro pozhalovat’
Философия Filosofiya
Advertisements

2 Responses to Transliterating Cyrillic into Latin alphabet with T-SQL

  1. […] Latin alphabet using T-SQL One more example of replacing multiple strings using a CTE. After Russian / Cyrillic, it’s time for the Japanese […]

  2. Bensn says:

    Thank you! Very helpful!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: