One more example of replacing multiple strings using a CTE. After Russian / Cyrillic, it’s time for the Japanese syllabary.
SQL Server provides a couple of Japanese collations of which I found Japanese_Unicode_CI_AS useful: It is insensitive to the distinction between hiragana and katakana, but accent-sensitive to nigori and handakuten.
Let’s start with the table definitions:
CREATE TABLE #T ( level int identity(1,1), old NVARCHAR(50) COLLATE Japanese_Unicode_CI_AS, new NVARCHAR(50) COLLATE Japanese_Unicode_CI_AS ) CREATE TABLE #Data ( data NVARCHAR(50) COLLATE Japanese_Unicode_CI_AS )
and the data to be translated, taken randomly from the Japanese Wikipedia:
INSERT INTO #Data (data) VALUES (N'ひらがな') INSERT INTO #Data (data) VALUES (N'ヒラガナ') INSERT INTO #Data (data) VALUES (N'とうきょう') INSERT INTO #Data (data) VALUES (N'せんせい') INSERT INTO #Data (data) VALUES (N'ウィキペディア') INSERT INTO #Data (data) VALUES (N'ウィキポータル') INSERT INTO #Data (data) VALUES (N'ヨーロッパ') INSERT INTO #Data (data) VALUES (N'フランス') INSERT INTO #Data (data) VALUES (N'イタリア') INSERT INTO #Data (data) VALUES (N'ブルガリア') INSERT INTO #Data (data) VALUES (N'アメリカ') INSERT INTO #Data (data) VALUES (N'アジア')
The CTE as described in the previous posts:
; WITH CTE (org, calc, data, level) AS ( SELECT data, data, data, 1 FROM #Data UNION ALL SELECT CTE.org, CTE.data, CONVERT(NVARCHAR(50), REPLACE(CTE.data, #T.old, #T.new)), CTE.level + 1 FROM CTE INNER JOIN #T ON #T.level = CTE.level ) SELECT DISTINCT org, data FROM CTE WHERE level = (SELECT MAX(LEVEL) FROM CTE c WHERE CTE.org = c.org) OPTION (MAXRECURSION 1000) DROP TABLE #Data DROP TABLE #T
Since the transliteration table has more than 100 records (the default maximum recursion level is 100), the option (MAXRECURSION 1000) has to be added.
This results in the following transliteration of some Japanese kana words:
アジア | ajia |
アメリカ | amerika |
イタリア | itaria |
ウィキペディア | wikipedia |
ウィキポータル | wikipootaru |
さん | san |
せんせい | sensei |
とうきょう | toukyou |
ヒラガナ | hiragana |
フランス | furansu |
ブルガリア | burugaria |
ヨーロッパ | yooroppa |
This is the contents of the transliteration table:
We start with 2-character transliterations
INSERT INTO #T VALUES(N'きゃ', 'kya') INSERT INTO #T VALUES(N'きゅ', 'kyu') INSERT INTO #T VALUES(N'きょ', 'kyo') INSERT INTO #T VALUES(N'しゃ', 'sha') INSERT INTO #T VALUES(N'しゅ', 'shu') INSERT INTO #T VALUES(N'しょ', 'sho') INSERT INTO #T VALUES(N'ちゃ', 'cha') INSERT INTO #T VALUES(N'ちゅ', 'chu') INSERT INTO #T VALUES(N'ちょ', 'cho') INSERT INTO #T VALUES(N'てぃ', 'ti') INSERT INTO #T VALUES(N'デぃ', 'di') INSERT INTO #T VALUES(N'にゃ', 'nya') INSERT INTO #T VALUES(N'にゅ', 'nyu') INSERT INTO #T VALUES(N'にょ', 'nyo') INSERT INTO #T VALUES(N'ひゃ', 'hya') INSERT INTO #T VALUES(N'ひゅ', 'hyu') INSERT INTO #T VALUES(N'ひょ', 'hyo') INSERT INTO #T VALUES(N'みゃ', 'mya') INSERT INTO #T VALUES(N'みゅ', 'myu') INSERT INTO #T VALUES(N'みょ', 'myo') INSERT INTO #T VALUES(N'りゃ', 'rya') INSERT INTO #T VALUES(N'りゅ', 'ryu') INSERT INTO #T VALUES(N'りょ', 'ryo') INSERT INTO #T VALUES(N'ぎゃ', 'gya') INSERT INTO #T VALUES(N'ぎゅ', 'gyu') INSERT INTO #T VALUES(N'ぎょ', 'gyo') INSERT INTO #T VALUES(N'じゃ', 'ja') INSERT INTO #T VALUES(N'じゅ', 'ju') INSERT INTO #T VALUES(N'じょ', 'jo') INSERT INTO #T VALUES(N'ぢゃ', 'dja') INSERT INTO #T VALUES(N'ぢゅ', 'dju') INSERT INTO #T VALUES(N'ぢょ', 'djo') INSERT INTO #T VALUES(N'びゃ', 'bya') INSERT INTO #T VALUES(N'びゅ', 'byu') INSERT INTO #T VALUES(N'びょ', 'byo') INSERT INTO #T VALUES(N'ぴゃ', 'pya') INSERT INTO #T VALUES(N'ぴゅ', 'pyu') INSERT INTO #T VALUES(N'ぴょ', 'pyo') INSERT INTO #T VALUES(N'ウィ', 'wi')
Now the regular transliteration of the syllables:
INSERT INTO #T VALUES(N'あ', 'a') INSERT INTO #T VALUES(N'い', 'i') INSERT INTO #T VALUES(N'う', 'u') INSERT INTO #T VALUES(N'え', 'e') INSERT INTO #T VALUES(N'お', 'o') INSERT INTO #T VALUES(N'か', 'ka') INSERT INTO #T VALUES(N'き', 'ki') INSERT INTO #T VALUES(N'く', 'ku') INSERT INTO #T VALUES(N'け', 'ke') INSERT INTO #T VALUES(N'こ', 'ki') INSERT INTO #T VALUES(N'さ', 'sa') INSERT INTO #T VALUES(N'し', 'shi') INSERT INTO #T VALUES(N'す', 'su') INSERT INTO #T VALUES(N'せ', 'se') INSERT INTO #T VALUES(N'そ', 'so') INSERT INTO #T VALUES(N'た', 'ta') INSERT INTO #T VALUES(N'ち', 'chi') INSERT INTO #T VALUES(N'つ', 'tsu') INSERT INTO #T VALUES(N'て', 'te') INSERT INTO #T VALUES(N'と', 'to') INSERT INTO #T VALUES(N'な', 'na') INSERT INTO #T VALUES(N'に', 'ni') INSERT INTO #T VALUES(N'ぬ', 'nu') INSERT INTO #T VALUES(N'ね', 'ne') INSERT INTO #T VALUES(N'の', 'no') INSERT INTO #T VALUES(N'は', 'ha') INSERT INTO #T VALUES(N'ひ', 'hi') INSERT INTO #T VALUES(N'ふ', 'fu') INSERT INTO #T VALUES(N'へ', 'he') INSERT INTO #T VALUES(N'ほ', 'ho') INSERT INTO #T VALUES(N'ま', 'ma') INSERT INTO #T VALUES(N'み', 'mi') INSERT INTO #T VALUES(N'む', 'mu') INSERT INTO #T VALUES(N'め', 'me') INSERT INTO #T VALUES(N'も', 'mo') INSERT INTO #T VALUES(N'や', 'ya') INSERT INTO #T VALUES(N'ゆ', 'yu') INSERT INTO #T VALUES(N'よ', 'yo') INSERT INTO #T VALUES(N'ら', 'ra') INSERT INTO #T VALUES(N'り', 'ri') INSERT INTO #T VALUES(N'る', 'ru') INSERT INTO #T VALUES(N'れ', 're') INSERT INTO #T VALUES(N'ろ', 'ro') INSERT INTO #T VALUES(N'わ', 'wa') INSERT INTO #T VALUES(N'ゐ', 'wi') INSERT INTO #T VALUES(N'ゑ', 'we') INSERT INTO #T VALUES(N'を', 'wo') INSERT INTO #T VALUES(N'ん', 'n') INSERT INTO #T VALUES(N'が', 'ga') INSERT INTO #T VALUES(N'ぎ', 'gi') INSERT INTO #T VALUES(N'ぐ', 'gu') INSERT INTO #T VALUES(N'げ', 'ge') INSERT INTO #T VALUES(N'ご', 'go') INSERT INTO #T VALUES(N'ざ', 'za') INSERT INTO #T VALUES(N'じ', 'ji') INSERT INTO #T VALUES(N'ず', 'zu') INSERT INTO #T VALUES(N'ぜ', 'ze') INSERT INTO #T VALUES(N'ぞ', 'zo') INSERT INTO #T VALUES(N'だ', 'da') INSERT INTO #T VALUES(N'ぢ', 'dji') INSERT INTO #T VALUES(N'づ', 'dzu') INSERT INTO #T VALUES(N'で', 'de') INSERT INTO #T VALUES(N'ど', 'do') INSERT INTO #T VALUES(N'ば', 'ba') INSERT INTO #T VALUES(N'び', 'bi') INSERT INTO #T VALUES(N'ぶ', 'bu') INSERT INTO #T VALUES(N'べ', 'be') INSERT INTO #T VALUES(N'ぼ', 'bo') INSERT INTO #T VALUES(N'ぱ', 'pa') INSERT INTO #T VALUES(N'ぴ', 'pi') INSERT INTO #T VALUES(N'ぷ', 'pu') INSERT INTO #T VALUES(N'ぺ', 'pe') INSERT INTO #T VALUES(N'ぽ', 'po')
And finally we add support for small tsu and katakana vowel lengthening:
INSERT INTO #T VALUES(N'っk', 'kk') INSERT INTO #T VALUES(N'っp', 'pp') INSERT INTO #T VALUES(N'っs', 'ss') INSERT INTO #T VALUES(N'っt', 'tt') INSERT INTO #T VALUES(N'っc', 'tc') INSERT INTO #T VALUES(N'aー', 'aa') INSERT INTO #T VALUES(N'eー', 'ee') INSERT INTO #T VALUES(N'iー', 'ii') INSERT INTO #T VALUES(N'oー', 'oo') INSERT INTO #T VALUES(N'uー', 'uu')
The same here! Thank you!