Supplementary Characters in SQL Server

I originally intended to write an update to my post Collation Support in SQL Server Versions to include coverage of SQL Server 2012 collations. I was surprised to find that the number of supported collation names (reduced by the various suffixes such as _CI, _CS, etc.) did not change from SQL Server 2008.

What did change though was that there is a new collation suffix named “_SC” which stands for Supplementary Character support.

Once you declare a database with an _SC collation, the various string functions in SQL Server are not only Unicode-aware, but also Supplementary Character-aware (NCHAR, UNICODE, LEN).

Compare these statements in SQL Server 2012

CREATE DATABASE test_sc COLLATE  Latin1_General_100_CI_AI_SC; 
-- 2012 collation

DECLARE @d nvarchar(10)  = N'𣅿' ;
SELECT NCHAR(0xD84C) + NCHAR(0xDD7F); 
       -- Old style method using surrogate characters
SELECT NCHAR(143743), 
       UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d)

SET @d = N'𠆾𠇀𠇃';
SELECT @d, UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d)

resulting in

𣅿

𣅿       143743    𣅿    1    4

𠆾𠇀𠇃    131518    𠆾    3    12

with the results on a non-SC database in SQL Server 2012 or earlier

𣅿

NULL     55372    �    2    4

𠆾𠇀𠇃    55360    �    6    12

Example data taken from MSDN NCHAR() and SolidQ.

(Note that Management Studio 2008 does not display the characters in the query window, but does correctly show them in the results window)

You find that the functions NCHAR, UNICODE, and LEN support characters for code-points outside the Unicode BMP, whereas non-SC collations handle supplementary characters using the surrogate characters. For a list of changes in semantics, see MSDN.

4 thoughts on “Supplementary Characters in SQL Server

  1. Pingback: Supplementary Characters in C# « devioblog

  2. Pingback: Unicode Versions supported in SQL Server (2000-2012) « devioblog

  3. Pingback: “Why can’t I store/display [insert name] language data in SQL Server?” « devioblog

  4. Pingback: Length of UTF-8 VARCHAR in SQL Server | devioblog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.