Supplementary Characters in SQL Server

I originally intended to write an update to my post Collation Support in SQL Server Versions to include coverage of SQL Server 2012 collations. I was surprised to find that the number of supported collation names (reduced by the various suffixes such as _CI, _CS, etc.) did not change from SQL Server 2008.

What did change though was that there is a new collation suffix named “_SC” which stands for Supplementary Character support.

Once you declare a database with an _SC collation, the various string functions in SQL Server are not only Unicode-aware, but also Supplementary Character-aware (NCHAR, UNICODE, LEN).

Compare these statements in SQL Server 2012

CREATE DATABASE test_sc COLLATE  Latin1_General_100_CI_AI_SC; 
-- 2012 collation

DECLARE @d nvarchar(10)  = N'𣅿' ;
       -- Old style method using surrogate characters
SELECT NCHAR(143743), 

SET @d = N'𠆾𠇀𠇃';

resulting in


𣅿       143743    𣅿    1    4

𠆾𠇀𠇃    131518    𠆾    3    12

with the results on a non-SC database in SQL Server 2012 or earlier


NULL     55372    �    2    4

𠆾𠇀𠇃    55360    �    6    12

Example data taken from MSDN NCHAR() and SolidQ.

(Note that Management Studio 2008 does not display the characters in the query window, but does correctly show them in the results window)

You find that the functions NCHAR, UNICODE, and LEN support characters for code-points outside the Unicode BMP, whereas non-SC collations handle supplementary characters using the surrogate characters. For a list of changes in semantics, see MSDN.

3 Responses to Supplementary Characters in SQL Server

  1. […] by my posting about supplementary characters in SQL Server 2012, I checked the capabilities of .Net to support supplementary […]

  2. […] refer to a Unicode version number. Even if there is not change, SQL Server 2012 introduced native support for supplementary characters. Like this:LikeBe the first to like this […]

  3. […] in SQL Server using the UCS-2 encoding, meaning every character uses 2 bytes to store, unless the database is created using an Supplementary Characters collation, causing the database to use the UTF-16 encoding (same encoding for the BMP, but support for all […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: