I originally intended to write an update to my post Collation Support in SQL Server Versions to include coverage of SQL Server 2012 collations. I was surprised to find that the number of supported collation names (reduced by the various suffixes such as _CI, _CS, etc.) did not change from SQL Server 2008.
What did change though was that there is a new collation suffix named “_SC” which stands for Supplementary Character support.
Once you declare a database with an _SC collation, the various string functions in SQL Server are not only Unicode-aware, but also Supplementary Character-aware (NCHAR, UNICODE, LEN).
Compare these statements in SQL Server 2012
CREATE DATABASE test_sc COLLATE Latin1_General_100_CI_AI_SC; -- 2012 collation DECLARE @d nvarchar(10) = N'𣅿' ; SELECT NCHAR(0xD84C) + NCHAR(0xDD7F); -- Old style method using surrogate characters SELECT NCHAR(143743), UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d) SET @d = N'𠆾𠇀𠇃'; SELECT @d, UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d)
resulting in
𣅿 𣅿 143743 𣅿 1 4 𠆾𠇀𠇃 131518 𠆾 3 12
with the results on a non-SC database in SQL Server 2012 or earlier
𣅿 NULL 55372 � 2 4 𠆾𠇀𠇃 55360 � 6 12
Example data taken from MSDN NCHAR() and SolidQ.
(Note that Management Studio 2008 does not display the characters in the query window, but does correctly show them in the results window)
You find that the functions NCHAR, UNICODE, and LEN support characters for code-points outside the Unicode BMP, whereas non-SC collations handle supplementary characters using the surrogate characters. For a list of changes in semantics, see MSDN.
Pingback: Supplementary Characters in C# « devioblog
Pingback: Unicode Versions supported in SQL Server (2000-2012) « devioblog
Pingback: “Why can’t I store/display [insert name] language data in SQL Server?” « devioblog
Pingback: Length of UTF-8 VARCHAR in SQL Server | devioblog