Convert Unicode Hex Codepoint to Unicode Character in SQL Server

The Unicode standard uses the notation U+[x[x]]xxxx to identify each Unicode character, i.e. “U+” followed by four to six hex digits (up to U+10FFFF).

If we remove the first two characters (leaving the bare hex form used in the UCD), only hex digits will remain, and we need to convert them to an integer. Using this HexStrToVarBinary function, we get the binary representation of the hex number, which can then be CONVERTed to an int or bigint.
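On SQL Server 2008 and later, the same hex-to-int step can also be done without a helper function, since CONVERT style 2 parses a hex string (without a 0x prefix) into VARBINARY. A minimal sketch; the @Notation value and the left-padding are my own illustration:

DECLARE @Notation VARCHAR(8) = 'U+1F600'

-- Strip the leading 'U+', then left-pad to 8 hex digits (4 bytes)
-- so the VARBINARY converts cleanly to INT
DECLARE @UniHex VARCHAR(6) = SUBSTRING(@Notation, 3, 6)

SELECT  CONVERT(INT, CONVERT(VARBINARY(4), RIGHT('00000000' + @UniHex, 8), 2)) AS UniInt
-- returns 128512, i.e. 0x1F600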

Next, we use the built-in NCHAR() function to convert the integer to its Unicode character.

SELECT  UniHex,                                                         -- the hex digits, e.g. '0041'
        CONVERT(INT, dbo.HexStrToVarBinary(UniHex)) AS UniInt,         -- the codepoint as an integer
        NCHAR(CONVERT(INT, dbo.HexStrToVarBinary(UniHex))) AS UniNChar -- the character itself
FROM    UniHexTable
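
For completeness, a minimal sketch of the UniHexTable assumed by the query above (the table name comes from the query; the column type and sample rows are my own illustration):

CREATE TABLE UniHexTable (UniHex VARCHAR(6) NOT NULL)  -- hex digits only, no 'U+' prefix

INSERT INTO UniHexTable (UniHex) VALUES ('0041')  -- LATIN CAPITAL LETTER A
INSERT INTO UniHexTable (UniHex) VALUES ('00E9')  -- LATIN SMALL LETTER E WITH ACUTE
INSERT INTO UniHexTable (UniHex) VALUES ('20AC')  -- EURO SIGN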

Note that the NCHAR() function only accepts int arguments from 0 to 65535, so supplementary characters (codepoints above U+FFFF) are not handled directly, and NCHAR() returns NULL for them.
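
A common workaround is to compute the UTF-16 surrogate pair yourself and emit both halves with NCHAR(). A sketch using the standard surrogate formulas; @CodePoint is a hypothetical example value, and this builds the pair manually rather than relying on any official supplementary-character support:

DECLARE @CodePoint INT = 0x1F600  -- a supplementary codepoint (above U+FFFF)

-- Standard UTF-16 math: split the codepoint into its high and low
-- surrogate halves, then concatenate the two NCHAR() results
SELECT  NCHAR(0xD800 + (@CodePoint - 0x10000) / 0x400)    -- high surrogate (0xD83D here)
      + NCHAR(0xDC00 + (@CodePoint - 0x10000) % 0x400)    -- low surrogate (0xDE00 here)
        AS UniNChar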

The underlying reason is that SQL Server stores Unicode data encoded as UCS-2: a surrogate pair can be stored, but the engine has no true support for supplementary characters and treats the pair as two separate characters.

Unicode Versions supported in SQL Server

SQL Server    Unicode
2000          3.0 (?)
2005          3.2
2008          5.0

(I did not find an explicit statement for SQL Server 2000, so this is a guess based on the release dates of SQL Server 2000 and Unicode 3.0.)
