I came across this and similar questions so many times on Stack Overflow:
- Saving russian text in SQL server
- Displaying hebrew characters with SQL collation SQL_Latin1_General_CP850_BIN2
- SQL Server Express 2005 unicode characters
- Cannot insert russian text into SQL server 2005 db
- How to save Unicode text using stored procedure in SQL Server 2008
- Why are my Chinese characters not displayed correctly in c# string
- Copy/Paste Chinese character into SQL2005, displays as box
- Persian numbers in SQL Server 2005
- Loss of Unicode data – due to collation change?
- T-SQL Replace “cyrillic symbols” returns ‘????’
- Unicode characters causing issues in SQL Server 2005 string comparison
My guess is that most of these question are due to a lack of understanding of Unicode and how the various components of an application must work together to display Unicode texts correctly.
Storage
If you store Unicode data in the database, you have to use the Unicode data types NVARCHAR(n), NVARCHAR(MAX), NTEXT (obsolete) or NCHAR (fixed-length strings). Unicode data is stored in SQL Server using the UCS-2 encoding, meaning every character uses 2 bytes to store, unless the database is created using an Supplementary Characters collation, causing the database to use the UTF-16 encoding (same encoding for the BMP, but support for all Unicode planes).
Except for the _SC collations, defining a collation does not have an influence on how Unicode data is stored.
In general, a collation defines how text data is compared and sorted. The collation can be set on database level (default for all text columns), on column level, and on operand level (i.e. the value of a comparison or ORDER BY clause) using the COLLATE clause.
Unicode Values
Unicode values can either be variables (or parameters) declared with the Unicode data types listed above, or a Unicode string literals using the N” notation. Most of the times, the questions above arise from either writing literals without the N”, or from believing that variable names must be attached the N somehow.
Let’s look at some examples:
- SELECTing string literals
SELECT N'тест', 'тест' тест ????
- Assigning (non-)Unicode values
declare @nvc1 nvarchar(50), @nvc2 nvarchar(50) declare @vc1 varchar(50), @vc2 varchar(50) set @nvc1 = N'тест' set @nvc2 = 'тест' set @vc1 = N'тест' set @vc2 = 'тест' print @nvc1 print @nvc2 print @vc1 print @vc2 тест ???? ???? ????
The Unicode value only “survives” if every assignment is Unicode-enabled.
- Storing from C#
.Net by definition supports Unicode. If you store data into or retrieve data from a Unicode-enabled column using either SqlCommand or a typical DAL/ORM component, you should not even notice that you are dealing with Unicode rather than plain ASCII.
Just be aware that in case of SqlCommand, you better use parameterized statements and SqlParameter rather than concatenating the complete statement including parameter values. (This approach is also most common to avoid SQL injections)
Displaying Unicode data
Next problem: You successfully stored Unicode data, but SELECTing in SSMS only returns lots of question marks or boxes. The reason is simple: SSMS cannot display the characters.
But what is required actually necessary to display Unicode data:
- a Unicode font covering the characters you want to retrieve
- a font rendering engine as part of the operating system capable of rendering the character string correctly (especially for right-to-left languages, and some South-East Asian languages)
- an application that makes use of both
I know from my experience that SSMS 2005 was not capable by default to display some Unicode characters (sorry, can’t remember which) even though fonts were installed. It probably had to do with the fonts defined for the editor or the result pane.
As a fact, SSMS 2008 on Windows 7 does not have any problems in the results pane, and can even display supplementary characters.
If you still have problems display Unicode characters, make sure to check whether the data is stored correctly by analyzing the numerical values of the Unicode string using the UNICODE function:
declare @t nvarchar(4000) select @t = SomeUnicodeColumn from MyTable where ... select unicode(substring( @t, 1, 1)) select unicode(substring( @t, 2, 2)) select unicode(substring( @t, 3, 3)) select unicode(substring( @t, 4, 4))
If the values are greater than 128 (or 256, depending on language), your data is stored as Unicode.
Pingback: Writing “Funny Characters”? Make sure you use the correct Encoding! « devioblog