“Why can’t I store/display [insert name] language data in SQL Server?”

I came across this and similar questions so many times on Stack Overflow:

My guess is that most of these question are due to a lack of understanding of Unicode and how the various components of an application must work together to display Unicode texts correctly.

Storage

If you store Unicode data in the database, you have to use the Unicode data types NVARCHAR(n), NVARCHAR(MAX), NTEXT (obsolete) or NCHAR (fixed-length strings). Unicode data is stored in SQL Server using the UCS-2 encoding, meaning every character uses 2 bytes to store, unless the database is created using an Supplementary Characters collation, causing the database to use the UTF-16 encoding (same encoding for the BMP, but support for all Unicode planes).

Except for the _SC collations, defining a collation does not have an influence on how Unicode data is stored.

In general, a collation defines how text data is compared and sorted. The collation can be set on database level (default for all text columns), on column level, and on operand level (i.e. the value of a comparison or ORDER BY clause) using the COLLATE clause.

Unicode Values

Unicode values can either be variables (or parameters) declared with the Unicode data types listed above, or a Unicode string literals using the N” notation. Most of the times, the questions above arise from either writing literals without the N”, or from believing that variable names must be attached the N somehow.

Let’s look at some examples:

  • SELECTing string literals
SELECT N'тест', 'тест'
тест    ????
  • Assigning (non-)Unicode values
declare @nvc1 nvarchar(50), @nvc2 nvarchar(50)
declare @vc1 varchar(50), @vc2 varchar(50)

set @nvc1 = N'тест'
set @nvc2 = 'тест'
set @vc1 = N'тест'
set @vc2 = 'тест'

print @nvc1
print @nvc2
print @vc1
print @vc2
тест
????
????
????

The Unicode value only “survives” if every assignment is Unicode-enabled.

  • Storing from C#

.Net by definition supports Unicode. If you store data into or retrieve data from a Unicode-enabled column using either SqlCommand or a typical DAL/ORM component, you should not even notice that you are dealing with Unicode rather than plain ASCII.

Just be aware that in case of SqlCommand, you better use parameterized statements and SqlParameter rather than concatenating the complete statement including parameter values. (This approach is also most common to avoid SQL injections)

Displaying Unicode data

Next problem: You successfully stored Unicode data, but SELECTing in SSMS only returns lots of question marks or boxes. The reason is simple: SSMS cannot display the characters.

But what is required actually necessary to display Unicode data:

  • a Unicode font covering the characters you want to retrieve
  • a font rendering engine as part of the operating system capable of rendering the character string correctly (especially for right-to-left languages, and some South-East Asian languages)
  • an application that makes use of both

I know from my experience that SSMS 2005 was not capable by default to display some Unicode characters (sorry, can’t remember which) even though fonts were installed. It probably had to do with the fonts defined for the editor or the result pane.

As a fact, SSMS 2008 on Windows 7 does not have any problems in the results pane, and can even display supplementary characters.

If you still have problems display Unicode characters, make sure to check whether the data is stored correctly by analyzing the numerical values of the Unicode string using the UNICODE function:

declare @t nvarchar(4000)
select @t = SomeUnicodeColumn from MyTable where ...
select unicode(substring( @t, 1, 1))
select unicode(substring( @t, 2, 2))
select unicode(substring( @t, 3, 3))
select unicode(substring( @t, 4, 4))

If the values are greater than 128 (or 256, depending on language), your data is stored as Unicode.

Supplementary Characters in SQL Server

I originally intended to write an update to my post Collation Support in SQL Server Versions to include coverage of SQL Server 2012 collations. I was surprised to find that the number of supported collation names (reduced by the various suffixes such as _CI, _CS, etc.) did not change from SQL Server 2008.

What did change though was that there is a new collation suffix named “_SC” which stands for Supplementary Character support.

Once you declare a database with an _SC collation, the various string functions in SQL Server are not only Unicode-aware, but also Supplementary Character-aware (NCHAR, UNICODE, LEN).

Compare these statements in SQL Server 2012

CREATE DATABASE test_sc COLLATE  Latin1_General_100_CI_AI_SC; 
-- 2012 collation

DECLARE @d nvarchar(10)  = N'𣅿' ;
SELECT NCHAR(0xD84C) + NCHAR(0xDD7F); 
       -- Old style method using surrogate characters
SELECT NCHAR(143743), 
       UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d)

SET @d = N'𠆾𠇀𠇃';
SELECT @d, UNICODE(@d), NCHAR(UNICODE(@d)), LEN(@d), DATALENGTH(@d)

resulting in

𣅿

𣅿       143743    𣅿    1    4

𠆾𠇀𠇃    131518    𠆾    3    12

with the results on a non-SC database in SQL Server 2012 or earlier

𣅿

NULL     55372    �    2    4

𠆾𠇀𠇃    55360    �    6    12

Example data taken from MSDN NCHAR() and SolidQ.

(Note that Management Studio 2008 does not display the characters in the query window, but does correctly show them in the results window)

You find that the functions NCHAR, UNICODE, and LEN support characters for code-points outside the Unicode BMP, whereas non-SC collations handle supplementary characters using the surrogate characters. For a list of changes in semantics, see MSDN.

Display currently executing SQL Server statements

If you ever wondered if an application is performing queries against an SQL Server database, or which statements take longest and generate most CPU usage, here is a query I found using some of the DMV views:.

SELECT TOP 100 SUBSTRING(qt.text, (qs.statement_start_offset/2)+1, 
        ((CASE qs.statement_end_offset
          WHEN -1 THEN DATALENGTH(qt.text)
         ELSE qs.statement_end_offset
         END - qs.statement_start_offset)/2)+1), 
qs.execution_count, 
qs.total_logical_reads, qs.last_logical_reads,
qs.min_logical_reads, qs.max_logical_reads,
qs.total_elapsed_time, qs.last_elapsed_time,
qs.min_elapsed_time, qs.max_elapsed_time,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qt.encrypted=0
ORDER BY qs.total_logical_reads DESC

Change the TOP n and ORDER BY clauses as required.

To clear the contents of the DMV views, use the following statement from SSMS:

DBCC FREEPROCCACHE

Source, TechNet

 

Slow Query Performance in SqlDataReader vs. Query Analyzer / Management Studio

When working on a query for a web application, I noticed that the same query which executes within a second in Query Analyzer, would take nearly a minute when executed by a stored procedure called via SqlDataReader.ExecuteReader().

I was surprised because I had never seen such a huge difference in query execution times.

The query performs a LIKE operation on an indexed computed NVARCHAR column.

Fortunately, this SO question gave a hint at the SET ARITHABORT setting which seems to take a different value in Query Analyzer and .Net libraries.

The answer points to a deleted page which itself seems to be the answer to other questions and has thankfully been preserved in the Web Archive. (You need to Select All or disable CSS to read the text). The ARITHABORT setting is by itself a source of many questions on SO.

MSDN describes the aspect causing performance differences in my case as:

The setting of SET ARITHABORT is set at execute or run time and not at parse time.

SET ARITHABORT must be ON when creating or manipulating indexes on computed columns or indexed views.

After adding a SET ARITHABORT ON statement to the stored procedure, the query performed the same in both conditions.

For more information, see also the MSDN article SET Options That Affect Results.

Scheduling Backups on SQLServer Express

Management Studio Express does not provide a user interface to schedule backup jobs, but you can easily find a couple of solutions on the web, such as here, here, and here.

The backup solution I came up with was one .sql file each for full and incremental backup, and two .cmd files which execute the respective .sql files via sqlcmd.

The full backup SQL file looks like this:

PRINT 'backup.sql ' + CONVERT(VARCHAR, GETDATE(), 120)

DECLARE @backupSetId AS INT
DECLARE @Filename NVARCHAR(256)
DECLARE @Database NVARCHAR(256)
DECLARE @Backup NVARCHAR(256)

SET @Database = N'<name of database>'
SET @Filename = N'E:\Backups\<name of database>.' + 
    REPLACE(CONVERT(NVARCHAR, GETDATE(), 102), '.', '-') +
    N'.bak'
SET @Backup = @Database + N' Full Database Backup'

BACKUP DATABASE @Database
TO DISK = @Filename
WITH NOFORMAT, NOINIT, 
NAME = @Backup, 
SKIP, NOREWIND, NOUNLOAD, STATS = 10

SELECT    @backupSetId = position 
FROM    msdb..backupset 
WHERE    database_name = @Database 
AND        backup_set_id = 
    (SELECT MAX(backup_set_id) FROM msdb..backupset 
        WHERE database_name=@Database )

IF @backupSetId IS NULL 
    PRINT N'Verify failed. Backup information for database ''' + 
        @Database + N''' not found.'
ELSE
    RESTORE VERIFYONLY FROM  DISK = @Filename 
    WITH FILE = @backupSetId, NOUNLOAD, NOREWIND

This piece of code is repeated for every database you want to backup.

The incremental backup file has the WITH DIFFERENTIAL option in the BACKUP command, and excludes verification.

The batch file which executes the backup.sql file simply calls sqlcmd:

sqlcmd -S <your db server name> 
    -i <path to>\backup.sql 
    -o <path to>\backup.log

Next, use the Scheduled Tasks control panel application to schedule the full backup once a week, and the incremental backups daily: Use a weekly schedule and select the days of week to define when the backups should run.

Taking a Database Offline in Management Studio Express

SQL Server Management Studio Express does not provide a Take Database Offline command in the Object Explorer menu.

To take a database offline, simply type:

alter database my_database_name set offline

To check, which databases are set offline, use the following query:

select name, databaseproperty(name, 'IsOffline') as IsOffline
from sysdatabases

Management Studio + Generate Script = Funny Error

I just tried to script a database that I am working on, when Management Studio gave me this fine error message:

- Determining objects in database 'mydatabase' that will be scripted. (Error)
Messages
An item with the same key has already been added. (System.Windows.Forms)

The Progress window lists all database objects as Stopped, and it is impossible to figure out exactly which object causes to error message.

Checking the database for duplicate names (unlikely, but just to be sure)

select name, count(object_id) from sys.objects
group by name
order by count(object_id) desc

did not show any name occurring more than once, of course.

The most annoying point is, that the error is not raised because of technical problems (meaning, invalid information in the database (hopefully!)), but by an exception raised by System.Windows.Forms because a ListBoxItem (or whatever) violated some fancy UI constraint that has nothing to do with the task to be performed.

The internet knows nothing about this.

Update: Here’s the solution

Continue reading