Why do I need NVARCHAR columns?

I came across a (MSSQL) database that almost only contained VARCHAR columns, even though the target audience was expected to come from all over the European Union.

In my projects, I am used to define NVARCHAR columns for all values that can be written by the users, and use VARCHAR only for strings that are known to be ASCII, such as program identifiers, URLs, or email addresses.

I tried to show the disadvantage of using VARCHAR by inputing accented characters, expecting a default Western collation, only to fail insofar as that the characters were displayed correctly. I found later that the collation had been explicitly set to a character set support these accented characters.

I prepared more thoroughly for the next time, and wrote a little TSQL script to illustrate my point:

create table #t (
	id int identity,
	sA varchar(100),	-- insert varchar
	sN varchar(100),	-- insert nvarchar (result same as varchar)
	sC varchar(100) collate Croatian_CI_AS,	-- insert nvarchar
	n nvarchar(100)

declare @i int
declare @s varchar(100)	-- implicit collation of database
declare @n nvarchar(100)
set @s = ''
set @n = ''

set @i = 32
while @i < 1000 begin
	set @s = @s + nchar(@i)
	set @n = @n + nchar(@i)
	set @i = @i + 1

	if (len(@s)=32) begin
		insert into #t (sA, n, sN, sC) values (@s, @n, @n, @n)

		print @s
		print @n

		set @s = ''
		set @n = ''

select id as Row, sA as DBCollated, sC as HRCollated, n as Unicoded
from #t
order by id

drop table #t

Table #T contains a couple of text columns:

  • sA varchar with database collation is set by a varchar variable
  • sN varchar with database collation is set by an nvarchar variable
  • sC varchar with explicit collation is also set by an nvarchar variable
  • n nvarchar is set by an nvarchar variable

In the loop, I construct strings with 32 Unicode characters, ranging from code point 32 to code point 1000 (that’s just arbitrary codes), and insert the same value into the 4 columns.

The findings of this experiment are:

VARCHAR variables have the same collation as the database. You cannot declare a COLLATE clause in the variable definition. Thus sA and sN contain the same values, as would be expected by mapping Unicode characters onto a code page.

The resulting table looks like this (only 16 characters per row for better readability):

Database Collation Explicit Collation Unicode
1 !”#$%&'()*+,-./ !”#$%&'()*+,-./ !”#$%&'()*+,-./
2 0123456789:;<=>? 0123456789:;<=>? 0123456789:;<=>?
5 `abcdefghijklmno `abcdefghijklmno `abcdefghijklmno
6 pqrstuvwxyz{|}~ pqrstuvwxyz{|}~ pqrstuvwxyz{|}~
7 ????????????? ??ƒ????ˆ??????? €‚ƒ„…†‡ˆ‰Š‹ŒŽ
8 ?????????????? ???????˜??????? ‘’“”•–—˜™š›œžŸ
9 ¡¢£¤¥¦§¨©ª«¬­®¯ !cL¤Y¦§¨©a«¬­®— ¡¢£¤¥¦§¨©ª«¬­®¯
10 °±²³´µ¶·¸¹º»¼½¾¿ °±23´µ¶·¸1o»113? °±²³´µ¶·¸¹º»¼½¾¿
13 àáâãäåæçèéêëìíîï aáâaäaaçeéeëiíîi àáâãäåæçèéêëìíîï
14 ðñòóôõö÷øùúûüýþÿ ?noóôoö÷ouúuüý?y ðñòóôõö÷øùúûüýþÿ
15 AaAaAaCcCcCcCcDd AaĂ㥹ĆćCcCcČčĎď ĀāĂ㥹ĆćĈĉĊċČčĎď
16 ÐdEeEeEeEeEeGgGg ĐđEeEeEeĘęĚěGgGg ĐđĒēĔĕĖėĘęĚěĜĝĞğ
17 GgGgHhHhIiIiIiIi GgGgHhHhIiIiIiIi ĠġĢģĤĥĦħĨĩĪīĬĭĮį
18 Ii??JjKk?LlLlLl? Ii??JjKk?ĹĺLlĽľ? İıIJijĴĵĶķĸĹĺĻļĽľĿ
19 ?LlNnNnNn???OoOo ?ŁłŃńNnŇň???OoOo ŀŁłŃńŅņŇňʼnŊŋŌōŎŏ
20 OoŒœRrRrRrSsSsSs ŐőOoŔŕRrŘřŚśSsŞş ŐőŒœŔŕŖŗŘřŚśŜŝŞş
21 ŠšTtTtTtUuUuUuUu ŠšŢţŤťTtUuUuUuŮů ŠšŢţŤťŦŧŨũŪūŬŭŮů
22 UuUuWwYyŸZzZzŽž? ŰűUuWwYyYŹźŻżŽž? ŰűŲųŴŵŶŷŸŹźŻżŽžſ
23 b????????Ð?????? b????????Đ?????? ƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏ
24 ?ƒƒ????I??l????O ?Ff????I??l????O ƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟ
25 Oo?????????t??TU Oo?????????t??TU ƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯ
26 u?????z????????? u?????z????????? ưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿ
27 |??!?????????AaI |??!?????????AaI ǀǁǂǃDŽDždžLJLjljNJNjnjǍǎǏ
28 iOoUuUuUuUuUu?Aa iOoUuUuUuUuUu?Aa ǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟ
29 ????GgGgKkOoOo?? ????GgGgKkOoOo?? ǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯ
30 j??????????????? j??????????????? ǰDZDzdzǴǵǶǷǸǹǺǻǼǽǾǿ
31 ???????????????? ???????????????? ȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏ
32 ???????????????? ???????????????? ȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟ
33 ???????????????? ???????????????? ȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯ
34 ???????????????? ???????????????? ȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿ
35 ???????????????? ???????????????? ɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏ
36 ???????????????? ???????????????? ɐɑɒɓɔɕɖɗɘəɚɛɜɝɞɟ
37 ?g?????????????? ?g?????????????? ɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯ
38 ???????????????? ???????????????? ɰɱɲɳɴɵɶɷɸɹɺɻɼɽɾɿ
39 ???????????????? ???????????????? ʀʁʂʃʄʅʆʇʈʉʊʋʌʍʎʏ
40 ???????????????? ???????????????? ʐʑʒʓʔʕʖʗʘʙʚʛʜʝʞʟ
41 ???????????????? ???????????????? ʠʡʢʣʤʥʦʧʨʩʪʫʬʭʮʯ
42 ?????????'”?’??? ?????????'”‘’??? ʰʱʲʳʴʵʶʷʸʹʺʻʼʽʾʿ
43 ????^?ˆ?’¯´`?_?? ????^?^ˇ’Ż´`?_?? ˀˁ˂˃˄˅ˆˇˈˉˊˋˌˍˎˏ
44 ??????????°?˜??? ????????˘˙°˛~˝?? ːˑ˒˓˔˕˖˗˘˙˚˛˜˝˞˟
45 ???????????????? ???????????????? ˠˡˢˣˤ˥˦˧˨˩˪˫ˬ˭ˮ˯
46 ???????????????? ???????????????? ˰˱˲˳˴˵˶˷˸˹˺˻˼˽˾˿
47 `´^~¯¯??¨?°???”? `´^~ŻŻ˘˙¨?°?ˇ?”? ̀́̂̃̄̅̆̇̈̉̊̋̌̍̎̏
48 ???????????????? ???????????????? ̛̖̗̘̙̜̝̞̟̐̑̒̓̔̕̚
49 ???????¸???????? ???????¸???????? ̡̢̧̨̠̣̤̥̦̩̪̫̬̭̮̯
50 ?__????????????? ?__????????????? ̴̵̶̷̸̰̱̲̳̹̺̻̼̽̾̿
51 ???????????????? ???????????????? ͇͈͉͍͎̀́͂̓̈́͆͊͋͌ͅ͏
52 ???????????????? ???????????????? ͓͔͕͖͙͚͐͑͒͗͛͘͜͟͝͞
53 ???????????????? ???????????????? ͣͤͥͦͧͨͩͪͫͬͭͮͯ͢͠͡
54 ??????????????;? ??????????????;? ͰͱͲͳʹ͵Ͷͷ͸͹ͺͻͼͽ;Ϳ
55 ???????????????? ???????????????? ΀΁΂΃΄΅Ά·ΈΉΊ΋Ό΍ΎΏ
56 ???G????T??????? ???????????????? ΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟ
57 ???S??F??O?????? ???????????????? ΠΡ΢ΣΤΥΦΧΨΩΪΫάέήί
58 ?aß?de??????µ??? ??ß?????????µ??? ΰαβγδεζηθικλμνξο
59 p??st?f????????? ???????????????? πρςστυφχψωϊϋόύώϏ
60 ???????????????? ???????????????? ϐϑϒϓϔϕϖϗϘϙϚϛϜϝϞϟ

One can easily see that the original Unicode values are translated into VARCHAR values according to the COLLATION setting. Some characters not contained in the collation are mapped onto characters without accents, others are simply replaced by question marks.

Use this code if you need to explain the differences between VARCHAR and NVARCHAR, and why using NVARCHAR is not a bad idea.

3 thoughts on “Why do I need NVARCHAR columns?

  1. Pingback: Bad (MS SQL Server) habits to kick « devioblog

  2. If your business rules dictate only ASCII characters, varchar has the advantage of taking less space and being more efficient. When we converted from nvarchar to varchar our data processing applications had a decrease in processing time of over 30%.

  3. Pingback: Length of UTF-8 VARCHAR in SQL Server | devioblog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.