Writing “Funny Characters”? Make sure you use the correct Encoding!

A user of SMOscript sent me a bug report:

Some of our stored procedures have apostrophes in comments. If we export our DB and enable the “if not exists” option, our scripts become invalid, as the commented apostrophe ends the string.

He suspected SMOscript might encode the SP code incorrectly, but I made sure that was not the case, so I asked for a (redacted) sample of such a stored procedure.

I added the SP to a database, and in SSMS ran Generate Script on the procedure both with and without the IF NOT EXISTS option. In “plain” mode this simply generates the CREATE PROCEDURE statement, whereas in the IF NOT EXISTS case it will generate the IF NOT EXISTS check, and then create the SP using sp_executesql and dynamic SQL (and some people do not like this very much).

Next, I ran SMOscript with the -i switch, which activates the IF NOT EXISTS mode:

>smoscript.exe -s localhost -d mydatabase -o fooSP -i > fooSP.sql

When I opened the generated file in SSMS, the lines that originally contained an apostrophe did indeed now contain a quote which ended the string. But I also noted that other quote characters were correctly escaped using two quotes.

Then it struck me: the email mentioned apostrophes, but what I saw here was quotes!

I opened the original file in Notepad++ in hex mode, and there it was: e2 80 99, the UTF-8 encoding for U+2019, formally named RIGHT SINGLE QUOTATION MARK, but apostrophe among friends 😉
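If you do not have a hex editor at hand, a few lines of Python can confirm what such a byte sequence stands for. This is just a quick sketch using the standard library; the three bytes are the ones I found in the file, and the variable names are mine.

```python
import unicodedata

raw = bytes.fromhex("e28099")      # the three bytes seen in the hex view
char = raw.decode("utf-8")         # interpret them as UTF-8

print(f"U+{ord(char):04X}")        # the code point
print(unicodedata.name(char))      # the formal Unicode name
print(char.isascii())              # whether it fits into 7-bit ASCII
```

A single character that takes three bytes in UTF-8 is a good hint that it lies well outside the ASCII range.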

Given its code point, it is obvious that this character is not in the ASCII character set, and the console's native code page could not represent it either, so SMOscript has to generate Unicode or UTF-8 encoding.

Fortunately, this functionality is already built-in: use -T for Unicode, or -U for UTF-8 encoding:

>smoscript.exe -s localhost -d mydatabase -o fooSP -i -U -f .\fooSP.sql

Note that I use the -f switch to directly write the output to a file, since the console might not be able to display Unicode characters, or it might, and I’m not smart enough 😉 Anyway, directly creating the file from SMOscript frees you from the hassles of piping “funny” characters.

In hindsight, what happened was that SMOscript wrote the output to the console (via piping), and, using the native code page, the console was not able to correctly output the apostrophe character, and replaced it with the quote character, thus breaking the string parameter for sp_executesql.
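To make the order of events a bit more tangible, here is a small Python sketch of what I believe happened. Note the assumptions: Python's own codecs do not perform this kind of substitution, so the `BEST_FIT` table is a hand-written stand-in for the console's behavior, and `escape_sql_literal`, `to_console` and `quotes_balanced` are hypothetical helpers, not SMOscript code.

```python
# Hypothetical stand-in for the console's character substitution.
# Assumption: the curly apostrophe U+2019 is replaced by a plain ASCII quote.
BEST_FIT = {"\u2019": "'"}


def escape_sql_literal(text: str) -> str:
    """Double every plain single quote so the text fits inside '...'."""
    return text.replace("'", "''")


def to_console(text: str) -> str:
    """Simulate lossy output: swap characters per BEST_FIT, keep the rest."""
    return "".join(BEST_FIT.get(ch, ch) for ch in text)


def quotes_balanced(literal_body: str) -> bool:
    """True if only doubled quotes occur, i.e. no stray quote is left."""
    return "'" not in literal_body.replace("''", "")


source = "-- the customer\u2019s name, not the 'alias'"
escaped = escape_sql_literal(source)   # U+2019 is left alone, plain quotes are doubled
piped = to_console(escaped)            # the substitution happens after escaping

print(quotes_balanced(escaped))  # True
print(quotes_balanced(piped))    # False
```

The escaping itself is fine. The stray quote only appears because the character is swapped after the escaping has already been done, which is why writing the file directly with -f avoids the problem.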

You should always be aware that if you translate between code pages, not every character can be mapped to a character in the target code page, and some mappings will cause difficulties.
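If you want to check in advance whether a piece of text survives a given code page, something along these lines will do. It is a minimal sketch: `unmappable` is a hypothetical helper, the sample text is made up, and `cp437` is only an example, since I do not know which code page the user's console actually used.

```python
def unmappable(text: str, codepage: str) -> list:
    """Return the distinct characters of text that codepage cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)          # strict mode: raises if unmappable
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "-- don\u2019t drop the caf\u00e9 table"

print(unmappable(sample, "ascii"))   # the curly apostrophe and the e-acute
print(unmappable(sample, "cp437"))   # only the curly apostrophe
print(unmappable(sample, "utf-8"))   # nothing is lost
```

Python raises an error for such characters by default, whereas a console may silently substitute something else, which is the more dangerous behavior.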

Fortunately, this problem was quite easy to solve.
