Convert Unicode Hex Codepoint to Unicode Character in SQL Server

July 11, 2009

The Unicode standard uses the notation U+[x]xxxx to identify each Unicode character, i.e. “U+” followed by 4 or 5 hex digits.

If we remove the first 2 characters (as in the UCD), only hex digits will remain, and we need to convert them to an integer. Using this HexStrToVarBinary function, we get the binary representation of the hex number, which can than be CONVERTed to an int or bigint.

Next, we use the built-in NCHAR() function to create an NCHAR value of the int.

SELECT  UniHex,
        CONVERT(INT, dbo.HexStrToVarBinary(UniHex)) AS UniInt,
        NCHAR(CONVERT(INT, dbo.HexStrToVarBinary(UniHex))) AS UniNChar
FROM    UniHexTable

Note that the NCHAR() function only works with int arguments from 0 to 65535, so surrogate characters are not handled, and NCHAR() returns NULL.

SQL Server stores Unicode data encoded as UCS-2, and therefore does not support supplementary characters (surrogate pairs).

Unicode Versions supported in SQL Server

SQL Server Unicode
2000 3.0 ?
2005 3.2
2008 5.0

(I did not find an explicit statement on SQL2000, so this is a guess based on the release of SQL Server 2000 and the release date of Unicode 3.0)


Managing Customers, Project Versions and Installations

July 8, 2009

If you develop software and you have a couple of customers, you will pretty soon lose track of which customer has which software version installed and when.

I now implemented a long over-due item on my todo list to manage customers, contacts, and their installation of software versions.

After you create a Customer record in dbscript, you can assign it Project Versions with shipping date and/or installation date. These Installations are visible from both the Customer record and the Project Version record.

The Project page gives you an overview of which Customer had software version installations, and which is the most current Project Version installed:

The latest version of dbscript can be downloaded here.

By the way, the current version of dbscript 0.97 also has an improved Data Diagram generator which performs better for data models with more than 100 tables.


Synching SQL Server Databases

July 6, 2009

Synchronizing developer databases and synchronizing production with development databases are recurring topics at Stackoverflow:

small dev team, and “The biggest issue that we are having now is that our DB schema is totally out of sync between all of us”

“My main concern is deploying schema + data updates to developer machines from the “master” development copy.”

Databases under version control is a more difficult problem than version control for regular source code, as you not only need

  • the SQL source of the data model, but also
  • a change script to upgrade from one version to the next,
  • and populate application tables with pre-defined values (constants).

I think that dbscript and SMOscript are two useful tools which can help you to address these issues (which is why I wrote them and use them).

SMOscript lets you script all database objects that are scriptable through SMO. You can script the objects to a single file, or as single files in subdirectories named after their type (Table, View, StoredProcedure, etc)

dbscript helps you with the other two items:

Import the current database schemas of two SQL Server databases, and have dbscript compare them. You can then select which changes (on objects and column level) to include in the change script, and have dbscript generate the migration script.

Instead of importing from a database, you can also upload a SQL file containing the schema definition.

Based on the schema data, you can then define all your system tables (those tables which hold pre-defined “magic” values), and let dbscript generate a stored procedure which will script the data as Insert/Update statements, or have the live data scripted directly by dbscript.

To keep your code in sync with database values, you can also create C# constants from your database values.

Last not least, let’s deal with the question of how to keep several programmers in sync.

dbscript keeps all data (projects, schemas, versions, objects) in a database, and lets multiple users work in the same project. Project Versions can be tagged as “development”, “test”, “beta”, “minor”, “major” and other stages. If versions are tagged, they are visible to every member of the project; if they are not tagged, they are only visible to the user who created them.

A strategy to handle multiple developers in dbscript could be implemented as follows:

  • Schemas/Versions of the master development database are tagged
  • Developer generates migration script to sync his development database with the latest master database version
  • Developer applies his changes locally
  • Developer uploads or imports private database schema without tagging the created version
  • If the master schema has changed since the developer retrieved it, apply change scripts from previous to current master development database schema.
  • Developer compares private database schema with latest master schema
  • Developer modifies value generation scripts
  • Developer generates migration script, and applies the script to the master database
  • Developer imports master schema and tags the version

dbscript is available for download here.


Standalone SMO 2008 download

July 1, 2009

The Microsoft SQL Server 2008 Feature Pack (link to April 2009 version) contains tools and libraries for SQL Server 2008, among them the latest version of SQL Server Management Objects (SMO) which support 2000, 2005, and 2008.

SMO is the technology used by my applications SMOscript and automssqlbackup to query and control SQL Server databases.


Parsing SQL into XML and C# Objects

June 29, 2009

I recently wrote about my attempts to parse SQL statements and extract schema information from them. (dbscript already has a SQL parser, but it needs to be replaced to support different SQL dialects)

As it turns out, code generation is the way to go!

I start out with an object-oriented representation of the grammar of the target language, in this case T-SQL. A (custom-written) lexer turns the input stream into a sequence of lexemes or tokens, which the parser engine feeds into the grammar’s rule definitions.

The result of this action is an XML document whose structure is equivalent to the grammatical definition of the respective SQL statement.

SQL Parser Data Model

This is where code generation magic sets in: The grammar definition is also used to generate a C# object model, mapping each rule into a C# class. Each class implements an interface requiring a SetXml() method, which reads the contents of the instantiated object from an XmlNode structure, and thus populates its child objects recursively.

The application logic (which can be seen as sort of a “parser client”) only needs to instantiate the grammar object and the parser object, feed the parser the file, and retrieve a typed object containing an OO representation of the parsed file.

The next version of dbscript will be built on these mechanisms.


Parsing SQL into XML (2)

June 25, 2009

I recently starting re-writing my SQL parser for dbscript and reported about the first results.

Things always turn out more complicated than they initially appear, and one thing I realized was that a simple .sql file is not merely SQL, but also contains some “meta-magic” such as TSQL’s GO statement, which is not a valid SQL command, but really a batch separator for Query Analyzer or Management Studio.

Anyway, there is some kind of progress: I described all SQL statements that the AdventureWorks .sql scripts use, and parsing is pretty quick.

The sample statement of my previous post

CREATE TABLE [Production].[TransactionHistoryArchive](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL ,
    [TransactionDate] [datetime] NOT NULL ,
    [TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL ,
    CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
    (
        [TransactionID] ASC
    ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
    ON [PRIMARY]
) ON [PRIMARY]

is now parsed into an XML representation which looks like this:

<?xml version="1.0" encoding="Windows-1252"?>
<TSQL>
  <cmdCreateTable>
    <CREATE>CREATE</CREATE>
    <TABLE>TABLE</TABLE>
    <id>
      <id2s>
        <id2>[Production]</id2>
        <x2E>.</x2E>
        <id1>[TransactionHistoryArchive]</id1>
      </id2s>
    </id>
    <x28>(</x28>
    <columns>
      <column>
        <id>[TransactionID]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[int]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
....
      <x2C>,</x2C>
      <column>
        <id>[ModifiedDate]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[datetime]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
      <constraint>
        <CONSTRAINT>CONSTRAINT</CONSTRAINT>
        <constraintid>[PK_TransactionHistoryArchive_TransactionID]</constraintid>
        <type>
          <pk>
            <PRIMARY>PRIMARY</PRIMARY>
            <KEY>KEY</KEY>
            <clustered>
              <CLUSTERED>CLUSTERED</CLUSTERED>
            </clustered>
            <indexcolumns>
              <x28>(</x28>
              <indexcolumns>
                <indexcolumn>
                  <colasc>
                    <columnid>[TransactionID]</columnid>
                    <ASC>ASC</ASC>
                  </colasc>
                </indexcolumn>
              </indexcolumns>
              <x29>)</x29>
            </indexcolumns>
            <with>
              <WITH>WITH</WITH>
              <x28>(</x28>
              <params>
                <param>
                  <key>PAD_INDEX</key>
                  <x3D>=</x3D>
                  <value>OFF</value>
                </param>
                <x2C>,</x2C>
....
                <x2C>,</x2C>
                <param>
                  <key>ALLOW_PAGE_LOCKS</key>
                  <x3D>=</x3D>
                  <value>ON</value>
                </param>
              </params>
              <x29>)</x29>
            </with>
            <onfilegroup>
              <ON>ON</ON>
              <filegroupid>[PRIMARY]</filegroupid>
            </onfilegroup>
          </pk>
        </type>
      </constraint>
    </columns>
    <x29>)</x29>
    <onfilegroup>
      <ON>ON</ON>
      <filegroupid>[PRIMARY]</filegroupid>
    </onfilegroup>
  </cmdCreateTable>
</TSQL>

The complete result file is here.

So as I manage to complete parse and represent the contents of a SQL statement, the question remains how to evaluate the result:

  • Analyze internal representation (messy)
  • Analyze XML representation (complicated)
  • Trigger events for each parsed entity and their attributes (complicated)

The problem is always that the evaluating routine seems to be de-coupled from the parser.

The first solution is too much dependent on internal representation. Changing a little aspect of the parsing mechanism or the grammar would result in breaking code.

The other two solutions are less dependent, but still a change in the grammar needs to be reflected in the evaluator, without any immediate error reporting if the evaluator does not match the parser.

I read about Xtext the other day, a DSL parser for Eclipse, which seems to include a code generator that generates Java classes to match the grammar. An interesting solution, since a change in the grammar and subsequent code generation would immediatly break compilation of the evaluator.

To be continued… ;)


Firefox 3.0.11 freezes

June 22, 2009

Today my Firefox 3 upgraded, and after restarting, it reloaded my sessions, but repeatedly froze after about 20 to 40 seconds. This seems to be a widely experienced problem.

You can downgrade to 3.0.10 by installing from here.


Listing database properties using SMO

June 21, 2009

Recently I came across a couple of questions asking how to retrieve the physical file names of databases via SMO on Stackoverflow.

These questions prompted me to add a “List Database Properties” function to SMOscript.

This new function lists all database properties accessible via SMO, the DatabaseOptions object, and the database’s FileGroups and Files.

Sample output:

C:\Projects\smoscript>smoscript -s localhost -d dbscript2dev db
Database dbscript2dev on server localhost:

Current user dbo is DBO DbOwner

Owner DOMAIN\User Created 26.03.2008 21:23
Accessible True Updatable True SystemObject False
Status Normal CompatibilityLevel Version80
Size 386 MB SpaceAvailable 148.694 kB
DataSpaceUsage 133.016 kB IndexSpaceUsage 15.984 kB
PrimaryFilePath C:\Programme\Microsoft SQL Server\MSSQL\data
Default FileGroup PRIMARY Schema dbo
Collation Latin1_General_CI_AS CaseSensitive False
FullTextEnabled False
LastBackup 20.06.2009 03:01
LastLogBackup 11.09.2008 23:50
ActiveConnections 0

Options AnsiNullDefault False AnsiNullsEnabled False AnsiPaddingEnabled False An
siWarningsEnabled False ArithmeticAbortEnabled False AutoClose False AutoCreateS
tatistics True AutoShrink False AutoUpdateStatistics True CloseCursorsOnCommitEn
abled False ConcatenateNullYieldsNull False DatabaseOwnershipChaining True Local
CursorsDefault False NumericRoundAbortEnabled False PageVerify TornPageDetection
 QuotedIdentifiersEnabled False RecoveryModel Full RecursiveTriggersEnabled True
 UserAccess Multiple

Filegroup PRIMARY
 Default True ReadOnly False Size 318.208 MB
 File dbscriptdev_Data
 Path C:\Programme\Microsoft SQL Server\MSSQL\data\dbscript2dev.mdf
 Primary True
 Size 318.208 MB MaxSize unlimited
 Used 176.768 kB Available 141.440 kB
 Growth 10 %

Logfile dbscriptdev_Log
 Path C:\Programme\Microsoft SQL Server\MSSQL\data\dbscript2dev_log.ldf
 Size 76.736 MB MaxSize unlimited
 Used 70.090 kB
 Growth 10 %

The latest version 0.12 of SMOscript is available for download here.


Installing PostgreSQL 8.3 on Windows

June 19, 2009

I wanted to try out OpenNMS for monitoring our server infrastructure, and need to install PostgreSQL as described in the Installation manual.

Everything worked as described, except for one item during the installation of PostgreSQL: Trying to execute the command

initdb -E UTF-8 -U postgres ..\data

resulted in the error message

initdb: could not access directory "../data": No such file or directory

It turns out that initdb seems to have a problem with spaces in directory names (”Program Files”). Replacing the path with short filenames (”Program~1″) solves the problem Defining a path without any spaces solves the problem:

initdb -E UTF-8 -U postgres \databases\postgresql\8.3\data

Success!


Converting WordPress Export XML to HTML

June 18, 2009

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

Based on my previous work in dbscript to generate an HTML documentation of a database schema using XML and XSL I modified the “Single HTML” XSL of dbscript to create an XSL to transform the WordPress Export XML into an HTML page.

This HTML page lists all article titles with dates, categories and link to the original URL in the blog.

Clicking the article’s title will expand the list to display the contents of the selected article.

To display paragraphs properly, I needed to replace newlines with <br /> elements, as described on this page implementing various replace operations in XSL.

The WordPress HTML XSL file is available for download here.