Parsing SQL into XML and C# Objects

June 29, 2009

I recently wrote about my attempts to parse SQL statements and extract schema information from them. (dbscript already has a SQL parser, but it needs to be replaced to support different SQL dialects)

As it turns out, code generation is the way to go!

I start out with an object-oriented representation of the grammar of the target language, in this case T-SQL. A (custom-written) lexer turns the input stream into a sequence of lexemes or tokens, which the parser engine feeds into the grammar’s rule definitions.

The result of this action is an XML document whose structure is equivalent to the grammatical definition of the respective SQL statement.

SQL Parser Data Model

This is where code generation magic sets in: The grammar definition is also used to generate a C# object model, mapping each rule into a C# class. Each class implements an interface requiring a SetXml() method, which reads the contents of the instantiated object from an XmlNode structure, and thus populates its child objects recursively.

The application logic (which can be seen as sort of a “parser client”) only needs to instantiate the grammar object and the parser object, feed the parser the file, and retrieve a typed object containing an OO representation of the parsed file.

The next version of dbscript will be built on these mechanisms.


Parsing SQL into XML (2)

June 25, 2009

I recently starting re-writing my SQL parser for dbscript and reported about the first results.

Things always turn out more complicated than they initially appear, and one thing I realized was that a simple .sql file is not merely SQL, but also contains some “meta-magic” such as TSQL’s GO statement, which is not a valid SQL command, but really a batch separator for Query Analyzer or Management Studio.

Anyway, there is some kind of progress: I described all SQL statements that the AdventureWorks .sql scripts use, and parsing is pretty quick.

The sample statement of my previous post

CREATE TABLE [Production].[TransactionHistoryArchive](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL ,
    [TransactionDate] [datetime] NOT NULL ,
    [TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL ,
    CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
    (
        [TransactionID] ASC
    ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
    ON [PRIMARY]
) ON [PRIMARY]

is now parsed into an XML representation which looks like this:

<?xml version="1.0" encoding="Windows-1252"?>
<TSQL>
  <cmdCreateTable>
    <CREATE>CREATE</CREATE>
    <TABLE>TABLE</TABLE>
    <id>
      <id2s>
        <id2>[Production]</id2>
        <x2E>.</x2E>
        <id1>[TransactionHistoryArchive]</id1>
      </id2s>
    </id>
    <x28>(</x28>
    <columns>
      <column>
        <id>[TransactionID]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[int]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
....
      <x2C>,</x2C>
      <column>
        <id>[ModifiedDate]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[datetime]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
      <constraint>
        <CONSTRAINT>CONSTRAINT</CONSTRAINT>
        <constraintid>[PK_TransactionHistoryArchive_TransactionID]</constraintid>
        <type>
          <pk>
            <PRIMARY>PRIMARY</PRIMARY>
            <KEY>KEY</KEY>
            <clustered>
              <CLUSTERED>CLUSTERED</CLUSTERED>
            </clustered>
            <indexcolumns>
              <x28>(</x28>
              <indexcolumns>
                <indexcolumn>
                  <colasc>
                    <columnid>[TransactionID]</columnid>
                    <ASC>ASC</ASC>
                  </colasc>
                </indexcolumn>
              </indexcolumns>
              <x29>)</x29>
            </indexcolumns>
            <with>
              <WITH>WITH</WITH>
              <x28>(</x28>
              <params>
                <param>
                  <key>PAD_INDEX</key>
                  <x3D>=</x3D>
                  <value>OFF</value>
                </param>
                <x2C>,</x2C>
....
                <x2C>,</x2C>
                <param>
                  <key>ALLOW_PAGE_LOCKS</key>
                  <x3D>=</x3D>
                  <value>ON</value>
                </param>
              </params>
              <x29>)</x29>
            </with>
            <onfilegroup>
              <ON>ON</ON>
              <filegroupid>[PRIMARY]</filegroupid>
            </onfilegroup>
          </pk>
        </type>
      </constraint>
    </columns>
    <x29>)</x29>
    <onfilegroup>
      <ON>ON</ON>
      <filegroupid>[PRIMARY]</filegroupid>
    </onfilegroup>
  </cmdCreateTable>
</TSQL>

The complete result file is here.

So as I manage to complete parse and represent the contents of a SQL statement, the question remains how to evaluate the result:

  • Analyze internal representation (messy)
  • Analyze XML representation (complicated)
  • Trigger events for each parsed entity and their attributes (complicated)

The problem is always that the evaluating routine seems to be de-coupled from the parser.

The first solution is too much dependent on internal representation. Changing a little aspect of the parsing mechanism or the grammar would result in breaking code.

The other two solutions are less dependent, but still a change in the grammar needs to be reflected in the evaluator, without any immediate error reporting if the evaluator does not match the parser.

I read about Xtext the other day, a DSL parser for Eclipse, which seems to include a code generator that generates Java classes to match the grammar. An interesting solution, since a change in the grammar and subsequent code generation would immediatly break compilation of the evaluator.

To be continued… ;)


Firefox 3.0.11 freezes

June 22, 2009

Today my Firefox 3 upgraded, and after restarting, it reloaded my sessions, but repeatedly froze after about 20 to 40 seconds. This seems to be a widely experienced problem.

You can downgrade to 3.0.10 by installing from here.


Listing database properties using SMO

June 21, 2009

Recently I came across a couple of questions asking how to retrieve the physical file names of databases via SMO on Stackoverflow.

These questions prompted me to add a “List Database Properties” function to SMOscript.

This new function lists all database properties accessible via SMO, the DatabaseOptions object, and the database’s FileGroups and Files.

Sample output:

C:\Projects\smoscript>smoscript -s localhost -d dbscript2dev db
Database dbscript2dev on server localhost:

Current user dbo is DBO DbOwner

Owner DOMAIN\User Created 26.03.2008 21:23
Accessible True Updatable True SystemObject False
Status Normal CompatibilityLevel Version80
Size 386 MB SpaceAvailable 148.694 kB
DataSpaceUsage 133.016 kB IndexSpaceUsage 15.984 kB
PrimaryFilePath C:\Programme\Microsoft SQL Server\MSSQL\data
Default FileGroup PRIMARY Schema dbo
Collation Latin1_General_CI_AS CaseSensitive False
FullTextEnabled False
LastBackup 20.06.2009 03:01
LastLogBackup 11.09.2008 23:50
ActiveConnections 0

Options AnsiNullDefault False AnsiNullsEnabled False AnsiPaddingEnabled False An
siWarningsEnabled False ArithmeticAbortEnabled False AutoClose False AutoCreateS
tatistics True AutoShrink False AutoUpdateStatistics True CloseCursorsOnCommitEn
abled False ConcatenateNullYieldsNull False DatabaseOwnershipChaining True Local
CursorsDefault False NumericRoundAbortEnabled False PageVerify TornPageDetection
 QuotedIdentifiersEnabled False RecoveryModel Full RecursiveTriggersEnabled True
 UserAccess Multiple

Filegroup PRIMARY
 Default True ReadOnly False Size 318.208 MB
 File dbscriptdev_Data
 Path C:\Programme\Microsoft SQL Server\MSSQL\data\dbscript2dev.mdf
 Primary True
 Size 318.208 MB MaxSize unlimited
 Used 176.768 kB Available 141.440 kB
 Growth 10 %

Logfile dbscriptdev_Log
 Path C:\Programme\Microsoft SQL Server\MSSQL\data\dbscript2dev_log.ldf
 Size 76.736 MB MaxSize unlimited
 Used 70.090 kB
 Growth 10 %

The latest version 0.12 of SMOscript is available for download here.


Installing PostgreSQL 8.3 on Windows

June 19, 2009

I wanted to try out OpenNMS for monitoring our server infrastructure, and need to install PostgreSQL as described in the Installation manual.

Everything worked as described, except for one item during the installation of PostgreSQL: Trying to execute the command

initdb -E UTF-8 -U postgres ..\data

resulted in the error message

initdb: could not access directory "../data": No such file or directory

It turns out that initdb seems to have a problem with spaces in directory names (“Program Files”). Replacing the path with short filenames (“Program~1″) solves the problem Defining a path without any spaces solves the problem:

initdb -E UTF-8 -U postgres \databases\postgresql\8.3\data

Success!


Converting WordPress Export XML to HTML

June 18, 2009

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

Based on my previous work in dbscript to generate an HTML documentation of a database schema using XML and XSL I modified the “Single HTML” XSL of dbscript to create an XSL to transform the WordPress Export XML into an HTML page.

This HTML page lists all article titles with dates, categories and link to the original URL in the blog.

Clicking the article’s title will expand the list to display the contents of the selected article.

To display paragraphs properly, I needed to replace newlines with <br /> elements, as described on this page implementing various replace operations in XSL.

The WordPress HTML XSL file is available for download here.


Documenting Oracle Databases

June 10, 2009

The latest version of dbscript 0.96 now supports importing schema information from Oracle databases.

Version 0.96 introduces the separation of various database engines (previous versions assumed MSSQL Server only), and a dbscript Project must now explicitly define which database engine it is based on (MSSQL or Oracle).

In the database properties form for “live” databases, you can define the Oracle connection in two ways:

  • set “Server” to the instance name listed inside tnsnames.ora and leave “Catalog” empty
  • set “Server” to server:port, and “Catalog” to the service name (as in tnsnames.ora)

A new field called “Import Options” allows you to define the scope of the schema import:

  • “USER” (the default value)
  • “ALL:” + list of user names
  • “DBA:” + list of user names

The user specified in “Username” needs to have access to the ALL_ and DBA_ catalog views, respectively.

Sample Output based on the HR Demo Database

Wiki documentation of Oracle Demo Schema

Single HTML documentation of Oracle Demo Schema

Data Diagram of Oracle Demo Schema:

To access Oracle databases, dbscript requires ODP.Net from the Oracle Data Access Components 11g to be installed on the web server.

dbscript is available for download here.