I recently starting re-writing my SQL parser for dbscript and reported about the first results.
Things always turn out more complicated than they initially appear, and one thing I realized was that a simple .sql file is not merely SQL, but also contains some “meta-magic” such as TSQL’s GO statement, which is not a valid SQL command, but really a batch separator for Query Analyzer or Management Studio.
Anyway, there is some kind of progress: I described all SQL statements that the AdventureWorks .sql scripts use, and parsing is pretty quick.
The sample statement of my previous post
CREATE TABLE [Production].[TransactionHistoryArchive](
[TransactionID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[ReferenceOrderLineID] [int] NOT NULL ,
[TransactionDate] [datetime] NOT NULL ,
[TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
[ModifiedDate] [datetime] NOT NULL ,
CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
(
[TransactionID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
) ON [PRIMARY]
is now parsed into an XML representation which looks like this:
<?xml version="1.0" encoding="Windows-1252"?>
<TSQL>
<cmdCreateTable>
<CREATE>CREATE</CREATE>
<TABLE>TABLE</TABLE>
<id>
<id2s>
<id2>[Production]</id2>
<x2E>.</x2E>
<id1>[TransactionHistoryArchive]</id1>
</id2s>
</id>
<x28>(</x28>
<columns>
<column>
<id>[TransactionID]</id>
<datatype>
<id>
<id1s>
<id1>[int]</id1>
</id1s>
</id>
</datatype>
<attr>
<notnull>
<NOT>NOT</NOT>
<NULL>NULL</NULL>
</notnull>
</attr>
</column>
<x2C>,</x2C>
....
<x2C>,</x2C>
<column>
<id>[ModifiedDate]</id>
<datatype>
<id>
<id1s>
<id1>[datetime]</id1>
</id1s>
</id>
</datatype>
<attr>
<notnull>
<NOT>NOT</NOT>
<NULL>NULL</NULL>
</notnull>
</attr>
</column>
<x2C>,</x2C>
<constraint>
<CONSTRAINT>CONSTRAINT</CONSTRAINT>
<constraintid>[PK_TransactionHistoryArchive_TransactionID]</constraintid>
<type>
<pk>
<PRIMARY>PRIMARY</PRIMARY>
<KEY>KEY</KEY>
<clustered>
<CLUSTERED>CLUSTERED</CLUSTERED>
</clustered>
<indexcolumns>
<x28>(</x28>
<indexcolumns>
<indexcolumn>
<colasc>
<columnid>[TransactionID]</columnid>
<ASC>ASC</ASC>
</colasc>
</indexcolumn>
</indexcolumns>
<x29>)</x29>
</indexcolumns>
<with>
<WITH>WITH</WITH>
<x28>(</x28>
<params>
<param>
<key>PAD_INDEX</key>
<x3D>=</x3D>
<value>OFF</value>
</param>
<x2C>,</x2C>
....
<x2C>,</x2C>
<param>
<key>ALLOW_PAGE_LOCKS</key>
<x3D>=</x3D>
<value>ON</value>
</param>
</params>
<x29>)</x29>
</with>
<onfilegroup>
<ON>ON</ON>
<filegroupid>[PRIMARY]</filegroupid>
</onfilegroup>
</pk>
</type>
</constraint>
</columns>
<x29>)</x29>
<onfilegroup>
<ON>ON</ON>
<filegroupid>[PRIMARY]</filegroupid>
</onfilegroup>
</cmdCreateTable>
</TSQL>
The complete result file is here.
So as I manage to complete parse and represent the contents of a SQL statement, the question remains how to evaluate the result:
- Analyze internal representation (messy)
- Analyze XML representation (complicated)
- Trigger events for each parsed entity and their attributes (complicated)
The problem is always that the evaluating routine seems to be de-coupled from the parser.
The first solution is too much dependent on internal representation. Changing a little aspect of the parsing mechanism or the grammar would result in breaking code.
The other two solutions are less dependent, but still a change in the grammar needs to be reflected in the evaluator, without any immediate error reporting if the evaluator does not match the parser.
I read about Xtext the other day, a DSL parser for Eclipse, which seems to include a code generator that generates Java classes to match the grammar. An interesting solution, since a change in the grammar and subsequent code generation would immediatly break compilation of the evaluator.
To be continued…