dbscript has a built-in SQL script parser so that users can upload SQL scripts and analyze and document their database schemas.
While the parser is good enough to both extract all the relevant information from a SQL script and also skip over unsupported commands or simply plain text, I felt that the current approach would not scale well if I wanted to support more commands or different SQL dialects.
I started to play with a couple of C# classes, re-using the original dbscript T-SQL tokenizer, and soon had a small class library to define terminals, rules, and a parser class that kept everything together.
Additionally, the parser result can be output as XML.
Some parts of the parsing process turned out to be rather tough, such as the Repeater (or Loop) construct (as I said, it was playing rather than planning ahead), but I guess I now have a good framework to start implementing a T-SQL parser.
A table definition taken out of MS AdventureWorks
CREATE TABLE [Production].[TransactionHistoryArchive](
[TransactionID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[ReferenceOrderLineID] [int] NOT NULL ,
[TransactionDate] [datetime] NOT NULL ,
[TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
[ModifiedDate] [datetime] NOT NULL ,
CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
is parsed and translated into XML like this:
See the whole generated XML file here.
A “GO” command is transformed into this XML document:
This is a work-in-progress. The naming of the symbols is not final, and the parser code needs to be cleaned up and separated into parsing and TSQL functionality.