New SQL Parser in dbscript

November 30, 2009

I wrote about my plans to add a new SQL parser engine into dbscript a couple of months ago. Now the time has come to actually implement it for T-SQL (MS SQL Server; Oracle and PostgreSQL will follow in future versions), and I found that I did not foresee all the consequences of my initial intent. The overall architecture remained the same though.

The grammar definition allows to define attributes on non-terminals. Using the SQL Server versions as attributes (2005, 2008), I can mark commands or clauses as their support or feature has been introduced in a specific version, and store the database version as property of the uploaded Project Version.

The parser skips the parts of the uploaded file that it cannot parse, and writes these parts to the upload log. Up to now, the parser would simply fail if it considered the SQL file somehow invalid.

The (C#) object representation of a parsed SQL command has a boolean flag IsHandled for each non-terminal. The code processing the object representation needs to mark every non-terminal object as being handled (i.e. translated into schema information stored in the database). Objects that have not been flagged will also be listed in the upload log.

Next, the new parser allowed me to rewrite the dependency analysis. Until now, dbscript only analyzed view dependencies to order the CREATE VIEW statements. Dependency analysis has now been extended to all database objects.

One more major issue that has been solved is parsing and interpreting EXECUTE statements. Thus an EXEC sp_addextendedproperty is interpreted as adding a description to a database object.

This obviously led to changes in the XML generation of database schemas. Each object now lists its descriptions and the dependencies on and references from other database objects.

The XSL stylesheets which translate a project version XML into markup or HTML have been revised to reflect the additional information in the generated XML.

All taken together, the new parser adds:

  • better feedback on which parts of the uploaded SQL file have been processed
  • dependency analysis
  • object and column descriptions

See the following links documenting AdventureWorks OLTP 2008 (version 2005) for the effects of the new functionality:

Single HTML file documentation

MediaWiki documentation

ScrewTurn Wiki documentation

In both wikis, compare the (old-style) “wikibot” section with the new section “automatically generated” to see the changes.

The next version of dbscript with the mythical version number “1.0″ will be released soon ;)


Checking MS SQL Server Stored Procedures

November 26, 2009

When you create stored procedures, MS SQL Server does not really compile them and check them against the system catalog, as for example Oracle does.

This makes it nearly impossible for TSQL developers to make sure their code executes without faults in production environments.

My approach to check stored procedures is to execute them with all parameters set to NULL inside a transaction that will be rolled back after stored procedure execution.

BEGIN TRANSACTION
EXEC (@storedproc)
ROLLBACK TRANSACTION

The simplest strategy is to iterate through all procedures, and execute them using this code inside a TRY … CATCH block. If an error is raised, the CATCH block logs the error.

However, when I ran the code in an actual development environment with lots of SPs (around 120), two problems occurred that needed to be dealt with:

  • If any SQL statement run in Management Studio creates more than 100 result sets, you get the error message

The query has exceeded the maximum number of result sets that can be displayed in the results grid. Only the first 100 result sets are displayed in the grid.

  • Therefore, we stored the results of analysed procedures in a temporary table, which needs to be created only once if it does not yet exist
  • If you execute several stored procedures in a row, and each one probably has some imperfect transaction handling code, you need to make sure to rollback all open transactions.
/*
DROP TABLE #p
DELETE FROM #p
*/

SET ANSI_WARNINGS OFF

IF OBJECT_ID('tempdb..#p') IS NULL
  CREATE TABLE #p (
    ProcedureName sysname COLLATE Latin1_General_CI_AS,
    ErrorNumber int,
    ErrorSeverity int,
    ErrorState int,
    ErrorProcedure sysname COLLATE Latin1_General_CI_AS NULL,
    ErrorLine int,
    ErrorMessage NVARCHAR(2048) COLLATE Latin1_General_CI_AS
  )

DECLARE @proc sysname, @params VARCHAR(1000), @exec VARCHAR(1000)

DECLARE c CURSOR for
SELECT	p.name, SUBSTRING(REPLICATE(', null', COUNT(pa.name)), 3, 1000)
FROM sys.procedures p
LEFT OUTER JOIN sys.parameters pa  ON  p.object_id = pa.object_id
LEFT OUTER JOIN #p ON p.name = #p.ProcedureName
WHERE	p.name LIKE '%stored-procedure-name-filter%'
AND	#p.ProcedureName IS null
GROUP BY p.name
ORDER BY p.name

OPEN c
FETCH c INTO @proc, @params

WHILE @@FETCH_STATUS = 0 BEGIN

  PRINT @proc

  WHILE @@TRANCOUNT > 0
    ROLLBACK

  INSERT INTO #p (ProcedureName) VALUES (@proc)

  BEGIN TRANSACTION

  BEGIN TRY
    SET @exec = 'EXEC ' + @proc + ' ' + ISNULL(@params, '')
    EXEC (@exec)
    ROLLBACK TRANSACTION
  END TRY
  BEGIN CATCH
    WHILE @@TRANCOUNT > 0
      ROLLBACK

    UPDATE #p
    SET ErrorNumber = ERROR_NUMBER(),
      ErrorSeverity = ERROR_SEVERITY(),
      ErrorState = ERROR_STATE(),
      ErrorProcedure = ERROR_PROCEDURE(),
      ErrorLine = ERROR_LINE(),
      ErrorMessage = ERROR_MESSAGE()
    WHERE ProcedureName = @proc
  END CATCH

  FETCH c INTO @proc, @params
END

CLOSE c
DEALLOCATE c

SELECT * FROM #p
WHERE ErrorNumber IS NOT NULL
AND ErrorNumber <> 515

Some final comments:

  • I left the DROP and DELETE statements in TSQL comments to make it easier to clear or delete the temp table.
  • The collation declarations are necessary for the JOIN condition if your working db and your temp db have different collations.
  • If the script exits with the 100-result-sets error, rerun it until it only shows the check result for all stored procedures.
  • ErrorNumber 515 is the error indicating a NULL value has been inserted into a not-nullable column. Since the script creates this error condition, we need not list it.
  • Temp table name #p is used here for brevity. You should use a name which is never used as temp table in your TSQL code.

Constants in MS SQL Server

November 23, 2009

Even though TSQL does not allow constant declarations per se, there are several ways to handle constant values in SQL Server databases.

The simplest one is to document special values (apart from keeping a list on a piece of paper which drowns in the sediments on your desk) is to list them in the Description property of a column, although this is not what databases are made for.

Next is a check constraint on columns which only allow specified values and cause INSERT/UPDATE statements to fail if the new value does not match the constraint.

My favorite method is to have a table of constant values with a numeric identifier, a programmatic identifier for use in a programming language, and a natural-language string to be displayed to the user.

My application dbscript supports this method and generates code to link C# code to the values stored in the database. I described this mechanism earlier here and here.

As an example, take a simple table describing command execution states:

ID Name ProgID
10 Command Pending Pending
20 Command Running Running
30 Command Done Done

This table can be represented in C# as

public class ExecutionState
{
    public const int Pending = 10;
    public const int Running = 20;
    public const int Done = 30;
}

One disadvantage remains though: if you want to use the constant value in TSQL (for example, in a stored procedure), you still have to know the numeric value, or retrieve the numeric value by its programmatic identifier. In both cases, there is no way to check automatically that the values or identifiers actually exist.

Yet another method came to my mind recently: create a single-row view with the column names representing the constant identifiers and the fields containing the constant values:

CREATE VIEW V_Execution_State AS
SELECT 10 AS Pending, 20 AS Running, 30 AS Done

This way, constant identifiers can be checked by TSQL, and values easily assigned, like this:

DECLARE @state INT
SELECT @state = Pending FROM V_Execution_State

By the way, this is post # 200 ;)


Maximum Capacity Specifications for SQL Server

November 23, 2009

From MSDN:

Specification ↓ \ SQL Server Version → 7 / 2000 2005 2008
Maximum Capacity Specifications Article Article Article
Maximum Numbers of Processors Article Article
Maximum Amount of Physical Memory Article Article
Hardware and Software Requirements Article Article
http://msdn.microsoft.com/en-us/library/ms143506%28SQL.90%29.aspx

dbscript Videos

November 21, 2009

I created a couple of introductory videos describing dbscript covering topics previously handled on this blog or in the online help:

The videos have been created using CamStudio (screen recorder) and VirtualDub (avi editor) and ffmpeg (avi to flv converter). FlowPlayer is embedded by a Joomla plug-in.

The videos can be watched here. dbscript is available for download here.


SMOscript 0.15

November 21, 2009

Version 0.15 of SMOscript fixes a bug when scripting the CREATE statements for AD-based users.

The problem resulted from the assumption that the name of every SQL Server database object can be taken as the file name for the generated script, which is for AD users only true if there is a directory named as the domain of those users.

Version 0.15 now translates every character in an object name that is not a valid filename character into a dot.

The lastest version of SMOscript is available for download here.


Protecting SQL Server Data from Accidental Deletion

October 30, 2009

In nearly every project that I worked on in recent years, the database not only stores the data maintained by the application, but also describes (parts of) the application itself. If you ever had to implement a permission system which grants users to view or edit tables or open forms or execute functions, you already know that.

The resulting problem is that if the application relies on certain key data to be present and correct, accidental modification or deletion of that data usually causes the application to fail.

I try to show how to use triggers to prevent accidental data modification.

Prevent table data deletion

The simplest way to prevent data deletion is to have an INSTEAD OF DELETE trigger which does nothing, or simply raises an error:

CREATE TRIGGER [dbo].[Prevent_Foo_Delete]
    ON [dbo].[Foo]
    INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;
    RAISERROR('You cannot delete Foo',1,0);
END

If you really need to delete data, use the DISABLE TRIGGER (2005, 2008) and ENABLE TRIGGER (2005, 2008) commands.

Conditional deletion prevention

In this case, deletion should only be allowed under certain conditions. For example, we could allow to only delete single records:

CREATE TRIGGER [dbo].[Prevent_Foo_Multi_Delete]
   ON  [dbo].[Foo]
   INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    IF (SELECT COUNT(ID) FROM deleted) > 1
        RAISERROR('You can only delete a single Foo',1,0);
    ELSE
        DELETE Foo
        FROM Foo INNER JOIN deleted ON Foo.ID = deleted.ID

Similarly, one could prevent the deletion of detail records to only a single master record by writing

    IF (SELECT COUNT(DISTINCT BAR_ID) FROM deleted) > 1

Preventing Modifications

The same method can be used for UPDATE triggers. It may be, however, easier to define an ON UPDATE trigger to avoid rephrase the UPDATE statement in an INSTEAD OF trigger. In case of failure, we rollback the current transaction:

CREATE TRIGGER [dbo].[Prevent_Foo_Update]
   ON  [dbo].[Foo]
   FOR UPDATE
AS
BEGIN
      SET NOCOUNT ON;

      IF (SELECT COUNT(ID) FROM inserted) > 1 BEGIN
            ROLLBACK TRANSACTION
            RAISERROR('You can only modify 1 Foo',1,0);
      END
END

Preventing Truncation

These mechanisms prevent you from an accidental UPDATE or DELETE on all records (e.g. by a missing WHERE clause, or semicolon in front of the WHERE condition).

However, there is still the TRUNCATE TABLE command which deletes all data in a table and cannot be stopped by a DELETE trigger:

Because TRUNCATE TABLE is not logged, it cannot activate a trigger.

The rescue shows in the preceding sentence:

You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint

Simply have a table that references the tables to be protected:

CREATE TABLE Prevent_Truncate(
    Foo_ID INT REFERENCES Foo(ID),
    Bar_ID INT REFERENCES Bar(ID)
)

You only appreciate how valuable your data is once it’s lost ;)


dbscript New Version 0.99

October 20, 2009

The latest version 0.99 of dbscript has been released today providing new functionality and a couple of fixes.

Data diagrams looked a bit distorted if the data model contained circular foreign key constraints. I sketched the problem in my article on cycle detection, and the data diagram now excludes circular foreign keys in the calculation of the tables’ positions.

Comparison results can be restricted to “scopes”, such as new objects only, dropped objects only, etc. This makes it easier to generate schema migration scripts without dropping objects, for example.

Documentation Generators provide a preview to the generated content, and the generated XML now contains the project and project version identifiers to enable linking and referencing in the generator’s output.

Scripting a table in the object’s Generate/Create page now includes all constraints and indexes. (The project version script always included child objects). The same applies to object comparisons of tables, so that changes to indexes etc are easily identifiable.

New Functions

Besides generating .png data diagrams, dbscript now has the capability to generate data diagrams for Dia, an open-source diagrammer. The layout routine is the same as for png’s, but the output is Dia’s native XML format. Generating for Dia means that developers can freely layout and edit the diagram according to their needs, and export it to other formats. I described this feature earlier, and included samples.

Schema comparison is one basic feature of dbscript, and the new version compares multiple versions in one operation. After defining which schema versions to compare, you get a comparison matrix showing the number of differences between any two versions.

If the selected versions are versions of the same schema at different points of time, the comparison timeline shows each object ever changing in any of the versions, along with an indicator of the change.

Within a project, you can define Branches (as known from version control systems) and assign project versions to a branch. This alone would not be too overwhelming, but branches are a precondition of the update notification system, which I will describe in a future post.

The latest version of dbscript is available for download here.

Please leave comments and feedback.


Bad (MS SQL Server) habits to kick

October 19, 2009

Aaron Bertrand of SQLblog.com is writing a great series called Bad habits to kick, and two of his articles cover issues similar to some of my more recent posts:

Choosing the wrong data type deals with all SQL Server data types, where I only compared NVARCHAR vs VARCHAR. (Although I have to admit, I guess I won’t kick my habit of declaring INT integers even if the values never exceed 1-byte or 2-byte values ;) )

His article on Inconsistent naming conventions reads like a confirmation of some articles of my series on Data Model Checks, where I described how to check naming conventions for tables and other objects, columns, and constraints and indexes.

Remember that consistency in naming and typing improves efficiency in programming. Once you decided for some kind of conventions, your need to look up column names and data types in the table definitions will reduce as it will become “obvious” for the programmers which column name uses which data type, and how a table or stored procedure is named exactly.


Scripting all MS SQL Databases using SMOscript

October 17, 2009

SMOscript user Eric reported that the tool fails to script all remaining objects of a database once an error occurs trying to generate a CREATE PROCEDURE statement for an encrypted stored procedure.

This problem has been fixed in the latest version 0.14 which is now available for download.

Based on his feedback I want to demonstrate more capabilities of SMOscript:

Logging

While SMOscript does not implement log file functionality, you can still redirect its output to files from the command line using the standard shell redirectors >, >> and 2>.

smoscript ...parameters... >filename.log 2>filename.err

Scripting all databases on a server

To script all databases of a server into separate files in distinct directories, you can use SMOscript to first list all databases, then invoke SMOscript to script each database in the original list:

set basepath=c:\path\to\output\
set first=

for /f in "usebackq" %%i in (`smoscript -s localhost`) do ↵
                                         (set db=%%i& call :smo)
goto :end

:smo

if "%first%"=="." (
    echo database %db%
    mkdir %basepath%%db%
    smoscript -s localhost -d %db% -f %basepath%%db%\create.sql ↵
                                         > %basepath%%db%.log
)
set first=.

goto :eof
:end

The first line of “smoscript -s” is not a database name, so we include it from the list by using the variable named “first”.

The simple combination of SMOscript with a bit of shell magic can be quite efficient.