Creating Dia Data Diagrams from Database Schema

As I stated in earlier posts, you can use dbscript to generate PNG Data Diagrams (MS SQL Server, Oracle, PostgreSQL).

The generated PNG files are intended to give you an overview of the data model. The main drawback is that the result is purely graphical: the image contains no information about the underlying schema objects.

Upcoming version 0.99 will also include the capability to generate Dia files representing a database schema. Dia is an open-source diagram creation program which runs on Windows and Linux.

The advantages of generating in Dia format are that it allows users / developers to modify the generated diagrams, and that general Dia functionality can be used as stated on their website:

[It] can export diagrams to a number of formats, including EPS, SVG, XFIG, WMF and PNG, and can print diagrams (including ones that span multiple pages).

There is one minor issue in generating and opening Dia files: since Dia automatically sizes Table shapes according to their contents, it is not possible to predict what the table shape size will turn out to be (see FAQ). Thus when you open a generated Dia file for the first time, the foreign key connectors will look disconnected from the tables, even though they are not (in the data).

You need to manually enforce connector layout following these steps:

  • Select All (Ctrl-A)
  • Move selected objects (Cursor-Left, then Cursor-Right)

which will lay out the foreign key connectors as expected.

Visit the gallery to view PNG images generated by Dia from diagrams created by dbscript.

Download Dia files with database models of

You need to install Dia to view the contents of these files.

Version 0.98 of dbscript Released

The latest version 0.98 of dbscript supports PostgreSQL databases in its documentation generation capabilities.

After importing the database dictionary (via a direct connection using ADO.Net and Npgsql), dbscript can document a PostgreSQL database in all currently supported documentation formats:

  • MediaWiki
  • Data Diagram (PNG)
  • HTML
  • ScrewTurn Wiki

Integration support for PostgreSQL had some consequences: More and more functionality is handled separately for each database engine.

Database import was obviously the first one, since the data access classes (SqlConnection, SqlCommand) in .Net are different for every database library. The same goes for the database dictionary, which is best retrieved from the native system catalogs.

For import and upload, data access classes have been introduced to distinguish the object types and their properties for each database engine. I have already mentioned work on the data access classes in a series of articles.

In version 0.98, XML generation and object script generation are implemented separately per engine. As a result, the XSL style sheets are now tied to a specific database engine.

For Oracle, XML and object script generation have been updated, and the XSL style sheets have been adjusted to Oracle-specific objects and properties. The results were documented earlier.

The latest version of dbscript is available for download here.

Please leave comments and feedback.

Documenting Oracle Databases

I described the capabilities of dbscript to generate MediaWiki content and a single HTML page documenting an Oracle database in a previous post.

During development of PostgreSQL support (MediaWiki, HTML) it became clear that an XSL style sheet (among other things) needed to become specific to a database engine.

I therefore updated the XSL style sheets for Oracle support, and these are the results:

Creating HTML documentation of PostgreSQL databases

dbscript ships with a couple of XSLT style sheets which transform an XML representation of a database schema into MediaWiki, HTML, or, if you create them on your own, any format you wish.

After writing the previous post on PostgreSQL support in dbscript, I fixed the XSLT for HTML generation, created a Documentation Generator in dbscript, and this is the resulting HTML documentation of the OpenNMS data model.

Creating MediaWiki Documentation of PostgreSQL Databases

After hacking away the last couple of days, dbscript finally is able to perform another trick: Importing database schemas from PostgreSQL databases using the Npgsql data provider.

Internally, the database abstraction layer is now completely based on interfaces, as sketched in a previous post. Rather than asking a Database Engine object “can you do this, and can you do that?” and receiving boolean answers, we ask whether the object implements a functionality interface with all its required methods, which is a simple “is” operation.
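The idea can be sketched as follows. This is an illustrative Python stand-in (dbscript itself is written in .NET, and all class names here are invented for the example): the capability question is answered by the type system rather than by boolean query methods.

```python
# Illustrative sketch: each engine class implements optional functionality
# interfaces, and callers test with a simple "is" check -- in Python,
# isinstance() against an abstract base class.
from abc import ABC, abstractmethod

class ISchemaImport(ABC):
    """Functionality interface: engines that can import a schema."""
    @abstractmethod
    def import_schema(self, connection_string: str) -> dict: ...

class PostgreSqlEngine(ISchemaImport):
    def import_schema(self, connection_string: str) -> dict:
        # Real code would query the native system catalogs via Npgsql.
        return {"engine": "PostgreSQL", "tables": []}

class UnknownEngine:
    """An engine that does not implement schema import."""

def try_import(engine, conn):
    # No boolean "can you do this?" method -- just an "is" check.
    if isinstance(engine, ISchemaImport):
        return engine.import_schema(conn)
    return None

print(try_import(PostgreSqlEngine(), "host=localhost"))
print(try_import(UnknownEngine(), "host=localhost"))
```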

The other aspect is that the newly created data access layer simplifies handling the differences between the database engines (object types, table columns, etc).

These differences also resulted in a reference from XSLT style sheets to a Database Engine to allow more specific content creation in Document Generators.

So here are the new documentation samples, based on the data model of OpenNMS:

The data diagram of the OpenNMS database schema:

PostgreSQL is the 3rd database engine supported by dbscript, after MS SQL Server and Oracle.

Please note that this version still needs some polish. The most current stable version is available for download here.

Managing Customers, Project Versions and Installations

If you develop software and you have a couple of customers, you will pretty soon lose track of which customer has which software version installed and when.

I now implemented a long over-due item on my todo list to manage customers, contacts, and their installation of software versions.

After you create a Customer record in dbscript, you can assign it Project Versions with shipping date and/or installation date. These Installations are visible from both the Customer record and the Project Version record.

The Project page gives you an overview of which Customers have software versions installed, and which is the most current Project Version installed:

The latest version of dbscript can be downloaded here.

By the way, the current version of dbscript 0.97 also has an improved Data Diagram generator which performs better for data models with more than 100 tables.

Synching SQL Server Databases

Synchronizing developer databases and synchronizing production with development databases are recurring topics at Stackoverflow:

One from a small dev team: “The biggest issue that we are having now is that our DB schema is totally out of sync between all of us”

“My main concern is deploying schema + data updates to developer machines from the “master” development copy.”

Keeping databases under version control is a harder problem than version control for regular source code, as you need not only

  • the SQL source of the data model, but also
  • a change script to upgrade from one version to the next,
  • and scripts to populate application tables with pre-defined values (constants).

I think that dbscript and SMOscript are two useful tools which can help you to address these issues (which is why I wrote them and use them).

SMOscript lets you script all database objects that are scriptable through SMO. You can script the objects to a single file, or as single files in subdirectories named after their type (Table, View, StoredProcedure, etc.).

dbscript helps you with the other two items:

Import the current database schemas of two SQL Server databases, and have dbscript compare them. You can then select which changes (on objects and column level) to include in the change script, and have dbscript generate the migration script.
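The compare-and-migrate step can be sketched roughly as below. This is a hypothetical Python illustration (dbscript's actual comparison works on imported schema records, not on dicts): two schemas are modeled as plain dicts mapping table name to column definitions, and the output is a simple migration script.

```python
# Illustrative schema compare: emit statements that bring `target`
# (e.g. production) in line with `source` (e.g. development).
def compare_schemas(source, target):
    statements = []
    for table, cols in source.items():
        if table not in target:
            body = ", ".join(f"{c} {t}" for c, t in cols.items())
            statements.append(f"CREATE TABLE {table} ({body})")
            continue
        for col, typ in cols.items():
            if col not in target[table]:
                statements.append(f"ALTER TABLE {table} ADD {col} {typ}")
            elif target[table][col] != typ:
                statements.append(f"ALTER TABLE {table} ALTER COLUMN {col} {typ}")
    return statements

dev  = {"Customer": {"Id": "int", "Name": "nvarchar(50)", "Email": "nvarchar(100)"}}
prod = {"Customer": {"Id": "int", "Name": "nvarchar(50)"}}
print(compare_schemas(dev, prod))
# ['ALTER TABLE Customer ADD Email nvarchar(100)']
```

A real tool must of course also handle dropped objects, renames, and dependency order, which is exactly why dbscript lets you review and select changes before generating the script.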

Instead of importing from a database, you can also upload a SQL file containing the schema definition.

Based on the schema data, you can then define all your system tables (those tables which hold pre-defined “magic” values), and let dbscript generate a stored procedure which will script the data as Insert/Update statements, or have the live data scripted directly by dbscript.
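Scripting table data as statements might look like the following minimal sketch. dbscript generates a stored procedure for this; the Python here, including the quoting helper and the table/column names, is illustrative only.

```python
# Illustrative sketch: script "magic value" tables as INSERT statements.
def quote(value):
    """Very small SQL literal quoting, for the sketch only."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def script_table_data(table, columns, rows):
    col_list = ", ".join(columns)
    return [
        f"INSERT INTO {table} ({col_list}) VALUES ("
        + ", ".join(quote(v) for v in row) + ")"
        for row in rows
    ]

stmts = script_table_data("OrderStatus", ["Id", "Name"],
                          [(1, "Open"), (2, "Shipped")])
print(stmts[0])  # INSERT INTO OrderStatus (Id, Name) VALUES (1, 'Open')
```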

To keep your code in sync with database values, you can also create C# constants from your database values.
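Generating such constants could be as simple as the sketch below; the class name and values are made up for illustration, and dbscript's actual generator is more elaborate.

```python
# Illustrative sketch: turn database name/value pairs into a C# class of
# constants, so code and database values cannot silently drift apart.
def csharp_constants(class_name, values):
    lines = [f"public static class {class_name}", "{"]
    for name, value in values:
        lines.append(f"    public const int {name} = {value};")
    lines.append("}")
    return "\n".join(lines)

src = csharp_constants("OrderStatus", [("Open", 1), ("Shipped", 2)])
print(src)
```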

Last but not least, let’s deal with the question of how to keep several developers in sync.

dbscript keeps all data (projects, schemas, versions, objects) in a database, and lets multiple users work in the same project. Project Versions can be tagged as “development”, “test”, “beta”, “minor”, “major” and other stages. If versions are tagged, they are visible to every member of the project; if they are not tagged, they are only visible to the user who created them.

A strategy to handle multiple developers in dbscript could be implemented as follows:

  • Schemas/Versions of the master development database are tagged
  • Developer generates migration script to sync his development database with the latest master database version
  • Developer applies his changes locally
  • Developer uploads or imports private database schema without tagging the created version
  • If the master schema has changed since the developer retrieved it, apply change scripts from previous to current master development database schema.
  • Developer compares private database schema with latest master schema
  • Developer modifies value generation scripts
  • Developer generates migration script, and applies the script to the master database
  • Developer imports master schema and tags the version

dbscript is available for download here.

Parsing SQL into XML and C# Objects

I recently wrote about my attempts to parse SQL statements and extract schema information from them. (dbscript already has a SQL parser, but it needs to be replaced to support different SQL dialects.)

As it turns out, code generation is the way to go!

I start out with an object-oriented representation of the grammar of the target language, in this case T-SQL. A (custom-written) lexer turns the input stream into a sequence of lexemes or tokens, which the parser engine feeds into the grammar’s rule definitions.
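A lexer of this kind can be sketched in a few lines. This Python stand-in is illustrative (the real lexer is custom-written for the .NET parser), and the token categories are invented for the example.

```python
# Illustrative lexer sketch: turn a SQL fragment into (kind, text) tokens
# that a parser engine could feed into grammar rules.
import re

TOKEN_RE = re.compile(r"""
    (?P<id>\[[^\]]+\]|[A-Za-z_][A-Za-z0-9_]*)  # [bracketed] or plain identifier
  | (?P<num>\d+)                               # integer literal
  | (?P<sym>[(),.=])                           # punctuation
  | (?P<ws>\s+)                                # whitespace (skipped)
""", re.VERBOSE)

def tokenize(sql):
    tokens = []
    for m in TOKEN_RE.finditer(sql):
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("CREATE TABLE [Production].[TransactionHistoryArchive]("))
```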

The result of this action is an XML document whose structure is equivalent to the grammatical definition of the respective SQL statement.

SQL Parser Data Model

This is where code generation magic sets in: The grammar definition is also used to generate a C# object model, mapping each rule into a C# class. Each class implements an interface requiring a SetXml() method, which reads the contents of the instantiated object from an XmlNode structure, and thus populates its child objects recursively.
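The generated-class pattern can be illustrated like this. Python's ElementTree stands in for .NET's XmlNode, the method is lowercased to Python conventions, and the rule/class names and the simplified XML are invented for the example.

```python
# Illustrative sketch: one class per grammar rule, each with a set_xml()
# method that reads its content from an XML node and recursively
# populates its child objects.
import xml.etree.ElementTree as ET

class Column:
    def set_xml(self, node):
        self.name = node.findtext("id")
        self.datatype = node.findtext("datatype")
        return self

class CreateTable:
    def set_xml(self, node):
        self.table = node.findtext("id")
        # Recursion: each child <column> populates its own object.
        self.columns = [Column().set_xml(c)
                        for c in node.findall("columns/column")]
        return self

xml = """<cmdCreateTable>
  <id>[Production].[TransactionHistoryArchive]</id>
  <columns>
    <column><id>[TransactionID]</id><datatype>[int]</datatype></column>
  </columns>
</cmdCreateTable>"""

stmt = CreateTable().set_xml(ET.fromstring(xml))
print(stmt.table, stmt.columns[0].name)
```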

The application logic (which can be seen as sort of a “parser client”) only needs to instantiate the grammar object and the parser object, feed the parser the file, and retrieve a typed object containing an OO representation of the parsed file.

The next version of dbscript will be built on these mechanisms.

Parsing SQL into XML (2)

I recently started rewriting my SQL parser for dbscript and reported on the first results.

Things always turn out more complicated than they initially appear, and one thing I realized was that a simple .sql file is not merely SQL, but also contains some “meta-magic” such as TSQL’s GO statement, which is not a valid SQL command, but really a batch separator for Query Analyzer or Management Studio.
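The batch-splitting step can be sketched as follows; since GO is a batch separator rather than a SQL statement, a .sql file has to be cut into batches before the actual SQL parser sees it. The Python here is an illustrative stand-in for the .NET implementation.

```python
# Illustrative sketch: split a T-SQL script on GO separator lines.
def split_batches(script):
    batches, current = [], []
    for line in script.splitlines():
        if line.strip().upper() == "GO":   # separator, not a statement
            if current:
                batches.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        batches.append("\n".join(current))
    return batches

script = "CREATE TABLE T (Id int)\nGO\nINSERT INTO T VALUES (1)\nGO"
print(split_batches(script))
# ['CREATE TABLE T (Id int)', 'INSERT INTO T VALUES (1)']
```

(A real splitter also has to ignore GO inside comments and string literals, which is part of what makes .sql files “meta-magic”.)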

Anyway, there is some kind of progress: I described all SQL statements that the AdventureWorks .sql scripts use, and parsing is pretty quick.

The sample statement of my previous post

CREATE TABLE [Production].[TransactionHistoryArchive](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL ,
    [TransactionDate] [datetime] NOT NULL ,
    [TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL ,
    CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
    (
        [TransactionID] ASC
    ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
    ON [PRIMARY]
) ON [PRIMARY]

is now parsed into an XML representation which looks like this:

<?xml version="1.0" encoding="Windows-1252"?>
<TSQL>
  <cmdCreateTable>
    <CREATE>CREATE</CREATE>
    <TABLE>TABLE</TABLE>
    <id>
      <id2s>
        <id2>[Production]</id2>
        <x2E>.</x2E>
        <id1>[TransactionHistoryArchive]</id1>
      </id2s>
    </id>
    <x28>(</x28>
    <columns>
      <column>
        <id>[TransactionID]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[int]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
....
      <x2C>,</x2C>
      <column>
        <id>[ModifiedDate]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[datetime]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
      <constraint>
        <CONSTRAINT>CONSTRAINT</CONSTRAINT>
        <constraintid>[PK_TransactionHistoryArchive_TransactionID]</constraintid>
        <type>
          <pk>
            <PRIMARY>PRIMARY</PRIMARY>
            <KEY>KEY</KEY>
            <clustered>
              <CLUSTERED>CLUSTERED</CLUSTERED>
            </clustered>
            <indexcolumns>
              <x28>(</x28>
              <indexcolumns>
                <indexcolumn>
                  <colasc>
                    <columnid>[TransactionID]</columnid>
                    <ASC>ASC</ASC>
                  </colasc>
                </indexcolumn>
              </indexcolumns>
              <x29>)</x29>
            </indexcolumns>
            <with>
              <WITH>WITH</WITH>
              <x28>(</x28>
              <params>
                <param>
                  <key>PAD_INDEX</key>
                  <x3D>=</x3D>
                  <value>OFF</value>
                </param>
                <x2C>,</x2C>
....
                <x2C>,</x2C>
                <param>
                  <key>ALLOW_PAGE_LOCKS</key>
                  <x3D>=</x3D>
                  <value>ON</value>
                </param>
              </params>
              <x29>)</x29>
            </with>
            <onfilegroup>
              <ON>ON</ON>
              <filegroupid>[PRIMARY]</filegroupid>
            </onfilegroup>
          </pk>
        </type>
      </constraint>
    </columns>
    <x29>)</x29>
    <onfilegroup>
      <ON>ON</ON>
      <filegroupid>[PRIMARY]</filegroupid>
    </onfilegroup>
  </cmdCreateTable>
</TSQL>

The complete result file is here.

Now that I can completely parse and represent the contents of a SQL statement, the question remains how to evaluate the result:

  • Analyze internal representation (messy)
  • Analyze XML representation (complicated)
  • Trigger events for each parsed entity and their attributes (complicated)
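To illustrate the second option, here is a small Python sketch that walks the XML representation with ElementTree and extracts the table and column names. The element names follow the parser output shown above; everything else is illustrative.

```python
# Illustrative sketch: extract schema information from the parser's
# XML output by navigating its rule-shaped element structure.
import xml.etree.ElementTree as ET

def extract_table(xml_text):
    cmd = ET.fromstring(xml_text).find("cmdCreateTable")
    # <id> nests <id2s>/<id2>/<x2E>/<id1>; joining all text pieces
    # reassembles the qualified name.
    name = "".join(t.strip() for t in cmd.find("id").itertext())
    cols = [col.findtext("id") for col in cmd.findall("columns/column")]
    return name, cols

xml_text = (
    "<TSQL><cmdCreateTable>"
    "<CREATE>CREATE</CREATE><TABLE>TABLE</TABLE>"
    "<id><id2s><id2>[Production]</id2><x2E>.</x2E>"
    "<id1>[TransactionHistoryArchive]</id1></id2s></id>"
    "<columns><column><id>[TransactionID]</id></column></columns>"
    "</cmdCreateTable></TSQL>"
)
print(extract_table(xml_text))
```

The sketch also shows the weakness of this option: the navigation paths hard-code the grammar's structure, so any grammar change silently breaks the evaluator.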

The problem in each case is that the evaluating routine is decoupled from the parser.

The first solution is too dependent on the internal representation. Changing a small aspect of the parsing mechanism or the grammar would break the code.

The other two solutions are less dependent, but a change in the grammar still needs to be reflected in the evaluator, with no immediate error reporting if the evaluator no longer matches the parser.

I read about Xtext the other day, a DSL parser for Eclipse, which seems to include a code generator that generates Java classes to match the grammar. An interesting solution, since a change in the grammar and subsequent code generation would immediately break compilation of the evaluator.

To be continued… 😉