XSLT Transformations on XML Files with Undeclared Namespaces

To perform an XSLT transformation, you just need a couple of lines in C#:

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XsltFilename);
xslt.Transform(XmlInputFilename, XmlOutputFilename);

By the way, the PowerShell version of this code snippet is here.

This code works fine unless the XML file contains elements with undeclared namespaces.

While working on wpxslgui, I noticed that newer versions of the WordPress XML export format included the atom: namespace without declaring it:

<atom:link rel="search" type="application/opensearchdescription+xml" href="https://devio.wordpress.com/osd.xml" title="devioblog" />
<atom:link rel='hub' href='https://devio.wordpress.com/?pushpress=hub'/>

In case you ever tried my app on an XML file containing these elements, you get the error message

System.Xml.XmlException: ‘atom’ is an undeclared namespace

To fix the problem, you need to open the XML file in an editor, navigate to the line given in the error message, and delete the lines starting with the <atom:link element. Done

Of course, programmers love programming, and finding solutions. We need to declare the namespace before opening (and parsing) the XML file, and this is achieved by using an XmlReader containing the namespace declaration:

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XsltFilename);

// prepare XML input
StreamReader sr = new StreamReader(XmlInputFilename, System.Text.Encoding.Default);
NameTable nt = new NameTable();
XmlNamespaceManager mgr = new XmlNamespaceManager(nt);
mgr.AddNamespace("atom", "urn:atom");
XmlParserContext xpc = new XmlParserContext(nt, mgr, "", XmlSpace.Default);
XmlReaderSettings rds = new XmlReaderSettings();
rds.ConformanceLevel = ConformanceLevel.Document;
XmlReader rd = XmlReader.Create(sr, rds, xpc);

// prepare transformation
StreamWriter wr = new StreamWriter(XmlOutputFilename);
xslt.Transform(rd, new XsltArgumentList(), wr);

// cleanup
wr.Flush();
wr.Close();
rd.Close();

[The code does not contain exception handling (file not found, file system permissions, etc) for clarity]

wpxslgui – WordPress XML Export Converter

wpxslgui is a Windows application which converts an XML File generated by the WordPress Export function into an HTML or Word HTML document.

This new program is based the two XSL style sheets I created earlier to process WordPress XML Exports: the “Single HTML” XSL and the “Table of Contents” XSL.

The program’s features are:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

Simply download wpxslgui, export your WordPress blog to XML, and convert into any of the supported output formats.

Let me know what you think about it 😉

Introducing oraddlscript

A recent question on StackOverflow inspired me to write a utility called oraddlscript:

How to generate scripts for Oracle schema objects, such as tables, procedures, etc.

The answers directed me to the DBMS_METADATA package (9i documentation, 10g documentation). The function DBMS_METADATA.GET_DDL is the functional equivalent of the MSSQL SMO library, which prompted me to adapt my command-line utility SMOscript to Oracle databases. VoilĂ , oraddlscript.

oraddlscript 0.14.3584.16268 (c) by devio.at 2009

    list and script databases and database objects.

    usage: oraddlscript [options] [command]

    options: (leading '-' or '/')

    -s server       TNS name or host name or host:port
    -svc service    service name (if host name is provided)
    -o owner        owner name

    -u username     username (default: integrated authentication)
    -p password     password (if -u is missing, password for sa)

    -f filename     output to file
    -F directory    output to directory

    -A              current ANSI codepage
    -O              ASCII
    -T              Unicode
    -U              UTF8

    commands:

    l               list objects
    s               script object/s (using dbms_meta.get*ddl)
    -xml            use dbms_meta.get*xml

    list object owners on server (implied by -s)
    list objects in database (implied by -s -o)
    script all objects (implied by -s -o -F/-f)

The command-line arguments are consistent with SMOscript, except for -d (database) which has been replaced by -o (owner name).

The list of objects is retrieved by querying DBA_OBJECTS, ALL_OBJECTS and USER_OBJECTS depending on which of the catalog views is accessible by the user specified by -u.

The package also contains a function GET_XML which is used to retrieve the XML representation of a database object.

The functions of oraddlscript are:

  • list usernames of object owners
  • list objects of specific owner
  • generate CREATE scripts of objects owned by specific user
  • generate XML files representing objects owned by specific user
  • generate one file per object or single file for all objects

Of course, logging and batch operations work just as previously described for SMOscript.

oraddlscript is available for download here.

Generating Table of Contents from WordPress Export

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

In a previous post I described an XSLT file to convert the exported XML into a single HTML file. A click on an article’s title displayed the whole article using a little JavaScript and CSS.

This time, I wanted to create an HTML table of contents displaying all the blog’s categories. Upon selection of a category, only the posting titles of that category should be displayed. The titles link to the original blog articles.

The new XSLT is available for download here.

Applied to this blog, the generated Table of Contents is here.

Structure and Analysis of edmx Files (3)

The Entity Designer of Visual Studio 2008 stores its data in an XML file with the extension .edmx. In the first part of this series I covered the Storage Model, the second part dealt with the Conceptual Model.

This part covers the Mappings section with regard to C# mappings. The Mapping section uses the xmlns:cs namespace

  • xmlns:cs=”urn:schemas-microsoft-com:windows:storage:mapping:CS”

For tables and views, the mapping section defines the mappings from database object to C# class (EntitySets and EntityTypes), along with their scalar properties:

    <edmx:Mappings>
      <cs:Mapping Space="C-S" >
        <cs:EntityContainerMapping
            StorageEntityContainer="MyNamespaceEntitiesStoreContainer"
            CdmEntityContainer="MyEntities">
          <cs:EntitySetMapping Name="TableFooSet">
            <cs:EntityTypeMapping TypeName="IsTypeOf(My.Namespace.Entities.TableFoo)">
              <cs:MappingFragment StoreEntitySet="TABLEFOO">
                <cs:ScalarProperty Name="ID" ColumnName="ID" />
                <cs:ScalarProperty Name="Name" ColumnName="NAME" />
              </cs:MappingFragment>
            </cs:EntityTypeMapping>
          </cs:EntitySetMapping> 

          <cs:AssociationSetMapping Name="FK_TABLEFOO_TABLEBAR"
            TypeName="My.Namespace.Entities.FK_TABLEFOO_TABLEBAR"
            StoreEntitySet="TABLEFOO">
            <cs:EndProperty Name="TABLEFOO">
              <cs:ScalarProperty Name="ID" ColumnName="ID" />
            </cs:EndProperty>
            <cs:EndProperty Name="TableBar">
              <cs:ScalarProperty Name="ID" ColumnName="BAR_ID" />
            </cs:EndProperty>
            <cs:Condition ColumnName="BAR_ID" IsNull="false" />
          </cs:AssociationSetMapping>

          <cs:FunctionImportMapping FunctionImportName="SPBarFromFoo"
            FunctionName="My.Namespace.Entities.Store.SP_BAR_FROM_FOO" />

End of Mapping section, end of Runtime declarations

        </cs:EntityContainerMapping>
      </cs:Mapping>
    </edmx:Mappings>
  </edmx:Runtime>

The rest of the edmx file contains the edmx:Designer section with layout information (shapes, connectors).

Parsing SQL into XML and C# Objects

I recently wrote about my attempts to parse SQL statements and extract schema information from them. (dbscript already has a SQL parser, but it needs to be replaced to support different SQL dialects)

As it turns out, code generation is the way to go!

I start out with an object-oriented representation of the grammar of the target language, in this case T-SQL. A (custom-written) lexer turns the input stream into a sequence of lexemes or tokens, which the parser engine feeds into the grammar’s rule definitions.

The result of this action is an XML document whose structure is equivalent to the grammatical definition of the respective SQL statement.

SQL Parser Data Model

This is where code generation magic sets in: The grammar definition is also used to generate a C# object model, mapping each rule into a C# class. Each class implements an interface requiring a SetXml() method, which reads the contents of the instantiated object from an XmlNode structure, and thus populates its child objects recursively.

The application logic (which can be seen as sort of a “parser client”) only needs to instantiate the grammar object and the parser object, feed the parser the file, and retrieve a typed object containing an OO representation of the parsed file.

The next version of dbscript will be built on these mechanisms.

Parsing SQL into XML (2)

I recently starting re-writing my SQL parser for dbscript and reported about the first results.

Things always turn out more complicated than they initially appear, and one thing I realized was that a simple .sql file is not merely SQL, but also contains some “meta-magic” such as TSQL’s GO statement, which is not a valid SQL command, but really a batch separator for Query Analyzer or Management Studio.

Anyway, there is some kind of progress: I described all SQL statements that the AdventureWorks .sql scripts use, and parsing is pretty quick.

The sample statement of my previous post

CREATE TABLE [Production].[TransactionHistoryArchive](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [ReferenceOrderID] [int] NOT NULL,
    [ReferenceOrderLineID] [int] NOT NULL ,
    [TransactionDate] [datetime] NOT NULL ,
    [TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
    [Quantity] [int] NOT NULL,
    [ActualCost] [money] NOT NULL,
    [ModifiedDate] [datetime] NOT NULL ,
    CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
    (
        [TransactionID] ASC
    ) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
        IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
    ON [PRIMARY]
) ON [PRIMARY]

is now parsed into an XML representation which looks like this:

<?xml version="1.0" encoding="Windows-1252"?>
<TSQL>
  <cmdCreateTable>
    <CREATE>CREATE</CREATE>
    <TABLE>TABLE</TABLE>
    <id>
      <id2s>
        <id2>[Production]</id2>
        <x2E>.</x2E>
        <id1>[TransactionHistoryArchive]</id1>
      </id2s>
    </id>
    <x28>(</x28>
    <columns>
      <column>
        <id>[TransactionID]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[int]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
....
      <x2C>,</x2C>
      <column>
        <id>[ModifiedDate]</id>
        <datatype>
          <id>
            <id1s>
              <id1>[datetime]</id1>
            </id1s>
          </id>
        </datatype>
        <attr>
          <notnull>
            <NOT>NOT</NOT>
            <NULL>NULL</NULL>
          </notnull>
        </attr>
      </column>
      <x2C>,</x2C>
      <constraint>
        <CONSTRAINT>CONSTRAINT</CONSTRAINT>
        <constraintid>[PK_TransactionHistoryArchive_TransactionID]</constraintid>
        <type>
          <pk>
            <PRIMARY>PRIMARY</PRIMARY>
            <KEY>KEY</KEY>
            <clustered>
              <CLUSTERED>CLUSTERED</CLUSTERED>
            </clustered>
            <indexcolumns>
              <x28>(</x28>
              <indexcolumns>
                <indexcolumn>
                  <colasc>
                    <columnid>[TransactionID]</columnid>
                    <ASC>ASC</ASC>
                  </colasc>
                </indexcolumn>
              </indexcolumns>
              <x29>)</x29>
            </indexcolumns>
            <with>
              <WITH>WITH</WITH>
              <x28>(</x28>
              <params>
                <param>
                  <key>PAD_INDEX</key>
                  <x3D>=</x3D>
                  <value>OFF</value>
                </param>
                <x2C>,</x2C>
....
                <x2C>,</x2C>
                <param>
                  <key>ALLOW_PAGE_LOCKS</key>
                  <x3D>=</x3D>
                  <value>ON</value>
                </param>
              </params>
              <x29>)</x29>
            </with>
            <onfilegroup>
              <ON>ON</ON>
              <filegroupid>[PRIMARY]</filegroupid>
            </onfilegroup>
          </pk>
        </type>
      </constraint>
    </columns>
    <x29>)</x29>
    <onfilegroup>
      <ON>ON</ON>
      <filegroupid>[PRIMARY]</filegroupid>
    </onfilegroup>
  </cmdCreateTable>
</TSQL>

The complete result file is here.

So as I manage to complete parse and represent the contents of a SQL statement, the question remains how to evaluate the result:

  • Analyze internal representation (messy)
  • Analyze XML representation (complicated)
  • Trigger events for each parsed entity and their attributes (complicated)

The problem is always that the evaluating routine seems to be de-coupled from the parser.

The first solution is too much dependent on internal representation. Changing a little aspect of the parsing mechanism or the grammar would result in breaking code.

The other two solutions are less dependent, but still a change in the grammar needs to be reflected in the evaluator, without any immediate error reporting if the evaluator does not match the parser.

I read about Xtext the other day, a DSL parser for Eclipse, which seems to include a code generator that generates Java classes to match the grammar. An interesting solution, since a change in the grammar and subsequent code generation would immediatly break compilation of the evaluator.

To be continued… 😉

Converting WordPress Export XML to HTML

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

Based on my previous work in dbscript to generate an HTML documentation of a database schema using XML and XSL I modified the “Single HTML” XSL of dbscript to create an XSL to transform the WordPress Export XML into an HTML page.

This HTML page lists all article titles with dates, categories and link to the original URL in the blog.

Clicking the article’s title will expand the list to display the contents of the selected article.

To display paragraphs properly, I needed to replace newlines with <br /> elements, as described on this page implementing various replace operations in XSL.

The WordPress HTML XSL file is available for download here.

Parsing SQL into XML

dbscript has a built-in SQL script parser so that users can upload SQL scripts and analyze and document their database schemas.

While the parser is good enough to both extract all the relevant information from a SQL script and also skip over unsupported commands or simply plain text, I felt that the current approach would not scale well if I wanted to support more commands or different SQL dialects.

I started to play with a couple of C# classes, re-using the original dbscript T-SQL tokenizer, and soon had a small class library to define terminals, rules, and a parser class that kept everything together.

Additionally, the parser result can be output as XML.

Some parts of the parsing process turned out to be rather tough, such as the Repeater (or Loop) construct (as I said, it was playing rather than planning ahead), but I guess I now have a good framework to start implementing a T-SQL parser.

A table definition taken out of MS AdventureWorks

CREATE TABLE [Production].[TransactionHistoryArchive](
 [TransactionID] [int] NOT NULL,
 [ProductID] [int] NOT NULL,
 [ReferenceOrderID] [int] NOT NULL,
 [ReferenceOrderLineID] [int] NOT NULL ,
 [TransactionDate] [datetime] NOT NULL ,
 [TransactionType] [nchar](1) COLLATE Latin1_General_CI_AS NOT NULL,
 [Quantity] [int] NOT NULL,
 [ActualCost] [money] NOT NULL,
 [ModifiedDate] [datetime] NOT NULL ,
 CONSTRAINT [PK_TransactionHistoryArchive_TransactionID] PRIMARY KEY CLUSTERED
(
 [TransactionID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,
  ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

is parsed and translated into XML like this:

<TSQL><cmdCREATETABLE><CREATE>CREATE</CREATE><TABLE>TABLE</TABLE>
<id13a><id2s><id2>[Production]</id2><x2E>.</x2E>
<id1>[TransactionHistoryArchive]</id1></id2s></id13a>
<columnsr><column><colid><id1>[TransactionID]</id1></colid>
<datatype><id1s><id1>[int]</id1></id1s></datatype>
<attr><notnull><NOT>NOT</NOT><NULL>NULL</NULL></notnull><rep-end>,</rep-end></attr>
</column>
...
</columnsr><x29>)</x29>
<onfg><ON>ON</ON><filegroup><id1>[PRIMARY]</id1></filegroup></onfg>
</cmdCREATETABLE></TSQL>

See the whole generated XML file here.

A “GO” command is transformed into this XML document:

<TSQL><cmdGO><GO>GO</GO></cmdGO></TSQL>

This is a work-in-progress. The naming of the symbols is not final, and the parser code needs to be cleaned up and separated into parsing and TSQL functionality.

Generating XML from Database Schema and Table Data

As described in my previous posts on dbscript, version 0.92 supports generation of XML files based on schema information imported from an SQL Server database or uploaded via SQL files containing CREATE statements for all supported object types.

I provided two examples of how these XML files can be used by XSL transformations integrated into dbscript to generate wiki content.

In this post, I will introduce the capability to generate a Stored Procedure which in turn generates an XML representation of data stored in a database table.

Currently, the XML script generator supports 3 modes, called:

  • Field Elements
  • Column Name Elements
  • Column Name Attributes

to reflect the various ways data can be mapped onto XML.

The root element for all XML data files is named <data>, its child elements <table> have the attributes name and schema. Each record within a table is marked by the element <record>, its attributes being the names of the defined primary key columns.

All examples are constructed from tables of the dbscript database.

Field Elements

Field Elements mode have <field name=”column name”> child elements, with the record field values as node texts:

<table name="ProjectVersionType" schema="dbo">
  <record OID="1">
    <field name="OID" datatype="int">1</field>
    <field name="ID" datatype="nvarchar">Major</field>
    <field name="DispSeq" datatype="int">100</field>
  </record>
</table>

Column Name Elements

The child elements of the <record> element are named according to the column names of the table:

<table name="ObjectType" schema="dbo">
  <record OID="1" >
    <OID datatype="int">1</OID>
    <ID datatype="nvarchar">Table</ID>
    <ProgID datatype="varchar">Table</ProgID>
    <SysobjectType datatype="varchar">U</SysobjectType>
    <Seq2000 datatype="int">100</Seq2000>
    <Seq2005 datatype="int">100</Seq2005>
    <WikiTitleFormat datatype="nvarchar">_(table)</WikiTitleFormat>
  </record>
</table>

Column Name Attributes

The <record> element contains attributes according to the column names of the table:

<table name="ScriptType" schema="dbo">
  <record
    OID="1"
    ID="T-SQL Value Script"
    HasSCols="1"
    ProgID="TSqlValues"
  >
  </record>
</table>

The latest version of dbscript (0.92) is available for download here.