On Converting Data

I had to analyze SQL Server Database Projects (available from SQL Server Data Tools for Visual Studio), and these projects offer a menu item “Create Snapshot” which creates a snapshot file with the extension .dacpac.

It turns out that a .dacpac is a zipped XML file (plus some other files) containing a structured representation of the database objects defined in the project. However Visual Studio does not provide a way to display them (double-clicking the file will only display binary data).

So I thought about how to best display the contents of a .dacpac? Two methods came to my mind.

First, inspired by my work on wpxslgui, create an XSLT style sheet which transforms the contents of the XML file to some legible text, for example CREATE TABLE and similar TSQL statements.

Intuitively I called this approach Symbolic Transformation.

Symbolic Transformation (e.g. XSLT)
Representation 1 => Representation 2

On the other hand, a Logical Transformation contains one module parsing the information contained in “Representation 1” into some kind of model, and another module creating the “Representation 2” of that model. The two module can be  implementations of the Interpreter pattern and the Builder or Factory patterns, respectively.

Logical Transformation (simple)
Interpreter Builder
Representation 1 => Model => Representation 2

If we take Representation 1 and Representation 2 as two separate interfaces to the same business model, and want to support two-way operations, we can extend the last table like this:

Logical Transformation (extended)
Interpr.
Builder
Converter Converter Interpr.
Builder
Repr. 1 => Model 1 => Business Model => Model 2 => Repr. 2
<= <= <= <=

Why do we need to have Model 1 and Model 2, as they seem to make the whole thing even more complex?

Let’s have a look at a simple CREATE TABLE statement and some of their representations:

  • a SQL parser (think: ANTLR) uses the representation given by the SQL parser
  • the SQL Server catalog views sys.tables, sys.columns, etc. are a different representation
  • a .dacpac archive is another representation

To keep our code simple, our Model X class structure should be as close to the representation as 1) possible 2) necessary (thinking about proxy classes generated by xsd.exe, ANTLR, and ORMs).

Thus, a common data model (named Business Model in the table) is required, as well as 2-way conversion between the Business Model and each of the other models.

Displaying CRM 2011 Entity Customizations using XSLT

Sometimes it is necessary to extract information about customized entities and attributes in text form, e.g. to document entities or to compare two solutions.

An exported CRM solution is a Zip archive containing a couple of XML files, the most important of them being the customizations.xml.

This XML file can be extracted from the zip, and further processed by an XSLT file, such as the one I am going to describe here.

The output of this XSLT is a text file listing all customized entities, and their fields in tabular form: name, data type, required, display name, description. For clarity, tabs are shown as \t in the listing.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" indent="yes"/>

The Entity template displays the entity name and invoke the attributes template for every attribute:

  <xsl:template match="ImportExportXml/Entities/Entity">
  <xsl:text>
\t</xsl:text>
Entity <xsl:value-of select="Name/."/> (<xsl:value-of 
                               select="Name/@OriginalName"/>)
  <xsl:if test="EntityInfo/entity/attributes/attribute">
    <xsl:apply-templates 
      select="EntityInfo/entity/attributes/attribute"></xsl:apply-templates>
  </xsl:if>
  </xsl:template>

The Attribute template displays the attribute’s properties. The @langcode condition needs to be adjusted to the language you want to extract from your customizations.xml.

  <xsl:template match="EntityInfo/entity/attributes/attribute">
    <xsl:text>
\t</xsl:text>
    <xsl:value-of select="@PhysicalName"/>
    <xsl:text>\t</xsl:text><xsl:value-of select="Type"/>
    <xsl:if test="MaxLength">(<xsl:value-of 
                               select="MaxLength"/>)</xsl:if>
    <xsl:text>\t</xsl:text><xsl:value-of 
                               select="RequiredLevel" ></xsl:value-of>
    <xsl:text>\t"</xsl:text>
    <xsl:value-of 
      select="displaynames/displayname[@languagecode='1031']/@description" />
    <xsl:text>"</xsl:text>
    <xsl:if test="Descriptions">
    <xsl:text>\t// </xsl:text>
      <xsl:value-of 
      select="Descriptions/Description[@languagecode='1031']/@description" />
    </xsl:if>
    <xsl:if test="optionset/options">
      <xsl:apply-templates 
        select="optionset/options/option"></xsl:apply-templates>
    </xsl:if>
  </xsl:template>

The Option template displays the option value (for bool and picklist) and its description

  <xsl:template match="optionset/options/option">
    <xsl:text>
\t\t</xsl:text>
    <xsl:value-of select="@value"/>
    <xsl:text>\t"</xsl:text>
    <xsl:value-of 
      select="labels/label[@languagecode='1031']/@description"/>
    <xsl:text>"</xsl:text>
  </xsl:template>

Ignore every other text node in the source XML:

  <xsl:template match="text()">
  </xsl:template>
</xsl:stylesheet>

Feel free to adjust the output according to your needs.

WordPress XML Export Converter: New Version 1.03

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

I noticed that the export format had changed from

xmlns:wp="http://wordpress.org/export/1.1/"

to

xmlns:wp="http://wordpress.org/export/1.2/"

Well, if this introduction sounds familiar, you are right.

Unfortunately, this time the change of the wp namespace meant that the wp:category has been dropped as a defined element.

For the Table of Contents xslt, this change means that the categories have to be extracted from the posts’ (<item> elements) <category> child elements. (The other two Xslt file used for generating a Single HTML and a Word HTML file are not affected by this change)

Since the categories occur several times throughout the blog’s XML file, they need to be collected and sorted before outputting them in the result file using the so-called Muenchian method.

To collect the categories, the <xsl:key> element is used:

<xsl:key name="categories" match="/rss/channel/item/category[@domain='category']" use="@nicename" />

The categories are selected and sorted using <xsl:applytemplates>

<xsl:apply-templates 
    select="item[wp:post_type = 'post' and wp:status = 'publish']/category" 
    mode="foo" >
  <xsl:sort order="ascending" select="text()"/>
</xsl:apply-templates>

Only the first of each set of category duplicates is output:

<xsl:template 
    match="item[wp:post_type = 'post' and wp:status = 'publish']/category
       [ generate-id() = generate-id(key('categories', @nicename)[1]) ]" 
    mode="foo">
  <xsl:value-of select="text()"/>
  <xsl:text>
</xsl:text>
</xsl:template>

To avoid outputting the categories’ text() property repeatedly, we need to prevent evaluating the inner text:

  <xsl:template match="text()" mode="foo"></xsl:template>

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

Dealing with “Circular group reference” errors in xsd.exe

I continued to research the problem of XSLT files that cannot be processed by xsd.exe. In case of xslt.xsd (contained in the Visual Studio 2010 installation under the XML directory), xsd generates the error message

>xsd "C:\Program Files\Microsoft Visual Studio 10.0\Xml\Schemas\xslt.xsd" 
      /classes

Error: Error generating classes for schema ‘C:\Program Files\Microsoft Visual Studio 10.0\Xml\Schemas\xslt.xsd’.
– Group ‘char-instructions’ from targetNamespace=’http://www.w3.org/1999/XSL/Transform&#8217; has invalid definition: Circular group reference.

Here is how I proceeded:

Create copy1.xsd

To work around this error, I created a copy of the original xslt.xsd (named here copy1), and located the offending XSD definitions.

Replace circular references by reference to new (dummy) element

The two definitions that cause the circular reference error are the groups “char-instructions” and “instructions”. To find out where these definitions are used in the generated C# classes, the groups’ references are replaced by a reference to a new element:

  <xs:group name="char-instructions">
    <xs:choice>
<!--    
      <xs:element name="apply-templates" type="apply-templates" />
      <xs:element name="call-template" type="call-template" />
      <xs:element name="apply-imports" type="apply-imports" />
      <xs:element name="for-each" type="for-each" /> 
      <xs:element name="value-of" type="value-of" />
      <xs:element name="copy-of" type="copy-of" />
      <xs:element name="number" type="number" />
      <xs:element name="choose" type="choose" />
      <xs:element name="if" type="if" />
      <xs:element name="text" type="text" />
      <xs:element name="copy" type="copy" />
      <xs:element name="variable" type="variable" />
      <xs:element name="message" type="message" />
      <xs:element name="fallback" type="fallback" />
      -->
      <xs:any namespace="##other" processContents="lax" />
      <xs:element name="ci-dummy" type="ci-dummy" />
    </xs:choice>
  </xs:group>
  <xs:group name="instructions">
    <xs:choice>
      <xs:group ref="char-instructions" />
<!--    
      <xs:element name="processing-instruction" type="processing-instruction" />
      <xs:element name="comment" type="comment" />
      <xs:element name="element" type="element" />
-->
      <xs:element name="i-dummy" type="i-dummy" />
      <xs:element name="attribute" type="attribute" />
    </xs:choice>
  </xs:group>

Of course, the new dummy types also need to be declared in copy1.xsd:

  <xs:complexType name="ci-dummy" mixed="true">
    <xs:attribute name="dummy" type="xs:string" />
  </xs:complexType>
  <xs:complexType name="i-dummy" mixed="true">
    <xs:attribute name="dummy" type="xs:string" />
  </xs:complexType>

Running xsd.exe on copy1.xsd will now run successfully and generate copy1.cs.

Replace references to dummy classes by original classes

Search the generated classes for references to the dummy classes. You can start by commenting out the declaration of the dummy class and follow the compiler errors. In the example of xslt.xsd, replace

[System.Xml.Serialization.XmlElementAttribute("ci-dummy", typeof(cidummy))]

by

[System.Xml.Serialization.XmlElementAttribute("attribute-set", typeof(attributeset))]
[System.Xml.Serialization.XmlElementAttribute("decimal-format", typeof(decimalformat))]
[System.Xml.Serialization.XmlElementAttribute("include", typeof(include))]
[System.Xml.Serialization.XmlElementAttribute("key", typeof(key))]
[System.Xml.Serialization.XmlElementAttribute("namespace-alias", typeof(namespacealias))]
[System.Xml.Serialization.XmlElementAttribute("output", typeof(output))]
[System.Xml.Serialization.XmlElementAttribute("param", typeof(param))]
[System.Xml.Serialization.XmlElementAttribute("preserve-space", typeof(preservespace))]
[System.Xml.Serialization.XmlElementAttribute("strip-space", typeof(stripspace))]
[System.Xml.Serialization.XmlElementAttribute("template", typeof(template))]
[System.Xml.Serialization.XmlElementAttribute("variable", typeof(variable))]

However, the types that were previously referenced by other elements are not part of the generated code, as no reference to the elements exist anymore.

Add elements for previously referenced types

Create a second copy of the xsd file by copying copy1.xsd to copy2.xsd. Add the xs:element definitions that have been commented out in the first copy

    <xs:element name="apply-templates" type="apply-templates" />
    <xs:element name="apply-imports" type="apply-imports" />
    <xs:element name="call-template" type="call-template" />
    <xs:element name="for-each" type="for-each" />
    <xs:element name="value-of" type="value-of" />
    <xs:element name="copy-of" type="copy-of" />
    <xs:element name="number" type="number" />
    <xs:element name="choose" type="choose" />
    <xs:element name="if" type="if" />
    <xs:element name="text" type="text" />
    <xs:element name="copy" type="copy" />
    <xs:element name="variable" type="variable" />
    <xs:element name="message" type="message" />
    <xs:element name="fallback" type="fallback" />
    <xs:element name="processing-instruction" type="processing-instruction" />
    <xs:element name="comment" type="comment" />
    <xs:element name="element" type="element" />

Run xsd on copy2.xsd generating copy2.cs. copy2.cs need not be part of the C# project.

Copy C# classes

Next, copy all C# classes missing in copy1.cs from copy2.cs until copy1.cs compiles successfully.

Clean up attributes

Your C# classes can now be compiled and will successfully load an XML file conforming to the original XSD.

In case of the xslt.xsd, I noticed that elements and texts are handled by two different arrays, namely

// many more XmlElementAttribute declarations
[System.Xml.Serialization.XmlElementAttribute("some-element-name", typeof(someelementname))]
public object[] Items {
  get { return this.itemsField; }
  set { this.itemsField = value; }
}
[System.Xml.Serialization.XmlTextAttribute()]
public string[] Text {
  get { return this.textField; }
  set { this.textField = value; }
}

This will cause your code to lose the original order of elements and texts in the XML file. To combine both types of data into one array, add the XmlTextAttribute to the Items property as well:

[System.Xml.Serialization.XmlTextAttribute(typeof(string))]
public object[] Items

and completely remove the declaration of the string[] Text property.

Official XSD definition for XSLT?

I tried to look for an official XSD definition for XSLT files for some XSLT experiments.

What I did find was an xslt.xsd file on w3.org, and an xsd in the Xml directory of my Visual Studio installation (“C:\Program Files\Microsoft Visual Studio 10.0\Xml\Schemas\xslt.xsd”). Unfortunately, both versions could not be processed using the xsd.exe tool from VS resulting in different error messages.

Then I found an xslt10.dtd file on w3.org, and converted it to XSD using Trang:

java -jar trang.jar -I dtd -O xsd xslt10.dtd  xslt10.xsd

The resulting .xsd file can be converted to a C# file by xsd.exe

xsd xslt10.xsd /classes /n:Xslt

However I am not sure if the DTD, the XSD and thus the C# definitions are really really standards-compliant, or if the .xsd requires some editing to produce correct code.

Please leave a comment if you have any information on this topic.

WordPress XML Export Converter updated

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

I noticed that the export format had changed from

xmlns:wp="http://wordpress.org/export/1.0/"

to

xmlns:wp="http://wordpress.org/export/1.1/"

and after fixing the .xsl files, the tool worked again.

I used the opportunity to fix the file operations to read and write UTF-8 encoded files, as was suggest in a previous comment.

I also modified the .xsl files to output published posts only. The previous versions ignored the “published” flag and would output drafts and feedback entries.

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

XSLT Transformations on XML Files with Undeclared Namespaces

To perform an XSLT transformation, you just need a couple of lines in C#:

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XsltFilename);
xslt.Transform(XmlInputFilename, XmlOutputFilename);

By the way, the PowerShell version of this code snippet is here.

This code works fine unless the XML file contains elements with undeclared namespaces.

While working on wpxslgui, I noticed that newer versions of the WordPress XML export format included the atom: namespace without declaring it:

<atom:link rel="search" type="application/opensearchdescription+xml" href="https://devio.wordpress.com/osd.xml" title="devioblog" />
<atom:link rel='hub' href='https://devio.wordpress.com/?pushpress=hub'/>

In case you ever tried my app on an XML file containing these elements, you get the error message

System.Xml.XmlException: ‘atom’ is an undeclared namespace

To fix the problem, you need to open the XML file in an editor, navigate to the line given in the error message, and delete the lines starting with the <atom:link element. Done

Of course, programmers love programming, and finding solutions. We need to declare the namespace before opening (and parsing) the XML file, and this is achieved by using an XmlReader containing the namespace declaration:

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XsltFilename);

// prepare XML input
StreamReader sr = new StreamReader(XmlInputFilename, System.Text.Encoding.Default);
NameTable nt = new NameTable();
XmlNamespaceManager mgr = new XmlNamespaceManager(nt);
mgr.AddNamespace("atom", "urn:atom");
XmlParserContext xpc = new XmlParserContext(nt, mgr, "", XmlSpace.Default);
XmlReaderSettings rds = new XmlReaderSettings();
rds.ConformanceLevel = ConformanceLevel.Document;
XmlReader rd = XmlReader.Create(sr, rds, xpc);

// prepare transformation
StreamWriter wr = new StreamWriter(XmlOutputFilename);
xslt.Transform(rd, new XsltArgumentList(), wr);

// cleanup
wr.Flush();
wr.Close();
rd.Close();

[The code does not contain exception handling (file not found, file system permissions, etc) for clarity]

wpxslgui – WordPress XML Export Converter

wpxslgui is a Windows application which converts an XML File generated by the WordPress Export function into an HTML or Word HTML document.

This new program is based the two XSL style sheets I created earlier to process WordPress XML Exports: the “Single HTML” XSL and the “Table of Contents” XSL.

The program’s features are:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

Simply download wpxslgui, export your WordPress blog to XML, and convert into any of the supported output formats.

Let me know what you think about it 😉

Generating Table of Contents from WordPress Export

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

In a previous post I described an XSLT file to convert the exported XML into a single HTML file. A click on an article’s title displayed the whole article using a little JavaScript and CSS.

This time, I wanted to create an HTML table of contents displaying all the blog’s categories. Upon selection of a category, only the posting titles of that category should be displayed. The titles link to the original blog articles.

The new XSLT is available for download here.

Applied to this blog, the generated Table of Contents is here.

Command-line XSLT processor with PowerShell

There are already a lot of XSLT processors out there, such as MSXSL, but without downloading and installing an application you can create your own processor using a couple of PowerShell lines and the System.Xml.Xsl namespace of .Net:

param ($xml, $xsl, $output)

if (-not $xml -or -not $xsl -or -not $output)
{
	Write-Host "& .\xslt.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
	exit;
}

trap [Exception]
{
	Write-Host $_.Exception;
}

$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load($xsl);
$xslt.Transform($xml, $output);

Write-Host "generated" $output;

What does this code do:

  • Declare command-line parameters $xml, $xsl, $output
  • Check parameters are passed to the script
  • Set a trap to display detailed error message in case an exception is raised
  • Load the XSLT file
  • Transform XML and write result to output file