Importing OpenStreetMap Data into MS SQL Server

After analyzing OSM extracts files, we can now import the data from XML into SQL Server. Depending on your requirements, the table to hold the .osm data looks something like this:

    [Lat] [float] NOT NULL,
    [Lon] [float] NOT NULL,
    [Country] [nvarchar](50) NULL,
    [PostCode] [nvarchar](50) NOT NULL,
    [City] [nvarchar](200) NOT NULL,
    [Street] [nvarchar](200) NULL,
    [HouseNumber] [nvarchar](50) NULL

Again, we start out reading from the .osm using an XmlReader

using (var reader = new StreamReader(@"path\to\data.osm", Encoding.UTF8))
  foreach (var el in GetNextNode(reader))

We collect all tag data (attributes k and v) in a dictionary

    var attrs = new Dictionary<string, string>();

    foreach (var tag in el.Elements("tag"))
      var k = tag.Attribute("k");
      var v = tag.Attribute("v");
      if (k != null && v != null)
        attrs.Add(k.Value, v.Value);

If all required fields (in my case, addr:postcode and addr:city) are present, we generate the INSERT statement:

    if (attrs.ContainsKey("addr:postcode") && attrs.ContainsKey("addr:city"))
      Console.WriteLine("INSERT INTO OSM (Lat, Lon, Country, PostCode, " + 
        "City, Street, HouseNumber)\n" + 
        "VALUES({5}, {6}, {0}, {1}, {2}, {3}, {4})\nGO",
        attrs.ToSql("addr:country"), attrs.ToSql("addr:postcode"), 
        attrs.ToSql("addr:street"), attrs.ToSql("addr:housenumber"),
        el.Attribute("lat").Value, el.Attribute("lon").Value);

Note that this code requires an extension method ToSql() defined on a dictionary to calculate the SQL notation for a nullable string literal:

public static class DictExt
  public static string ToSql(this Dictionary<string, string> dict, 
    string key)
    if (!dict.ContainsKey(key))
      return "NULL";
    var s = dict[key];
    if (string.IsNullOrEmpty(s))
      return "NULL";
    return "N'" + s.Replace("'", "''") + "'";

Parsing and Analyzing OpenStreetMap Export Files

OpenStreetMap provides geo-coded information in the form of extracted .osm files for download. .osm files are XML files with a defined schema, but this information alone does not really help in analyzing the files. The reason is that the simple structure <node><tag /></node> distinguishes its data by the tag’s k attribute value:

  <node id="567053682" [...] lat="48.2066502" lon="16.3707086">
    <tag k="addr:postcode" v="1010"/>
    <tag k="addr:country" v="AT"/>
    <tag k="addr:street" v="Neuer Markt"/>
    <tag k="addr:city" v="Wien"/>
    <tag k="addr:housenumber" v="16"/>

Therefore we first need to know which tag@k attributes are used in the XML. Given the size of .osm files (the extract for Austria is currently 3GB), a simple XmlDocument.Load() is bound to fail due to memory restrictions. Remember that both XmlDocument and XDocument are in-memory representations of the XML tree. See also my earlier comparison of XML access methods in .Net.

Fortunately, .Net also provides a way to a access an XML document without loading it into memory completely, namely by streaming it using XmlReader as illustrated in an MSDN sample.

Modifying this sample to process only nodes containing at least 1 tag, the enumerator to return such nodes looks like this:

static IEnumerable<XElement> GetNextNode(TextReader stringReader)
  using (XmlReader reader = XmlReader.Create(stringReader)) {
    while (reader.Read()) {
      switch (reader.NodeType) {
        case XmlNodeType.Element:
          if (reader.Name == "node") {
            XElement el = XElement.ReadFrom(reader) as XElement;
            if (el != null)
              if (el.Element("tag") != null)
                yield return el;

The code that calls this enumerator collects the values of the tag/@k attribute using a dictionary

var tagKeys = new Dictionary<string, int>();
using (var reader = new StreamReader(@"path\to\austria.osm", Encoding.UTF8)) {
  var i = 0;
  var dt = DateTime.Now;
  foreach (var el in GetNextNode(reader))  {
    var keys = new List<string>();

    foreach (var tag in el.Elements("tag")) {
      var ka = tag.Attribute("k");
      if (ka != null) {
        var key = ka.Value;
        if (tagKeys.ContainsKey(key))
          tagKeys[key] = tagKeys[key] + 1;
          tagKeys.Add(key, 1);

Finally, we can print the statistics on attributes and their numbers of occurrence:

foreach (var k in tagKeys.Keys.OrderBy(kT => kT))
  Console.WriteLine("{0} ({1})", k, tagKeys[k]);

After I found the various addr: attributes (see list with most frequent entries),

addr:city (44936)
addr:country (36215)
addr:housenumber (54034)
addr:postcode (45284)
addr:street (50469)

I wanted to know how often their combinations occurred:

if (keys.Contains("addr:postcode") && keys.Contains("addr:city")) {
  if (keys.Contains("addr:street")) {
    if (keys.Contains("addr:housenumber"))

resulting in

postcode+city 42352 
postcode+city+street 39864 
postcode+city+street+housenumber 39338

WordPress XML Export Converter: New Version 1.03

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

I noticed that the export format had changed from




Well, if this introduction sounds familiar, you are right.

Unfortunately, this time the change of the wp namespace meant that the wp:category has been dropped as a defined element.

For the Table of Contents xslt, this change means that the categories have to be extracted from the posts’ (<item> elements) <category> child elements. (The other two Xslt file used for generating a Single HTML and a Word HTML file are not affected by this change)

Since the categories occur several times throughout the blog’s XML file, they need to be collected and sorted before outputting them in the result file using the so-called Muenchian method.

To collect the categories, the <xsl:key> element is used:

<xsl:key name="categories" match="/rss/channel/item/category[@domain='category']" use="@nicename" />

The categories are selected and sorted using <xsl:applytemplates>

    select="item[wp:post_type = 'post' and wp:status = 'publish']/category" 
    mode="foo" >
  <xsl:sort order="ascending" select="text()"/>

Only the first of each set of category duplicates is output:

    match="item[wp:post_type = 'post' and wp:status = 'publish']/category
       [ generate-id() = generate-id(key('categories', @nicename)[1]) ]" 
  <xsl:value-of select="text()"/>

To avoid outputting the categories’ text() property repeatedly, we need to prevent evaluating the inner text:

  <xsl:template match="text()" mode="foo"></xsl:template>

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

Dealing with “Circular group reference” errors in xsd.exe

I continued to research the problem of XSLT files that cannot be processed by xsd.exe. In case of xslt.xsd (contained in the Visual Studio 2010 installation under the XML directory), xsd generates the error message

>xsd "C:\Program Files\Microsoft Visual Studio 10.0\Xml\Schemas\xslt.xsd" 

Error: Error generating classes for schema ‘C:\Program Files\Microsoft Visual Studio 10.0\Xml\Schemas\xslt.xsd’.
– Group ‘char-instructions’ from targetNamespace=’; has invalid definition: Circular group reference.

Here is how I proceeded:

Create copy1.xsd

To work around this error, I created a copy of the original xslt.xsd (named here copy1), and located the offending XSD definitions.

Replace circular references by reference to new (dummy) element

The two definitions that cause the circular reference error are the groups “char-instructions” and “instructions”. To find out where these definitions are used in the generated C# classes, the groups’ references are replaced by a reference to a new element:

  <xs:group name="char-instructions">
      <xs:element name="apply-templates" type="apply-templates" />
      <xs:element name="call-template" type="call-template" />
      <xs:element name="apply-imports" type="apply-imports" />
      <xs:element name="for-each" type="for-each" /> 
      <xs:element name="value-of" type="value-of" />
      <xs:element name="copy-of" type="copy-of" />
      <xs:element name="number" type="number" />
      <xs:element name="choose" type="choose" />
      <xs:element name="if" type="if" />
      <xs:element name="text" type="text" />
      <xs:element name="copy" type="copy" />
      <xs:element name="variable" type="variable" />
      <xs:element name="message" type="message" />
      <xs:element name="fallback" type="fallback" />
      <xs:any namespace="##other" processContents="lax" />
      <xs:element name="ci-dummy" type="ci-dummy" />
  <xs:group name="instructions">
      <xs:group ref="char-instructions" />
      <xs:element name="processing-instruction" type="processing-instruction" />
      <xs:element name="comment" type="comment" />
      <xs:element name="element" type="element" />
      <xs:element name="i-dummy" type="i-dummy" />
      <xs:element name="attribute" type="attribute" />

Of course, the new dummy types also need to be declared in copy1.xsd:

  <xs:complexType name="ci-dummy" mixed="true">
    <xs:attribute name="dummy" type="xs:string" />
  <xs:complexType name="i-dummy" mixed="true">
    <xs:attribute name="dummy" type="xs:string" />

Running xsd.exe on copy1.xsd will now run successfully and generate copy1.cs.

Replace references to dummy classes by original classes

Search the generated classes for references to the dummy classes. You can start by commenting out the declaration of the dummy class and follow the compiler errors. In the example of xslt.xsd, replace

[System.Xml.Serialization.XmlElementAttribute("ci-dummy", typeof(cidummy))]


[System.Xml.Serialization.XmlElementAttribute("attribute-set", typeof(attributeset))]
[System.Xml.Serialization.XmlElementAttribute("decimal-format", typeof(decimalformat))]
[System.Xml.Serialization.XmlElementAttribute("include", typeof(include))]
[System.Xml.Serialization.XmlElementAttribute("key", typeof(key))]
[System.Xml.Serialization.XmlElementAttribute("namespace-alias", typeof(namespacealias))]
[System.Xml.Serialization.XmlElementAttribute("output", typeof(output))]
[System.Xml.Serialization.XmlElementAttribute("param", typeof(param))]
[System.Xml.Serialization.XmlElementAttribute("preserve-space", typeof(preservespace))]
[System.Xml.Serialization.XmlElementAttribute("strip-space", typeof(stripspace))]
[System.Xml.Serialization.XmlElementAttribute("template", typeof(template))]
[System.Xml.Serialization.XmlElementAttribute("variable", typeof(variable))]

However, the types that were previously referenced by other elements are not part of the generated code, as no reference to the elements exist anymore.

Add elements for previously referenced types

Create a second copy of the xsd file by copying copy1.xsd to copy2.xsd. Add the xs:element definitions that have been commented out in the first copy

    <xs:element name="apply-templates" type="apply-templates" />
    <xs:element name="apply-imports" type="apply-imports" />
    <xs:element name="call-template" type="call-template" />
    <xs:element name="for-each" type="for-each" />
    <xs:element name="value-of" type="value-of" />
    <xs:element name="copy-of" type="copy-of" />
    <xs:element name="number" type="number" />
    <xs:element name="choose" type="choose" />
    <xs:element name="if" type="if" />
    <xs:element name="text" type="text" />
    <xs:element name="copy" type="copy" />
    <xs:element name="variable" type="variable" />
    <xs:element name="message" type="message" />
    <xs:element name="fallback" type="fallback" />
    <xs:element name="processing-instruction" type="processing-instruction" />
    <xs:element name="comment" type="comment" />
    <xs:element name="element" type="element" />

Run xsd on copy2.xsd generating copy2.cs. copy2.cs need not be part of the C# project.

Copy C# classes

Next, copy all C# classes missing in copy1.cs from copy2.cs until copy1.cs compiles successfully.

Clean up attributes

Your C# classes can now be compiled and will successfully load an XML file conforming to the original XSD.

In case of the xslt.xsd, I noticed that elements and texts are handled by two different arrays, namely

// many more XmlElementAttribute declarations
[System.Xml.Serialization.XmlElementAttribute("some-element-name", typeof(someelementname))]
public object[] Items {
  get { return this.itemsField; }
  set { this.itemsField = value; }
public string[] Text {
  get { return this.textField; }
  set { this.textField = value; }

This will cause your code to lose the original order of elements and texts in the XML file. To combine both types of data into one array, add the XmlTextAttribute to the Items property as well:

public object[] Items

and completely remove the declaration of the string[] Text property.

Reading Web Service Definition from Other Program’s app.config or web.config

The ConfigurationManager class allows you to open your own app.config (renamed to .exe.config) or web.config, but also the design-mode app.config of any other application by providing a ExeConfigurationFileMap parameter to the OpenMappedExeConfiguration method:

var config =
    new ExeConfigurationFileMap { ExeConfigFilename = cfg },

You can then access the <host> configuration like this

ServiceModelSectionGroup serviceModel =
if (serviceModel != null
    && serviceModel.Services != null
    && serviceModel.Services.Services != null)
  foreach (ServiceElement service in serviceModel.Services.Services)
    foreach (BaseAddressElement addr in service.Host.BaseAddresses)

This method works fine unless the services section has been extended by custom config sections such as a custom behavior

    <add name="myBehaviorExtension"
      type="My.BehaviorExtension, My.BehaviorExtension.Library" />

Such a custom extension without an accessible library implementing the type causes a ConfigurationErrorsException:

The type ‘x’ registered for extension ‘y’ could not be loaded

If this is the case, the only way I found was to load the .config file as an XmlDocument

XmlDocument xml = new XmlDocument();

and process the results of the xml’s SelectNodes() method

foreach (XmlNode nd in xml.SelectNodes(
  "/configuration/system.serviceModel/" +
  var url = nd.Attributes["baseAddress"];
  if (url != null)

WordPress XML Export Converter updated

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

I noticed that the export format had changed from




and after fixing the .xsl files, the tool worked again.

I used the opportunity to fix the file operations to read and write UTF-8 encoded files, as was suggest in a previous comment.

I also modified the .xsl files to output published posts only. The previous versions ignored the “published” flag and would output drafts and feedback entries.

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

XML Access Modes, Object Models, and Serialization in .Net

When I researched the topic of my recent post about the XmlSerializer I became aware that there are more ways to process XML data in .Net than I knew about. Here is an overview of how you can access XML data in .Net.

XML data

The .Net framework provides the following classes to store XML data in memory:

  • XSD proxy classes

XML serialization

The .Net framework implements a set of dependent classes to serialize and deserialize XML data. Each class in this list can be instantiated using a parameter of the classes below it:

  • filename (string)

XML query and manipulation

Besides the classes named above, other classes exist to query and manipulate XML data:

Putting it all together


Data model Access (MSDN summary) Serialization XML Query
XmlDocument “Represents an XML document.”
Load(XmlReader), Save(XmlWriter)
XPath SelectSingleNode(), SelectNodes();
Linq To Objects;
XPathNavigator CreateNavigator()
XDocument “Represents an XML document.”
Load(XmlReader), Save(XmlWriter),
XmlReader CreateReader();
Linq To Xml;
XPathNavigator CreateNavigator()
XSD proxy classes In-Memory XmlSerializer,
Introduction on MSDN
Linq To Objects
XmlReader “a reader that provides fast, non-cached, forward-only access to XML data.” Create(…) GetAttribute(), MoveTo*(),
XmlNodeReader “a reader that provides fast, non-cached forward only access to XML data in an XmlNode.” (XmlNode) derived from XmlReader
XPathNavigator “a cursor model for navigating and editing XML data.”
“read-only if created by XPathDocument; editable if created by XmlDocument
(CreateNavigator()) GetAttribute(), Value*,
XPath Select(), Select*();
XmlReader ReadSubtree()
XPathDocument “a fast, read-only, in-memory representation of an XML document” Constructor() XPathNavigator CreateNavigator()

Conversion and Comparison


Converting from XmlDocument to XDocument

Comparing XmlReader to SAX Reader


Use XDocument as the source for XmlSerializer.Deserialize?

Converting XDocument to XmlDocument and vice versa

XDocument or XMLDocument

XMLSerializer vs XMLReader vs XMLDocument vs XDocument Performance Comparison

XmlDocument versus XDocument versus XmlReader/XmlWriter

Performance: LINQ to XML vs XmlDocument vs XmlReader

Configuring XmlSerializer to reproduce XDocument.Save format

Clean-up job in a recent project that uses XML to store certain data. What I found:

  • an XSD describing the XML schema
  • C# proxy classes previously generated by the VS xsd.exe tool
  • the actual XML files
  • all of them not always in sync

The code reading and writing the XML files used LinqToXml to query the files and manually instantiate the XSD-generated classes.

Since the purpose of the proxy classes is to simplify read, write and query operations on statically typed classes, I replaced the Xml.Linq with compile-time type-safe object operations.

In order to verify that my changes did not corrupt the file or change any of the data, I needed to compare the original .xml files and the ones generated by XmlSerializer. I noticed differences between the two sets caused by the classes that wrote the files:

  • XDocument.Root.Save does not generate namespace declarations, and writes a <element></element> closing tag for empty data.
  • XmlSerializer does not write an explicit closing tag but uses the <element /> notation, and generates some namespace declarations in the <?xml> header.

I found the first hints of dealing with the closing tag here and here by implementing an XmlTextWriter, and the final code looks like this:

class MyXmlTextWriter : XmlTextWriter
  public MyXmlTextWriter(Stream w)
    : base(w, Encoding.UTF8)
    Indentation = 2;
    IndentChar = ' ';
    Formatting = System.Xml.Formatting.Indented;

  public override void WriteEndElement()
    if (this.WriteState == System.Xml.WriteState.Element)
      Formatting = System.Xml.Formatting.None;
    Formatting = System.Xml.Formatting.Indented;

To omit the namespace declarations, one needs to provide an XmlSerializerNamespaces object for the XmlSerializer:

XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add(string.Empty, string.Empty);
ns.Add(string.Empty, "");
ns.Add(string.Empty, "");

using (var fs = new FileStream(filename, FileMode.Create, FileAccess.Write))
  serializer.Serialize(new MyXmlTextWriter(fs), app, ns);

Checking VAT identification numbers using JavaScript

The EU Commission provides the checkVat web service to check the validity of EU VAT Identification Numbers (wsdl, redirected target).

To perform the web service query in the browser, I used the JavaScript SOAP Client on CodePlex which worked after a couple of tweaks.

First, IE9 seems to be unable to concatenate an XML document and an empty string, resulting in a Script Error. (see this post on the Discussion page). Edit the _loadWsdl function and modify the line checking the wsdl to:

if(typeof(wsdl) != "object" && wsdl + "" != "" && wsdl + "" != "undefined")

Next, the version of MSXML installed on my current work PC does not support the local-name() XPath function required in the function _getElementsByTagName. Since this function is used to retrieve the SOAP result only, and the result node is always on the same level in XML, I added to the function:

  var nd = document.selectNodes("//*/*/*/" + tagName );
  if (nd.length > 0) {
   return nd;
catch (ex) {}
  return document.selectNodes("//*/*/urn:" + tagName );
catch (ex) {}

This handles both the SOAP *Result and *Response nodes (the checkVat service adds a urn: namespace to the response attributes).

The checkVat web service is somewhat “special” (at least according to my experience) in that it defines the parameters types in a sub-namespace (xsd:schema targetNamespace=””) rather than the web service’s namespace (wsdl:definitions targetNamespace=””). The parameter types namespace is also used in the Response (xmlns:urn=””).

I finally gave up trying to retrieve the correct namespace with XPath and simply added the line to _sendSoapRequest:

ns = ns + ":types";

The mismatch in namespaces causes Visual Studio to issue a comment in the generated Reference.cs file:

// CODEGEN: Generating message contract since the wrapper namespace ( of message “checkVatRequest” does not match the default value (

Further, the checkVat service does not return a Result node, but only a Response message (probably the cause for the out parameters generated in C#).

So I had to fix the _onSendSoapRequest function to parse the Response node:

var nd = SOAPClient._getElementsByTagName(req.responseXML, method + "Result");
if(nd.length == 0)
  nd = SOAPClient._getElementsByTagName(req.responseXML, method + "Response");
if(nd.length == 0)
  nd = SOAPClient._getElementsByTagName(req.responseXML, "return"); // PHP web Service?

and remove the urn: namespace in the result elements (_node2object):

obj[node.childNodes[i].nodeName.replace("urn:", "")] = p;

Finally, my test page containing the web service call for both a valid and an invalid VATID was able to call the script:

var url = "";

function CheckVATOk()
  var pl = new SOAPClientParameters();
  pl.add("countryCode", "AT");
  pl.add("vatNumber", "U40600000");
  SOAPClient.invoke(url, "checkVat", pl, true, CheckVAT_callBack);
function CheckVATInvalid()
  var pl = new SOAPClientParameters();
  pl.add("countryCode", "XX");
  pl.add("vatNumber", "U40600000");
  SOAPClient.invoke(url, "checkVat", pl, true, CheckVAT_callBack);
function CheckVAT_callBack(r, rawXml)
  if (typeof(r.valid) != "undefined")
    alert(r.countryCode + " " + r.vatNumber + "\r\n" + + ": " +
      r.address + "\r\n" + r.valid);
  else if (typeof(r.number) != "undefined")
    alert( + " " + r.number + ": " +
      r.message.replace(/[{}\s']/g, ""));

In case of an error, the error message needs to be stripped of a couple of “special” characters to get a character-only error string.

The code worked from a local html page, but not when executed inside a web application. This is due to the Same Origin Policy and raises the message “This Page Accesses Data on Another Domain” in IE9 unless you make the browser “trust” the web service’s host.

To work around this problem, you’d need to implement a local webservice calling the remote checkVat web service.

Altering XML Schema Collections

SQL Server first implemented XML Schema Collections in version 2005.

However, support for XML schema collections is not exactly developer-friendly: Once a schema collection has been created in a database, the ALTER XML SCHEMA COLLECTION command can only add to an existing schema, but nothing can be removed from it, or changed:

Use the ALTER XML SCHEMA COLLECTION to add new XML schemas whose namespaces are not already in the XML schema collection, or add new components to existing namespaces in the collection.

Change or removal of schema “components” (i.e. elements and attributes) is only possible by dropping and then re-creating the XML schema collection. This is not as straight-forward as it seems, and this lack of efficiency keeps developers from using XML-based data, as this Stack Overflow question shows.

To drop an XML schema collection, the following steps are necessary:

  • If a table column references the schema collection (i.e. typed XML), it has to be converted to plain XML type
  • If the table column has a default constraint, drop the default constraint
  • If a procedure or function has typed XML parameters, the procedure or function has to be dropped
  • If a function has typed XML parameters, the function has to be dropped
  • If there is an XML index on a column referencing the schema collection, the index (primary and secondary indexes) has to be dropped
  • If there are computed columns based on a typed XML column, the computed columns have to be dropped
  • If there are indexes on these computed columns, the indexes have to be dropped
  • If there are schema-bound views, functions, or procedures based on tables containing typed XML columns, these objects have to be dropped

Of course, all these DROP commands have to be executed in the correct order.

After creating the new XML schema collection, all the dropped objects can be re-created using information initially stored in the SQL Server catalog views.