OpenStreetMap for MS SQL Server

I previously wrote about importing OpenStreetMap data into MS SQL Server, and came across another project dealing with this topic by chance.

osm2mssql not only imports data from OSM files, but also uses the SQL Server Geography types to encode the imported data. osm2mssql is hosted on CodePlex.
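
To give an idea of what encoding with the Geography type looks like (a minimal sketch, not osm2mssql's actual code), a node's coordinates can be turned into a geography point from .Net via the Microsoft.SqlServer.Types assembly, using SRID 4326 (WGS 84):

using Microsoft.SqlServer.Types;

// Encode an OSM node's coordinates as a SQL Server geography point.
// Note the argument order: latitude first, then longitude; 4326 = WGS 84.
var point = SqlGeography.Point(48.2066502, 16.3707086, 4326);

// The value can be written to a geography column via a SqlParameter
// with SqlDbType.Udt and UdtTypeName = "geography".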

IPMap – Geocoding Multiple IP Addresses using OpenStreetMap

If you are looking for ways to map IP addresses to their geographic locations, there are various possibilities (source):

Some of them provide geo data for download, while others, such as Google Maps, allow data retrieval via a web service.

Of course, the drawback of third-party web services is that access is usually metered, meaning the number of requests per day or per hour is limited, especially for free services (e.g. Google Maps usage limits).

After checking which of the services gave me the most accurate location for my IP address, I chose MaxMind GeoLite City as my geo data source, downloaded the .csv files, and imported them into a database. The data resolves IP blocks to cities, which is good enough for my purposes.
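
Since the GeoLite City blocks map integer ranges of IPv4 addresses to locations, an incoming address first has to be converted to its 32-bit integer form before it can be matched against a range. A minimal sketch of that conversion (the table and column names in the trailing comment are placeholders, not MaxMind's exact schema):

using System.Net;

static long ToIpNumber(string ipAddress)
{
  // Convert a dotted IPv4 address to the integer representation
  // used by the GeoLite block ranges (big-endian byte order).
  var bytes = IPAddress.Parse(ipAddress).GetAddressBytes();
  return ((long)bytes[0] << 24) | ((long)bytes[1] << 16)
       | ((long)bytes[2] << 8) | bytes[3];
}

// The resulting number can then be matched against the imported blocks, e.g.
// SELECT TOP 1 LocId FROM GeoBlocks WHERE @ip BETWEEN StartIpNum AND EndIpNum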

Starting from a previous project working with OpenStreetMap and OpenLayers, I employed an elaborate technique of code reuse (read: copy, paste, and edit) to adapt the existing code to its new purpose.

And voilà, here comes IPMap.

The text box allows any text to be entered, and the code-behind extracts the IP addresses contained in the text (great if you just want to paste lines from server log files). Processing the request may take some time depending on the number of IP addresses posted.
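
The extraction itself can be done with a simple regular expression over the posted text; a sketch along these lines (not necessarily the exact pattern IPMap uses):

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static IEnumerable<string> ExtractIpAddresses(string text)
{
  // Find dotted IPv4 candidates and return each address only once.
  return Regex.Matches(text, @"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct();
}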

By the way, the screen shots of my previous post are actually the output of IPMap.

Sample: Hits on this blog from Feb 01 through Feb 03



Importing OpenStreetMap Data into MS SQL Server

After analyzing OSM extract files, we can now import the data from XML into SQL Server. Depending on your requirements, the table to hold the .osm data looks something like this:

CREATE TABLE [dbo].[OSM](
    [Lat] [float] NOT NULL,
    [Lon] [float] NOT NULL,
    [Country] [nvarchar](50) NULL,
    [PostCode] [nvarchar](50) NOT NULL,
    [City] [nvarchar](200) NOT NULL,
    [Street] [nvarchar](200) NULL,
    [HouseNumber] [nvarchar](50) NULL
)

Again, we start out reading from the .osm file using an XmlReader:

using (var reader = new StreamReader(@"path\to\data.osm", Encoding.UTF8))
{
  foreach (var el in GetNextNode(reader))
  {

We collect all tag data (attributes k and v) in a dictionary:

    var attrs = new Dictionary<string, string>();

    foreach (var tag in el.Elements("tag"))
    {
      var k = tag.Attribute("k");
      var v = tag.Attribute("v");
      if (k != null && v != null)
        attrs.Add(k.Value, v.Value);
    }

If all required fields (in my case, addr:postcode and addr:city) are present, we generate the INSERT statement:

    if (attrs.ContainsKey("addr:postcode") && attrs.ContainsKey("addr:city"))
    {
      Console.WriteLine("INSERT INTO OSM (Lat, Lon, Country, PostCode, " + 
        "City, Street, HouseNumber)\n" + 
        "VALUES({5}, {6}, {0}, {1}, {2}, {3}, {4})\nGO",
        attrs.ToSql("addr:country"), attrs.ToSql("addr:postcode"), 
        attrs.ToSql("addr:city"), 
        attrs.ToSql("addr:street"), attrs.ToSql("addr:housenumber"),
        el.Attribute("lat").Value, el.Attribute("lon").Value);
    }
  }
}

Note that this code requires an extension method ToSql() defined on a dictionary to calculate the SQL notation for a nullable string literal:

public static class DictExt
{
  public static string ToSql(this Dictionary<string, string> dict, 
    string key)
  {
    if (!dict.ContainsKey(key))
      return "NULL";
    var s = dict[key];
    if (string.IsNullOrEmpty(s))
      return "NULL";
    return "N'" + s.Replace("'", "''") + "'";
  }
}
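
For example (with illustrative values), a present key is escaped and quoted as a Unicode string literal, while a missing key yields NULL:

var tags = new Dictionary<string, string>
{
  { "addr:city", "Wien" },
  { "addr:street", "O'Connell Street" }
};
Console.WriteLine(tags.ToSql("addr:city"));        // N'Wien'
Console.WriteLine(tags.ToSql("addr:street"));      // N'O''Connell Street'
Console.WriteLine(tags.ToSql("addr:housenumber")); // NULL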

Parsing and Analyzing OpenStreetMap Export Files

OpenStreetMap provides geo-coded information in the form of extracted .osm files for download. .osm files are XML files with a defined schema, but this information alone does not really help in analyzing the files. The reason is that the simple structure <node><tag /></node> distinguishes its data by the tag’s k attribute value:

  <node id="567053682" [...] lat="48.2066502" lon="16.3707086">
    <tag k="addr:postcode" v="1010"/>
    <tag k="addr:country" v="AT"/>
    <tag k="addr:street" v="Neuer Markt"/>
    <tag k="addr:city" v="Wien"/>
    <tag k="addr:housenumber" v="16"/>
  </node>

Therefore we first need to know which tag@k attributes are used in the XML. Given the size of .osm files (the extract for Austria is currently 3GB), a simple XmlDocument.Load() is bound to fail due to memory restrictions. Remember that both XmlDocument and XDocument are in-memory representations of the XML tree. See also my earlier comparison of XML access methods in .Net.

Fortunately, .Net also provides a way to access an XML document without loading it completely into memory, namely by streaming it using XmlReader as illustrated in an MSDN sample.

Modifying this sample to process only nodes containing at least one tag, the enumerator that returns such nodes looks like this:

static IEnumerable<XElement> GetNextNode(TextReader stringReader)
{
  using (XmlReader reader = XmlReader.Create(stringReader)) {
    reader.MoveToContent();
    while (reader.Read()) {
      switch (reader.NodeType) {
        case XmlNodeType.Element:
          if (reader.Name == "node") {
            XElement el = XElement.ReadFrom(reader) as XElement;
            if (el != null)
              if (el.Element("tag") != null)
                yield return el;
          }
          break;
      }
    }
  }
}

The code that calls this enumerator collects the values of the tag/@k attribute using a dictionary:

var tagKeys = new Dictionary<string, int>();
using (var reader = new StreamReader(@"path\to\austria.osm", Encoding.UTF8)) {
  var i = 0;               // i and dt can be used for progress and timing output (not shown)
  var dt = DateTime.Now;
  foreach (var el in GetNextNode(reader)) {
    i++;
    var keys = new List<string>();

    foreach (var tag in el.Elements("tag")) {
      var ka = tag.Attribute("k");
      if (ka != null) {
        var key = ka.Value;
        keys.Add(key);
        if (tagKeys.ContainsKey(key))
          tagKeys[key] = tagKeys[key] + 1;
        else
          tagKeys.Add(key, 1);
      }
    }
  }
}

Finally, we can print statistics on the attributes and their number of occurrences:

foreach (var k in tagKeys.Keys.OrderBy(kT => kT))
  Console.WriteLine("{0} ({1})", k, tagKeys[k]);

After I found the various addr: attributes (the most frequent entries are listed below),

addr:city (44936)
addr:country (36215)
addr:housenumber (54034)
addr:postcode (45284)
addr:street (50469)

I wanted to know how often their combinations occurred:

// cPCC, cPCCS, and cPCCSH are int counters initialized to 0 before the loop;
// this check sits inside the foreach over the nodes, after the keys list is filled.
if (keys.Contains("addr:postcode") && keys.Contains("addr:city")) {
  cPCC++;
  if (keys.Contains("addr:street")) {
    cPCCS++;
    if (keys.Contains("addr:housenumber"))
      cPCCSH++;
  }
}

resulting in

postcode+city 42352 
postcode+city+street 39864 
postcode+city+street+housenumber 39338