IPMap – Geocoding Multiple IP Addresses using OpenStreetMap

February 3, 2013

If you are looking for a way to map IP addresses to their geographic locations, there are various possibilities (source).

Some of these services provide geo data for download; others, such as Google Maps, allow data retrieval via a web service.

Of course, the drawback of third-party web services is that access is usually metered: the number of requests per hour or per day is limited, especially for free services (e.g. Google Maps usage limits).

After checking which of the services gave me the most accurate location for my IP address, I chose MaxMind GeoLite City as my geo data source, downloaded the .csv files, and imported them into a database. The resolution is on IP blocks per city, which is good enough for my purposes.
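Looking up an address in the imported blocks then amounts to converting the dotted quad into a 32-bit integer and finding the row whose range contains it. A minimal sketch of the conversion (the startIpNum/endIpNum column names follow the GeoLite City CSV layout; the imported table itself is assumed):

```csharp
using System;

public static class IpTools
{
  // Convert a dotted-quad IPv4 address to its 32-bit integer value,
  // as used by the GeoLite City block ranges (startIpNum..endIpNum).
  public static uint ToUInt(string ip)
  {
    var parts = ip.Split('.');
    if (parts.Length != 4)
      throw new FormatException("not a dotted-quad IPv4 address: " + ip);
    uint result = 0;
    foreach (var p in parts)
      result = (result << 8) | byte.Parse(p);
    return result;
  }
}
```

The SQL lookup is then simply a range check along the lines of `WHERE @ip BETWEEN startIpNum AND endIpNum`.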

From a previous project operating with OpenStreetMap and OpenLayers, I employed an elaborate technique of code reuse (read: copy, paste, and edit) to adapt the existing code to its new purpose.

And voilà, here comes IPMap.

The text box accepts any text, and the code-behind extracts the IP addresses contained in it (great if you just want to paste lines from server log files). Processing the request may take some time, depending on the number of IP addresses posted.
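Extracting the candidates from free-form text boils down to a regular expression scan; a minimal sketch (this is not necessarily the exact pattern IPMap uses, and it accepts any dotted quad, including out-of-range octets such as 999.1.1.1):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IpExtractor
{
  // Matches dotted-quad IPv4 candidates anywhere in the text.
  static readonly Regex IpPattern =
    new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");

  // Distinct() removes repeated hits, e.g. the same client
  // appearing in many log lines.
  public static List<string> Extract(string text)
  {
    return IpPattern.Matches(text)
      .Cast<Match>()
      .Select(m => m.Value)
      .Distinct()
      .ToList();
  }
}
```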

By the way, the screenshots of my previous post are actually the output of IPMap.

Sample: Hits on this blog from Feb 01 through Feb 03




Importing OpenStreetMap Data into MS SQL Server

September 10, 2012

After analyzing OSM extract files, we can now import the data from XML into SQL Server. Depending on your requirements, the table holding the .osm data looks something like this:

CREATE TABLE [dbo].[OSM](
    [Lat] [float] NOT NULL,
    [Lon] [float] NOT NULL,
    [Country] [nvarchar](50) NULL,
    [PostCode] [nvarchar](50) NOT NULL,
    [City] [nvarchar](200) NOT NULL,
    [Street] [nvarchar](200) NULL,
    [HouseNumber] [nvarchar](50) NULL
)

Again, we start out reading the .osm file using an XmlReader (wrapped in the GetNextNode enumerator from the previous post):

using (var reader = new StreamReader(@"path\to\data.osm", Encoding.UTF8))
{
  foreach (var el in GetNextNode(reader))
  {

We collect all tag data (attributes k and v) in a dictionary:

    var attrs = new Dictionary<string, string>();

    foreach (var tag in el.Elements("tag"))
    {
      var k = tag.Attribute("k");
      var v = tag.Attribute("v");
      if (k != null && v != null)
        attrs[k.Value] = v.Value; // indexer instead of Add(): tolerates duplicate keys
    }

If all required fields (in my case, addr:postcode and addr:city) are present, we generate the INSERT statement:

    if (attrs.ContainsKey("addr:postcode") && attrs.ContainsKey("addr:city"))
    {
      Console.WriteLine("INSERT INTO OSM (Lat, Lon, Country, PostCode, " + 
        "City, Street, HouseNumber)\n" + 
        "VALUES({5}, {6}, {0}, {1}, {2}, {3}, {4})\nGO",
        attrs.ToSql("addr:country"), attrs.ToSql("addr:postcode"), 
        attrs.ToSql("addr:city"), 
        attrs.ToSql("addr:street"), attrs.ToSql("addr:housenumber"),
        el.Attribute("lat").Value, el.Attribute("lon").Value);
    }
  }
}
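For the sample node from the post on parsing .osm files (Neuer Markt 16 in Wien), this loop emits:

```sql
INSERT INTO OSM (Lat, Lon, Country, PostCode, City, Street, HouseNumber)
VALUES(48.2066502, 16.3707086, N'AT', N'1010', N'Wien', N'Neuer Markt', N'16')
GO
```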

Note that this code requires an extension method ToSql() defined on a dictionary to calculate the SQL notation for a nullable string literal:

public static class DictExt
{
  public static string ToSql(this Dictionary<string, string> dict, 
    string key)
  {
    if (!dict.ContainsKey(key))
      return "NULL";
    var s = dict[key];
    if (string.IsNullOrEmpty(s))
      return "NULL";
    return "N'" + s.Replace("'", "''") + "'";
  }
}
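A quick sanity check of the escaping behavior (the extension method is condensed here so the snippet compiles on its own; the street name is a made-up example containing a quote):

```csharp
using System;
using System.Collections.Generic;

public static class DictExt
{
  // SQL notation for a nullable Unicode string literal: missing or empty
  // values become NULL, embedded quotes are doubled.
  public static string ToSql(this Dictionary<string, string> dict, string key)
  {
    string s;
    if (!dict.TryGetValue(key, out s) || string.IsNullOrEmpty(s))
      return "NULL";
    return "N'" + s.Replace("'", "''") + "'";
  }
}

public static class Demo
{
  public static void Run()
  {
    var attrs = new Dictionary<string, string> {
      { "addr:city", "Wien" },
      { "addr:street", "St. John's Lane" }   // hypothetical name with a quote
    };
    Console.WriteLine(attrs.ToSql("addr:city"));    // N'Wien'
    Console.WriteLine(attrs.ToSql("addr:street"));  // N'St. John''s Lane'
    Console.WriteLine(attrs.ToSql("addr:country")); // NULL (key not present)
  }
}
```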

Parsing and Analyzing OpenStreetMap Export Files

September 8, 2012

OpenStreetMap provides geocoded information in the form of extracted .osm files for download. .osm files are XML files with a defined schema, but the schema alone does not really help in analyzing the files. The reason is that the simple structure <node><tag /></node> distinguishes its data by the tag’s k attribute value:

  <node id="567053682" [...] lat="48.2066502" lon="16.3707086">
    <tag k="addr:postcode" v="1010"/>
    <tag k="addr:country" v="AT"/>
    <tag k="addr:street" v="Neuer Markt"/>
    <tag k="addr:city" v="Wien"/>
    <tag k="addr:housenumber" v="16"/>
  </node>

Therefore we first need to know which tag/@k attributes are used in the XML. Given the size of .osm files (the extract for Austria is currently 3 GB), a simple XmlDocument.Load() is bound to fail due to memory restrictions. Remember that both XmlDocument and XDocument are in-memory representations of the XML tree. See also my earlier comparison of XML access methods in .Net.

Fortunately, .Net also provides a way to access an XML document without loading it into memory completely, namely by streaming it using XmlReader as illustrated in an MSDN sample.

Modifying this sample to process only nodes containing at least one tag, the enumerator to return such nodes looks like this:

static IEnumerable<XElement> GetNextNode(TextReader stringReader)
{
  using (XmlReader reader = XmlReader.Create(stringReader)) {
    reader.MoveToContent();
    while (!reader.EOF) {
      if (reader.NodeType == XmlNodeType.Element && reader.Name == "node") {
        // ReadFrom consumes the element and leaves the reader positioned
        // after it, so we must not call Read() again here, otherwise
        // adjacent <node> elements would be skipped
        XElement el = XElement.ReadFrom(reader) as XElement;
        if (el != null && el.Element("tag") != null)
          yield return el;
      }
      else {
        reader.Read();
      }
    }
  }
}

The code that calls this enumerator collects the values of the tag/@k attribute in a dictionary:

var tagKeys = new Dictionary<string, int>();
using (var reader = new StreamReader(@"path\to\austria.osm", Encoding.UTF8)) {
  var i = 0;
  var dt = DateTime.Now;
  foreach (var el in GetNextNode(reader)) {
    i++;
    var keys = new List<string>();

    foreach (var tag in el.Elements("tag")) {
      var ka = tag.Attribute("k");
      if (ka != null) {
        var key = ka.Value;
        keys.Add(key);
        if (tagKeys.ContainsKey(key))
          tagKeys[key] = tagKeys[key] + 1;
        else
          tagKeys.Add(key, 1);
      }
    }
  }
}

Finally, we can print the statistics on the attributes and their occurrence counts:

foreach (var k in tagKeys.Keys.OrderBy(kT => kT))
  Console.WriteLine("{0} ({1})", k, tagKeys[k]);

After I found the various addr: attributes (see list with most frequent entries),

addr:city (44936)
addr:country (36215)
addr:housenumber (54034)
addr:postcode (45284)
addr:street (50469)

I wanted to know how often their combinations occurred (the counters cPCC, cPCCS, and cPCCSH are declared before the loop, and this check goes inside it):

if (keys.Contains("addr:postcode") && keys.Contains("addr:city")) {
  cPCC++;
  if (keys.Contains("addr:street")) {
    cPCCS++;
    if (keys.Contains("addr:housenumber"))
      cPCCSH++;
  }
}

resulting in

postcode+city 42352 
postcode+city+street 39864 
postcode+city+street+housenumber 39338

Querying OpenGeoDB

August 16, 2012

After importing OpenGeoDB Data into MS SQL Server, the next question is: How do we query relevant data?

Especially, how do we get all cities and municipalities for a given zip code?

The OpenGeoDB database contains locations and location-related data in various tables, each of them linked by the loc_id column. The geodb_textdata table contains typed text data for a location, and the type information is stored in geodb_type_names. Browsing the type names table soon reveals which information we need to query.

Additionally, the geodb_coordinates table may contain latitude and longitude information for a loc_id, so we want to retrieve this as well if available.

The resulting query is contained in a stored procedure (you can also implement a table-valued function with the same functionality), which looks like this:

create procedure [GeoDB_GetNamesFromZip](
    @zip nvarchar(10)
) as
begin
    select t.loc_id as Loc_Id, 
        t.text_val as Name, e.text_val as Level, 
        c.lat as Lat, c.lon as Lon
    from geodb_textdata t 
    left outer join geodb_textdata e
        on t.loc_id = e.loc_id and e.text_type = '400200000'  -- administrative level
    left outer join geodb_coordinates c 
        on t.loc_id = c.loc_id
    where t.loc_id in 
      (select loc_id from geodb_textdata 
       where text_val = @zip and text_type = '500300000')     -- zip code
    and t.text_type = '500100000'                             -- location name
    order by e.text_val desc, t.text_val
end

In the data set covering Austria, the execution of

EXEC GeoDB_GetNamesFromZip '3400'

returns the following records:

Loc_Id  Name            Level  Lat      Lon
29569   Kierling        7      48,308   16,275
29602   Klosterneuburg  7      48,305   16,327
29634   Kritzendorf     7      48,328   16,303
66996   Maria Gugging   7      48,313   16,248
30358   Weidling        7      48,291   16,307
66997   Weidlingbach    7      48,2667  16,2533
66995   Klosterneuburg  6      48,305   16,327

Importing OpenGeoDB Data into MS SQL Server

August 15, 2012

OpenGeoDB provides geographical and hierarchical location data for a couple of European countries (currently .at, .be, .ch, .de, .li).

The data, however, is available for download only in MySQL format, and a couple of modifications are required to execute the import in an MS SQL Server database:

  • opengeodb-begin.sql creates the required tables, and some data types and check constraints need to be adapted (see download)

AT.sql contains location data for Austria, such as latitude/longitude and zip codes for cities and communities. The file is about 20 MB and cannot be processed in SSMS.

I proceeded as follows to run the import:

  • Using Notepad++, I separated each INSERT statement into a single batch command by replacing every ‘;’ with ‘;\nGO’ (in RegEx mode).
  • In the Encoding menu, I invoked Convert to ANSI and saved the file.
  • I opened a command prompt (Start, Run, cmd.exe) and ran the .sql file using sqlcmd:
sqlcmd -U [user] -P [pwd] -d [db] -i C:\path\to\AT.sql -e > at.sql.log

The AThier.sql file is small enough to be run from SSMS, but can also be executed via sqlcmd without modification:

sqlcmd -U [user] -P [pwd] -d [db] -i C:\path\to\AThier.sql -e > athier.sql.log

Make sure the log files do not contain any errors or warnings.

  • Finally, opengeodb-end(-ms).sql inserts some type information and creates indexes.
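The Notepad++ step above can also be scripted. A minimal sketch of the statement-to-batch conversion (it performs the same naive replacement, so semicolons inside string literals would need manual attention, just as with the editor-based approach):

```csharp
using System;
using System.IO;
using System.Text;

public static class GoSplitter
{
  // Append a GO batch separator after every ';', mirroring the
  // Notepad++ replace of ';' with ';\nGO'. The newline already
  // present in the dump after each statement keeps GO on its own line.
  public static string AddGoBatches(string sql)
  {
    return sql.Replace(";", ";" + Environment.NewLine + "GO");
  }

  public static void ConvertFile(string inPath, string outPath)
  {
    var sql = File.ReadAllText(inPath, Encoding.UTF8);
    // Encoding.Default approximates the "Convert to ANSI" step
    // (on .NET Framework it is the system ANSI code page)
    File.WriteAllText(outPath, AddGoBatches(sql), Encoding.Default);
  }
}
```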

In case anything goes wrong with the import, I also added a file delete_geodb.sql that truncates or deletes the geodb tables.

The SQL files needed to create the tables are available for download here. The data files need to be downloaded from their original location and modified as described above.