Adding columns from OSM to postgis with osmium

My latest Europe import was quite eventful. First, I run out of space several times during the import itself, at indexing time. The good thing is that, if you manage to reclaim some space, and reading a little of source code1, you can replay the missing queries by hand and stop cursing. To be fair, osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables, planet_osm_node, planet_osm_way and planet_osm_relation; and one file, the flat nodes one. Those are not deleted until the whole process has finished, but they're actually not needed after the processing phase. I started working on fixing that.

But that was not the most difficult part. The most difficult part was that I forgot, somehow, to add a column to the import.style file. Elevation, my own style, renders different icons for different types of castles (and forts too), just like the Historic Place map of the Hiking and Bridle map2. So today I sat down and tried to figure out how to reparse the OSM extract I used for the import to add this info.

The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:

node,way   castle_type  text         polygon

That says that this applies to nodes and ways. If the element is a way, polygon will try to convert it to a polygon and put it in the planet_osm_polygon table; if it's a node, it ends in the planet_osm_point table. So we just add the column to those tables:

ALTER TABLE planet_osm_point   ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;

Now how to process the extract? Enter pyosmium. It's a Python binding for the osmium library with a stream-like type of processing à la expat for processing XML. The interface is quite simple: one subclasses osmium.SimpleHandler, defines the element type handlers (node(), way() and/or relation()) and that's it! Here's the full code of the simple Python script I did:

#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()

class CastleTypes (osmium.SimpleHandler):

    def process (self, thing, table):
        if 'castle_type' in thing.tags:
            try:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''
            print (table, thing.id, name)

            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                            WHERE osm_id = %s''',
                         (thing.tags['castle_type'], thing.id))

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)

ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')

The only strange part of the API is that it doesn't seem to raise a KeyError when the tag does not exist, but a SystemError. I'll try to figure this out later. Also interesting is the big amount of unnamed elements with this tag that exist in the DB.


  1. I would love for GitHub to recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick. 

  2. I just noticed how much more complete those maps are. more ideas to use :)