Disk usage while updating an OSM rendering database

Now the new part. I have never updated an OSM rendering database, so let's see what the docs say about it:

To keep an osm2pgsql database up to date you have two options: either you provide the original import data file, or you provide the replication URL. In my case the latter is http://download.geofabrik.de/europe-updates/, but as you can see, it can be inferred from the extract; it even picks up the right date¹:

$ osm2pgsql-replication init --verbose --port 5433 --database gis --osm-file europe-latest.osm.pbf
2023-08-23 15:07:18 [DEBUG]: Replication information found in OSM file header.
2023-08-23 15:07:18 [DEBUG]: Replication URL: http://download.geofabrik.de/europe-updates
2023-08-23 15:07:18 [DEBUG]: Replication sequence: 3787
2023-08-23 15:07:18 [DEBUG]: Replication timestamp: 2023-08-11T20:21:59Z
2023-08-23 15:07:18 [INFO]: Initialised updates for service 'http://download.geofabrik.de/europe-updates'.
2023-08-23 15:07:18 [INFO]: Starting at sequence 3787 (2023-08-11 20:21:59+00:00).
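If the original extract is no longer around, my understanding from the docs is that init can instead be pointed at the replication server directly with --server (and, since there is no file header to read the date from, told where to start with --start-at); a hypothetical sketch with the same connection options I used above:

$ osm2pgsql-replication init --verbose --port 5433 --database gis --server http://download.geofabrik.de/europe-updates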

Then the update is really simple: you just tell it to do it and it will do it, automatically! The only condition is that you pass the update command the same parameters you gave the original osm2pgsql invocation. It will download a batch of updates until some size limit is reached (around 120MiB?), then call osm2pgsql, rinse, repeat, until all missing updates have been applied:

$ osm2pgsql-replication update --verbose --port 5433 --database gis -- --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua
2023-08-23 15:13:16 [INFO]: Using replication service 'http://download.geofabrik.de/europe-updates'. Current sequence 3787 (2023-08-11 22:21:59+02:00).
2023-08-23 15:13:16 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:16 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/state.txt HTTP/1.1" 200 123
2023-08-23 15:13:16 [DEBUG]: Calling osm2pgsql with: /usr/bin/osm2pgsql --append --slim --prefix planet_osm --cache 0 --number-processes 4 --slim --flat-nodes /home/mdione/src/projects/osm/data/osm/nodes.cache --hstore --multi-geometry --style /home/mdione/src/projects/osm/osm-carto/openstreetmap-carto.style --tag-transform-script /home/mdione/src/projects/osm/osm-carto/openstreetmap-carto.lua -d gis -P 5433 /tmp/tmpk7ml1gi9/osm2pgsql_diff.osc.gz
2023-08-23 15:13:16 [DEBUG]: Importing from sequence 3787
2023-08-23 15:13:16 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:16 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/state.txt HTTP/1.1" 200 123
2023-08-23 15:13:16 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:16 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/000/003/788.osc.gz HTTP/1.1" 200 30348254
2023-08-23 15:13:25 [DEBUG]: Downloaded change 3788. (389531 kB available in download buffer)
2023-08-23 15:13:25 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:25 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/000/003/789.osc.gz HTTP/1.1" 200 35284953
2023-08-23 15:13:36 [DEBUG]: Downloaded change 3789. (245491 kB available in download buffer)
2023-08-23 15:13:36 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:36 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/000/003/790.osc.gz HTTP/1.1" 200 32891529
2023-08-23 15:13:46 [DEBUG]: Downloaded change 3790. (114339 kB available in download buffer)
2023-08-23 15:13:46 [DEBUG]: Starting new HTTP connection (1): download.geofabrik.de:80
2023-08-23 15:13:46 [DEBUG]: http://download.geofabrik.de:80 "GET /europe-updates/000/003/791.osc.gz HTTP/1.1" 200 35347966
2023-08-23 15:13:57 [DEBUG]: Downloaded change 3791. (-26371 kB available in download buffer)
2023-08-23 15:14:16  osm2pgsql version 1.8.0
2023-08-23 15:14:16  Database version: 15.3 (Debian 15.3-0+deb12u1)
2023-08-23 15:14:16  PostGIS version: 3.3
2023-08-23 15:14:16  Setting up table 'planet_osm_point'
2023-08-23 15:14:16  Setting up table 'planet_osm_line'
2023-08-23 15:14:16  Setting up table 'planet_osm_polygon'
2023-08-23 15:14:16  Setting up table 'planet_osm_roads'
2023-08-23 16:26:17  Reading input files done in 4321s (1h 12m 1s).
2023-08-23 16:26:17    Processed 3014131 nodes in 1506s (25m 6s) - 2k/s
2023-08-23 16:26:17    Processed 687625 ways in 977s (16m 17s) - 704/s
2023-08-23 16:26:17    Processed 28176 relations in 1838s (30m 38s) - 15/s
2023-08-23 16:27:11  Going over 217062 pending ways (using 4 threads)
Left to process: 0........
2023-08-23 16:30:04  Processing 217062 pending ways took 173s (2m 53s) at a rate of 1254.69/s
2023-08-23 16:30:04  Going over 89496 pending relations (using 4 threads)
Left to process: 0.......
2023-08-23 17:24:42  Processing 89496 pending relations took 3277s (54m 37s) at a rate of 27.31/s
2023-08-23 17:24:43  Done postprocessing on table 'planet_osm_nodes' in 0s
2023-08-23 17:24:43  Done postprocessing on table 'planet_osm_ways' in 0s
2023-08-23 17:24:43  Done postprocessing on table 'planet_osm_rels' in 0s
2023-08-23 17:24:43  All postprocessing on table 'planet_osm_point' done in 0s.
2023-08-23 17:24:43  All postprocessing on table 'planet_osm_line' done in 0s.
2023-08-23 17:24:43  All postprocessing on table 'planet_osm_polygon' done in 0s.
2023-08-23 17:24:43  All postprocessing on table 'planet_osm_roads' done in 0s.
2023-08-23 17:24:43  osm2pgsql took 7827s (2h 10m 27s) overall.
[...]

I'm not going to paste all 4 or 5 iterations, this should be enough. 2h10m to process around 120MiB of data, meaning around 1 minute per MiB. I have 12 days of updates, each around 30MiB, except for the last three, which are 90, 160 and 130 MiB due to an ongoing edit war. That's a total of around 620MiB of updates, so it's going to take some 10 hours at this pace.
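If that pace becomes a problem, the update command seems to have knobs for it: as far as I can tell from the help output, --max-diff-size controls how much data is downloaded per batch (in MB) and --once applies a single batch and exits, so something like the following, hypothetically, would let you chip away at the backlog in smaller, resumable steps. Same invocation as above, only with those two flags added before the --:

$ osm2pgsql-replication update --verbose --port 5433 --database gis --max-diff-size 50 --once -- --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua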

[The next day]

[...]
2023-08-23 23:31:36 [INFO]: Data imported until 2023-08-22 20:21:53+00:00. Backlog remaining: 1 day, 1:09:43.865536

After ~8h10m it finished. Disk usage grew by 10GiB, which is more than the 10x ratio of the import (more like 15x!). At this pace I should run into space issues in around 220 days. On the other hand, these diffs are quite unusual, so the stats are probably skewed. Also, I guess at some point I should redo some of the clustering and analyzing done during the import, but I don't see anything about it in the docs, so I'll ask around. There are no big peaks to speak of; the most space I see being freed at any point is around 200MiB.
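Until then, plain PostgreSQL housekeeping should be safe: refreshing the planner statistics and letting VACUUM reclaim the dead rows left behind by the diffs. A minimal sketch against the rendering tables from the log above (re-clustering would also need the geometry index names, so I'm leaving that out until I ask around):

$ psql --port 5433 --dbname gis -c 'ANALYZE planet_osm_point; ANALYZE planet_osm_line; ANALYZE planet_osm_polygon; ANALYZE planet_osm_roads'
$ psql --port 5433 --dbname gis -c 'VACUUM planet_osm_polygon'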

The osm2pgsql people have automated this to the point where their documentation even provides an example systemd unit file to run the updates for you. Check the docs linked above.
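For keeping an eye on such a setup there is also a status subcommand that, if I read the docs right, just reports how far behind the database is without applying anything; it takes the same connection options as init and update:

$ osm2pgsql-replication status --port 5433 --database gis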


  1. Yes, I started writing these posts 13 days ago.