How did it start? Well, it actually started a long time ago, maybe a year ago, most probably more. I got myself tumbling down the beautiful rabbit hole of Blender based mapping. The idea is very simple: if you have a DEM, you can build a 3D representation of the terrain and use a renderer to build your map. To me the most striking thing about those maps was not their 3D-ness (which to some is starting to be tiresome, and I agree), but the shadows. I've been pursuing the best shadow for a while, and this seemed like the perfect fit.
So, like I said, one year ago or so I took "the" Blender relief tutorial and ran with it. I got to the point where I could reproduce it with a 1x1 degree, 3600x3600px DEM from mapzen, but when I tried to automate it, I found out that Blender has a Python console where it prints the commands equivalent to the actions you make in the UI, but the resulting script was too horrible to my eyes and I ran out of breath (another of those cases of the perfect being the enemy of the good).
Then a few days ago I read that first link and got some steam built up. In fact, it was two passages in it that lit the fire:
Most of these use a tool called Blender, an extremely powerful open-source tool for all kinds of 3D modeling and rendering. A few cartographers use other tools, such as Aerialod, or R‘s Rayshader plugin.
R! I can easily automate this!
If we stick to a very zoomed-out map, or if we have a really huge computer running Blender, we could try to do a hillshade for the entire world, and then slice that up for our map tiles. But there’s no way we could do this at a high-enough resolution so you could zoom all the way in, as in the sample tiles above.
Challenge accepted! (Spoiler: I'm not there yet).
I tried Rayshader. I wanna be honest: it's easy, quick, but I didn't like the results. It seemed like no matter how high you put the sun, it always drew very long shadows. So despite its pragmatism, I set it aside.
So I picked up what I did in the past and tried to apply it to a map. I re-rendered everything and applied it to my style. The outcome was encouraging:
To start with, I probably did that render with too few render passes, so the output looks grainy. Second, the material color is too bright, so the height tints are washed out. Still, we can see the big shadow cast over the valley some 3200m below the Mont Blanc/Monte Bianco.
This proved to be a good place to test the method, because of the great difference between the valley and the peak casting the shadow over it, and that led me to think: are there more extreme places in the world? An easy bet is yes, and the place to look for them was the Himalayas. The Aconcagua could have been a good contender, but the valley at its SE is some 4550m already. Surprisingly, the way I searched for a good place was to use existing maps with the usual hill shade technique, looking for big dark spots, especially those wide in the NW-SE direction. I first found my spot in the Tukuche Peak, which looms some 4350m above the Gandaki River, and then the nearby Dhaulagiri, which is even some 1250m higher, clocking in at 8167m. Here's how they look (not the big structure in the upper left, but the bigger grey [8000m+] blob to the East of it; the river snakes on the right all the way down):
I had to add 3 more color bands to my style and reshuffle them because I never rendered any 8k'er before, so the colors were haphazardly concocted for the rendering and are not definitive. At least it lets you behold the grandiosity of that rock jutting through thousands and thousands of meters with very steep sides.
Time to get real. I usually render regions where I'll be going, and next time it's the Upper Segre Valley, so I rendered N42E001-2 in one go. That's 7201x4884px after reprojecting to WebMercator (and compensating as described in the second link!), so some 35Mpx. Blender took some 44m+ on a 4.5yo medium-high spec'ed laptop at 200 render samples, which means that I can continue to render small regions this way, but that for the moment I won't be applying this technique to the whole of Europe.
Up to here I was just applying the same style in QGIS, which has been an indispensable tool to develop this style. But trying these TIFFs in mapnik for the first time needed an extra step. Well, two, in fact. Blender does not save the TIFFs georeferenced, so you have to copy the data from the original DEM. For that, use gdal_edit.py -a_srs ... -a_ullr ... with the right EPSG and the data from the output of gdalinfo.
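The same copying can also be done with a few lines of the GDAL Python bindings; this is just a sketch with hypothetical filenames, copying both the CRS and the geotransform from the source DEM onto the Blender output:

# hypothetical filenames; copies the georeferencing of the source DEM
# onto the (ungeoreferenced) TIFF that Blender wrote
from osgeo import gdal

src = gdal.Open('N42E001-2-webmerc.tif')             # the (already reprojected) source DEM
dst = gdal.Open('pirinoak-blender-10x.tif', gdal.GA_Update)
dst.SetProjection(src.GetProjection())               # same as -a_srs
dst.SetGeoTransform(src.GetGeoTransform())           # same as -a_ullr, plus pixel size
dst = None                                           # flush to disk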
Next, for some reason, it always uses 16-bit integers, even when explicitly told to use 8. This little snippet takes care of that:
import imageio

# Blender wrote 16-bit samples; scale them down to 8 bits and save a new file
image = imageio.imread('pirinoak-blender-10x.tif')
image = image / 256                  # 16-bit range -> 8-bit range
image = image.astype('uint8')
imageio.imwrite('pirinoak-blender-10x-8bits.tif', image)
Thank $DEITY (and developers!) for good libraries.
The first thing I noticed was that we have been lied to by maps (again!) for a long time. Most hill shading algorithms use a 45°-high sun (the direction does not matter much). But if you think about it, how many mountains have sides 45°+ steep? According to a (real, not like me!) cartographer friend, for continental Argentina it's less than 1% at 30 arcsecs of resolution (note that SRTM is 1 arcsec). Still, some shadows are there, and they help us (and we get used to them) to recognize slope direction. And now we're asking a raytracing program to calculate real shadows? The result I initially got was underwhelming, even when I was already asking Blender to exaggerate height by 5x!:
So, I bit the bullet and went all in with 10x:
Much better, but not definitive. I still have to render Dhaulagiri again, and at least some region I already know well from having been there a lot.
Now some notes about "the" Blender relief tutorial. I followed it to the letter, but with my experience I had to make some changes. One you already know: using a displacement scale of 10x instead of 0.3. I have no exact idea why his initial renderings were so spiky, but I suspect that the DEM grid unit was not meters.
Second, since that first Mont Blanc/Monte Bianco render, we know the color is too bright. I lowered it to 0.6 (and later I found that that's what he actually suggests at the end of the plane section) and then compared the grey in a plain (#b5b5b5) to what GDAL outputs and compensated using a simple coefficient. The final value is 0.402.
Third, I was having issues rendering: I was getting a lot of terracing. After a long chat with Viktor_smg from blender.chat/support, they figured out that the sRGB color space in the Image Texture is broken and that I should use XYZ instead. This meant installing Blender by hand instead of relying on the one in Debian Unstable, which is too old and does not have it. They also gave me pointers about how to automate it.
Last, you can't apply this technique DEM by DEM, because you want the shadows from the neighbouring tiles to spill over the current one. That link shows how to render the tile and its 8 neighbouring ones, but I think you can optimize it in two ways: first, since shadows come from the NW, just add the tiles that lie in that general direction; second, no shadow is going to be cast over tens of kilometers, so you could even get away with just adding a smaller band around the tile.
That's it for now. The next step is to automate this and publish that. $DEITY knows when that will happen.
python openstreetmap gdal elevation hillshading imageio gis dem
It all started with the following statement: "my renders are too slow". There were three issues at the root cause: osm-carto had become more complex (nothing I can do there), my compute resources were too limited (but upgrading them would cost money I don't want to spend on just one hobby), and the terrain rasters were being reprojected all the time.
My style uses three raster layers to provide a sense of relief: one providing height tints, one enhancing that color on slopes and desaturating it on valleys, and a last one providing hill shading. All three were derived from the same source DEM, which is projected in WGS 84 a.k.a. EPSG 4326. The style is of course in Pseudo/Web Mercator, a.k.a. EPSG 3857; hence the constant reprojection.
So my first reaction was: reproject first, then derive the raster layers. This single change made all hell break loose.
I usually render Europe. It's the region where I live and where I most travel to. I'm using relief layers because I like mountains, so I'm constantly checking how they look. And with this change, the mountains up North (initially it was around northern Scandinavia) looked all washed out:
The reason behind this is that GDAL's hill and slope shading algorithms don't take the projection into account; they just use the pixel values as present in the dataset and the declared pixel size. The DEMs I'm using have a resolution of 1 arc second per pixel. WGS84 assumes an ellipsoid of 6_378_137m of radius (plus a flattening parameter which we're going to ignore), which means a circumference at the Equator of:
In [1]: import math
In [2]: radius = 6_378_137 # in meters
In [3]: circunf = radius * 2 * math.pi # 2πr
In [4]: circunf
Out[4]: 40_075_016.68557849 # in meters
One arc second is hence this 'wide':
In [5]: circunf / 360 / 60 / 60
Out[5]: 30.922080775909325 # in meters
In fact the DEMs advertise exactly that:
mdione@diablo:~/src/projects/osm/data/height/mapzen$ gdalinfo N79E014.hgt
Size is 3601, 3601
Coordinate System is:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Pixel Size = (0.000277777777778,-0.000277777777778)
Well, not exactly that, but we can derive it 'easily'. Deep down there, the declared unit is degree (ignore the metre in the datum; that's the unit for the numbers in the ELLIPSOID) and its size in radians (just in case you aren't satisfied with the amount of conversions involved, I guess). The projection declares degrees as the unit of the projection for both axes, the pixel size is declared in the units of the projection, and 1 arc second is:
In [6]: 1 / 60 / 60 # or 1/3600
Out[6]: 0.0002777777777777778 # in degrees
which is exactly the value declared in the pixel size[1][2]. So one pixel is one arc second and one arc second is around 30m at the Equator.
The 'flatness' problem arises because all these projections assume all latitudes are of the same width; for them, a degree at the Equator is as 'wide' as one at 60°N, which it is not; in reality, it's twice as wide, because the width of a degree decreases with the cosine of the latitude and math.cos(math.radians(60)) == 0.5. The mountains we saw in the first image are close to 68°N; the cosine up there is:
In [7]: math.cos(math.radians(68))
Out[7]: 0.37460659341591196
So pixels up there are no longer around 30m wide, but only about 30 * 0.37 ≃ 11m!
How does this relate to the flatness? Well, like I said, GDAL's algorithms use the declared pixel size everywhere, so at f.i. 68°N it thinks the pixel is around 30m wide when it's only around 11m wide. They're miscalculating by a factor of almost 3! Also, the Mercator projection is famous for stretching not only in the East-West direction (longitude) but also in latitude, at the same pace, based on the inverse of the cosine of the latitude:
stretch_factor = 1 / math.cos(math.radians(lat))
So all the mountains up there 'look' to the algorithms as 3 times wider and 'taller' ('tall' here means in the latitude, N-S direction, not height AMSL), hence the washing out.
In that image we have an original mountain 10 units wide and 5 units tall; but gdal thinks it's 20 units wide, so it looks flat.
Then it hit me: it doesn't matter what projection you are using, unless you calculate slopes and hill shading on the sphere (or spheroid, but we all want to keep it simple, right? Otherwise we wouldn't be using WebMerc all over :), the numbers are going to be wrong.
The Wandering Cartographer mentions this and goes around the issue by doing two things: always using DEMs where the units are meters, and always using a local projection that doesn't deform as much. He can do that because he produces small maps. He also concedes he is using the 111_120 constant when the unit is degrees. I'm not really sure about how that number is calculated, but I suspect that it has something to do with the flattening parameter, and also some level of rounding. It seems to have originated from gdaldem's manpage (search for the -s option). Let's try something:
In [8]: m_per_degree = circunf / 360
In [9]: m_per_degree
Out[9]: 111_319.49079327358 # in meters
That's how wide a degree is at the Equator. Close enough, I guess.
But for us world-wide, or at least continent-wide webmappers like me, those suggestions are not enough; we can't choose a projection that doesn't deform because at those scales they all forcibly do, and we can't use a single scaling factor either.
gdal's algorithm uses the 8 neighboring pixels of a pixel to determine slope and aspect. The slope is calculated on a sum of those pixels' values divided by the cell size. We can't change the cell size that gdal uses, but we can change the values :) That's the third triangle up there: the algorithm assumes a size for the base that is X times bigger than the real one, and we just make the triangle as many times taller. X is just 1 / math.cos(math.radians(lat)).
If we apply this technique to each DEM file individually, we have to choose a latitude for it. One option is to take the latitude at the center of the file. It would make your code more complicated, especially if you're using bash or, worse, make, because you have to pass a different value for -s, but you can always use a little bit of Python or any other high level, interpreted language.
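Here is a minimal sketch of that per-file variant, assuming the tile is still in a lat/lon CRS so its bounds are directly in degrees; filenames come from the command line, and this is just the idea, not the final script:

#! /usr/bin/env python3
# per-file compensation: one coefficient for the whole tile, taken from the
# latitude at the center of the file
import math
import sys

import rasterio

with rasterio.open(sys.argv[1]) as src:
    band = src.read(1)
    center_lat = (src.bounds.bottom + src.bounds.top) / 2
    coef = 1 / math.cos(math.radians(center_lat))
    profile = src.profile

with rasterio.open(sys.argv[2], 'w', **profile) as dst:
    dst.write((band * coef).astype(profile['dtype']), 1)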
But what happens at the seams where one tile meets another? Well, since each file has its own latitude, two neighbouring values, one in one DEM file and the other in another, will have different compensation values. So, when you jump from a ~0.39 cosine to a ~0.37 one around 68° of latitude, you start to notice it, and those are not even the northernmost mountains there are.
The next step would be to start cutting DEM tiles into smaller ones, so the 'jumps' in between are less noticeable... but can we make this more continuous-like? Yes we can! We simply compensate each pixel based on its latitude.
This image cheats a little; this one is based on EU-DEM and not mapzen like the original. You can see where the DEM file ends because the slope and hillshading effects stop at the bottom left. I still have to decide which dataset I will use; at some point I decided that EU-DEM is not global enough for me, and that it's probably better to use a single dataset (f.i. mapzen) than a more accurate but partial one, but since I usually render the regions I need, I might as well change my workflow to easily change the DEM dataset.
This approach works fine as shown, but has at least one limitation. The data type used in those TIFFs is Int16. This means signed ints 16 bits wide, which means values can go between -32768..32767. We could only compensate Everest up to a little less than 4 times, but luckily we don't have to; it would have to be at around 75° of latitude before the compensated value became bigger than the maximum the type can represent. But at around 84° we get a compensation of 10x and it only gets worse quickly from there:
75: 3.863
76: 4.133
77: 4.445
78: 4.809
79: 5.240
80: 5.758
81: 6.392
82: 7.185
83: 8.205
84: 9.566
85: 11.474
86: 14.336
87: 19.107
88: 28.654
89: 57.299 # we don't draw this far, the cutout value is around 85.5 IIRC
90: ∞ # :)
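Those values are just 1 / cos(lat); a quick loop reproduces the table (modulo rounding; 90° is left out because the cosine there is zero, hence the ∞ above):

import math

for lat in range(75, 90):
    print(lat, 1 / math.cos(math.radians(lat)))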
The solution could be to use a floating point type, but that means all calculations would be more expensive (though definitely not as much as reprojecting on the fly). Or maybe we can use a bigger integer type. Both solutions would also use more disk space and require more memory. gdal currently supports band types of Byte, UInt16, Int16, UInt32, Int32, Float32, Float64, CInt16, CInt32, CFloat32 and CFloat64 for reading and writing.
Another thing is that it works fine and quick for datasets like SRTM and mapzen because I can compensate whole raster lines in one go, as all their pixels have the same latitude, but for EU-DEM I have to compensate every pixel and it becomes slow. I could try to parallelize it (I bought a new computer which has 4x the CPUs I had before, and 4x the RAM too :), but I first have to figure out how to write TIFFs progressively; currently I have to hold the whole output TIFF in memory before dumping it into a file.
Here's the code for pixel-by-pixel processing; converting it to do row by row for f.i. mapzen makes you actually remove some code :) (a row-wise sketch follows the script).
#! /usr/bin/env python3

import sys
import math

import rasterio
import pyproj
import numpy

in_file = rasterio.open(sys.argv[1])
band = in_file.read(1)  # yes, we assume only one band

out_file = rasterio.open(sys.argv[2], 'w', driver='GTiff',
                         height=in_file.height, width=in_file.width,
                         count=1, dtype=in_file.dtypes[0],
                         crs=in_file.crs, transform=in_file.transform)
out_data = []

transformer = pyproj.Transformer.from_crs(in_file.crs, 'epsg:4326')

# scan every line in the input
for row in range(in_file.height):
    if (row % 10) == 0:
        print(f"{row}/{in_file.height}")

    line = band[row]

    # EU-DEM does not provide 'nice' 1x1 rectangular tiles,
    # they use a specific projection that becomes 'arcs' in anything 'rectangular'
    # so, pixel by pixel
    for col in range(in_file.width):
        # rasterio does not say where in the pixel the returned coords fall,
        # but at 30m tops, we don't care
        y, x = in_file.xy(row, col)

        # convert back to latlon
        lat, _ = transformer.transform(y, x)

        # calculate a compensation value based on lat
        # real widths are pixel_size * cos(lat), we compensate by the inverse of that
        coef = 1 / math.cos(math.radians(lat))

        line[col] *= coef

    out_data.append(line)

# save in a new file.
out_file.write(numpy.asarray(out_data, in_file.dtypes[0]), 1)
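And here is the row-by-row sketch for lat/lon datasets like SRTM or mapzen, where every pixel in a row shares the same latitude, so the inner loop (and the reprojection) disappears; again just the idea, with filenames taken from the command line:

#! /usr/bin/env python3
# row-wise compensation for DEMs already in a lat/lon CRS: one coefficient per row
import math
import sys

import rasterio

in_file = rasterio.open(sys.argv[1])
band = in_file.read(1)

out_file = rasterio.open(sys.argv[2], 'w', driver='GTiff',
                         height=in_file.height, width=in_file.width,
                         count=1, dtype=in_file.dtypes[0],
                         crs=in_file.crs, transform=in_file.transform)

for row in range(in_file.height):
    # the CRS is already lat/lon, so the y coordinate of any pixel in the row
    # is its latitude; take it from the first column
    _, lat = in_file.xy(row, 0)
    band[row] = band[row] * (1 / math.cos(math.radians(lat)))

out_file.write(band.astype(in_file.dtypes[0]), 1)
out_file.close()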
That's it!
python openstreetmap gdal elevation hillshading rasterio pyproj gis dem crs
[1] Except for the negative value. That has to do with the fact that pixels count from top to bottom, but degrees from South (negative) to North.
[2] The other number, 0.0174532925199433, is how much a degree is in radians.
[3] All that extra math is converting degrees into radians so cos() is happy.
Another aspect I've been looking into with respect to optimizing the rendering speed is data sources that are not in the target projection. Not being in the target projection forces Mapnik to reproject them on the fly, and it does this for each (meta)tile, whereas it would make more sense to have them already reprojected.
In my case I have 3 datasets, big ones, in EPSG:4258. The three datasets are elevation, hill and slope shade based on EEA's DEM files. The source tiles amount to almost 29GiB, and the elevation layer, being RGB, takes more than that. So I set off to try to reproject the things.
My first, more obvious approach was to reproject every 5x5°, 18000x18000px tile, then derive the data I need, but I started to get gaps between tiles. Notice that the original data is cleanly cut (5° x 3600"/° x 1px/" == 18000px), without any overlapping.
The next approach was to merge them all in a .vrt file, then reproject chunks of it with gdalwarp. What I wanted as output was the same 5x5° tiles, reprojected, but with an extra pixel, so they overlap. This last requirement was the problematic one. See, the final projection makes any square in the original projection a tall rectangle, stretching more and more towards the poles. The closest I could get was to use the -ts option, but that meant that I didn't get any control over how many extra pixels I got in the vertical/latitude direction. My OCD started thrashing :) In fact what happened was that I was not sure how GDAL would handle the possible partial pixel, whether rounding down (meaning excluding it), up (finishing it), or simply leaving the pixel with partial data and impacting the final rendering.
Even Rouault pointed out to me that gdalwarp can do something fantastic: it can generate a .vrt file too, with all the parameters needed for the reprojection, so reading from it automatically reprojects the original data. The resulting dataset is 288,000x325,220px (the original is 288,000x180,000px), so I'm definitely going to cut it down into small tiles. After consulting an eight-ball, I decided to discard the idea of tiles with boundaries based on coordinates, which might not even make sense anymore, and settle for pixel based sizes, still with an extra pixel. The chosen size is 2**14+1 a.k.a. 16385. For this gdal_translate is perfect.
The final algorithm is like this:
gdalwarp -t_srs "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 \
+x_0=0.0 +y_0=0.0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs +over" \
-r lanczos -tr 30.92208077590933 -30.92208077590933 \
-of VRT EU-DEM.vrt EU-DEM-corrected.vrt
The value for the -tr option is the pixel size in meters, which is the unit declared in the SRS. Notice that as Mercator stretches towards the poles, this is the size at the origin of the projection; in this case, at 0° lat, 0° lon.
Then the reprojection (by reading from the reprojecting dataset) and cut, in a couple of loops:
tile_size=$((2**14)); \
for i in $(seq 0 17); do
    for j in $(seq 0 5); do
        for k in $(seq 0 3); do
            l=$((4*$j+$k));

            gdal_translate -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZMA \
                -co LZMA_PRESET=9 \
                -srcwin $(($tile_size*$i)) $(($tile_size*$l)) \
                    $(($tile_size+1)) $(($tile_size+1)) \
                -of GTiff EU-DEM-corrected.vrt \
                $(printf "%03dx%03d-corrected.tif" $i $l) &
        done;

        wait;
    done;
done
There's an extra loop to be able to launch 4 workers at the same time, because I have 4 cores. This doesn't keep the 4 cores busy 100% of the time (cores that already finished stay idle until the others finish), but it was getting awkward to express in a Makefile, and this is run only once.
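For what it's worth, a small work pool avoids that idle time; this is just a sketch of the same cutting loop driven from Python (not what I actually ran), with the grid size and gdal_translate options copied from the shell version above:

#! /usr/bin/env python3
# cut the reprojecting .vrt into 16385x16385 tiles, keeping 4 workers busy
import subprocess
from concurrent.futures import ThreadPoolExecutor

tile_size = 2 ** 14  # 16384; we cut tile_size+1 so tiles overlap by one pixel

def cut(i, l):
    out = "%03dx%03d-corrected.tif" % (i, l)
    subprocess.run(['gdal_translate', '-co', 'BIGTIFF=YES', '-co', 'TILED=YES',
                    '-co', 'COMPRESS=LZMA', '-co', 'LZMA_PRESET=9',
                    '-srcwin', str(tile_size * i), str(tile_size * l),
                    str(tile_size + 1), str(tile_size + 1),
                    '-of', 'GTiff', 'EU-DEM-corrected.vrt', out],
                   check=True)

with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(18):
        for l in range(24):
            pool.submit(cut, i, l)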
Before deriving the rest of the data there's an extra step: removing those generated tiles that actually have no data. I do a similar thing with empty sea tiles in the rendering process. Notice also that the original data is not tile complete for the covered region (79 tiles instead of the 160 they should be).
openstreetmap gis gdal
I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, especially connecting the new line to existing centerlines. Still, I think it's useful.
Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.
Lastly, as imagico mentioned in the first post about this hack, the algorithms are not fast, and I already made my computer start thrashing the disk, swapping like Hell, because pg hit a big polygon and started using lots of RAM to calculate its centerline. At least this time I can see how complex the polygons are before handing them to the code. As an initial benchmark, the original data for that changeset (I later simplified it with JOSM's tool) took 0.063927s in pg+gis and 0.004737s in the Python code. More tests will come later.
Okay, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!
openstreetmap gis python
My latest Europe import was quite eventful. First, I ran out of space several times during the import itself, at indexing time. The good thing is that, if you manage to reclaim some space, and read a little of the source code[1], you can replay the missing queries by hand and stop cursing. To be fair, osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables, planet_osm_node, planet_osm_way and planet_osm_relation; and one file, the flat nodes one. Those are not deleted until the whole process has finished, but they're actually not needed after the processing phase. I started working on fixing that.
But that was not the most difficult part. The most difficult part was that I forgot, somehow, to add a column to the import.style file. Elevation, my own style, renders different icons for different types of castles (and forts too), just like the Historic Place map of the Hiking and Bridle map[2]. So today I sat down and tried to figure out how to reparse the OSM extract I used for the import to add this info.
The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:
node,way castle_type text polygon
That says that this applies to nodes and ways. If the element is a way, polygon will try to convert it to a polygon and put it in the planet_osm_polygon table; if it's a node, it ends up in the planet_osm_point table. So we just add the column to those tables:
ALTER TABLE planet_osm_point ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;
Now how to process the extract? Enter pyosmium. It's a Python binding for the osmium library with a stream-like type of processing à la expat for processing XML. The interface is quite simple: one subclasses osmium.SimpleHandler, defines the element type handlers (node(), way() and/or relation()) and that's it! Here's the full code of the simple Python script I did:
#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()


class CastleTypes (osmium.SimpleHandler):

    def process (self, thing, table):
        if 'castle_type' in thing.tags:
            try:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''

            print (table, thing.id, name)

            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                             WHERE osm_id = %s''',
                         (thing.tags['castle_type'], thing.id))

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)


ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')
conn.commit ()  # psycopg2 does not autocommit; without this the UPDATEs are rolled back
The only strange part of the API is that it doesn't seem to raise a KeyError when the tag does not exist, but a SystemError. I'll try to figure this out later. Also interesting is the big amount of unnamed elements with this tag that exist in the DB.
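A workaround sketch that sidesteps the exception entirely, using the same membership test the script already relies on for castle_type:

# avoids the try/except around the tag lookup
name = thing.tags['name'] if 'name' in thing.tags else ''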
[1] I would love for GitHub to recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick.
[2] I just noticed how much more complete those maps are. More ideas to use :)
openstreetmap gis python
In these last two days I've been expanding osm-centerlines. Now it not only supports ways more complex than a simple rectangle, but also ones that lead to 'branches' (unfortunately, most probably because the mapper either imported bad data or mapped it himself). Still, I tested it on very complex polygons and the result is not pretty. There is still lots of room for improvement. Unluckily, it's not as stand-alone as it could be.
The problem is that, so far, the algos force you to provide not only the polygon you want to process, but also its skeleton and medial. The code extends the medial using info extracted from the skeleton in such a way that the resulting medial ends on a segment of the polygon, hopefully the one(s) that cross from one riverbank to the other downstream and upstream. Calculating the skeleton could be performed by CGAL, but the current Python binding doesn't include that function yet. As for the medial, SFCGAL (a C++ wrapper for CGAL) exports a function that calculates an approximate medial axis, but there seem to be no Python bindings for it yet.
So, a partial solution would be to use PostGIS-2.2's ST_StraightSkeleton() and ST_ApproximateMedialAxis(), so I added a function called skeleton_medial_from_postgis(). The parameters are a psycopg2 connection to a PostgreSQL+PostGIS database and the way you want to calculate, as a shapely.geometry, and it returns the skeleton and the medial ready to be fed into extend_medials(). The result of that should be ready for mapping.
So there's that. I'll be trying to improve it in the next days, and start looking into converting it into a JOSM plugin.
openstreetmap gis python
For a long time now I've been thinking about a problem: OSM data sometimes contains riverbanks that have no centerline. This means that someone mapped (part of) the coasts of a river (or stream!), but didn't care about adding a line that would mark its centerline.
But this should be computationally solvable, right? Well, it's not that easy. See, for any given riverbank polygon in OSM's database, you have 4 types of segments: those representing the right and left riverbanks (two types) and the flow-in and flow-out segments, which link the banks upstream and downstream. With a little bit of luck there will be only one flow-in and one flow-out segment, but there are no guarantees here.
One method could try to identify these segments, then draw a line starting in the middle of the flow-in segment, calculating the middle by traversing both banks at the same time, and finally connect to the middle of the flow-out segment. Identifying the segments by itself is hard, and it is also possible that the result is not optimal, leading to a jagged line. I didn't try anything along those lines, but I could try some examples by hand...
Enter topology, the branch of maths that deals with this kind of problem. The skeleton of a polygon is a set of lines that are equidistant to the borders of the polygon. One of the properties this set of lines provides is direction, which can be exploited to find the banks and try to apply the previous algorithm. But a skeleton has a lot of 'branches' that might confuse the algo. Going a little further, there's the medial axis, which in most cases can be considered a simplified skeleton, without most of the skeleton's branches.
Enter free software :) CGAL is a library that can compute a lot of topological properties. PostGIS is clever enough to leverage those algorithms and present, among others, the functions ST_StraightSkeleton() and ST_ApproximateMedialAxis(). With these two and the original polygon I plan to derive the centerline. But first an image that will help explain it:
The green 'rectangle' is the original riverbank polygon. The thin black line is its skeleton; the medium red line is the medial. Notice how the medial and the center of the skeleton coincide. Then we have 4 branches forming a V shape, with its vertex at each end of the medial and its other two ends coinciding with the ends of the flow-in and flow-out segments!
So the algorithm is simple: start with the medial; from its ends, find the branches in the skeleton that form that V; using the other two ends of those Vs, calculate the point right between them, and extend the medial to those points. This only calculates a centerline. The next step would be to give it a direction. For that I will need to see if there are any nearby lines that could be part of the river (that's what the centerline is for, to possibly extend existing rivers/centerlines), and use its direction to give it to the new centerline.
For the moment the algorithm only solves this simple case. A slightly more complex case is not that trivial, as skeletons and medials are returned as a MultiLineString with a line for each segment, so I will have to rebuild them into LineStrings before processing.
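For the rebuilding part, shapely already ships a helper; a minimal sketch, with a made-up MultiLineString standing in for what PostGIS returns:

from shapely.geometry import MultiLineString
from shapely.ops import linemerge

# a made-up medial split into per-segment lines, as PostGIS would return it
medial = MultiLineString([[(0, 0), (1, 0)], [(1, 0), (2, 1)]])

# linemerge stitches contiguous segments back into LineStrings
print(linemerge(medial))  # LINESTRING (0 0, 1 0, 2 1)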
I put all the code online, of course :) Besides a preloaded PostgreSQL+PostGIS database with OSM data, you'll need python3-sqlalchemy, geoalchemy, python3-fiona and python3-shapely. The first two allow me to fetch the data from the db. Ah! By the way, you will need a couple of views:
CREATE VIEW planet_osm_riverbank_skel AS SELECT osm_id, way, ST_StraightSkeleton (way) AS skel FROM planet_osm_polygon WHERE waterway = 'riverbank';
CREATE VIEW planet_osm_riverbank_medial AS SELECT osm_id, way, ST_ApproximateMedialAxis (way) AS medial FROM planet_osm_polygon WHERE waterway = 'riverbank';
Shapely allows me to manipulate the polygonal data, and fiona is used to save the results to a shapefile. This is the first time I ever use all of them (except SQLAlchemy), and it's nice that it's so easy to do all this in Python.
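As an illustration of what that fetching looks like (not the actual osm-centerlines code, just a sketch with plain psycopg2 and a hypothetical osm_id), pulling the precomputed medial from one of those views and handing it to shapely:

import psycopg2
from shapely import wkb

conn = psycopg2.connect('dbname=gis')
cur = conn.cursor()

# hypothetical osm_id; ST_AsBinary gives us WKB that shapely can parse
cur.execute('''SELECT ST_AsBinary(medial)
                 FROM planet_osm_riverbank_medial
                WHERE osm_id = %s''', (123456,))
medial = wkb.loads(bytes(cur.fetchone()[0]))
print(medial.geom_type)  # typically MultiLineString, one line per segment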
openstreetmap gis python
As I mentioned in one of my last posts about it, I started using SRTM 1 arc data for rendering my maps. So far, with the 3 arc dataset, I was able to generate a huge image by stitching the tiles with gdal_merge.py, then generating the 3 final images (height, slope and hill shade) plus one intermediary for the slope, all with gdaldem. Now this is no longer possible, as the new dataset is almost 10x the old one, so instead of going that way, I decided to try another one.
With gdalbuildvrt it is possible to generate an XML file that 'virtually' stitches images. This means that any attempt to access data through this file will actually make the library (libgdal) find the proper tile(s) and access them directly.
So now the problem becomes processing each tile individually and then virtually stitching them. The first part is easy, as I just need to do the same I was doing to the huge stitched image before. I also took the opportunity to use tiled files, which instead of storing the image one scan line at a time (being 1 arc second resolution, each scan line has 3601 pixels; the extra one is for overlapping with the neighbors), store the file in 256x256 sub-tiles, possibly (that is, not tested) making rendering faster by clustering related data closer. The second step, with gdalbuildvrt, should also be easy.
The first roadblock is the fact that SRTM tiles above 50°N are only 1801 pixels wide, most possibly because anything more makes no sense anyway. This meant that I had to preprocess the original tiles so libgdal didn't have to do the interpolation at render time (in fact, it already has to do it once while rendering, using the lanczos scaling algorithm). This was done with gdalwarp.
The second one came from the slope and hill shading tiles. As the algorithm goes, it generates some 'fade out' values at the edges, and when libgdal was stitching them, I could see it as a line at the seam. This was fixed by passing -compute_edges to gdaldem.
Finally, for some reason gdalbuildvrt was generating some very strange .vrt files. The format of these files is more or less the following:
- For each band in the source tiles it creates a band in the result.
- For each source tile, it describes:
  - The source file
  - The source band
  - The size and tiling of the source (3601², 256²)
  - The rectangle we want from the source (0², 3601²)
  - The rectangle in the result (x, y, 3601²)
The problem I saw was some weird declarations of the rectangles in the result, where the coordinates or the sizes didn't match what I expected. I will try to figure this out with the GDAL people in the following weeks, but first I want to make sure that the source tiles are easily downloadable (so far I have only found download options through USGS' EarthExplorer, which requires you to be logged in in order to download tiles, which means that it is not very scriptable, so not very reproducible).
So for the moment I'm using my own .vrt file generator, completely not generic enough for release, but soon. I also took the opportunity to make the rectangles in the result non-overlapping, being just 3600² in size. I know that the generated file works because I'm also generating smaller samples of the resulting layers (again, height, slope and hill shading) for rendering smaller zoom levels.
The only remaining question about huge DEM datasets is contour generation. So far I had just generated contour lines for each tile and lived with the fact that they too look ugly at the seams.
Since I started playing with rendering maps I have included some kind of elevation info for highlighting mountains. At the beginning it was just hillshading provided by some German guy (I don't have the reference on me right now), but after reading Tilemill's terrain data guide, I started using DEMs to generate 4 different layers: elevation coloring, slope shading, hillshading and contour lines.
When I started I could find only three DEM sources: SRTM 3arc and ViewFinderPanoramas (1arc and 3arc). The second one tries to flatten plains (for instance, the Po plain near where I live), where it generates some ugly looking terracing. The third one, when I downloaded the corresponding tile (they're supposed to be 1x1 degrees), its metadata reported an extension between 7 and 48 degrees east and between 36 and 54 degrees north, and that its size is 147602x64801 pixels. I also remember stitching all the tiles covering Europe, just to get a nice 1x1 degree hole in the North Adriatic sea. Not having much time to pre or post process the data, I decided to stick to SRTM.
Things changed at the end of last year. The US government decided to release 1arc, 30m global coverage (previously that resolution covered only the US). I started playing with the data mid January, only to find that it is not void-filled: these DEMs are derived from the homonymous Shuttle mission, which used radar to get the data. Radar gets very confused when water is involved; this is no problem on rivers, lakes or sea, where elevation is constant relative to the coast, but it's a problem on snow covered mountains, glaciers and even clouds. This means that the data has NODATA holes. The SRTM 3arc v4.1 I was using had these 'voids' filled; deFerrantis has painstakingly been filling these voids too by hand, using topographic maps as reference.
So I set out to fill these voids too. But first let's see what the original data looks like. All the images are for the area near Isola 2000, a ski station I go to often. The first image is how this looks on the SRTM 3arc v4.1:
This is a 4x4 grid of 256x256 pixel tiles (1024x1024 in total) at zoom level 13. The heights range from ~700m up to ~2600m, and the image combines all 4 layers. It already starts showing some roundness in the terrain, especially in ridges and mountain tops, even at the bottom of the deep valleys.
For contrast, this is deFerrantis data:
This is the first time I really take a look at the result; it doesn't seem to be much better than 3arc. Here's the terracing I mentioned:
For contrast, check what 1arc means:
From my point of view, the quality is definitely better. Peaks, crests and valleys are quite sharp. As for the mountain sides, they look rugged. My impression is that this better reflects the nature of the terrain in question, but Christoph Hormann of Imagico.de views it as sample noise. He has worked a lot on DEMs to generate very beautiful maps.
But then we have those nice blue lagoons courtesy of voids (the blue we can see is the water color I use in my maps). So, how to proceed?
The simplest way to fix this is covering the voids with averages calculated from the data at the seams of the voids. GDAL has a tool for that called, of course, gdal_fillnodata.py. This is the outcome:
At first this looks quite good, but once we start to zoom in (remember there are at least 5 more zoom levels), we start to see some regular patterns:
Another option is to use deFerrantis' data to fill the voids. For this we need to merge both datasets. One way to do it is using GDAL's gdalwarp tool. We create a file piling up layers of data; first the most complete one, then the layers with holes:
gdalwarp deFerrantis/N44E007.hgt mixed.tif
gdalwarp SRTM_1arc_v3/n44_e007_1arc_v3.tif mixed.tif
This looks like this:
I have to be honest, it doesn't look good. Both files declare the same extents and resolution (their metadata is similar, but the second file has more), but if you compare the renders for SRTM_1arc_v3 and deFerrantis, you will notice that they don't seem to align properly.
The last simple option would be to upsample SRTM_3arc_v4.1 and then merge like before, but it took me a while to figure out the right parameters:
gdalwarp -te 6.9998611 43.9998611 8.0001389 45.0001389 -tr 0.000277777777778 -0.000277777777778 -rb SRTM_3as_v4.1/srtm_38_04.tif srtm_1as_v3-3as_v4.1.tif
Creating output file that is 3601P x 3601L.
Processing input file SRTM_3as_v4.1/srtm_38_04.tif.
Using internal nodata values (eg. -32768) for image SRTM_3as_v4.1/srtm_38_04.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
gdalwarp SRTM_1as_v3/n44_e007_1arc_v3.tif srtm_1as_v3-3as_v4.1.tif
Processing input file SRTM_1as_v3/n44_e007_1arc_v3.tif.
Using internal nodata values (eg. -32767) for image SRTM_1as_v3/n44_e007_1arc_v3.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
The complex part was the -te and -tr parameters. The 3 arc file covers a 5x5 degree zone at 40-45°N, 5-10°E. The 1 arc file only covers 44-45°N, 7-8°E, so I need to cut it out. I use the -te option for that, but according to the doc it's «[to] set georeferenced extents of [the] output file to be created». This means the degrees of the W, S, E and N limits of the output file. Note that the actual parameters are half an arc second (half a pixel) off those limits; I took those limits from the original 1 arc file.
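A quick check of that half-pixel offset:

# the -te corners are half a pixel outside the integer degree lines, because
# the 1 arc file's pixel centers sit exactly on the integer degrees
offset = 7.0 - 6.9998611      # western -te value vs. 7°E, in degrees
print(offset * 3600)          # ~0.5 arc seconds, i.e. half of a 1" pixel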
The second one is even more cryptic: «[to] set [the] output file resolution (in target georeferenced units)». This last word is the key: units. According to gdalinfo, both files have a certain pixel size in units declared in the projection. Both declare UNIT["degree",0.0174532925199433]; "degree" has a clear meaning; the float beside it is the size of the unit in radians (π/180). So the parameter for -tr is how many units (degrees) a pixel represents (or, more accurately, how many degrees there are between a pixel center and the next one). Notice that the vertical value is negative; that's because raster images go from North to South, but degrees go the other way around (North and East degrees are positive, South and West negative). In any case, I also just copied the pixel size declared in the 1 arc file.
After all this dissertation about GeoTIFF metadata, finally the resulting DEM:
As I said earlier, I don't have much time to invest in this; my map is mostly for personal consumption and I can't put much time into it. So my conclusion is this: if I can manage the 9x data size, I think I'm going this way.
One of the most cited ways to accelerate rendering maps with Mapnik is metatiling. This technique is simple: instead of rendering each individual 256x256 PNG file that normally backs a slippy map, render NxN times that (normally with N=8) and split afterwards. mapnik-stylesheet's generate_tiles.py was not capable of doing it, but I hacked a little bit and got it to do it, so after integrating openstreetmap-carto v2.14.1 and fixing some local bugs, I gave it another spin.
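The splitting step is trivial; just to illustrate the idea (this is not generate_tiles.py's code, and the filenames are made up), cutting an 8x8 metatile into its 64 tiles with PIL:

from PIL import Image

TILE = 256
META = 8 * TILE   # a 2048x2048 px metatile

meta = Image.open('metatile.png')
for ty in range(META // TILE):
    for tx in range(META // TILE):
        box = (tx * TILE, ty * TILE, (tx + 1) * TILE, (ty + 1) * TILE)
        meta.crop(box).save('tile-%d-%d.png' % (tx, ty))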
This time I also imported the whole Europe data, and expanded the terrain to cover anything above 60°N, taking the height files from De Ferranti's viewfinderpanoramas. This meant that whatever rendering times I got this time, they are not completely comparable to what I obtained before, because it's simply handling way more data, especially relative to the terrain data.
So, first, a short revisit to importing. Here's the summary:
mdione@diablo:/var/lib/data/mdione/src/projects/osm/data/osm$ osm2pgsql \
--create --drop --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache \
--cache 2048 --number-processes 4 --unlogged europe-latest.osm.pbf
osm2pgsql SVN version 0.82.0 (64bit id space)
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
Allocated space for persistent node cache file
Maximum node in persistent node cache: 0
Mid: pgsql, scale=100 cache=2048
Reading in file: europe-latest.osm.pbf
Processing: Node(1225816k 799.6k/s) Way(151242k 17.71k/s) Relation(1872470 338.91/s) parse time: 15596s [4h20m]
Node stats: total(1225816996), max(2884224404) in 1533s [0h26m]
Way stats: total(151242682), max(284701738) in 8538s [2h22m]
Relation stats: total(1872475), max(3776697) in 5525s [1h32m]
Going over pending ways...
Maximum node in persistent node cache: 2884632575
110980610 ways are pending
Using 4 helper-processes [but only 1 is used, the others failed, dunno why]
Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
Maximum node in persistent node cache: 2884632575
Process 0 finished processing 110980610 ways in 34202 sec [9h30m]
110980610 Pending ways took 34202s at a rate of 3244.86/s
Going over pending relations...
Maximum node in persistent node cache: 2884632575
0 relations are pending
Using 4 helper-processes
Process 2 finished processing 0 relations in 2 sec
Process 3 finished processing 0 relations in 2 sec
Process 0 finished processing 0 relations in 3 sec
Process 1 finished processing 0 relations in 3 sec
0 Pending relations took 3s at a rate of 0.00/s
node cache: stored: 203128264(16.57%), storage efficiency: 75.67% (dense blocks: 138131, sparse nodes: 63494657), hit rate: 17.62%
Stopped table: planet_osm_rels in 1s
Stopped table: planet_osm_nodes in 1s
Stopped table: planet_osm_ways in 3s
All indexes on planet_osm_roads created in 4679s [1h18m]
All indexes on planet_osm_point created in 6093s [1h41m]
All indexes on planet_osm_line created in 9703s [2h42m]
All indexes on planet_osm_polygon created in 12735s [3h32m]
Osm2pgsql took 62780s overall [17h26m]
Only ~3h30m more than importing a part of Europe; I think it's manageable. I still don't know what to make out of the cache/storage efficiency line, but it's clearly related to the small size of the cache.
About the rendering, I only rendered up to zoom level 11, as I discussed before, but that limit was mostly due to the fact that going all the way down to 14 took too much time. But after seeing how much less rendering took this time, I'll most probably revert that change. Take a look at the graph:
This graph shows the average and total time for rendering up to zoom level 11 with single tiling (actually it's the old data I showed previously) and metatiling 8x8. The total time uses the logarithmic scale on the right. The average in the second case is the division between the total time and the amount of final individual tiles, which are produced by cutting the metatile into as many single tiles as the minimum between 2^z and 64, so zoom levels 0-2 don't produce spurious images. The times are amazing. Notice that from time to time the total time was higher in the metatile version than in the single tile version, but there are two factors that have an impact here. One is that due to metatiling, I'm rendering tiles that with single tiling I wouldn't, because now I can only cut in increments of 8 instead of one. Two, now I'm rendering the whole of Europe instead of just a big part of it. This makes the amount of tiles produced between 6 and 10 times bigger.
Still, take a look at the average rendering times for metatiling, and especially at the last column in the table, which is not graphed: it's the ratio between the average tile rendering times, meta to single. It's mostly below 10%, except for a couple of zoom levels that either produce few tiles (ZL 2, 16 tiles) or stay under 15% (ZL 4). This is ridiculously low, which means fast. I will first finish rendering my zone down to ZL 18 and then render the rest down to ZL 14 to have more complete, comparable graphs.
openstreetmap gis