Processing huge DEM datasets
As I mentioned in one of my last posts, I started using SRTM 1 arc second
data for rendering my maps. So far, with the 3 arc second dataset, I was able to
generate a huge image by stitching the tiles with gdal_merge.py, then
generating the 3 final images (height, slope and hill shade) plus one
intermediary for the slope, all with gdaldem. Now this is no longer
possible, as the new dataset is almost 10x the size of the old one, so instead
of going that way, I decided to try another one.
With gdalbuildvrt it is possible to generate an XML file that 'virtually'
stitches images. This means that any attempt to access data through this file
will actually make the library (libgdal) find the proper tile(s) and access
them directly.
So now the problem becomes processing each tile individually and then virtually
stitching them. The first part is easy, as I just need to do the same thing I
was doing to the huge stitched image before. I also took the opportunity to use
tiled files: instead of storing the image one scan line at a time (being
1 arc second resolution, each scan line has 3601 pixels; the extra one is for
overlapping with the neighbors), the file is stored in 256x256 sub-tiles,
possibly (that is, not tested) making rendering faster by clustering related
data closer together.
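The locality argument can be made concrete with a little arithmetic. This is just a sketch (the helper names and the comparison are mine; the 3601 and 256 figures are the ones above): it compares where two vertically adjacent pixels end up under the two layouts.

```python
# Map a pixel (x, y) to its 256x256 sub-tile and to its position in the
# scan line layout, to compare how far apart nearby pixels end up.

TILE = 256        # sub-tile edge, as used above
WIDTH = 3601      # pixels per scan line in a 1 arc second SRTM tile

def scanline_offset(x, y):
    """Linear pixel index when the image is stored one scan line at a time."""
    return y * WIDTH + x

def subtile(x, y):
    """(column, row) of the 256x256 sub-tile containing the pixel."""
    return (x // TILE, y // TILE)

# Two vertically adjacent pixels...
a, b = (100, 100), (100, 101)
# ...are a whole scan line apart in the classic layout...
print(scanline_offset(*b) - scanline_offset(*a))  # 3601
# ...but live in the same sub-tile in the tiled layout.
print(subtile(*a) == subtile(*b))  # True
```

Whether that actually speeds up rendering depends on the access pattern, which is why I haven't claimed more than "possibly" above.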
The second step, with gdalbuildvrt, should also be easy.
The first bump in the road is the fact that SRTM tiles above 50°N are only 1801
pixels wide, most probably because at those latitudes a degree of longitude
covers so little ground that the full sampling makes no sense anyways. This
meant that I had to preprocess the original tiles so libgdal didn't have to do
the interpolation at render time (in fact, it already has to do it once while
rendering, using the lanczos scaling algorithm). This was done with gdalwarp.
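A small sketch of that preprocessing decision (the function names are mine, the 50°N cutoff comes from the dataset, and picking lanczos for the warp too is my assumption; `-ts` and `-r` are documented gdalwarp options):

```python
# Decide which SRTM tiles need to be widened before virtual stitching.
# Tiles whose SW corner is at or above 50°N are 1801 pixels wide; every
# tile in the mosaic should end up 3601 pixels wide.

FULL_WIDTH = 3601
NARROW_WIDTH = 1801

def source_width(lat):
    """Pixel width of the original SRTM tile whose SW corner is at `lat`."""
    return NARROW_WIDTH if lat >= 50 else FULL_WIDTH

def gdalwarp_cmd(tile):
    """A gdalwarp invocation that resamples a narrow tile to full width."""
    return ["gdalwarp", "-ts", str(FULL_WIDTH), str(FULL_WIDTH),
            "-r", "lanczos", tile, tile.replace(".hgt", "-warped.tif")]

print(source_width(49))  # 3601, already fine
print(source_width(51))  # 1801, needs preprocessing
print(" ".join(gdalwarp_cmd("N51E000.hgt")))
```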
The second one came from the slope and hill shading tiles. As the algorithm
goes, it generates some 'fade out' values at the edges, and when libgdal was
stitching them, I could see them as lines at the seams. This was fixed by
passing -compute_edges to gdaldem.
Finally, for some reason gdalbuildvrt was generating some very strange .vrt
files. The format of these files is more or less the following:
- For each band in the source tiles it creates a band in the result.
- For each source tile, it describes:
- The source file
- The source band
- The size and tiling of the source (3601², 256²)
- The rectangle we want from the source (0², 3601²)
- The rectangle in the result (x, y, 3601²)
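That structure is simple enough to generate by hand. Here is a sketch (the element and attribute names are the ones GDAL's VRT format uses; the tile list, function name and Int16 data type are my assumptions) that writes one band with one SimpleSource per tile:

```python
# Build a minimal .vrt by hand: one VRTRasterBand, and one SimpleSource
# per source tile, mirroring the structure described above.
import xml.etree.ElementTree as ET

def build_vrt(tiles, tile_size=3601, block=256):
    """`tiles` is a list of (filename, x_off, y_off) in result pixels."""
    w = max(x for _, x, _ in tiles) + tile_size
    h = max(y for _, _, y in tiles) + tile_size
    root = ET.Element("VRTDataset", rasterXSize=str(w), rasterYSize=str(h))
    band = ET.SubElement(root, "VRTRasterBand", dataType="Int16", band="1")
    for fname, x, y in tiles:
        src = ET.SubElement(band, "SimpleSource")
        ET.SubElement(src, "SourceFilename", relativeToVRT="1").text = fname
        ET.SubElement(src, "SourceBand").text = "1"
        # The size and tiling of the source (3601², 256²)
        ET.SubElement(src, "SourceProperties", RasterXSize=str(tile_size),
                      RasterYSize=str(tile_size), DataType="Int16",
                      BlockXSize=str(block), BlockYSize=str(block))
        # The rectangle we want from the source (0², 3601²)
        ET.SubElement(src, "SrcRect", xOff="0", yOff="0",
                      xSize=str(tile_size), ySize=str(tile_size))
        # The rectangle in the result (x, y, 3601²)
        ET.SubElement(src, "DstRect", xOff=str(x), yOff=str(y),
                      xSize=str(tile_size), ySize=str(tile_size))
    return ET.tostring(root, encoding="unicode")

print(build_vrt([("N51E000.tif", 0, 0), ("N51E001.tif", 3600, 0)]))
```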
The problem I saw was some weird declarations of the rectangles in the result, where the coordinates or the sizes didn't match what I expected. I will try to figure this out with the GDAL people in the following weeks, but first I want to make sure that the source tiles are easily downloadable. So far I have only found download options through USGS' EarthExplorer, which requires you to be logged in in order to download tiles; this means it is not very scriptable, so not very reproducible.
So for the moment I'm using my own .vrt file generator, not nearly generic
enough for release, but soon. I also took the opportunity to make the rectangles
in the result non-overlapping, just 3600² in size. I know that the
generated file works because I'm also generating smaller samples of the
resulting layers (again, height, slope and hill shading) for rendering smaller
zoom levels.
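The non-overlapping layout boils down to simple arithmetic: each tile contributes 3600 fresh pixels, since its last row and column duplicate the neighbor's first ones. A sketch (the tile addressing by SW corner and the choice of the mosaic's origin tile are my assumptions):

```python
# Pixel offset of an SRTM tile inside the result, with non-overlapping
# 3600x3600 destination rectangles (the 3601st row/column of each tile
# overlaps the neighboring tile).

STEP = 3600

def dst_offset(lat, lon, lat_max, lon_min):
    """Top-left pixel of the tile whose SW corner is (lat, lon), inside a
    mosaic whose top-left tile has its SW corner at (lat_max, lon_min)."""
    x = (lon - lon_min) * STEP
    y = (lat_max - lat) * STEP
    return x, y

# A mosaic whose top-left tile is N51W001:
print(dst_offset(51, -1, 51, -1))  # (0, 0)
print(dst_offset(50, 0, 51, -1))   # (3600, 3600)
```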
The only remaining question about huge DEM datasets is contour generation. So far I have just generated contour lines for each tile and lived with the fact that they too look ugly at the seams.