Processing huge DEM datasets

As I mentioned in one of my last posts, I started using SRTM 1 arc-second data for rendering my maps. With the 3 arc-second dataset I was able to generate a huge image by stitching the tiles with gdal_merge.py, then generating the 3 final images (height, slope and hill shade) plus one intermediary for the slope, all with gdaldem. This is no longer possible, as the new dataset is almost 10x the size of the old one, so instead of going that way, I decided to try another one.
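
For reference, the old pipeline looked more or less like this (the file names and the color ramp files are made up for illustration):

    # stitch all the 3 arc-second tiles into one big GeoTIFF
    gdal_merge.py -o srtm3-all.tif N*.hgt

    # height: color-relief takes a text file mapping elevations to colors
    gdaldem color-relief srtm3-all.tif height-colors.txt height.tif

    # slope: first the intermediary slope raster, then its color-relief
    gdaldem slope srtm3-all.tif slope-raw.tif
    gdaldem color-relief slope-raw.tif slope-colors.txt slope.tif

    # hill shade
    gdaldem hillshade srtm3-all.tif hillshade.tif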

With gdalbuildvrt it is possible to generate an XML file that 'virtually' stitches images. This means that any attempt to access data through this file will actually make the library (libgdal) find the proper tile(s) and access them directly.
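
Building such a file is a one-liner; the output name is arbitrary:

    # build a virtual mosaic out of all the processed tiles
    gdalbuildvrt height.vrt N*.tif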

So now the problem becomes processing each tile individually and then virtually stitching them. The first part is easy: I just do to each tile what I was doing to the huge stitched image before. I also took the opportunity to use tiled files. Instead of storing the image one scan line at a time (at 1 arc-second resolution each scan line is 3601 pixels; the extra one overlaps with the neighboring tiles), the file is stored in 256x256 sub-tiles, possibly (that is, not tested) making rendering faster by keeping related data closer together. The second step, with gdalbuildvrt, should also be easy.
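
Per-tile processing then looks roughly like this (file names made up); the -co creation options ask the GeoTIFF driver for 256x256 internal tiles:

    # per tile: hill shade (same idea for slope and height), written as a
    # GeoTIFF with 256x256 internal tiles instead of scan lines
    gdaldem hillshade N51E010.hgt N51E010-hillshade.tif \
        -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256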

The first stumbling block is the fact that SRTM tiles above 50°N are only 1801 pixels wide, most probably because at those latitudes full longitudinal resolution makes little sense anyway. This meant that I had to preprocess the original tiles so libgdal didn't have to do the interpolation at render time (in fact, it already has to interpolate once while rendering, using the lanczos scaling algorithm). This was done with gdalwarp.
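
Something along these lines; -ts forces the output size, and the resampling algorithm here is only an assumption, as the post doesn't say which one was used for this step:

    # resample a 1801-pixel-wide tile above 50°N back to the full 3601x3601 grid
    gdalwarp -ts 3601 3601 -r lanczos N51E010.hgt N51E010-full.tif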

The second one came from the slope and hill shade tiles. As the algorithm goes, it generates some 'fade out' values at the edges, and when libgdal stitched the tiles I could see these as lines along the seams. This was fixed by passing -compute_edges to gdaldem.
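
So the per-tile commands grow one extra option, something like:

    # compute proper values at the tile edges so the seams don't show
    gdaldem slope N51E010.hgt N51E010-slope.tif -compute_edges \
        -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256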

Finally, for some reason gdalbuildvrt was generating some very strange .vrt files. The format of these files is more or less the following (a trimmed-down example follows the list):

  • For each band in the source tiles it creates a band in the result.
    • For each source tile, it describes:
      • The source file
      • The source band
      • The size and tiling of the source (3601², 256²)
      • The rectangle we want from the source (0², 3601²)
      • The rectangle in the result (x, y, 3601²)
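
For illustration, a heavily trimmed single-band .vrt with one source would look more or less like this; the file name, sizes and offsets are invented, and I'm leaving out the georeferencing elements:

    <VRTDataset rasterXSize="7201" rasterYSize="3601">
      <VRTRasterBand dataType="Int16" band="1">
        <SimpleSource>
          <SourceFilename relativeToVRT="1">N51E011-height.tif</SourceFilename>
          <SourceBand>1</SourceBand>
          <SourceProperties RasterXSize="3601" RasterYSize="3601"
                            DataType="Int16" BlockXSize="256" BlockYSize="256"/>
          <SrcRect xOff="0" yOff="0" xSize="3601" ySize="3601"/>
          <DstRect xOff="3600" yOff="0" xSize="3601" ySize="3601"/>
        </SimpleSource>
      </VRTRasterBand>
    </VRTDataset>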

The problem I saw was some weird declarations of the rectangles in the result, where the coordinates or the sizes didn't match what I expected. I will try to sort this out with the GDAL people in the following weeks, but first I want to make sure that the source tiles are easily downloadable. So far I have only found download options through USGS' EarthExplorer, which requires you to be logged in to download tiles, which means it is not very scriptable, and so not very reproducible.

So for the moment I'm using my own .vrt file generator, not nearly generic enough for release yet, but soon. I also took the opportunity to make the rectangles in the result non-overlapping, just 3600² in size. I know that the generated file works because I'm also generating smaller samples of the resulting layers (again: height, slope and hill shade) for rendering lower zoom levels.
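
Those smaller samples can be produced straight from the .vrt; the post doesn't say which tool was used, but gdal_translate can do it (the 10% factor is just an example):

    # downsample the whole virtual mosaic to 10% of its size
    gdal_translate -outsize 10% 10% height.vrt height-small.tif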

The only remaining question about huge DEM datasets is contour generation. So far I have just generated contour lines for each tile and lived with the fact that they, too, look ugly at the seams.
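
For the record, per-tile contours can be generated with gdal_contour, more or less like this; the interval and the attribute name are just examples:

    # 10 m contour lines for one tile, elevation stored in a 'height' attribute
    gdal_contour -i 10 -a height N51E010.hgt N51E010-contours.shp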