Writing a tile server in python

Another dictated post111, but heavily edited. Buyer beware.

I developed a tileset based on OpenStreetMap data and style and elevation information, but I don't have a render server. What I have been doing is using my own version of an old script from the mapnik version of the OSM style. This script is called generate_tiles, and I made big modifications to it and now it's capable of doing many things, including spawning several processes for handling the rendering. You can define regions that you want to render, or you can just provide a bbox or a set of tiles or just coordinates. You can change the size of the meta tile, and it handles empty tiles. If you find a sea tile, most probably you will not need to render its children9, where children are the four tiles that are just under it in the next zoom level. For instance, in zoom level zero we have only one tile (0,0,0), and it's children are (1,0,0), (1,0,1), (1,1,0) and (1,1,1). 75% of the planet's surface is water, and with Mercator projection and the Antartic Ocean, the percent of tiles could be bigger, so this optimization cuts a lot of useless rendering time.

Another optimization is that it assumes that when you render zoom level N, you will be using at least the same data for zoom level N+1. Of course, I am not catching that data because mapnik does not allow this, but the operating system does the catching. So if you have enough RAM, then you should be able to reuse all the data that's already in buffers and cache, instead of having to fetch them again from disk. This in theory should accelerate rendering and probably it is10.

The script works very well, and I've been using it for years already for rendering tiles in batches for several zoom levels. Because my personal computer is way more powerful than my server (and younger; 2018 vs 2011), I render in my computer and rsync to my server.

So now I wanted to make a tile server based on this. Why do I want to make my own and not use renderd? I think my main issue with renderd is that it does not store the individual tiles, but keeps metatiles of 8x8 tiles and serve the individual tiles from there. This saves inode usage and internal fragmentation. Since my main usage so far has been (and probably will continue to be) rendering regions by hand, and since my current (static) tile server stores all the latest versions of the tiles I have rendered since I started doing this some 15 years ago, I want updating the server in a fast way. Most tile storage methods I know fail terribly at update time (see here); most of the time it means sending the whole file over the wire. Also, individual tiles are easier to convert to anything else, like creating a MBTiles file, push it to my phone, and have a offline tile service I can carry with me on treks where there is no signal. Also, serving the tiles can be as easy as python -m http.server from the tileset root directory. So renderd is not useful for me. Another reason is, well, I already have the rendering engine working. So how does it work?

The rendering engine consists of one main thread, which I call Master, and rendering threads3. These rendering threads load the style and wait for work to do. The current style file is 6MiB+ and takes mapnik 4s+ to load it and generate all its structures, which means these threads have to be created once per service lifetime. I have one queue that can send commands from the Master to the renderer pool asking for rendering a metatile, which is faster than rendering the individual tiles. Then one of the rendering threads picks the request from this queue, calls mapnik, generates the metatile, cuts it into the subtiles and saves them to disk. The rendering thread posts in another queue, telling the Master about the children metatiles that must be rendered, which due to emptiness can be between 0 and 4.

To implement the caching optimization I mentioned before, I use a third structure to maintain a stack. At the beginning I push into it the initial work; later I pop one element from it, and when a rendered returns the list of children to be rendered, I push them on top of the rest. This is what tries to guarantee that a metatile's children will be rendered before moving to another region that would trash the cache. And because the children can inspect the tiles being written, they can figure out when a child is all sea tiles and not returning it for rendering.

At the beginning I thought that, because the multiprocessing queues are implemented with pipes, I could use select()4 to see whether the queue was ready for writing or reading and use a typical non-blocking loop. When you're trying to write, these queues will block when the queue is full, and when you're trying to read, they will block when the queue is empty. But these two conditions, full and empty, are actually handled by semaphores, not by the size of the pipe. That means that selecting on those pipes, even if I could reach all the way down into the structures of the multiprocessing.Queue all the way down. and add them to a selector, yes, the read queue will not be selected if it's empty (nothing to read), but the write queue will not, since availability of space in the pipe does not mean the queue is not full.

So instead I'm peeking into these queues. For the work queue, I know that the Master thread8 is the only writer, so I can peek to see if it is full. If it is, I am not going to send any new work to be done, because it means that all the renders are busy, and the only work queued to be done has not been picked up yet. For the reading side it's the same, Master is the only reader. so, I can peek if it's empty, and if it is, I am not going to try to read any information from it. So, I have a loop, peeking first into the work queue and then into the info queue. If nothing has been done, I sleep a fraction of a second.

Now let's try to think about how to replace this main loop with a web frontend. What is the web frontend going to do? It's going to be getting queries by different clients. It could be just a slippy map in a web page, so we have a browser as a client, or it could be any of the applications that can also render slippy maps. For instance, on Linux, we have marble; on Android, I use MyTrails, and OsmAnd.

One of the things about these clients is that they have timeouts. Why am I mentioning this? Because rendering a metatile for me can take between 3 to 120 seconds, depending on the zoom level. There are zoom levels that are really, really expensive, like between 7 and 10. If a client is going to be asking directly a rendering service for a tile, and the tile takes too long to render, the client will timeout and close the connection. How do we handle this on the server side? Well, instead of the work stack, the server will have request queue, which will be collecting the requests from the clients, and the Master will be sending these requests to the render pool.

So if the client closes the connection, I want to be able to react to that, removing any lingering requests made by that client from the request queue. If I don't do that, the request queue will start piling up more and more requests, creating a denial of service. This is not possible in multiprocessing queues, you cannot remove an element. The only container that can do that is a dequeue5, which also is optimized for putting and popping things from both ends (it's probably implemented using a circular buffer), which is perfect for a queue. As for the info queue, I will not be caring anymore about children metatiles, because I will not be doing any work that the clients are not requesting.

What framework that would allow me to do this? Let's recap the requirements:

  • Results are computed, and take several seconds.
  • The library that generates the results is not async, nor thread safe, so I need to use subprocesses to achieve parallelization.
  • A current batch implementation uses 2 queues to send and retrieve computations to a pool of subprocesses; my idea is to "just" add a web frontend to this.
  • Each subprocess spends some seconds warming up, son I can't spawn a new process for each request.
  • Since I will have a queue of requested computations, if a client dies, if its query is being processed, then I let it finish; if not, I should remove it from the waiting queue.

I started with FastAPI, but it doesn't have the support that I need. At first I just implemented a tile server; the idea was to grow from there6, but reading the docs it only allows doing long running async stuff after the response has been sent.

Next was Flask. Flask is not async unless you want to use sendfile(). sendfile() is a way to make the kernel read a file and write it directly on a socket without intervention from the process requesting that. The alternative is to to open the file, read a block, write it on the socket, repeat. This definitely makes your code more complex, you have to handle lots of cases. So sendfile() is very, very handy, but it's also faster because it's 0-copy. But Flask does not give control of what happens when the client suddenly closes the connection. I can instruct it to cancel the tasks in flight, but as per all the previous explanation, that's not what I want.

This same problem seems to affect all async frameworks I looked into. asyncio, aiohttp, tornado. Except, of course, twisted, but its API for that is with callbacks, and TBH, I was starting to get tired of all this, and the prospect of callback hell, even when all the rest of the system could be developed in a more async way, was too much. And this is not counting the fact that I need to hook into the main loop to step the Master. This could be implemented with timed callbacks, such as twisted's callLater(), but another thought started to form in my head.

Why did I go directly for frameworks? Because they're supposed to make our lives easier, but from the beginning I had the impression that this would not be a run of the mill service. The main issue came down to beign able to send things to render, return the rendered data to the right clients, associate several clients to a single job before it finished (more than one client might request the same tile or several tiles that belong to the same metatile), and handle client and job cancellation when clients disappear. The more frameworks' documentation I read, the more I started to fear that the only solution was to implement an non-blocking12 loop myself.

I gotta be honest, I dusted an old Unix Network Programming book, 2nd Ed., 1998 (!!!), read half a chapter, and I was ready to do it. And thanks to the simple selector API, it's a breeze:

  1. Create a listening socket.
  2. Register it for read events (connections).
  3. On connection, accept the client and wait for read events in that one too.
  4. We were not registering for write before because the client is always ready for write before we start sending anything, which lead to tight loops.
  5. On client read, read the request and send the job to Master. Unregister for read.
  6. But if there's nothing to read, the client disconnected. Send an empty.response, unregister for read and register for write.
  7. Step Master.
  8. If anything came back, generate the responses and queue them for sending. Register the right clients for write.
  9. On client write (almost always), send the response and the file with sendfile() if any.
  10. Then close the connection and unregister.
  11. Loop to #3.

Initially all this, including reimplementing fake Master and render threads, took less than 200 lines of code, some 11h of on-and-off work. Now that I have finished I have a better idea of how to implement this at least with twisted, which I think I will have to do, since step 4 assumes the whole query can be recv()'ed in one go and step 7 similarly for send()'ing; luckily I don't need to do any handholding for sendfile(), even when the socket is non blocking. A more production ready service needs to handle short reads and writes. Also, the HTTP/1.1 protocol all clients are using allows me to assume that once a query is received, the client will be waiting for an answer before trying anything else, and that I can close the connection once a response has been send and assume the client will open a new connection for more tiles. And even then, supporting keep alive should not be that hard (instead of closing the client, unregister for write, register for read, and only do the close dance when the response is empty). And because I can simply step Master in the main loop, I don't have to worry about blocking queues.

Of course, now it's more complex, because it's implementing support for multiple clients with different queries requiring rendering the same metatile. This is due that applications will open several clients for fetching tiles when showing a region, and unless it's only 4 and they fall in the corner of 4 adjacent metatiles, they will always mean more than one client per metatile. Also, I could have several clients looking at the same region. The current code is approaching the 500 lines, but all that should also be present in any other implementation.

I'm pretty happy about how fast I could make it work and how easy it was. Soon I'll be finishing integrating a real render thread with saving the tiles and implement the fact that if one metatile's tile is not present, we can assume it's OK, but if all are not present, I have to find out if they were all empty or never rendered. A last step would be how to make all this testable. And of course, the twisted port.


  1. This is getting out of hand. The audio was 1h long, not sure how long it took to auto transcribe, and when editing and thinking I was getting to the end of it, the preview told me I still had like half the text to go through. 

  2. No idea what I wanted to write here :) 

  3. Because mapnik is not thread safe and because of the GIL, they're actually subprocesses via the multioprocessing module, but I'll keep calling them threads to simplify. 

  4. Again, a simplification. Python provides the selector module that allows using abstract implementations that spare us from having to select the best implementation for the platform. 

  5. I just found out it's pronounced like 'deck'. 

  6. All the implementations I did followed the same pattern. In fact, right now, I hadn't implementing the rendering tile server: it's only blockingly sleep()'ing for some time (up to 75s, to trigger client timeouts), and then returning the tiles already present. What's currently missing is figuring out whether I should rerender or use the tiles already present7, and actually connecting the rendering part. 

  7. Two reasons to rerender: the data is stale, or the style has changed. The latter requires reloading the styles, which will probably mean rebuilding the rendering threads. 

  8. I keep calling this the Master thread, but at this point instead of having its own main loop, I'm just calling a function that implements the body of such loop. Following previous usage for such functions, it's called single_step()

  9. Except when you start rendering ferry routes. 

  10. I never measured it :( 

  11. Seems like nikola renumbers the footnotes based on which order they are here at the bottom of the source. The first note was 0, but it renumbered it and all the rest to start counting from 1. 

  12. Have in account that I'm explicitly making a difference between a non-blocking/select() loop from an async/await system, but have in account that the latter is actually implemented with the formet. 

Collating, processing, managing, backing up and serving a gallery of a 350GiB, 60k picture collection

In the last two days I have commented a little bit how I process and manage my photos. I'm not a very avid photographer, I have like 350 gigabytes of photos, most of them are yet not processed, around 60,000 of them. So I will comment a little bit more how do I manage all that.

I start with the camera, a 24Mpx camera, just a couple of lenses, nothing fancy. Go out, take some pictures, come back home.

I put the SD camera on my computer and I use my own software to import it. The import process is not fancy, it just empties the SD card, checks every file for the EXIF information, uses the date and time to create the filename, a sequence number if needed, and puts them all in a single incoming directory where all the current unprocessed images are1.

Then I use this software I developed in PyQt5. It's very, very basic, but it's really quick, it's mostly keyboard based. It reads the EXIF information and present some of the tags at the left of the screen; things like date, time, size, orientation and then focal length, aperture, ISO and various other data I can get from the images. It's mostly focused on my current camera and the previous one, both Nikons2. The previous one was an N90, right now it's an N7200. The image occupies most of the window, and the program is always in full screen. At the bottom there's the filename and a couple of toggles.

I can do several things with this:

  • Go forwards, backwards, by one, by ten, by a hundred and by a thousand, because that incoming directory right now has almost seven years of history, probably ten thousand pictures.

  • Move randomly, which allows me to pick up a new thing to collate when I get bored with the current one but I want to keep doing it to reduce the backlog.

  • Mark the images in different ways. The main ones are about selecting for storing, with two modes: One is to keep the image in the original size. I usually use this for my best landscape or astro photos. The other one will resize it down to twelve megapixels3, from 6000x4000 pixels to 4500x3000 pixels, 75% on each dimension.

  • Rotate the images, just in case the camera did not guess the orientation correctly, usually when I'm taking pictures right upward or right downwards.
  • Select several pictures for stitching, which will use hugin to do so. It's not 100% automatic, but at least puts the pictures in a stitch directory and point hugin there.

  • Select a picture for cropping or editing; I'm not going to develop a whole image editor, so I just delegate to an existing program, gwenview.

  • Select images for deleting and delete them permanently.

  • Select several images for comparison and enter/exit comparison mode, which means that going backwards and forwards applies only this this set. This is good for things like when you take certain pictures, but there are not necessarily sequences in the original picture sequence, which for me makes culling images faster.

  • It has two zoom levels, fit to screen and full size. I don't have much the need for other options.
  • 99% of the pictures I take are freehand, so in a sequence there's always some movement between images. In full size I can put every image on its own position, aligning the whole sequence and allow culling based on blurriness or other factors.

  • Also in full size, I can lock the view, so when I pan one of the images and I switch to another one, it will also pan that second image to that position. It also helps when I'm checking for details between two different images of the same thing.

  • Move all the selected images, resize them if needed, and put them in a folder. It also creates a hardlink between my categorization in folders into a folder that collects all the images by date; there's one folder for each month and year with all the pictures of that month inside. It uses hardlinks so it doesn't duplicate the image file, saving space.

  • It also has a readonly mode, so I can hand the computer to my kids to watch the photos.

When culling, I use the comparison mode and individual position and lock view features a lot, going back and forth between images, discarding until only one is left.

That's the first part, the one I must spend my time on, just basic culling, selection and storage. My main tree is just a tree based on my way of categorizing the images.

My program doesn't have a directory view; instead, I just use gwenview again.

Notice there's no photo editing in this workflow. I rarely shoot in RAW for two reasons: a) I'm really bad at postprocessing; and b) even if I was good, I don't have the time to do it; my free time is shared among several hobbies. I only do it for astro photograpy and very few, rare occasions.

The third tool I use is digikam. I use it for two things, which are related: semi-automatic and manual tagging. The semi-automatic is face detection; digikam can find and guess faces, but requires manual confirmation4. The fully manual part is plain tagging, mostly with location5 and sometimes some other info. I sometimes also rate my pictures; I mostly use four and five, sometimes three, only for my best pictures.

Then there's another script that reads the digikam database and uses the tags to create another directory for the tags, which also uses hardlinks. It still doesn't do anything about the rating, but I could easily add that.

That's all on my personal computer. I use rsync to make a copy on my home server that has two purposes. One, it's a backup, which includes all the original 24Mpx images that I hadn't culled yet, which I think is the biggest part of my collection.

The second one, it feeds a gallery program that is developed in PHP by a guy named Karl. It's probably the single paid software I use. It's a single PHP file that you put at the root of your gallery, you enable PHP processing by your web server (in my case, Apache), and generates the gallery on the run, just reading the directories and creating all the necessary thumbnails and all that. I did a small change to this program. The original algorithm creates thumbnails based on each file's path (and other attributes, 4 or 5 I think), but because I have all these hard links, it creates duplicated thumbnail files. So I changed it to use the filename instead of the filepath6.

I don't have any kind of synchronization with my phone. Most of the pictures I take with it are not the kind of pictures I usually will put in my own gallery, except the times I go out without my camera and I end up taking pictures anyway. I still don't have a workflow for that, it's mostly manual. So if I ever lose my phone, I'm fscked because I have definitely no backups of it.

That lack of synchronization also means that the only way to see the pictures in my phone is by opening the gallery in the browser. It's not the best, but I don't do that that often. I have tried to use alternatives like NextCloud, which I also have installed on my home server. I have some issues with permissions because, again, this is a backup directory, so it has all the owner information that belongs to me, instead of the web server. That means it doesn't have the proper permissions to let NextCloud manage those files. Luckily files.gallery just needs a subdirectory.

Another reason is that before I was using static gallery generators: sigal, gallerpy or even nikola, which drives this glob. All those can generate the gallery statically, so serving them is so much easier. My old home server died at some point and I had to come up with something. I had a spare old laptop laying around and I used that. Now it's enough to generate the gallery on the fly. I have plans to make something bigger, but that's for another time.


  1. In fact I have another directory for all the unprocessed photos from another era, and I'm thinking of starting a new era. 

  2. Even if EXIV is a standard for storing tags, there's no standard for the tag names, so every manufacturer has its own sets, that even change between camera lines. For a better idea of what I'm talking about, just peruse Image::ExifTool's source code

  3. I currently own no screen that is 4500 pixels of width, let alone 6000. Maybe my kids will, but by then Mpx count will be so different that it won't make any sense to accomodate that. Right now storage for me is expensive, so I'll keep it this way. 

  4. Or rejection: the false positive rate is bigger that I would like, and it doesn't have a way to say 'yes, this is that person, but don't train on this image'. This is the case for pictures where the face is either semi occluded, sometimes painted, sometimes bad lightning, and mostly just blurry. 

  5. Most of my pictures don't have GPS info, not even the ones in the phone. The latter I only enable when I really need the info later, mostly for mapping. Later I either discard the photo or remove the info. 

  6. For a while now I'm even making this distinction in my own code, filename vs filepath.