Incrementally backing up Cassandra with Amanda

Warning: this has not been tested yet.

Again, TL;DR version at the end.

They say that backing up C* is really easy: you just run nodetool snapshot, which only creates hardlinks for each data file somewhere else in the filesystem, and then you just back up those hardlinks. Optionally, when you're done, you simply remove them and that's it.

But that's only half of the story. The other half is taking those snapshots and storing them somewhere else; let's say, a backup server, so you can restore the data even in case of spontaneous combustion followed by explosion due to short circuits caused by your dog peeing on the machine. Not that that happens a lot in a datacenter, but one has to plan for any contingency, right?

In our case we use Amanda, which internally uses an implementation of tar, or GNU tar if asked for (yes, other tools too if asked). The problems begin with how you define what to back up and where C* puts those snapshots. The definitions are done with what Amanda calls disklists, which are basically lists of directories to back up entirely. On the other hand, for a column family Bar in a keyspace Foo, whose data is normally stored in <data_file_directory>/Foo/Bar/, a snapshot is located in <data_file_directory>/Foo/Bar/snapshots/<something>, where something can be a timestamp or a name defined by the user at snapshot time.

If you want to simplify your backup configuration, you'll probably want to say <data_file_directory>/*/*/snapshots/ as the dirs to back up, but Amanda merrily can't expand wildcards in disklists. One way to solve this is to create a sibling directory of <data_file_directory>, move the snapshot files there, and specify that in the disklists. That kinda works...

... until your second backup pass comes and you find out that, even though you specified an incremental backup, it copies over all the snapshot files again. This is because when a hardlink is created, the ctime of the inode changes. Guess what tar uses to see if a file has changed... yes, ctime and mtime1.

So we're back to square one, or zero even. It seems the only solution is to use C*'s native 'support' for incrementality, but the docs are just a couple of paragraphs that barely explain how the incremental files are created (surprise: the same way as the snapshots) and how to activate it, which is the reason we didn't follow this path from the beginning. So in the end, it seems that you can't use Amanda or tar to make incremental backups, even with the native support.

But then there's a difference between the snapshot and the incremental mode: with the snapshot method, you create the snapshot just before backing it up, which sets all the ctimes to now. C*'s incremental mode "hard-links each flushed SSTable to a backups directory under the keyspace data directory", so the ctimes are roughly the same as the mtimes, and neither ever changes again (remember, SSTables are immutable), until we do a snapshot, of course.

One particularity I noticed is that only new SSTables are backed up, but not those that are the result of compactions. At first I thought this was wrong, but after discussing the issue with driftx in the IRC channel and a confirmation by Tyler Hobbs on the mailing list, we came to the following conclusion: if you also backed up the compacted SSTables, at restore time you would need to do a manual compaction to minimize data duplication, which otherwise means more SSTables to consult through their Bloom filters, more disk reads/seeks per get and more space used; but if you don't backup/restore those SSTables, the manual compaction is only advisable. Also, as a consequence, you don't need to track which files were deleted between backups.

So the remaining problem is knowing which files have been backed up, because C* backups, just like snapshots, are not automatically cleaned. I came up with the following solution, which might seem complicated at first, but really isn't.

When we do a snapshot, which is perfect for full backups, we first remove all the files present in the backup directory; the incremental files created since the last incremental backup are not needed because we're doing a full anyways. After that we have the files ready for the full; we do the backup, and then we erase the files.

Then on the following days we just back up the incremental files created so far, preceded by a flush, so as to have the latest data in the SSTables and not depend on CommitLogs. As they're only the diff against the files in the full, and don't include the intermediate compacted SSTables, they're as big as they should be (but also as small as they could be, if you're worried about disk usage). Furthermore, the way we put files in the backup dir is via symlinks, which doesn't change the files' mtime or ctime, and we configure Amanda to dereference symlinks.

Later, at restore time, the files are put in the backup directory, and with a script that takes the KS and CF from each file's name, they're 'dealt' to the right directories.
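
Such a 'dealing' script can be fairly small. Here's a minimal, untested sketch, assuming SSTable names of the form <KS>-<CF>-<version>-<generation>-<component>.db (as in our Cassandra version), that KS and CF names contain no dashes, and a hypothetical data directory:

#!/bin/bash
# deal.sh: move restored SSTables back to their keyspace/CF directories.
# Assumes names like Foo-Bar-hd-13-Data.db; DATA_DIR is hypothetical.
DATA_DIR=/var/lib/cassandra/data
RESTORE_DIR=$1

for file in "$RESTORE_DIR"/*.db; do
    name=$(basename "$file")
    ks=$(echo "$name" | cut -d - -f 1)
    cf=$(echo "$name" | cut -d - -f 2)
    mkdir -p "$DATA_DIR/$ks/$cf"
    mv "$file" "$DATA_DIR/$ks/$cf/"
done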

TL;DR version

Full backup

  • Remove old incremental files and symlinks.
  • nodetool snapshot.
  • Symlink all the snapshot files to a backup directory.
  • Backup that directory dereferencing symlinks.
  • nodetool clearsnapshot and remove symlinks.
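
In script form, the full backup pass could look roughly like this; it's an untested sketch, the directories are hypothetical and the Amanda run itself is not shown:

#!/bin/bash
# Full backup pass (sketch). BACKUP_DIR is what the Amanda disklist points to.
DATA_DIR=/var/lib/cassandra/data
BACKUP_DIR=/var/backups/cassandra

# remove the incremental files and the symlinks left by previous passes
find "$DATA_DIR" -path '*/backups/*' -type f -delete
find "$BACKUP_DIR" -type l -delete

nodetool snapshot

# symlink every snapshot file into the backup dir; Amanda is configured
# to dereference symlinks, so it backs up the real files
find "$DATA_DIR" -path '*/snapshots/*' -type f -exec ln -sf {} "$BACKUP_DIR/" \;

# ... run the Amanda full backup of $BACKUP_DIR here ...

nodetool clearsnapshot
find "$BACKUP_DIR" -type l -delete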

Incremental backup

  • nodetool flush.
  • Symlink all incremental files into the backup directory.
  • Backup that directory dereferencing symlinks.
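
And the incremental pass, under the same assumptions:

#!/bin/bash
# Incremental backup pass (sketch); same hypothetical directories as above.
DATA_DIR=/var/lib/cassandra/data
BACKUP_DIR=/var/backups/cassandra

# flush the Memtables so the latest data is in SSTables, not only in CommitLogs
nodetool flush

# symlink whatever C*'s incremental mode left under the backups/ dirs
find "$DATA_DIR" -path '*/backups/*' -type f -exec ln -sf {} "$BACKUP_DIR/" \;

# ... run the Amanda incremental backup of $BACKUP_DIR here ...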

Restore2

  • Restore the last full backup and all the incrementals.

  1. tar's docs are not clear about what exactly it uses ("Incremental dumps depend crucially on time stamps"), but Amanda's seem to imply such a thing ("Tar has the ability to preserve the access times[;] however, doing so effectively disables incremental backups since resetting the access time alters the inode change time, which in turn causes the file to look like it needs to be archived again.") 

  2. Actually it's not that simple. The previous post in this series already shows how it could get more complicated. 

Building tilesets with tilemill and OSM data

The problem of creating tiles seems to be really simple: You have the OSM data in one end, which can be downloaded from here, and the .png tiles in the other. In the middle there should be something that reads that data, applies some templates or description, and generates the tiles. But life is never so simple.

mapnik cannot read any of OSM's exported files (XML or .pbf) directly, only from a SQLite or PostgreSQL/GIS database; and we can only convert to the latter with either imposm or osm2pgsql, so that's the road we take. It's mostly a matter of following TileMill's page about using OSM's data (and some template called osm-bright). For Debian sid you can follow its instructions for Ubuntu Oneiric Ocelot.

Importing the data should be as simple as:

mdione@mustang:~/src/projects/osm$ sudo su -c "osm2pgsql --database osm --input-reader pbf --verbose --create $(pwd)/france.osm.pbf" postgres
[...]
Unable to open /home/mdione/src/projects/osm/data/france.osm.pbf

Not a very informative error message. strace gives us a clue:

open("/home/mdione/src/projects/osm/data/france.osm.pbf", O_RDONLY) = -1 EOVERFLOW (Value too large for defined data type)

A little bit cryptic, but basically it says the file is too large: open() returns EOVERFLOW when the file is bigger than the offsets a program built without large file support can handle (and this is a 32 bit machine). So I downloaded only the region where I live (for some countries there are individual files) and after some cache tweaking:

mdione@mustang:~/src/projects/osm/data$ sudo su -c "osm2pgsql --database osm --input-reader pbf --verbose --cache $((1024+512)) --create $(pwd)/provence-alpes-cote-d-azur.osm.pbf" postgres

This time I got it right. Some numbers so you have an idea how much time and space this takes:

Input file size: 184054kB
Processing: Node(19153k 832.8k/s) Way(3019k 23.59k/s) Relation(9110 112.47/s)  parse time: 233s
node cache: stored: 19153339(100.00%), storage efficiency: 16.41% (dense blocks: 113981, sparse nodes: 0), hit rate: 0.00%
Osm2pgsql took 2125s overall
Final DB size: ~1GiB

That's some 35 minutes.

Once it's finished you fire up TileMill, create a new project and add a PostGIS layer. It took me some time to figure out what to put in the different fields, even though there is a tutorial for that, but I used these, mostly taken from inspecting the database's schema:

  • ID and Class: osm-roads
  • Connection: dbname=osm host=localhost user=postgres
  • Table or subquery: planet_osm_roads
  • Unique key field: osm_id
  • Geometry field: way
  • The rest: default

And then add the following to the style.mss which you can edit in the right box of TileMill:

#osm-roads {
  ::outline {
    line-color: #7f0000;
    line-width: 2;
    line-join: round;
  }
}

The result is kind of disappointing: first, you have to specify more in the "Table or subquery" field, because the data is actually not only some kind of roads but also borders. I used (select * from planet_osm_roads where highway!='') as foo for all roads and (select * from planet_osm_roads where boundary='administrative') as foo for borders as a first attempt at being more selective. Second, as I said, it's only the main roads (down to secondary in OSM's terms, a list which by the way seems to grow every time I look at it), but nothing smaller. But as a start I think it's enough.

Restoring files owner and permissions

Note: TL;DR version at the end.

What could go wrong doing:

chown -R foo.foo $DATA_DIR/

as root? Yes, exactly: $DATA_DIR might not be defined and you end up setting the owner/group to foo.foo for all the files on the machine, including executables and devices. Of course, I learnt that the hard way. Even more, I lost the only ssh session I had on the 3 machines where I did this (mental note: be very very very careful when using cssh or similar tools), but luckily I still had a fourth twin sister which survived.

The first thing you notice is that you can't ssh anymore into the machine. This is because sshd refuses to answer any connections with this message: fatal: /var/run/sshd must be owned by root and not group or world writable.

The first step is to restore some initially sane ownership. The safest is root.root, but unluckily I didn't realize this until after I lost said ssh sessions. So actually, the first step is to regain control of the machine. The only way I can think of is to log in via the console. I thought that this would also imply rebooting and going into single user mode, or even using the init=/bin/bash hack which has saved more than one life, or at least sanity, but login keeps working even when the perms are wrong. I seemed to remember that this was not the case, probably because I thought login was setuid, but sudo find / -perm -4000 confirms that it is not.

So, after logging in, I need 2 commands: chown -R root.root / and /etc/init.d/ssh start (because now ssh does not start at boot time, seemingly leaving no error message behind), and now you can log in via ssh again.

Now it's time to restore the real ownership. Here a small tool called acl comes in handy. This small package has two tools, getfacl and its evil twin setfacl. With these we're really close to the final solution. getfacl --recursive / > all.acl pulls the ownerships, then you copy all.acl to the sick server(s) and apply it like a cure-all medicine like this: setfacl --recursive --restore /root/all.acl while sitting in the root (/) directory. Actually, think of it as a blood or bone marrow transplant.

As final notes, don't use getfacl's --tabular option, as setfacl doesn't recognize the format. Also, you can check that the ownerships were restored correctly with find / ! -user root | xargs ls -ld, and/or you can dump the new perms and compare with those you got from the donor machine.
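
That comparison can be as simple as the following (restored.acl is just a name I made up; of course the two machines must hold roughly the same files for the diff to be meaningful):

getfacl --recursive / > restored.acl
diff all.acl restored.acl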

Update

I gave the problem a little bit more thought and came to the conclusion that with that method you have no idea whether you restored all the ownerships properly or not. Taking advantage of the fact that I actually changed both owner and group, I can replace the first command with chown -R root /, and then find the not-undone files with find / -group foo. I haven't tested this, but I think it's ok. Another option is to initially restore with chown only the minimal set of files needed to make ssh work again and then search for all the foo.foos.

So, the promised TL;DR version:

  • On the sick machine(s), login via the console and run:
  • chown -R root /
  • /etc/init.d/ssh start
  • Select a donor machine and run:
  • getfacl --recursive / > all.acl
  • Transplant all.acl to the sick machine(s) (most probably with scp).
  • On the sick machine(s), now via ssh if it's more comfortable for you, run:
  • cd /
  • setfacl --recursive --restore /root/all.acl
  • find / -group foo and change by hand any remaining not-undone files.

Cassandra counters are not atomic

Today I arrived at work and was shoved into a scrambled war room. Inside were the two sysadmins working with C* (us), our immediate boss, the DBA, the PHP developer involved in this first migration project (from MySQL, this is important) and the project leader replacing the one on vacations. Yesterday before I left I saw a similar but more informal gathering around the other sysadmin, who's testing the migration in our preproduction environment. I was busy in the other corner of the office with my backup tasks (I'm still struggling with lots of constraints, but I think I finally tackled it down, as in down to the floor. I hope it will just stay still for production... but I digress), so I was unaware of the reason for the meeting, except for the title in the mail: "Problem with the counter".

If you've followed this story closely, you have all the clues to know where I'm going1. We're replacing the smallest of our databases, the avatar database, which might be small (~50GiB of data), but it has a great impact, because not only do a lot of our pages use it, but our customers and/or partners do too. Each image has a unique ID implemented with a MySQL auto increment key. Because of the impact, we couldn't simply go for UUIDs.

Now, we knew that even though C* has implemented counter columns since v0.8, there is no guarantee that counter changes are atomic from a cluster point of view. What really surprised us is that this no-guarantee also holds for a single node. In other words, simultaneous changes to a counter (increments by 1, in our case) are not atomic even when they're done on the same node.

To make sure, I put on my hazmat suit3 and plunged head first, shit shovel in hand (in the end it was not needed, the code is very readable), into C*'s code. After some maneuvering (it's not a straight route, there was some back-and-forth) I got to the piece of code I was looking for. Basically it says that if there are no indexes to update and no deletes, the update is done concurrently without any locks. And you cannot index counters, of course, as it makes no sense.

Clearly the subject of atomic counters is not something that C* plans to fix any time soon2, and given the difficulty of it, I can understand that decision. But I would expect atomicity at node level, so that, if one so desires, one could shoot one's own foot "implementing" atomic counters by writing the updates to only one node (and then some medium/heavy infra to implement HA).

One more guy, 90 minutes and 5 possible solutions (including snowflake) later, we decided to temporarily keep using MySQL for the counter (remember, we're going online next week) and to look more closely at more permanent solutions later (this other guy, a sysadmin in another group, has been on-and-off fighting to even make snowflake start on his own machine), including, of course, the long term goal of getting rid of this kind of IDs.


  1. Well, if the title didn't give it away from the beginning. 

  2. If you feel curious, start here

  3. 3 years in France has made me somewhat careful... Dammit! 

Restoring Cassandra online

Still playing with Cassandra, we set up a cluster of 5 nodes to test backing up and restoring. Datastax's doc only takes into account a simple case: when you only have to replace a node that's failing or whose files were corrupted. In this case the restore is quite straightforward: take the node out of the cluster, delete the commit logs, restore the data you have and re-add the node to the cluster, with an optional repair afterwards.

In our case in particular, we not only contemplate this kind of case, but we also might need to roll back to a point in the past, which implies restoring the data on all the nodes. It's true that this is possible by repeating the above algorithm node by node1, without the eventual repair. But this means that while you're restoring the CF or KS, your cluster is almost constantly one node short. This might mean nothing on big deployments, but our production cluster is a humble 4 node one, even smaller than the testing one! So keeping downtime as low as possible is highly needed.

So we set off to find a way to do it without stopping the nodes. Some people advised using nodetool refresh or sstableloader, but those seem to work only when restoring one node from scratch; that is, the same case as at the beginning. In our case, sstableloader was making no difference. I assume that it's because it inserts the data with their original timestamps, so the data with newer timestamps still in the Mem/SSTables on the nodes take precedence. That is, sstableloader seems not to replace the data.

With nodetool refresh the same happens, but you still have the option of deleting the current SSTables after a nodetool flush. But that leads to a state where the node(s) where you have done this emit this error on any operation on the CF or KS:

java.io.IOError: java.io.FileNotFoundException: /var/opt/hosting/db/cassandra/data/one_cf/cf_1/one_cf-cf_1-hd-13-Data.db (No such file or directory)

It's not obvious from the example I show, but that's exactly one of the SSTables I just removed. That is, C* still tries to read SSTables that are no longer there, even after a nodetool refresh. Maybe this is a bug, but then that command's semantics are not clearly stated anywhere.

I found a simple workaround: as we're no longer interested in the data in its current state, I can simply drop the KS or CF and rebuild it afterwards with the data I get from the restore.

In the end, the procedure is like this:

  • Drop and recreate the CF or KS.
  • For all nodes, in parallel if possible:
    • Remove the snapshot created at drop time2.
    • Restore the snapshot and move the data files to the right place.
    • nodetool refresh.
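
In shell terms, the per-node part could look something like this; a rough, untested sketch, with the restore from the backup tool itself not shown and the KS/CF names taken from the error above:

#!/bin/bash
# Per-node online restore (sketch). RESTORE_DIR is hypothetical.
KS=one_cf
CF=cf_1
DATA_DIR=/var/opt/hosting/db/cassandra/data
RESTORE_DIR=/var/tmp/restore    # where the restored SSTables were left

# remove the snapshot that the drop created
rm -rf "$DATA_DIR/$KS/$CF/snapshots"

# put the restored SSTables where Cassandra expects them
mv "$RESTORE_DIR/$KS-$CF-"* "$DATA_DIR/$KS/$CF/"

# make the node load them
nodetool refresh "$KS" "$CF"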

  1. Of course, the minimum number of nodes you need to restore depends on how easy it is to restore the data on the nodes and on your data's replication options and factors. 

  2. I'm not sure this is documented anywhere. 

Incron cannot create watch for system table

Today I started playing with incron. If you don't know it, it's a nifty tool for launching scripts on file system events. The configuration is very simple, just some "incrontab" files in /etc/incron.d. Each line in those files has 3 fields: directory to watch, a comma-separated list of inotify events, and the script and parameters to launch.
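
For example, a line like this one (the directory and script are made up) runs a script every time a file is fully written into, or moved into, the watched directory; $@ expands to the watched path and $# to the name of the file that triggered the event:

/var/spool/incoming IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/handle-file $@/$#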

I was doing some tests when I started to get this kind of message in the logs:

Mar  1 09:42:30 test-machine incrond[32310]: cannot create watch for system table foo: (2) No such file or directory

I checked and rechecked several times, but the directories I was watching existed, on the same machine. That is, no obvious mistake. Then I stopped incron and launched it under strace and the problem became evident:

inotify_add_watch(9, "#", 0)            = -1 ENOENT (No such file or directory)

It happens that incron does not support comments in the incrontab files, so it treats the # at the beginning of the line as the path to watch. Silly thing, if you ask me.

Actually, if you go to its BTS, you'll find bug reports about it1,2 and some others about its incrontab file parsing3,4. I'm pretty sure there must be more around.

So, for now, my "solution" is to count the lines that begin with #, count the lines that report these errors, double check, and if they coincide, just ignore them. Otherwise, as incron stupidly does not report which file it didn't find, just stop the service and run it under strace -ff -e file.

There are other caveats with incron. For instance, cron and at launch the commands with sh (or the shell you define), but incron doesn't, so you can't easily do any redirections, and any output is lost, even stderr. I couldn't get it to do any kind of redirection, so I ended up writing a wrapper. Lastly, it can't watch the same dir more than once with the same set of flags.
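
The wrapper can be as dumb as this (script and log names made up), doing the redirections that incron itself won't do:

#!/bin/sh
# incron runs the command directly, without a shell, so redirect here
exec /usr/local/bin/real-handler "$@" >> /var/log/real-handler.log 2>&1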


  1. http://bts.aiken.cz/view.php?id=173 

  2. http://bts.aiken.cz/view.php?id=424 

  3. http://bts.aiken.cz/view.php?id=539 

  4. http://bts.aiken.cz/view.php?id=571 

Monitoring Cassandra with groovy

One of my job's new developments is that we'll start using Cassandra as the database for some of our webservices. The move was decided mainly because of the lack of SPoF and the easy addition of columns, which happens rather often in our environment.

One of the tasks we're in charge of is monitoring the system. Most of the interesting values to monitor in a Cassandra setup can be obtained with various nodetool commands, but not the values from the JVM running the Cassandra instance. So I turned to my closest Java guru, who recommended writing a script in groovy. After playing a little with the Java-like language, I got this:

import javax.management.ObjectName
import javax.management.remote.JMXConnector
import javax.management.remote.JMXConnectorFactory
import javax.management.remote.JMXServiceURL

jmxEnv = [(JMXConnector.CREDENTIALS): (String[])["user", "pass"]]

def serverUrl = 'service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi'
def server = JMXConnectorFactory.connect(new JMXServiceURL (serverUrl), jmxEnv).MBeanServerConnection

mBeanNames= [ 
    "java.lang:type=ClassLoading", 
    "java.lang:type=Compilation", 
    "java.lang:type=Memory", 
    "java.lang:type=Threading",

    "org.apache.cassandra.db:type=Caches",
    "org.apache.cassandra.db:type=Commitlog",
    "org.apache.cassandra.db:type=CompactionManager",
    "org.apache.cassandra.db:type=StorageProxy",
    "org.apache.cassandra.db:type=StorageService",

    "org.apache.cassandra.internal:type=AntiEntropyStage",
    "org.apache.cassandra.internal:type=FlushWriter",
    "org.apache.cassandra.internal:type=GossipStage",
    "org.apache.cassandra.internal:type=HintedHandoff",
    "org.apache.cassandra.internal:type=InternalResponseStage",
    "org.apache.cassandra.internal:type=MemtablePostFlusher",
    "org.apache.cassandra.internal:type=MigrationStage",
    "org.apache.cassandra.internal:type=MiscStage",
    "org.apache.cassandra.internal:type=StreamStage",

    "org.apache.cassandra.metrics:type=ClientRequestMetrics,name=ReadTimeouts",
    "org.apache.cassandra.metrics:type=ClientRequestMetrics,name=ReadUnavailables",
    "org.apache.cassandra.metrics:type=ClientRequestMetrics,name=WriteTimeouts",
    "org.apache.cassandra.metrics:type=ClientRequestMetrics,name=WriteUnavailables",

    "org.apache.cassandra.net:type=FailureDetector",
    "org.apache.cassandra.net:type=MessagingService",
    "org.apache.cassandra.net:type=StreamingService",


    "org.apache.cassandra.request:type=MutationStage",
    "org.apache.cassandra.request:type=ReadRepairStage",
    "org.apache.cassandra.request:type=ReadStage",
    "org.apache.cassandra.request:type=ReplicateOnWriteStage",
    "org.apache.cassandra.request:type=RequestResponseStage",
    ]

def dumpMBean= { name ->
    println "$name:"

    // get a proxy MBean for the class
    bean= new GroovyMBean (server, name)
    // get the attributes
    attrs= bean.listAttributeNames ()
    // get an AttributeList, after casting (!) Array<String> to String[]
    attrMap= server.getAttributes (bean.name(), (String [])attrs)

    attrMap.each { kv ->
        // kv is an Attribute
        key= kv.name
        // skip RangeKeySample, it can be 15MiB big or more...
        if (key!="RangeKeySample") {
            value= kv.value
            println "\t$key: $value"
        }
    }

    println ""
}

// dump singletons
mBeanNames.each { name ->
    dumpMBean (name)
}

// dump keyspaces and their column families
args.each { ks_cfs ->
    split= ks_cfs.tokenize ('=')
    ks= split[0]
    cfs= split[1].tokenize (',')

    cfs.each { cf ->
        dumpMBean ("org.apache.cassandra.db:type=ColumnFamilies,keyspace=$ks,columnfamily=$cf")
    }
}
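
Saved as, say, cassandra-mbeans.groovy (the file name is mine), you run it listing the column families to dump for each keyspace, like this:

groovy cassandra-mbeans.groovy Foo=Bar,Baz OtherKS=SomeCF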

In particular we dump its output to a text file and process it afterwards to pick the values we want to monitor and graph. As we're not yet in production, we haven't settled on which values we're going to monitor.

Logging shell scripts' executions

Small hack. Do you have a bash1 script that runs via cron or incron or any other daemon, and you wanna debug it by at least tracing its execution? Well, simply surround it with:

(

set -x

...

) > /tmp/$0-$(date +%F.%T).log

But! Do not make the same mistake I did and forget to redirect stderr too2!:

) > /tmp/$0-$(date +%F.%T).log 2>&1

I know there is a shortcut for that, but I'm too lazy to look it up.
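
(The shortcut, for the record, is &> in bash, so that last line can also be written as:)

) &> /tmp/$0-$(date +%F.%T).log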


  1. I'm not sure this works in sh

  2. Otherwise the trace does not appear in the log, because set -x writes it to stderr

Building tilesets with tilemill

Still in my search for the perfect map for my N900, I started growing envious of Google's relief mode. I have to confess that I am a map junkie (I've been amazed by them since forever) and, for me, mountains are an important part of city life's background, if not the center of certain activities like trekking, skiing or climbing. So having a map application like marble fed with maps I designed with high contrast and hill shading is good, but not enough; and it seemed that, short of abusing gmaps' tiles, there was no option for now.

Then a few days ago I came across an article in InfoWorld that talks about alternatives, and I got hooked by the first one, TileMill, particularly because of its page on working with terrain data. So I'll try to explore it in this post, or at least install it.

First things first, download the code and the dependencies:

build-essential
libwebkit-dev
libmapnik
libmapnik-dev
mapnik-utils
nodejs
nodejs-dev
npm

libmapnik does not exist in Debian sid, but installing libmapnik-dev takes care of it. In my case, a machine that barely has build-essential, we're talking about 100+MiB in packages. It also needs libsigc++-dev, which is not listed as a dep.
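
On Debian sid, something like this should pull everything in (libsigc++-2.0-dev is my guess for the package providing the missing header):

sudo apt-get install build-essential libwebkit-dev libmapnik-dev mapnik-utils nodejs nodejs-dev npm libsigc++-2.0-dev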

Then after the compilation finished (I had to do an ugly sudo ln -s /usr/lib/i386-linux-gnu/sigc++-2.0/include/sigc++config.h /usr/include/sigc++-2.0/ trick to make the compiler find that file) I got this:

$ ./index.js

module.js:337
    throw new Error("Cannot find module '" + request + "'");
          ^
Error: Cannot find module '../build/Release/node_sqlite3.node'
    at Function._resolveFilename (module.js:337:11)
    at Function._load (module.js:279:25)
    at Module.require (module.js:359:17)
    at require (module.js:375:17)
    at Object.<anonymous> (/home/mdione/src/projects/tilemill/node_modules/sqlite3/lib/sqlite3.js:1:104)
    at Module._compile (module.js:446:26)
    at Object..js (module.js:464:10)
    at Module.load (module.js:353:32)
    at Function._load (module.js:311:12)
at Module.require (module.js:359:17

I'm gonna say in my defense that I know nothing about node.js, so I can't really judge, but when a module I just installed, and one with as many users as I think the sqlite3 one must have, doesn't work because of some path problem, I think it smells fishy.

Debian to the rescue. I simply:

$ rm -rf node_modules/sqlite3/
$ sudo apt-get install node-sqlite3

And it runs without a hitch:

$ ./index.js
[tilemill] Creating configuration dir /home/mdione/.tilemill
[tilemill] Creating files dir /home/mdione/Documents/MapBox
[tilemill] Creating export dir /home/mdione/Documents/MapBox/export
[tilemill] Creating project dir /home/mdione/Documents/MapBox/project
[tilemill] Started [Server Tile:20008].
[tilemill] Creating cache dir /home/mdione/Documents/MapBox/cache
[tilemill] Plugin [editor] loaded.
[tilemill] Plugin [carto] loaded.
[tilemill] Plugin [templates] loaded.
[tilemill] Plugin [fonts] loaded.
[tilemill] Started [Server Core:20009].
[tilemill] Client window created (pass --server=true to disable this)
[tilemill] Checking for new version of TileMill...
[tilemill] npm http GET https://registry.npmjs.org/tilemill
[tilemill] npm
[tilemill]  http 200 https://registry.npmjs.org/tilemill
[tilemill] Latest version of TileMill is 0.9.1.

Update

When running TileMill and trying to do anything I got this message:

Unknown child node in 'Map': 'Parameters'

As per this post and TileMill's instructions to build from source, and I quote, «If the error has to do with node-mapnik then you likely have a Mapnik version that is too old», I sat down and did some judo takes to put mapnik from git into very nice Debian packages. I'm not proud of the battle; as all wars it was messy, and not all of it was completely legal. But I can assure you that after installing an unreleased version of it, and rebuilding its binding for node.js (see the "Updating" section at the bottom of the build from source page, but rm -rf node_modules/mapnik/ first), it works.

TODO:

  • Resolve more dependencies with Debian packages.
  • Play with it
  • ...
  • Profit!

Marble using cloudmade tiles with hillshading

Yesterday1 I wrote: «Then I found out that it wouldn't be easy to use the tiles [generated by CloudMade] with marble». How wrong I was.

Here you can read how to access the tiles on CM's servers. The only thing you need is an API key, which you can get once you register (it practically begs you to get one :). Another piece of the puzzle is marble itself. I was expecting to copy OSM's service description file and bend it and twist it until it worked with CM's tiles. But things are, luckily, much simpler.

In marble you simply do «File», «Create a new map...», then select «Online map providing indexed tiles», which clearly describes CM. Next, you have to provide a URL template, which you can simply derive from the doc on CM's site. Mine's like this:

http://tile.cloudmade.com/5f806ad32bb44b38a464020fa2223193/51084/256/{zoomLevel}/{x}/{y}.png

You, of course, will have to change the API key and the style number for yours. One thing I found is that the @2x doesn't seem to work with marble; without it it works just fine. Maybe it's a User-Agent thing. Then you keep answering what marble asks you in the wizard. By the end of it, you'll have marble working with your tiles! It's amazing.

One extra hack was to add support for hillshading. These are tiles that are alpha-blended with OSM's, which gives you the impression of viewing a 3D map (bah, at least in the covered regions with mountains). The service files (.dgml files) are actually XML files that are quite easy to edit if you defocus and just look at the big picture of it. So I simply opened both files (OSM's is in /usr/share/kde4/apps/marble/data/maps/earth/openstreetmap/openstreetmap.dgml, the one you created in $HOME/.local/share/marble/maps/earth/foo/foo.dgml), copied the texture tag that describes hillshading, pasted it in my .dgml file, saved, reopened marble and... voilà! Even better, as hillshading is actually served by another tile server, marble can share it between OSM's map, OSMaRenderer's and yours.

The final result is simply amazing. You can see a screenshot here.

More on my use of marble and OSM pretty soon.


  1. When I say "yesterday", I actually mean "almost 7 months ago". That's what it took me to fix my blog and come back to writing, finishing this post.