Changing Denon AH-GC20 battery

A few years back I bit the bullet and bought noise cancelling headphones. I was hesitant because they cost way more than the EUR 22 I paid for my (later found to be) crappy Sennheiser. The reasons for buying a pair were twofold: I had problems concentrating in a noisy office, and my wife found an almost 50% discount on the asking price for the Denon AH-GC20.

I have to say that I have loved them since the day I tried them. They're not only circumaural, but the pads also sit flat against the head, making a very good seal, to the point that I have almost never used the NC. They're Bluetooth, and they also work with a detachable cable for when the battery is dead.

Which brings us to 2 or 3 years later. The battery stopped charging after only one year, so they have been wired full time since. This means no BT and no NC. Initially I didn't care too much because I changed the company I work for, and now I can choose where I work. But now I have meetings at the time I should be cooking dinner, so I started missing the BT support again.

Showing them to a friend, I accidentally found that the padding is glued to a ring of plastic that clips onto the rest of the headphone. Later, and by chance, I tried to disassemble the right one first, because I thought the battery was on the left one and I preferred to practice on the simpler side. In fact, the battery resides in the right one.

After removing the padding I was confronted with what looked like a glue-sealed structure. But stretching the fabric that covers the speakers a little, I found 8 holes. I made small holes in the fabric and started poking until I saw a T5 head. I removed all 8 screws, but later found that only those on the long and short axes of the ellipsoid were needed. Also, one of the screws was deeper than what a usual T5 bit can reach, so I had to borrow a T5 screwdriver. Once the 4 screws were gone, the lid with the logo came off easily.

The next step was to find a replacement battery. The original one is a 600mAh LiPo with the number 383450 on it. This can be read as 3.8x34x50mm (thickness, width, length), which mostly matches the actual size of the battery. Asking around, I was told that the battery's capacity is irrelevant, because most charging circuits use voltage as a measure of the charge state, and at worst it would take more time to recharge.

Biting the bullet a second time, I bought an 800mAh battery for around EUR 10. It came with a connector that I planned to use to connect it to the headphones: instead of soldering, which I'm so bad at that I don't even own a soldering iron, I planned to cut the original cables close to the old battery, strip the ends a little and insert them into the connector.

Life is never simple. The teeny tiny wires are too thin to make a good connection and the insulation is too thick to fit in the holes. In the end I dismantled the connector, opened the receivers a little bit, passed the insulation through with the wires 'combed' back, and crimped the receivers with pliers. To make it officially bad quality, I used masking tape to keep them from shorting.

I hesitated before plugging it in because I really fear this lithium stuff. Changing a battery that is supposed to be replaced is one thing; playing with chemical fire 2cm from your right ear is another. Happily I'm not an idiot and I properly matched the black wire with the black wire, and the red one with its peer. I plugged in the USB cable, and around 1h later the thing was full. I unplugged it, put it on, and turned it on. It greeted me with its 'Waiting for connection' sound and I smiled. I tried with the computer, it connected and... music to my ears! Wirelessly! For the first time in 2 years! And for less than EUR 10, a couple of holes in a piece of fabric I never see, and... a bulging headphone.

See, batteries have in their specs not only the dimensions, but also the tolerance, which for these sizes seems to be around ±0.2-0.3mm. I have the impression that this is related to the fact that these batteries come wrapped in some kind of malleable aluminium sheet. In any case, I took the risk and bought a 1mm thicker battery, and it shows. I thought I had some space and could cut away some plastic for the extra mm, but on second inspection that's not true. I will have to hunt for a thinner battery soon, but for the moment I'm happy with my meeting/cooking sessions :)

osm-tile-tools v1.0

Today I finally sat down and spun off generate_tiles.py into its own repository, so people can follow its development without having to clone my own osm-carto fork. This happened just after I finished making the storage thread optional: I usually have as many rendering threads as I have cores, so that extra thread was not only competing with them, it also spent some time de/marshaling the metatiles between processes, so I'm not sure it's worth it. Maybe if I had more cores?

The new repo is here. There are some other tools there (hence the tools in the repo's name), but they're not so polished or documented. You're free to look and ask :)

What if I do a release, I hear (it must be the voice in my head)? Why not! I even decided to go bold and tagged it as v1.0.

Remotely upgrading CentOS6 to CentOS7

At $WORK we're in the middle of a massive OS upgrade across all of our clients' appliances. Unfortunately, the OS is CentOS, which has no official upgrade method except for 'reinstall'. This, of course, is unacceptable for us; we need an automatic system that can do it remotely. We have developed such a thing, mostly based on RedHat's official tool, redhat-upgrade-tool (rut). There are plenty of tutorials on the web on how to do it.

All was fine until we hit clients using UEFI instead of BIOS1. Unluckily rut does not seem to handle the grub to grub2 transition very well in this case. I had to create a dummy RPM package that, if it detects that the machine boots via EFI, generates a grub2 config file in /boot/efi/EFI/centos/grub.cfg and runs efibootmgr to tell the EFI system to use that EFI partition and file to boot. Here's the whole %posttrans section:

(
    set -x

    efi_partition=$(mount | awk '$3 == "/boot/efi" { print $1 }' | cut -d / -f 3)

    if [ -n "${efi_partition}" ]; then
        # create grub2 config file for the EFI module
        mkdir -pv /boot/efi/EFI/centos
        grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

        # it's a shame we have to do this here when efibootmgr is going to do the inverse
        # convert /boot/efi -> /dev/fooX -> foo + X
        efi_disk=$(find -H /sys/block/* -name ${efi_partition} | cut -d / -f 4)
        # we can't just cut the last char (the partition number could be beyond 9)
        # and we can't assume it comes right after the disk name (disk: nvme0n1, partition: nvme0n1p1, part_num: 1)
        efi_partition_number=$(echo ${efi_partition} | egrep -o '[0-9]+$')

        # create an entry in the EFI boot manager and make it the only one to boot
        efibootmgr --create --bootnum 0123 --label grub2 --disk "/dev/${efi_disk}" \
            --part ${efi_partition_number} --loader \\EFI\\centos\\grubx64.efi --bootorder 0123
    fi
) &> /tmp/upgrade-centos-6.9-7.2.log

Another part of the automatic upgrade script detects EFI and installs grub2-efi and that other RPM, so the code is executed during the actual upgrade to CentOS7.
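As a rough idea of what that part does, here's a minimal sketch; the detection via /sys/firmware/efi and the 'our-efi-fixup' package name are my own stand-ins, not necessarily what the real script uses:

import os
import subprocess

def ensure_efi_packages():
    # the kernel exposes this directory only when the machine was booted via EFI
    if os.path.isdir('/sys/firmware/efi'):
        # 'our-efi-fixup' stands in for the dummy RPM with the %posttrans above
        subprocess.check_call(['yum', '-y', 'install', 'grub2-efi', 'our-efi-fixup'])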

But the gist of this post is how I managed to test such a thing. Luckily we had 3 Lenovo servers lying around, but I could only reach them via ssh and IPMI. If you don't know, BMC/IPMI is a 'second' computer in your computer with which, among many, many other things, you can remotely turn your computer on and off, access the console remotely and even upload an ISO image and mount it in the 'main' machine as if it were a USB CD-ROM drive. This last one will come in handy later. Notice that this is the system at the center of Bloomberg's article about China infiltrating companies, so you'd better get acquainted with it.

These Lenovo machines already had CentOS7 installed, so the first step was to install CentOS6. This should be possible by copying over a CentOS6 installer and booting it via grub2. But before that I also had to enable UEFI boot. This broke booting, because the existing system had grub2 installed in the disk's MBR instead of on an EFI partition, and the disk had DOS partitions instead of GPT ones.

So the steps I took to fix this were:

On the client machine:

  • Install the icedtea-plugin (the remote console is a java applet).
  • Download Super Grub2 Disk.
  • Connect via IPMI to the server, request the remote console.

On the remote server (all this through the IPMI remote console):

  • Log in, download the CentOS6 installer ISO.
  • Tell the virtual device manager to add an ISO, using the SG2D ISO.
  • Reboot the machine.
  • Get into the Setup (another EFI module), enable EFI boot mode, reboot.
  • Start the Boot Selection Menu, boot from the virtual CD.

On SG2D/Grub2's interface:

  • Enable GRUB2's RAID and LVM support (the CentOS6 install ISO was on an LVM partition).
  • Select 'Print the devices/partitions' to find my LVM partition.
  • Loosely follow these instructions to boot from the ISO image:
  • Open the Grub2 prompt (Press c).
  • loopback loop (lvm/vg0-root)/root/CentOS-6.9-x86_64-minimal.iso
  • linux (loop)/images/pxeboot/vmlinuz
  • initrd (loop)/images/pxeboot/initrd.img
  • boot

This boots the installer, but it's not able to find the source media, so I installed from a URL. After setting up TCP/IP (a fixed IP in my case), I used the URL http://mirrors.sonic.net/centos/6/os/x86_64/.

Freebie: More than I ever wanted to know about UEFI.


  1. In UEFI systems, the BIOS/Legacy mode is just an EFI module that provides the legacy boot method; that is, booting from a disk's MBR. 

Customizing the Python language

Programming languages can be viewed as three things: their syntax and data model, their standard library, and the third party libraries you can use. All these define the expressiveness of the language, and determine what you can write (which problems you can solve) and how easily or not. This post/talk is about how expressive I think Python is, and how easy or not it is to change it.

I said that we solve problems by writing (programs), but in fact Python can solve several problems without really writing a program. You can use the interpreter as a calculator, or use some of the modules as programs:

$ python3 -m http.server 8000

With that you can serve the current directory via HTTP. Or do this:

$ python3 -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 30.2 usec per loop
$ python3 -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 3: 27.5 usec per loop
$ python3 -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 3: 23.2 usec per loop

to check which method is faster. Notice that these are modules in the standard library, so you get this functionality out of the box. Of course, you could also install some third party module that has this kind of capability. I find this way of using modules as programs very useful, and I would like to encourage module writers to consider providing such interfaces with their modules if it makes sense.
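For module writers wondering what that takes, a minimal sketch is just a __main__.py inside the package; mymodule and do_the_work are hypothetical names here:

# mymodule/__main__.py
import argparse

from . import do_the_work   # hypothetical function the module already exposes

def main():
    parser = argparse.ArgumentParser(prog='mymodule')
    parser.add_argument('target')
    args = parser.parse_args()
    do_the_work(args.target)

if __name__ == '__main__':
    main()

With that in place, python3 -m mymodule target works out of the box.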

Similarly, there are even programs written in Python that can also be used as modules, which I think should also be considered by all program writers. For instance, I would really like ssh to also be a library; of course, we have paramiko, but I think it's a waste of precious developer time to reinvent the wheel.

The next approach I want to show is glue code. The idea is that you take modules, functions and classes, use them as building blocks, and write a few lines of code that combine them to provide something that didn't exist before:

import centerlines, psycopg2, json, sys, shapely.geometry, shapely.wkt, shapely.wkb

tolerance = 0.00001

s = sys.stdin.read()
data = json.loads(s)
conn = psycopg2.connect(dbname='gis')

ans = dict(type='FeatureCollection', features=[])

for feature in data['features']:
    shape = shapely.geometry.shape(feature['geometry'])

    shape = shape.simplify(tolerance, False)
    skel, medials = centerlines.skeleton_medials_from_postgis(conn, shape)
    medials = centerlines.extend_medials(shape, skel, medials)
    medials = shapely.geometry.MultiLineString([ medial.simplify(tolerance, False)
                                                 for medial in medials ])

    ans['features'].append(dict(type='Feature',
                                geometry=shapely.geometry.mapping(medials)))

s = json.dumps(ans)
print(s)

This example does something quite complex: it takes a JSON representation of a polygon from stdin, calculates the centerline of that polygon, converts it back to a JSON representation and outputs that to stdout. You could say that I'm cheating; most of the complexity is hidden in the shapely and centerlines modules, and I'm using PostgreSQL to do the actual calculation, but this is what we developers do, right?

Once the building blocks are not enough, it's time to write our own. We write new functions or classes that solve or model part of the problem and keep adding glue until we're finished. In fact, in the previous example, centerlines.skeleton_medials_from_postgis() and centerlines.extend_medials() are functions that were written specifically to solve this problem.

But the expressiveness of the language does not stop at function or method calls and parameter passing; there are also operators and other protocols. For instance, instead of the pure OO call 2.add(3), we can simply write 2 + 3, which makes a lot of sense given our background from 1st grade. Another example which I love is this:

file = open(...)
line = file.readline()
while line:
    # [...]
    line = file.readline()
file.close()

versus

file = open(...)
for line in file:
    # [...]
file.close()

The second version is not only shorter, it's less error prone, as we can easily forget the second line = file.readline() and iterate forever on the same line. All this is possible thanks to Python's special methods, a section of the Python reference that I definitely recommend reading. This technique allowed me to implement things like this:

command1(args) | command2(args)

which makes a lot of sense if you have a shell scripting background; or this:

with cd(path):
    # this is executed in path

# this is executed back on the original directory

which will also ring a bell for those of you who are used to bash (for those of you who aren't, it's written as ( cd path; ... )). I can now even write this:

with remote(hostname):
    # this body executes remotely in hostname via ssh

Following this same pattern with the file example above, we can even simplify it further like so:

with open(...) as file:
    for line in file:
        # [...]

This has the advantage that it not only relieves us from closing the file, it does so even if an unhandled exception is raised within the with block.
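For illustration, here's a minimal sketch of how a cd() context manager like the one above can be built on the __enter__/__exit__ special methods (ayrton's real implementation probably differs):

import os

class cd:
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        # remember where we were and move to the new directory
        self.original = os.getcwd()
        os.chdir(self.path)

    def __exit__(self, exc_type, exc_value, traceback):
        # always come back, even if the body raised
        os.chdir(self.original)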

Special methods are one of my favorite features of Python. One could argue that this is the ultimate language customization, that not much more can be done. But I'm here to tell you that there is more, that you can still go further. But first let me tell you that I lied to you: the pipe and remote() examples I just gave you are not (only) implemented with special methods. In fact, I'm using a more extreme resource: AST meddling.

Like any other programming language, Python goes through the usual compiler steps: tokenizing, parsing, proper compilation and execution. Luckily Python gives us access to the intermediate representation between the parsing and compilation steps, known as the Abstract Syntax Tree, via the ast.parse() function. We can then modify this tree at will, use other functions and classes in the ast module to make sure the modifications still form a valid AST, and finally use compile() and exec() to execute the modified tree.
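As a toy end-to-end example of that dance (not taken from ayrton), here's a transformer that turns every multiplication into an addition:

import ast

source = 'print(6 * 7)'

class SwapMult(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()   # replace * with +
        return node

tree = ast.parse(source)                    # source -> AST
tree = SwapMult().visit(tree)               # meddle with the tree
ast.fix_missing_locations(tree)             # keep it a valid AST
exec(compile(tree, '<meddled>', 'exec'))    # prints 13, not 42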

For instance, this is how I implemented |:

class CrazyASTTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if type(node.op) == BitOr:
            # BinOp( left=Call1(...), op=BitOr(), right=Call2(...) )
            update_keyword(node.left,
                           keyword(arg='_out', value=Name(id='Pipe', ctx=Load())))
            update_keyword(node.left,
                           keyword(arg='_bg', value=Name(id='True', ctx=Load())))
            ast.fix_missing_locations(node.left)
            update_keyword(node.right, keyword(arg='_in', value=node.left))
            node = node.right
            # Call2(_in=Call1(...), _out=Pipe, _bg=True)

        return node

I used Call1 and Call2 to show which is which; they're really ast.Call objects, which represent function calls. Of course, once I rewrote the tree, most of the code for how the commands are called and how the pipe is set up lives in the class that implements commands, which is quite a bit more complex.

For remote() I did something even more extreme: I took the AST of the body of the context manager, pickle()'d it, added it as an extra parameter to remote(), and replaced the context manager's body with pass, so the AST becomes the equivalent of:

with remote(hostname, ast_of_body_pickled):
    pass

When the context manager actually executes, I send the AST over the ssh connection together with locals() and globals() (its execution context), unpickle them on the other side, restore the context, continue with the compile()/exec() dance, and finally repickle the context and send it back. This way the body can see its scope, and its modifications to it are seen on the original machine.
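The receiving end of that dance could be sketched roughly like this (a simplification under my own assumptions; ayrton's actual code handles much more):

import ast
import pickle

def run_remote_body(pickled_body, context):
    # rebuild the context manager's body, compile it and run it against the
    # execution context that came over the ssh connection
    body = pickle.loads(pickled_body)              # a list of statement nodes
    module = ast.Module(body=body, type_ignores=[])
    ast.fix_missing_locations(module)
    exec(compile(module, '<remote>', 'exec'), context)
    return context                                 # to be repickled and sent back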

And that should be it. We reached the final frontier of language customization, while maintaining compatibility, through the AST, with the original interpreter...

Or did we? What else could we do? We certainly can't1 modify the compiler or the execution Virtual Machine, and we already modify the AST; can we do something with Python's tokenizer or parser? Well, like the compiler and the VM, they're written in C, and modifying them would force us to fork the interpreter, with all the drawbacks of maintaining it. But can we write another parser?

On one hand, the Python standard library provides a couple of modules for implementing your own parsers: tokenize and parser. If we're inventing a new language, this is one way to go, but if we just want a few minor changes to the original Python language, we would still have to implement the whole tokenizer/parser pair. Do we have other options?

There is another, but not a simple one. pypy is, among other things, a Python implementation written entirely in (r)Python. This implementation runs under legacy Python (2.x), but it can parse and run current Python (3.x) syntax4. It includes the tokenizer, the parser, its own AST implementation2, and, of course, a compiler and the VM. This is all free software, so we can3 take the tokenizer/parser combination, modify it at will, and as long as we produce a valid (c)Python AST, we can still execute it on the cPython compiler/VM combination.

There are three main reasons to modify this code. First, to make it produce a valid cPython AST we need to modify it a lot; cPython's compile() function accepts only ASTs built with instances of the classes from the ast module (or str or bytes5), it does not indulge in duck typing. pypy produces ASTs built with instances of its own implementation of the ast module; rewriting the code is tiresome but not difficult.

Second, on the receiving side, if we're trying to parse and execute a particular version of Python, we must run it at least under the oldest Python version that handles that syntax. For instance, when I wanted to support f-strings in my language, I had no option but to run it on top of Python 3.6, because that's when they were introduced. This means that a big part of the modifications is converting the code to Py3.

Finally, we must modify it so it accepts the syntax we want; otherwise, why bother? :)

So what do we get with all this fooling around? Now we can modify the syntax so, for instance, we can accept expressions as keyword argument names, or remove the restriction that keyword and positional arguments must be in a particular order:

grep(--quiet=True, 'mdione', '/etc/passwd')

After we modify the parser it's able to generate an AST, but this AST is invalid and the compiler will reject it. So we still have to resort to more AST meddling before passing it to the compiler. What I did for the parameter meddling was to create an o() function which accepts a key and a value, so --quiet=True becomes the AST equivalent of o('--quiet', True). Once we've finished this meddling, the original, official, unmodified interpreter will happily execute our monster.

All of these techniques are used in ayrton in some way or another, even the first one: I use python3 -m unittest discover ayrton to run the unit tests!


  1. Well, technically we can, it's free software, remember! 

  2. The cPython AST, while being part of the standard library, is not guaranteed to be stable from version to version, so we can't really consider it part of the API. I think this is the reason why other implementations took the liberty to do it their own way. 

  3. ... as long as we respect the license. 

  4. In fact some of the work is implemented in the py3.5 branch, not yet merged into default. I'm using the code from this branch. 

  5. This would also be another avenue: feed compile() the definitive bytecode, but that looks like a lot of effort, way more than what I explain here. 

Third party apps not working on Fairphone OS 18.09.2

More than a year ago I bought a FairPhone2 because that's what a geek with socially responsible inclinations (and some hardware hacking) does. It came with Android Marshmallow (aka 6), but last December I bit the bullet and upgraded to Nougat (7.1). Also, as any megacorporation-paranoid geek would do, I don't have a Google account (even when 90%+ of my mail ends up in their humongous belly, but who uses mail nowadays anyway...), so I have been using it with F-Droid and the Yalp Store.

The upgrade went smoothly, and almost right after it I was poking around the Yalp Store when I saw several system updates, including the Android System WebView. This component is the one responsible for showing web content in your apps, and, believe me, you use it more than you think. The new Android came with version 67.0.3396.87 and Yalp Store was offering v71.0.3578.99, so I didn't think twice and installed the upgrade, along with most of the apps that I knew were not installed through F-Droid1. There's also the fact that since Nougat, ASWB is deprecated in favor of embracing Chrome, but I have Chrome disabled on my phone, just like most of the Google apps (including Google Play Services).

The issue came when I tried to launch the official Selfoss reader. The list of articles worked fine, but trying to read one made the app crash. Even worse were the two homebanking apps I have: they didn't even show their main screen.

Thanks to a small troubleshooting session with jochensp in the #fairphone IRC channel, we found out that it was in fact an ASWB problem (hint: use adb logcat). Once more I had to use one of those don't-know-how-shady-it-is APK mirror sites (I used APKMirror, if you're curious, but don't blame me if the software you install from there comes with all kinds of trojans).

The first thing I tried was to downgrade to the original version, so I downloaded the closest one I found (they didn't have the exact version, which makes me wonder how often they scan apps for upgrades), but downgrades don't work, even with adb install -r -d. For some reason the same site offered a newer version than Yalp Store (72.0.3626.53, which I just found out is a beta version!), so I upgraded (manual download + install) and that fixed it!


  1. There's an issue where Yalp Store tries to manage the apps installed via F-Droid by offering the versions available in Google Play, but most of the time it doesn't work because F-Droid recompiles everything and I think the keys are different. I haven't compared whether the versions offered by YS are really newer than those in FD. 

pefan

A few weeks ago I needed to do some line based manipulation that went somewhat further than what you can easily do with awk. My old-SysAdmin brain kicked in and the first thought was: if you're going to use awk and sed, you might as well use perl. Thing is, I really can't remember the last time I wrote even a oneliner in perl; maybe 2011, in my last SysAdmin-like position.

Since then I've been using python for almost anything, so why not? Well, the python interpreter does not have an equivalent of perl's -n switch; and while we're at it, -a, -F, -p are also interesting for this.

So I wrote a little program for that. Based on those switch names, I called it pefan. As python does not have perl's special variables, and in particular $_ and @_, the wrapper sets the line variable for each line of the input and, if you use the -a or -F switches, the data variable with the list that results from splitting the line.
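Stripped of all the switches, the main loop is roughly equivalent to this sketch (my reconstruction from the usage below, not the actual code):

import sys

def process(script, split_char=None):
    # expose 'line' (and 'data' when splitting) to the user's script,
    # then print the possibly modified line, like -p does by default
    for line in sys.stdin:
        context = dict(line=line.rstrip('\n'))
        if split_char is not None:
            context['data'] = context['line'].split(split_char)
        exec(script, context)
        print(context['line'])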

Meanwhile, while reading the perlrun manpage to write this post, I found out that -i and even -s sound useful too, so I'll be adding support for those in the future. I'm also thinking of adding support for curly-brace-based block definitions, to make oneliners easier to write. Yes, it's a travesty, but it's all in line with my push to make python more SysAdmin friendly.

In the meantime, I added a couple of switches I find useful too. See the whole usage:

usage: pefan.py [-h] [-a] -e SCRIPT [-F SPLIT_CHAR] [-i] [-M MODULE_SPEC]
                [-m MODULE_SPEC] [-N] [-n] [-p] [--no-print] [-r RANDOM]
                [-s SETUP] [-t [FORMAT]] ...

Tries to emulate Perl's (Yikes!) -peFan switches.

positional arguments:
FILE                  Files to process. If ommited or file name is '-',
                      stdin is used. Notice you can use '-' at any point in
                      the list; f.i. "foo bar - baz".

optional arguments:
-h, --help            show this help message and exit
-a, --split           Turns on autosplit, so the line is split in elements.
                      The list of elements go in the 'data' variable.
-e SCRIPT, --script SCRIPT
                      The script to run inside the loop.
-F SPLIT_CHAR, --split-char SPLIT_CHAR
                      The field delimiter. This implies [-a|--split].
-i, --ignore-empty    Do not print empty lines.
-M MODULE_SPEC, --import MODULE_SPEC
                      Import modules before runing any code. MODULE_SPEC can
                      be MODULE or MODULE,NAME,... The latter uses the 'from
                      MODULE import NAME, ...' variant. MODULE or NAMEs can
                      have a :AS_NAME suffix.
-m MODULE_SPEC        Same as [-M|--import]
-N, --enumerate-lines
                      Prepend each line with its line number, like less -N
                      does.
-n, --iterate         Iterate over all the lines of inputs. Each line is
                      assigned in the 'line' variable. This is the default.
-p, --print           Print the resulting line. This is the default.
--no-print            Don't automatically print the resulting line, the
                      script knows what to do with it
-r RANDOM, --random RANDOM
                      Print only a fraction of the output lines.
-s SETUP, --setup SETUP
                      Code to be run as setup. Run only once after importing
                      modules and before iterating over input.
-t [FORMAT], --timestamp [FORMAT]
                      Prepend a timestamp using FORMAT. By default prints it
                      in ISO-8601.

FORMAT can use Python's strftime()'s codes (see
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-
behavior).

Go get it here.

Ansible in a box or .iso

At my $NEWJOB I'm in the team that installs the products we sell on the clients' machines. Most of the time the client has bought appliances from us, so they come with the installer and some tools for setting them up. Because of the type of product we sell, the customer might have bought anywhere from 3 to 12 or more nodes that will form a cluster, and sometimes they're spread over several data centers.

The system needs a frontend network and a backend one, and the nodes come with two high speed NICs (typically 10Gb), two low speed ones (1Gb), and a BMC/IPMI interface. The typical setup is to bond both high speed NICs and then build two VLANs on top. The atypical setup might be whatever the client came up with. One client has bonded each of the high speed NICs with one of the low speed ones in primary/backup mode, and has two physical networks. Another one does everything through a single interface with no VLANs. This should give you an idea of how disparate the networking setups can be, so the networking has to be custom made for each client.

Our first step is to connect to the nodes and configure networking. The only preconfigured interface is the BMC/IPMI one, which asks for an IPv4 address via DHCP. So we connect to the BMC interface. This involves connecting via HTTP to a web interface run by the IPMI subsystem, then downloading a Java application that gives us a virtual KVM, so we can use the computer as if we had just connected a keyboard and a monitor to it.

For those who don't know (I didn't before I started this new position), the IPMI/BMC system is a mini computer fully independent of the main system, which boots once the node has power connected to it, even if it's not necessarily powered on. You can turn the machine on and off, divert the KVM as I mentioned before, and more, as you'll see. If you're surprised to find out you have more than one computer in your computer, just read this.

Once connected to the node, we run a setup script, to which we feed all the networking info, including static IPs, gateways, DNS servers, timezone, etc. All this for each node. By hand. Slow, error prone, boring.

Let's automate this. The simplest tool I can think of is Ansible. In fact, I also think it's perfect for this. But there's a catch: there's no Ansible installed on the node, there is no way Ansible will be able to talk KVMimplementedasajavaapp-ese, and again, there's no networking yet, so no ssh or any other remote access. But most modern IPMI systems have an extra feature: virtual devices. You can upload ISO images and IPMI will present them to the main system as a USB CD reader with media inside.

So today's trick involves creating an ISO image with ansible on it that can run on the target system. It's surprisingly easy to do. In fact, it would be as easy as creating a virtualenv, installing ansible, adding the playbooks and other stuff, and creating an ISO image from that, if it were not for the fact that the image has to be less than 50MiB (we have seen this limit on Lenovo systems). Ansible alone is 25MiB of source code, and compiled into .pyc files it doubles that. So the most difficult part is trimming it down to size.

Of course, we first get rid of all the .py source code. But not all of it: modules and module tools in ansible are loaded from the .py files, so we have to keep those. I can also get rid of pip, setuptools and wheel, as I won't be able to install new stuff, for two reasons: one, this is going to be a read only ISO image, and two, remember, networking is not set up yet :) Also, ansible is going to be run locally (--connection local), so paramiko is gone too. Next come all those modules I won't be using (cloud, clustering, database, etc). There are a couple more details, so let's just look at the script we currently use:

#! /bin/bash
# this is trim.sh

set -e

while [ $# -gt 0 ]; do
    case "$1" in
      -a|--all)
        # get rid of some python packages
        for module in pip setuptools pkg_resources wheel; do
            rm -rfv "lib/python2.7/site-packages/$module"
        done

        shift
        ;;
    esac
done

# cleanup
find lib -name '*.py' -o -name '*.dist-info' | egrep -v 'module|plugins' | xargs rm -rfv

for module in paramiko pycparser; do
    rm -rfv "lib/python2.7/site-packages/$module"
done

ansible_prefix="lib/python2.7/site-packages/ansible"

# trim down modules
for module in cloud clustering database network net_tools notification remote_management source_control web_infrastructure windows; do
    rm -rfv $ansible_prefix/modules/$module
done

# picking some by hand
find $ansible_prefix/module_utils | \
    egrep -v 'module_utils$|__init__|facts|parsing|six|_text|api|basic|connection|crypto|ismount|json|known_hosts|network|pycompat|redhat|service|splitter|urls' | \
    xargs -r rm -rfv
find $ansible_prefix/module_utils/network -type d | egrep -v 'network$|common' | xargs -r rm -rfv
find $ansible_prefix/modules/packaging -type f | \
    egrep -v '__init__|package|redhat|rhn|rhsm|rpm|yum' | xargs -r rm -v
find $ansible_prefix/modules/system    -type f | \
    egrep -v '__init__|authorized_key|cron|filesystem|hostname|known_hosts|lvg|lvol|modprobe|mount|parted|service|setup|sysctl|systemd|timezone' | \
    xargs -r rm -v

Notice that if I were even more space constrained (and it could be possible, if we find another IPMI implementation with less staging space) I could go further and make the venv use the Python installed on the system instead of the one copied into the venv.

Now, the next step is to fix the venv so it's runnable from any place. The first step is to make it relocatable. This fixes all the binaries in bin to use /usr/bin/env python2 instead of the hardcoded path to the python binary copied into the venv. One thing I never understood is why it doesn't go a step further and also declare VIRTUAL_ENV as relative to the path where bin/activate resides. In any case, I do an extra fix with sed and I'm done.

The last step is just to create the ISO image. It's been ages since I last generated one by hand, and the resulting command line (which I simply stole from running k3b) turned out more complex than I expected (what happened to sensible defaults?). Here are the interesting parts:

#! /bin/bash

set -eu

ansible-playbook --syntax-check --inventory-file inventory.ini playbook.yaml

./check_config.py

./trim.sh --all

# make relative
/usr/bin/python2 -m virtualenv --relocatable .

# try harder
# this doesn't cover all the possibilities where bin/activate might be sourced
# from, but in our case we have a wrapper script that makes sure we're in a sane place
sed -i -e 's/VIRTUAL_ENV=".*"/VIRTUAL_ENV="$(pwd)"/' bin/activate

genisoimage -sysid LINUX -rational-rock -joliet -joliet-long \
    -no-cache-inodes -full-iso9660-filenames -disable-deep-relocation -iso-level 3 \
    -input-charset utf-8 \
    -o foo.iso .

We threw in some checks on the syntax and contents of the playbook (it's annoying to find a bug when running on the target machine, then have to come back, generate a new ISO, upload it, mount it, etc). It is possible that you would also like to exclude more stuff from your working directory, so just create a build dir, copy over your files (maybe with rsync --archive --update --delete) and run genisoimage there.

This method produces an ISO image 26MiB big that works both in virtual machines, with which I developed this solution, and on some IPMI systems, like the Lenovo ones I mentioned before. Unluckily I couldn't get my hands on many different systems that have IPMI and are not being used for anything else.

One final note about sizing. If you run du on your working/staging directory to see how far you are from the limit, use --apparent-size, as the ISO format packs files better than generic filesystems (in my case I see 26MiB apparent vs 46MiB 'real'; this is due to block sizes and internal fragmentation).

Identity, countries, languages and currencies

I started watching PyCon's videos. One of the first ones I saw was Amber Brown's "How we do identity wrong". I think she1 is right in raising not only the notion of not assuming things about names, addresses and ID numbers, but also that you shouldn't be collecting information that you don't need; at some point, it becomes a liability.

In the same vein about assuming, I have more examples. One of them is deciding what language to show your site in depending on what country the client connects from. I'm not a millennial (more like a transmillennial, if you push me to it), but I tend to go places. Every time I go to a new place, I get sites in new languages, but maps in US!

Today I wanted to book a hotel room. The hotel's site asked me where I live, so I chose France. Fact is, for them country and language are the same thing (I wonder what would happen if I answered Schweiz/Suisse/Svizzera/Svizra), so I couldn't say that I live in France but prefer English, so I chose United Kingdom instead. Of course, this also meant that I got prices in GBP, not EUR, so I had to correct that one too. At least I could.

Later they asked for my country of residence and nationality; when I chose italian, the country was set to Italia, even though I had chosen France first!

I leave you all with an anecdote. As I said, I like to go places, most of the time with friends. Imagine the puzzled expression of the police officer that stopped us, finding a car licensed in France, driven by an italian, with argentinian, spanish and chilean passengers, crossing from Austria to Slovakia, listening to US music. I only forgot to set the GPS to japanese or something.

So, don't assume; if you assume, let the user change settings to their preferences, and don't ask for data you don't actually need. And please use the user's Accept-Language header; they have it for a reason.


  1. I think that's the pronoun she said she preferred. I'm sorry if I got that wrong. 

Reprojecting and splitting huge datasets

Another aspect I've been looking into with respect to optimizing rendering speed is data sources that are not in the target projection. Not being in the target projection forces Mapnik to reproject them on the fly, and that for each (meta)tile, whereas it would make more sense to have them already reprojected.

In my case I have 3 big datasets in EPSG:4258: elevation, hill shade and slope shade, all based on EEA's DEM files. The source tiles amount to almost 29GiB, and the elevation layer, being RGB, takes more than that. So I set off to try to reproject the things.

My first, most obvious approach was to reproject every 5x5°, 18000x18000px tile, then derive the data I need, but I started to get gaps between tiles. Notice that the original data is cleanly cut (5° x 3600"/° x 1px/" == 18000px), without any overlap.

The next approach was to merge them all into a .vrt file, then reproject chunks of it with gdalwarp. What I wanted as output was the same 5x5° tiles, reprojected, but with an extra pixel, so they overlap. This last requirement was the problematic one. See, the target projection turns any square in the original projection into a tall rectangle, stretching more and more towards the poles. The closest I could get was to use the -ts option, but that meant I had no control over how many extra pixels I got in the vertical/latitude direction. My OCD started thrashing :) In fact what happened was that I was not sure how GDAL would handle the possible partial pixel: whether by rounding down (excluding it), rounding up (finishing it), or simply leaving the pixel with partial data and impacting the final rendering.

Even Rouault pointed out to me that gdalwarp can do something fantastic: it can also generate a .vrt file with all the parameters needed for the reprojection, so reading from it automatically reprojects the original data. The resulting dataset is 288,000x325,220px (the original is 288,000x180,000px), so I'm definitely going to cut it down into small tiles. After consulting an eight-ball, I decided to discard the idea of tiles with boundaries based on coordinates, which might not even make sense anymore, and settle for pixel based sizes, still with an extra pixel. The chosen size is 2**14+1, a.k.a. 16385. For this gdal_translate is perfect.

The final algorithm is like this:

gdalwarp -t_srs "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 \
    +x_0=0.0 +y_0=0.0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs +over" \
    -r lanczos -tr 30.92208077590933 -30.92208077590933 \
    -of VRT EU-DEM.vrt EU-DEM-corrected.vrt

The values for the -tr option are the pixel sizes (x and y) in meters, which is the unit declared in the SRS. Notice that as Mercator stretches towards the poles, this is the size at the origin of the projection; in this case, at 0° lat, 0° lon.
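Where does that oddly specific pixel size come from? Presumably from keeping the original resolution of 1 pixel per arcsecond, measured at the equator of the spherical Mercator defined above:

import math

equator = 2 * math.pi * 6378137        # the 'a' parameter in the proj string
print(equator / (360 * 3600))          # ~30.92208077590933 m per arcsecond, the -tr value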

Then the reprojection (by reading from the reprojecting dataset) and cut, in a couple of loops:

tile_size=$((2**14)); \
for i in $(seq 0 17); do
    for j in $(seq 0 5); do
        for k in $(seq 0 3); do
            l=$((4*$j+$k));
            gdal_translate -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZMA \
                -co LZMA_PRESET=9 \
                -srcwin $(($tile_size*$i)) $(($tile_size*$l)) \
                    $(($tile_size+1)) $(($tile_size+1)) \
                -of GTiff EU-DEM-corrected.vrt \
                $(printf "%03dx%03d-corrected.tif" $i $l) &
        done;
        wait;
    done;
done

There's an extra loop to be able to launch 4 workers at the same time, because I have 4 cores. This doesn't keep the 4 cores busy 100% of the time (cores that have already finished stay idle until the others finish), but it was getting awkward to express in a Makefile, and this is run only once.

Before deriving the rest of the data there's an extra step: removing those generated tiles that actually have no data. I do a similar thing with empty sea tiles in the rendering process. Notice also that the original data is not tile-complete for the covered region (79 tiles instead of the 160 there should be).

Callable choices for django rest framework

At work I'm writing an API using Django/DRF. Suddenly I had to write an application (just a few pages for calling a few endpoints), so I (ab)used DRF's Serializers to build it. One of the problems I faced while doing this was that DRF's ChoiceField accepts only a sequence with the values for the dropdown, unlike Django's, which also accepts callables. This means that once you give it a set of values, it never ever changes, at least until you restart the application.

Unless, of course, you cheat. Or hack. Aren't those synonyms?

class UpdatedSequence:
    def __init__(self, update_func):
        self.update_func = update_func
        self.restart = True

        self.data = None
        self.index = 0


    def __iter__(self):
        # we're our own iterator
        return self


    def __next__(self):
        # if we're iterating from the beginning, call the function
        # and cache the result
        if self.restart:
            self.data = self.update_func()
            self.index = 0

        try:
            datum = self.data[self.index]
        except IndexError:
            # we reached the limit, start all over
            self.restart = True
            raise StopIteration
        else:
            self.index += 1
            self.restart = False

        return datum

This simple class tracks when you start iterating over it and calls the function you passed to obtain the data. Then it iterates over the result. When you reach the end, it marks itself to start all over, so the next time you iterate over it, it will call the function again. The function you pass can be the all() method of a QuerySet or anything else that fetches data and returns an iterable.
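A toy run (outside of DRF) shows the behavior: each complete iteration calls the function again:

calls = 0

def fetch():
    global calls
    calls += 1
    return ['a', 'b', 'c'][:calls]

seq = UpdatedSequence(fetch)
print(list(seq))   # ['a']
print(list(seq))   # ['a', 'b']  -- fetch() was called again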

In my case in particular, I also added a TimedCache so I don't read the db twice to fill two dropdowns with the same info in the same form:

class TimedCache:
    '''A function wrapper that caches the result for a while.'''
    def __init__(self, f, timeout):
        self.f = f
        self.timeout = timeout
        self.last_executed = None
        self.cache = None
        self.__name__ = f.__name__ + ' (Cached %ds)' % timeout


    def __call__(self):
        now = time.monotonic()

        if self.cache is None or (now - self.last_executed) > self.timeout:
            self.cache = self.f()
            self.last_executed = now

        return self.cache
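
Putting both pieces together would look something like this (Choice is a placeholder model and the 60 second timeout is arbitrary; this is just the wiring I understand from the post):

def fetch_choices():
    # Choice is a hypothetical model; this is the actual db hit
    return [(c.pk, str(c)) for c in Choice.objects.all()]

# what gets passed as the ChoiceField's choices: two dropdowns built within
# 60 seconds share one db read, and later iterations refresh the values
choices = UpdatedSequence(TimedCache(fetch_choices, timeout=60))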

I hope this helps someone.