A few weeks ago I needed to do some line-based manipulation that went a bit beyond what you can easily do with awk. My old SysAdmin brain kicked in and my first thought was: if you're going to use awk and sed, you might as well use perl. Thing is, I really can't remember the last time I wrote even a one-liner in perl; maybe 2011, in my last SysAdmin-like position.

Since then I've been using python for almost everything, so why not? Well, the python interpreter does not have an equivalent of perl's -n switch; and while we're at it, -a, -F and -p are also interesting for this.

So I wrote a little program for that. Based on those switch names, I called it pefan. As python does not have perl's special variables, in particular $_ and @_, the wrapper sets the line variable for each line of the input and, if you use the -a or -F switches, the data variable with the list that results from splitting the line.
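Conceptually, the loop those switches drive boils down to something like this. This is a simplified sketch, not pefan's actual code; pefan_loop is a name I made up just for illustration:

```python
def pefan_loop(lines, script, split_char=None, autosplit=False):
    """Emulate perl's -n: run `script` once per input line, with the
    current line in the 'line' variable, collecting the results."""
    result = []
    for raw in lines:
        env = {'line': raw.rstrip('\n')}
        # -a / -F: also provide the split fields in 'data'
        if autosplit or split_char is not None:
            env['data'] = env['line'].split(split_char)
        exec(script, env)           # the -e SCRIPT runs here
        result.append(env['line'])  # -p: the (possibly modified) line gets printed
    return result

# roughly like: pefan.py -a -e 'line = data[0]'
print(pefan_loop(['foo bar\n', 'baz qux\n'], 'line = data[0]', autosplit=True))
# → ['foo', 'baz']
```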

Meanwhile, while reading the perlrun manpage to write this post, I found out that -i and even -s also sound useful, so I'll be adding support for those in the future. I'm also thinking of adding support for curly-brace-based block definitions, to make one-liners easier to write. Yes, it's a travesty, but it's all in line with my push to make python more SysAdmin friendly.

In the meantime, I added a couple of switches I find useful too. See the whole usage:

usage: pefan.py [-h] [-a] -e SCRIPT [-F SPLIT_CHAR] [-i] [-M MODULE_SPEC]
                [-m MODULE_SPEC] [-N] [-n] [-p] [--no-print] [-r RANDOM]
                [-s SETUP] [-t [FORMAT]] ...

Tries to emulate Perl's (Yikes!) -peFan switches.

positional arguments:
FILE                  Files to process. If omitted or if a file name is '-',
                      stdin is used. Notice you can use '-' at any point in
                      the list; f.i. "foo bar - baz".

optional arguments:
-h, --help            show this help message and exit
-a, --split           Turns on autosplit, so the line is split in elements.
                      The list of elements goes in the 'data' variable.
-e SCRIPT, --script SCRIPT
                      The script to run inside the loop.
-F SPLIT_CHAR, --split-char SPLIT_CHAR
                      The field delimiter. This implies [-a|--split].
-i, --ignore-empty    Do not print empty lines.
-M MODULE_SPEC, --import MODULE_SPEC
                      Import modules before running any code. MODULE_SPEC can
                      be MODULE or MODULE,NAME,... The latter uses the 'from
                      MODULE import NAME, ...' variant. MODULE or NAMEs can
                      have a :AS_NAME suffix.
-m MODULE_SPEC        Same as [-M|--import]
-N, --enumerate-lines
                      Prepend each line with its line number, like less -N
-n, --iterate         Iterate over all the lines of inputs. Each line is
                      assigned in the 'line' variable. This is the default.
-p, --print           Print the resulting line. This is the default.
--no-print            Don't automatically print the resulting line, the
                      script knows what to do with it
-r RANDOM, --random RANDOM
                      Print only a fraction of the output lines.
-s SETUP, --setup SETUP
                      Code to be run as setup. Run only once after importing
                      modules and before iterating over input.
-t [FORMAT], --timestamp [FORMAT]
                      Prepend a timestamp using FORMAT. By default prints it
                      in ISO-8601.

FORMAT can use Python's strftime() codes (see the strftime() documentation).
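For instance, the MODULE_SPEC format described above (MODULE or MODULE,NAME,..., each element with an optional :AS_NAME suffix) can be parsed in a few lines. This is just an illustrative sketch, not pefan's code, and parse_module_spec is a name I made up:

```python
def parse_module_spec(spec):
    """'MODULE,NAME,...' -> ((module, as_name), [(name, as_name), ...]);
    a plain 'MODULE' gives an empty name list."""
    def split_as(s):
        # handle the optional ':AS_NAME' suffix
        name, _, as_name = s.partition(':')
        return (name, as_name or name)

    module, *names = spec.split(',')
    return (split_as(module), [split_as(n) for n in names])

print(parse_module_spec('os.path:path'))        # → (('os.path', 'path'), [])
print(parse_module_spec('sys,argv,exit:quit'))  # → (('sys', 'sys'), [('argv', 'argv'), ('exit', 'quit')])
```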

Go get it here.

python pefan sysadmin

Posted jue 15 nov 2018 22:27:38 CET Tags: sysadmin

At my $NEWJOB I'm in the team that installs the products we sell on the clients' machines. Most of the time the client has bought appliances from us, so they come with the installer and some tools for setting them up. Because of the type of product we sell, the customer might have bought anywhere between 3 and 12 or more nodes that will form a cluster, and sometimes they're spread over several data centers.

The system needs a frontend network and a backend one, and the nodes come with two high speed NICs (typically 10Gb/s), two low speed ones (1Gb/s), and a BMC/IPMI interface. The typical use is to bond both high speed NICs and then build two VLANs on top. The atypical use might be whatever the client came up with: one client has bonded each of the high speed NICs with one of the low speed ones in primary/backup mode and has two physical networks; another one does everything through a single interface with no VLANs. This should give you an idea of how disparate the networking setups can be, so the networking has to be custom made for each client.

Our first step is to connect to the nodes and configure networking. The only preconfigured interface is the BMC/IPMI one, which asks for an IPv4 via DHCP. So we connect to the BMC interface. This involves connecting via HTTP to a web interface run within the IPMI subsystem, then downloading a Java application that gives us a virtual KVM, so we can use the computer as if we had just connected a keyboard and a monitor to it.

For those who don't know (I didn't before I started this new position), the IPMI/BMC system is a mini computer fully independent of the main system, which boots as soon as the node has power connected to it, even if the node itself is not powered on. You can turn the machine on/off, divert the KVM as I mentioned before, and more, as you'll see. If you're surprised to find out you have more than one computer in your computer, just read this.

Once connected to the node, we run a setup script, to which we feed all the networking info, including static IPs, gateways, DNS servers, timezone, etc. All this for each node. By hand. Slow, error prone, boring.

Let's automate this. The simplest tool I can think of is Ansible; in fact, I also think it's perfect for this. But there's a catch: there's no Ansible installed on the node, there is no way Ansible will be able to talk KVMimplementedasajavaapp-ese, and, again, there's no networking yet, so no ssh or any other remote access. But most modern IPMI systems have an extra feature: virtual devices. You can upload iso images and IPMI will present them to the node as a USB cd reader with media inside.

So today's trick involves creating an iso image with ansible on it that can run on the target system. It's surprisingly easy to do. In fact, it would be as easy as creating a virtualenv, installing ansible, adding the playbooks and stuff, and creating an iso image from that, if it were not for the fact that the image has to be under 50MiB (a limit we have seen on Lenovo systems). Ansible alone is 25MiB of source code, and compiled into .pyc files that doubles. So the most difficult part is trimming it down to size.

Of course, we first get rid of all the .py source code. Well, not all of it: modules and module utils in ansible are loaded from the .py files, so we have to keep those. I can also get rid of pip, setuptools and wheel, as I won't be able to install new stuff, for two reasons: one, this is going to be a read only iso image, and two, remember, networking is not set up yet :) Also, ansible is going to be run locally (--connection local), so paramiko is gone too. Next come all the modules I won't be using (cloud, clustering, database, etc). There are a couple more details, so let's just look at the script we currently use:

#! /bin/bash
# this is trim.sh
# run from the root of the virtualenv

set -e

ansible_prefix=lib/python2.7/site-packages/ansible

# get rid of some python packages we won't be able to use anyway
for module in pip setuptools pkg_resources wheel; do
    rm -rfv "lib/python2.7/site-packages/$module"
done

# cleanup: drop sources and metadata, except ansible's modules and plugins,
# which are loaded from the .py files
find lib -name '*.py' -o -name '*.dist-info' | egrep -v 'module|plugins' | xargs rm -rfv

# local connection only, so no ssh support needed
for module in paramiko pycparser; do
    rm -rfv "lib/python2.7/site-packages/$module"
done

# trim down modules
for module in cloud clustering database network net_tools notification remote_management source_control web_infrastructure windows; do
    rm -rfv "$ansible_prefix/modules/$module"
done

# picking some by hand
find $ansible_prefix/module_utils | \
    egrep -v 'module_utils$|__init__|facts|parsing|six|_text|api|basic|connection|crypto|ismount|json|known_hosts|network|pycompat|redhat|service|splitter|urls' | \
    xargs -r rm -rfv
find $ansible_prefix/module_utils/network -type d | egrep -v 'network$|common' | xargs -r rm -rfv
find $ansible_prefix/modules/packaging -type f | \
    egrep -v '__init__|package|redhat|rhn|rhsm|rpm|yum' | xargs -r rm -v
find $ansible_prefix/modules/system    -type f | \
    egrep -v '__init__|authorized_key|cron|filesystem|hostname|known_hosts|lvg|lvol|modprobe|mount|parted|service|setup|sysctl|systemd|timezone' | \
    xargs -r rm -v

Notice that if I were even more space constrained (and it could be possible, if we find another IPMI implementation with smaller staging space) I could go further and make the venv use the Python installed in the system instead of the one copied into the venv.

Now, the next step is to fix the venv so it's runnable from any place. The first step is to make it relocatable. This fixes all the binaries in bin to use /usr/bin/env python2 instead of the hardcoded path to the python binary copied into the venv. One thing I never understood is why it doesn't go a step further and also declare VIRTUAL_ENV as relative to the path where bin/activate resides. In any case, I do an extra fix with sed and I'm done.

The last step is just to create the iso image. It's been ages since I last generated one by hand, and the resulting command line (which I simply stole from running k3b) turned out more complex than I expected (what happened to sensible defaults?). Here are the interesting parts:

#! /bin/bash

set -eu

ansible-playbook --syntax-check --inventory-file inventory.ini playbook.yaml


./trim.sh --all

# make relative
/usr/bin/python2 -m virtualenv --relocatable .

# try harder
# this doesn't cover all the possibilities where bin/activate might be sourced
# from, but in our case we have a wrapper script that makes sure we're in a sane place
sed -i -e 's/VIRTUAL_ENV=".*"/VIRTUAL_ENV="$(pwd)"/' bin/activate

genisoimage -sysid LINUX -rational-rock -joliet -joliet-long \
    -no-cache-inodes -full-iso9660-filenames -disable-deep-relocation -iso-level 3 \
    -input-charset utf-8 \
    -o foo.iso .

We threw in some checks on the syntax and contents of the playbook (it's annoying to find a bug when running on the target machine, then come back, generate a new iso, upload it, mount it, etc). It is possible that you would also like to exclude more stuff from your working directory, so just create a build dir, copy over your files (maybe with rsync --archive --update --delete) and run genisoimage there.

This method produces an iso image 26MiB big that works both in virtual machines, with which I developed this solution, and on some IPMI systems, like the Lenovo ones I mentioned before. Unluckily I couldn't get my hands on many different systems that have IPMI and are not being used for anything else.

One final note about sizing. If you run du on your working/staging directory to see how far you are from the limit, use --apparent-size, as the iso format packs files better than generic filesystems (in my case I see 26MiB apparent vs 46MiB 'real'; the difference is due to block sizes and internal fragmentation).
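To see the difference in the small, compare both modes of du on a directory holding a single 1-byte file (the temp dir here is just for the demo):

```shell
demo=$(mktemp -d)
printf 'x' > "$demo/tiny"

# sums the file lengths themselves
du -s --apparent-size --block-size=1 "$demo"
# sums whole allocated blocks, so it reports quite a bit more
du -s --block-size=1 "$demo"

rm -r "$demo"
```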

ansible ipmi bmc sysadmin

Posted jue 15 nov 2018 22:20:09 CET Tags: sysadmin

A month ago I revived my old-laptop-as-server I have at home. I don't do much with it, just serve my photos, a map, provide a ssh trampoline for me and some friends and not much more. This time I decided to tackle one of the most annoying problems I had with it: that closing the lid led the system to suspend.

Now, the setup on that computer has evolved over some years, so a lot of cruft was left on it. For instance, at some point I solved the problem by installing a desktop and telling it not to suspend the machine, mostly because that's how I configure my current laptop. That, of course, was a cannon-for-killing-flies solution, but it worked, so I could focus on other things. Also, a lot of power-related packages were installed, assuming they were really needed to support everything I might ever want to do about power. This is the story of how I removed them all, why, and how I solved the lid problem... twice.

First things to go were the desktop packages, mostly because the screen in that laptop has been dead for more than a year now, and because its new place in the house is a small shelf in my wooden desk. Then I reviewed the power-related packages one by one and decided whether I needed each of them or not. This is more or less what I found:

  • acpi-fakekey: This package has a tool for injecting fake ACPI keystrokes in the input system. Not really needed.
  • acpi-support: It has a lot of scripts that can be run when some ACPI events occur. For instance, lid closing, battery/AC status, but also things like responding to power and even 'multimedia' keys. Nice, but not needed in my case; the lid is going to be closed all the time anyways.
  • laptop-mode-tools: Tools for saving power in your laptop. Not needed either, the server is going to be running all the time on AC (its battery also died some time ago).
  • upower: D-Bus interface for power events. No desktop or anything else to listen to them. Gone.
  • pm-utils: Nice CLI scripts for suspending/hibernating your system. I always have them around in my laptop because sometimes the desktops don't work properly. No use in my server, but it's cruft left from when I used it as my laptop. Adieu.

Even then, closing the lid led to the system suspending. Who else could be there? Well, there is one project that is everywhere: systemd. I'm not saying this is bad, but it is everywhere. Thing is, its login subsystem also handles ACPI events. In the /etc/systemd/logind.conf file you can read the following lines:

#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend

so I uncommented the 4th line and changed it so:

HandleLidSwitch=ignore
Here you can also configure how the inhibition of actions works:

#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
Please check the config file's doc if you plan to modify it.

Not entirely unrelated, my main laptop also started suspending when I closed the lid. I have it configured, through the desktop environment, to only turn off the screen, because what use is the screen if it's facing the keyboard and touchpad :) Somehow, these settings only recently started to be in effect, but a quick search didn't give any results on when things changed. Remembering what I did with the server, I just changed that config file to:

LidSwitchIgnoreInhibited=no
That is, “let me configure this through the desktop, please”, and now I have my old behavior back :)

PS: I should start reading more about systemd. A good starting point seems to be all the links in its home page.

sysadmin systemd acpi

Posted dom 28 ago 2016 17:34:56 CEST Tags: sysadmin

Soon I'll be changing jobs, going from one MegaCorp to another. The problem is, my current workplace already has a silly security policy that does not allow you to use IRC or do HTTP against a dynamic DNS/IP (like the one at home), but happily lets you use webmails, through which you could send anyone the company's IP without leaving much trace. Furthermore, my next assignment will have a stricter Internet policy, so I finally sat down to look at alternatives for having more traffic with less footprint.

As I already mentioned, back home I have ssh listening on port 443 (with the port forwarded from the router to the server), and this worked for a while. Then these connections were shut down, so I used stunnel on the server and openssl s_client plus some ssh config magic to get over that. This allowed me to use screen and irssi to do IRC, and that was enough for a while; it meant I could talk to the communities around the tools and libs we were using.

But now I plan to change the way I do my mail. So far the setup included using fetchmail to bring everything to that server, then using dovecot and/or a webmail to check it from anywhere. But as ports are filtered and I already use 443 for ssh, I can't connect to IMAPS; and since I don't want to use something like sslh to multiplex ssh and https on the same port, because it sounds too hacky, I turned towards SOCKS proxying.

Setting up a SOCKS proxy through ssh is simple. Most of the tutorials you'll find online use putty, but here I'll show how to translate those to the CLI client:

Host home
    # do not even do a DNS req; the IP is mostly static for me
    Hostname www.xxx.yyy.zzz
    Port 443
    ProxyCommand openssl s_client -connect %h:%p -quiet 2>/dev/null
    # this is the line that gives you a SOCKS proxy
    DynamicForward 9050

Then the next step is to configure each of your clients to use it. Most clients have an option for that, but when they don't, you need a proxifier. For instance, even though KDE has a global setting for the SOCKS proxy, kopete does not seem to honor it. These proxifiers work by redirecting any connect(), gethostbyname() and most probably other calls to the SOCKS proxy. One of the best sources for SOCKS configuration is TOR's wiki, as TOR heavily relies on SOCKS proxies, but right now the proxifier they suggest (dante-client) does not install on my Debian setup, so I went with proxychains. Its final config is quite simple:

# Strict - Each connection will be done via chained proxies
# all proxies chained in the order as they appear in the list
# all proxies must be online to play in chain
# otherwise EINTR is returned to the app
strict_chain

# Proxy DNS requests - no leak for DNS data
proxy_dns

# Some timeouts in milliseconds
tcp_read_time_out 15000
tcp_connect_time_out 8000

[ProxyList]
# defaults set to "tor"
socks5  127.0.0.1 9050

In fact, that's the default config with only one modification: the SOCKS protocol is forced to 5, so we can do DNS requests with its UDP support.

With this simple setup I managed to connect to my XMPP server with kopete, which is already a lot. The next step will be to figure out the mail setup, and then I can call this done.

sysadmin piercing

Posted dom 09 ago 2015 19:37:21 CEST Tags: sysadmin

At work we have Windows workstations, but we develop for Linux (don't ask; in my previous mission in another MegaCorp we had a similar setup for administering Linux servers...). We have access to devel machines via ssh and Samba, but the setup is laughable. I won't go too much into details, because it's embarrassing and because I signed some kind of NDA somewhere.

Thing is, I set up a VM with VirtualBox on my workstation, installing a barebones Debian Sid. To have better integration with the Windows host I decided to install the VBox Linux Additions, but for some reason it was not setting up the video side of things. The error message is the one in the title:

Could not find X.org or XFree86 on the guest system. The X Window drivers will not be installed.

Thanks to this post I managed to quickly find out the reason. The step that actually tests for and installs the x.org drivers is invoked like this:

/etc/init.d/vboxadd-x11 setup

If you run it with sh -x you will find out that it actually tests two things: the existence of /usr/lib/xorg/modules, which you can either create by hand or get by installing the xserver-xorg-video-vesa package; and that it can run X, which you will find in the xserver-xorg package.
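In shell terms, the two checks amount to roughly this (a sketch of the idea, not the actual installer code):

```shell
# what vboxadd-x11 setup effectively checks, more or less:
if [ ! -d /usr/lib/xorg/modules ]; then
    echo "missing /usr/lib/xorg/modules (install xserver-xorg-video-vesa)"
fi
if ! command -v X > /dev/null; then
    echo "missing the X binary (install xserver-xorg)"
fi
```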

So, TL;DR version: Just install these two packages:

sudo apt-get install xserver-xorg-video-vesa xserver-xorg

Now all works.


Posted jue 02 oct 2014 10:36:08 CEST Tags: sysadmin

Do you remember this? Well, now I have a similar but quite different case: I want to restore the files' times. Why? Hear me out.

For a long time I've been doing backups to an external drive, simply using rsync. When I bought my second laptop, I took the chance to test restoring from that backup, so I just did a last backup on the old machine and a restore on the new one. This allowed me to add the last few things that were missing. After a few more weeks of finding missing things, doing more backup/restore cycles between the old and new laptops, I was happy.

But there was something that I didn't realize till a few months later: the file times were not backed up, and clearly not restored. This didn't affect me in any way, at least until I tried to add a new post to this glob.

See, this glob is generated with ikiwiki. ikiwiki uses the files' mtimes to generate the posts' times, and uses those to sort them and to assign them as belonging to some year or month. With the mtimes lost, ikiwiki was assigning all the older posts to some day in Oct, 2012, which was the date of the last backup/restore cycle.
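Restoring an mtime, once you know what it should be, is a one-liner with Python's os.utime(); a minimal demo on a throwaway temp file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# os.utime() takes (atime, mtime) as Unix timestamps
os.utime(path, times=(1000000000, 1000000000))
print(os.stat(path).st_mtime)   # → 1000000000.0
```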

Frustrated (yes, sometimes I have a low threshold), I left it like that: no planet was flooded, because in any case the rss feeds didn't really change, and nobody would notice and complain about it. In fact, nobody did.[1] Have in mind that, as I kept my old laptop as a server, and as I didn't clean up all the files I keep in my backup, I still had the original mtimes in the original files.

Yesterday, playing around with unrelated things, I came across the post I mentioned at the beginning of this one, and had the urge to really fix that mess. Today I started looking at it, and found that this command was some part of it:

find src/projects/glob/work/ -name '*.mdwn' | xargs stat --format '%X %Y %Z %n'

but processing its output in bash seemed like a burden. Luckily, I have the solution for that: a project that I started for exactly this same reason, because bash sucks at manipulating data.

So I grabbed my editor, punched keys for some 7 minutes, ran 3 or 4 tests, making sure that everything would go smoothly, ran it once more on the live system, rebuilt my glob[2] and voilà! It's fixed now.

How long is the code? Not much really:

from os import utime

with remote ('', allow_agent=False, password='XXXXXXXX') as fds:
    # find all files (not only posts), get their times and paths
    with cd ('src/projects/glob/work'):
        find ('.', '-name', '*.mdwn') | xargs ('stat', format='%X %Y %Z %n')

(i, o, e)= fds

for line in e.readlines ():
    print (line.decode ('utf-8'), end='')

for line in o.readlines ():
    data= line.split ()
    access, modif, change= [ int (x) for x in data[:3] ]
    file_path= data[3].decode ('utf-8')
    if _e (file_path):
        utime (file_path, times= (access, modif))
    else:
        print ("%s not found!" % file_path)

Notice that this example is even more complex than initially thought: it ssh's into the server to fetch the original times.

This makes one thing clear: I have to improve the remote() API a lot. For instance, so far I have no way to know whether the remote commands finished successfully or not, and the way to read the lines is still ugly and not homogeneous with how other ayrton functions/executables work.

Also note that this does not restore the ctimes, as I haven't found a way to do it, but I didn't really put a lot of effort into it; I don't really need them.

Of course, I also fixed my backup and restore scripts :)

[1] I was this >< close to adding #foreveralone to that sentence. I think I'll resist for a couple more years.

[2] Don't forget, ikiwiki compiles static sites.

ayrton python sysadmin

Posted mar 03 dic 2013 22:55:24 CET Tags: sysadmin

Since I started working at MegaCorp I've found a new level of security policy, and for me this is a low point: not only can't I ssh home; since I changed my DynDNS provider from dyn.com to afraid.org, I can't even access my webserver, because the proxy denies me access citing: Your request was denied because of its content categorization: "Dynamic DNS Host;Suspicious". So I can access lots of questionable content using Google cache but not my own photos. Fine.

At the beginning, the classical trick of making the ssh server listen on port 443 worked fine, but at some point Network Operations managed to close that hole. This change was not communicated, so it's not clear it was completely on purpose. I once asked for the Network Usage Policy, if it exists at all, but the unofficial answer was along the lines of «I'm not sure you really want to ask».

So, I managed to pierce the firewall again with a further trick: wrapping the ssh traffic in an SSL connection. This makes the traffic look like regular https traffic (remember, the s stands for SSL/TLS), even if it encrypts the traffic twice.

Everything was smooth again, until the server crashed due to a lack of power. After I powered it on again, I found that I couldn't connect anymore. This morning I decided to take a couple of minutes to figure out why. The ssh client tells me this:

$ ssh -v -o 'ProxyCommand openssl s_client -connect %h:%p -quiet 2>/dev/null' -p 443 foo.afraid.org
OpenSSH_5.1p1 Debian-5, OpenSSL 0.9.8g 19 Oct 2007
debug1: Reading configuration data /home/user/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Executing proxy command: exec openssl s_client -connect foo.afraid.org:443 -quiet 2>/dev/null
debug1: permanently_drop_suid: 1004
debug1: identity file /home/user/.ssh/identity type -1
debug1: identity file /home/user/.ssh/id_rsa type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-1024
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-1024
debug1: identity file /home/user/.ssh/id_dsa type -1
ssh_exchange_identification: Connection closed by remote host

Not much info, really. From the server side I have this:

SSL_accept: 1408F10B: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket

Nice, a cryptic error message, at least for me. Strange enough, openssl by itself manages to connect alright:

$ openssl s_client -connect foo.afraid.org:443 -ssl3
SSH-2.0-OpenSSH_6.2p2 Debian-6

That's the ssh server saying hi. After some DDG'ing[1] I found this post on serverfault. The first answer itself is not very helpful, but the second one is actually the OP saying how he solved it: telling stunnel to accept any SSL version from the client while telling SSL to ignore SSLv2. I don't understand how that fixes it, but it works, yay!
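In stunnel configuration terms, that fix looks something like this. Treat it as a sketch: the service name and the connect address are assumptions based on my setup as described above, and directive spellings can vary between stunnel versions:

```
[ssh]
accept  = 443
connect = 127.0.0.1:22
; accept whatever protocol version the client offers...
sslVersion = all
; ...but refuse the broken SSLv2
options = NO_SSLv2
```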

[1] DuckDuckGo is getting better by the day.

sysadmin piercing

Posted lun 02 dic 2013 09:19:54 CET Tags: sysadmin

For a long time I've been toying with the idea of having a better programming language for (shell) scripts than bash; put another way, a better shell language. Note that I'm not looking for a better shell per se, I just want a language that has better data manipulation than the rudimentary set of tools that normal shells provide. I might be overlooking shells more powerful than bash, like zsh, but so far I have seen them as more pluggable than anything else. And if that language could be Python, all the better.

Enter sh. It's a nice Python module that allows you to call programs as if they were mere functions defined in it. Behind the curtains sh does some magic passes to make it so. It is fairly well documented, commented (in the code) and maintained (via GitHub issues).

So I started using sh to replace some shell scripts I had with their Python equivalents. So far the experience has been more or less satisfactory, with some papercuts. I think it's easier to explain with a simple-ish example:

Exhibit A. Exhibit B. Try to view them side by side; I aligned them as much as I could. The Python version later diverged to using another set of data.

The most notable thing is that the data manipulation is so much better done in Python. This stems from the fact that bash has no float handling, much less concepts like floor or ceiling, so instead of a couple of ifs in the inner loop, I have to define three arrays and fill them according to some cases that handle the 'signs' of two different 'floats' (they're strings, really). Also setting the variables west, south, east and north is not only simpler, it also has more error checking. We also save a loop: Python's version has two nested loops, bash's has three.

Now, if you squint a little, you'll see where Python starts to drag. One of the first things to do is to import a lot of modules. It's impressive how many: seven from the standard library, sh itself and one of my own (file_test). Then we try to figure out the extent of this PBF file by piping the output of one command into another. In bash this is just a matter of, you know, using a pipe. sh provides us functions, and we can even nest them, making the output of the inner command go to the outer one. I can live with that, but someone coming from shell scripting might (just might) find it confusing.
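To put that nesting in perspective, with just the standard library you have to spell the pipe out by hand with two subprocess calls; a rough equivalent of something like sh's wc(ls('-1'), '-l'), using echo and wc here just for the demo:

```python
import subprocess

# run the inner command and capture its output...
inner = subprocess.run(['echo', 'one two three'],
                       capture_output=True, text=True)
# ...then feed that output to the outer command as its stdin
outer = subprocess.run(['wc', '-w'],
                       input=inner.stdout,
                       capture_output=True, text=True)
print(outer.stdout.strip())   # → 3
```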

There's something I think will definitely confuse a shell scripter (a SysAdmin?): the fact that by default sh makes the commands think that their stdout is a TTY, while that is not the case. In my case that meant that osmpbf-outline[1] spat out colors for formatting the output, which meant that I had to explicitly say that the stdout should be a plain file (_tty_out=False). Also, at the beginning, the error handling of sh took me by surprise. That's why at first I said that an error code of 1 is ok (_ok_code=1), while later I did proper error handling with a try: ... except: block.
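The colors appear because most colorizing tools only check whether they're writing to a terminal; sh defeats that check by handing them a pseudo-TTY. The check itself looks roughly like this (a sketch; maybe_color is a made-up name):

```python
import io
import sys

def maybe_color(text, stream, code='32'):
    # emit ANSI escape codes only when writing to a real terminal;
    # this is what most colorizing tools do by default
    if stream.isatty():
        return '\033[%sm%s\033[0m' % (code, text)
    return text

print(maybe_color('hello', sys.stdout))     # colored only if run on a TTY
print(maybe_color('hello', io.StringIO()))  # → hello (a plain file: no codes)
```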

Notice that I also tried to use Python modules and functions where they made as much or more (or the only) sense than using an external command, like when I use os.chdir()[2] or os.unlink() instead of rm.

Something else I find lacking in sh is more functions to do shell expansion. It only handles globbing, and only because Python's glob.glob() returns an empty list if the pattern does not match any file, while bash leaves the pattern as it is.

So my conclusion is that sh is already a very good step forward towards what I would like to have in a shell scripting language, but I see room for some improvements. That's why I started hacking another module, called ayrton, to try to tackle all this. Notice that at the beginning I tried to hack it in such a way that instead of having to say sh.ls or from sh import ls, you would simply use ls, and me in the backstage would do all the juggling necessary to be equivalent to those[3]. That is not possible without breaking Python itself, but now that I'm starting to convert my scripts, I see a place for it. I will also try to incorporate whatever I hack back into sh. I'll glob about its details soon.

In the meantime, before I start to really document it, you can take a look of the current version of the most advanced script in ayrton so far[4].

[1] Notice how in the Python script this is written as osmpbf_outline. This is due to the fact that function names in Python cannot have -s in them (it's the 'minus' operator), so sh does a trick: if you put a _ in the function name and such a command does not exist, it will try replacing the _s with -s and try again. Hacky, if you want, but it works for me.

[2] There's no cd command; that's a bash builtin, and it wouldn't make any sense anyway, as the change would only affect the subcommand and not the Python process.

[3] This includes messing with the resolution of builtins, which strangely works in Python3, but only from the interactive interpreter. I tried to figure out why, but after a while, since it didn't work out of the box and what I wanted to do was a terribly ugly hack anyway, I dropped it.

[4] I chose the .ay extension. I hope it doesn't become .ay! :)

sysadmin python ayrton

Posted dom 21 jul 2013 01:08:05 CEST Tags: sysadmin

Warning: this has not been tested yet.

Again, TL;DR version at the end.

They say that backing up in C* is really easy: you just run nodetool snapshot, which only creates a hardlink for each data file somewhere else in the filesystem, and then you just back up those hardlinks. Optionally, when you're done, you simply remove them and that's it.

But that's only half of the story. The other half is taking those snapshots and storing them somewhere else; let's say, a backup server, so you can restore the data even in case of spontaneous combustion followed by explosion due to short circuits caused by your dog peeing on the machine. Not that that happens a lot in a datacenter, but one has to plan for any contingency, right?

In our case we use Amanda, which internally uses an implementation of tar, or GNU tar if asked for (yes, other tools too if asked). The problems begin with how you define what to back up and where C* puts those snapshots. The definitions are done by what Amanda calls disklists, which are basically lists of directories to back up entirely. On the other hand, for a column family Bar in a keyspace Foo, whose data is normally stored in <data_file_directory>/Foo/Bar/, a snapshot is located in <data_file_directory>/Foo/Bar/snapshots/<something>, where something can be a timestamp or a name defined by the user at snapshot time.

If you want to simplify your backup configuration, you'll probably want to say <data_file_directory>/*/*/snapshots/ as the dirs to back up, but Amanda merrily can't expand wildcards in disklists. A way to solve this is to create a directory sibling to <data_file_directory>, move the files from the snapshots there, and specify it in the disklists. That kinda works...
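As a rough sketch of that workaround (paths, the function name and the use of hardlinks instead of a plain move are my assumptions, not a tested recipe):

```shell
# Sketch: collect every snapshot into a single sibling directory that
# Amanda can then list verbatim in its disklist. $1 is the
# <data_file_directory>, $2 the sibling backup dir.
collect_snapshots() {
    local data=$1 backup=$2 snap ks cf
    for snap in "$data"/*/*/snapshots/*/; do
        [ -d "$snap" ] || continue                        # no snapshots at all
        cf=$(basename "$(dirname "$(dirname "$snap")")")  # column family
        ks=$(basename "$(dirname "$(dirname "$(dirname "$snap")")")")  # keyspace
        mkdir -p "$backup/$ks/$cf"
        cp -al "$snap". "$backup/$ks/$cf/"                # hardlinks, no extra space
    done
}
```

Something like `collect_snapshots /var/lib/cassandra/data /var/lib/cassandra/backup` would then populate the directory the disklist points at.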

... until the second backup pass comes and you find out that even though you specified an incremental backup, it copies over all the snapshot files again. This is because when a hardlink is created, the ctime of the inode changes. Guess what tar uses to see if a file has changed... yes, ctime and mtime[1].
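This is easy to check by hand (assuming GNU stat):

```shell
# Quick check that hardlinking bumps the inode's ctime, which is what
# makes tar re-archive otherwise unchanged snapshot files.
cd "$(mktemp -d)"
touch a-Data.db
before=$(stat -c %Z a-Data.db)    # ctime, in seconds since the epoch
sleep 2
ln a-Data.db snapshot-Data.db     # essentially what 'nodetool snapshot' does
after=$(stat -c %Z a-Data.db)
[ "$after" -gt "$before" ] && echo "ctime changed: tar will re-archive"
```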

So we're back to square one, or even zero. It seems the only solution is to use C*'s native 'support' for incrementality, but the docs are just a couple of paragraphs that barely explain how incremental backups are done (surprise: the same way as snapshots) and how to activate them, which is why we didn't follow this path from the beginning. So in the end, it seems that you can't use Amanda or tar to make incremental backups, even with the native support.

But then there's a difference between the snapshot and the incremental mode: with the snapshot method, you create the snapshot just before backing it up, which sets all the ctimes to now. C*'s incremental mode "hard-links each flushed SSTable to a backups directory under the keyspace data directory", so the ctimes are roughly the same as the mtimes, and neither ever changes again (remember, SSTables are immutable) until we do a snapshot, of course.

One particularity I noticed is that only new SSTables are backed up, but not those that are the result of compactions. At first I thought this was wrong, but after discussing the issue with driftx in the IRC channel, and a confirmation by Tyler Hobbs on the mailing list, we came to the following conclusion: if the compacted SSTables were also backed up, at restore time you would need to do a manual compaction to minimize data duplication, which otherwise means more SSTables tracked by the Bloom filters, more disk reads/seeks per get, and more space used; but if you don't backup/restore those SSTables, the manual compaction is only advisable. Also, as a consequence, you don't need to track which files were deleted between backups.

So the remaining problem is knowing which files have been backed up, because C* backups, just like snapshots, are not automatically cleaned. I came up with the following solution, which might seem complicated at first, but really isn't.

When we do a snapshot, which is perfect for full backups, we first remove all the files present in the backup directory; the incremental files from since the last incremental backup are not needed, because we're doing a full anyway. At the end of this we have the files ready for the full; we do the backup, and we erase the files.

Then on the following days we just add the incremental files accumulated so far, preceded by a flush, so as to have the latest data in the SSTables and not depend on CommitLogs. As they're only the diff against the files in the full, and don't include the intermediate compacted SSTables, they're as big as they should be (but also as small as they could be, if you're worried about disk usage). Furthermore, the way we put files in the backup dir is via symlinks, which doesn't change the file's mtime or ctime, and we configure Amanda to dereference symlinks.
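The symlinking step could look like this (the function name and argument layout are my assumptions; C*'s incremental files land in <data_file_directory>/<KS>/<CF>/backups/):

```shell
# Sketch: symlink C*'s incremental backup files into the sibling backup
# dir. Creating a symlink touches neither the target's mtime nor its
# ctime, so tar's incremental logic keeps working. $1 is the
# <data_file_directory>, $2 the sibling backup dir.
link_incrementals() {
    local data=$1 backup=$2 f ks cf
    for f in "$data"/*/*/backups/*; do
        [ -f "$f" ] || continue
        cf=$(basename "$(dirname "$(dirname "$f")")")
        ks=$(basename "$(dirname "$(dirname "$(dirname "$f")")")")
        mkdir -p "$backup/$ks/$cf"
        ln -sf "$f" "$backup/$ks/$cf/"    # Amanda must dereference these
    done
}
```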

Later, at restore time, the files are put in the backup directory, and a script that takes the KS and CF from each file's name 'deals' them to the right directories.
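A sketch of that 'dealing' script, assuming the KS-CF-version-generation-Component file naming of that era (e.g. Foo-Bar-hd-1-Data.db); the function name and arguments are hypothetical:

```shell
# Derive keyspace and column family from each restored SSTable's file
# name and move it into place. $1 is the dir the backup was unpacked
# into, $2 the <data_file_directory>.
deal_sstables() {
    local restore=$1 data=$2 f name ks cf
    for f in "$restore"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        ks=${name%%-*}                 # first '-'-separated field
        cf=${name#*-}; cf=${cf%%-*}    # second field
        mkdir -p "$data/$ks/$cf"
        mv "$f" "$data/$ks/$cf/"
    done
}
```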

TL;DR version:

  • Full backup:

    • Remove old incremental files and symlinks.
    • nodetool snapshot.
    • Symlink all the snapshot files to a backup directory.
    • Backup that directory dereferencing symlinks.
    • nodetool clearsnapshot and remove symlinks.
  • Incremental backup:

    • nodetool flush.
    • Symlink all incremental files into the backup directory.
    • Backup that directory dereferencing symlinks.
  • Restore[2]:

    • Restore the last full backup and all the incrementals.
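The full-backup bullets could be strung together roughly like this (the function and paths are my assumptions; the Amanda run itself is left as a comment, since it depends entirely on the local configuration):

```shell
# Sketch of a full backup pass. $1 is the <data_file_directory>,
# $2 the sibling backup dir that Amanda's disklist points at.
full_backup() {
    local data=$1 backup=$2 snap ks cf
    find "$backup" -mindepth 1 -delete          # old incrementals and symlinks
    nodetool snapshot                           # hardlinks the current SSTables
    for snap in "$data"/*/*/snapshots/*/; do
        [ -d "$snap" ] || continue
        cf=$(basename "$(dirname "$(dirname "$snap")")")
        ks=$(basename "$(dirname "$(dirname "$(dirname "$snap")")")")
        mkdir -p "$backup/$ks/$cf"
        ln -s "$snap"* "$backup/$ks/$cf/"       # Amanda dereferences these
    done
    # ... run the Amanda backup here, dereferencing symlinks ...
    nodetool clearsnapshot
}
```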

[1] tar's docs are not clear about what exactly it uses ("Incremental dumps depend crucially on time stamps"), but Amanda's seem to imply such a thing ("Tar has the ability to preserve the access times[;] however, doing so effectively disables incremental backups since resetting the access time alters the inode change time, which in turn causes the file to look like it needs to be archived again.")

[2] Actually it's not that simple. The previous post in this series already shows how it can get more complicated.

sysadmin cassandra

Posted mié 12 sep 2012 20:17:33 CEST Tags: sysadmin

Note: TL;DR version at the end.

What could go wrong doing:

chown -R foo.foo $DATA_DIR/

as root? Yes, exactly: $DATA_DIR might not be defined, and you end up setting the owner/group to foo.foo for all the files on the machine, including executables and devices. Of course, I learnt that the hard way. Even worse, I lost the only ssh session I had on the 3 machines where I did this (mental note: be very, very, very careful when using cssh or similar tools), but luckily I still had a fourth twin sister that survived.
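For the record, a one-character guard would have prevented the whole mess (the function name is just for illustration):

```shell
# ${VAR:?msg} makes the shell abort with an error when the variable is
# unset or empty, instead of silently expanding to nothing and handing
# chown the root directory.
fix_owner() {
    chown -R foo.foo "${DATA_DIR:?DATA_DIR is not set}"/
}
# 'set -u' at the top of the script catches plain unset expansions too.
```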

The first thing you notice is that you can't ssh anymore into the machine. This is because sshd refuses to answer any connections with this message: fatal: /var/run/sshd must be owned by root and not group or world writable.

The first step is to restore some initially sane ownership. The safest is root.root, but unluckily I didn't realize this until after I lost said ssh sessions. So actually, the first step is to regain control of the machine. The only way I can think of is to log in via the console. I thought this would also imply rebooting and going into single mode, or even using the init=/bin/bash hack which has saved more than one life, or at least sanity, but login keeps working even when the perms are wrong. I seem to remember that not being the case, probably because I thought login was setuid, but sudo find / -perm -4000 confirms that it is not.

So, after logging in, I need 2 commands: chown -R root.root / and /etc/init.d/ssh start (because now ssh does not start at boot time, seemingly leaving no error message behind), and then you can log in via ssh again.

Now it's time to restore the real ownership. Here a small package called acl comes in handy. It contains two tools, getfacl and its evil twin setfacl. With these we're really close to the final solution. getfacl --recursive / > all.acl pulls the ownerships on a healthy machine; then you copy all.acl to the sick server(s) and apply it like a cure-all medicine: setfacl --recursive --restore /root/all.acl, while standing in the root (/) directory. Actually, think of it as a blood or bone marrow transplant.

As final notes, don't use getfacl's --tabular option, as setfacl doesn't recognize the format. Also, you can check that the ownerships were correctly restored with find / ! -user root | xargs ls -ld, and/or you can dump the new perms and compare them with those you got from the donor machine.


I gave the problem a little more thought and came to the conclusion that with that method you have no idea whether you restored all the ownerships properly or not. Taking advantage of the fact that I actually changed both owner and group, I can replace the first command with chown -R root /, and then find the not-undone files with find / -group foo. I haven't tested this, but I think it's OK. Another option is to initially restore with chown only the minimal files needed to make ssh work again, and then search for all the foo.foos.

So, the promised TL;DR version:

  • On the sick machine(s), login via the console and run:
    • chown -R root /
    • /etc/init.d/ssh start
  • Select a donor machine and run:
    • getfacl --recursive / > all.acl
    • Transplant all.acl to the sick machine(s) (most probably with scp).
  • On the sick machine(s), now via ssh if it's more comfortable for you, run:
    • cd /
    • setfacl --recursive --restore /root/all.acl
    • find / -group foo and change by hand any remaining not-undone files.


Posted dom 26 ago 2012 12:13:44 CEST Tags: sysadmin