glob no es un blog. No en el sentido corriente de la palabra. Es un registro de mis proyectos y otras interacciones con el software libre.

glob is not a blog. Not in the common meaning of the word. It's a record of my projects and other interactions with libre software.

All texts CC BY Marcos Dione


I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, specially connecting the new line to existing centerlines. Still, I think it's useful.

Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.

Lastly, as imagico mentioned in the first post about this hack, the algorithms are not fast, and I already made my computer start thrashing the disk swapping like Hell because pg hit a big polygon and started using lots of RAM to calculate its centerline. At least this time I can see how complex the polygons are before handing them to the code. As an initial benchmark, the original data for that changeset (I later simplified it with JOSM's tool) took 0.063927s in pg+gis and 0.004737s in the Python code. More test will come later.

Okey, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!


openstreetmap gis python

Posted Mon 29 Aug 2016 08:06:39 PM CEST Tags:

A month ago I revived my old-laptop-as-server I have at home. I don't do much in it, just serve my photos, a map, provide a ssh trampoline for me and some friends and not much more. This time I decided to tackle one of the most annoying problems I had with it: That closing the lid led to the system to suspend.

Now, the setup in that computer has evolved through some years, so a lot of cruft was left on it. For instance, at some point I solved the problem by installing a desktop and telling it not to suspend the machine, mostly because that's how I configure my current laptop. That, of course, was a cannon-for-killing-flies solution, but it worked, so I could focus in other things. Also, a lot of power-related packages were installed, assuming the were really needed for supporting everything I might ever wanted to do about power. This is the story on how I removed them all, why, and how I solved the lid problem... twice.

First thing to go were the desktop packages, mostly because the screen in that laptop has been dead for more than a year now, and because its new space in the house is a small shelf in my wooden desktop. Then I reviewed the power-related packages one by one and decided whether I needed it or not. This is more or less what I found:

  • acpi-fakekey: This package has a tool for injecting fake ACPI keystrokes in the input system. Not really needed.
  • acpi-support: It has a lot of scripts that can be run when some ACPI events occur. For instance, lid closing, battery/AC status, but also things like responding to power and even 'multimedia' keys. Nice, but not needed in my case; the lid is going to be closed all the time anyways.
  • laptop-mode-tools: Tools for saving power in your laptop. Not needed either, the server is going to be running all the time on AC (its battery also died some time ago).
  • upower: D-Bus interface for power events. No desktop or anything else to listen to them. Gone.
  • pm-utils: Nice CLI scripts for suspending/hibernating your system. I always have them around in my laptop because sometimes the desktops don't work properly. No use in my server, but it's cruft left from when I used it as my laptop. Adieu.

Even then, closing the lid led to the system suspending. Who else could be there? Well, there is one project who's being everywhere: systemd. I'm not saying this is bad, but it is everywhere. Thing is, its login subsystem also handles ACPI events. In the /etc/systemd/logind.conf file you can read the following lines:

#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore

so I uncommented the 4th line and changed it so:

HandleLidSwitch=ignore

Here you can also configure how the inhibition of actions work:

#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes

Please check the config file's doc if you plan to modify it.

Not entirely unrelated, my main laptop also started suspending when I closed the lid. I have it configured, through the desktop environment, to only turn off the screen, because what use is the screen if it's facing the keyboard and touchpad :) Somehow, these settings only recently started to be in effect, but a quick search didn't gave any results on when things changed. Remembering what I did with the server, I just changed that config file to:

HandlePowerKey=ignore
HandleSuspendKey=ignore
HandleHibernateKey=ignore
HandleLidSwitch=ignore
HandleLidSwitchDocked=ignore

That is, “let me configure this through the desktop, please”, and now I have my old behavior back :)


PS: I should start reading more about systemd. A good starting point seems to be all the links in its home page.


sysadmin systemd acpi

Posted Sun 28 Aug 2016 05:34:56 PM CEST Tags:

Dear conference speakers:

Please test write your slides at 1024x768 resolution, on a projector.

Dear conference organizers:

Please remind your speakers to do so.

Dear conference attendants:

If you are at the back of the room and you can't see the text/code in the slides the speaker(s) is showing, please shout “I cant see shit!” in the appropriate language, and try to embarrass the speaker as much as possible.

Thanks in advance. Yours truly,

-- Marcos.

PS: I've been watching videos of talks in some conferences and I swear to $DEITY at in least 40% of the ones I was interested in, I couldn't read the code on the video. Sometimes the fonts are too small, sometimes the colors are not contrasting enough. Please, at least test your slides on a projector...

PSS: I know the resolution I'm suggesting is low. Be happy I'm not asking for 640x480 :-P

PSSS: Ok, attendants, don't embarrass/harass the speakers :)


rants

Posted Thu 25 Aug 2016 02:19:38 PM CEST Tags:

Like I said in my last post[1], I'm looking at last PyCon's videos. Here are my selected videos, in the order I saw them:

Ned Batchelder - Machete-mode debugging: Hacking your way out of a tight spot. In fact, I saw this twice.

Larry Hastings - Removing Python's GIL: The Gilectomy

The Report Of Twisted’s Death or: Why Twisted and Tornado Are Relevant In The Asyncio Age

How I built a power debugger out of the standard library and things I found on the internet

Davey Shafik - HTTP/2 and Asynchronous APIs

Sumana Harihareswara - HTTP Can Do That?! Points for informative and funny.

Matthias Kramm - Python Typology Types are comming, so get used to them.

Scott Sanderson, Joe Jevnik - Playing with Python Bytecode Nice, very nice trick. I'm talking about the way the presentation is given.

Brian Warner - Magic Wormhole - Simple Secure File Transfer

And of course, the lighning talks. I always like these, because you can get exposed to any kind of things, some not even remotely connected to Python, but which can get your brain rolling down nice little bunny holes, or at least get a smile from you. So here:

LT#1. Please watch it at least between 20-25m.

LT#2

LT#3

LT#4

And of course, check the other ones, don't stop at my own interests.


[1] Yes, I started writing this a month ago.


python

Posted Fri 19 Aug 2016 05:35:45 PM CEST Tags:

Long time for this release. A couple of hard bugs (which fix was just moving a line down a little), a big-ish new feature, and moving in a new city. Here's the ChangeLog:

  • You can import ayrton modules and packages!
  • Depends on Python3.5 now.
  • argv is not quite a list: for some operations (len(), iter(), pop()), argv[0] is left alone.
  • option() raises KeyError or ValueError if the option or its 'argument' is wrong.
  • makedirs() and stat() are available as functions.
  • -p|--pdb launches pdb when there is an unhandled exception.
  • Fix for line in foo(...): ... by automatically adding the _bg=True option.
  • Better Command() detection.
  • A lot of internal fixes.

Get it on github or pypi!


python ayrton

Posted Wed 17 Aug 2016 01:17:22 PM CEST Tags:

My latest Europe import was quite eventful. First, I run out of space several times during the import itself, at indexing time. The good thing is that, if you manage to reclaim some space, and reading a little of source code[1], you can replay the missing queries by hand and stop cursing. To be fair, osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables, planet_osm_node, planet_osm_way and planet_osm_relation; and one file, the flat nodes one. Those are not deleted until the whole process has finished, but they're actually not needed after the processing phase. I started working on fixing that.

But that was not the most difficult part. The most difficult part was that I forgot, somehow, to add a column to the import.style file. Elevation, my own style, renders different icons for different types of castles (and forts too), just like the Historic Place map of the Hiking and Bridle map[2]. So today I sat down and tried to figure out how to reparse the OSM extract I used for the import to add this info.

The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:

node,way   castle_type  text         polygon

That says that this applies to nodes and ways. If the element is a way, polygon will try to convert it to a polygon and put it in the planet_osm_polygon table; if it's a node, it ends in the planet_osm_point table. So we just add the column to those tables:

ALTER TABLE planet_osm_point   ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;

Now how to process the extract? Enter pyosmium. It's a Python binding for the osmium library with a stream-like type of processing à la expat for processing XML. The interface is quite simple: one subclasses osmium.SimpleHandler, defines the element type handlers (node(), way() and/or relation()) and that's it! Here's the full code of the simple Python script I did:

#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()

class CastleTypes (osmium.SimpleHandler):

    def process (self, thing, table):
        if 'castle_type' in thing.tags:
            try:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''
            print (table, thing.id, name)

            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                            WHERE osm_id = %s''',
                         (thing.tags['castle_type'], thing.id))

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)

ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')

The only strange part of the API is that it doesn't seem to raise a KeyError when the tag does not exist, but a SystemError. I'll try to figure this out later. Also interesting is the big amount of unnamed elements with this tag that exist in the DB.


[1] I would love for GitHub to recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick.

[2] I just noticed how much more complete those maps are. more ideas to use :)


openstreetmap gis python

Posted Tue 09 Aug 2016 05:45:56 PM CEST Tags:

Remember this?

For a few months now I've been trying to have a random slideshow of images. I used to do this either with kscreensaver, which for completely different reasons I can't use now, or xscreensavers' glslideshow, which, even when I compiled it by hand, I can't find the way to give it the root dir of the images. So, based on OMIT, I developed my own.

The differences with OMIT are minimal. It has to scan the whole tree for finding the appropriate files (its definition of "appropriate" could be improved, it's true); it goes into full screen mode with black background; and it (more) properly handles EXIF rotation[1]. All that in 176 LOCs, including proper licensing (GPLv3), and developed in one day and refined the next one.

One interesting thing I found out is that pyexiv2 is deprecated, with is last release in 2011 (!!!). What this new app uses is its recommended replacement, gexiv2.

So, there you are. Like OMIT, it's in PyQt4, but this time in Python3 (that's why I used gexiv2 instead). Its TODO includes porting it to PyQt5 and a few other things. You can grab it here. I plan to do a proper release soon, but for the moment just drop it in your PATH and be happy with it.


[1] http://www.daveperrett.com/articles/2012/07/28/exif-orientation-handling-is-a-ghetto/


omit omim pykde python

Posted Fri 05 Aug 2016 02:51:35 PM CEST Tags:

In this last two days I've been expanding osm-centerlines. Now it not only supports ways more complex than a simple rectangle, but also ones that lead to 'branches' (unfortunately, most probably because the mapper either imported bad data or mapped it himself). Still, I tested it in very complex polygons and the result is not pretty. There is still lots of room for improvements.

Unluckily, it's not as stand alone as it could be. The problem is that, so far, the algos force you to provide now only the polygon you want to process, but also its skeleton and medial. The code extends the medial using info extracted from the skeleton in such a way that the resulting medial ends on a segment of the polygon, hopefully the one(s) that cross from one riverbank to another at down and upstream. Calculating the skeleton could be performed by CGAL, but the current Python binding doesn't include that function yet. As for the medial, SFCGAL (a C++ wrapper for CGAL) exports a function that calculates an approximative medial, but there seem to be no Python bindings for them yet.

So, a partial solution would be to use PostGIS-2.2's ST_StraightSkeleton() and ST_ApproximateMedialAxis(), so I added a function called skeleton_medial_from_postgis(). The parameters are a psycopg2 connection to a PostgreSQL+PostGIS database and the way you want to calculate, as a shapely.geometry, and it returns the skeleton and the medial ready to be fed into extend_medials(). The result of that should be ready for mapping.

So there's that. I'll be trying to improve it in the next days, and start looking into converting it into a JOSM plugin.


openstreetmap gis python

Posted Fri 29 Jul 2016 12:39:57 AM CEST Tags:

For a long time now I've been thinking on a problem: OSM data sometimes contains riverbanks that have no centerline. This means that someone mapped (part of) the coasts of a river (or stream!), but didn't care about adding a line that would mark its centerline.

But this should be computationally solvable, right? Well, it's not that easy. See, for given any riverbank polygon in OSM's database, you have 4 types of segments: those representing the right and left riverbanks (two types) and the flow-in and flow-out segments, which link the banks upstream and downstream. With a little bit of luck there will be only one flow-in and one flow-out segment, but there are no guarantees here.

One method could try and identify these segments, then draw a line starting in the middle of the flow-in segment, calculating the middle by traversing both banks at the same time, and finally connect to the middle for the flow-out segment. Identifying the segments by itself is hard, but it is also possible that the result is not optimal, leading to a jagged line. I didn't try anything on those lines, but I could try some examples by hand...

Enter topology, the section of maths that deals with this kind of problems. The skeleton of a polygon is a group of lines that are equidistant to the borders of the polygon. One of the properties this set of lines provides is direction, which can be exploited to find the banks and try to apply the previous algorithm. But a skeleton has a lot of 'branches' that might confuse the algo. Going a little further, there's the medial axis, which in most cases can be considered a simplified skeleton, without most of the skeleton branches.

Enter free software :) CGAL is a library that can compute a lot of topological properties. PostGIS is clever enough to leverage those algorithms and present, among others, the functions ST_StraightSkeleton() and ST_ApproximateMedialAxis(). With these two and the original polygon I plan to derive the centerline. But first an image that will help explaining it:

The green 'rectangle' is the original riverbank polygon. The thin black line is the skeleton for it; the medium red line is the medial. Notice how the medial and the center of the skeleton coincide. Then we have the 4 branches forming a V shape with its vertex at each end of the medial and its other two ends coincide with the ends of the flow in and flow out segments!

So the algorithm is simple: start with the medial; from its ends, find the branches in the skeleton that form that V; using the other two ends of those Vs, calculate the point right between them, and extend the medial to those points. This only calculates a centerline. The next step would be to give it a direction. For that I will need to see if there are any nearby lines that could be part of the river (that's what the centerline is for, to possibly extend existing rivers/centerlines), and use its direction to give it to the new centerline.

For the moment the algorithm only solves this simple case. A slightly more complex case is not that trivial, as skeletons and medials are returned as a MultiLineString with a line for each segment, so I will have to rebuild them into LineStrings before processing.

I put all the code online, of course :) Besides a preloaded PostgreSQL+PostGIS database with OSM data, you'll need python3-sqlalchemy, geoalchemy, python3-fiona and python3-shapely. The first two allows me to fetch the data from the db. Ah! by the way, you will need a couple of views:

CREATE VIEW planet_osm_riverbank_skel   AS SELECT osm_id, way, ST_StraightSkeleton (way)      AS skel   FROM planet_osm_polygon WHERE waterway = 'riverbank';
CREATE VIEW planet_osm_riverbank_medial AS SELECT osm_id, way, ST_ApproximateMedialAxis (way) AS medial FROM planet_osm_polygon WHERE waterway = 'riverbank';

Shapely allows me to manipulate the polygonal data, and fiona is used to save the results to a shapefile. This is the first time I ever use all of them (except SQLAlchemy), and it's nice that it's so easy to do all this in Python.


openstreetmap gis python

Posted Tue 26 Jul 2016 06:55:14 PM CEST Tags:

A few weeks ago an interesting PR for osm-carto landed in the project's GitHub page. It adds rendering for several natural relief features, adding ridges, valleys, aretes, dales, coulouirs and others to cliffs, peaks and mountain passes, which were already being rendered. I decided to try it in Elevation (offline for the moment).

I sync'ed the style first with the latest release, applied the patch and... not much. My current database is quite old (re-importing takes ages and I don't have space for updates), so I don't have much features like that in the region I'm interested in. In fact, I went checking and the closest mountain range around here was not in the database, so I added it.

By the way, the range is mostly concurrent with a part of an administrative boundary, but SomeoneElse and SK53 suggested to make a new line. Even when other features are nearby (there's a path close to the crest and it's also more or less the limit between a forest and a bare rock section), which already makes the region a little bit crowded with lines, it makes sense: boundaries, paths, forest borders and ridges change at different time scales, so having them as separate lines makes an update to any of those independent of the rest.

Now I wanted to export this feature and import it in my rendering database, so I can actually see the new part of the style. This is not an straightforward process, only because when I imported my data I used osm2pgsql --drop, which removes the much needed intermediate tables for when one wants to update with osm2pgsql --append. Here's a roundabout way to go.

First you download the full feature (thanks RichardF!). In this case:

http://www.openstreetmap.org/api/0.6/way/430573542/full

This not only exports the line (which is a sequence of references to nodes) with its tags, but the nodes too (which are the ones storing the coords). The next step is to convert it to something more malleable, for instance, GeoJSON. For that I used ogr2ogr like this:

ogr2ogr -f GeoJSON 430573542.GeoJSON 430573542.xml lines

The last parameter is needed because, quoting Even Rouault (a.k.a. José GDAL): «you will always get "points", "lines", "multilinestrings", "multipolygons" and "other_relations" layers when reading a osm file, even if some are empty», and the GeoJSON driver refuses to create layers for you:

ERROR 1: Layer lines not found, and <span class="createlink">CreateLayer</span> not supported by driver.

But guess what, that not the easiest way :) At least we learned something. In fact postgis already has a tool called shp2pgsql that imports ESRIShapeFiles, and ogr2ogr produces by default this kind of file. It creates a .shp file for each layer as discussed before, but again, we're only interested in the line one. So:

ogr2ogr 430573542 430573542.xml lines
shp2pgsql -a -s 900913 -S 430573542/lines.shp > 430573542.sql

We can't use this SQL file directly, as it has a couple of problems. First, you can't tell shp2pgsql the names of the table where you want to insert the data or the geometry column. Second, it only recognizes some attributes (see below), and the rest it tries to add them as hstore tags. So we have to manually edit the file to go from:

INSERT INTO "lines" ("osm_id","name","highway","waterway","aerialway","barrier","man_made","z_order","other_tags",geom)
    VALUES ('430573542','Montagne Sainte-Victoire',NULL,NULL,NULL,NULL,NULL,'0','"natural"=>"ridge"','010500002031BF0D[...]');

into:

INSERT INTO "planet_osm_line" ("osm_id","name","z_order","natural",way)
    VALUES ('430573542','Montagne Sainte-Victoire','0','ridge','010500002031BF0D[...]');

See? s/lines/planet_osm_line/, s/other_tags/"natural"/ (with double quotes, because natural is a keyword in SQL, as in natural join), s/geom/way/ and s/'"natural"=>"ridge"'/'ridge'/ (in single quotes, so it's a string; double quotes are for columns). And I also removed the superfluous values and the ANALIZE line, as I don't care that much. Easy peasy.

A comment on the options for shp2pgsql. -s 900913 declares the SRID of the database. I got that when I tried without and:

ERROR:  Geometry SRID (0) does not match column SRID (900913)

-S is needed because shp2pgsql by default generated MultiLineStrings, but that table in particular has a LineString way column. This is how I figure it out:

ERROR:  Geometry type (MultiLineString) does not match column type (LineString)

Incredibly, after this data massacre, it loads in the db:

$ psql gis < 430573542.sql
SET
SET
BEGIN
INSERT 0 1
COMMIT

Enjoy!


openstreetmap gdal postgis

Posted Tue 12 Jul 2016 05:37:22 PM CEST Tags:

Today I stumbled upon PyCon 2016's youtube channel and started watching some of the talks. The first one I really finished watching was Ned Batchelder's "Machete debugging", a very interesting talk about 4 strange bugs and the 4 strange techniques they used to find where those bugs were produced. It's a wonderful talk, full of ideas that, if you're a mere mortal developer like me, will probably blow your mind.

One of the techniques they use for one of the bugs is to actually write a trace function. A trace function in cpython context is a function that is called in several different points of execution of Python code. For more information see sys.settrace()'s documentation.

In my case I used tracing for something that I always liked about bash: that you can ask it to print every line that's being executed (even in functions and subprocesses!). I wanted something similar for ayrton, so I sat down to figure out how this would work.

The key to all this is the function I mention up there. The API seems simple enough at first sight, but it's a little more complicated. You give this function what is called the global trace function. This function will be called with three parameters: a frame, an event and a event-dependent arg. The event I'm interested in is line, which is called for each new line of code that is executed. The complication comes because what this global trace function should return is a local trace function. This function will be called with the same parameters as the global trace function. I would really like an explanation why this is so.

The job for this function, in ayrton's case, is simple: inspect the frame, extract the filename and line number and print that. At first this seems to mean that I should read the files by myself, but luckily there's another interesting standard module: linecache to the rescue. The only 'real complication' of ayrton's use is that it would not work if the script to run was passed with the -c|--script option, but (un)luckily the execution engine already has to read the hold the script in lines, so using that as the cache instead of linecache was easy.

Finally, if you're interested in the actual code, go take a look. Just take in account that ayrton has 3 levels of tracing: à la bash (script lines prepended by +), with line numbers, and tracing any Python line execution, including any modules you might use and their dependencies. And don't forget that it also has 3 levels of debug logging into files. See ayrton --help!


ayrton python

Posted Thu 23 Jun 2016 08:32:12 PM CEST Tags:

ayrton has always been able to use any Python module, package or extension as long as it is in a directory in sys.path, but trying to solve a bigger bug, I realized that there was no way to use ayrton modules or packages. Having only laterally heard about the new importlib module and the new mechanism, I sat down and read more about it.

The best source (or at least the easiest to find) is possibly what Python's reference says about the import system, but I have to be honest: it was not an easy read. Next week I'll sit down and see if I can improve it a little. So, for those out there who, like me, might be having some troubles understanding the mechanism, here's how I understand the system works (ignoring deprecated APIs and corner cases or even relative imports; I haven't used or tried those yet):

def import_single(full_path, parent=None, module=None):
    # try this cache first
    if full_path in sys.modules:
        return sys.modules[full_path]

    # if not, try all the finders
    for finder in sys.meta_path:
        if parent is not None:
            spec = finder.find_spec(full_path, parent.__path__, target)
        else:
            spec = finder.find_spec(full_path, None, target)

        # if the finder 'finds' ('knows how to handle') the full_path
        # it will return a loader
        if spec is not None:
            loader = spec.loader

            if module is None and hasattr(loader, 'create_module'):
                module = loader.create_module(spec)

            if module is None:
                module = ModuleType(spec.name)  # let's assume this creates an empty module object
                module.__spec__ = spec

            # add it to the cache before loading so it can referenced from it
            sys.modules[spec.name] = module
            try:
                # if the module was passed as parameter,
                # this repopulates the module's namespace
                # by executing the module's (possibly new) code
                loader.exec_module(module)
            except:
                # clean up
                del sys.modules[spec.name]
                raise

            return module

    raise ImportError


def import (full_path, target=None):
    parent= None

    # this code iterates over ['foo', 'foo.bar', 'foo.bar.baz']
    elems = full_path.split('.')
    for partial_path in [ '.'.join (elems[:i]) for i in range (len (elems)+1) ][1:]
        parent = import_single(partial_path, parent, target)

    # the module is loaded in parent
    return parent

A more complete version of the if spec is not None branch can be found in the Loading section of the reference. Notice that the algorithm uses all the finders in sys.meta_path. So which are the default finders?

In [9]: sys.meta_path
Out[9]:
[_frozen_importlib.BuiltinImporter,
 _frozen_importlib.FrozenImporter,
 _frozen_importlib_external.PathFinder]

Of those finders, the latter one is the one that traverses sys.path, and also has a hook mechanism. I didn't use those, so for the moment I didn't untangle how they work.

Finally, this is how I implemented importing ayrton modules and packages:

from importlib.abc import MetaPathFinder, Loader
from importlib.machinery import ModuleSpec
import sys
import os
import os.path

from ayrton.file_test import _a, _d
from ayrton import Ayrton
import ayrton.utils


class AyrtonLoader (Loader):

    @classmethod
    def exec_module (klass, module):
        # «the loader should execute the module’s code
        # in the module’s global name space (module.__dict__).»
        load_path= module.__spec__.origin
        loader= Ayrton (g=module.__dict__)
        loader.run_file (load_path)

        # set the __path__
        # TODO: read PEP 420
        init_file_name= '__init__.ay'
        if load_path.endswith (init_file_name):
            # also remove the '/'
            module.__path__= [ load_path[:-len (init_file_name)-1] ]

loader= AyrtonLoader ()


class AyrtonFinder (MetaPathFinder):

    @classmethod
    def find_spec (klass, full_name, paths=None, target=None):
        # TODO: read PEP 420 :)
        last_mile= full_name.split ('.')[-1]

        if paths is not None:
            python_path= paths  # search only in the paths provided by the machinery
        else:
            python_path= sys.path

        for path in python_path:
            full_path= os.path.join (path, last_mile)
            init_full_path= os.path.join (full_path, '__init__.ay')
            module_full_path= full_path+'.ay'

            if _d (full_path) and _a (init_full_path):
                return ModuleSpec (full_name, loader, origin=init_full_path)

            else:
                if _a (module_full_path):
                    return ModuleSpec (full_name, loader, origin=module_full_path)

        return None

finder= AyrtonFinder ()


# I must insert it at the beginning so it goes before FileFinder
sys.meta_path.insert (0, finder)

Notice all the references to PEP 420. I'm pretty sure I must be breaking something, but for the moment this works.


ayrton python

Posted Wed 15 Jun 2016 04:46:41 PM CEST Tags:

Remember this? Ok, maybe you never read that. The gist of the post is that I used strace -r -T to produce some logs that we «amassed[sic] [...] with a python script for generating[sic] a CSV file [...] and we got a very interesting graph». Mein Gott, sometimes the English I write is terrible... Here's that graph again:

This post is to announce that that Python script is now public. You can find it here. It's not as fancy as those flame graphs you see everywhere else, but it's a good first impression, specially if you have to wait until the SysAdmin installs perf or any other tool like that (ok, let's be fair, l/strace is not a standard tool, but probably your grumpy SysAdmin will be more willing to install those than something more intrusive; I know it happened to me, at least). It's written in Python3; I'll probably backport it to Python2 soon, so those stuck with it can still profit from it.

To produce a similar graph, use the --histogram option, then follow the suggestions spewed to stderr. I hope this helps you solve a problem like it did to me!


profiling python

Posted Thu 09 Jun 2016 03:27:24 PM CEST Tags:

A short story. For years I've not only accumulated thousands of pictures[1] (around 50, to be more precise) but also schemas on how to sort those photos. After a long internal debate, I settled for the following one (for the moment, that is):

  • Pictures are imported from the camera's SD via a script, which:
    • Renames them from DSCXXXXX.JPG to a date based file name, using the picture's exif data.
    • (Hard) links them to a year/month based dir structure.
  • Later, pictures are sorted by hand into thematic dirs for filtering.
  • The year/month tree is handled with digikam for tagging.
  • Pictures are filtered into their final destination sorted by category, year and event (for instance, trip/year/to_this_place).
  • Pictures are also (hard) linked into a tag based dir structure, using nested tags.

So, the whole workflow is:

SD card --> incoming/01-tmp -> incoming/02-new/theme -> incoming/03-cur --> final destination
        `-> year/month                                                  `-> tags/theme/parent/child

The reason for using hard links is the following: I rsync everything to my home server, both as backup and for feeding the gallery(s) there. Because pictures are moved from one location to another until they reach their final destination, rsync retransmits the picture in its new location and then deletes the old one (I'm using --delete-after, to make sure the backup does not loose any picture if the transfer is stopped). This leads to useless transfers, as the picture is in the remote.

I played with the idea of using git or even git-annex for working around this, but in the end I decided to (ab)use rsync's hard link support. Now moving any picture in the workflow or renaming a category, theme or directory just means creating new hardlinks to the links in the year/month tree and removing the old ones later, an almost immediate operation. This also helps saving space and time when implementing the tag based tree.

digikam is good enough to uniquely identify each picture, even when two hard links point to the same file. This still means the picture appears several times; the metadata (most importantly, tags) are shared, but each new link adds load to the digikam's database.

I bit the bullet and sat down to do a last move and be done. I moved the year/month tree to ByDate/, completely isolating it from the rest of the collection. Then pointed digikam to only read that, and here's how I did it:

  • Closed digikam, of course.
  • Backed up everything, including both the database, which was in the collection's root, and the .local/share/config/digikamrc file.
  • Modified the latter so it points to the new database location:
[Database Settings]
Database Name=/home/mdione/Pictures/ByDate/
Database Name Thumbnails=/home/mdione/Pictures/ByDate/
  • Moved the database.
  • Changed the collection's root dir in the database:
mdione@diablo:~/Pictures/ByDate$ sqlite3 digikam4.db
-- Notice the specificPath is relative to the volume. In my case, the volume is /home
sqlite> select * from AlbumRoots;
1|Pictures|0|1|volumeid:?uuid=f5cadc44-6324-466c-8a99-4ede7457677e|/mdione/Pictures
sqlite> update AlbumRoots set specificPath = '/mdione/Pictures/ByDate' where id = 1;
sqlite> select * from AlbumRoots;
1|Pictures|0|1|volumeid:?uuid=f5cadc44-6324-466c-8a99-4ede7457677e|/mdione/Pictures/ByDate
  • Fired digikam and let it rescan the collection, which recognized the only link to the image and kept the tags.

This worked superbly!


[1] When I talk about pictures, I'm also including videos.

Posted Mon 06 Jun 2016 04:59:11 PM CEST

I've been improving a little Elevation's reproducibility. One of the steps of setting it up is to download an extract to both import in the database and fetch the DEM files that will be part of the background. The particular extract that I'm using, Europe, is more than 17GiB in size, which means that it takes a looong time to download. Thus, I would like to have the ability to continue the download if it has been interrupted.

The original script that was trying to do that is using curl. This version is not trying to continue the download, which can easily be achieved by adding the --continue - option. The version that has it never hit the repo because of the following:

The problem arises when the file we want to download is rolled every day. This means that the contents of the file changes from one day to the other, and we can't just continue from we left if that's the case, we must start all over[1]. One could think that curl has an option that looks like it handles that, --time-cond, which is what the script is trying to use. This option makes curl send the If-Modified-Since HTTP header, which allows the server to respond with a 304 (Not modified) if the file is not newer that the provided date. The date the curl provides is the one from the file referenced by that option, and I was giving the same file as the one where the output goes. I was using these options wrong, it was doing it the other way around: continue if the file changed or doing nothing if not.

So I sat down to try and tackle the problem. I know one can use the HEAD request to check (at least) two things: the resource's date and size (bah, at least in the case of static files like this). So the original idea was to get the URL's date and size; if the date is newer than the local file, I should restart the download from scratch; if not and the size was bigger than the local file, then continue; otherwise, assume the file is finished downloading and stop there.

The last twist of the problem is that the only useful dates from the file were either ctime or mtime, but both change on every write on the file. This means that if I leave the script downloading the file, and in the meanwhile the file is rotated, and the download is interrupted and I try again later, the file's c/mtime is newer that the URL, even when is for a file that is older then the URL. So I had to add a parallel timestamp file that is created only when starting a download and never updated (until the next full download; the file is actually touch'ed), and it is its mtime the one used for comparing with the URL's.

Long story short, curl's --time-cond and --continue options combined are not for this, a HEAD helps a little bit, but rotation-while-downloading can further complicate things. One last feature one could ask to such a script would be to keep the old file while downloading a new one and rotate at the end, but I will leave it for when/if I really need it. The new script is written in ayrton because it's easier to handle execution output and dates in it than in bash. This also pushed me to make minor improvements to it, so expect a release soon.


[1] In fact the other options are not do anything (but then we're left with an incomplete, useless file) or to try and find the file; in the case of geofabrik, they keep the last week of daily rotation, the first day of each previous month back to the beginning of the year; then the first day of each year back to 2014. Good luck with that.


elevation ayrton

Posted Tue 10 May 2016 05:30:28 PM CEST Tags:

Yesterday I climbed Cime du Cherion and to my surprise I saw Corsica[0]. Then a friend of mine pointed me to an article explaining that if you manage to see the island from the coast is because a mirage in a dry air layer 1000m high due to the Föhn's effect. It's notable that the French Wikipedia article about this effect is way more complete than the English one.

Punta Minuta (2556m) is one of the highest points in Corsica close to the northwestern coast. Cime du Cherion is 1778m. The distance between them is[1]:

surface_distance= 225.11km

Earth's mean radius[2] is:

km_per_radian= 6371km

which is also by definition the length of a radian on the theoretical surface of the Earth[3]. Those two mountains are then separated by an angle of:

alpha= 225.11km/6371km= 0.035333 radians.

or a little more than 2°[4]. According to this, the sagitta is then:

sagitta= km_per_radian*(1-math.cos (alpha/2))= 0.994215km, or 994.215m.

This means that is is possible to see the last 1.5km of Punta Minuta from Cime du Cherion and almost anything above around 1000m, which is quite a lot of Corsica, but definitely not what I saw.

In conclusion, we were both right, but him more than me :) And yes, I'm ignoring there is an angle between both points; if we take that in account and assume that Cime du Cherion is at 0°, then the projection of Punta Minuta over the secant that passes through those points is:

projection= math.sin (0.035333)/0.035333*2556m= 2555.46m

A little over half a meter :) Doesn't really change much in the calculations.

Last, a graph showing the height of the sagitta in function of the distance, quite surprising!


[0] Name in corsican :)

[1] Measured with marble.

[2] From the same page, polar radius is 6356.8km and equatorial is 6378.1km. We're measuring points between 42°20' and 43°50'N, so using the median is not that crazy.

[3] Don't go there.

[4] Another fun fact: 1° is about 111km.


misc

Posted Fri 15 Apr 2016 02:47:19 PM CEST Tags:

In a shallow but long yak shaving streak, I ended up learning Ruby (again). Coming from a deep Python background (but also Perl and others), I sat to write down a cheatsheet so I can come back to it time and again:

module Foo  # this is the root of this namespace
# everything defined here must be referenced as Foo::thing, as in
Foo::CamelCase::real_method()

:symbol  # symbols are the simplest objects, wich only have a name and a unique value
:'symbol with spaces!'

"#{interpolated_expression}"  # how to iterpolate expressions in strings

/regular_expression/  # very Perl-ish

generator { |item| block }  # this is related to yield

%q{quote words}  # à la perl!
%w{words}  # same?

def classless_method_aka_function(default=:value)  # Ruby calls these methods too
    block  # ruby custom indents by 2!
end

method_call :without :parens

class CamelCase < SuperClass  # simple inheritance
    include Bar  # this is a mixin;
    # Bar is a module and the class 'inherits' all its 'methods'

    public :real_method, :assign_method=
    protected :mutator_method!
    private :query_method?

    self  # here is the class!
    def real_method(*args)  # splat argument, can be anywhere
        # no **kwargs?
        super  # this calls SuperClass::real_method(*args)
        # comapre with
        super()  # this calls SuperClass::real_method()!

        local_variable
        @instance_variable  # always private
        @@class_variable  # always private
        $global_variable

        return self
        # alternatively
        self
        # as the implicit return value is the last statement executed
        # and all statements produce a value
    end

    def assign_method=()
        # conventionally for this kind of syntactic sugar:
        # When the interpreter sees the message "name" followed by " =",
        # it automatically ignores the space before the equal sign
        # and reads the single message "name=" -
        # a call to the method whose name is name=
    end

    class << self
        # this is in metaclass context!
    end

    protected
    def mutator_method!(a, *more, b)
        # conventionally this modifies the instance
    end

    private
    def query_method?()
        # conventionally returns true/false
    end
end

# extending classes
class CamelCase  # do I need to respecify the inheritance here?
    def more_methods ()
    end
end

obj.send(:method_name_as_symbol, args, ...)

begin
    raise 'exceptions can be strings'
rescue OneType => e  # bind the exception to e
    # rescued
rescue AnotherType
    # also
ensure
    # finally
else
    # fallback
end

=begin
Long
comment
blocks!
=end

statement; statement

long \
line

# everything is true except
false
# and
nil

variable ||= default_value

`shell`

AConstant  # technically class names are constants
# so do module names
A_CONSTANT  # conventionally; always public
# The Ruby interpreter does not actually enforce the constancy of constants,
# but it does issue a warning if a program changes the value of a constant

# case is an expression
foo = case
    when true then 100
    when false then 200
    else 300
end

do |args; another_local_variable|
    # args are local variables of this block
    # whose scope ends with the block
    # and which can eclipse another variable of the same name
    # in the containing scope

    # another_local_variable is declared local but does not
    # consume parameters
end

{ |args| ... }  # another block, conventionally single lined

# Matz says that any method can be called with a block as an implicit argument.
# Inside the method, you can call the block using the yield keyword with a value.

# Matz is Joe Ruby

# yield is not what python does
# see http://rubylearning.com/satishtalim/ruby_blocks.html
# block_given?

a= []  # array
a[0] == nil

ENV  # hash holding envvars
ARGV  # array with CL arguments

(1..10)  # range
(0...10)  # python like
5 === (1..10)  # true, 'case equality operator'

{ :symbol => 'value' } == { symbol: 'value' }  # hashes, not blocks :)

lambda { ... }  # convert a block into a Proc object

# you cannot pass methods into other methods (but you can pass Procs into methods),
# and methods cannot return other methods (but they can return Procs).

load 'foo.rb'  # #include like
require 'foo'  # import, uses $:
require_relative 'bar'  # import from local dir

It's not complete; in particular, I didn't want to go into depth on what yield does (hint: not what does in Python). I hope it's useful to others. I strongly recommend to read this tutorial.

Also, brace yourselves; my impression is that Ruby is not as well documented as we're used in Python.


python

Posted Thu 31 Mar 2016 06:33:50 PM CEST Tags:

ayrton is an modification of the Python language that tries to make it look more like a shell programming language. It takes ideas already present in sh, adds a few functions for better emulating envvars, and provides a mechanism for (semi) transparent remote execution via ssh.

A small update on v0.7.2:

  • Fix iterating over the log ouput of a Command in synchronous mode (that is, not running in the _bg). This complements the fix in the previous release.

Get it on github or pypi!


python ayrton

Posted Fri 26 Feb 2016 02:01:43 PM CET Tags:

This time we focused on making ayrton more debuggable and the scripts too. Featurewise, this release fixes a couple of bugs: one when executing remote code with the wrong Python version and another while iterating over long outputs. The latter needs more work so it's more automatic. Here's the ChangeLog:

  • Fix running remote tests with other versions of Python.
  • Fix tests broken by a change in ls's output.
  • Fix iterating over the long output of a command à la for line in foo(...): .... Currently you must add _bg=True to the execution options.
  • Fix recognizing names bound by for loops.
  • Added options -d|--debug, -dd|--debug2 and -ddd|--debug3 for enabling debug logs.
  • Added option -xxx|--trace-all for tracing all python execution. Use with caution, it generates lots of output.

Get it on github or pypi!


python ayrton

Posted Thu 25 Feb 2016 01:20:07 PM CET Tags:

A weird release, written from Russia via ssh on a tablet. The changelog is enough to show what's new:

  • Iterable parameters to executables are expanded in situ, so foo(..., i, ...) is expanded to foo (..., i[0], i[1], ... and foo (..., k=i, ...) is expanded to foo (..., k=i[0], k=i[1], ....
  • -x|--trace allows for minimal execution tracing.
  • -xx|--trace-with-linenos allows for execution tracing that also prints the line number.

Get it on github or pypi!


python ayrton

Posted Wed 10 Feb 2016 11:15:38 AM CET Tags:

If you pick up a starving dog and make him prosperous, he will not bite you.
This is the principal difference between a dog and a man.
        -- Mark Twain, "Pudd'nhead Wilson's Calendar"