glob is not a blog. Not in the common meaning of the word. It's a record of my projects and other interactions with libre software.

All texts CC BY Marcos Dione


A few weeks ago an interesting PR for osm-carto landed in the project's GitHub page. It adds rendering for several natural relief features, adding ridges, valleys, aretes, dales, couloirs and others to the cliffs, peaks and mountain passes that were already being rendered. I decided to try it in Elevation (offline for the moment).

I sync'ed the style first with the latest release, applied the patch and... not much. My current database is quite old (re-importing takes ages and I don't have space for updates), so I don't have many features like that in the region I'm interested in. In fact, I went checking and the closest mountain range around here was not in the database, so I added it.

By the way, the range mostly coincides with part of an administrative boundary, but SomeoneElse and SK53 suggested making a new line. Even though other features are nearby (there's a path close to the crest, which is also more or less the limit between a forest and a bare rock section), which already makes the region a little bit crowded with lines, it makes sense: boundaries, paths, forest borders and ridges change at different time scales, so having them as separate lines makes an update to any of them independent of the rest.

Now I wanted to export this feature and import it in my rendering database, so I could actually see the new part of the style. This is not a straightforward process, if only because when I imported my data I used osm2pgsql --drop, which removes the intermediate tables needed for updating with osm2pgsql --append. Here's a roundabout way to go.

First you download the full feature (thanks RichardF!). In this case:

http://www.openstreetmap.org/api/0.6/way/430573542/full

This not only exports the line (which is a sequence of references to nodes) with its tags, but also the nodes (which are the ones storing the coords). The next step is to convert it to something more malleable, for instance, GeoJSON. For that I used ogr2ogr like this:

ogr2ogr -f GeoJSON 430573542.GeoJSON 430573542.xml lines

The last parameter is needed because, quoting Even Rouault (a.k.a. José GDAL): «you will always get "points", "lines", "multilinestrings", "multipolygons" and "other_relations" layers when reading a osm file, even if some are empty», and the GeoJSON driver refuses to create layers for you:

ERROR 1: Layer lines not found, and CreateLayer not supported by driver.

But guess what, that's not the easiest way :) At least we learned something. In fact PostGIS already has a tool called shp2pgsql that imports ESRI Shapefiles, and ogr2ogr produces this kind of file by default. It creates a .shp file for each layer as discussed before, but again, we're only interested in the lines one. So:

ogr2ogr 430573542 430573542.xml lines
shp2pgsql -a -s 900913 -S 430573542/lines.shp > 430573542.sql

We can't use this SQL file directly, as it has a couple of problems. First, you can't tell shp2pgsql the name of the table where you want to insert the data or the name of the geometry column. Second, it only recognizes some attributes (see below) and tries to add the rest as hstore tags. So we have to manually edit the file to go from:

INSERT INTO "lines" ("osm_id","name","highway","waterway","aerialway","barrier","man_made","z_order","other_tags",geom)
    VALUES ('430573542','Montagne Sainte-Victoire',NULL,NULL,NULL,NULL,NULL,'0','"natural"=>"ridge"','010500002031BF0D[...]');

into:

INSERT INTO "planet_osm_line" ("osm_id","name","z_order","natural",way)
    VALUES ('430573542','Montagne Sainte-Victoire','0','ridge','010500002031BF0D[...]');

See? s/lines/planet_osm_line/, s/other_tags/"natural"/ (with double quotes, because natural is a keyword in SQL, as in natural join), s/geom/way/ and s/'"natural"=>"ridge"'/'ridge'/ (in single quotes, so it's a string; double quotes are for columns). I also removed the superfluous NULL values and the ANALYZE line, as I don't care that much. Easy peasy.

A comment on the options for shp2pgsql. -s 900913 declares the SRID of the database. I found that out when I tried without it and got:

ERROR:  Geometry SRID (0) does not match column SRID (900913)

-S is needed because shp2pgsql by default generates MultiLineStrings, but that table in particular has a LineString way column. This is how I figured it out:

ERROR:  Geometry type (MultiLineString) does not match column type (LineString)

Incredibly, after this data massacre, it loads into the db:

$ psql gis < 430573542.sql
SET
SET
BEGIN
INSERT 0 1
COMMIT

Enjoy!


openstreetmap gdal postgis

Posted Tue 12 Jul 2016 05:37:22 PM CEST Tags:

Today I stumbled upon PyCon 2016's YouTube channel and started watching some of the talks. The first one I really finished watching was Ned Batchelder's "Machete-mode debugging", a very interesting talk about 4 strange bugs and the 4 strange techniques they used to find where those bugs were produced. It's a wonderful talk, full of ideas that, if you're a mere mortal developer like me, will probably blow your mind.

One of the techniques they used for one of the bugs is to actually write a trace function. A trace function, in cpython's context, is a function that is called at several different points during the execution of Python code. For more information see sys.settrace()'s documentation.

In my case I used tracing for something that I always liked about bash: that you can ask it to print every line that's being executed (even in functions and subprocesses!). I wanted something similar for ayrton, so I sat down to figure out how this would work.

The key to all this is the function I mentioned up there. The API seems simple enough at first sight, but it's a little more complicated. You give this function what is called the global trace function. This function will be called with three parameters: a frame, an event and an event-dependent arg. The event I'm interested in is 'line', which is reported for each new line of code that is executed. The complication is that what this global trace function should return is a local trace function, which will be called with the same parameters as the global one. I would really like an explanation of why this is so.

The job of this function, in ayrton's case, is simple: inspect the frame, extract the filename and line number, and print that. At first this seems to mean that I should read the files by myself, but luckily there's another interesting standard module: linecache to the rescue. The only 'real complication' in ayrton's case is that this would not work if the script to run was passed with the -c|--script option, but (un)luckily the execution engine already has to read and hold the script in lines, so using that as the cache instead of linecache was easy.
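To make the mechanism concrete, here's a minimal sketch of such a tracer (not ayrton's actual code, just the bare API usage):

import sys
import linecache

def global_trace (frame, event, arg):
    # returning the local trace function enables 'line' events for this frame
    if event == 'call':
        return local_trace

def local_trace (frame, event, arg):
    if event == 'line':
        file_name= frame.f_code.co_filename
        line_no= frame.f_lineno
        # linecache reads and caches the source file for us
        print ('+ %s:%d: %s' % (file_name, line_no,
                                linecache.getline (file_name, line_no).rstrip ()))
    # return itself so tracing continues in this frame
    return local_trace

sys.settrace (global_trace)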

Finally, if you're interested in the actual code, go take a look. Just take into account that ayrton has 3 levels of tracing: à la bash (script lines prepended by +), with line numbers, and tracing any Python line execution, including any modules you might use and their dependencies. And don't forget that it also has 3 levels of debug logging into files. See ayrton --help!


ayrton python

Posted Thu 23 Jun 2016 08:32:12 PM CEST Tags:

ayrton has always been able to use any Python module, package or extension as long as it is in a directory in sys.path, but while trying to solve a bigger bug, I realized that there was no way to use ayrton modules or packages. Having only laterally heard about the new importlib module and the new import mechanism, I sat down and read more about it.

The best source (or at least the easiest to find) is possibly what Python's reference says about the import system, but I have to be honest: it was not an easy read. Next week I'll sit down and see if I can improve it a little. So, for those out there who, like me, might be having some trouble understanding the mechanism, here's how I understand the system works (ignoring deprecated APIs, corner cases and even relative imports; I haven't used or tried those yet):

from types import ModuleType
import sys

def import_single(full_path, parent=None, target=None):
    # try this cache first
    if full_path in sys.modules:
        return sys.modules[full_path]

    # target is the module object to reload into, if any
    module = target

    # if not, try all the finders
    for finder in sys.meta_path:
        if parent is not None:
            spec = finder.find_spec(full_path, parent.__path__, target)
        else:
            spec = finder.find_spec(full_path, None, target)

        # if the finder 'finds' ('knows how to handle') the full_path
        # it will return a loader
        if spec is not None:
            loader = spec.loader

            if module is None and hasattr(loader, 'create_module'):
                module = loader.create_module(spec)

            if module is None:
                module = ModuleType(spec.name)  # let's assume this creates an empty module object
                module.__spec__ = spec

            # add it to the cache before loading so it can be referenced from it
            sys.modules[spec.name] = module
            try:
                # if the module was passed as parameter,
                # this repopulates the module's namespace
                # by executing the module's (possibly new) code
                loader.exec_module(module)
            except:
                # clean up
                del sys.modules[spec.name]
                raise

            return module

    raise ImportError


def import_ (full_path, target=None):  # 'import' itself is a keyword, hence the trailing underscore
    parent= None

    # this code iterates over ['foo', 'foo.bar', 'foo.bar.baz']
    elems = full_path.split('.')
    for partial_path in [ '.'.join (elems[:i]) for i in range (len (elems)+1) ][1:]:
        parent = import_single(partial_path, parent, target)

    # the module is loaded in parent
    return parent

A more complete version of the if spec is not None branch can be found in the Loading section of the reference. Notice that the algorithm tries all the finders in sys.meta_path. So which are the default finders?

In [9]: sys.meta_path
Out[9]:
[_frozen_importlib.BuiltinImporter,
 _frozen_importlib.FrozenImporter,
 _frozen_importlib_external.PathFinder]

Of those finders, the last one is the one that traverses sys.path, and it also has a hook mechanism. I didn't use those, so for the moment I didn't untangle how they work.

Finally, this is how I implemented importing ayrton modules and packages:

from importlib.abc import MetaPathFinder, Loader
from importlib.machinery import ModuleSpec
import sys
import os
import os.path

from ayrton.file_test import _a, _d
from ayrton import Ayrton
import ayrton.utils


class AyrtonLoader (Loader):

    @classmethod
    def exec_module (klass, module):
        # «the loader should execute the module’s code
        # in the module’s global name space (module.__dict__).»
        load_path= module.__spec__.origin
        loader= Ayrton (g=module.__dict__)
        loader.run_file (load_path)

        # set the __path__
        # TODO: read PEP 420
        init_file_name= '__init__.ay'
        if load_path.endswith (init_file_name):
            # also remove the '/'
            module.__path__= [ load_path[:-len (init_file_name)-1] ]

loader= AyrtonLoader ()


class AyrtonFinder (MetaPathFinder):

    @classmethod
    def find_spec (klass, full_name, paths=None, target=None):
        # TODO: read PEP 420 :)
        last_mile= full_name.split ('.')[-1]

        if paths is not None:
            python_path= paths  # search only in the paths provided by the machinery
        else:
            python_path= sys.path

        for path in python_path:
            full_path= os.path.join (path, last_mile)
            init_full_path= os.path.join (full_path, '__init__.ay')
            module_full_path= full_path+'.ay'

            if _d (full_path) and _a (init_full_path):
                return ModuleSpec (full_name, loader, origin=init_full_path)

            else:
                if _a (module_full_path):
                    return ModuleSpec (full_name, loader, origin=module_full_path)

        return None

finder= AyrtonFinder ()


# I must insert it at the beginning so it goes before PathFinder
sys.meta_path.insert (0, finder)
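
With the finder in place, and assuming a hypothetical foo.ay (or foo/__init__.ay) in a directory in sys.path, importing it becomes transparent:

import foo  # found by AyrtonFinder, executed by AyrtonLoader
print (foo.__spec__.origin)  # .../foo.ay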

Notice all the references to PEP 420. I'm pretty sure I must be breaking something, but for the moment this works.


ayrton python

Posted Wed 15 Jun 2016 04:46:41 PM CEST Tags:

Remember this? Ok, maybe you never read that. The gist of the post is that I used strace -r -T to produce some logs that we «amassed[sic] [...] with a python script for generating[sic] a CSV file [...] and we got a very interesting graph». Mein Gott, sometimes the English I write is terrible... Here's that graph again:

This post is to announce that that Python script is now public. You can find it here. It's not as fancy as those flame graphs you see everywhere else, but it makes a good first impression, especially if you have to wait until the SysAdmin installs perf or any other tool like that (ok, let's be fair, l/strace is not a standard tool either, but your grumpy SysAdmin will probably be more willing to install those than something more intrusive; I know it happened to me, at least). It's written in Python3; I'll probably backport it to Python2 soon, so those stuck with it can still profit from it.

To produce a similar graph, use the --histogram option, then follow the suggestions spewed to stderr. I hope this helps you solve a problem like it did for me!


profiling python

Posted Thu 09 Jun 2016 03:27:24 PM CEST Tags:

A short story. For years I've not only accumulated thousands of pictures[1] (around 50 thousand, to be more precise) but also schemas on how to sort those photos. After a long internal debate, I settled on the following one (for the moment, that is):

  • Pictures are imported from the camera's SD via a script, which:
    • Renames them from DSCXXXXX.JPG to a date based file name, using the picture's exif data.
    • (Hard) links them to a year/month based dir structure.
  • Later, pictures are sorted by hand into thematic dirs for filtering.
  • The year/month tree is handled with digikam for tagging.
  • Pictures are filtered into their final destination sorted by category, year and event (for instance, trip/year/to_this_place).
  • Pictures are also (hard) linked into a tag based dir structure, using nested tags.

So, the whole workflow is:

SD card --> incoming/01-tmp -> incoming/02-new/theme -> incoming/03-cur --> final destination
        `-> year/month                                                  `-> tags/theme/parent/child

The reason for using hard links is the following: I rsync everything to my home server, both as backup and for feeding the gallery(s) there. Because pictures are moved from one location to another until they reach their final destination, rsync retransmits each picture in its new location and then deletes the old one (I'm using --delete-after to make sure the backup does not lose any picture if the transfer is stopped). This leads to useless transfers, as the picture is already in the remote.

I played with the idea of using git or even git-annex to work around this, but in the end I decided to (ab)use rsync's hard link support. Now moving any picture in the workflow or renaming a category, theme or directory just means creating new hard links to the links in the year/month tree and removing the old ones later, an almost immediate operation. This also helps save space and time when implementing the tag based tree.
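
To make the mechanism concrete, moving a picture within the workflow boils down to something like this (hypothetical paths):

import os

src= 'ByDate/2016/06/20160606-123456.jpg'
dst= 'trip/2016/to_this_place/20160606-123456.jpg'

os.makedirs (os.path.dirname (dst), exist_ok=True)
os.link (src, dst)  # new hard link: same inode, no extra space used
# later, the old link can be removed without losing the file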

digikam is good enough to uniquely identify each picture, even when two hard links point to the same file. This still means the picture appears several times; the metadata (most importantly, tags) is shared, but each new link adds load to digikam's database.

I bit the bullet and sat down to do one last move and be done with it. I moved the year/month tree to ByDate/, completely isolating it from the rest of the collection, then pointed digikam to read only that. Here's how I did it:

  • Closed digikam, of course.
  • Backed up everything, including both the database, which was in the collection's root, and the .local/share/config/digikamrc file.
  • Modified the latter so it points to the new database location:
[Database Settings]
Database Name=/home/mdione/Pictures/ByDate/
Database Name Thumbnails=/home/mdione/Pictures/ByDate/
  • Moved the database.
  • Changed the collection's root dir in the database:
mdione@diablo:~/Pictures/ByDate$ sqlite3 digikam4.db
-- Notice the specificPath is relative to the volume. In my case, the volume is /home
sqlite> select * from AlbumRoots;
1|Pictures|0|1|volumeid:?uuid=f5cadc44-6324-466c-8a99-4ede7457677e|/mdione/Pictures
sqlite> update AlbumRoots set specificPath = '/mdione/Pictures/ByDate' where id = 1;
sqlite> select * from AlbumRoots;
1|Pictures|0|1|volumeid:?uuid=f5cadc44-6324-466c-8a99-4ede7457677e|/mdione/Pictures/ByDate
  • Fired up digikam and let it rescan the collection, which recognized the only link to each image and kept the tags.

This worked superbly!


[1] When I talk about pictures, I'm also including videos.

Posted Mon 06 Jun 2016 04:59:11 PM CEST

I've been improving Elevation's reproducibility a little. One of the steps of setting it up is to download an extract, both to import in the database and to fetch the DEM files that will be part of the background. The particular extract that I'm using, Europe, is more than 17GiB in size, which means it takes a looong time to download. Thus, I would like to have the ability to continue the download if it has been interrupted.

The original script that was trying to do that uses curl. That version does not try to continue the download, which can easily be achieved by adding the --continue-at - option. The version that has it never hit the repo because of the following:

The problem arises when the file we want to download is rolled every day. This means that the contents of the file change from one day to the next, and in that case we can't just continue from where we left off; we must start all over[1]. One could think that curl has an option that looks like it handles that, --time-cond, which is what the script was trying to use. This option makes curl send the If-Modified-Since HTTP header, which allows the server to respond with a 304 (Not Modified) if the file is not newer than the provided date. The date curl provides is the one from the file referenced by that option, and I was giving it the same file where the output goes. I was using these options wrong; combined, they do the exact opposite: continue if the file changed, or do nothing if not.

So I sat down to try and tackle the problem. I know one can use a HEAD request to check (at least) two things: the resource's date and size (bah, at least in the case of static files like this). So the original idea was to get the URL's date and size: if the date is newer than the local file's, I should restart the download from scratch; if not and the size is bigger than the local file's, then continue; otherwise, assume the file has finished downloading and stop there.

The last twist of the problem is that the only useful dates for the local file are either its ctime or its mtime, but both change on every write to the file. This means that if I leave the script downloading the file, the file is rotated in the meanwhile, the download is interrupted and I try again later, then the local file's c/mtime is newer than the URL's, even though it is for a file that is older than the URL's. So I had to add a parallel timestamp file that is created only when starting a download and never updated (until the next full download; the file is actually touch'ed), and it is its mtime that is compared with the URL's.
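
Here's a minimal sketch of that decision logic, using urllib and hypothetical file names; the real script is written in ayrton and handles more details:

import os.path
import urllib.request
from email.utils import parsedate_to_datetime

def plan (url, local_file, stamp_file):
    request= urllib.request.Request (url, method='HEAD')
    response= urllib.request.urlopen (request)
    remote_size= int (response.headers['Content-Length'])
    remote_date= parsedate_to_datetime (response.headers['Last-Modified']).timestamp ()

    if not os.path.exists (local_file):
        return 'download'

    # the stamp file's mtime is set when the download starts and never touched again
    if remote_date > os.path.getmtime (stamp_file):
        return 'restart'  # the remote file was rotated under us

    if remote_size > os.path.getsize (local_file):
        return 'continue'

    return 'done'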

Long story short, curl's --time-cond and --continue-at options combined are not for this; a HEAD request helps a little bit, but rotation-while-downloading can further complicate things. One last feature one could ask of such a script would be to keep the old file while downloading a new one and rotate at the end, but I will leave that for when/if I really need it. The new script is written in ayrton because it's easier to handle execution output and dates in it than in bash. This also pushed me to make minor improvements to it, so expect a release soon.


[1] In fact the other options are to not do anything (but then we're left with an incomplete, useless file) or to try and find the old file; in the case of geofabrik, they keep the last week of daily rotations, then the first day of each previous month back to the beginning of the year, then the first day of each year back to 2014. Good luck with that.


elevation ayrton

Posted Tue 10 May 2016 05:30:28 PM CEST Tags:

Yesterday I climbed Cime du Cherion and to my surprise I saw Corsica[0]. Then a friend of mine pointed me to an article explaining that if you manage to see the island from the coast, it's because of a mirage in a dry air layer 1000m high due to the Föhn effect. It's notable that the French Wikipedia article about this effect is way more complete than the English one.

Punta Minuta (2556m) is one of the highest points in Corsica close to the northwestern coast. Cime du Cherion is 1778m. The distance between them is[1]:

surface_distance= 225.11km

Earth's mean radius[2] is:

km_per_radian= 6371km

which is also, by definition, the length of an arc of one radian on the theoretical surface of the Earth[3]. Those two mountains are then separated by an angle of:

alpha= 225.11km/6371km= 0.035333 radians.

or a little more than 2°[4]. According to this, the sagitta is then:

sagitta= km_per_radian*(1-math.cos (alpha/2))= 0.994215km, or 994.215m.

This means that it is possible to see the last ~1.5km of Punta Minuta from Cime du Cherion, and almost anything above around 1000m, which is quite a lot of Corsica, but definitely not what I saw.

In conclusion, we were both right, but him more than me :) And yes, I'm ignoring that there is an angle between both points; if we take that into account and assume that Cime du Cherion is at 0°, then the projection of Punta Minuta over the secant that passes through those points is:

projection= math.sin (0.035333)/0.035333*2556m= 2555.46m

A difference of a little over half a meter :) It doesn't really change the calculations much.
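
All the numbers above can be checked with a few lines of Python:

import math

km_per_radian= 6371        # Earth's mean radius, in km
surface_distance= 225.11   # km between the two peaks

alpha= surface_distance/km_per_radian            # 0.035333 radians
sagitta= km_per_radian*(1-math.cos (alpha/2))    # 0.994215 km
projection= math.sin (alpha)/alpha*2556          # 2555.46 m

print (alpha, sagitta, projection)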

Lastly, a graph showing the height of the sagitta as a function of the distance; quite surprising!


[0] Name in corsican :)

[1] Measured with marble.

[2] From the same page, polar radius is 6356.8km and equatorial is 6378.1km. We're measuring points between 42°20' and 43°50'N, so using the median is not that crazy.

[3] Don't go there.

[4] Another fun fact: 1° is about 111km.


misc

Posted Fri 15 Apr 2016 02:47:19 PM CEST Tags:

In a shallow but long yak shaving streak, I ended up learning Ruby (again). Coming from a deep Python background (but also Perl and others), I sat down to write a cheatsheet so I can come back to it time and again:

module Foo  # this is the root of this namespace
# everything defined here must be referenced as Foo::thing, as in
Foo::CamelCase::real_method()

:symbol  # symbols are the simplest objects, which only have a name and a unique value
:'symbol with spaces!'

"#{interpolated_expression}"  # how to iterpolate expressions in strings

/regular_expression/  # very Perl-ish

generator { |item| block }  # this is related to yield

%q{a single-quoted string}  # à la Perl's q{}!
%w{an array of words}  # like Perl's qw{}

def classless_method_aka_function(default=:value)  # Ruby calls these methods too
    block  # ruby convention indents by 2!
end

method_call :without, :parens

class CamelCase < SuperClass  # simple inheritance
    include Bar  # this is a mixin;
    # Bar is a module and the class 'inherits' all its 'methods'

    public :real_method, :assign_method=
    protected :mutator_method!
    private :query_method?

    self  # here is the class!
    def real_method(*args)  # splat argument, can be anywhere
        # no **kwargs?
        super  # this calls SuperClass::real_method(*args)
        # compare with
        super()  # this calls SuperClass::real_method()!

        local_variable
        @instance_variable  # always private
        @@class_variable  # always private
        $global_variable

        return self
        # alternatively
        self
        # as the implicit return value is the last statement executed
        # and all statements produce a value
    end

    def assign_method=()
        # conventionally for this kind of syntactic sugar:
        # When the interpreter sees the message "name" followed by " =",
        # it automatically ignores the space before the equal sign
        # and reads the single message "name=" -
        # a call to the method whose name is name=
    end

    class << self
        # this is in metaclass context!
    end

    protected
    def mutator_method!(a, *more, b)
        # conventionally this modifies the instance
    end

    private
    def query_method?()
        # conventionally returns true/false
    end
end

# extending classes
class CamelCase  # do I need to respecify the inheritance here?
    def more_methods ()
    end
end

obj.send(:method_name_as_symbol, args, ...)

begin
    raise 'exceptions can be strings'
rescue OneType => e  # bind the exception to e
    # rescued
rescue AnotherType
    # also
else
    # fallback; runs only if nothing was raised
ensure
    # finally; always runs
end

=begin
Long
comment
blocks!
=end

statement; statement

long \
line

# everything is true except
false
# and
nil

variable ||= default_value

`shell`

AConstant  # technically class names are constants
# as are module names
A_CONSTANT  # conventionally; always public
# The Ruby interpreter does not actually enforce the constancy of constants,
# but it does issue a warning if a program changes the value of a constant

# case is an expression
foo = case
    when true then 100
    when false then 200
    else 300
end

do |args; another_local_variable|
    # args are local variables of this block
    # whose scope ends with the block
    # and which can eclipse another variable of the same name
    # in the containing scope

    # another_local_variable is declared local but does not
    # consume parameters
end

{ |args| ... }  # another block, conventionally single lined

# Matz says that any method can be called with a block as an implicit argument.
# Inside the method, you can call the block using the yield keyword with a value.

# Matz is Joe Ruby

# yield is not what python does
# see http://rubylearning.com/satishtalim/ruby_blocks.html
# block_given?

a= []  # array
a[0] == nil

ENV  # hash holding envvars
ARGV  # array with CL arguments

(1..10)  # range
(0...10)  # python like
(1..10) === 5  # true, the 'case equality' operator (the Range includes 5)

{ :symbol => 'value' } == { symbol: 'value' }  # hashes, not blocks :)

lambda { ... }  # convert a block into a Proc object

# you cannot pass methods into other methods (but you can pass Procs into methods),
# and methods cannot return other methods (but they can return Procs).

load 'foo.rb'  # #include like
require 'foo'  # import, uses $:
require_relative 'bar'  # import from local dir

It's not complete; in particular, I didn't want to go into depth on what yield does (hint: not what it does in Python). I hope it's useful to others. I strongly recommend reading this tutorial.

Also, brace yourselves; my impression is that Ruby is not as well documented as we're used to in Python.


python

Posted Thu 31 Mar 2016 06:33:50 PM CEST Tags:

ayrton is a modification of the Python language that tries to make it look more like a shell programming language. It takes ideas already present in sh, adds a few functions for better emulating envvars, and provides a mechanism for (semi) transparent remote execution via ssh.

A small update on v0.7.2:

  • Fix iterating over the log output of a Command in synchronous mode (that is, not running in the _bg). This complements the fix in the previous release.

Get it on github or pypi!


python ayrton

Posted Fri 26 Feb 2016 02:01:43 PM CET Tags:

This time we focused on making ayrton, and the scripts it runs, more debuggable. Featurewise, this release fixes a couple of bugs: one when executing remote code with the wrong Python version and another when iterating over long outputs. The latter still needs more work so it becomes automatic. Here's the ChangeLog:

  • Fix running remote tests with other versions of Python.
  • Fix tests broken by a change in ls's output.
  • Fix iterating over the long output of a command à la for line in foo(...): .... Currently you must add _bg=True to the execution options.
  • Fix recognizing names bound by for loops.
  • Added options -d|--debug, -dd|--debug2 and -ddd|--debug3 for enabling debug logs.
  • Added option -xxx|--trace-all for tracing all python execution. Use with caution, it generates lots of output.

Get it on github or pypi!


python ayrton

Posted Thu 25 Feb 2016 01:20:07 PM CET Tags:

A weird release, written from Russia via ssh on a tablet. The changelog is enough to show what's new:

  • Iterable parameters to executables are expanded in situ, so foo(..., i, ...) is expanded to foo (..., i[0], i[1], ... and foo (..., k=i, ...) is expanded to foo (..., k=i[0], k=i[1], ....
  • -x|--trace allows for minimal execution tracing.
  • -xx|--trace-with-linenos allows for execution tracing that also prints the line number.

Get it on github or pypi!


python ayrton

Posted Wed 10 Feb 2016 11:15:38 AM CET Tags:

For a long time I've been searching for a program that would allow me to plan (car) trips with my friends. Yes, I know of the existence of Google Maps, but the service has several characteristics that don't make it appealing to me, and it lacks a couple of features I expect. This is more or less the list of things I want:

  1. Define the list of points I want to go to. No-brainer.
  2. Define the specific route I want to take. This is normally implemented by adding more control points, but normally they're of the same category as the waypoints of the places you want to visit. I think they shouldn't be.
  3. Define stages; for instance, one stage per day.
  4. Get the distance and time of each stage; this is important when visiting several cities, for having an idea of how much time during the day you'll spend going to the next one.
  5. Define alternative routes, just in case you don't really have/make the time to visit some points.
  6. Store the trips in cookies, or share them via a URL or a central site, one that anybody can easily install on their own server.
  7. Manage several trips at the same time.

So I sat down to try and create such a thing. Currently it's just a mashup of several GIS things: my own OSM data rendering, my own waypoints-in-cookies idea (in fact, this is the expansion of what fired that post) and OSRM for the routing. As for the backend, I decided to try flask and flask-restful for creating a small REST API for storing all this. So far some basics work (points #1 and #6, partially), and I had some fun during the last week learning RESTful, some more Javascript (including Leaflet and some jQuery) and putting all this together. Here are some interesting things I found out:

  • RESTful is properly defined, but not for all URL/method pairs. In particular, given that I decided that trips' ids are their names, I defined a POST to trips/ as the UPSERT for that name. I hope SQLAlchemy implements it soon.
  • Most of the magic of RESTful APIs happens in the model of your service.
  • Creating APIs with flask-restful could not be more obvious (see the sketch right after this list).
  • I still have to get my head around Javascript's prototypes.
  • Mouse/finger events are a nightmare in browsers. In particular, with current Leaflet, you get click events on double clicks, unless you use the appropriate singleclick plugin from here.
  • Because of XSS attacks, the same-origin policy is enforced for AJAX requests. If you control the web service, the easiest way to work around it is CORS.
  • The only way to do such calls with jQuery is using the low level function $.ajax().
  • jQuery provides a function to parse JSON but not to serialize to it; use window.JSON.stringify().
  • Javascript's default parameters were not recognized by my browser :(.
  • OSRM's viaroute returns the coordinates multiplied by 10 for precision reasons, so you have to scale it down.
  • Nominatim and OSRM rock!
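
Regarding that flask-restful point, here's a minimal, hypothetical sketch (not the actual trip-planner code) of a trips resource where POST acts as the UPSERT I mentioned:

from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

trips = {}  # in-memory stand-in for the real storage

class Trip(Resource):
    def get(self, name):
        return trips[name]

    def post(self, name):
        # POST as UPSERT: create or replace the trip stored under this name
        trips[name] = request.get_json()
        return trips[name], 201

api.add_resource(Trip, '/trips/<string:name>')

if __name__ == '__main__':
    app.run()

POSTing a JSON body to /trips/<name> then creates or replaces that trip, which is all the model needs for now.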

I still have lots of things to learn and finish, so stay tuned for updates. Currently the code resides in Elevation's code, but I'll split it in the future.

Update:

I have it running here. You can add waypoints by clicking on the map, delete them by double-clicking them, save to cookies or to the server (for the moment it overwrites what's there, as you can't name trips or manage several yet) and ask for the routing.

trip-planner elevation openstreetmap osrm python flask leaflet javascript jquery

Posted Mon 25 Jan 2016 03:52:06 PM CET Tags:

Ever since I started working on Elevation I have faced the problem that it's mostly an openstreetmap-carto fork. This means that each time osm-carto has new changes, I have to adapt mine. We all know this is not an easy task.

My first approach was to turn to osm-carto's VCS, namely, git. The idea was to keep a branch with my version, then pull from time to time, and merge the latest released version into my branch, never merging mine into master, just in case I decided to do some modifications that could benefit osm-carto too. In that case, I would work on my branch, make a commit there, then cherry pick it into master, push to my fork in GitHub, make a Pull Request and voilà.

All this theory is nice, but in practice it doesn't quite work. Every time I tried to merge the release into my local branch, I got several conflicts, not to mention modifications that made some of my changes obsolete or at least forced me to refactor them in the new code (this is the developer in me talking, you see...). While I resolved those conflicts and refactorings, the working copy's state was a complete mess, forcing me to fix everything just to be able to render again.

As this was not a very smooth workflow, I tried another approach: keeping my local modifications in a big patch. This of course had the same problems as the previous approach, plus others, so I gained nothing but more headaches.

Then I thought: who else manages forks, and at a massive scale? Linux distributions. See, distros have to patch the packaged software to compile on their environments. They also keep security patches that are also sent upstream for inclusion. Once a patch is accepted upstream, they can drop their local patch. This sounds almost exactly like the workflow I want for Elevation.

And what do they use to manage the patches? quilt. This tool is heavily used in the Debian distro and is maintained by someone working at SuSE. Its documentation is a little bit sparse, but the tool is very simple to use. You start by doing quilt init, which just creates the directories and files that quilt will use to keep track of your changes.

The modification workflow is a little bit more complex than with git:

  1. You mark the beginning of a new patch with quilt new <patch_name>;
    1. Then either tell quilt to track the files you will modify for this patch with quilt add <file> ... (in fact it just needs to be told before you save the new version of each file, because it will save the current state of the file for producing the diff later),
    2. Or use quilt edit <file> to do both things at the same time (it will use $EDITOR for editing the file);
  2. Then do your changes;
  3. Check everything is ok (it compiles, passes tests, renders, whatever);
  4. And finally record the changes (in the form of a patch) with quilt refresh.

In fact, these last 4 items (add/edit, change, test, refresh) can be done multiple times, and they will all affect the current patch.

Why do I say current? Because quilt keeps a stack of patches, in what it calls a series. Every time you do quilt new, a new patch is put on top of the stack and into the series just behind the current patch; all the other commands affect the patch that is currently on top. You can 'move' through the series with quilt pop [<patch_name>] and quilt push [<patch_name>]. If a patch name is provided, quilt will pop/push all the intermediate patches in the series.

How does this help with my fork? It actually does not save me from conflicts and refactorings, it just makes these problems much easier to handle. My update workflow is the following (which, not incidentally, mimics the one Debian Developers and Maintainers follow every time they update their packages):

  1. I quilt pop -a so I go back to the pristine version, with no modifications;
  2. I git pull to get the newest version, then git tag and git checkout <last_tag> (I just want to keep in sync with releases);
  3. quilt push -a will try to apply all the patches. If one fails, quilt stops and lets me check the situation.
    1. quilt push -f will try harder to apply the troubling patch; sometimes it's just a matter of too many offset lines or too much fuzziness needed.
    2. If it doesn't apply, a .rej will be generated and you should pick up from there.
    3. In any case, once everything is up to date, I need to run quilt refresh and the patch will be updated[1].
    4. Then I try quilt push -a again.
    5. If a patch is no longer useful (because it was fixed upstream or because it doesn't make any sense), then I can simply quilt delete <patch>; this will remove it from the series, but the patch file will still exist (unless I use the option -r[2]).

As long as I keep my patches minimal, chances are they will be easier to integrate into new releases. I can even keep track of the patches in my own branch without fearing conflicts, and it allows me to independently provide fixes upstream.

Let's see how this works in the future; at least in a couple of months I will be rendering again. Meanwhile, I will be moving to a database update workflow that includes pulling diffs from geofabrik.


[1] Being an old timer VCS user (since cvs times), I wish they would call this command update.

[2] Why not --long-options?


elevation openstreetmap utils

Posted Wed 30 Dec 2015 06:37:04 PM CET Tags:

Another long-ish cycle (1.5 months, more or less). That's what two weeks of vacation do to the project.

This time I fixed executing things in the remote(), and handling the standard streams between it and the ayrton script, so now we can run complex programs like vi and mc. The ChangeLog:

  • Send data to/from the remote via another ssh channel, which is more stable than using stdin.
  • Stabilized all the tests a lot, especially those using a mocked stdout for test validation.
  • A lot of tests have been moved to their own scripts in ayrton/tests/scripts, which also work as (very minimal) examples of what's working.
  • Use flake8 to check the code.
  • Move remote() to its own source.
  • API change: if a str or bytes object is passed in _in, then it's the name of a file from which to read stdin. If it's an int, then it's considered a file descriptor. This makes the API consistent with the _out and _err handling.
  • More error handling.
  • Fixed errors with global variables handling.
  • argv is handled at the latest time possible, allowing it to be passed from the test invocation.
  • shift complains on negative values.
  • Lazy pprint(), so debug statements do not do useless work.
  • stdin/out/err handling in remote() is done by a single thread.
  • Heavily modify the local terminal when in remote() so, among other things, we have no local echo.
  • Properly pass the terminal type and size to the remote. These last three features allow programs like vi to be run in the remote.
  • Paved the road to make remote()s more like Command()s.

Get it on github or pypi!


python ayrton

Posted Wed 09 Dec 2015 04:32:29 PM CET Tags:

One of ayrton's features is the remote execution of code and programs via ssh. For this I initially used paramiko, which is a complete reimplementation of the ssh protocol in pure Python. It manages to connect, authenticate and create channels and port forwardings with any recent ssh server, and it's quite easy:

import paramiko

c= paramiko.SSHClient ()
c.connect (...)

# get_pty=True so we emulate a tty and programs like vi and mc work
i, o, e= c.exec_command (command, get_pty=True)

So far so good, but the interface is those 3 objects, i, o and e, that represent the remote command's stdin, stdout and stderr. If one wants to fully implement a client, one needs to copy everything between the local process' standard streams and those.

For this, the most brute force approach is to create a thread for each pair of streams[1]:

from threading import Thread

class CopyThread (Thread):
    def __init__ (self, src, dst):
        super ().__init__ ()
        self.src= src
        self.dst= dst

    def run (self):
        while True:
            data= self.src.read (1024)
            if len (data)==0:
                break
            else:
                self.dst.write (data)

        self.close ()

    def close (self):
        self.src.close ()
        self.dst.close ()

For some reason this does not work right off the bat. When I implemented it in ayrton, what I got was that I didn't get anything from stdout or stderr until the remote code had finished. I tiptoed around the problem a little, but in the end I took a cue from one of paramiko's examples and implemented a single copy loop with select():

import os
from select import select

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()
        self.pairs= pairs
        self.copy_to= dict (pairs)
        self.finished= os.pipe ()

    def run (self):
        while True:
            wait_for= list (self.copy_to.keys ())
            wait_for.append (self.finished[0])
            r, w, e= select (wait_for, [], [])

            if self.finished[0] in r:
                os.close (self.finished[0])
                break

            for i in r:
                o= self.copy_to[i]
                # sources and destinations are a mix of plain file descriptors
                # (0, 1, 2) and paramiko's file-like objects
                data= os.read (i, 1024) if isinstance (i, int) else i.read (1024)
                if len (data)==0:
                    # do not try to read any more from this file
                    del self.copy_to[i]
                else:
                    if isinstance (o, int):
                        os.write (o, data)
                    else:
                        o.write (data)

        self.close ()


    def close (self):
        for k, v in self.pairs:
            for f in (k, v):
                if isinstance (f, int):
                    os.close (f)
                else:
                    f.close ()

        os.close (self.finished[1])


t= InteractiveThread (( (0, i), (o, 1), (e, 2) ))
t.start ()
[...]
t.close ()

The extra pipe, finished, is there to make sure we don't wait forever for stdin to finish.

This completely solves the problem of handling the streams, but that's not the only problem. The next step is to handle the fact that when we type some input via stdin, we see it twice. This is because both the local and the remote terminals are echoing what we type, so we just need to disable the local echoing. In fact, ssh does quite a bit more than that:

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()

        [...]

        self.orig_terminfo= tcgetattr (pairs[0][0])
        # input, output, control, local, speeds, special chars
        iflag, oflag, cflag, lflag, ispeed, ospeed, cc= self.orig_terminfo

        # turn on:
        # Ignore framing errors and parity errors
        iflag|= IGNPAR
        # turn off:
        # Strip off eighth bit
        # Translate NL to CR on input
        # Ignore carriage return on input
        # XON/XOFF flow control on output
        # (XSI) Typing any character will restart stopped output. NOTE: not needed?
        # XON/XOFF flow control on input
        iflag&= ~( ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXANY | IXOFF )

        # turn off:
        # When any of the characters INTR, QUIT, SUSP, or DSUSP are received, generate the corresponding signal
        # canonical mode
        # Echo input characters (finally)
        # NOTE: why these three? they only work with ICANON and we're disabling it
        # If ICANON is also set, the ERASE character erases the preceding input character, and WERASE erases the preceding word
        # If ICANON is also set, the KILL character erases the current line
        # If ICANON is also set, echo the NL character even if ECHO is not set
        # implementation-defined input processing
        lflag&= ~( ISIG | ICANON | ECHO | ECHOE | ECHOK | ECHONL | IEXTEN )

        # turn off:
        # implementation-defined output processing
        oflag&= ~OPOST

        # NOTE: whatever
        # Minimum number of characters for noncanonical read
        cc[VMIN]= 1
        # Timeout in deciseconds for noncanonical read
        cc[VTIME]= 0

        tcsetattr(self.pairs[0][0], TCSADRAIN, [ iflag, oflag, cflag, lflag,
                                                 ispeed, ospeed, cc ])


    def close (self):
        # reset term settings
        tcsetattr (self.pairs[0][0], TCSADRAIN, self.orig_terminfo)

        [...]

I won't pretend I understand all of that. Checking the file's history, I'm tempted to bet that not even the openssh developers do. I would even bet that it was taken from a telnet or rsh implementation or something. This is the kind of thing I meant when I wrote my previous post about implementing these complex pieces of software as a library with a public API and a shallow frontend in the form of a program. At least the guys from openssh say that they're going in that direction. That's wonderful news.

Almost there. The last stone in the way is the terminal emulation. As is, SSHClient.exec_command() tells the other end that we're running in an 80x24 vt100 terminal. Unluckily the API does not allow us to set it ourselves, but exec_command() is a very simple method that we can rewrite:

import os
import shutil

channel= c.get_transport ().open_session ()
term= shutil.get_terminal_size ()
channel.get_pty (os.environ['TERM'], term.columns, term.lines)
channel.exec_command (command)  # and then actually run the command

Reacting to SIGWINCH and changing the terminal's size is left as an exercise for the reader :)
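
If you're impatient, a possible starting point for that exercise could look like this (a sketch; it assumes channel is still in scope):

import signal
import shutil

def handle_winch (signum, frame):
    # propagate the new local terminal size to the remote pty
    size= shutil.get_terminal_size ()
    channel.resize_pty (size.columns, size.lines)

signal.signal (signal.SIGWINCH, handle_winch)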


[1] In fact this might seem slightly wasteful, as data has to be read into user space and then pushed back down to the kernel. The problem is that os.sendfile() only works if src is a kernel object that supports mmap(), which sockets don't, and even when splice() is available in a 3rd party module, one of the parameters must be a pipe. There is at least one huge thread spread over 4 or 5 kernel mailing lists discussing widening the applicability of splice(), but to be honest, I haven't finished reading it.


python ayrton

Posted Wed 09 Dec 2015 11:49:03 AM CET Tags:

In my last job I had to write a complex Python script for merging several git histories into one. I used Python because I needed to do a lot of high level stuff I was sure bash would be a pain to use for, like building trees and traversing them. My options for managing the underlying git repositories were two: either do an ugly hack to execute git and parse its output, or use the ugly hack that already exists, called GitPython. The first was not an option, and the second meant that in some corner cases I had to rely on it just to execute particular git invocations. It was not pleasant, but it somehow worked.

While developing ayrton I'm using paramiko as the ssh client for implementing semi transparent remote code execution. The problem I have with it is that it's mostly aimed at executing commands almost blindly, with not much interaction. It only makes sense: its main client is fabric, which mostly uses it in that context. ayrton aims to have an ssh client as transparent as the original openssh client, but bugs like this and this are in the way.

What do those two situations have in common? Well, in both cases there is an incomplete Python library trying to emulate an existing program. At least in the case of GitPython there is a backdoor to call git directly. My complaint is not their incompleteness, far from it, but the fact that they have to do everything from scratch. It's because of that that they're incomplete.

Take ayrton, for instance. It's mostly an executable that serves as an interpreter for scripts written in that language (dialect?), but its implementation is such that the executable itself barely handles command line options and then calls a library. That library implements everything ayrton does for interpreting the language, to the point where most unit tests use the ayrton library for executing ayrton scripts. ayrton is not alone; others do similarly: fades, and at some point all those other Python modules like timeit or unittest.

So that's my wish for this Christmas, or Three Wise Men day[1], or my birthday next month; I would even accept it as an Easter egg: have all these complex pieces of software implemented mainly as a public library (even if the API changed a lot, though right now it should be fairly stable) and with very thin frontends as executables. I wish for libgit and libssh and their Python bindings.


[1] In my culture, kids get presents that day too.


python rants

Posted Wed 02 Dec 2015 05:03:24 PM CET Tags:

Last weekend I was at PyCon.ar in Mendoza, Argentina. As always, it was a good opportunity to find old and new friends; learn more about python, technology and other things; and this time I even gave a talk.

I went to see several talks, but they were not recorded, so I have no links to videos to provide. The highlight for me was Argentina En Python's and DjangoGirls' Django tutorial. It was a very good taste of what the former are doing all over South America, which is simply incredible.

The talk I gave was actually heavily based on/stolen from a talk by A. Jesse Jiryu Davis called How Do Python Coroutines Work?. I recommend you watch it because it's amazing. That was one of the things I forgot to mention during the talk; the other one is that the classes Future and Task developed in the live-coding session[1] resemble a lot the ones the asyncio module offers, so their introduction is completely deliberate, even when they're swept under the rug quickly. Thanks to @hernantz, I was reminded of the article/book chapter that Davis and GvRossum wrote about asyncio.

I also used a lightning talk slot for promoting ayrton and showing a little Elevation. I just put the few slides online here.

So all in all, it was once more an amazing experience. Crossing my fingers, see you next year!


[1] The first rule about giving a talk is that you never do live coding. Some are just too stubborn...


python

Posted Sat 21 Nov 2015 04:39:25 PM CET Tags:

Almost two months since the last release, and with reason: I've been working hard to define remote()'s semantics. So far this is what there is:

  • Global and current scope's local variables can be used in the remote code. This includes envvars.
  • Changes to the local variables return to the local code.
  • Execution is done synchronously.

What this means is that if we had the following code:

[block 1]
with remote (...):
    [block 2]
[block 3]

we have the following:

  • Variables in block 1's scope are visible in block 2.
  • Modification to the local scope in block 2 are visible in block 3.
  • block 3 does not start executing until block 2 has finished.
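
For example (with a hypothetical host, just to illustrate the three points):

a= 42                   # block 1
with remote ('hostname'):
    b= a+1              # block 2 sees block 1's a
print (b)               # block 3 runs only after block 2 finishes, and sees b == 43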

This imposes some limitations on how we can communicate with the remote code. As execution is synchronous, we can't expect to be able to send and receive data from block 3, so the previous way of communicating with paramiko's streams is no longer possible. On the other hand, stdin, stdout and stderr are not transmitted yet between the local and the remote, which means that currently no communication is possible except via variables.

Also, because of the way remote() is implemented, functions, classes and modules that are going to be used in the remote code currently must be declared/imported there.

Finally, exceptions raised in the remote code will be reraised in the local code. These two things together mean that any custom exception raised in the remote must be declared twice in the script :(.

But all in all I'm happy with the new, defined semantics. I worked a lot to make sure the first two points work properly. It took me a while to figure out how to apply the changes to the scope's locals after the new values were returned from the remote. I know this is a very specific use case, but if you're interested, here's the thread where Armin Rigo tells me how it's done. You might also be interested in Yaniv Ankin's Python innards.

Finally, check the full ChangeLog:

  • Great improvements in remote()'s API and semantics
    • Made sure local variables go to and come back from the remote.
    • Code block is executed synchronously.
    • For the moment the streams are no longer returned.
    • _python_only option is gone.
    • Most tests actually connect to a listening netcat, only one test uses ssh.
  • Fixed bugs in the new parser.
  • Fixed globals/locals mix up.
  • Scripts are no longer wrapped in a function. This means that you can't return values and that module semantics are restored.
  • ayrton exits with status 1 when the script fails to run (SyntaxError, etc).

More fun times are coming!

Get it on github or pypi!


python ayrton

Posted Wed 28 Oct 2015 11:25:15 PM CET Tags:

I forgot to mention: last night I finally got to release ayrton-0.5. This has a major update to the language, thanks to our new parser, craftily thieved out of pypy. Other similar changes might come soon. Meanwhile, here's the ChangeLog:

  • Much better command detection.
  • CommandNotFound exception is now a subclass of NameError.
  • Allow Command keywords to be named like -l and --long-option, so it supports options with single dashes (-long-option, à la find).
  • This also means that long-option is no longer passed as --long-option; you have to put the dashes explicitly.
  • bash() does not return a single string by default; override with single=True.
  • Way more tests.
  • Updated docs.

Get it on github or pypi! You can always find everything about ayrton in its GitHub page.


python ayrton

Posted Mon 31 Aug 2015 08:46:27 PM CEST Tags:

Nice tricks I found out trying to unfuck my laptop's setup, all my fault:

  • You can use snapshot.debian.org to recover packages from any date for any release that was available at that date. I actually knew this, but somehow I forgot. I used deb http://snapshot.debian.org/archive/debian/20150720T214439Z/ testing main.

  • For that you have to disable the Packages-file-too-old check, which I had never seen before, ever. Put this in any file in your /etc/apt/apt.conf.d dir:

Acquire {
    Check-Valid-Until "false";
}
  • aptitude has a menu bar (activate with C-t), a preferences dialog, and you can set it up so any operation with a package moves the cursor down. I finally figured that out.

  • It also has a dselect theme, but I was not brave enough to try it (for the record, I love dselect; I miss the fact that it shows how dependencies are resolved at the moment they're needed).

  • You can disable aptitude's resolver (-o Aptitude::ProblemResolver::StepLimit=0), but it doesn't make the UI that much more responsive (???).

  • digikam is not in testing right now. It FTBFS with gcc5 and has a licence problem.

  • Don't ride Debian sid right now, it's suffering a gcc transition and it might take a while.


debian

Posted Mon 31 Aug 2015 08:46:27 PM CEST Tags:
