glob no es un blog. No en el sentido corriente de la palabra. Es un registro de mis proyectos y otras interacciones con el software libre.

glob is not a blog. Not in the common meaning of the word. It's a record of my projects and other interactions with libre software.

All texts CC BY Marcos Dione


I've been improving Elevation's reproducibility a little. One of the steps of setting it up is to download an extract, both to import into the database and to fetch the DEM files that will be part of the background. The particular extract that I'm using, Europe, is more than 17GiB in size, which means that it takes a looong time to download. Thus, I would like to have the ability to continue the download if it has been interrupted.

The original script doing that download uses curl. That version does not try to continue an interrupted download, which can easily be achieved by adding the --continue-at - option. The version that has it never hit the repo, for the following reason:

The problem arises when the file we want to download is rolled every day. This means that the contents of the file change from one day to the other, and we can't just continue from where we left off if that's the case; we must start all over[1]. One could think that curl has an option that looks like it handles that, --time-cond, which is what the script is trying to use. This option makes curl send the If-Modified-Since HTTP header, which allows the server to respond with a 304 (Not Modified) if the file is not newer than the provided date. The date curl provides is taken from the file referenced by that option, and I was giving it the same file where the output goes. I was using these options the wrong way around: it would continue the download if the file had changed and do nothing if it hadn't.

So I sat down to try and tackle the problem. I know one can use a HEAD request to check (at least) two things: the resource's date and size (bah, at least in the case of static files like this). So the original idea was to get the URL's date and size; if the date is newer than the local file's, I should restart the download from scratch; if not, and the size is bigger than the local file's, then continue; otherwise, assume the file has finished downloading and stop there.
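
For reference, this is roughly how that HEAD check looks from Python (a sketch using only the standard library; the URL is made up and there's no error handling):

import urllib.request
from email.utils import parsedate_to_datetime

url= 'http://download.example.com/europe-latest.osm.pbf'  # hypothetical URL

request= urllib.request.Request (url, method='HEAD')
with urllib.request.urlopen (request) as response:
    # static files are usually served with these two headers
    url_size= int (response.headers['Content-Length'])
    url_date= parsedate_to_datetime (response.headers['Last-Modified'])

print (url_date, url_size)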

The last twist of the problem is that the only useful dates for the file are either its ctime or its mtime, but both change on every write to the file. This means that if I leave the script downloading the file, the file is rotated in the meantime, the download is interrupted and I try again later, the file's c/mtime is newer than the URL's, even when the data is actually older than the URL's. So I had to add a parallel timestamp file that is created only when starting a download and never updated (until the next full download; the file is actually touch'ed), and it is its mtime that is used for comparing with the URL's.
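
With that and the timestamp file, the decision boils down to something like this (again a sketch in plain Python, not the actual ayrton script; the names are mine):

import os.path

def what_to_do (url_date, url_size, local_file, stamp_file):
    """Decide between restarting, continuing or doing nothing.

    url_date/url_size come from the HEAD request above; stamp_file is the
    parallel timestamp file that is touch'ed only when a full download starts."""
    if not os.path.exists (local_file) or not os.path.exists (stamp_file):
        return 'restart'
    if url_date.timestamp () > os.path.getmtime (stamp_file):
        # the remote file was rotated after we started downloading
        return 'restart'
    if url_size > os.path.getsize (local_file):
        return 'continue'
    return 'done'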

Long story short, curl's --time-cond and --continue-at options combined are not for this; a HEAD request helps a little bit, but rotation-while-downloading can further complicate things. One last feature one could ask of such a script would be to keep the old file while downloading a new one and rotate at the end, but I will leave that for when/if I really need it. The new script is written in ayrton because it's easier to handle execution output and dates in it than in bash. This also pushed me to make minor improvements to ayrton, so expect a release soon.


[1] In fact the other options are to not do anything (but then we're left with an incomplete, useless file) or to try and find the old file; in the case of geofabrik, they keep the last week of daily rotations, the first day of each previous month back to the beginning of the year, and then the first day of each year back to 2014. Good luck with that.


elevation ayrton

Posted Tue 10 May 2016 05:30:28 PM CEST Tags:

Yesterday I climbed Cime du Cherion and to my surprise I saw Corsica[0]. Then a friend of mine pointed me to an article explaining that if you manage to see the island from the coast, it is because of a mirage in a dry air layer 1000m high due to the Föhn effect. It's notable that the French Wikipedia article about this effect is way more complete than the English one.

Punta Minuta (2556m) is one of the highest points in Corsica close to the northwestern coast. Cime du Cherion is 1778m. The distance between them is[1]:

surface_distance= 225.11km

Earth's mean radius[2] is:

km_per_radian= 6371km

which is also by definition the length of a radian on the theoretical surface of the Earth[3]. Those two mountains are then separated by an angle of:

alpha= 225.11km/6371km= 0.035333 radians.

or a little more than 2°[4]. According to this, the sagitta is then:

sagitta= km_per_radian*(1-math.cos (alpha/2))= 0.994215km, or 994.215m.

This means that it is possible to see the last 1.5km of Punta Minuta from Cime du Cherion, and almost anything above around 1000m, which is quite a lot of Corsica, but definitely not what I saw.

In conclusion, we were both right, but he more than me :) And yes, I'm ignoring that there is an angle between both points; if we take that into account and assume that Cime du Cherion is at 0°, then the projection of Punta Minuta over the secant that passes through those points is:

projection= math.sin (0.035333)/0.035333*2556m= 2555.46m

A little over half a meter :) Doesn't really change much in the calculations.

Last, a graph showing the height of the sagitta as a function of the distance, quite surprising!
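
If you want to reproduce it, this is more or less the snippet that produces such a graph (a sketch assuming matplotlib is available; km_per_radian is the value used above):

import math
import matplotlib.pyplot as plt

km_per_radian= 6371

distances= range (0, 401, 10)  # surface distance in km
sagittas= [ km_per_radian*(1-math.cos ((d/km_per_radian)/2))*1000  # in m
            for d in distances ]

plt.plot (distances, sagittas)
plt.xlabel ('surface distance (km)')
plt.ylabel ('sagitta (m)')
plt.show ()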


[0] Name in Corsican :)

[1] Measured with marble.

[2] From the same page, the polar radius is 6356.8km and the equatorial one is 6378.1km. We're measuring points between 42°20' and 43°50'N, so using the mean is not that crazy.

[3] Don't go there.

[4] Another fun fact: 1° is about 111km.


misc

Posted Fri 15 Apr 2016 02:47:19 PM CEST Tags:

In a shallow but long yak shaving streak, I ended up learning Ruby (again). Coming from a deep Python background (but also Perl and others), I sat down to write a cheatsheet so I can come back to it time and again:

module Foo  # this is the root of this namespace
# everything defined here must be referenced as Foo::thing, as in
Foo::CamelCase::real_method()

:symbol  # symbols are the simplest objects, which only have a name and a unique value
:'symbol with spaces!'

"#{interpolated_expression}"  # how to iterpolate expressions in strings

/regular_expression/  # very Perl-ish

generator { |item| block }  # this is related to yield

%q{quote words}  # a single-quoted string, à la Perl's q{}!
%w{words}  # an array of words, like Perl's qw{}

def classless_method_aka_function(default=:value)  # Ruby calls these methods too
    block  # Ruby's custom is to indent by 2!
end

method_call :without, :parens

class CamelCase < SuperClass  # simple inheritance
    include Bar  # this is a mixin;
    # Bar is a module and the class 'inherits' all its 'methods'

    public :real_method, :assign_method=
    protected :mutator_method!
    private :query_method?

    self  # here is the class!
    def real_method(*args)  # splat argument, can be anywhere
        # no **kwargs? (Ruby 2.0+ does have a ** double splat)
        super  # this calls SuperClass::real_method(*args)
        # compare with
        super()  # this calls SuperClass::real_method()!

        local_variable
        @instance_variable  # always private
        @@class_variable  # always private
        $global_variable

        return self
        # alternatively
        self
        # as the implicit return value is the last statement executed
        # and all statements produce a value
    end

    def assign_method=(value)
        # conventionally for this kind of syntactic sugar:
        # When the interpreter sees the message "name" followed by " =",
        # it automatically ignores the space before the equal sign
        # and reads the single message "name=" -
        # a call to the method whose name is name=
    end

    class << self
        # this is in metaclass context!
    end

    protected
    def mutator_method!(a, *more, b)
        # conventionally this modifies the instance
    end

    private
    def query_method?()
        # conventionally returns true/false
    end
end

# extending classes
class CamelCase  # do I need to respecify the inheritance here?
    def more_methods ()
    end
end

obj.send(:method_name_as_symbol, args, ...)

begin
    raise 'exceptions can be strings'
rescue OneType => e  # bind the exception to e
    # rescued
rescue AnotherType
    # also
else
    # runs only when nothing was raised
ensure
    # always runs, like finally
end

=begin
Long
comment
blocks!
=end

statement; statement

long \
line

# everything is true except
false
# and
nil

variable ||= default_value

`shell`

AConstant  # technically class names are constants
# so are module names
A_CONSTANT  # conventionally; always public
# The Ruby interpreter does not actually enforce the constancy of constants,
# but it does issue a warning if a program changes the value of a constant

# case is an expression
foo = case
    when true then 100
    when false then 200
    else 300
end

do |args; another_local_variable|
    # args are local variables of this block
    # whose scope ends with the block
    # and which can eclipse another variable of the same name
    # in the containing scope

    # another_local_variable is declared local but does not
    # consume parameters
end

{ |args| ... }  # another block, conventionally single lined

# Matz says that any method can be called with a block as an implicit argument.
# Inside the method, you can call the block using the yield keyword with a value.

# Matz is Joe Ruby

# yield is not what Python's yield does
# see http://rubylearning.com/satishtalim/ruby_blocks.html
# block_given?
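
# a tiny example of yield (mine, not part of the original cheatsheet):
def twice
    yield 1
    yield 2 if block_given?
end
twice { |x| puts x }  # prints 1, then 2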

a= []  # array
a[0] == nil

ENV  # hash holding envvars
ARGV  # array with CL arguments

(1..10)  # range
(0...10)  # python like
(1..10) === 5  # true, 'case equality operator'; it's the Range that implements it

{ :symbol => 'value' } == { symbol: 'value' }  # hashes, not blocks :)

lambda { ... }  # convert a block into a Proc object

# you cannot pass methods into other methods (but you can pass Procs into methods),
# and methods cannot return other methods (but they can return Procs).

load 'foo.rb'  # #include like
require 'foo'  # import, uses $:
require_relative 'bar'  # import from local dir

It's not complete; in particular, I didn't want to go into depth on what yield does (hint: not what it does in Python). I hope it's useful to others. I strongly recommend reading this tutorial.

Also, brace yourselves; my impression is that Ruby is not as well documented as we're used to in Python.


python

Posted Thu 31 Mar 2016 06:33:50 PM CEST Tags:

ayrton is a modification of the Python language that tries to make it look more like a shell programming language. It takes ideas already present in sh, adds a few functions for better emulating envvars, and provides a mechanism for (semi) transparent remote execution via ssh.
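
If you have never seen it, an ayrton script looks more or less like this (a made-up sketch that only uses features mentioned in these posts; the host name is hypothetical):

# any unknown name is resolved to an executable found in the PATH
for line in ls ('-l'):  # iterating over a command captures its output
    print (line)

with remote ('example.com'):  # the body runs on the other side via ssh
    uname ('-a')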

A small update on v0.7.2:

  • Fix iterating over the log output of a Command in synchronous mode (that is, not running in the _bg). This complements the fix in the previous release.

Get it on github or pypi!


python ayrton

Posted Fri 26 Feb 2016 02:01:43 PM CET Tags:

This time we focused on making ayrton, and the scripts it runs, more debuggable. Besides that, this release fixes a couple of bugs: one when executing remote code with the wrong Python version and another while iterating over long outputs. The latter needs more work so it becomes more automatic. Here's the ChangeLog:

  • Fix running remote tests with other versions of Python.
  • Fix tests broken by a change in ls's output.
  • Fix iterating over the long output of a command à la for line in foo(...): .... Currently you must add _bg=True to the execution options.
  • Fix recognizing names bound by for loops.
  • Added options -d|--debug, -dd|--debug2 and -ddd|--debug3 for enabling debug logs.
  • Added option -xxx|--trace-all for tracing all python execution. Use with caution, it generates lots of output.

Get it on github or pypi!


python ayrton

Posted Thu 25 Feb 2016 01:20:07 PM CET Tags:

A weird release, written from Russia via ssh on a tablet. The changelog is enough to show what's new:

  • Iterable parameters to executables are expanded in situ, so foo (..., i, ...) is expanded to foo (..., i[0], i[1], ...) and foo (..., k=i, ...) is expanded to foo (..., k=i[0], k=i[1], ...); see the example below.
  • -x|--trace allows for minimal execution tracing.
  • -xx|--trace-with-linenos allows for execution tracing that also prints the line number.
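
For instance, that first item means that (file names made up):

i= [ 'a.txt', 'b.txt' ]
rm (i)  # runs as if you had written rm ('a.txt', 'b.txt')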

Get it on github or pypi!


python ayrton

Posted Wed 10 Feb 2016 11:15:38 AM CET Tags:

For a long time I've been searching for a program that would allow me to plan (car) trips with my friends. Yes, I know of the existence of Google Maps, but the service has several characteristics that don't make it appealing to me, and it lacks a couple of features I expect. This is more or less the list of things I want:

  1. Define the list of points I want to go to. No-brainer.
  2. Define the specific route I want to take. This is normally implemented by adding more control points, but they're usually of the same category as the waypoints for the places you want to visit. I think they shouldn't be.
  3. Define stages; for instance, one stage per day.
  4. Get the distance and time of each stage; this is important when visiting several cities, for having an idea of how much time during the day you'll spend going to the next one.
  5. Define alternative routes, just in case you don't really have/make the time to visit some points.
  6. Store the trips in cookies, or share them via a URL or a central site that anybody can easily install on their own server.
  7. Manage several trips at the same time.

So I sat down to try and create such a thing. Currently it's just a mashup of several GIS things: my own OSM data rendering, my own waypoints-in-cookies idea (in fact, this is the expansion of what sparked that post) and OSRM for the routing. As for the backend, I decided to try flask and flask-restful for creating a small REST API for storing all this. So far some basics work (points #1 and #6, partially), and I had some fun during the last week learning RESTful, some more Javascript (including LeafLet and some jQuery) and putting all this together. Here are some interesting things I found out:

  • RESTful is properly defined, but not for all URL/method pairs. In particular, given that I decided that trip ids are their names, I defined a POST to trips/ as the UPSERT for that name. I hope SQLAlchemy implements it soon.
  • Most of the magic of RESTful APIs happen in the model of your service.
  • Creating APIs with flask-restful could not be more obvious (see the sketch after this list).
  • I still have to get my head around Javascript's prototypes.
  • Mouse/finger events are a nightmare in browsers. In particular, with current LeafLet you get click events on double clicks, unless you use the appropriate singleclick plugin from here.
  • Given XSS attacks, same-origin policy is enforced for AJAX requests. If you control the web service, the easiest way to go around it is CORS.
  • The only way to do such calls with jQuery is using the low level function $.ajax().
  • jQuery provides a function to parse JSON but not to serialize to it; use window.JSON.stringify().
  • Javascript's default parameters were not recognized by my browser :(.
  • OSRM's viaroute returns the coordinates multiplied by 10 for precision reasons, so you have to scale it down.
  • Nominatim and OSRM rock!
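
To give an idea of that flask-restful point, this is roughly the shape of such an API (a minimal sketch I wrote for this post; resource names and fields are hypothetical, not the actual trip-planner code):

from flask import Flask, request
from flask_restful import Api, Resource

app= Flask (__name__)
api= Api (app)

trips= {}  # in-memory storage, just for the sketch

class TripList (Resource):
    def get (self):
        return list (trips.keys ())

    def post (self):
        # the UPSERT mentioned above: create or replace a trip, keyed by its name
        data= request.get_json ()
        trips[data['name']]= data
        return data, 201

class Trip (Resource):
    def get (self, name):
        return trips[name]

api.add_resource (TripList, '/trips/')
api.add_resource (Trip, '/trips/<string:name>')

if __name__ == '__main__':
    app.run (debug=True)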

I still have lots of things to learn and finish, so stay tuned for updates. Currently the code resides in Elevation's repository, but I'll split it in the future.

Update:

I have it running here. You can add waypoints by clicking on the map, delete them by double-clicking them, save to cookies or the server (for the moment it overwrites what's there, as you can't name the trips or manage several yet) and ask for the routing.

trip-planner elevation openstreetmap osrm python flask leaflet javascript jquery

Posted Mon 25 Jan 2016 03:52:06 PM CET Tags:

Ever since I started working on Elevation I've faced the problem that it's mostly an openstreetmap-carto fork. This means that each time osm-carto has new changes, I have to adapt mine. We all know this is not an easy task.

My first approach was to turn to osm-carto's VCS, namely, git. The idea was to keep a branch with my version, then pull from time to time, and merge the latest released version into my branch, never merging mine into master, just in case I decided to do some modifications that could benefit osm-carto too. In that case, I would work on my branch, make a commit there, then cherry pick it into master, push to my fork in GitHub, make a Pull Request and voilà.

All this theory is nice, but in practice it doesn't quite work. Every time I tried to merge the release into my local branch, I got several conflicts, not to mention modifications that made some of my changes obsolete or at least forced me to refactor them in the new code (this is the developer in me talking, you see...). While I resolved these conflicts and refactorings, the working copy's state was a complete mess, forcing me to fix it all just to be able to render again.

As this was not a very smooth workflow, I tried another approach: keeping my local modifications in a big patch. This of course had the same problems as the previous approach, plus others, so I gained nothing but more headaches.

Then I thought: who else manages forks, and at a massive scale? Linux distributions. See, distros have to patch the packaged software to compile in their environments. They also keep security patches that are also sent upstream for inclusion. Once a patch is accepted upstream, they can drop their local patch. This sounds almost exactly like the workflow I want for Elevation.

And what do they use for managing the patches? quilt. This tool is heavily used in the Debian distro and is maintained by someone working at SuSE. Its documentation is a little bit sparse, but the tool is very simple to use. You start by doing quilt init just to create the directories and files that it will use to keep track of your changes.

The modification workflow is a little bit more complex than with git:

  1. You mark the beginning of a new patch with quilt new <patch_name>;
    1. Then either tell quilt to track the files you will modify for this patch with quilt add <file> ... (in fact it just needs to be told before you save the new version of each file, because it will save the current state of the file for producing the diff later),
    2. Or use quilt edit <file> to do both things at the same time (it will use $EDITOR for editing the file);
  2. Then do your changes;
  3. Check everything is ok (it compiles, passes tests, renders, whatever);
  4. And finally record the changes (in the form of a patch) with quilt refresh.

In fact, these last steps (add, edit, test, refresh) can be done multiple times and they will affect the current patch.

Why do I say current? Because quilt keeps a stack of patches, in what it calls a series. Every time you do quilt new a new patch is put on top of the stack, and in the series just behind the current patch; all the other commands affect the patch that is currently on top. You can 'move' through the series with quilt pop [<patch_name>] and quilt push [<patch_name>]. If a patch name is provided, it will pop/push all the intermediate patches in the series.

How does this help with my fork? It actually does not save me from conflicts and refactorings, it just makes these problems much easier to handle. My update workflow is the following (which not incidentally mimics the one Debian Developers and Maintainers go through every time they update their packages):

  1. I quilt pop -a so I go back to the pristine version, with no modifications;
  2. I git pull to get the newest version, then git tag and git checkout <last_tag> (I just want to keep in sync with releases);
  3. quilt push -a will try to apply all the patches. If one fails, quilt stops and lets me check the situation.
    1. quilt push -f will try harder to apply the troubling patch; sometimes it's just a matter of too many offset lines or too much fuzziness needed.
    2. If it doesn't apply, a .rej will be generated and you should pick up from there.
    3. In any case, once everything is up to date, I need to run quilt refresh and the patch will be updated[1].
    4. Then I try quilt push -a again.
    5. If a patch is no longer useful (because it was fixed upstream or because it doesn't make any sense), then I can simply quilt delete <patch>; this will remove it from the series, but the patch file will still exist (unless I use the option -r[2]).

As long as I keep my patches minimal, there are good chances that they will be easier to integrate into new releases. I can even keep track of the patches in my own branch without fear of conflicts, which allows me to independently provide fixes upstream.

Let's see how this works in the future; at least in a couple of months I will be rendering again. Meanwhile, I will be moving to a database update workflow that includes pulling diffs from geofabrik.


[1] Being an old timer VCS user (since cvs times), I wish they would call this command update.

[2] Why not --long-options?


elevation openstreetmap utils

Posted Wed 30 Dec 2015 06:37:04 PM CET Tags:

Another long-ish cycle (1.5 months, more or less). That's what two weeks of vacation do to the project.

This time I fixed executing things in the remote(), and handling the standard streams between the ayrton script and it, so now we can run complex programs like vi and mc. The ChangeLog:

  • Send data to/from the remote via another ssh channel, which is more stable than using stdin.
  • Stabilized all tests a lot, especially those using a mocked stdout for test validation.
  • A lot of tests have been moved to their own scripts in ayrton/tests/scripts, which also work as (very minimal) examples of what's working.
  • Use flake8 to check the code.
  • Move remote() to its own source.
  • API change: if a str or bytes object is passed in _in, then it's the name of a file from which to read stdin. If it's an int, then it's considered a file descriptor. This makes the API consistent with the _out and _err handling.
  • More error handling.
  • Fixed errors with global variables handling.
  • argv is handled as late as possible, allowing it to be passed from the test invocation.
  • shift complains on negative values.
  • Lazy pprint(), so debug statements do not do useless work.
  • stdin/out/err handling in remote() is done by a single thread.
  • Modify the local terminal a lot when in remote() so, among other things, we have no local echo.
  • Properly pass the terminal type and size to the remote. These last three features allow programs like vi be run in the remote.
  • Paved the road to make remote()s more like Command()s.

Get it on github or pypi!


python ayrton

Posted Wed 09 Dec 2015 04:32:29 PM CET Tags:

One of ayrton's features is the remote execution of code and programs via ssh. For this I initially used paramiko, which is a complete reimplementation of the ssh protocol in pure Python. It manages to connect, authenticate and create channels and port forwardings with any recent ssh server, and is quite easy to use:

import paramiko

c= paramiko.SSHClient ()
c.connect (...)

# get_pty=True so we emulate a tty and programs like vi and mc work
i, o, e= c.exec_command (command, get_pty=True)

So far so good, but the interface is those 3 objects, i, o and e, that represent the remote command's stdin, stdout and stderr. If one wants to fully implement a client, one needs to copy everything from the local process' standard streams to those.

For this, the most brute force approach is to create a thread for each pair of streams[1]:

from threading import Thread

class CopyThread (Thread):
    def __init__ (self, src, dst):
        super ().__init__ ()
        self.src= src
        self.dst= dst

    def run (self):
        while True:
            data= self.src.read (1024)
            if len (data)==0:
                break
            else:
                self.dst.write (data)

        self.close ()

    def close (self):
        self.src.close ()
        self.dst.close ()

This for some reason does not work right off the bat. When I implemented it in ayrton, what I got was that I didn't get anything from stdout or stderr until the remote code was finished. I tiptoed a little around the problem, but in the end I took a cue from one of paramiko's examples and implemented a single copy loop with select():

import os
from select import select
from threading import Thread

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()
        self.pairs= pairs
        self.copy_to= dict (pairs)
        self.finished= os.pipe ()

    def run (self):
        while True:
            wait_for= list (self.copy_to.keys ())
            wait_for.append (self.finished[0])
            r, w, e= select (wait_for, [], [])

            if self.finished[0] in r:
                os.close (self.finished[0])
                break

            for i in r:
                o= self.copy_to[i]
                data= i.read (1024)
                if len (data)==0:
                    # do not try to read any more from this file
                    del self.copy_to[i]
                else:
                    o.write (data)

        self.close ()


    def close (self):
        for k, v in self.pairs:
            for f in (k, v):
                 f.close ()

        os.close (self.finished[1])


t= InteractiveThread (( (0, i), (o, 1), (e, 2) ))
t.start ()
[...]
t.close ()

The extra pipe, finished, is there to make sure we don't wait forever for stdin to finish.

This completely solves the problem of handling the streams, but that's not the only problem. The next step is to handle the fact that when we do some input via stdin, we see it twice. This is because both the local and the remote terminals are echoing what we type, so we just need to disable the local echoing. In fact, ssh does quite a bit more than that:

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()

        [...]

        # tcgetattr/tcsetattr and the flag constants below come from the termios module
        self.orig_terminfo= tcgetattr (pairs[0][0])
        # input, output, control, local, speeds, special chars
        iflag, oflag, cflag, lflag, ispeed, ospeed, cc= self.orig_terminfo

        # turn on:
        # Ignore framing errors and parity errors
        iflag|= IGNPAR
        # turn off:
        # Strip off eighth bit
        # Translate NL to CR on input
        # Ignore carriage return on input
        # XON/XOFF flow control on output
        # (XSI) Typing any character will restart stopped output. NOTE: not needed?
        # XON/XOFF flow control on input
        iflag&= ~( ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXANY | IXOFF )

        # turn off:
        # When any of the characters INTR, QUIT, SUSP, or DSUSP are received, generate the corresponding signal
        # canonical mode
        # Echo input characters (finally)
        # NOTE: why these three? they only work with ICANON and we're disabling it
        # If ICANON is also set, the ERASE character erases the preceding input character, and WERASE erases the preceding word
        # If ICANON is also set, the KILL character erases the current line
        # If ICANON is also set, echo the NL character even if ECHO is not set
        # implementation-defined input processing
        lflag&= ~( ISIG | ICANON | ECHO | ECHOE | ECHOK | ECHONL | IEXTEN )

        # turn off:
        # implementation-defined output processing
        oflag&= ~OPOST

        # NOTE: whatever
        # Minimum number of characters for noncanonical read
        cc[VMIN]= 1
        # Timeout in deciseconds for noncanonical read
        cc[VTIME]= 0

        tcsetattr(self.pairs[0][0], TCSADRAIN, [ iflag, oflag, cflag, lflag,
                                                 ispeed, ospeed, cc ])


    def close (self):
        # reset term settings
        tcsetattr (self.pairs[0][0], TCSADRAIN, self.orig_terminfo)

        [...]

I won't pretend I understand all of that. Checking the file's history, I'm tempted to bet that not even the openssh developers do. I would even bet that it was taken from a telnet or rsh implementation or something. This is the kind of thing I meant when I wrote my previous post about implementing these complex pieces of software as a library with a public API and a shallow frontend in the form of a program. At least the guys from openssh say that they're going in that direction. That's wonderful news.

Almost there. The last stone in the way is the terminal emulation. As is, SSHClient.exec_command() tells the other end that we're running in an 80x25 VT100 terminal. Unluckily the API does not allow us to set it ourselves, but SSHClient.exec_command() is a very simple method that we can rewrite:

channel= c.get_transport ().open_session ()
term= shutil.get_terminal_size ()
channel.get_pty (os.environ['TERM'], term.columns, term.lines)

Reacting to SIGWINCH and changing the terminal's size is left as an exercise for the reader :)
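
For the impatient, it would look something like this (a sketch; it assumes channel is the paramiko Channel obtained above and relies on paramiko's resize_pty()):

import signal
import shutil

def handle_winch (signum, frame):
    # propagate the new size of the local terminal to the remote pty
    term= shutil.get_terminal_size ()
    channel.resize_pty (term.columns, term.lines)

signal.signal (signal.SIGWINCH, handle_winch)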


[1] In fact this might seem slightly wasteful, as data has to be read into user space and then pushed back down to the kernel. The problem is that os.sendfile() only works if src is a kernel object that supports mmap(), which sockets don't, and even when splice() is available in a 3rd party module, one of the parameters must be a pipe. There is at least one huge thread spread over 4 or 5 kernel mailing lists discussing widening the applicability of splice(), but to be honest, I haven't finished reading it.


python ayrton

Posted Wed 09 Dec 2015 11:49:03 AM CET Tags:

In my last job I had to write a complex Python script for merging several git histories into one. I used Python because I needed to do a lot of high level stuff I was sure bash would be a pain to use for, like building trees and traversing them. My options for managing the underlying git repositories were two: either do an ugly hack to execute git and parse its output; or use the ugly hack that already exists, called GitPython. The first was not an option, and the second meant that in some corner cases I had to rely on it just to execute particular git invocations. It was not pleasant, but it somehow worked.

While developing ayrton I'm using paramiko as the ssh client for implementing semi transparent remote code execution. The problem I have with it is that it's mostly aimed at executing commands almost blindly, with not much interaction. It only makes sense: its main client code is fabric, which mostly uses it in that context. ayrton aims to have an ssh client as transparent as the original openssh client, but bugs like this and this are in the way.

What do those two situations have in common? Well, in both there is an incomplete Python library that emulates an existing program. At least in the case of GitPython they have a backdoor to call git directly. My complaint is not their incompleteness, far from it, but the fact that they have to do it from scratch. It's because of that that they're incomplete.

Take ayrton, for instance. It's mostly an executable that serves as an interpreter for scripts written in that language (dialect?), but its implementation is such that the executable itself barely handles command line options and then calls a library. That library implements everything that ayrton does for interpreting the language, to the point where most unit tests use the ayrton library for executing ayrton scripts. ayrton is not alone; others do similarly: fades, and at some point all those other Python modules like timeit or unittest.

So that's my wish for this Christmas, or Three Wise Men Day[1], or my birthday next month; I would even accept it as an Easter egg: have all these complex pieces of software implemented mainly as a public library (even if the API changed a lot, though by now it should be fairly stable) and very thin frontends as executables. I wish for libgit and libssh and their Python bindings.


[1] In my culture, kids get presents that day too.


python rants

Posted Wed 02 Dec 2015 05:03:24 PM CET Tags:

Last weekend I was at PyCon.ar in Mendoza, Argentina. As always, it was a good opportunity to meet old and new friends; learn more about Python, technology and more; and this time I even gave a talk.

I went to see several talks, but they were not recorded, so I have no links to videos to provide. The highlight for me was Argentina En Python's and DjangoGirls' Django tutorial. It was a very good taste of what the former are doing all over South America, which is simply incredible.

The talk I gave was actually heavily based on/stolen from a talk by A. Jesse Jiryu Davis called How Do Python Coroutines Work?. I recommend you to watch it because it's amazing. That was one of the things I forgot to mention during the talk; the other one is that the classes Future and Task developed in the live-coding session[1] resemble a lot the ones the asyncio module offers, so their introduction is completely deliberate, even when they're swept under the rug quickly. Thanks to @hernantz, I was reminded of the article/book chapter that Davis and GvRossum wrote about asyncio.

I also used a lightning talk slot for promoting ayrton and showing a little Elevation. I just put the few slides online here.

So all in all, it was once more an amazing experience. Crossing my fingers, see you next year!


[1] The first rule about giving a talk is that you never do live coding. Some are just too stubborn...


python

Posted Sat 21 Nov 2015 04:39:25 PM CET Tags:

Almost two months since the last release, and with reason: I've been working hard to define remote()'s semantics. So far this is what there is:

  • Global and current scope's local variables can be used in the remote code. This includes envvars.
  • Changes to the local variables return to the local code.
  • Execution is done synchronously.

What this means is that if we had the following code:

[block 1]
with remote (...):
    [block 2]
[block 3]

we have the following:

  • Variables in block 1's scope are visible in block 2.
  • Modification to the local scope in block 2 are visible in block 3.
  • block 3 does not start executing until block 2 has finished.
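
A concrete (made-up) sketch of those three points:

a= 42
with remote ('example.com'):  # hypothetical host
    print (a)  # block 1's a is visible here: prints 42 on the remote
    a= a+1     # changes to the local scope travel back...
print (a)      # ... so this prints 43, but only after the remote block finished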

This imposes some limitations on how we can communicate with the remote code. As it is synchronous, we can't expect to be able to send and receive data from block 3, so the previous way of communicating via paramiko's streams is no longer possible. On the other hand, stdin, stdout and stderr are not transmitted yet between the local and the remote, which means that actually no communication is possible right now except via variables.

Also, because of the way remote() is implemented, currently functions, classes and modules that are going to be used in the remote code must be declared/imported there.

Finally, exceptions raised in the remote code will be reraised in the local code. These two things together mean that any custom exception in the script must be declared twice if it's raised in the remote :(.

But all in all I'm happy with the new, defined semantics. I worked a lot to make sure the first two points worked properly. It took me a while to figure out how to apply the changes to the scope's locals after the new values were returned from the remote. I know this is a very specific use case, but if you're interested, here's the thread where Armin Rigo tells me how it's done. You might also be interested in Yaniv Aknin's Python's Innards.

Finally, check the full ChangeLog:

  • Great improvements in remote()'s API and semantics
    • Made sure local variables go to and come back from the remote.
    • Code block is executed synchronously.
    • For the moment the streams are no longer returned.
    • _python_only option is gone.
    • Most tests actually connect to a listening netcat, only one test uses ssh.
  • Fixed bugs in the new parser.
  • Fixed globals/locals mix up.
  • Scripts are no longer wrapped in a function. This means that you can't return values and that module semantics are restored.
  • ayrton exits with status 1 when the script fails to run (SyntaxError, etc).

More fun times are coming!

Get it on github or pypi!


python ayrton

Posted Wed 28 Oct 2015 11:25:15 PM CET Tags:

I forgot to mention: last night I finally got to release ayrton-0.5. This has a major update to the language, thanks to our new parser, craftily thieved out of pypy. Other similar changes might come soon. Meanwhile, here's the ChangeLog:

  • Much better command detection.
  • CommandNotFound exception is now a subclass of NameError.
  • Allow Command keywords to be named like -l and --long-option, so it supports options with single dashes (-long-option, à la find).
  • This also means that long-option is no longer passed as --long-option; you have to put the dashes explicitly.
  • bash() does not return a single string by default; override with single=True.
  • Way more tests.
  • Updated docs.

Get it on github or pypi! You can always find everything about ayrton in its GitHub page.


python ayrton

Posted Mon 31 Aug 2015 08:46:27 PM CEST Tags:

Nice tricks I found out trying to unfuck my laptop's setup, all my fault:

  • You can use snapshot.debian.org to recover packages for any date for any release that was available at that date. I actually knew this, but somehow I forgot. I used deb http://snapshot.debian.org/archive/debian/20150720T214439Z/ testing main.

  • For that you have to disable the Packages-file-too-old check, which I had never seen before, ever. Put this in any file in your /etc/apt/apt.conf.d dir:

Acquire {
    Check-Valid-Until "false";
}
  • aptitude has a menu bar (activate with C-t), a preferences dialog, and you can set it up so any operation on a package moves the cursor down. Finally I figured that out.

  • It also has a dselect theme, but I was not brave enough to try it (for the record, I love dselect, I miss the fact that it shows how dependencies are resolved in the moment they're needed).

  • You can disable aptitude's resolver (-o Aptitude::ProblemResolver::StepLimit=0), but it doesn't make the UI that much more responsive (???).

  • digikam is not on testing right now. It FTBFS with gcc5 and has a licence problem.

  • Don't ride Debian sid right now, it's suffering a gcc transition and it might take a while.


debian

Posted Mon 31 Aug 2015 08:46:27 PM CEST Tags:

So, only two days later, not only do I already have (what looks like) a full parser, which has already landed in develop; I also implemented the first big change in the grammar and semantics: keywords are allowed mixed with positional parameters. In the case of command execution they're converted to positional options; in normal function calls they're just put where they belong.

In the future there will be more restrictive checks so the Python part of the language does not change, but right now I'm interested in adding more small changes like that. For instance, as I said before, allowing the options to have the right amount of hyphens (-o, -option or --option), because right now I have code that prefixes anything longer than 1 character with --. The alternative would be to have another _special_arg to handle that. And while I'm at it, also allow --long-options. This is only possible because there's a specific check in the code for that. Unluckily this does not mean I can do the same trick for executable names, so I still lack absolute and relative commands, and you still have to write osmpbf-outline as osmpbf_outline. Maybe I'll just depart a little more from the grammar and allow those, but I have to think deeply about it (that is, let the problem sit in the back of my head for a while). What I can also do is to allow using the same option several times (git ('commit-tree', p='fae76fae7', p='7aa3f63', 'a6fa33428bda9832') is an example that comes to mind), because it's another check not really done by the grammar.

In any case, it's quite a leap in the language. I just need to test it a little more before doing the next release, which surely will be the 0.5. I'll keep you posted!


ayrton python

Posted Tue 25 Aug 2015 01:00:37 AM CEST Tags:

Having my own version of the Python parser has proven, so far, to be clumsy and chaotic. Clumsy because it means that I need a special interpreter just to run my language (which in any case uses an interpreter!), chaotic because the building of such an interpreter has proven not to work stably on different machines. This means that currently it only works for me.

Because of this and because I wanted even more control over the parser (who said allowing to write things like rsync(--help)?), I decided to check my options. A friend of mine, more used to playing with languages, suggested using pypy to create my own parser, but that just led me a little further: why not outright 'steal' pypy's parser? After all, they have their own, which is also generated from Python's Python.asdl.

In fact it took me one hour to port the parser and a couple more porting the AST builder. This included porting them to Python3 (both by running 2to3 and then applying some changes by hand, notably dict.iteritems -> dict.items) and trying to remove as much dependency on the rest of pypy as possible, especially on rpython.

The last step was to migrate from their own AST implementation to Python's, but here's where (again) I hit the last brick wall: the ast.AST class and subclasses are very special. They're implemented in C, but the Python API does not allow creating nodes with the line and column info. For a moment I contemplated the option of creating another extension (that is, written in C) to make those calls, but then the obvious solution came to mind: a massive replacement from:

return ast.ASTClass ([params], foo.lineno, foo.column)

into:

new_node = ast.ASTClass ([params])
new_node.lineno = foo.lineno
new_node.column = foo.column
return new_node

and some other similar changes. See here if you're really interested in all the details. I can only be grateful for regular expressions, capturing groups and editors that support both.
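
Just to give an idea, the replacement is in the spirit of this one (a sketch with Python's re; the actual patterns I used in the editor were different):

import re

pattern= r"return ast\.(\w+) \((.*), (\w+)\.lineno, \3\.column\)"
replacement= ("new_node = ast.\\1 (\\2)\n"
              "new_node.lineno = \\3.lineno\n"
              "new_node.column = \\3.column\n"
              "return new_node")

line= "return ast.ASTClass ([params], foo.lineno, foo.column)"
print (re.sub (pattern, replacement, line))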

The following code is able to parse and dump a simple python script:

#! /usr/bin/env python3
import ast

from pypy.interpreter.pyparser import pyparse
from pypy.interpreter.astcompiler import astbuilder

info= pyparse.CompileInfo('setup.py', 'exec')
p= pyparse.PythonParser(None)
t= p.parse_source (open ('setup.py').read(), info)
a= astbuilder.ast_from_node (None, t, info)

print (ast.dump (a))

The result is the following (formatted by hand):

Module(body=[
    ImportFrom(module='distutils.core', names=[alias(name='setup', asname=None)], level=0),
    Import(names=[alias(name='ayrton', asname=None)]),
    Expr(value=Call(func=Name(id='setup', ctx=<class '_ast.Load'>), args=None, keywords=[
        keyword(arg='name', value=Str(s='ayrton')),
        keyword(arg='version', value=Attribute(value=Name(id='ayrton', ctx=<class '_ast.Load'>), attr='__version__', ctx=<class '_ast.Load'>)),
        keyword(arg='description', value=Str(s='a shell-like scripting language based on Python3.')),
        keyword(arg='author', value=Str(s='Marcos Dione')),
        keyword(arg='author_email', value=Str(s='mdione@grulic.org.ar')),
        keyword(arg='url', value=Str(s='https://github.com/StyXman/ayrton')),
        keyword(arg='packages', value=List(elts=[Str(s='ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='scripts', value=List(elts=[Str(s='bin/ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='license', value=Str(s='GPLv3')),
        keyword(arg='classifiers', value=List(elts=[
            Str(s='Development Status :: 3 - Alpha'),
            Str(s='Environment :: Console'),
            Str(s='Intended Audience :: Developers'),
            Str(s='Intended Audience :: System Administrators'),
            Str(s='License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)'), Str(s='Operating System :: POSIX'),
            Str(s='Programming Language :: Python :: 3'),
            Str(s='Topic :: System'),
            Str(s='Topic :: System :: Systems Administration')
        ],
        ctx=<class '_ast.Load'>))
    ], starargs=None, kwargs=None))
])

The next steps are to continue removing references to pypy code, and to make sure it can actually parse all possible code. Then I should revisit the hardcoded limitations in the parser (in particular in this loop) and then be able to freely format program calls :).

Interesting times are arriving to ayrton!

Update: fixed last link. Thanks nueces!


python ayrton

Posted Sun 23 Aug 2015 12:13:09 AM CEST Tags:

Soon I'll be changing jobs, going from one MegaCorp to another. The problem is, my current workplace already has a silly security policy that does not allow you to use IRC or do HTTP against a dynamic DNS/IP (like the one at home), but happily lets you use webmails through which you can send anyone the company's IP without leaving much trace. Furthermore, my next assignment will have a stricter Internet policy, so I finally sat down to look at alternatives for getting more traffic through with less footprint.

As I already mentioned, back home I have ssh listening on port 443 (and the port forwarded from the router to the server), and this worked for a while. Then these connections were shut down, so I used stunnel on the server and openssl s_client plus some ssh config magic to get around that. This allowed me to use screen and irssi to do IRC, and that was enough for a while. It meant I could talk to the communities around the tools and libs we were using.

But now I plan to change the way I do my mail. So far the setup includes using fetchmail to bring everything to that server, then using dovecot and/or a webmail to check it from anywhere. But ports are filtered and I already use 443 for ssh, so I can't connect to IMAPS; and since I don't want to use something like sslh to multiplex ssh and https on the same port (it sounds too hacky), I turned towards SOCKS proxying.

Setting up a SOCKS proxy through ssh is simple. Most of the tutorials you'll find online use putty, but here I'll show how to translate those to the CLI client:

Host home
    Hostname www.xxx.yyy.zzz  # do not even do a DNS req; the IP is mostly static for me
    Port 443
    <span class="createlink">ProxyCommand</span> openssl s_client -connect %h:%p -quiet 2>/dev/null
    <span class="createlink">DynamicForward</span> 9050  # this is the line that gives you a SOCKS proxy

Then the next step is to configure each of your clients to use it. Most clients have an option for that, but when they don't, you need a proxifier. For instance, even though KDE has a global setting for the SOCKS proxy, kopete does not seem to honor it. These proxifiers work by redirecting any connect(), gethostbyname() and most probably other calls to the SOCKS proxy. One of the best sources for SOCKS configuration is Tor's wiki, since Tor heavily relies on SOCKS proxies, but right now the proxifier they suggest (dante-client) does not install on my Debian setup, so I went with proxychains. Its final config is quite simple:

# Strict - Each connection will be done via chained proxies
# all proxies chained in the order as they appear in the list
# all proxies must be online to play in chain
# otherwise EINTR is returned to the app
strict_chain

# Proxy DNS requests - no leak for DNS data
proxy_dns

# Some timeouts in milliseconds
tcp_read_time_out 15000
tcp_connect_time_out 8000

[ProxyList]
# defaults set to "tor"
socks5  127.0.0.1 9050

In fact, that's the default config with only one modification: the SOCKS protocol is forced to 5, so DNS requests can also be resolved through the proxy.

With this simple setup I managed to connect to my XMPP server with kopete, which is already a lot. The next step will be to figure out the mail setup, and then I can call this done.


sysadmin piercing

Posted Sun 09 Aug 2015 07:37:21 PM CEST Tags:

I forgot to mention: I did a couple of ayrton releases, one more than a month ago and another a couple of days ago. One thing to notice is that even though 0.4.4 introduces an incompatible change (source() is no more), I didn't bump the minor or major version, as the level of usage is practically null. Here's the combined changelog:

  • source() is out. Use Python's import system instead.
  • Support executing foo.py().
  • Let commands handle SIGPIPE and SIGINT. Python does funky things to them.
  • for line in foo(): ... forces Capture'ing the output.
  • Fix remote() a little. The API still sucks.
  • Fix remote() tests.

Get it on github or pypi!


python ayrton

Posted Sat 23 May 2015 02:12:38 PM CEST Tags:

As I mentioned in one of my last posts about it, I started using SRTM 1 arc second data for rendering my maps. So far, with the 3 arc second dataset, I was able to generate a huge image by stitching the tiles with gdal_merge.py, then generating the 3 final images (height, slope and hill shade) plus one intermediary for the slope, all with gdaldem. Now this is no longer possible, as the new dataset is almost 10x the size of the old one, so instead of going that way, I decided to try another one.

With gdalbuildvrt it is possible to generate an XML file that 'virtually' stitches images. This means that any attempt to access data through this file will actually make the library (libgdal) find the proper tile(s) and access them directly.

So now the problem becomes processing each tile individually and then virtually stitching them. The first part is easy, as I just need to do the same as I was doing to the huge stitched image before. I also took the opportunity to use tiled files: instead of storing the image one scan line at a time (at 1 arc second resolution, each scan line has 3601 pixels; the extra one is for overlapping with the neighbors), they store the file in 256x256 sub-tiles, possibly (that is, not tested) making rendering faster by clustering related data closer together. The second step, with gdalbuildvrt, should also be easy.

The first block in the way is the fact that SRTM tiles above 50°N are only 1801 pixels wide, most possibly because it makes no sense anyways. This meant that I had to preprocess the original tiles so libgdal didn't have to do the interpolation at render time (in fact, it already has to do it once while rendering, using the lanczos scaling algorithm). This was done with gdalwarp.

The second one came from the slope and hill shading tiles. As the algorithm goes, it generates some 'fade out' values at the edges, and when libgdal was stitching them, I could see it as a line at the seams. This was fixed by passing -compute_edges to gdaldem.
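
For reference, the per-tile processing plus the virtual stitching can be scripted more or less like this (a sketch using GDAL's Python bindings from newer releases; the post itself used the gdalwarp/gdaldem/gdalbuildvrt CLI tools, and file names and options here are only illustrative):

import glob
from osgeo import gdal

# resample a narrow (1801 px) tile from above 50°N to the full 3601 px width
gdal.Warp ('N51E006-wide.tif', 'N51E006.hgt', width=3601, height=3601,
           resampleAlg='lanczos')

# per-tile processing: hillshade (slope and color relief are analogous),
# with computeEdges to avoid visible seams, and tiled output
for tile in glob.glob ('*.hgt'):
    gdal.DEMProcessing (tile + '-hillshade.tif', tile, 'hillshade',
                        computeEdges=True,
                        creationOptions=[ 'TILED=YES' ])

# 'virtually' stitch all the processed tiles into one file
gdal.BuildVRT ('hillshade.vrt', glob.glob ('*-hillshade.tif'))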

Finally, for some reason gdalbuildvrt was generating some very strange .vrt files. The format of these files is more or less the following:

  • For each band in the source tiles it creates a band in the result.
    • For each source tile, it describes:
      • The source file
      • The source band
      • The size and tiling of the source (3601², 256²)
      • The rectangle we want from the source (0², 3601²)
      • The rectangle in the result (x, y, 3601²)

The problem I saw was some weird declarations of the rectangles in the result, where the coordinates or the sizes didn't match what I expected. I will try to figure this out with the GDAL people in the following weeks, but first I want to make sure that the source tiles are easily downloadable (so far I have only found download options through USGS' EarthExplorer, which requires you to be logged in in order to download tiles, which means that it is not very scriptable, so not very reproducible).

So for the moment I'm using my own .vrt file generator, completely not generic enough for release, but soon. I also took the opportunity to make the rectangles in the result non-overlapping, being just 3600² in size. I know that the generated file works because I'm also generating smaller samples of the resulting layers (again, height, slope and hill shading) for rendering smaller zoom levels.

The only remaining question about huge DEM datasets is contour generation. So far I had just generated contour lines for each tile and lived with the fact that they too look ugly at the seams.


gdal gis srtm

Posted Thu 30 Apr 2015 03:03:22 PM CEST Tags:
