Implementing an ssh client in Python

One of ayrton's features is the remote execution of code and programs via ssh. For this I initially used paramiko, which is a complete reimplementation of the ssh protocol in pure Python. It manages to connect, authenticate and create channels and port forwardings with any recent ssh server, and is quite easy to use:

import paramiko

c= paramiko.SSHClient ()
c.connect (...)

# get_pty=True so we emulate a tty and programs like vi and mc work
i, o, e= c.exec_command (command, get_pty=True)

So far so good, but the interface is those 3 objects, i, o and e, that represent the remote command's stdin, stdout and stderr. If one wants to fully implement a client, one needs to copy everything from the local process' standard streams to those.

For this, the most brute force approach is to create a thread for each pair of streams [1]:

from threading import Thread

class CopyThread (Thread):
    def __init__ (self, src, dst):
        super ().__init__ ()
        self.src= src
        self.dst= dst

    def run (self):
        while True:
            data= self.src.read (1024)
            if len (data)==0:
                break
            else:
                self.dst.write (data)

        self.close ()

    def close (self):
        self.src.close ()
        self.dst.close ()

This, for some reason, does not work right off the bat. When I implemented it in ayrton, I didn't get anything from stdout or stderr until the remote code had finished. I tiptoed around the problem a little, but in the end I took a cue from one of paramiko's examples and implemented a single copy loop with select():

import os
from select import select
from threading import Thread

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()
        self.pairs= pairs
        self.copy_to= dict (pairs)
        self.finished= os.pipe ()

    def run (self):
        while True:
            wait_for= list (self.copy_to.keys ())
            wait_for.append (self.finished[0])
            r, w, e= select (wait_for, [], [])

            if self.finished[0] in r:
                os.close (self.finished[0])
                break

            for i in r:
                o= self.copy_to[i]

                # the local ends are plain file descriptors (0, 1, 2 below),
                # the remote ends are paramiko's file-like objects
                if isinstance (i, int):
                    data= os.read (i, 1024)
                else:
                    data= i.read (1024)

                if len (data)==0:
                    # do not try to read any more from this file
                    del self.copy_to[i]
                else:
                    if isinstance (o, int):
                        os.write (o, data)
                    else:
                        o.write (data)

        self.close ()


    def close (self):
        for k, v in self.pairs:
            for f in (k, v):
                if isinstance (f, int):
                    os.close (f)
                else:
                    f.close ()

        # closing the write end wakes up the select() above
        os.close (self.finished[1])


t= InteractiveThread (( (0, i), (o, 1), (e, 2) ))
t.start ()
[...]
t.close ()

The extra pipe, finished, is there so the main code can wake the thread up once the remote command is done; otherwise the select() would sit forever waiting for input on stdin.

This completely solves the problem of handling the streams, but that's not the only problem. The next step is to handle the fact that when we type something on stdin, we see it twice. This is because both the local and the remote terminals are echoing what we type, so we just need to disable the local echoing. In fact, ssh does quite a bit more than that:

# tcgetattr, tcsetattr, TCSADRAIN and all the flag constants come from termios
from termios import *

class InteractiveThread (Thread):
    def __init__ (self, pairs):
        super ().__init__ ()

        [...]

        self.orig_terminfo= tcgetattr (pairs[0][0])
        # input, output, control, local, speeds, special chars
        iflag, oflag, cflag, lflag, ispeed, ospeed, cc= self.orig_terminfo

        # turn on:
        # Ignore framing errors and parity errors
        iflag|= IGNPAR
        # turn off:
        # Strip off eighth bit
        # Translate NL to CR on input
        # Ignore carriage return on input
        # XON/XOFF flow control on output
        # (XSI) Typing any character will restart stopped output. NOTE: not needed?
        # XON/XOFF flow control on input
        iflag&= ~( ISTRIP | INLCR | IGNCR | ICRNL | IXON | IXANY | IXOFF )

        # turn off:
        # When any of the characters INTR, QUIT, SUSP, or DSUSP are received, generate the corresponding signal
        # canonical mode
        # Echo input characters (finally)
        # NOTE: why these three? they only work with ICANON and we're disabling it
        # If ICANON is also set, the ERASE character erases the preceding input character, and WERASE erases the preceding word
        # If ICANON is also set, the KILL character erases the current line
        # If ICANON is also set, echo the NL character even if ECHO is not set
        # implementation-defined input processing
        lflag&= ~( ISIG | ICANON | ECHO | ECHOE | ECHOK | ECHONL | IEXTEN )

        # turn off:
        # implementation-defined output processing
        oflag&= ~OPOST

        # NOTE: whatever
        # Minimum number of characters for noncanonical read
        cc[VMIN]= 1
        # Timeout in deciseconds for noncanonical read
        cc[VTIME]= 0

        tcsetattr(self.pairs[0][0], TCSADRAIN, [ iflag, oflag, cflag, lflag,
                                                 ispeed, ospeed, cc ])


    def close (self):
        # reset term settings
        tcsetattr (self.pairs[0][0], TCSADRAIN, self.orig_terminfo)

        [...]
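
As an aside, the stdlib's tty module does most of this flag juggling for you; something along these lines should be roughly equivalent (a sketch, not what ayrton actually does):

import sys
import termios
import tty

fd= sys.stdin.fileno ()
orig_terminfo= termios.tcgetattr (fd)      # saved so it can be restored later
tty.setraw (fd, when=termios.TCSADRAIN)    # roughly the same set of flag changes
# ... interactive loop ...
termios.tcsetattr (fd, termios.TCSADRAIN, orig_terminfo)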

I won't pretend I understand all of that. Checking the file's history, I'm tempted to bet that not even the openssh developers do. I would even bet that it was taken from a telnet or rsh implementation or something. This is the kind of thing I meant when I wrote my previous post about implementing these complex pieces of software as a library with a public API and a shallow frontend in the form of a program. At least the openssh guys say that they're going in that direction. That's wonderful news.

Almost there. The last stone in the way is the terminal emulation. As is, SSHClient.exec_command() tells the other end that we're running in an 80x25 VT100 terminal. Unluckily the API does not allow us to set it ourselves, but SSHClient.exec_command() is a very simple method that we can rewrite:

import os
import shutil

channel= c.get_transport ().open_session ()
term= shutil.get_terminal_size ()
channel.get_pty (os.environ['TERM'], term.columns, term.lines)
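
From there, reproducing the rest of what exec_command() does is trivial; based on my reading of paramiko's code, it roughly boils down to this:

channel.exec_command (command)

# the same three file-like objects exec_command() would have returned
i= channel.makefile ('wb', -1)         # stdin
o= channel.makefile ('rb', -1)         # stdout
e= channel.makefile_stderr ('rb', -1)  # stderr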

Reacting to SIGWINCH and changing the terminal's size is left as an exercise for the reader :)
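
For the impatient, a minimal version could look like this (a sketch; it assumes the channel object from the previous snippet and that we're running in the main thread, where signal handlers must be installed):

import signal
import shutil

def handle_sigwinch (signum, frame):
    size= shutil.get_terminal_size ()
    channel.resize_pty (width=size.columns, height=size.lines)

signal.signal (signal.SIGWINCH, handle_sigwinch)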


  1. In fact this might seem slightly wasteful, as data has to be read into user space and then pushed back down into the kernel. The problem is that os.sendfile() only works if src is a kernel object that supports mmap(), which sockets don't, and even when splice() is available in a 3rd party module, one of the parameters must be a pipe. There is at least one huge thread spread over 4 or 5 kernel mailing lists discussing widening the applicability of splice(), but to be honest, I haven't finished reading it.

Can I haz libfoo

In my last job I had to write a complex Python script for merging several git histories into one. I used Python because I needed to do a lot of high level stuff I was sure would be a pain in bash, like building trees and traversing them. I had two options for managing the underlying git repositories: either write an ugly hack to execute git and parse its output, or use the ugly hack that already exists, called GitPython. The first was not an option, and the second meant that in some corner cases I had to rely on it just to execute particular git invocations. It was not pleasant, but it somehow worked.

While developing ayrton I'm using paramiko as the ssh client for implementing semi transparent remote code execution. The problem I have with it is that it's mostly aimed at executing commands almost blindly, with not much interaction. It only makes sense: its main client is fabric, which mostly uses it in that context. ayrton aims to have an ssh client as transparent as the original openssh client, but bugs like this and this are in the way.

What do those two situations have in common? Well, both involve incomplete Python libraries that emulate an existing program. At least in the case of GitPython there's a backdoor to call git directly. My complaint is not their incompleteness, far from it, but the fact that they have to do it all from scratch. It's because of that that they're incomplete.

Take ayrton, for instance. It's mostly an executable that serves as an interpreter for scripts written in that language (dialect?), but its implementation is such that the executable itself barely handles command line options and then calls a library. That library implements everything ayrton does to interpret the language, to the point where most unit tests use the ayrton library to execute ayrton scripts. ayrton is not alone; others do the same: fades, and at some point all those other Python modules like timeit or unittest.

So that's my wish for this Christmas, or Three Wise Men day [1], or my birthday next month; I would even accept it as an Easter egg: have all these complex pieces of software implemented mainly as a public library (even if the API changed a lot, by now it should be fairly stable) with very thin frontends as executables. I wish for libgit and libssh and their Python bindings.


  1. In my culture, kids get presents that day too. 

satyr 0.3

I must be crazy or something. Instead of spending my last Friday in France for this year partying in some bar, I stayed home and produced another satyr release. The ChangeLog? Here, have some:

  • (De)Queueing Songs with the keyboard, with visual feedback.
  • Remembers window size.
  • Current song highlighted not with selection but with real color changes.
  • We can select Songs. Right now useful only for queuing several songs at the same time.
  • Workaround a bug in PyQt4 with the SeekBar.
  • Show the filepaths as much as possible in the user's encoding.
  • Hitting F2 in a cell edits its contents.
  • Slightly cleaner interface: don't show so many 0's.
  • Fixed a bug with prev/next: they weren't wrapping around.

Go get it from the Download area! I promise to party double tomorrow...

Went to PyCon.ar 2015

Last weekend I was at PyCon.ar in Mendoza, Argentina. As always, it was a good opportunity to meet old and new friends; learn more about Python, technology and more; and this time I even gave a talk.

I went to see several talks, but they were not recorded, so I have no links to videos to provide. The highlight for me was Argentina En Python's and DjangoGirls' Django tutorial. It was a very good taste of what the former are doing all over South America, which is simply incredible.

The talk I gave was actually heavily based on/stolen from a talk by A. Jesse Jiryu Davis called How Do Python Coroutines Work?. I recommend you watch it because it's amazing. That was one of the things I forgot to mention during the talk; the other one is that the Future and Task classes developed in the live-coding session [1] closely resemble the ones the asyncio module offers, so their introduction is completely deliberate, even when they're swept under the rug quickly. Thanks to @hernantz, I was reminded of the article/book chapter that Davis and Guido van Rossum wrote about asyncio.

I also used a lightning talk slot for promoting ayrton and showing a little Elevation. I just put the few slides online here.

So all in all, it was once more an amazing experience. Crossing my fingers, see you next year!


  1. The first rule about giving a talk is that you never do live coding. Some are just too stubborn...

ayrton 0.6

Almost two months since the last release, and with reason: I've been working hard to define remote()'s semantics. So far this is what there is:

  • Global and current scope's local variables can be used in the remote code. This includes envvars.
  • Changes to those local variables made in the remote code come back to the local code.
  • Execution is done synchronously.

What this means is that if we had the following code:

[block 1]
with remote (...):
    [block 2]
[block 3]

we have the following:

  • Variables in block 1's scope are visible in block 2.
  • Modifications to the local scope made in block 2 are visible in block 3.
  • block 3 does not start executing until block 2 has finished.
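
For example, assuming remote() still takes the same connection parameters as before, a small hypothetical script would behave like this:

a= 42
with remote (...):          # connection parameters elided
    # block 1's a is visible here...
    a= a+1
    # ... and the new value comes back when the block finishes
print (a)                   # prints 43, and only after the remote block is done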

This imposes some limitations on how we can communicate with the remote code. As it is synchronous, we can't expect to be able to send and receive data from block 3, so the previous way of communicating via paramiko's streams is no longer possible. On the other hand, stdin, stdout and stderr are not transmitted yet between the local and the remote ends, which means that right now no communication is possible at all except via variables.

Also, because of the way remote() is implemented, currently functions, classes and modules that are going to be used in the remote code must be declared/imported there.

Finally, exceptions raised in the remote code will be reraised in the local code. These two things together mean that any custom exception used in the script must be declared twice if it's raised in the remote :(.

But all in all I'm happy with the new, defined semantics. I worked a lot to make sure the first two points work properly. It took me a while to figure out how to apply the changes to the scope's locals after the new values were returned from the remote. I know this is a very specific use case, but if you're interested, here's the thread where Armin Rigo tells me how it's done. You might also be interested in Yaniv Aknin's Python innards.
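
In case you're curious but don't want to dig through that thread: as far as I understand it, the trick is CPython-specific and boils down to something like this (a sketch, not ayrton's actual code):

import ctypes
import sys

def update_caller_locals (new_values):
    frame= sys._getframe (1)
    # f_locals is just a snapshot of the caller's local variables...
    frame.f_locals.update (new_values)
    # ... so we ask CPython to copy it back into the real (fast) locals
    ctypes.pythonapi.PyFrame_LocalsToFast (ctypes.py_object (frame), ctypes.c_int (0))

def fake_remote ():
    a= 1
    update_caller_locals (dict (a=42))
    print (a)  # prints 42

fake_remote ()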

Finally, check the full ChangeLog:

  • Great improvements in remote()'s API and semantics
    • Made sure local variables go to and come back from the remote.
    • Code block is executed synchronously.
    • For the moment the streams are no longer returned.
    • _python_only option is gone.
    • Most tests actually connect to a listening netcat, only one test uses ssh.
  • Fixed bugs in the new parser.
  • Fixed globals/locals mix up.
  • Scripts are no longer wrapped in a function. This means that you can't return values and that module semantics are restored.
  • ayrton exits with status 1 when the script fails to run (SyntaxError, etc).

More fun times are coming!

Get it on github or pypi!

ayrton 0.5

I forgot to mention: last night I finally got to release ayrton-0.5. This has a major update to the language, thanks to our new parser, craftily thieved out of pypy. Other similar changes might come soon. Meanwhile, here's the ChangeLog:

  • Much better command detection.
  • CommandNotFound exception is now a subclass of NameError.
  • Allow Command keywords to be named like -l and --long-option, so it supports options with single dashes (-long-option, à la find).
  • This also means that long-option is no longer passed as --long-option; you have to put the dashes explicitly.
  • bash() does not return a single string by default; override with single=True.
  • Way more tests.
  • Updated docs.

Get it on github or pypi! You can always find everything about ayrton in its GitHub page.

Breaking off

Having my own version of the Python parser has proven, so far, to be clumsy and chaotic. Clumsy because it means that I need a special interpreter just to run my language (which in any case uses an interpreter!); chaotic because building such an interpreter has proven not to work stably across different machines. This means that currently it only works for me.

Because of this, and because I wanted even more control over the parser (who said allowing things like rsync(--help)?), I decided to check my options. A friend of mine, more used to playing with languages, suggested using pypy to create my own parser, but that just led me a little further: why not outright 'steal' pypy's parser? After all, they have their own, which is also generated from Python's Python.asdl.

In fact it took me one hour to port the parser and a couple more to port the AST builder. This included porting them to Python3 (both by running 2to3 and then applying some changes by hand, notably dict.iteritems -> dict.items) and trying to remove as much dependency on the rest of pypy as possible, especially on rpython.

The last step was to migrate from their own AST implementation to Python's, but here's where (again) I hit the last brick wall: the ast.AST class and its subclasses are very special. They're implemented in C, but the Python API does not allow creating nodes with the line and column info. For a moment I contemplated the option of creating another extension (that is, written in C) to make those calls, but then the obvious solution came to mind: a massive replacement from:

return ast.ASTClass ([params], foo.lineno, foo.column)

into:

new_node = ast.ASTClass ([params])
new_node.lineno = foo.lineno
new_node.column = foo.column
return new_node

and some other similar changes. See here if you're really interested in all the details. I can only be grateful for regular expressions, capturing groups and editors that support both.
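
To give an idea of the kind of substitution involved, it was something of this shape (a reconstruction, not the exact expression; the real one also had to preserve indentation):

import re

call= re.compile (r"return (ast\.\w+) \((.*), (\w+)\.lineno, \3\.column\)")
replacement= (r"new_node = \1 (\2)"          '\n'
              r"new_node.lineno = \3.lineno" '\n'
              r"new_node.column = \3.column" '\n'
              r"return new_node")

print (call.sub (replacement, "return ast.Num (n, node.lineno, node.column)"))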

The following code is able to parse and dump a simple python script:

#! /usr/bin/env python3
import ast

from pypy.interpreter.pyparser import pyparse
from pypy.interpreter.astcompiler import astbuilder

info= pyparse.CompileInfo('setup.py', 'exec')
p= pyparse.PythonParser(None)
t= p.parse_source (open ('setup.py').read(), info)
a= astbuilder.ast_from_node (None, t, info)

print (ast.dump (a))

The result is the following (formatted by hand):

Module(body=[
    ImportFrom(module='distutils.core', names=[alias(name='setup', asname=None)], level=0),
    Import(names=[alias(name='ayrton', asname=None)]),
    Expr(value=Call(func=Name(id='setup', ctx=<class '_ast.Load'>), args=None, keywords=[
        keyword(arg='name', value=Str(s='ayrton')),
        keyword(arg='version', value=Attribute(value=Name(id='ayrton', ctx=<class '_ast.Load'>), attr='__version__', ctx=<class '_ast.Load'>)),
        keyword(arg='description', value=Str(s='a shell-like scripting language based on Python3.')),
        keyword(arg='author', value=Str(s='Marcos Dione')),
        keyword(arg='author_email', value=Str(s='mdione@grulic.org.ar')),
        keyword(arg='url', value=Str(s='https://github.com/StyXman/ayrton')),
        keyword(arg='packages', value=List(elts=[Str(s='ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='scripts', value=List(elts=[Str(s='bin/ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='license', value=Str(s='GPLv3')),
        keyword(arg='classifiers', value=List(elts=[
            Str(s='Development Status :: 3 - Alpha'),
            Str(s='Environment :: Console'),
            Str(s='Intended Audience :: Developers'),
            Str(s='Intended Audience :: System Administrators'),
            Str(s='License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)'), Str(s='Operating System :: POSIX'),
            Str(s='Programming Language :: Python :: 3'),
            Str(s='Topic :: System'),
            Str(s='Topic :: System :: Systems Administration')
        ],
        ctx=<class '_ast.Load'>))
    ], starargs=None, kwargs=None))
])

The next steps are to continue removing references to pypy code and to make sure it can actually parse all possible code. Then I should revisit the hardcoded limitations in the parser (in particular in this loop) and then be able to freely format program calls :).

Interesting times are arriving to ayrton!

Update: fixed last link. Thanks nueces!

Using snapshot.debian.org for downgrading Debian packages

Nice tricks I found out trying to unfuck my laptop's setup, all my fault:

  • You can use snapshot.debian.org to recover packages for any date for any release that was available at that date. I actually knew this, but somehow I forgot. I used deb http://snapshot.debian.org/archive/debian/20150720T214439Z/ testing main.

  • For that you have to disable the Packages-file-too-old check, which I had never seen before, ever. Put this in any file in your /etc/apt/apt.conf.d dir:

Acquire {
    Check-Valid-Until "false";
}
  • aptitude has a menu bar (activate it with C-t), a preferences dialog, and you can set it up so any operation on a package moves the cursor down. I finally figured that out.

  • It also has a dselect theme, but I was not brave enough to try it (for the record, I love dselect; I miss the fact that it shows how dependencies are resolved at the moment they're needed).

  • You can disable aptitude's resolver (-o Aptitude::ProblemResolver::StepLimit=0), but it doesn't make the UI that much more responsive (???).

  • digikam is not in testing right now. It FTBFS with gcc5 and has a licence problem.

  • Don't ride Debian sid right now; it's suffering a gcc transition and it might take a while.

I got myself a parser

So, only two days later I already have (what looks like) a full parser, which has already landed in develop, and I also implemented the first big change in the grammar and semantics: keywords are allowed mixed with positional parameters; in the case of command execution they're converted to positional options, and in normal function calls they're just put where they belong.

In the future there will be more restrictive checks so the Python part of the language does not change, but right now I'm interested in adding more small changes like that. For instance, as I said before, allowing options to have the right amount of hyphens (-o, -option or --option), because right now I have code that prefixes -- to anything longer than 1 character. The alternative would be to have another _special_arg to handle that. And while I'm at it, also allow --long-options. This is only possible because there's a specific check in the code for that. Unluckily this does not mean I can do the same trick for executable names, so I still lack absolute and relative commands, and you still have to write osmpbf-outline as osmpbf_outline. Maybe I'll just depart a little more from the grammar and allow those too, but I have to think about it deeply (that is, let the problem sit in the back of my head for a while). What I can also do is allow using the same option several times (git ('commit-tree', p='fae76fae7', p='7aa3f63', 'a6fa33428bda9832') is an example that comes to mind), because it's another check not really done by the grammar.

In any case, it's quite a leap in the language. I just need to test it a little more before doing the next release, which surely will be the 0.5. I'll keep you posted!

Socks over ssh

Soon I'll be changing jobs, going from one MegaCorp to another. The problem is, my current workplace already has a silly security policy that does not allow you to use IRC or do HTTP against a dynamic DNS/IP (like the one at home), but happily lets you use webmails through which you could send anyone the company's IP without leaving much trace. Furthermore, my next assignment will have a stricter Internet policy, so I finally sat down to look at alternatives for getting more traffic through with a smaller footprint.

As I already mentioned, back home I have ssh listening on port 443 (with the port forwarded from the router to the server), and this worked for a while. Then these connections were shut down, so I set up stunnel on the server and used openssl s_client plus some ssh config magic to go over that. This allowed me to use screen and irssi to do IRC, and that was enough for a while. It meant I could talk to the communities around the tools and libs we were using.

But now I plan to change the way I do my mail. So far the setup consists of using fetchmail to bring everything to that server, then using dovecot and/or a webmail to check it from anywhere. But as ports are filtered and I already use 443 for ssh, I can't connect to IMAPS, and since I don't want to use something like sslh to multiplex ssh and https on the same port (it sounds too hacky), I turned towards SOCKS proxying.

Setting up a SOCKS proxy through ssh is simple. Most of the tutorials you'll find online use putty, but here I'll show how to translate those to the CLI client:

Host home
    Hostname www.xxx.yyy.zzz  # do not even do a DNS req; the IP is mostly static for me
    Port 443
    ProxyCommand openssl s_client -connect %h:%p -quiet 2>/dev/null
    DynamicForward 9050  # this is the line that gives you a SOCKS proxy

Then the next step is to configure each of your clients to use it. Most clients have an option for that, but when they don't, you need a proxifier. For instance, even though KDE has a global setting for the SOCKS proxy, kopete does not seem to honor it. These proxifiers work by redirecting any connect(), gethostbyname() and most probably other calls to the SOCKS proxy. One of the best sources for SOCKS configuration is Tor's wiki, since Tor heavily relies on SOCKS proxies, but right now the proxifier they suggest (dante-client) does not install on my Debian setup, so I went with proxychains. Its final config is quite simple:

# Strict - Each connection will be done via chained proxies
# all proxies chained in the order as they appear in the list
# all proxies must be online to play in chain
# otherwise EINTR is returned to the app
strict_chain

# Proxy DNS requests - no leak for DNS data
proxy_dns

# Some timeouts in milliseconds
tcp_read_time_out 15000
tcp_connect_time_out 8000

[ProxyList]
# defaults set to "tor"
socks5  127.0.0.1 9050

In fact, that's the default config with only one modification: the SOCKS protocol is forced to version 5, so DNS requests can also go through the proxy.
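
With all that in place, usage boils down to something like this (assuming the Host home block from before; depending on the installed version, the binary may be called proxychains4):

# bring the tunnel (and with it the SOCKS proxy on 127.0.0.1:9050) up;
# -N: don't run a remote command, -f: go to the background after authenticating
ssh -N -f home

# then launch the client through the proxifier
proxychains kopete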

With this simple setup I managed to connect to my XMPP server with kopete, which is already a lot. The next step will be to figure out the mail setup, and then I can call this done.