I started watching PyCon's videos. One of the first ones I saw is Amber Brown's "How we do identity wrong". I think she[1] is right in raising not only the notion of not assuming things related to names, addresses and ID numbers, but also that you shouldn't be collecting information that you don't need; at some point, it becomes a liability.

In the same vein about assuming, I have more examples. One of them is deciding which language to show your site in depending on which country the client connects from. I'm not a millennial (more like a transmillennial, if you push me to it), but I tend to go places. Every time I go to a new place, I get sites in new languages, but maps in US!

Today I wanted to book a hotel room. The hotel's site asked me where I live, so I chose France. Fact is, for them country and language are the same thing (I wonder what would happen if I answered Schweiz/Suisse/Svizzera/Svizra), so I can't say that I live in France but prefer English; I chose United Kingdom instead. Of course, this also meant that I got prices in GBP, not EUR, so I had to correct that one too. At least I could.

Later they asked me for country of residence and nationality; when I chose Italian, the country was set to Italia, even though I had chosen France first!

I leave you all with an anecdote. As I said, I like to go places, most of the time with friends. Imagine the puzzled expression of the police officer who stopped us and found a car licensed in France, driven by an Italian, with an Argentinian, a Spanish and a Chilean passenger, crossing from Austria to Slovakia, listening to US music. I only forgot to set the GPS to Japanese or something.

So, don't assume; if you assume, let the user change settings to their preferences, and don't ask for data you don't actually need. And please use the user's Accept-Language header; they have it for a reason.
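
As a rough illustration of that last point, here is a minimal sketch of picking a language from the Accept-Language header instead of from the country. The function names and the set of supported languages are made up for the example, and the parsing is deliberately simplified (real headers have a few more corner cases):

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    choices = []
    for position, item in enumerate(header.split(',')):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(';')
        quality = 1.0  # no q= parameter means full preference
        params = params.strip()
        if params.startswith('q='):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            choices.append((-quality, position, tag.strip().lower()))
    # highest quality first; ties keep the order the user gave
    return [tag for _, _, tag in sorted(choices)]


def pick_language(header, supported, default='en'):
    """Pick the first preferred language that the site supports."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split('-')[0]  # 'en-gb' falls back to 'en'
        if primary in supported:
            return primary
    return default


# someone living in France who prefers English
language = pick_language('en-GB,en;q=0.8,fr;q=0.5', {'fr', 'it', 'en'})
```

The country (and with it the currency) stays a separate question that the user answers on their own.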

[1] I think that's the pronoun she said she preferred. I'm sorry if I got that wrong.

python misc

Posted Sun 17 Jun 2018 18:06:01 CEST Tags: python

At work I'm writing an API using Django/DRF. Suddenly I had to write an application (just a few pages for calling a few endpoints), so I (ab)used DRF's Serializers to build them. One of the problems I faced while doing this was that DRF's ChoiceField accepts only a sequence with the values for the dropdown, unlike Django's, which also accepts callables. This means that once you give it a set of values, it never ever changes, at least until you restart the application.

Unless, of course, you cheat. Or hack. Aren't those synonyms?

class UpdatedSequence:
    def __init__(self, update_func):
        self.update_func = update_func
        self.restart = True

        self.data = None
        self.index = 0

    def __iter__(self):
        # we're our own iterator
        return self

    def __next__(self):
        # if we're iterating from the beginning, call the function
        # and cache the result
        if self.restart:
            self.data = self.update_func()
            self.index = 0

        try:
            datum = self.data[self.index]
        except IndexError:
            # we reached the limit, start all over
            self.restart = True
            raise StopIteration
        else:
            # there was a datum, advance for the next call
            self.index += 1
            self.restart = False

        return datum

This simple class tracks when you start iterating over it and calls the function you pass to obtain the data. Then it iterates over the result. When you reach the end, it marks itself to start all over, so the next time you iterate over it, it will call the function again. The function you pass can be the all() method of a QuerySet or anything else that fetches data and returns something you can index.
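
For comparison, a simpler variant of the same trick is to call the function from __iter__() and hand out a fresh iterator each time. This is only a sketch, not the class used above: LazyChoices and fetch_rooms are made-up names, and whether a given consumer iterates again or copies the values once is up to that consumer.

```python
class LazyChoices:
    """Iterable that calls update_func again on every new iteration."""

    def __init__(self, update_func):
        self.update_func = update_func

    def __iter__(self):
        # a new iteration starts here, so fetch fresh data
        return iter(self.update_func())


calls = []

def fetch_rooms():
    # stand-in for a database query; returns more data the second time
    calls.append(1)
    if len(calls) == 1:
        return ['single', 'double']
    return ['single', 'double', 'suite']


choices = LazyChoices(fetch_rooms)
first = list(choices)    # calls fetch_rooms()
second = list(choices)   # calls it again and sees the new value
```

The difference is that this version also refreshes when an iteration is abandoned halfway, while UpdatedSequence only restarts after reaching the end.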

In my case in particular, I also added a TimedCache so I don't read the db twice to fill two dropdowns with the same info in the same form:

import time

class TimedCache:
    '''A function wrapper that caches the result for a while.'''
    def __init__(self, f, timeout):
        self.f = f
        self.timeout = timeout
        self.last_executed = None
        self.cache = None
        self.__name__ = f.__name__ + ' (Cached %ds)' % timeout

    def __call__(self):
        now = time.monotonic()

        if self.cache is None or (now - self.last_executed) > self.timeout:
            self.cache = self.f()
            self.last_executed = now

        return self.cache
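
The same idea can also be written as a decorator. The sketch below is a variation, not the code above: timed_cache, the clock parameter and load_countries are invented for the example. The clock is injectable so that expiry can be tried out without sleeping, and a sentinel is used instead of None so that a function that legitimately returns None is cached too.

```python
import functools
import time

_MISSING = object()  # marks "nothing cached yet", even if f() returns None


def timed_cache(timeout, clock=time.monotonic):
    """Cache the result of a no-argument function for timeout seconds."""
    def decorator(f):
        cached = _MISSING
        last_executed = None

        @functools.wraps(f)
        def wrapper():
            nonlocal cached, last_executed
            now = clock()
            if cached is _MISSING or (now - last_executed) > timeout:
                cached = f()
                last_executed = now
            return cached

        return wrapper
    return decorator


fake_now = [0.0]   # a fake clock we can move by hand
calls = []

@timed_cache(timeout=10, clock=lambda: fake_now[0])
def load_countries():
    calls.append(1)
    return ['France', 'Italia']


load_countries()     # runs the function
load_countries()     # served from the cache
fake_now[0] = 11.0   # more than 10 "seconds" later
load_countries()     # cache expired, runs the function again
```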

I hope this helps someone.

python django drf

Posted Fri 19 Jan 2018 13:18:03 CET Tags: python

I have a love and hate relationship with regular expressions (regexps). On one side they're a very powerful tool for text processing, but on the other side of the coin, the most well known implementation is a language whose syntax is so dense, it's hard to read beyond the most basic phrases. This clashes with my intention of trying to make programs as readable as possible[1]. It's true that you can add comments and make your regexps span several lines so you can digest them more slowly, but to me it feels like eating dried up soup by the teaspoon directly from the package without adding hot water.

So I started reading regexps aloud and writing down how I describe them in natural language. This way, [a-z]+ becomes one or more of any of the letters between lowercase a and lowercase z, but of course this is way too verbose.

Then I picked up these descriptions and tried to come up with a series of names (in the Python sense) that could be combined to build the same regexps. Even 'literate' programs are not really plain English, but a more condensed version, while still readable. Otherwise you end up with Perl, and not many think that's a good idea. So, that regexp becomes one_or_more(any_of('a-z')). As you can see, some regexp language can still be recognizable, but it's the lesser part.
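
To make the idea concrete, such names can be as small as functions that return regexp fragments. This is a minimal sketch on top of the standard re module, not the real implementation; only the two names are taken from the text:

```python
import re


def any_of(chars):
    """A character class: any_of('a-z') -> '[a-z]'."""
    return '[' + chars + ']'


def one_or_more(expr):
    """Repeat expr one or more times, grouped so the + applies to all of it."""
    return '(?:' + expr + ')+'


pattern = one_or_more(any_of('a-z'))
lowercase_word = re.fullmatch(pattern, 'hello')   # a match object
capitalized = re.fullmatch(pattern, 'Hello')      # None, 'H' is not in a-z
```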

So, dinant was born. It's a single source file module that implements that language and some other variants (any_of(['a-z'], times=[1, ]), etc). It also implements some prebuilt regexps for common constructs, like integer, a datetime() function that accepts strptime() patterns or more complex things like IPv4 or IP_port. As I start using it in (more) real world examples (or issues are filed!), the language will slowly grow.

Almost accidentally, its constructive form brought along a nice feature: you can debug() your expression so you can find out the first sub expression that fails matching:

# this is a real world example!
In [1]: import dinant as d
In [2]: line = '''36569.12ms (cpu 35251.71ms)\n'''
# can you spot the error?
In [3]: render_time_re = ( d.bol + d.capture(d.float, name='wall_time') + 'ms ' +
...:                       '(cpu' + d.capture(d.float, name='cpu_time') + 'ms)' + d.eol )

In [4]: print(render_time_re.match(line))

In [5]: print(render_time_re.debug(line))
# ok, this is too verbose (I hope next version will be more human readable)
# but it's clear it's the second capture
Out[5]: '^(?P<wall_time>(?:(?:\\-)?(?:(?:\\d)+)?\\.(?:\\d)+|(?:\\-)?(?:\\d)+\\.|(?:\\-)?(?:\\d)+))ms\\ \\(cpu(?P<cpu_time>(?:(?:\\-)?(?:(?:\\d)+)?\\.(?:\\d)+|(?:\\-)?(?:\\d)+\\.|(?:\\-)?(?:\\d)+))'
# the error is that the text '(cpu' needs a space at the end

Of course, the project is quite simple, so there is no regexp optimizer, which means that the resulting regexps are less readable than the ones you would have written by hand. The idea is that, besides debugging, you will never have to see them again.
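
The way such a debug() can find the culprit is easy to sketch: since the expression is built from parts, try longer and longer prefixes until one stops matching. The code below illustrates that idea with plain re and a simplified float pattern; it is not the module's actual code, and first_failing_prefix is a made-up name.

```python
import re


def first_failing_prefix(parts, text):
    """Return the shortest run of leading parts that no longer matches text,
    or None if the whole expression matches."""
    for end in range(1, len(parts) + 1):
        prefix = ''.join(parts[:end])
        if re.match(prefix, text) is None:
            return prefix
    return None


FLOAT = r'-?\d+\.\d+'   # simplified on purpose
line = '36569.12ms (cpu 35251.71ms)\n'

broken = ['^', '(?P<wall_time>' + FLOAT + ')', re.escape('ms '),
          re.escape('(cpu'),     # missing the space at the end
          '(?P<cpu_time>' + FLOAT + ')', re.escape('ms)'), '$']
fixed = list(broken)
fixed[3] = re.escape('(cpu ')

failing = first_failing_prefix(broken, line)   # ends at the second capture
all_good = first_failing_prefix(fixed, line)   # None
```

As in the session above, what comes back is everything up to and including the first part that fails, so the error is either in that part or in the one just before it.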

Two features are on the back burner, and both are related. One is to make debugging easier by simply returning a representation of the original expression instead of the internal regexp used. That means, in the previous example, something like:

bol + capture(float, name='wall_time') + 'ms ' + '(cpu' + capture(float, name='cpu_time')

The second is that you can tell which types the different captured groups must convert to. This way, capture(float) would not return the string representing the float, but the actual float. The same for datetime() and others.

At the time of writing the project only lives on GitHub, but it will also be available in PyPI Any Time Soon®. Go grab it!

python ayrton

[1] for someone that knows how to read English, that is.

Posted Wed 18 Oct 2017 19:42:37 CEST Tags: python

I'm writing a Python module that allows me to 'drive' a site using Qt. This means that I can navigate the site, fill forms, submit them and read the resulting pages and scrape them, Selenium style. The reasons I'm using Qt are that it has enough support for the site I'm driving (it's the web frontend of the SIP telephony solution we're using, which has an incomplete API and I have to automate several aspects not covered by it); there are Python bindings; and because I can do it headless: instead of using browser instances, I simply instantiate one QWebPage[1] per thread and that's it.

The first thing I learned today is that JS objects representing the DOM elements have two sets of value holders: attributes and properties. Properties are what in Python we call attributes: the object's elements which are accessible with the '.' operator and hold instance values. The attributes are in fact the HTML element's attributes that gave the properties their initial values. That is, given the following HTML element:

<input type="text" name="foo" id="bar" value="quux">

the initial JS object's attributes and properties will have those values. If you change the value with your browser, the value property of that element will be changed, but not the attribute. When you submit the form, the value properties of all the form elements are used, so if you "only" change the value attribute, that won't be used. So forget attributes. Also, the DOM is the representation of the actual state of the page, but this state is never reflected in the HTML source that you can ask your browser to show; you only see those changes reflected in the browser's debugger. It's like they really wanted[3] to keep initial values apart from current state[2].

On the Qt side, QWebElement is only the DOM element representation, not the JS object[4], so you can't access the properties via its API, but by executing JS[5]:

e = DOMRoot.findFirst('[name="foo"]')
e.evaluateJavaScript("this.value = 'abracadabra'")

Tonight I finished fixing the most annoying bug I had with this site. To add a user I have to fill a form that is split in 7 'tabs' (which means 7 <div>s with fields where only one is shown at a time). One of the fields on the second tab has a complex JS interaction and I was cracking my skull trying to make it work. Because the JS is reacting to key presses, setting the value property was not triggering it. Next I tried firing a KeyboardEvent in JS, but I didn't succeed. Maybe it was me, maybe the fact that the engine behind QWebPage is the original Webkit and for some reason its JS support is lacking there, who knows.

But the good guys from #qtwebkit gave me a third option: just send plain QKeyEvents to the input element. Luckily we can do that, the web engine is completely built in Qt and supports its event system and more. I only had to give focus to the widget.

Again, I tried with JS and failed[7], so I went back to cheating with Qt behind the curtains. QWebElement.geometry() returns the QRect of the QWidget that implements the input element; I just took the .center() of it, and generated a pair of mouse button press/release events at that point. One further detail is that the .geometry() won't be right unless I force the second tab to be shown, forcing the field to be drawn. Still, for some reason getting a reference to the input field on page load (when I'm trying to figure out which fields are available, which in the long run does not make sense, as fields could easily be created or destroyed on demand with JS) does not return an object that will be updated after the widget is repositioned, so asking for its geometry returns ((0, -1), (-1, 0)), which amounts to an invalid geometry. The solution is to just get the reference to the input field after forcing the div/tab to be shown.

Finally, I create a pair of key press/release events for each character of the string I wanted as value, and seasoned everything with a lot of QMainLoop.processEvents(). Another advantage of using the Qt stuff is that while I was testing I could plug a QWebView, sprinkle some time.sleep() of various lengths, and see how it behaved. Now I can simply remove that to be back to headlessness.

I'm not sure I'll publish the code; as you can see, it's quite hacky and it will require a lot of cleanup to be able to publish it without a brown paper bag on my head.

[1] Yes, I'm using qt5.5 because that's what I will have available in the production server.

[2] Although as I said, you can change the attributes and so you lose the original values.

[3] I guess the answer is in the spec.

[4] I think I got it: QWebElement is the C++ class that is used in WebKit to represent the HTML tree, the real DOM, while somewhere deeper in there are the classes representing the JS objects which you just can't reach[6].

[5] This clearly shows that there is a connection between the DOM object and the JS one, you just can't access it via the API.

[6] This is the original footnote: Or something like that. Look, I'm an engineer and I usually want to know how things work, but since my first exposure to HTML, CSS and JS, back in the time when support was flaky and fragmented on purpose, I always wanted to stay as far away from them as possible. Things got much better, but as you can see the details are still somewhat obscure. I guess, I hope the answer is in the spec.

[7] With this I mean that I executed something and it didn't trigger the events it should, and there's no practical way to figure out why.

python pyqt

Posted Tue 24 Jan 2017 20:25:26 CET Tags: python

Last night I realized the first point. Checking today I found the latter. Early, often, go!

  • ayrton-0.9 has debug on. It will leave lots of files lying around your file system.
  • Modify the release script so it never ever allows this again.
  • make install was not running the tests.

Get it on github or pypi!

python ayrton

Posted Wed 07 Dec 2016 14:10:40 CET Tags: python

Another release, but this time not (only) a bugfix one. After playing with bool semantics I converted the file tests from a _X format, which, let's face it, was not pretty, into the more usual -X format. This alone merits a change in the minor version number. Also, _in, _out and _err now accept a tuple (path, flags), so you can specify things like os.O_APPEND.
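
For those not used to open flags, this is roughly what a (path, flags) tuple buys you. The sketch uses only os.open() from the standard library; open_target is a hypothetical helper written for this example and says nothing about how ayrton handles the tuple internally.

```python
import os
import tempfile


def open_target(target, default_flags=os.O_WRONLY | os.O_CREAT | os.O_TRUNC):
    """Hypothetical helper: open a path or a (path, flags) tuple, return a fd."""
    if isinstance(target, tuple):
        path, flags = target
    else:
        path, flags = target, default_flags
    return os.open(path, flags, 0o644)


workdir = tempfile.mkdtemp()
log = os.path.join(workdir, 'out.log')

# open the same file twice in append mode; the second write does not truncate
for chunk in (b'first\n', b'second\n'):
    fd = open_target((log, os.O_WRONLY | os.O_CREAT | os.O_APPEND))
    try:
        os.write(fd, chunk)
    finally:
        os.close(fd)

with open(log, 'rb') as f:
    content = f.read()
```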

In other news, I had to drop support for Python-3.3, because otherwise I would have to complicate the import system a lot.

But in the end, yes, this also is a bugfix release. Lots of fd leaks were plugged, so I suggest you upgrade if you can. Just remember the s/_X/-X/ change. I found all the leaks thanks to unittest's warnings, even if sometimes they were a little misleading:

testRemoteCommandStdout (tests.test_remote.RealRemoteTests) ... ayrton/parser/pyparser/parser.py:175: ResourceWarning: unclosed <socket.socket fd=5, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/ssh-XZxnYoIQxZX9/agent.7248>
  self.stack[-1] = (dfa, next_state, node)

The file and line cited in the warning have nothing to do with the warning itself (it was not the one who raised it) or the leaked fd, so it took me a while to find where those leaks were coming from. I hope I have some time to find why this is so. The most frustrating thing was that unittest closes the leaking fd, which is nice, but in one of the test cases it was closing it seemingly before the test finished, and the test failed because the socket was closed:

ERROR: testLocalVarToRemoteToLocal (tests.test_remote.RealRemoteTests)
Traceback (most recent call last):
File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 225, in wrapper
    test (self)
File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 235, in testLocalVarToRemoteToLocal
    self.runner.run_file ('ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay')
File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 304, in run_file
    return self.run_script (script, file_name, argv, params)
File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 323, in run_script
    return self.run_tree (tree, file_name, argv, params)
File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 336, in run_tree
    return self.run_code (code, file_name, argv)
File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 421, in run_code
    raise error
File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 402, in run_code
    exec (code, self.globals, self.locals)
File "ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay", line 6, in <module>
    with remote ('', _test=True):
File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 362, in __enter__
    i, o, e= self.prepare_connections (backchannel_port, command)
File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 270, in prepare_connections
    self.client.connect (self.hostname, *self.args, **self.kwargs)
File "/usr/lib/python3/dist-packages/paramiko/client.py", line 338, in connect
File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 493, in start_client
    raise e
File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1757, in run
    self.kex_engine.parse_next(ptype, m)
File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 75, in parse_next
    return self._parse_kexdh_reply(m)
File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 112, in _parse_kexdh_reply
File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 2079, in _activate_outbound
File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1566, in _send_message
File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 364, in send_message
File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 314, in write_all
    raise EOFError()

This probably has something to do with the fact that the test (a functional test, really) is using threads and real sockets. Again, I'll try to investigate this.
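
About the misleading file and line: a ResourceWarning is emitted when the leaked object is finalized, so the location reported is simply whatever code happened to be running at that moment. Since Python 3.6 the warning carries the leaked object in its source attribute, and if tracemalloc is tracing, the place where the object was allocated can be recovered (running python with -X tracemalloc=5 makes the default warning output include it). A small self-contained sketch with a leaked file instead of a socket; leaky() is of course made up:

```python
import gc
import os
import tempfile
import tracemalloc
import warnings

tracemalloc.start(5)   # remember up to 5 frames per allocation


def leaky(path):
    f = open(path)     # deliberately never closed
    return f.read()


fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    leaky(path)
    gc.collect()       # not needed on CPython, helps on other interpreters

leaks = [w for w in caught if issubclass(w.category, ResourceWarning)]
for w in leaks:
    print(w.message)
    # where the leaked object was created, not where it was collected
    allocated_at = tracemalloc.get_object_traceback(w.source)
    if allocated_at is not None:
        print('\n'.join(allocated_at.format()))

tracemalloc.stop()
os.remove(path)
```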

All in all, the release is an interesting one. I'll keep adding small features and releasing, let's see how it goes. Meanwhile, here's the changelog:

  • The 'No Government' release.
  • Test functions are no longer called _X but -X, which is more scripting friendly.
  • Some of those tests had to be fixed.
  • Dropped support for py3.3 because the importer does not work there.
  • tox support, but not yet part of the stable test suite.
  • Lots and lots of more tests.
  • Lots of improvements in the remote() tests; in particular, make sure they don't hang waiting for someone who's not gonna come.
  • Ignore ssh remote() tests if there's no password/phrase-less connection.
  • Fixed several fd leaks.
  • _in, _out and _err also accept a tuple (path, flags), so you can specify things like os.O_APPEND. Mostly used internally.

Get it on github or pypi!

python ayrton

Posted Tue 06 Dec 2016 19:46:11 CET Tags: python

I'll keep this short. During the weekend I found a bug in ayrton. I fixed it in develop, and decided to make a release with it, because it was kind of a showstopper. It was the first time I decided to use ayrton for a oneliner. It was this one:

ayrton -c "rm(v=True, locate('.xvpics', _out=Capture))"

See, ayrton's native support for filenames with spaces makes it a perfect replacement for find and xargs and tools like that. That command simply finds all the files or directories called .xvpics using locate and removes them. There is a little bit of magic where locate's output becomes rm's arguments, but probably not magic enough: _out=Capture has to be specified. We'll probably fix that in the near future.

So, enjoy the new release. It just fixes a couple of bugs, one of them directly related to this oneliner. Here's the changelog:

  • The 'Release From The Bus' release.
  • Bugfix release.
  • Argv should not be created with an empty list.
  • Missing dependencies.
  • Several typos.
  • Fix for _h().
  • Handle paramiko exceptions.
  • Calling ayrton -c <script> was failing because the file name was not properly (f|b)aked.
  • ayrton --version didn't work!

Get it on github or pypi!

Meanwhile, a little about its future. I have been working on ayrton on and off. Right now I'm gathering energy to modify pypy's Python parser so it supports py3.6's formatted string literals. With this I can later update ayrton's parser, which is based on pypy's. A part of it has been done, but then I ran out of gas. I think FSLs are perfect for ayrton in its aim to replace shell script languages. In other news, there's a nasty remote() bug that I can't pin down. These two things might mean that there won't be a significant release for a while.

python ayrton

Posted Mon 21 Nov 2016 22:16:53 CET Tags: python

I was trying to modify ayrton so we could really have sh[1]-style file tests. In sh, they're defined as unary operators in the -X form[2], where X is a letter. For instance, -f foo returns true (0 in sh-speak) if foo is some kind of file. In ayrton I defined them as functions you could use, but the names sucked a little. -f was called _f() and so on. Part of the reason is, I think, that both python-sh and ayrton already do some -/_ manipulations in executable names, and part because I thought that -True didn't make any sense.

A couple of days ago I came up with the idea that I could simply call the function f() and (ab)use the fact that - is a unary operator. The only detail was to make sure that - didn't change the truthiness of bools. In fact, it doesn't, but the result surprised me a little, although it shouldn't have:

In [1]: -True
Out[1]: -1

In [2]: -False
Out[2]: 0

In [3]: if -True: print ('yes!')
yes!

In [4]: if -False: print ('yes!')

You see, the bool type was introduced in Python-2.3 all the way back in 2003. Before that, the concept of true was represented by any 'true' object, and most of the time as the integer 1; false was mostly 0. In Python-2.2.1, True and False were added to the builtins, but only as other names for 1 and 0. According to that page and the PEP, bool is a subtype of int so you could still do arithmetic operations like True+1 (!!!), but I'm pretty sure deep down they just wanted to be backward compatible.

I have to be honest, I don't like that, or the fact that applying - to bools converts them to ints, so I decided to subclass bool and implement __neg__() in such a way that it returns the original value. And that's when I got the real surprise:

In [5]: class FalseBool (bool):
   ...:     pass
TypeError: type 'bool' is not an acceptable base type

Probably you didn't know (I didn't), but Python has such a thing as a 'final class' flag. It can only be used while defining classes in a C extension. It's a strange flag, because most of the classes have to declare it just to be subclassable; it's not even part of the default flags. Even more surprising, is that there are a lot of classes that are not subclassable: around 124 in Python-3.6, and only 84 that are subclassable.
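
You can peek at that flag from Python itself. In CPython the flag is Py_TPFLAGS_BASETYPE (bit 10) and every type exposes its flags as __flags__; the sketch below relies on those CPython details, and is_subclassable is just a name picked for the example:

```python
Py_TPFLAGS_BASETYPE = 1 << 10   # CPython: "this type can be used as a base"


def is_subclassable(cls):
    """True if CPython allows cls to be used as a base class."""
    return bool(cls.__flags__ & Py_TPFLAGS_BASETYPE)


int_ok = is_subclassable(int)     # True
bool_ok = is_subclassable(bool)   # False

# the same answer, the hard way
try:
    type('FalseBool', (bool,), {})
except TypeError as error:
    message = str(error)

# and this is what unary minus does to a bool
negated = -True
```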

So there you go. You learn something new every day. If you're curious, here's the final implementation of FalseBool:

class FalseBool:
    def __init__ (self, value):
        if not isinstance (value, bool):
            raise ValueError

        self.value= value

    def __bool__ (self):
        return self.value

    def __neg__ (self):
        return self.value

This will go in ayrton's next release, which I hope will be soon. I'm also working on implementing all of the different styles of expansion found in bash. I even seem to have found some bugs in it.

python ayrton

[1] I'm talking about the shell, not to be confused with python-sh.

[2] Well, there are a couple of infix binary operators in the form -XY.

Posted Fri 21 Oct 2016 18:17:46 CEST Tags: python

I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, especially connecting the new line to existing centerlines. Still, I think it's useful.

Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.

Lastly, as imagico mentioned in the first post about this hack, the algorithms are not fast, and I already made my computer start thrashing the disk swapping like Hell because pg hit a big polygon and started using lots of RAM to calculate its centerline. At least this time I can see how complex the polygons are before handing them to the code. As an initial benchmark, the original data for that changeset (I later simplified it with JOSM's tool) took 0.063927s in pg+gis and 0.004737s in the Python code. More tests will come later.

Okay, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!

openstreetmap gis python

Posted Mon 29 Aug 2016 20:06:39 CEST Tags: python

Like I said in my last post[1], I'm looking at last PyCon's videos. Here are my selected videos, in the order I saw them:

Ned Batchelder - Machete-mode debugging: Hacking your way out of a tight spot. In fact, I saw this twice.

Larry Hastings - Removing Python's GIL: The Gilectomy

The Report Of Twisted’s Death or: Why Twisted and Tornado Are Relevant In The Asyncio Age

How I built a power debugger out of the standard library and things I found on the internet

Davey Shafik - HTTP/2 and Asynchronous APIs

Sumana Harihareswara - HTTP Can Do That?! Points for informative and funny.

Matthias Kramm - Python Typology. Types are coming, so get used to them.

Scott Sanderson, Joe Jevnik - Playing with Python Bytecode. Nice, very nice trick. I'm talking about the way the presentation is given.

Brian Warner - Magic Wormhole - Simple Secure File Transfer

And of course, the lightning talks. I always like these, because you can get exposed to all kinds of things, some not even remotely connected to Python, but which can get your brain rolling down nice little bunny holes, or at least get a smile from you. So here:

LT#1. Please watch it at least between 20-25m.




And of course, check the other ones, don't stop at my own interests.

[1] Yes, I started writing this a month ago.


Posted Fri 19 Aug 2016 17:35:45 CEST Tags: python