I'm writing a python module that allows me to 'drive' a site using Qt. This
means that I can navigate the site, fill forms, submit them and read the
resulting pages and scrape them, Selenium style. The reasons I'm using
Qt are that it has enough support for the site I'm driving (it's the web
frontend of the SIP telephony solution we're using, which has an incomplete API
and I have to automate several aspects not covered by it); there are Python bindings; and
because I can do it headless:
instead of using browser instances, I simply instantiate one
per thread and that's it.
The first thing I learned today is that JS objects representing the DOM elements have two sets of value holders: attributes and properties. The properties are what in Python we call attributes: the object's elements which are accessible with the '.' operator and hold instance values. The attributes are in fact the HTML element's attributes that gave the properties' initial values. That is, given the following HTML element:
<input type="text" name="foo" id="bar" value="quux">
the initial JS object's attributes and properties will have those values.
If you change the value with your browser, the
value property of that element
will be changed, but not the attribute.
When you submit the form, the
value properties of all the form elements are
used, so if you "only" change the
value attribute, that won't be used.
So forget attributes. Also, the DOM is the
representation of the actual state of the page, but this state is never reflected in
the HTML source that you can ask your browser to show, but you see those changes
reflected in the browser's debugger. It's like they really
wanted to keep initial values apart from current state.
On the Qt side, QWebElement
is only the DOM element representation, not the JS object, so you
can't access the properties via its API, only by executing JS on the element.
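To make the "executing JS" route concrete, here is a minimal sketch of how the JS snippets could be built from Python. The helper names `js_get_property()` and `js_set_property()` are made up for illustration; the only Qt call assumed is `QWebElement.evaluateJavaScript()`, which runs a script with the element bound to `this`.

```python
import json


def js_get_property(name):
    """Build a JS expression that reads a property of the element bound to `this`."""
    # json.dumps() gives a safely quoted JS string literal for the property name
    return "this[{}]".format(json.dumps(name))


def js_set_property(name, value):
    """Build a JS statement that assigns `value` to a property of `this`."""
    return "this[{}] = {};".format(json.dumps(name), json.dumps(value))


# With a QWebElement called `element` (hypothetical variable), this would be used as:
#   current = element.evaluateJavaScript(js_get_property('value'))
#   element.evaluateJavaScript(js_set_property('value', 'quux'))
read_snippet = js_get_property('value')
write_snippet = js_set_property('value', 'quux')
```

Setting the property this way changes what the form submits, but it does not fire any key events on its own.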
Tonight I finished fixing the most annoying bug I had with this site. To add a
user I have to fill a form that is split in 7 'tabs' (which means 7 divs
with fields where only one is shown at a time). One of the fields on the second
tab has a complex
JS interaction and I was cracking my skull trying to make it work. Because the JS
is reacting to key presses, setting the
value property was not triggering it.
Next I tried
KeyboardEvent in JS,
but I didn't succeed. Maybe it was me, maybe the fact that the engine behind
QWebPage is the original Webkit and for some reason its JS support is lacking
there, who knows.
But the good guys from
#qtwebkit gave me a third option: just send plain
QKeyEvents to the input element. Luckily we can do that, the web engine is
completely built in Qt and supports its event system and more. I only had to
give focus to the widget.
Again, I tried with JS and failed, so I went back to cheating with Qt.
QWebElement.geometry() returns the
QRect of the widget
that implements the input element; I just took the
.center() of it, and
generated a pair of mouse button press/release events in that point. One further
detail is that the
.geometry() won't be right unless I force the second tab to
be shown, forcing the field to be drawn. Still, for some reason getting a
reference to the input field on
page load (when I'm trying to figure out which fields are available, which in
the long run does not make sense, as fields could easily be created or destroyed
on demand with
JS) does not return an object that will be updated after the widget is
repositioned, so asking its geometry returns
((0, -1), (-1, 0)), which amounts to
an invalid geometry. The solution is to just get the reference to the input field
after forcing the div/tab to be shown.
Finally, I create a
pair of key press/release events for each character of the string I wanted as
value, and seasoned everything with a lot of
Another advantage of using the Qt stuff is that while I was testing I could plug in a
QWebView, sprinkle some
time.sleep() of various lengths, and see how it
behaved. Now I can simply remove that to be back to headlessness.
I'm not sure I'll publish the code; as you can see, it's quite hacky and it will require a lot of cleanup before I can publish it without a brown paper bag on my head.
 Yes, I'm using qt5.5 because that's what I will have available in the production server.
 Although as I said, you can change the attributes and so you lose the original values.
 I guess the answer is in the spec.
 I think I got it:
QWebElement is the C++ class that is used in
to represent the HTML tree, the real DOM, while somewhere deeper in
there are the classes representing the JS objects which you just can't reach.
 This clearly shows that there is a connection between the DOM object and the JS one, you just can't access it via the API.
 This is the original footnote: Or something like that. Look, I'm an engineer and I usually want to know how things work, but since my first exposure to HTML, CSS and JS, back in the time when support was flaky and fragmented on purpose, I always wanted to stay as far away from them as possible. Things got much better, but as you can see the details are still somewhat obscure. I guess, I hope the answer is in the spec.
 With this I mean that I executed something and it didn't trigger the events it should, and there's no practical way to figure out why.
Last night I realized the first point. Checking today I found the latter. Early, often, go!
- ayrton-0.9 has debug on. It will leave lots of files lying around your file system.
- Modify the release script to never allow this again.
- make install was not running the tests.
Another release, but this time not (only) a bugfix one. After
I converted the file tests from a
_X format, which, let's face it, was not pretty,
into the more usual
-X format. This alone merits a change in the minor version
_err also accept a tuple
(path, flags), so
you can specify things like os.O_APPEND.
In other news, I had to drop support for Python-3.3, because otherwise I would have to complicate the import system a lot.
But in the end, yes, this also is a bugfix release. Lots of fd leaks were
plugged, so I suggest you upgrade if you can. Just remember the
change. I found all the leaks thanks to
unittest's warnings, even if sometimes
they were a little misleading:
testRemoteCommandStdout (tests.test_remote.RealRemoteTests) ... ayrton/parser/pyparser/parser.py:175: ResourceWarning: unclosed <socket.socket fd=5, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/ssh-XZxnYoIQxZX9/agent.7248>
  self.stack[-1] = (dfa, next_state, node)
The file and line cited in the warning have nothing to do with the warning
itself (it was not the one who raised it) or the leaked fd, so it took me a while
to find where those leaks were coming from. I hope I have some time to find why
this is so. The most frustrating thing was that
unittest closes the leaking fd,
which is nice, but in one of the test cases it was closing it seemingly before the
test finished, and the test failed because the socket was closed:
======================================================================
ERROR: testLocalVarToRemoteToLocal (tests.test_remote.RealRemoteTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 225, in wrapper
    test (self)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 235, in testLocalVarToRemoteToLocal
    self.runner.run_file ('ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay')
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 304, in run_file
    return self.run_script (script, file_name, argv, params)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 323, in run_script
    return self.run_tree (tree, file_name, argv, params)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 336, in run_tree
    return self.run_code (code, file_name, argv)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 421, in run_code
    raise error
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 402, in run_code
    exec (code, self.globals, self.locals)
  File "ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay", line 6, in <module>
    with remote ('127.0.0.1', _test=True):
  File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 362, in __enter__
    i, o, e= self.prepare_connections (backchannel_port, command)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 270, in prepare_connections
    self.client.connect (self.hostname, *self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 338, in connect
    t.start_client()
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 493, in start_client
    raise e
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1757, in run
    self.kex_engine.parse_next(ptype, m)
  File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 75, in parse_next
    return self._parse_kexdh_reply(m)
  File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 112, in _parse_kexdh_reply
    self.transport._activate_outbound()
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 2079, in _activate_outbound
    self._send_message(m)
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1566, in _send_message
    self.packetizer.send_message(data)
  File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 364, in send_message
    self.write_all(out)
  File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 314, in write_all
    raise EOFError()
EOFError
This probably has something to do with the fact that the test (a functional test, really) is using threads and real sockets. Again, I'll try to investigate this.
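About the misleading file and line in the warning: a likely explanation (not something I verified against ayrton itself) is that a ResourceWarning is emitted when the leaked object is finalized, so it points at whatever code happened to be running when the last reference went away or the garbage collector kicked in. This small sketch reproduces that; the function name `leak_and_collect()` is just for illustration.

```python
import gc
import socket
import warnings


def leak_and_collect():
    """Create a socket, drop it without closing it, and return the ResourceWarnings seen."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always', ResourceWarning)
        s = socket.socket()
        # the warning is emitted when the socket is finalized, not when it was created
        del s
        gc.collect()  # for implementations that do not finalize on the last `del`

    return [w for w in caught if issubclass(w.category, ResourceWarning)]


leaks = leak_and_collect()
for w in leaks:
    # filename/lineno tell where finalization happened, not where the socket came from
    print(w.filename, w.lineno, w.message)
```

On Python 3.6 and later, running the tests with tracemalloc enabled (for instance `python3 -X tracemalloc=10 -m unittest`) should also make the warning show where the leaked object was allocated, which is the information that is actually needed.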
All in all, the release is an interesting one. I'll keep adding small features and releasing, let's see how it goes. Meanwhile, here's the changelog:
- The 'No Government' release.
- Test functions are no longer called _X but
-X, which is more scripting friendly.
- Some of those tests had to be fixed.
- Dropped support for
py3.3 because the importer does not work there.
- tox support, but not yet part of the stable test suite.
- Lots and lots more tests.
- Lots of improvements in the
remote() tests; in particular, make sure they don't hang waiting for someone who's not gonna come.
- Ignore ssh
remote() tests if there's no password/phrase-less connection.
- Fixed several fd leaks.
_err also accept a tuple
(path, flags), so you can specify things like
os.O_APPEND. Mostly used internally.
I'll keep this short. During the weekend I found a bug in
ayrton. I fixed it in
develop, and decided to make a release with it, because it was kind of a
showstopper. It was the first time I decided to use
ayrton for a oneliner.
It was this one:
ayrton -c "rm(v=True, locate('.xvpics', _out=Capture))"
ayrton's native support for filenames with spaces makes it a perfect replacement for
xargs and tools like that. That command simply finds
all the files or directories called .xvpics using
locate and removes them. There is a
little bit of magic where
locate's output becomes
rm's arguments, but probably
not magic enough:
_out=Capture has to be specified. We'll probably fix that
in the near future.
So, enjoy the new release. It just fixes a couple of bugs, one of them directly related to this oneliner. Here's the changelog:
- The 'Release From The Bus' release.
- Bugfix release.
- Argv should not be created with an empty list.
- Missing dependencies.
- Several typos.
- Fix for
ayrton -c <script> was failing because the file name was not properly (f|b)aked.
- ayrton --version didn't work!
Meanwhile, a little about its future. I have been working on
ayrton on and off.
Right now I'm gathering energy to modify
pypy's Python parser so it supports
py3.6's formatted string literals. With this I can later update ayrton's parser,
which is based on
pypy's. A part of it has been done, but then I ran out of gas.
I think FSLs are perfect for
ayrton in its aim to replace shell script languages.
In other news, there's a nasty
remote() bug that I can't pin down. These two
things might mean that there won't be a significant release for a while.
I was trying to modify
ayrton so we could really have
sh-style file tests.
In sh, they're defined as unary operators in the
-X form, where
X is a
letter. For instance,
-f foo returns true (
0 in sh-speak) if
foo is some
kind of file. In
ayrton I defined them as functions you could use, but the
names sucked a little.
-f was called
_f() and so on. Part of the reason is,
I think, that both
ayrton already do some
in executable names, and part because I thought that
-True didn't make any sense.
A couple of days ago I came up with the idea that I could simply call the function
f() and (ab)use the fact that
- is a unary operator. The only detail was to
make sure that
- didn't change the truthiness of
bools. In fact, it doesn't,
but this surprised me a little, although it shouldn't have:
In : -True
Out: -1

In : -False
Out: 0

In : if -True: print ('yes!')
yes!

In : if -False: print ('yes!')
You see, the
bool type was
introduced in Python-2.3
all the way back in 2003. Before that, the concept of true was represented by
any 'true' object, and most of the time as the integer
1; false was mostly 0. In Python-2.2.1, True and
False were added to the builtins, but only as
other names for 1 and
0. According to that page and PEP 285,
bool is a subtype of int,
so you could still do arithmetic operations like
True+1 (!!!), but I'm pretty
sure deep down they just wanted to be backward compatible.
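The subtype relationship is easy to check for yourself. This is just a quick sketch of the behaviour described above, nothing specific to ayrton:

```python
# bool really is a subclass of int, so arithmetic works on it
is_int_subclass = issubclass(bool, int)

negated = -True   # unary minus goes through int.__neg__()
summed = True + 1

# the result of negating a bool is a plain int, but its truthiness is preserved
print(is_int_subclass, type(negated).__name__, negated, summed)
print(bool(-True), bool(-False))
```

So `-` keeps the truthiness intact; what gets lost is the type.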
I have to be honest, I don't like that, or the fact that applying - to bools
converts them to
ints, so I decided to subclass
bool and implement __neg__()
in such a way that it returns the original value. And that's when I got the real
surprise:
In : class FalseBool (bool):
...:     pass
...:
TypeError: type 'bool' is not an acceptable base type
Probably you didn't know (I didn't), but Python has such a thing as a 'final class' flag. It can only be used while defining classes in a C extension. It's a strange flag, because most of the classes have to declare it just to be subclassable; it's not even part of the default flags. Even more surprising, is that there are a lot of classes that are not subclassable: around 124 in Python-3.6, and only 84 that are subclassable.
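If you want to poke at that flag from Python, here is a minimal sketch. It is CPython-specific: it relies on the `__flags__` attribute of type objects and on the value of `Py_TPFLAGS_BASETYPE` (bit 10 in CPython's `object.h`). The helper names `is_subclassable()` and `try_subclass()` are made up for the example, and the counts you get will depend on the Python version.

```python
import builtins

Py_TPFLAGS_BASETYPE = 1 << 10  # value taken from CPython's Include/object.h


def is_subclassable(cls):
    """True if the type declares the flag that allows subclassing it."""
    return bool(cls.__flags__ & Py_TPFLAGS_BASETYPE)


def try_subclass(base):
    """Return the TypeError message raised when subclassing `base`, or None."""
    try:
        type('Sub', (base,), {})
    except TypeError as e:
        return str(e)

    return None


# builtin types that cannot be subclassed from Python code
final = sorted(name for name, obj in vars(builtins).items()
               if isinstance(obj, type) and not is_subclassable(obj))

print(final)
print(try_subclass(bool))
```

This only looks at the types exposed in `builtins`; counting all the types the interpreter knows about needs more digging.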
So there you go. You learn something new every day. If you're curious, here's
the final implementation of
class :
    def __init__ (self, value):
        if not isinstance (value, bool):
            raise ValueError

        self.value= value

    def __bool__ (self):
        return self.value

    def __neg__ (self):
        return self.value
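To show how such a wrapper makes `-f(...)` read like a sh test, here is a self-contained sketch. The class name `FileTest` and this particular `f()`, built on `os.path.isfile()`, are assumptions for the sake of the example, not necessarily what ayrton ships.

```python
import os.path
import tempfile


class FileTest:  # hypothetical name
    """Wrap a bool so that unary minus gives back the original truth value."""

    def __init__(self, value):
        if not isinstance(value, bool):
            raise ValueError

        self.value = value

    def __bool__(self):
        return self.value

    def __neg__(self):
        return self.value


def f(path):
    """Assumed equivalent of sh's `-f`: true if path is a regular file."""
    return FileTest(os.path.isfile(path))


with tempfile.NamedTemporaryFile() as tmp:
    exists = -f(tmp.name)   # the file is there while the context is open

gone = -f(tmp.name)         # the temporary file is deleted on exit

print(exists, gone)
```

Note that `f(path)` without the `-` also works in an `if`, thanks to `__bool__()`.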
This will go in
ayrton's next release, which I hope will be soon. I'm also
working on implementing all of the different styles of expansion found in
I even seem to have found some bugs in it.
 I'm talking about the shell, not to confuse with
 Well, there are a couple of infix binary operators in the form
I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, especially connecting the new line to existing centerlines. Still, I think it's useful.
Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.
Lastly, as imagico mentioned in the first post about this hack,
the algorithms are not fast, and I already made my computer start thrashing the
disk swapping like Hell because
pg hit a big polygon and started using lots of
RAM to calculate its centerline. At least this time I can see how complex the
polygons are before handing them to the code. As an initial benchmark, the original data
for that changeset (I later simplified it with JOSM's tool) took 0.063927s in
pg+gis and 0.004737s in the Python code. More tests will come later.
OK, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!
Like I said in my last post, I'm looking at last 's videos. Here are my selected videos, in the order I saw them:
Ned Batchelder - Machete-mode debugging: Hacking your way out of a tight spot. In fact, I saw this twice.
Sumana Harihareswara - HTTP Can Do That?! Points for informative and funny.
Matthias Kramm - Python Typology Types are coming, so get used to them.
Scott Sanderson, Joe Jevnik - Playing with Python Bytecode Nice, very nice trick. I'm talking about the way the presentation is given.
And of course, the lightning talks. I always like these, because you can get exposed to any kind of things, some not even remotely connected to Python, but which can get your brain rolling down nice little bunny holes, or at least get a smile from you. So here:
LT#1. Please watch it at least between 20-25m.
And of course, check the other ones, don't stop at my own interests.
 Yes, I started writing this a month ago.
Long time for this release. A couple of hard bugs (whose fix was just moving a line down a little), a big-ish new feature, and moving to a new city. Here's the changelog:
- You can import ayrton modules and packages!
- Depends on Python3.5 now.
argv is not quite a list: for some operations (
argv is left alone.
option() raises or if the option or its 'argument' is wrong.
stat() are available as functions.
pdb when there is an unhandled exception.
for line in foo(...): ... by automatically adding the
- A lot of internal fixes.
My latest Europe import was quite eventful. First, I ran out of space several
times during the import itself, at indexing time. The good thing is that, if you
manage to reclaim some space, and reading a little of
you can replay the missing queries by hand and stop cursing. To be fair,
osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables,
planet_osm_relation; and one file, the
flat nodes one. Those are not deleted until the whole process has finished, but
they're actually not needed after the processing phase. I started working on
But that was not the most difficult part. The most difficult part was that I
forgot, somehow, to add a column to the import style.
Elevation, my own style,
renders different icons for different types of castles (and forts too), just like
the Historic Place map
of the Hiking and Bridle map. So today
I sat down and tried to figure out how to reparse the OSM extract I used for the
import to add this info.
The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:
node,way castle_type text polygon
That says that this applies to nodes and ways. If the element is a way, osm2pgsql
will try to convert it to a polygon and put it in the planet_osm_polygon table;
if it's a node, it ends in the
planet_osm_point table. So we just add the castle_type
column to those tables:
ALTER TABLE planet_osm_point ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;
Now how to process the extract? Enter
pyosmium. It's a Python binding for the
osmium library with a stream-like type of processing à la expat for
processing XML. The interface is quite simple: one subclasses
osmium.SimpleHandler, defines the element type handlers (node(), way() and/or
relation()) and that's it! Here's the full code of the simple Python
script I did:
#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()

class CastleTypes (osmium.SimpleHandler):
    def process (self, thing, table):
        if 'castle_type' in thing.tags:
            try:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''

            print (table, thing.id, name)
            # the table name comes only from the two fixed strings below
            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                             WHERE osm_id = %s''',
                         (thing.tags['castle_type'], thing.id))

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)

ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')
# psycopg2 does not autocommit by default; without this the updates are lost
conn.commit ()
The only strange part of the API is that it doesn't seem to raise a KeyError
when the tag does not exist, but a
SystemError. I'll try to figure this out
later. Also interesting is the large number of unnamed elements with this tag that
exist in the DB.
 I would love forto recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick.
 I just noticed how much more complete those maps are. more ideas to use :)
For a few months now I've been trying to have a random slideshow of images. I
used to do this either with
kscreensaver, which for completely different
reasons I can't use now, or
glslideshow, which, even when I
compiled it by hand, I can't find the way to give it the root dir of the images.
So, based on OMIT, I developed my own.
The differences with OMIT are minimal. It has to scan the whole tree for finding the appropriate files (its definition of "appropriate" could be improved, it's true); it goes into full screen mode with black background; and it (more) properly handles EXIF rotation. All that in 176 LOCs, including proper licensing (GPLv3), and developed in one day and refined the next one.
So, there you are. Like OMIT, it's in
PyQt4, but this time in Python3 (that's
why I used
includes porting it to
PyQt5 and a few other things. You can grab it
here. I plan to do a proper
release soon, but for the moment just drop it in your
PATH and be happy with