glob is not a blog. Not in the common meaning of the word. It's a record of my projects and other interactions with libre software.
I'm writing a python module that allows me to 'drive' a site using Qt. This
means that I can navigate the site, fill forms, submit them and read the
resulting pages and scrape them, Selenium style. The reasons I'm using
Qt are that it has enough support for the site I'm driving (it's the web
frontend of the SIP telephony solution we're using, which has an incomplete API
and I have to automate several aspects not covered by it); there are Python bindings; and
because I can do it headless:
instead of using browser instances, I simply instantiate one QWebPage
per thread and that's it.
The first thing I learned today is that JS objects representing the DOM elements have two sets of value holders: attributes and properties. The properties are what in Python we call attributes: the object's elements which are accessible with the '.' operator and hold instance values. The attributes are in fact the HTML element's attributes that gave the properties their initial values. That is, given the following HTML element:
<input type="text" name="foo" id="bar" value="quux">
the initial JS object's attributes and properties will have those values.
If you change the value with your browser, the
value property of that element
will be changed, but not the attribute.
When you submit the form, the
value properties of all the form elements are
used, so if you "only" change the
value attribute, that won't be used.
So forget attributes. Also, the DOM is the
representation of the actual state of the page, but this state is never reflected in
the HTML source that you can ask your browser to show, but you see those changes
reflected in the browser's debugger. It's like they really
wanted to keep initial values apart from current state.
On the Qt side, QWebElement
is only the DOM element representation, not the JS object, so you
can't access the properties via its API, only by executing JS on the element.
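A minimal sketch of that idea, not the module's actual code: QWebElement.evaluateJavaScript() runs a script with the element bound to `this`, so reading or writing a property boils down to building a small JS snippet. The helper names js_get_property() and js_set_property() are hypothetical; json.dumps() is used only to get safely quoted JS string literals.

```python
import json


def js_get_property(name):
    """Build a JS expression that reads a property of the element bound to `this`."""
    return 'this[{}]'.format(json.dumps(name))


def js_set_property(name, value):
    """Build a JS statement that sets a property of the element bound to `this`."""
    return 'this[{}] = {};'.format(json.dumps(name), json.dumps(value))


# With a real QWebElement called `element` (assumed, not created here) this would be:
#   element.evaluateJavaScript(js_set_property('value', 'quux'))
#   current = element.evaluateJavaScript(js_get_property('value'))
print(js_get_property('value'))
print(js_set_property('value', 'quux'))
```

Setting the property this way changes what the form submits, but as described further down it does not fire the key events some scripts listen for.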
Tonight I finished fixing the most annoying bug I had with this site. To add a
user I have to fill a form that is split in 7 'tabs' (which means 7 divs
with fields where only one is shown at a time). One of the fields on the second
tab has a complex
JS interaction and I was cracking my skull trying to make it work. Because the JS
is reacting to key presses, setting the
value property was not triggering it.
Next I tried
KeyboardEvent in JS,
but I didn't succeed. Maybe it was me, maybe the fact that the engine behind
QWebPage is the original Webkit and for some reason its JS support is lacking
there, who knows.
But the good guys from
#qtwebkit gave me a third option: just send plain
QKeyEvents to the input element. Luckily we can do that, the web engine is
completely built in Qt and supports its event system and more. I only had to
give focus to the widget.
Again, I tried with JS and failed, so I went back to cheating with Qt.
QWebElement.geometry() returns the
QRect of the widget
that implements the input element; I just took the
.center() of it, and
generated a pair of mouse button press/release events in that point. One further
detail is that the
.geometry() won't be right unless I force the second tab to
be shown, forcing the field to be drawn. Still, for some reason getting a
reference to the input field on
page load (when I'm trying to figure out which fields are available, which in
the long run does not make sense, as fields could easily be created or destroyed
on demand with
JS) does not return an object that will be updated after the widget is
repositioned, so asking its geometry returns
((0, -1), (-1, 0)), which amounts to
an invalid geometry. The solution is to just get the reference to the input field
after forcing the div/tab to be shown.
Finally, I create a
pair of key press/release events for each character of the string I wanted as
value, and seasoned everything with a lot of
Another advantage of using the Qt stuff is that while I was testing I could plug
QWebView, sprinkle some
time.sleep() of various lengths, and see how it
behaved. Now I can simply remove that to be back to headlessness.
I'm not sure I'll publish the code; as you can see, it's quite hacky and it will require a lot of cleanup to be able to publish it without a brown paper bag over my head.
 Yes, I'm using qt5.5 because that's what I will have available in the production server.
 Although as I said, you can change the attributes and so you lose the original values.
 I guess the answer is in the spec.
 I think I got it:
QWebElement is the C++ class that is used in
to represent the HTML tree, the real DOM, while somewhere deeper in
there are the classes representing the JS objects which you just can't reach.
 This clearly shows that there is a connection between the DOM object and the JS one, you just can't access it via the API.
 This is the original footnote: Or something like that. Look, I'm an engineer and I usually want to know how things work, but since my first exposure to HTML, CSS and JS, back in the time when support was flaky and fragmented on purpose, I always wanted to stay as far away from them as possible. Things got much better, but as you can see the details are still somewhat obscure. I guess, I hope the answer is in the spec.
 With this I mean that I executed something and it didn't trigger the events it should, and there's no practical way to figure out why.
This is the second time I spent hours looking for this, so this time I'm writing it down.
My 10 year old Dell Inspiron 1420N, which is now my home server where I keep several useful online tools, has two problems: the keyboard and the LCD do not work. Well, the LCD works erratically, most of the time a couple of seconds after boot. The first problem can be fixed by attaching a USB keyboard, and the second by attaching an external screen.
Except that the machine does not
enable the VGA output by default; but no problem, you just press <Fn>+<F8> and
voilà, external screen works. Except that external keyboards do not have the
<Fn> key; but no problem, you can
emulate it with
<Scroll Lock> by just telling the BIOS to do so.
But you can't do it if you can't see anything on the screen. To do it blindly, you have to either know your BIOS by heart or find a reference online. I don't know that BIOS by heart, mainly because it's been a loooong while since I had to use it for anything, but also because I barely touch that machine anymore. And online references, well, there are none for models so old.
One of the possible solutions that occurred to me was to try to run
a BIOS image, which you can still download from Dell's site (!), under
but this tool cannot run arbitrary BIOSes. A pity, but understandable.
So without further ado, a schematic of the BIOS contents and how to fix this blindly:
    - System
      | System Info        <-- the cursor starts here
      | Processor Info
      | Memory Info
      | Device Info
      | Battery Info
      | Battery Health
      | Date/Time
      | Boot Sequence
    + Onboard Devices
    + Video
    + Security
    + Performance
    + Power Management
    + Maintenance
    - POST Behaviour       <-- 14 * <Down> + <Enter> and the following menu opens
      | Adapter Warnings
      | Fn Key Emulation   <-- 2 * <Down> + <Enter> and the setup screen opens
      | Fast Boot
      | Virtualization
      | Keypad (embedded)
      | Numlock LED
      | USB Emulation
    + Wireless
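As a sanity check of those key counts, here is a small sketch that derives them from the menu as listed above. The lists simply transcribe the schematic; downs_to() is a hypothetical helper, and the sketch assumes each <Down> moves the cursor exactly one visible entry.

```python
# Entries visible at startup, in order; the cursor starts on the first one.
VISIBLE_AT_START = [
    'System Info', 'Processor Info', 'Memory Info', 'Device Info',
    'Battery Info', 'Battery Health', 'Date/Time', 'Boot Sequence',
    'Onboard Devices', 'Video', 'Security', 'Performance',
    'Power Management', 'Maintenance', 'POST Behaviour', 'Wireless',
]

# Entries reachable once 'POST Behaviour' is expanded, cursor still on it.
POST_BEHAVIOUR_OPEN = [
    'POST Behaviour', 'Adapter Warnings', 'Fn Key Emulation', 'Fast Boot',
    'Virtualization', 'Keypad (embedded)', 'Numlock LED', 'USB Emulation',
]


def downs_to(entries, target):
    """Number of <Down> presses needed to go from the first entry to target."""
    return entries.index(target)


print(downs_to(VISIBLE_AT_START, 'POST Behaviour'))       # presses before the first <Enter>
print(downs_to(POST_BEHAVIOUR_OPEN, 'Fn Key Emulation'))  # presses before the second <Enter>
```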
The setup screen is quite simple, it has two options,
and you move with
<Right>. I'm not sure if it's needed, but pressing
<Enter> to choose your option does not hurt. Then you press
gives you the
Exit screen. This screen has three options:
Remain in Setup (which is
Discard/Exit. Guess which one you want
<Enter> and you're done! The machine reboots and now you can press
<Scroll Lock>+<F8> on your external keyboard to activate the external screen.
Last night I realized the first point. Checking today I found the latter. Early, often, go!
- ayrton-0.9 has debug on. It will leave lots of files lying around your file system.
- Modify the release script to never allow this again.
- make install was not running the tests.
Another release, but this time not (only) a bugfix one. After
I converted the file tests from a
_X format, which, let's face it, was not pretty,
into the more usual
-X format. This alone merits a change in the minor version.
_err also accept a tuple
(path, flags), so
you can specify things like os.O_APPEND.
In other news, I had to drop support for Python-3.3, because otherwise I would have to complicate the import system a lot.
But in the end, yes, this also is a bugfix release. Lots of fd leaks were
plugged, so I suggest you upgrade if you can. Just remember the
change. I found all the leaks thanks to
unittest's warnings, even if sometimes
they were a little misleading:
    testRemoteCommandStdout (tests.test_remote.RealRemoteTests) ... ayrton/parser/pyparser/parser.py:175: ResourceWarning: unclosed <socket.socket fd=5, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/ssh-XZxnYoIQxZX9/agent.7248>
      self.stack[-1] = (dfa, next_state, node)
The file and line cited in the warning have nothing to do with the warning
itself (it was not the one who raised it) or the leaked fd, so it took me a while
to find where those leaks were coming from. I hope I have some time to find why
this is so. The most frustrating thing was that
unittest closes the leaking fd,
which is nice, but in one of the test cases it was closing it seemingly before the
test finished, and the test failed because the socket was closed:
    ======================================================================
    ERROR: testLocalVarToRemoteToLocal (tests.test_remote.RealRemoteTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 225, in wrapper
        test (self)
      File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 235, in testLocalVarToRemoteToLocal
        self.runner.run_file ('ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay')
      File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 304, in run_file
        return self.run_script (script, file_name, argv, params)
      File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 323, in run_script
        return self.run_tree (tree, file_name, argv, params)
      File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 336, in run_tree
        return self.run_code (code, file_name, argv)
      File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 421, in run_code
        raise error
      File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 402, in run_code
        exec (code, self.globals, self.locals)
      File "ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay", line 6, in <module>
        with remote ('127.0.0.1', _test=True):
      File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 362, in __enter__
        i, o, e= self.prepare_connections (backchannel_port, command)
      File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 270, in prepare_connections
        self.client.connect (self.hostname, *self.args, **self.kwargs)
      File "/usr/lib/python3/dist-packages/paramiko/client.py", line 338, in connect
        t.start_client()
      File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 493, in start_client
        raise e
      File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1757, in run
        self.kex_engine.parse_next(ptype, m)
      File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 75, in parse_next
        return self._parse_kexdh_reply(m)
      File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 112, in _parse_kexdh_reply
        self.transport._activate_outbound()
      File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 2079, in _activate_outbound
        self._send_message(m)
      File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1566, in _send_message
        self.packetizer.send_message(data)
      File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 364, in send_message
        self.write_all(out)
      File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 314, in write_all
        raise EOFError()
    EOFError
This probably has something to do with the fact that the test (a functional test, really) is using threads and real sockets. Again, I'll try to investigate this.
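A likely explanation for the misleading file and line: CPython emits ResourceWarning when the leaked object is finalized, so the location reported is whatever code happened to be running at that moment, not the place where the fd was opened. The following sketch is not taken from ayrton's test suite (leak_fd() and collect_resource_warnings() are hypothetical names); it reproduces the effect with a deliberately unclosed file.

```python
import gc
import os
import warnings


def leak_fd():
    """Open a file and 'forget' to close it (a deliberate leak)."""
    f = open(os.devnull)
    return f.fileno()  # f is dropped here without ever being closed


def collect_resource_warnings():
    """Run the leaky code and return the ResourceWarnings it provoked."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # ResourceWarning is ignored by default
        leak_fd()
        gc.collect()  # make sure the file object has been finalized
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


leaks = collect_resource_warnings()
for w in leaks:
    # filename/lineno point at the code running during finalization,
    # which is not necessarily where open() was called
    print(w.filename, w.lineno, w.message)
```

Running the tests with tracemalloc enabled (for instance `python3 -X tracemalloc=5 -m unittest`) should make the warning also show where the object was allocated, at least on Python 3.6 and later.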
All in all, the release is an interesting one. I'll keep adding small features and releasing, let's see how it goes. Meanwhile, here's the changelog:
- The 'No Government' release.
- Test functions are no longer called _X but -X, which is more scripting friendly.
- Some of those tests had to be fixed.
- Dropped support for py3.3 because the importer does not work there.
- tox support, but not yet part of the stable test suite.
- Lots and lots of more tests.
- Lots of improvements in the remote() tests; in particular, make sure they don't hang waiting for someone who's not gonna come.
- Ignore ssh remote() tests if there's no password/phrase-less connection.
- Fixed several fd leaks.
- _err also accept a tuple (path, flags), so you can specify things like os.O_APPEND. Mostly used internally.
I'll keep this short. During the weekend I found a bug in
ayrton. I fixed it in
develop, and decided to make a release with it, because it was kind of a
showstopper. It was the first time I decided to use
ayrton for a oneliner.
It was this one:
ayrton -c "rm(v=True, locate('.xvpics', _out=Capture))"
ayrton's native support for filenames with spaces makes it a perfect
replacement for xargs and tools like that. That command simply finds
all the files or directories called .xvpics using
locate and removes them. There is a
little bit of magic where
locate's output becomes
rm's arguments, but probably
not magic enough:
_out=Capture has to be specified. We'll probably fix that
in the near future.
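For comparison, the same idea in plain Python could be sketched as below. This is not how ayrton implements it; rm_command() is a hypothetical helper. The point is that locate's output is split on newlines rather than on whitespace, which is what keeps names with spaces in one piece.

```python
def rm_command(locate_output):
    """Build an `rm -v` argument list from locate's output, one path per line."""
    paths = [line for line in locate_output.split('\n') if line]
    return ['rm', '-v'] + paths


# Sample output, made up for illustration; note the space in the first path.
sample = '/home/user/my photos/.xvpics\n/tmp/.xvpics\n'
print(rm_command(sample))

# To really run it, something like this would do (assumption: locate is installed):
#   import subprocess
#   out = subprocess.run(['locate', '.xvpics'], stdout=subprocess.PIPE,
#                        universal_newlines=True).stdout
#   subprocess.run(rm_command(out))
```

File names containing newlines would still break this sketch.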
So, enjoy the new release. It just fixes a couple of bugs, one of them directly related to this oneliner. Here's the changelog:
- The 'Release From The Bus' release.
- Bugfix release.
- Argv should not be created with an empty list.
- Missing dependencies.
- Several typos.
- Fix for
ayrton -c <script> was failing because the file name was not properly (f|b)aked.
- ayrton --version didn't work!
Meanwhile, a little about its future. I have been working on
ayrton on and off.
Right now I'm gathering energy to modify
pypy's Python parser so it supports
py3.6's formatted string literals. With this I can later update
which is based on
pypy's. A part of it has been done, but then I ran out of gas.
I think FSLs are perfect for
ayrton in its aim to replace shell script languages.
In other news, there's a nasty
remote() bug that I can't pin down. These two
things might mean that there won't be a significant release for a while.
I was trying to modify
ayrton so we could really have
sh-style file tests.
In sh, they're defined as unary operators in the
-X form, where
X is a
letter. For instance,
-f foo returns true (
0 in sh-speak) if
foo is some
kind of file. In
ayrton I defined them as functions you could use, but the
names sucked a little.
-f was called
_f() and so on. Part of the reason is,
I think, that both
ayrton already do some
in executable names, and part because I thought that
-True didn't make any sense.
A couple of days ago I came up with the idea that I could simply call the function
f() and (ab)use the fact that
- is a unary operator. The only detail was to
make sure that
- didn't change the truthiness of
bools. In fact, it doesn't,
but this surprised me a little, although it shouldn't have:
    In : -True
    Out: -1

    In : -False
    Out: 0

    In : if -True: print ('yes!')
    yes!

    In : if -False: print ('yes!')
You see, the
bool type was
introduced in Python-2.3
all the way back in 2003. Before that, the concept of true was represented by
any 'true' object, and most of the time as the integer
1; false was mostly
0. True and False were added to the builtins, but only as
other names for
1 and 0. According to that page,
bool is a subtype of int,
so you could still do arithmetic operations like
True+1 (!!!), but I'm pretty
sure deep down they just wanted to be backwards compatible.
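A short check of those claims, as a sketch you can paste into any Python 3 interpreter (the variable names are only for illustration):

```python
# bool is a subclass of int, so True and False behave like 1 and 0 in arithmetic.
is_int_subclass = issubclass(bool, int)
total = True + 1                                # plain integer addition
negated = (-True, -False)                       # unary minus yields ints
negated_type = type(-True)
truthiness = (bool(-True), bool(-False))        # truth value survives the negation

print(is_int_subclass, total, negated, negated_type.__name__, truthiness)
```

So unary minus keeps the truthiness but silently turns the bool into an int.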
I have to be honest, I don't like that, or the fact that applying - to bools
converts them to
ints, so I decided to subclass
bool and implement __neg__()
in such a way that it returns the original value. And that's when I got the real
surprise:
    In : class FalseBool (bool):
    ...:     pass
    ...:
    TypeError: type 'bool' is not an acceptable base type
Probably you didn't know (I didn't), but Python has such a thing as a 'final class' flag. It can only be used while defining classes in a C extension. It's a strange flag, because most of the classes have to declare it just to be subclassable; it's not even part of the default flags. Even more surprising, is that there are a lot of classes that are not subclassable: around 124 in Python-3.6, and only 84 that are subclassable.
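A rough way to probe this from Python itself is to try to create a throwaway subclass and see whether the interpreter refuses. This is only a sketch; is_subclassable() is a hypothetical helper, and it is not necessarily how the counts above were obtained.

```python
def is_subclassable(cls):
    """Return True if Python lets us derive a new class from cls."""
    try:
        type('Probe', (cls,), {})  # same as: class Probe (cls): pass
    except TypeError:
        return False
    return True


for candidate in (int, str, dict, bool, type(None)):
    print(candidate.__name__, is_subclassable(candidate))
```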
So there you go. You learn something new every day. If you're curious, here's
the final implementation of FalseBool:

    class FalseBool:
        def __init__ (self, value):
            if not isinstance (value, bool):
                raise ValueError

            self.value= value

        def __bool__ (self):
            return self.value

        def __neg__ (self):
            return self.value
This will go in
ayrton's next release, which I hope will be soon. I'm also
working on implementing all of the different styles of expansion found in
I even seem to have found some bugs in it.
 I'm talking about the shell, not to confuse with
 Well, there are a couple of infix binary operands in the form
Today I had to setup 3 Firefox profiles, because I started a new job, and I realized I never documented which extensions I use or why, so I had to work a little from memory. Hence, this post, which I plan to keep up-to-date as much as possible.
A little bit of rationale first. I'm very privacy-conscious, but at the same time very pragmatic. I use several profiles to add an extra level of data isolation. That also allows me to have different sets of extensions, because some are so intrusive that they break some non-important sites' functionality.
Finally, the list, in no particular order:
FlashGot, by Giorgio Maone: Better downloads handling.
Go-Mobile, by 'Geek in Training': A lot of sites are actually more useful (read, with less crap on them) in their Mobile versions. This plugin lets you switch from one to the other.
HTTPS everywhere, by EFF: Don't navigate in the clear anymore.
No Script, also by Giorgio Maone: A broad spectrum antibiotic. Not loading JS makes pages less CPU intensive, plus sites cannot track you if you don't make requests, plus also blocks videos.
Privacy Badger, also by EFF: In their own words, “protects privacy by blocking spying ads and invisible trackers”.
Tab Auto Reload, by 'Schuzak': I use this to reload sites that constantly log you out, but only under certain circumstances.
Tab mix plus, by 'onemen': Once upon a time ffox didn't have session management/recovery. Now it does, but it's not very good; I still think TMP's is better. Also, duplicate tab.
Toggle animated GIFs, by Simon Lindholm: Stop annoying animations. Just make sure to tick 'Pause GIFs by default'.
uBlock Origin, by Raymond Hill: an (ad) blocker, goodbye-adiós 15s ad videos in youtube.
So that's it. Unluckily there's nothing against browser fingerprinting yet (and my browser ranks as quite unique), and I don't know how much can be/has been implemented by Mozilla. If you have other suggestions about plugins, please leave them in the comments below. As I said, I'll try to keep this post up to date.
 I used to use ABP, but it seems it became a protection scam.
I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, specially connecting the new line to existing centerlines. Still, I think it's useful.
Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.
Lastly, as imagico mentioned in the first post about this hack,
the algorithms are not fast, and I already made my computer start thrashing the
disk swapping like Hell because
pg hit a big polygon and started using lots of
RAM to calculate its centerline. At least this time I can see how complex the
polygons are before handing them to the code. As an initial benchmark, the original data
for that changeset (I later simplified it with JOSM's tool) took 0.063927s in
pg+gis and 0.004737s in the Python code. More tests will come later.
Okay, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!
A month ago I revived my old-laptop-as-server I have at home. I don't do much in
it, just serve my photos, a map, provide a
ssh trampoline for me and some
friends and not much more. This time I decided to tackle one of the most
annoying problems I had with it: closing the lid led the system to suspend.
Now, the setup in that computer has evolved through some years, so a lot of cruft was left on it. For instance, at some point I solved the problem by installing a desktop and telling it not to suspend the machine, mostly because that's how I configure my current laptop. That, of course, was a cannon-for-killing-flies solution, but it worked, so I could focus on other things. Also, a lot of power-related packages were installed, assuming they were really needed for supporting everything I might ever want to do about power. This is the story of how I removed them all, why, and how I solved the lid problem... twice.
The first things to go were the desktop packages, mostly because the screen in that laptop has been dead for more than a year now, and because its new space in the house is a small shelf in my wooden desktop. Then I reviewed the power-related packages one by one and decided whether I needed each one or not. This is more or less what I found:
acpi-fakekey: This package has a tool for injecting fake ACPI keystrokes in the input system. Not really needed.
acpi-support: It has a lot of scripts that can be run when some ACPI events occur. For instance, lid closing, battery/AC status, but also things like responding to power and even 'multimedia' keys. Nice, but not needed in my case; the lid is going to be closed all the time anyways.
laptop-mode-tools: Tools for saving power in your laptop. Not needed either, the server is going to be running all the time on AC (its battery also died some time ago).
upower: D-Bus interface for power events. No desktop or anything else to listen to them. Gone.
pm-utils: Nice CLI scripts for suspending/hibernating your system. I always have them around in my laptop because sometimes the desktops don't work properly. No use in my server, but it's cruft left from when I used it as my laptop. Adieu.
Even then, closing the lid led to the system suspending. Who else could be there?
Well, there is one project that is everywhere:
systemd. I'm not saying
this is bad, but it is everywhere. Thing is, its login subsystem also handles
ACPI events. In the
/etc/systemd/logind.conf file you can read the following
    #HandlePowerKey=poweroff
    #HandleSuspendKey=suspend
    #HandleHibernateKey=hibernate
    #HandleLidSwitch=suspend
    #HandleLidSwitchDocked=ignore
so I uncommented the 4th line and changed it so:

    HandleLidSwitch=ignore
Here you can also configure how the inhibition of actions work:
    #PowerKeyIgnoreInhibited=no
    #SuspendKeyIgnoreInhibited=no
    #HibernateKeyIgnoreInhibited=no
    #LidSwitchIgnoreInhibited=yes
Please check the config file's doc if you plan to modify it.
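To see at a glance which settings are explicitly set (that is, not commented out), the file can be read with Python's configparser. This is only a sketch: the SAMPLE text is made up for illustration, the [Login] section header is what logind.conf uses, and explicit_logind_settings() is a hypothetical helper.

```python
import configparser

# Made-up excerpt in the style of /etc/systemd/logind.conf
SAMPLE = """\
[Login]
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
HandleLidSwitch=ignore
#HandleLidSwitchDocked=ignore
"""


def explicit_logind_settings(text):
    """Return the uncommented key=value pairs of the [Login] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the CamelCase option names as they are
    parser.read_string(text)
    return dict(parser['Login'])


settings = explicit_logind_settings(SAMPLE)
print(settings)
```

For the real file, replace SAMPLE with the contents of /etc/systemd/logind.conf.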
Not entirely unrelated, my main laptop also started suspending when I closed the lid. I have it configured, through the desktop environment, to only turn off the screen, because what use is the screen if it's facing the keyboard and touchpad :) Somehow, these settings only recently started to be in effect, but a quick search didn't give any results on when things changed. Remembering what I did with the server, I just changed that config file to:
    HandlePowerKey=ignore
    HandleSuspendKey=ignore
    HandleHibernateKey=ignore
    HandleLidSwitch=ignore
    HandleLidSwitchDocked=ignore
That is, “let me configure this through the desktop, please”, and now I have my old behavior back :)
PS: I should start reading more about
systemd. A good starting point seems to
be all the links in its home page.
Dear conference speakers:
Please test your slides at 1024x768 resolution, on a projector.
Dear conference organizers:
Please remind your speakers to do so.
Dear conference attendants:
If you are at the back of the room and you can't see the text/code in the slides the speaker(s) is showing, please shout “I can't see shit!” in the appropriate language, and try to embarrass the speaker as much as possible.
Thanks in advance. Yours truly,
PS: I've been watching videos of talks in some conferences and I swear to $DEITY that in at least 40% of the ones I was interested in, I couldn't read the code on the video. Sometimes the fonts are too small, sometimes the colors are not contrasting enough. Please, at least test your slides on a projector...
PSS: I know the resolution I'm suggesting is low. Be happy I'm not asking for 640x480 :-P
PSSS: Ok, attendants, don't embarrass/harass the speakers :)
Like I said in my last post, I'm looking at last . Here are my selected videos, in the order I saw them: 's videos
Ned Batchelder - Machete-mode debugging: Hacking your way out of a tight spot. In fact, I saw this twice.
Sumana Harihareswara - HTTP Can Do That?! Points for informative and funny.
Matthias Kramm - Python Typology. Types are coming, so get used to them.
Scott Sanderson, Joe Jevnik - Playing with Python Bytecode. Nice, very nice trick. I'm talking about the way the presentation is given.
And of course, the lightning talks. I always like these, because you can get exposed to any kind of things, some not even remotely connected to Python, but which can get your brain rolling down nice little bunny holes, or at least get a smile from you. So here:
LT#1. Please watch it at least between 20-25m.
And of course, check the other ones, don't stop at my own interests.
 Yes, I started writing this a month ago.
Long time for this release. A couple of hard bugs (whose fix was just moving a line down a little), a big-ish new feature, and moving to a new city. Here's the changelog:
- You can import ayrton modules and packages!
- Depends on Python3.5 now.
argv is not quite a list: for some operations (
argv is left alone.
option() raises or if the option or its 'argument' is wrong.
stat() are available as functions.
pdb when there is an unhandled exception.
for line in foo(...): ... by automatically adding the
- A lot of internal fixes.
My latest Europe import was quite eventful. First, I ran out of space several
times during the import itself, at indexing time. The good thing is that, if you
manage to reclaim some space, and reading a little of
you can replay the missing queries by hand and stop cursing. To be fair,
osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables,
planet_osm_relation; and one file, the
flat nodes one. Those are not deleted until the whole process has finished, but
they're actually not needed after the processing phase. I started working on
But that was not the most difficult part. The most difficult part was that I
forgot, somehow, to add a column to the
Elevation, my own style,
renders different icons for different types of castles (and forts too), just like
the Historic Place map
of the Hiking and Bridle map. So today
I sat down and tried to figure out how to reparse the OSM extract I used for the
import to add this info.
The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:
node,way castle_type text polygon
That says that this applies to nodes and ways. If the element is a way, osm2pgsql
will try to convert it to a polygon and put it in the planet_osm_polygon table;
if it's a node, it ends in the
planet_osm_point table. So we just add the
column to those tables:
    ALTER TABLE planet_osm_point ADD COLUMN castle_type text;
    ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;
Now how to process the extract? Enter
pyosmium. It's a Python binding for the
osmium library with a stream-like type of processing à la expat for
processing XML. The interface is quite simple: one subclasses
osmium.SimpleHandler, defines the element type handlers (node(), way() and/or
relation()) and that's it! Here's the full code of the simple Python
script I did:
    #! /usr/bin/python3

    import osmium
    import psycopg2

    conn= psycopg2.connect ('dbname=gis')
    cur= conn.cursor ()

    class CastleTypes (osmium.SimpleHandler):
        def process (self, thing, table):
            if 'castle_type' in thing.tags:
                try:
                    name= thing.tags['name']
                # osmium/boost do not raise a KeyError here!
                # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
                except (KeyError, SystemError):
                    name= ''

                print (table, thing.id, name)
                # table is always one of the two names hardcoded below
                cur.execute ('''UPDATE '''+table+''' SET castle_type = %s WHERE osm_id = %s''',
                             (thing.tags['castle_type'], thing.id))

        def node (self, n):
            self.process (n, 'planet_osm_point')

        def way (self, w):
            self.process (w, 'planet_osm_polygon')

        relation= way  # handle them the same way (*honk*)

    ct= CastleTypes ()
    ct.apply_file ('europe-latest.osm.pbf')
    conn.commit ()  # psycopg2 does not autocommit by default
The only strange part of the API is that it doesn't seem to raise a KeyError
when the tag does not exist, but a
SystemError. I'll try to figure this out
later. Also interesting is the big amount of unnamed elements with this tag that
exist in the DB.
 I would love for GitHub to recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick.
 I just noticed how much more complete those maps are. More ideas to use :)
For a few months now I've been trying to have a random slideshow of images. I
used to do this either with
kscreensaver, which for completely different
reasons I can't use now, or
glslideshow, which, even when I
compiled it by hand, I can't find the way to give it the root dir of the images.
So, based on OMIT, I developed my own.
The differences with OMIT are minimal. It has to scan the whole tree for finding the appropriate files (its definition of "appropriate" could be improved, it's true); it goes into full screen mode with black background; and it (more) properly handles EXIF rotation. All that in 176 LOCs, including proper licensing (GPLv3), and developed in one day and refined the next one.
So, there you are. Like OMIT, it's in
PyQt4, but this time in Python3 (that's
why I used
includes porting it to
PyQt5 and a few other things. You can grab it
here. I plan to do a proper
release soon, but for the moment just drop it in your
PATH and be happy with
In these last two days I've been expanding
osm-centerlines. Now it not only
supports ways more complex than a simple rectangle, but also ones that lead to
'branches' (unfortunately, most probably because the mapper either imported
bad data or mapped it himself). Still, I tested it in very complex polygons
and the result is not pretty. There is still lots of room for improvements.
Unluckily, it's not as stand alone as it could be.
The problem is that, so far, the algos force you to provide not only the polygon
you want to process, but also its
medial. The code extends the
medial using info extracted from the skeleton in such a way that the resulting
medial ends on a segment of the polygon, hopefully the one(s) that cross
from one riverbank to another at down and upstream. Calculating the skeleton
could be performed by
CGAL, but the current
Python binding doesn't include
that function yet. As for the medial, SFCGAL (a C++ wrapper for CGAL)
exports a function that calculates an approximate medial,
but there seem to be no Python bindings for it yet.
So, a partial solution would be to use PostGIS-2.2's
ST_ApproximateMedialAxis(), so I added a function called
skeleton_medial_from_postgis(). The parameters are a
psycopg2 connection to a
PostgreSQL+PostGIS database and the way you want to calculate, as a
and it returns the skeleton and the medial ready to be fed into
The result of that should be ready for mapping.
So there's that. I'll be trying to improve it in the next days, and start looking into converting it into a JOSM plugin.
For a long time now I've been thinking about a problem: OSM data sometimes contains riverbanks that have no centerline. This means that someone mapped (part of) the banks of a river (or stream!), but didn't care about adding a line that would mark its centerline.
But this should be computationally solvable, right? Well, it's not that easy. See, given any riverbank polygon in OSM's database, you have 4 types of segments: those representing the right and left riverbanks (two types) and the flow-in and flow-out segments, which link the banks upstream and downstream. With a little bit of luck there will be only one flow-in and one flow-out segment, but there are no guarantees here.
One method could try and identify these segments, then draw a line starting in the middle of the flow-in segment, calculating the middle by traversing both banks at the same time, and finally connect to the middle of the flow-out segment. Identifying the segments by itself is hard, but it is also possible that the result is not optimal, leading to a jagged line. I didn't try anything along those lines, but I could try some examples by hand...
Enter topology, the branch of maths that deals with these kinds of problems. The skeleton of a polygon is a group of lines that are equidistant to the borders of the polygon. One of the properties this set of lines provides is direction, which can be exploited to find the banks and try to apply the previous algorithm. But a skeleton has a lot of 'branches' that might confuse the algo. Going a little further, there's the medial axis, which in most cases can be considered a simplified skeleton, without most of the skeleton branches.
Enter free software :) CGAL
is a library that can compute a lot of topological properties. PostGIS is clever
enough to leverage those algorithms and present, among others, the functions
ST_StraightSkeleton() and ST_ApproximateMedialAxis(). With these two and the
original polygon I plan to derive the centerline. But first an image that will
help explain it:
The green 'rectangle' is the original riverbank polygon. The thin black line is the skeleton for it; the medium red line is the medial. Notice how the medial and the center of the skeleton coincide. Then we have the 4 branches forming a V shape with its vertex at each end of the medial and its other two ends coincide with the ends of the flow in and flow out segments!
So the algorithm is simple: start with the medial; from its ends, find the branches in the skeleton that form that V; using the other two ends of those Vs, calculate the point right between them, and extend the medial to those points. This only calculates a centerline. The next step would be to give it a direction. For that I will need to see if there are any nearby lines that could be part of the river (that's what the centerline is for, to possibly extend existing rivers/centerlines), and use its direction to give it to the new centerline.
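The last step of that algorithm can be sketched as follows. This is only an illustration, not the code in the repository: points are plain `(x, y)` tuples, the function names are made up, and finding the V branches in the skeleton is assumed to have been done already.

```python
def midpoint(p, q):
    """Point right between p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def extend_medial(medial, start_v_ends, end_v_ends):
    """Extend the medial (a list of points) at both ends.

    start_v_ends and end_v_ends are the outer ends of the two V shapes,
    that is, the ends of the flow-in and flow-out segments.
    """
    start = midpoint(*start_v_ends)
    end = midpoint(*end_v_ends)
    return [start] + list(medial) + [end]


# A 10x2 'rectangle': the medial runs from (1, 1) to (9, 1) and the
# branches of the skeleton reach the four corners.
centerline = extend_medial([(1, 1), (9, 1)],
                           ((0, 0), (0, 2)),
                           ((10, 0), (10, 2)))
```

For that rectangle the extended line starts at the middle of the left side and ends at the middle of the right side.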
For the moment the algorithm only solves this simple case. A slightly more
complex case is not that trivial, as skeletons and medials are returned as a
MultiLineString with a line for each segment, so I will have to rebuild them
into LineStrings before processing.
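Rebuilding a line from loose segments could be done along these lines. This is a sketch under the assumption that the segments form one single, unbranched line and that shared end points compare equal; `chain_segments` is a made up name. Shapely's `shapely.ops.linemerge` might do the same job, but I'm not using it here so the example stays self-contained.

```python
def chain_segments(segments):
    """Join unordered segments (sequences of points) into one list of points."""
    remaining = [list(segment) for segment in segments]
    line = remaining.pop(0)

    while remaining:
        for index, segment in enumerate(remaining):
            if segment[0] == line[-1]:
                # the segment continues the line at its end
                line.extend(segment[1:])
            elif segment[-1] == line[-1]:
                # same, but the segment is reversed
                line.extend(reversed(segment[:-1]))
            elif segment[-1] == line[0]:
                # the segment comes right before the start of the line
                line[:0] = segment[:-1]
            elif segment[0] == line[0]:
                # same, but the segment is reversed
                line[:0] = list(reversed(segment[1:]))
            else:
                continue

            del remaining[index]
            break
        else:
            raise ValueError('the segments do not form a single line')

    return line
```

Branches are not handled at all: a segment that does not touch either end of the line built so far makes the function give up.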
I put all the code
online, of course :)
Besides a preloaded PostgreSQL+PostGIS database with OSM data, you'll need
first two allows me to fetch the data from the db. Ah! by the way, you will need
a couple of views:
CREATE VIEW planet_osm_riverbank_skel AS
    SELECT osm_id, way, ST_StraightSkeleton (way) AS skel
    FROM planet_osm_polygon
    WHERE waterway = 'riverbank';

CREATE VIEW planet_osm_riverbank_medial AS
    SELECT osm_id, way, ST_ApproximateMedialAxis (way) AS medial
    FROM planet_osm_polygon
    WHERE waterway = 'riverbank';
Shapely allows me to manipulate the polygonal data, and fiona is used to save the results to a shapefile. This is the first time I ever use all of them (except SQLAlchemy), and it's nice that it's so easy to do all this in Python.
A few weeks ago an interesting
landed in the project's page. It adds rendering for several natural relief
features, adding ridges, valleys, aretes, dales, couloirs and others to cliffs,
peaks and mountain passes, which were already being rendered. I decided to try
it in Elevation (offline for
I sync'ed the style first with the latest release, applied the patch and... not much. My current database is quite old (re-importing takes ages and I don't have space for updates), so I don't have many features like that in the region I'm interested in. In fact, I went checking and the closest mountain range around here was not in the database, so I added it.
By the way, the range is mostly concurrent with a part of an administrative
SK53 suggested to make a new line.
Even when other features are nearby (there's a path close to the crest and it's
also more or less the limit between a forest and a bare rock section), which already
makes the region a little bit crowded with lines, it makes sense: boundaries,
paths, forest borders and ridges change at different time scales, so having them
as separate lines makes an update to any of those independent of the rest.
Now I wanted to export this feature
and import it in my rendering database, so I can actually see the new part of the
style. This is not a straightforward process, only because when I imported my data I used
osm2pgsql --drop, which removes the much needed intermediate tables for when
one wants to update with
osm2pgsql --append. Here's a roundabout way to go.
First you download the full feature (thanks
RichardF!). In this case:
This not only exports the line (which is a sequence of references to nodes) with
its tags, but the nodes too (which are the ones storing the coords). The next
step is to convert it to something more malleable, for instance, GeoJSON. For
that I used
ogr2ogr like this:
ogr2ogr -f GeoJSON 430573542.GeoJSON 430573542.xml lines
The last parameter is needed because, quoting Even Rouault (a.k.a. José GDAL): «you will always get "points", "lines", "multilinestrings", "multipolygons" and "other_relations" layers when reading a osm file, even if some are empty», and the GeoJSON driver refuses to create layers for you:
ERROR 1: Layer lines not found, and CreateLayer not supported by driver.
But guess what, that's not the easiest way :) At least we learned something. In
postgis already has a tool called
shp2pgsql that imports ESRIShapeFiles,
ogr2ogr produces by default this kind of file. It creates a
for each layer as discussed before, but again, we're only interested in the line
ogr2ogr 430573542 430573542.xml lines
shp2pgsql -a -s 900913 -S 430573542/lines.shp > 430573542.sql
We can't use this SQL file directly, as it has a couple of problems. First, you
shp2pgsql the names of the table where you want to insert the data
or the geometry column. Second, it only recognizes some attributes (see below),
and the rest it tries to add them as hstore tags. So we have to manually edit
the file to go from:
INSERT INTO "lines" ("osm_id","name","highway","waterway","aerialway","barrier","man_made","z_order","other_tags",geom) VALUES ('430573542','Montagne Sainte-Victoire',NULL,NULL,NULL,NULL,NULL,'0','"natural"=>"ridge"','010500002031BF0D[...]');
to:
INSERT INTO "planet_osm_line" ("osm_id","name","z_order","natural",way) VALUES ('430573542','Montagne Sainte-Victoire','0','ridge','010500002031BF0D[...]');
s/other_tags/"natural"/ (with double quotes,
natural is a keyword in SQL, as in
s/'"natural"=>"ridge"'/'ridge'/ (in single quotes, so it's a string; double
quotes are for columns). And I also removed the superfluous values and the
ANALYZE line, as I don't care that much. Easy peasy.
A comment on the options for shp2pgsql:
-s 900913 declares the SRID of the
database. I got that when I tried without and:
ERROR: Geometry SRID (0) does not match column SRID (900913)
-S is needed because
shp2pgsql by default generates MultiLineStrings, but
that table in particular has a LineString
way column. This is how I figured it out:
ERROR: Geometry type (MultiLineString) does not match column type (LineString)
Incredibly, after this data massacre, it loads in the db:
$ psql gis < 430573542.sql
SET
SET
BEGIN
INSERT 0 1
COMMIT
Today I stumbled upon PyCon 2016's youtube channel and started watching some of the talks. The first one I really finished watching was Ned Batchelder's "Machete debugging", a very interesting talk about 4 strange bugs and the 4 strange techniques they used to find where those bugs were produced. It's a wonderful talk, full of ideas that, if you're a mere mortal developer like me, will probably blow your mind.
One of the techniques they use for one of the bugs is to actually write a trace
function. A trace function in
cpython context is a function that is called
at several different points of the execution of Python code. For more information, see the documentation of sys.settrace().
In my case I used tracing for something that I always liked about
bash: that you
can ask it to print every line that's being executed (even in functions and subprocesses!).
I wanted something similar for
ayrton, so I sat down to figure out how this would work.
The key to all this is the function I mention up there. The API seems simple enough
at first sight, but it's a little more complicated. You give this function what is
called the global trace function. This function will be called with three parameters:
a frame, an event and an event-dependent arg. The event I'm interested in is
line, which is called for each new line of code that is executed. The complication
comes because what this global trace function should return is a
local trace function. This function will be called with the same parameters as
the global trace function. I would really like an explanation why this is so.
The job for this function, in
ayrton's case, is simple:
inspect the frame, extract the filename and line number and print that. At first this
seems to mean that I should read the files by myself, but luckily there's another
interesting standard module:
linecache to the rescue.
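To make the global/local dance a little more concrete, here is a minimal sketch using sys.settrace() and linecache. It is not ayrton's code: `make_tracer`, `traced` and `demo` are names I made up for the example, and instead of printing, it collects the lines in a list.

```python
import linecache
import sys


def make_tracer(sink):
    """Build a global trace function that records executed lines in sink."""

    def local_trace(frame, event, arg):
        if event == 'line':
            filename = frame.f_code.co_filename
            lineno = frame.f_lineno
            # linecache returns '' if it can't find the source
            source = linecache.getline(filename, lineno).rstrip()
            sink.append((filename, lineno, source))
        # keep tracing this frame
        return local_trace

    def global_trace(frame, event, arg):
        # called when a new frame is entered ('call' event);
        # what it returns is the local trace function for that frame
        return local_trace

    return global_trace


def traced(function, *args):
    """Run function(*args) with tracing on and return the recorded lines."""
    events = []
    sys.settrace(make_tracer(events))
    try:
        function(*args)
    finally:
        sys.settrace(None)
    return events


def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total
```

Calling `traced(demo, 2)` returns one `(filename, lineno, source)` tuple per executed line of `demo()`; printing each of them with a `+` in front gives something close to what bash does.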
The only 'real complication' of
ayrton's use is that it would not work if the
script to run was passed with the
-c|--script option, but (un)luckily the
execution engine already has to read and hold the script in lines, so using that
as the cache instead of
linecache was easy.
Finally, if you're interested in the actual code,
go take a look.
Just take into account that
ayrton has 3 levels of tracing: à la
bash (lines prepended by
+), with line numbers, and tracing any Python line execution,
including any modules you might use and their dependencies. And don't forget that
it also has 3 levels of debug logging into files. See
ayrton has always been able to use any Python module, package or extension as
long as it is in a directory in
sys.path, but trying to solve a bigger bug, I
realized that there was no way to use
ayrton modules or packages. Having only
heard in passing about the new
importlib module and the new mechanism, I sat down
and read more about it.
The best source (or at least the easiest to find) is possibly what Python's reference says about the import system, but I have to be honest: it was not an easy read. Next week I'll sit down and see if I can improve it a little. So, for those out there who, like me, might be having some troubles understanding the mechanism, here's how I understand the system works (ignoring deprecated APIs and corner cases or even relative imports; I haven't used or tried those yet):
import sys
from types import ModuleType


def import_single(full_path, parent=None, module=None):
    # try this cache first
    if full_path in sys.modules:
        return sys.modules[full_path]

    # if not, try all the finders
    for finder in sys.meta_path:
        if parent is not None:
            spec = finder.find_spec(full_path, parent.__path__, module)
        else:
            spec = finder.find_spec(full_path, None, module)

        # if the finder 'finds' ('knows how to handle') the full_path
        # it will return a spec, which carries the loader
        if spec is not None:
            loader = spec.loader

            if module is None and hasattr(loader, 'create_module'):
                module = loader.create_module(spec)

            if module is None:
                # let's assume this creates an empty module object
                module = ModuleType(spec.name)

            module.__spec__ = spec

            # add it to the cache before loading so it can be referenced from it
            sys.modules[spec.name] = module
            try:
                # if the module was passed as parameter,
                # this repopulates the module's namespace
                # by executing the module's (possibly new) code
                loader.exec_module(module)
            except:
                # clean up
                del sys.modules[spec.name]
                raise

            return module

    raise ImportError(full_path)


# 'import' is a keyword, so the function needs another name
def import_full(full_path, target=None):
    parent = None

    # this code iterates over ['foo', 'foo.bar', 'foo.bar.baz']
    elems = full_path.split('.')
    for partial_path in ['.'.join(elems[:i]) for i in range(1, len(elems) + 1)]:
        parent = import_single(partial_path, parent, target)

    # the module is loaded in parent
    return parent
A more complete version of the
if spec is not None branch can be found in
the Loading section
of the reference. Notice that the algorithm uses all the finders in sys.meta_path.
So which are the default finders?
In : sys.meta_path
Out: [_frozen_importlib.BuiltinImporter,
 _frozen_importlib.FrozenImporter,
 _frozen_importlib_external.PathFinder]
Of those finders, the last one is the one that traverses
sys.path, and also has
a hook mechanism. I didn't use those, so for the moment I didn't untangle how they work.
Finally, this is how I implemented importing
ayrton modules and packages:
from importlib.abc import MetaPathFinder, Loader
from importlib.machinery import ModuleSpec
import sys
import os
import os.path

from ayrton.file_test import _a, _d
from ayrton import Ayrton
import ayrton.utils


class AyrtonLoader (Loader):

    @classmethod
    def exec_module (klass, module):
        # «the loader should execute the module’s code
        # in the module’s global name space (module.__dict__).»
        load_path= module.__spec__.origin
        loader= Ayrton (g=module.__dict__)
        loader.run_file (load_path)

        # set the __path__
        # TODO: read PEP 420
        init_file_name= '__init__.ay'
        if load_path.endswith (init_file_name):
            # also remove the '/'
            module.__path__= [ load_path[:-len (init_file_name)-1] ]

loader= AyrtonLoader ()


class AyrtonFinder (MetaPathFinder):

    @classmethod
    def find_spec (klass, full_name, paths=None, target=None):
        # TODO: read PEP 420 :)
        last_mile= full_name.split ('.')[-1]

        if paths is not None:
            python_path= paths  # search only in the paths provided by the machinery
        else:
            python_path= sys.path

        for path in python_path:
            full_path= os.path.join (path, last_mile)
            init_full_path= os.path.join (full_path, '__init__.ay')
            module_full_path= full_path+'.ay'

            if _d (full_path) and _a (init_full_path):
                return ModuleSpec (full_name, loader, origin=init_full_path)
            else:
                if _a (module_full_path):
                    return ModuleSpec (full_name, loader, origin=module_full_path)

        return None

finder= AyrtonFinder ()

# I must insert it at the beginning so it goes before the default finders
sys.meta_path.insert (0, finder)
Notice all the references to PEP 420. I'm pretty sure I must be breaking something, but for the moment this works.
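If you want to play with the mechanism without installing ayrton, the same finder/loader pair can be reduced to a toy that runs as is. Everything here is made up for the example: the `SOURCES` dict stands in for the filesystem, and `in_memory_demo` is a module that only exists in that dict.

```python
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec

# Assumption: module sources live in a dict instead of in files.
SOURCES = {'in_memory_demo': "greeting = 'hi'\n"}


class DictLoader(Loader):

    def exec_module(self, module):
        # execute the module's code in the module's namespace
        source = SOURCES[module.__spec__.name]
        exec(source, module.__dict__)


class DictFinder(MetaPathFinder):

    def find_spec(self, full_name, path=None, target=None):
        if full_name in SOURCES:
            return ModuleSpec(full_name, DictLoader(), origin='<memory>')
        # returning None lets the next finder in sys.meta_path try
        return None


sys.meta_path.insert(0, DictFinder())

import in_memory_demo
```

After the import, `in_memory_demo.greeting` holds the value defined in the dict, and the module sits in `sys.modules` like any other.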
Remember this? Ok,
maybe you never read that. The gist of the post is that I used
strace -r -T to
produce some logs that we «amassed[sic] [...] with a python script for generating[sic]
a CSV file [...] and we got a very interesting graph». Mein Gott, sometimes the
English I write is terrible... Here's that graph again:
This post is to announce that that Python script is now public. You can find it
here. It's not as fancy as those flame
graphs you see everywhere else, but it's a good first impression, especially if you have
to wait until the sysadmin installs
perf or any other tool like that (ok, let's be
honest: l/strace is not a standard tool, but probably your grumpy sysadmin will
be more willing to install those than something more intrusive; I know it happened to
me, at least). It's written in Python3; I'll probably backport it to Python2 soon,
so those stuck with it can still profit from it.
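To give an idea of what can be extracted from such a log, here is a small sketch that adds up the time spent in each syscall. It is not the published script; it only assumes the usual shape of `strace -r -T` lines (a relative timestamp first, the syscall duration between `<` and `>` at the end), and the sample lines are invented.

```python
import re
from collections import defaultdict

# relative timestamp, syscall name, anything, duration between < and >
LINE_RE = re.compile(
    r'^\s*(?P<rel>\d+\.\d+)\s+(?P<name>\w+)\(.*<(?P<duration>\d+\.\d+)>\s*$')


def time_per_syscall(lines):
    """Sum the duration of each syscall; lines that don't match are skipped."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        totals[match.group('name')] += float(match.group('duration'))
    return dict(totals)


# invented sample, shaped like strace -r -T output
SAMPLE = [
    '     0.000000 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3 <0.000020>',
    '     0.000150 read(3, "root:x:0:0"..., 4096) = 1024 <0.000010>',
    '     0.000080 read(3, "", 4096) = 0 <0.000005>',
    '     0.000050 close(3) = 0 <0.000004>',
    '+++ exited with 0 +++',
]

totals = time_per_syscall(SAMPLE)
```

Writing `totals` out with the csv module would be the natural next step towards a graph.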
To produce a similar graph, use the
--histogram option, then follow the
suggestions spewed to
stderr. I hope this helps you solve a problem like it did for me.
If you tell the truth you don't have to remember anything. -- Mark Twain