Privacy and efficiency etensions for Firefox

Marcos Dione

2016-10-14 19:30

Today I had to setup 3 Firefox profiles, because I started a new job, and I realized I never documented which extensions I use or why, so I had to work a little from memory. Hence, this post, which I plan to keep up-to-date as much as possible.

A little bit of rationale first. I'm very privacy-conscious, but at the same time very pragmatic. I use several profiles to add an extra level of data isolation. That also allows me to have different sets of extensions, because some are some intrusive that they break some non-important sites' functionality.

Finally, the list, in no particular order:

FlashGot, by Giorgio Maone: Better downloads handling.
Go-Mobile, by 'Geek in Training': A lot of sites are actually more useful (read, with less crap on them) in their Mobile versions. This plugins lets you switch from one to the other.
HTTPS everywhere, by EFF: Don't navigate in the clear anymore.
No Script, also by Giorgio Maone: A broad spectrum antibiotic. Not loading JS makes pages less CPU intensive, plus sites cannot track you if you don't make requests, plus also blocks videos.
Privacy Badger, also by EFF: In their own words, “protects privacy by blocking spying ads and invisible trackers”.
Tab Auto Reload, by 'Schuzak': I use this to reload sites that constantly log you out, but only under certain circumstances.
Tab mix plus, by 'onemen': Once upon a time ffox didn't have session management/recovery. Now it does, but not very good; I still think TMP's ones are better. Also, duplicate tab.
Toggle animated GIFs, by Simon Lindholm: Stop annoying animations. Just make sure to tick 'Pause GIFs by default'.
uBlock Origin, by Raymond Hill: an (ad) blocker, goodbye-adiós 15s ad videos in youtube.¹

So that's it. Unluckily there's nothing against browser fingerprinting yet (and my browser ranks as quite unique), and I don't know how much can be/has been implemented by [Mozilla]. If you have other suggestions about plugins, please do in the comments below. As I said, I'll try to keep this post up to date.

I used to use ABP, but it seems it became a protection scam. ↩

Barely working centerline JOSM plugin

Marcos Dione

2016-08-29 20:06

I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, specially connecting the new line to existing centerlines. Still, I think it's useful.

Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.

Lastly, as imagico mentioned in the first post about this hack, the algorithms are not fast, and I already made my computer start thrashing the disk swapping like Hell because pg hit a big polygon and started using lots of RAM to calculate its centerline. At least this time I can see how complex the polygons are before handing them to the code. As an initial benchmark, the original data for that changeset (I later simplified it with JOSM's tool) took 0.063927s in pg+gis and 0.004737s in the Python code. More test will come later.

Okey, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!

Configuring power and hardware related events

Marcos Dione

2016-08-28 17:34

A month ago I revived my old-laptop-as-server I have at home. I don't do much in it, just serve my photos, a map, provide a ssh trampoline for me and some friends and not much more. This time I decided to tackle one of the most annoying problems I had with it: That closing the lid led to the system to suspend.

Now, the setup in that computer has evolved through some years, so a lot of cruft was left on it. For instance, at some point I solved the problem by installing a desktop and telling it not to suspend the machine, mostly because that's how I configure my current laptop. That, of course, was a cannon-for-killing-flies solution, but it worked, so I could focus in other things. Also, a lot of power-related packages were installed, assuming the were really needed for supporting everything I might ever wanted to do about power. This is the story on how I removed them all, why, and how I solved the lid problem... twice.

First thing to go were the desktop packages, mostly because the screen in that laptop has been dead for more than a year now, and because its new space in the house is a small shelf in my wooden desktop. Then I reviewed the power-related packages one by one and decided whether I needed it or not. This is more or less what I found:

acpi-fakekey: This package has a tool for injecting fake ACPI keystrokes in the input system. Not really needed.
acpi-support: It has a lot of scripts that can be run when some ACPI events occur. For instance, lid closing, battery/AC status, but also things like responding to power and even 'multimedia' keys. Nice, but not needed in my case; the lid is going to be closed all the time anyways.
laptop-mode-tools: Tools for saving power in your laptop. Not needed either, the server is going to be running all the time on AC (its battery also died some time ago).
upower: D-Bus interface for power events. No desktop or anything else to listen to them. Gone.
pm-utils: Nice CLI scripts for suspending/hibernating your system. I always have them around in my laptop because sometimes the desktops don't work properly. No use in my server, but it's cruft left from when I used it as my laptop. Adieu.

Even then, closing the lid led to the system suspending. Who else could be there? Well, there is one project who's being everywhere: systemd. I'm not saying this is bad, but it is everywhere. Thing is, its login subsystem also handles ACPI events. In the /etc/systemd/logind.conf file you can read the following lines:

#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore

so I uncommented the 4th line and changed it so:

HandleLidSwitch=ignore

Here you can also configure how the inhibition of actions work:

#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes

Please check the config file's doc if you plan to modify it.

Not entirely unrelated, my main laptop also started suspending when I closed the lid. I have it configured, through the desktop environment, to only turn off the screen, because what use is the screen if it's facing the keyboard and touchpad :) Somehow, these settings only recently started to be in effect, but a quick search didn't gave any results on when things changed. Remembering what I did with the server, I just changed that config file to:

HandlePowerKey=ignore
HandleSuspendKey=ignore
HandleHibernateKey=ignore
HandleLidSwitch=ignore
HandleLidSwitchDocked=ignore

That is, “let me configure this through the desktop, please”, and now I have my old behavior back :)

 PS: I should start reading more about `systemd`. A good starting point seems to
 be all the links in its [home page](https://freedesktop.org/wiki/Software/systemd/).

ayrton 0.8 Home Sweet New Home

Marcos Dione

2016-08-17 13:17

Long time for this release. A couple of hard bugs (which fix was just moving a line down a little), a big-ish new feature, and moving in a new city. Here's the ChangeLog:

You can import ayrton modules and packages!
Depends on Python3.5 now.
argv is not quite a list: for some operations (len(), iter(), pop()), argv[^0] is left alone.
option() raises KeyError or ValueError if the option or its 'argument' is wrong.
makedirs() and stat() are available as functions.
-p|--pdb launches pdb when there is an unhandled exception.
Fix for line in foo(...): ... by automatically adding the _bg=True option.
Better Command() detection.
A lot of internal fixes.

Get it on github or pypi!

Adding columns from OSM to postgis with osmium

Marcos Dione

2016-08-09 17:45

My latest Europe import was quite eventful. First, I run out of space several times during the import itself, at indexing time. The good thing is that, if you manage to reclaim some space, and reading a little of source code¹, you can replay the missing queries by hand and stop cursing. To be fair, osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables, planet_osm_node, planet_osm_way and planet_osm_relation; and one file, the flat nodes one. Those are not deleted until the whole process has finished, but they're actually not needed after the processing phase. I started working on fixing that.

But that was not the most difficult part. The most difficult part was that I forgot, somehow, to add a column to the import.style file. Elevation, my own style, renders different icons for different types of castles (and forts too), just like the Historic Place map of the Hiking and Bridle map². So today I sat down and tried to figure out how to reparse the OSM extract I used for the import to add this info.

The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:

node,way   castle_type  text         polygon

That says that this applies to nodes and ways. If the element is a way, polygon will try to convert it to a polygon and put it in the planet_osm_polygon table; if it's a node, it ends in the planet_osm_point table. So we just add the column to those tables:

ALTER TABLE planet_osm_point   ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;

Now how to process the extract? Enter pyosmium. It's a Python binding for the osmium library with a stream-like type of processing à la expat for processing XML. The interface is quite simple: one subclasses osmium.SimpleHandler, defines the element type handlers (node(), way() and/or relation()) and that's it! Here's the full code of the simple Python script I did:

#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()

class CastleTypes (osmium.SimpleHandler):

    def process (self, thing, table):
        if 'castle_type' in thing.tags:
            try:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''
            print (table, thing.id, name)

            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                            WHERE osm_id = %s''',
                         (thing.tags['castle_type'], thing.id))

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)

ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')

The only strange part of the API is that it doesn't seem to raise a KeyError when the tag does not exist, but a SystemError. I'll try to figure this out later. Also interesting is the big amount of unnamed elements with this tag that exist in the DB.

I would love for GitHub to recognize something like https://github.com/openstreetmap/osm2pgsql/blob/master/table.cpp#table_t::stop and be directed to that method, because #Lxxx gets old pretty quick. ↩
I just noticed how much more complete those maps are. more ideas to use :) ↩

Our man in Marseille

Marcos Dione

2016-08-05 14:51

Remember this?

For a few months now I've been trying to have a random slideshow of images. I used to do this either with kscreensaver, which for completely different reasons I can't use now, or xscreensavers' glslideshow, which, even when I compiled it by hand, I can't find the way to give it the root dir of the images. So, based on OMIT, I developed my own.

The differences with OMIT are minimal. It has to scan the whole tree for finding the appropriate files (its definition of "appropriate" could be improved, it's true); it goes into full screen mode with black background; and it (more) properly handles EXIF rotation¹. All that in 176 LOCs, including proper licensing (GPLv3), and developed in one day and refined the next one.

One interesting thing I found out is that pyexiv2 is deprecated, with is last release in 2011 (!!!). What this new app uses is its recommended replacement, gexiv2.

So, there you are. Like OMIT, it's in PyQt4, but this time in Python3 (that's why I used gexiv2 instead). Its TODO includes porting it to PyQt5 and a few other things. You can grab it here. I plan to do a proper release soon, but for the moment just drop it in your PATH and be happy with it.

http://www.daveperrett.com/articles/2012/07/28/exif-orientation-handling-is-a-ghetto/ ↩

OSM centerlines: two days after

Marcos Dione

2016-07-29 00:39

In this last two days I've been expanding osm-centerlines. Now it not only supports ways more complex than a simple rectangle, but also ones that lead to 'branches' (unfortunately, most probably because the mapper either imported bad data or mapped it himself). Still, I tested it in very complex polygons and the result is not pretty. There is still lots of room for improvements.

Unluckily, it's not as stand alone as it could be. The problem is that, so far, the algos force you to provide now only the polygon you want to process, but also its skeleton and medial. The code extends the medial using info extracted from the skeleton in such a way that the resulting medial ends on a segment of the polygon, hopefully the one(s) that cross from one riverbank to another at down and upstream. Calculating the skeleton could be performed by CGAL, but the current Python binding doesn't include that function yet. As for the medial, SFCGAL (a C++ wrapper for CGAL) exports a function that calculates an approximative medial, but there seem to be no Python bindings for them yet.

So, a partial solution would be to use PostGIS-2.2's ST_StraightSkeleton() and ST_ApproximateMedialAxis(), so I added a function called skeleton_medial_from_postgis(). The parameters are a psycopg2 connection to a PostgreSQL+PostGIS database and the way you want to calculate, as a shapely.geometry, and it returns the skeleton and the medial ready to be fed into extend_medials(). The result of that should be ready for mapping.

So there's that. I'll be trying to improve it in the next days, and start looking into converting it into a JOSM plugin.

Deriving centerlines from riverbanks without

Marcos Dione

2016-07-26 18:55

For a long time now I've been thinking on a problem: OSM data sometimes contains riverbanks that have no centerline. This means that someone mapped (part of) the coasts of a river (or stream!), but didn't care about adding a line that would mark its centerline.

But this should be computationally solvable, right? Well, it's not that easy. See, for given any riverbank polygon in OSM's database, you have 4 types of segments: those representing the right and left riverbanks (two types) and the flow-in and flow-out segments, which link the banks upstream and downstream. With a little bit of luck there will be only one flow-in and one flow-out segment, but there are no guarantees here.

One method could try and identify these segments, then draw a line starting in the middle of the flow-in segment, calculating the middle by traversing both banks at the same time, and finally connect to the middle for the flow-out segment. Identifying the segments by itself is hard, but it is also possible that the result is not optimal, leading to a jagged line. I didn't try anything on those lines, but I could try some examples by hand...

Enter topology, the section of maths that deals with this kind of problems. The skeleton of a polygon is a group of lines that are equidistant to the borders of the polygon. One of the properties this set of lines provides is direction, which can be exploited to find the banks and try to apply the previous algorithm. But a skeleton has a lot of 'branches' that might confuse the algo. Going a little further, there's the medial axis, which in most cases can be considered a simplified skeleton, without most of the skeleton branches.

Enter free software :) CGAL is a library that can compute a lot of topological properties. PostGIS is clever enough to leverage those algorithms and present, among others, the functions ST_StraightSkeleton() and ST_ApproximateMedialAxis(). With these two and the original polygon I plan to derive the centerline. But first an image that will help explaining it:

The green 'rectangle' is the original riverbank polygon. The thin black line is the skeleton for it; the medium red line is the medial. Notice how the medial and the center of the skeleton coincide. Then we have the 4 branches forming a V shape with its vertex at each end of the medial and its other two ends coincide with the ends of the flow in and flow out segments!

So the algorithm is simple: start with the medial; from its ends, find the branches in the skeleton that form that V; using the other two ends of those Vs, calculate the point right between them, and extend the medial to those points. This only calculates a centerline. The next step would be to give it a direction. For that I will need to see if there are any nearby lines that could be part of the river (that's what the centerline is for, to possibly extend existing rivers/centerlines), and use its direction to give it to the new centerline.

For the moment the algorithm only solves this simple case. A slightly more complex case is not that trivial, as skeletons and medials are returned as a MultiLineString with a line for each segment, so I will have to rebuild them into LineStrings before processing.

I put all the code online, of course :) Besides a preloaded PostgreSQL+PostGIS database with OSM data, you'll need python3-sqlalchemy, geoalchemy, python3-fiona and python3-shapely. The first two allows me to fetch the data from the db. Ah! by the way, you will need a couple of views:

CREATE VIEW planet_osm_riverbank_skel   AS SELECT osm_id, way, ST_StraightSkeleton (way)      AS skel   FROM planet_osm_polygon WHERE waterway = 'riverbank';
CREATE VIEW planet_osm_riverbank_medial AS SELECT osm_id, way, ST_ApproximateMedialAxis (way) AS medial FROM planet_osm_polygon WHERE waterway = 'riverbank';

Shapely allows me to manipulate the polygonal data, and fiona is used to save the results to a shapefile. This is the first time I ever use all of them (except SQLAlchemy), and it's nice that it's so easy to do all this in Python.

Importing single OSM changesets to a non updatable rendering database

Marcos Dione

2016-07-12 17:34

A few weeks ago an interesting PR for osm-carto landed in the project's GitHub page. It adds rendering for several natural relief features, adding ridges, valleys, aretes, dales, coulouirs and others to cliffs, peaks and mountain passes, which were already being rendered. I decided to try it in Elevation (offline for the moment).

I sync'ed the style first with the latest release, applied the patch and... not much. My current database is quite old (re-importing takes ages and I don't have space for updates), so I don't have much features like that in the region I'm interested in. In fact, I went checking and the closest mountain range around here was not in the database, so I added it.

By the way, the range is mostly concurrent with a part of an administrative boundary, but SomeoneElse and SK53 suggested to make a new line. Even when other features are nearby (there's a path close to the crest and it's also more or less the limit between a forest and a bare rock section), which already makes the region a little bit crowded with lines, it makes sense: boundaries, paths, forest borders and ridges change at different time scales, so having them as separate lines makes an update to any of those independent of the rest.

Now I wanted to export this feature and import it in my rendering database, so I can actually see the new part of the style. This is not an straightforward process, only because when I imported my data I used osm2pgsql --drop, which removes the much needed intermediate tables for when one wants to update with osm2pgsql --append. Here's a roundabout way to go.

First you download the full feature (thanks RichardF!). In this case:

http://www.openstreetmap.org/api/0.6/way/430573542/full

This not only exports the line (which is a sequence of references to nodes) with its tags, but the nodes too (which are the ones storing the coords). The next step is to convert it to something more malleable, for instance, GeoJSON. For that I used ogr2ogr like this:

ogr2ogr -f GeoJSON 430573542.GeoJSON 430573542.xml lines

The last parameter is needed because, quoting Even Rouault (a.k.a. José GDAL): «you will always get "points", "lines", "multilinestrings", "multipolygons" and "other_relations" layers when reading a osm file, even if some are empty», and the GeoJSON driver refuses to create layers for you:

ERROR 1: Layer lines not found, and CreateLayer not supported by driver.

But guess what, that not the easiest way :) At least we learned something. In fact postgis already has a tool called shp2pgsql that imports ESRIShapeFiles, and ogr2ogr produces by default this kind of file. It creates a .shp file for each layer as discussed before, but again, we're only interested in the line one. So:

ogr2ogr 430573542 430573542.xml lines
shp2pgsql -a -s 900913 -S 430573542/lines.shp > 430573542.sql

We can't use this SQL file directly, as it has a couple of problems. First, you can't tell shp2pgsql the names of the table where you want to insert the data or the geometry column. Second, it only recognizes some attributes (see below), and the rest it tries to add them as hstore tags. So we have to manually edit the file to go from:

INSERT INTO "lines" ("osm_id","name","highway","waterway","aerialway","barrier","man_made","z_order","other_tags",geom)
    VALUES ('430573542','Montagne Sainte-Victoire',NULL,NULL,NULL,NULL,NULL,'0','"natural"=>"ridge"','010500002031BF0D[...]');

into:

INSERT INTO "planet_osm_line" ("osm_id","name","z_order","natural",way)
    VALUES ('430573542','Montagne Sainte-Victoire','0','ridge','010500002031BF0D[...]');

See? s/lines/planet_osm_line/, s/other_tags/"natural"/ (with double quotes, because natural is a keyword in SQL, as in natural join), s/geom/way/ and s/'"natural"=>"ridge"'/'ridge'/ (in single quotes, so it's a string; double quotes are for columns). And I also removed the superfluous values and the ANALIZE line, as I don't care that much. Easy peasy.

A comment on the options for shp2pgsql. -s 900913 declares the SRID of the database. I got that when I tried without and:

ERROR:  Geometry SRID (0) does not match column SRID (900913)

-S is needed because shp2pgsql by default generated MultiLineStrings, but that table in particular has a LineString way column. This is how I figure it out:

ERROR:  Geometry type (MultiLineString) does not match column type (LineString)

Incredibly, after this data massacre, it loads in the db:

$ psql gis < 430573542.sql
SET
SET
BEGIN
INSERT 0 1
COMMIT

Enjoy!

Tracing Python execution

Marcos Dione

2016-06-23 20:32

Today I stumbled upon PyCon 2016's youtube channel and started watching some of the talks. The first one I really finished watching was Ned Batchelder's "Machete debugging", a very interesting talk about 4 strange bugs and the 4 strange techniques they used to find where those bugs were produced. It's a wonderful talk, full of ideas that, if you're a mere mortal developer like me, will probably blow your mind.

One of the techniques they use for one of the bugs is to actually write a trace function. A trace function in cpython context is a function that is called in several different points of execution of Python code. For more information see sys.settrace()'s documentation.

In my case I used tracing for something that I always liked about bash: that you can ask it to print every line that's being executed (even in functions and subprocesses!). I wanted something similar for ayrton, so I sat down to figure out how this would work.

The key to all this is the function I mention up there. The API seems simple enough at first sight, but it's a little more complicated. You give this function what is called the global trace function. This function will be called with three parameters: a frame, an event and a event-dependent arg. The event I'm interested in is line, which is called for each new line of code that is executed. The complication comes because what this global trace function should return is a local trace function. This function will be called with the same parameters as the global trace function. I would really like an explanation why this is so.

The job for this function, in ayrton's case, is simple: inspect the frame, extract the filename and line number and print that. At first this seems to mean that I should read the files by myself, but luckily there's another interesting standard module: linecache to the rescue. The only 'real complication' of ayrton's use is that it would not work if the script to run was passed with the -c|--script option, but (un)luckily the execution engine already has to read the hold the script in lines, so using that as the cache instead of linecache was easy.

Finally, if you're interested in the actual code, go take a look. Just take in account that ayrton has 3 levels of tracing: à la bash (script lines prepended by +), with line numbers, and tracing any Python line execution, including any modules you might use and their dependencies. And don't forget that it also has 3 levels of debug logging into files. See ayrton --help!