Detailed OSM mapnik rendering times

In the last post I concluded that I will reimport Europe's data before applying a new version of my design. This also means that I will have to re-render everything from time to time. How long does that take? Let's answer that question.

Remember first that I didn't quite import all of Europe, just a rectangle with northern UK, Portugal, Malta and Istanbul as limits, so I'll just render that part. Then comes the question of how deep to go. OSM's wiki has some information about tile usage showing that only up to zoom level (ZL) 11 were all the tiles viewed at least once, and that after ZL 15 the percentage drops below 10. I struck my balance at 14, where there's a one in three chance of a tile being seen. Then I render specific regions down to ZL 18, mostly my current immediate area of influence and places I will visit.
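The numbers behind that balance can be estimated with the slippy-map tile-naming formula from OSM's wiki. A quick sketch, using the bbox values from my import command:

```python
import math

def deg2num(lat, lon, zoom):
    # slippy-map tile numbering, as described in OSM's wiki
    n = 2 ** zoom
    x = int((lon + 180) / 360 * n)
    y = int((1 - math.log(math.tan(math.radians(lat)) +
                          1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
    return x, y

# the bbox I used for the import: west, south, east, north
west, south, east, north = -11.9531, 34.6694, 29.8828, 58.8819

for zoom in range(15):
    x0, y0 = deg2num(north, west, zoom)  # top-left tile
    x1, y1 = deg2num(south, east, zoom)  # bottom-right tile
    print(zoom, (x1 - x0 + 1) * (y1 - y0 + 1))
```

Each extra ZL roughly quadruples the tile count, which is why the graph below looks linear on a log scale.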

So with the same bbox as the import step, I fired the rendering process and left the machine mostly by itself. I say mostly because, after all, it's my main machine, and I kept using it for hacking, mail, browsing, chat, etc., but nothing very CPU, RAM or disk intensive. I modified generate_tiles.py so it measures the time spent on each tile and logs it into a file. Then I ran another small Python script on that log to get the minimum, average, maximum and total times, how many tiles were rendered and the space they take1. Here's a table and graph that summarize it all:
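Neither script is anything fancy. A minimal sketch of both, assuming a made-up log format of one "zoom seconds" pair per line (the real format in my modified generate_tiles.py may differ):

```python
import time
from collections import defaultdict

def timed(render_tile, log):
    # wrap the real render function so each call appends "zoom seconds"
    def wrapper(z, x, y):
        start = time.time()
        result = render_tile(z, x, y)
        log.append("%d %.3f" % (z, time.time() - start))
        return result
    return wrapper

def summarize(lines):
    # per-zoom-level min/avg/max/total render time plus tile count
    times = defaultdict(list)
    for line in lines:
        zoom, seconds = line.split()
        times[int(zoom)].append(float(seconds))
    return {zoom: dict(min=min(t), avg=sum(t) / len(t), max=max(t),
                       total=sum(t), tiles=len(t))
            for zoom, t in times.items()}
```

The space used comes from du -sm, as the footnote says, so it's not part of the script.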

The scale for all the lines is logarithmic, so all those linear progressions are actually exponential. You can also see the cut at zoom level 14, and that I ended up rendering more or less as many tiles for ZLs 15 to 18 as for ZLs 9 to 13.

The first thing to notice is that the average curve is similar to, but considerably lower than, the one I had in my minimal benchmark. Most important, and something I didn't think of then, are the total time columns. I have 4 of them: 'total' is the total CPU time in seconds, then I have one for the same time in hours and another in days. As I actually have 4 CPUs and I run generate_tiles.py on all of them, I have an extra column that shows an approximate wall clock time. At the bottom row you can see totals for the times, tile count and space used.

Talking about space, there's another constraint. See, I have a 64GiB Nokia N9, which is one of the intended targets for this map, if not the main one. Those 64GiB must be shared, mostly with music, so I don't get bored while driving (especially in the endemic traffic jams of this region). This means that having everything up to ZL 14 on my phone is unfeasible, but ZL 11 seems more reasonable. It also means I will have to make sure the info I consider important is present at least at that ZL, so it impacts the map design too.

So, 12.6 days and 84GiB later, I have my map. Cutting out ZLs 12 to 14 should save me some 8 days of rendering, most probably 7. I could also try some of the rendering optimizations I have found so far (in no particular order):

https://github.com/mapnik/mapnik/wiki/OptimizeRenderingWithPostGIS

https://github.com/mapnik/mapnik/wiki/Postgis-async

http://mike.teczno.com/notes/mapnik.html

http://blog.cartodb.com/post/20163722809/speeding-up-tiles-rendering

I was also thinking of doing depth-first rendering, so the database queries done for ZL N can be reused to render ZL N+1 (or maybe the other way around). This seems like a non-obvious path, so I will probably only check it if I have lots of time, which I currently don't.
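A rough sketch of what I mean, walking the tile tree depth first so a tile and its four children at ZL N+1 get rendered close together (render_tile here is just a stand-in for whatever actually renders):

```python
def walk(z, x, y, max_zoom, render_tile):
    # depth first: render this tile, then recurse into its four children
    # at ZL z+1, so consecutive renders hit mostly the same data
    render_tile(z, x, y)
    if z < max_zoom:
        for dx in (0, 1):
            for dy in (0, 1):
                walk(z + 1, 2 * x + dx, 2 * y + dy, max_zoom, render_tile)

# collect the traversal order instead of actually rendering
rendered = []
walk(0, 0, 0, 2, lambda z, x, y: rendered.append((z, x, y)))
```

Whether the data cached for a parent tile actually helps its children would depend on how mapnik and postgres cache queries, which is exactly the non-obvious part.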


  1. Actually, the space used is calculated by du -sm *, so it's in MiB. 

OSM planet importing and rendering times

For at least four months I've been trying to import the whole of Europe in slim mode so it would allow updates. The computer is a Lenovo quad-core with 8GiB of RAM and initially 500GiB of disk. Last time I tried with the HDD alone it took some 3 days to import just the data, and more than a week passed before I got tired and canceled the index creation. That's the most expensive part of the import, and reading the data and writing the indexes on a seeking device is slow.

So I bought a 256GB SSD1 and wanted to try again. I took 15GiB for the system and shared the rest between my files and postgres, but kept the data files on the HDD. At first I tried importing the whole of Europe using 6GiB of cache; remember that my computer has 8GiB of RAM, so I thought it would fit. It actually didn't, and the process was killed by the OOM killer. I had logs showing osm2pgsql's and the different postgres threads' memory usage, but somehow I lost them; if I find them I'll post them. I lowered the cache to 4GiB but it was still too big and the OOM killer was triggered again.

So I lowered the cache size to 2GiB, but then I was running out of disk space. I tried using osm2pgsql --bbox to import only from Iceland to somewhere between Κύπρος (Cyprus) and Κρήτη (Crete), so it includes Istanbul and Sicilia, but it was still too big. So I started wondering about the sizes of OSM data. I ducked and googled around3 for them to no avail, but then jburgess, the tile server sysadmin, answered my question on the IRC channel4 with these numbers:

 gis=# \d+
  Schema |        NAME        | TYPE  | OWNER |    SIZE    | Description
 --------+--------------------+-------+-------+------------+-------------
  public | geography_columns  | VIEW  | tile  | 0 bytes    |
  public | geometry_columns   | VIEW  | tile  | 0 bytes    |
  public | planet_osm_line    | TABLE | tile  | 44 GB      |
  public | planet_osm_nodes   | TABLE | tile  | 8192 bytes | *
  public | planet_osm_point   | TABLE | tile  | 4426 MB    |
  public | planet_osm_polygon | TABLE | tile  | 52 GB      |
  public | planet_osm_rels    | TABLE | tile  | 1546 MB    | *
  public | planet_osm_roads   | TABLE | tile  | 7035 MB    |
  public | planet_osm_ways    | TABLE | tile  | 59 GB      | *
  public | raster_columns     | VIEW  | tile  | 0 bytes    |
  public | raster_overviews   | VIEW  | tile  | 0 bytes    |
  public | spatial_ref_sys    | TABLE | tile  | 3216 kB    |

 gis=# \di+
  Schema |           NAME           | TYPE  | OWNER |       TABLE        |    SIZE    | Description
 --------+--------------------------+-------+-------+--------------------+------------+-------------
  public | ferry_idx                | INDEX | tile  | planet_osm_line    | 824 kB     |
  public | leisure_polygon_idx      | INDEX | tile  | planet_osm_polygon | 1437 MB    |
  public | national_park_idx        | INDEX | tile  | planet_osm_polygon | 1608 kB    |
  public | planet_osm_line_index    | INDEX | tile  | planet_osm_line    | 8937 MB    |
  public | planet_osm_line_pkey     | INDEX | tile  | planet_osm_line    | 2534 MB    |
  public | planet_osm_nodes_pkey    | INDEX | tile  | planet_osm_nodes   | 8192 bytes | *
  public | planet_osm_point_index   | INDEX | tile  | planet_osm_point   | 2565 MB    |
  public | planet_osm_point_pkey    | INDEX | tile  | planet_osm_point   | 1232 MB    |
  public | planet_osm_polygon_index | INDEX | tile  | planet_osm_polygon | 9295 MB    |
  public | planet_osm_polygon_pkey  | INDEX | tile  | planet_osm_polygon | 3473 MB    |
  public | planet_osm_rels_idx      | INDEX | tile  | planet_osm_rels    | 208 kB     | *
  public | planet_osm_rels_parts    | INDEX | tile  | planet_osm_rels    | 2837 MB    | *
  public | planet_osm_rels_pkey     | INDEX | tile  | planet_osm_rels    | 75 MB      | *
  public | planet_osm_roads_index   | INDEX | tile  | planet_osm_roads   | 1151 MB    |
  public | planet_osm_roads_pkey    | INDEX | tile  | planet_osm_roads   | 301 MB     |
  public | planet_osm_ways_idx      | INDEX | tile  | planet_osm_ways    | 2622 MB    | *
  public | planet_osm_ways_nodes    | INDEX | tile  | planet_osm_ways    | 112 GB     | *
  public | planet_osm_ways_pkey     | INDEX | tile  | planet_osm_ways    | 10 GB      | *
  public | spatial_ref_sys_pkey     | INDEX | tile  | spatial_ref_sys    | 144 kB     |
  public | water_areas_idx          | INDEX | tile  | planet_osm_polygon | 564 MB     |
  public | water_lines_idx          | INDEX | tile  | planet_osm_line    | 38 MB      |

[*] These are the intermediate tables and their indexes

So, around 167GiB of data and around 158GiB of indexes, of which 60GiB and 127GiB respectively are intermediate. These intermediate tables and indexes are used later during the updates. Clearly I couldn't import the whole planet, but surely Europe should fit in ~210GiB? planet.pbf weighs 24063MiB and europe.pbf weighs in at 12251MiB, a little more than 50%. It should fit, but somehow it doesn't.
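As a sanity check on those numbers, here are the table sizes from the listing above converted to MiB (following footnote 5's rounding rule), with the starred rows as the intermediate ones:

```python
# table sizes in MiB, rounded from jburgess' \d+ listing
tables = dict(line=44 * 1024, point=4426, polygon=52 * 1024,
              roads=7035, nodes=0, rels=1546, ways=59 * 1024)
intermediate = ('nodes', 'rels', 'ways')  # the starred rows

data_total = sum(tables.values())                     # ~167 GiB
data_inter = sum(tables[t] for t in intermediate)     # ~60 GiB
print(data_total / 1024, data_inter / 1024)
```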

Having no more free space, I decided two things: create a new tablespace on the HDD, putting the data tables there and the rest on the SSD; and reduce the north limit to the British Isles, cutting out Iceland and a good part of Scandinavia. osm2pgsql supports the former with its --tablespace-main-data option. This is a summary of the successful import, with human-readable times between brackets added by me:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --create --database gis --slim --cache 2048 --number-processes 4 --unlogged --tablespace-main-data hdd --bbox -11.9531,34.6694,29.8828,58.8819 europe-latest.osm.pbf
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: pgsql, scale=100 cache=2048

Reading in file: europe-latest.osm.pbf
Processing: Node(990001k 263.4k/s) Way(139244k 11.37k/s) Relation(1749200 217.43/s)  parse time: 24045s [~6h40]

Node stats: total(990001600), max(2700585940) in 3758s [~1h03]
Way stats: total(139244632), max(264372509) in 12242s [~3h24]
Relation stats: total(1749204), max(3552177) in 8045s [~2h14]

Going over pending ways...
        100666720 ways are pending

Using 4 helper-processes
100666720 Pending ways took 21396s [~5h57] at a rate of 4704.93/s

node cache: stored: 197941325(19.99%), storage efficiency: 73.74% (dense blocks: 132007, sparse nodes: 66630145), hit rate: 20.02%
Stopped table: planet_osm_nodes in 1s
Stopped table: planet_osm_rels in 44s
All indexes on  planet_osm_point created  in 4006s [~1h07]
All indexes on  planet_osm_roads created  in 5894s [~1h38]
All indexes on  planet_osm_line created  in 11834s [~3h17]
All indexes on  planet_osm_polygon created  in 14862s [~4h07]
Stopped table: planet_osm_ways in 26122s [~7h15]

Osm2pgsql took 72172s overall [~20h24]

So, ~20h24 of import time, of which ~6h40 is for the intermediate data, which went into the SSD; almost 6h importing the real data, which went into the HDD; and the rest indexing, which went again into the SSD. This is the final disk usage:

 Schema |        Name        | Type  | Owner  |  Size    | Description
--------+--------------------+-------+--------+----------+-------------
 public | geography_columns  | view  | mdione |  0 bytes |
 public | geometry_columns   | view  | mdione |  0 bytes |
 public | planet_osm_line    | table | mdione | 11264 MB | **
 public | planet_osm_nodes   | table | mdione | 43008 MB |
 public | planet_osm_point   | table | mdione |  2181 MB | **
 public | planet_osm_polygon | table | mdione | 23552 MB | **
 public | planet_osm_rels    | table | mdione |   871 MB |
 public | planet_osm_roads   | table | mdione |  2129 MB | **
 public | planet_osm_ways    | table | mdione | 43008 MB |
 public | raster_columns     | view  | mdione |  0 bytes |
 public | raster_overviews   | view  | mdione |  0 bytes |
 public | spatial_ref_sys    | table | mdione |     3 MB |
--------+--------------------+-------+--------+----------+--------------
total: 126016 MB, of which 39126 MB are in the 'hdd' tablespace (**)

 Schema |            Name             | Type  | Owner  |       Table        |  Size    | Description
--------+-----------------------------+-------+--------+--------------------+----------+-------------
 public | planet_osm_line_index       | index | mdione | planet_osm_line    |  4105 MB |
 public | planet_osm_line_pkey        | index | mdione | planet_osm_line    |   748 MB |
 public | planet_osm_nodes_pkey       | index | mdione | planet_osm_nodes   | 21504 MB |
 public | planet_osm_point_index      | index | mdione | planet_osm_point   |  1506 MB |
 public | planet_osm_point_pkey       | index | mdione | planet_osm_point   |   566 MB |
 public | planet_osm_point_population | index | mdione | planet_osm_point   |   566 MB |
 public | planet_osm_polygon_index    | index | mdione | planet_osm_polygon |  8074 MB |
 public | planet_osm_polygon_pkey     | index | mdione | planet_osm_polygon |  1953 MB |
 public | planet_osm_rels_idx         | index | mdione | planet_osm_rels    |    16 kB | *
 public | planet_osm_rels_parts       | index | mdione | planet_osm_rels    |   671 MB |
 public | planet_osm_rels_pkey        | index | mdione | planet_osm_rels    |    37 MB |
 public | planet_osm_roads_index      | index | mdione | planet_osm_roads   |   359 MB |
 public | planet_osm_roads_pkey       | index | mdione | planet_osm_roads   |    77 MB |
 public | planet_osm_ways_idx         | index | mdione | planet_osm_ways    |  2161 MB |
 public | planet_osm_ways_nodes       | index | mdione | planet_osm_ways    | 53248 MB |
 public | planet_osm_ways_pkey        | index | mdione | planet_osm_ways    |  6926 MB |
 public | spatial_ref_sys_pkey        | index | mdione | spatial_ref_sys    |   144 kB | *
--------+-----------------------------+-------+--------+--------------------+-----------+
total                                                                        102501 MB

[*] Too small, not counted
[**] In tablespace 'hdd', which is in the HDD.

That's a total of 228517MiB for this partial Europe import, of which 171434MiB are for the intermediate data. It's slightly more than I have to spare in the SSD, so I would have to cut off even more data if I wanted to import everything into the SSD. Then I tried to render with this, but it was awfully slow.

Luckily, when jburgess answered with the sizes, he also suggested using flat nodes. This is an osm2pgsql option that uses a specially formatted file to store the intermediate data instead of postgres tables. According to the manpage, it is faster both for the import and for the successive updates, and uses only about 16GiB of disk space, which is around 10% of what my import used for the intermediate data, but «[t]his mode is only recommended for full planet imports as it doesn't work well with small extracts.» I tried anyways.
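The idea behind flat nodes, as I understand it, is simple: instead of a B-tree-indexed table, use a plain file as one huge array indexed by node id, with a fixed-size record holding each node's coordinates. A conceptual sketch (the real on-disk format is osm2pgsql's own and certainly differs):

```python
import io
import struct

# one fixed-size record per node id: lat and lon as fixed-point int32,
# in units of 1e-7 degrees (the precision OSM itself uses)
RECORD = struct.Struct('<ii')

def write_node(f, node_id, lat, lon):
    # the node id *is* the record's offset: O(1) seek, no index at all
    f.seek(node_id * RECORD.size)
    f.write(RECORD.pack(int(round(lat * 1e7)), int(round(lon * 1e7))))

def read_node(f, node_id):
    f.seek(node_id * RECORD.size)
    lat, lon = RECORD.unpack(f.read(RECORD.size))
    return lat / 1e7, lon / 1e7

f = io.BytesIO()  # stands in for the on-disk nodes.cache
write_node(f, 42, 43.7034, 7.2663)
```

This also explains the «small extracts» caveat: the file has to be as long as the highest node id present (~2.7 billion at the time), no matter how few nodes the extract actually contains.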

So I used that option to create the flat node cache on the SSD and put all the data and indexes there too. Here's the summary:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --create --drop --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 --unlogged --bbox -11.9531,34.6694,29.8828,58.8819 europe-latest.osm.pbf
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: pgsql, scale=100 cache=2048

Reading in file: europe-latest.osm.pbf
Processing: Node(990001k 914.1k/s) Way(139244k 17.64k/s) Relation(1749200 344.60/s)  parse time: 14052s [~3h54]

Node stats: total(990001600), max(2700585940) in 1083s [~0h18]
Way stats: total(139244632), max(264372509) in 7893s [~2h11]
Relation stats: total(1749204), max(3552177) in 5076s [~1h24]

Going over pending ways...
        100666720 ways are pending

Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
100666720 Pending ways took 29143s [~8h05] at a rate of 3454.23/s

node cache: stored: 197941325(19.99%), storage efficiency: 73.74% (dense blocks: 132007, sparse nodes: 66630145), hit rate: 18.98%
Stopped table: planet_osm_nodes in 0s
Stopped table: planet_osm_rels in 0s
All indexes on  planet_osm_roads created  in 1023s [~0h17]
All indexes on  planet_osm_point created  in 1974s [~0h33]
All indexes on  planet_osm_line created  in 4354s [~1h12]
All indexes on  planet_osm_polygon created  in 6777s [~1h52]
Stopped table: planet_osm_ways in 2s

Osm2pgsql took 50092s overall [~13h54]

So we went from 20h24 down to 13h54 for the whole operation; from 6h40 down to 3h54 for the intermediate data; from 5h57 up to 8h05 for the real data; and a lot less time for the indexing: about a third for each real data table, and from 7h15 all the way down to 0 for the intermediate data. So even if the real data processing took more than 2h longer, the whole import time is only ~68% of before, uses less space, and fits in my SSD with a lot of space to spare. For reference, the nodes.cache file uses only 20608MiB of disk space, which is ~12% of the space used by the intermediate postgres tables.

So, now, what about rendering time? This question is not easy to answer. I set up a very rough benchmark, which consists of rendering only one tile for each zoom level in a small town chosen without any particular criteria6.

I used Tilemill to export my modified version of openstreetmap-carto to a Mapnik XML file, and used a modified generate_tiles.py to measure the rendering times. This is the resulting logarithmic graph:

Notice how the render time increases exponentially (it looks linear in the graph) between zoom levels 0 and 5, and then the big peaks (up to 720s!) for zoom levels 6 to 8. This is definitely worse than the render times I used to have when I imported several countries, but that data never got to the size of this import.

Of course, next weekend I'll fire a full render of the imported region, zoom levels 0 to 14, and then I'll have better numbers to share.

Meanwhile, when I tried to update the data, it failed:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --append --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 362.osc.gz
osm2pgsql SVN version 0.82.0 (64bit id space)

Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
Maximum node in persistent node cache: 2701131775
Mid: pgsql, scale=100 cache=2048
Setting up table: planet_osm_nodes
PREPARE insert_node (int8, int4, int4, text[]) AS INSERT INTO planet_osm_nodes VALUES ($1,$2,$3,$4);
PREPARE get_node (int8) AS SELECT lat,lon,tags FROM planet_osm_nodes WHERE id = $1 LIMIT 1;
PREPARE delete_node (int8) AS DELETE FROM planet_osm_nodes WHERE id = $1;
 failed: ERROR:  relation "planet_osm_nodes" does not exist
LINE 1: ...rt_node (int8, int4, int4, text[]) AS INSERT INTO planet_osm...
                                                             ^
Error occurred, cleaning up

Somehow it's trying to use a table that was not created because the intermediate data is in the flat nodes file. I will have to investigate this; I'll try to do it this week.


  1. Actually, the disk is sold as '250GB', which is still using the same decimal units as HDDs, so it means that it's only (and the kernel confirms this) 232 GiB2. 

  2. I can't come up with a reason why it's not 256 GiB; it seems more difficult to fabricate memory in non-power-of-2 sizes. 

  3. I found that the verb for using duckduckgo for searching the Internet is «duck». 

  4. I have no idea how many times I join #osm in Freenode just to end up asking OSM question in the #joomla channel. 

  5. Actually postgres shows the numbers in «human readable sizes», which means that any size above 10240MiB was shown in GiB. I just multiplied those by 1024 to have a rough MiB value. 

  6. Some day, with more time, I'll learn how to use the OSM plugin for ikiwiki. 

Wrapping return by reference functions with SWIG

At MegaCorp I'm developing in C++, but the APIs are in C. From time to time, we use Python for scripting something using those APIs. We write the bindings in SWIG.

All the functions in the APIs return a status code; it can be OK or one of many different values for different types of errors. This means that if a function needs to return a value, it actually takes a pointer (not a reference)1 to such a value, and at the end of the function, depending on its success, the value is returned at that address.

Then came the question of wrapping such functions with SWIG. According to the scarce documentation about it, one should declare the type (+name if needed) as argout and not much more... except for the first part of the example. See, what numinputs=0 does is make SWIG ignore that type as an argument2, meaning that the wrapper function won't be expecting it. Even so, the body does something, and a very important something. Let's see it in my example:

// ignore any input parameter of type PageSetHandle_t *
%typemap(in, numinputs=0) PageSetHandle_t *pPageSetHandle (PageSetHandle_t temp) {
    $1= &temp;
}

// on the other hand, this is how we convert a PageSetHandle_t* to a PyCObject
// and add it to the list of returned values
%typemap(argout) PageSetHandle_t * {
    $result= SWIG_Python_AppendOutput ($result, PyCObject_FromVoidPtr (*$1, NULL));
}

In my case the type is PageSetHandle_t. The fact that behind it there's an actual void * is not important3, but it messed with my head while trying to understand the problem. What the first part is saying is: «look, ignore any parameter of type PageSetHandle_t *, but when you find one, create a temporary variable of the dereferenced type (PageSetHandle_t, in this case), and assign its address to the parameter you're going to pass to the C function». This is extremely important; otherwise, you wouldn't have an address where to store the value returned by the function. Notice that the temporary variable is declared on the wrapper's stack and dies when the wrapper finishes, but that's OK because of what we do in the actual conversion code: we create a PyCObject from the dereferenced parameter. Here's part of the generated code:

SWIGINTERN PyObject *_wrap_MFaST_PMCreatePageSet(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PageSetHandle_t *arg3 = (PageSetHandle_t *) 0 ;
  PageSetHandle_t temp3 ;

  {
    arg3= &temp3;
  }
  result = MFaST_PMCreatePageSet(arg1,arg2,arg3);
  resultobj = SWIG_NewPointerObj((new FqStatusCode_t(static_cast< const FqStatusCode_t& >(result))), SWIGTYPE_p__FqStatusCode_, SWIG_POINTER_OWN |  0 );
  {
    resultobj= SWIG_Python_AppendOutput (resultobj, PyCObject_FromVoidPtr (*arg3, NULL));
  }
  return resultobj;
}

This means that now I can call pm.MFaST_PMCreatePageSet() with only two parameters instead of three, and that both the status and the returned value will be returned in a tuple. Notice again the variables generated by SWIG and how they interact in the body of the typemap(in) and especially in the typemap(argout), where you can't reference the temporary variable, which means you can only dereference the argument (*$1) instead of trying to use the tempvar (temp).


  1. That is, the parameter is declared as int *foo and not int &foo. I think this is a mistake. 

  2. I understand what it means now that I've seen it in effect. At the beginning it was not clear, mostly because of the (int temp) part and the fact that it has code associated. 

  3. Except for the part where I will use PyCObject_FromVoidPtr() and PyCObject_AsVoidPtr() to create Python objects with such values. 

Elevation

Another long time without mentioning any advancements in my map making efforts. While not much has changed, what has changed is a big step towards easier customization.

In the last post in this series I gave a quick list on how I make my own maps:

  • Use SRTM elevation to generate a hypsometric base map, hill and slope shading, and finally hypsometric curves. For the latter I'm using gdal-contour and shp2pgsql.
  • Use any extract you want and import it with osm2pgsql. Small extracts import quite quickly but, so far, I have never succeeded in importing a big part of Europe (from Portugal to southern Finland, so it covers UK/Ireland and Istanbul) in a machine with 8GiB of RAM and hard disks. The recommendation is to use a machine with lots of RAM (16+GiB; I heard up to 256 for the whole planet) and SSDs.
  • Use TileMill for the initial design.
  • Do a lot of xml manipulation to try to customize OSM's mapnik style based on your design.
  • Use generate_tiles.py to, well, generate the tiles.

But since August 2013, things have changed for the 3rd and 4th items in that list. Andy Allan had finished a first big iteration of redoing OSM's mapnik style in CartoCSS. The latter is a CSS-like language that is the native way to style things in TileMill. Since then, customizing is much easier, not only because of the Cascading part of CSS, but also because Andy took the time to turn a lot of things (widths, colors) into something similar to C/C++'s #defines, which you can override, impacting everywhere the definition is used.

So now, steps 3 and 4 are more like:

  • Use TileMill to change OSM's initial design1.
  • Export the design to a mapnik XML file.

In fact, that last step could be avoided, given that TileMill can also render everything by itself.

The last step in having your own tiles is being able to use them. I use Marble on my phone, but I also set up a slippy map so I can brag about it. Unluckily I can't embed the map here (I should fix this).

The tiles served actually come from different rendering passes using different, incremental designs. The previous-to-last one can be seen in Deutschland's region; I will be rendering some parts of London with a newer design before the end of the month.


  1. You can use any text editor to change the CartoCSS files; TileMill will pick them up via inotify and rerender accordingly. The only problem is when you're editing multiple files to impact a zoom level that takes a long time to render (for me it's between 6 and 8). 

ayrton 0.4.1

Last night I released ayrton v0.4 and v0.4.1. Here's the combined changelog:

  • Fixed the release.ay script that prematurely released v0.4.
  • >= can redirect stderr to stdout.
  • o(option=argument) can be used to declare keyword params among/before positional ones.
  • bash() now returns a single string if there is only one result.
  • Slightly better error reporting: don't print a part of the stacktrace that belongs to ayrton itself. There is still more to do.
  • No longer depends on sh.

It's not too long, but that last item and a 2-week vacation made this quite a long release cycle: about 3 months. Getting rid of sh means that the capture of a command's output is no longer buffered, and that we can launch all kinds of TUI programs, notably editors and such.

Still, it's quite far from being stable enough; I still have to think about how the remote() API should look; really implement pipes between two and three commands; think about whether ayrton should, and how it could, autodetect what the user wants to do with an executable (do they want the output? foreground, background? what if there's a Python function in between?); and implement several things I noted on the project's issue page.

All in all, the project goes forward, not at the pace I wanted, but now I have better control on what's going on and this simplifies its development. From now on, releases should be shorter (knock on wood).

Cars also have bugs

In the last two weeks I took a couple of friends and my car on a trip around Deutschland and then some more. When I do this kind of road trip, the last thing I do each night is take a picture of the dashboard with the trip counter showing the accumulated kilometers. Last year we did a slightly shorter one all the way to Praha and back, and on the last day I noticed that the counter seemed to have reset a little bit beforehand. See, the trip counter only shows three integer digits and one decimal digit, so once it reaches 1000 km (1 Mm! Just not the small mm :), it shows 000.0 again. So in that trip to Praha, while coming back, the counter reset back to 0, and at the end of the trip I was not sure how many kilometers I had done, only that it was around 3.5k km.

This time we were on our way between Hamburg and Köln when it happened again. I was sure we were around the 3.2k km mark when it happened, and suddenly it struck me. But first let me tell you how I was sure.

On the first night, the numbers read 207780 for the total counter and 752.2 for the trip, which puts the start at around 207027.8. On the last night they read 212048 and 743.3. Subtracting the computed start value from that last total counter makes some 5020.2 km!

Now, let's go back to the strange resetting problem. It happens around the 3200.0 km mark, and the counter is digital. Digital counters need bits to count, and the number of bits available determines the maximum they can count to. Also, this counter has a decimal place for counting kilometers... but what if it actually counted hundreds of meters (hectometers) and the display is just a representation? That would put the reset at around 32000 hectometers, hmm...

I just wonder why the technicians at BMW decided 20 years ago (my car is old) to use what looks like a signed int for this. With a two-byte signed int a counter can go as high as 32767. If you're counting hectometers, that makes 3276.7 km, which seems to match the resetting point. Then, if we add the number from the last night, we get 4020.0. I knew that after Köln the digits shown wrapped once more around 1000 km, so we end up with 5020.0 for the real trip. It's a suspiciously round number, but that's only a 1-in-10 probability.
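The arithmetic, spelled out (the signed 16-bit hectometer counter is of course my guess, not anything BMW documented):

```python
INT16_MAX_HM = 32767            # largest value a signed 16-bit int holds
RESET_KM = INT16_MAX_HM / 10.0  # counting hectometers: wraps at 3276.7 km

# the internal counter wrapped once at RESET_KM, then the 3-digit display
# wrapped once more past 1000 km, ending the trip showing 743.3
trip = RESET_KM + 1000.0 + 743.3

# cross-check against the total odometer readings from both nights
start = 207780 - 752.2          # implied start of the trip
odometer_trip = 212048 - start  # matches within the 0.1 km the display shows
print(trip, odometer_trip)
```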

So, cars can have bugs too. This time it's just an annoying bug for those of us crazy enough to make trips longer than 3.2k km, but with the tendency of making cars more and more dependent on computers (self-driving cars being the clearest case), we have to be aware that worse things can happen. But then, we already have self-flying planes, which can even land mostly by themselves.

Restoring file times

Do you remember this? Well, now I have a similar but quite different case: I want to restore the files' times. Why? Hear me out.

For a long time now I've been doing backups to an external drive. For that I simply use rsync. When I bought my second laptop, I took the chance to test restoring from that backup, so I just did a last backup on the old machine and a restore on the new one. This allowed me to add the last few things that were missing. After a few more weeks of finding missing things, doing more backup/restore cycles between the old and new laptops, I was happy.

But there was something that I didn't realize till a few months later. The file times were not backed up, and clearly not restored. This didn't affect me in any way, at least until I tried to add a new post to this glob.

See, this glob is generated with ikiwiki. ikiwiki uses the files' mtimes to generate the posts' dates, which it uses to sort them and to assign them to some year or month. With the mtimes lost, ikiwiki was assigning all the older posts to some day in Oct, 2012, which was the date of the last backup/restore cycle.

Frustrated (yes, sometimes I have a low threshold), I left it like that: no planet was flooded, because in any case the RSS feeds didn't really change, and nobody would notice and complain about it. In fact, nobody did.1 Bear in mind that, as I kept my old laptop as a server, and as I didn't clean up all the files I keep in my backup, I still had the original mtimes in the original files.

Yesterday, playing around with unrelated things, I came across the post I mentioned at the beginning of this one, and had the urge to really fix that mess. Today I started looking at it, and found that this command was some part of it:

find src/projects/glob/work/ -name '*.mdwn' | xargs stat --format '%X %Y %Z %n'

but processing its output in bash seemed like a burden. Luckily, I have the solution for that, a project that I started for exactly that same reason: bash sucks at manipulating data.

So I grabbed my editor, punched keys for some 7 minutes, ran 3 or 4 tests making sure that everything would go smoothly, ran it once more on the live system, rebuilt my glob2 and voilà! It's fixed now.

How long is the code? Not much really:

from os import utime

with remote ('192.168.0.42', allow_agent=False, password='XXXXXXXX') as fds:
    # find all files (not only posts), get their times and paths
    with cd ('src/projects/glob/work'):
        find ('.', '-name', '*.mdwn') | xargs ('stat', format='%X %Y %Z %n')

(i, o, e)= fds

for line in e.readlines ():
    print (line.decode ('utf-8'), end='')

for line in o.readlines ():
    data= line.split ()
    access, modif, change= [ int (x) for x in data[:3] ]
    file_path= data[3].decode ('utf-8')
    if _e (file_path):
        utime (file_path, times= (access, modif))
    else:
        print ("%s not found!" % file_path)

Notice that this example is even more complex than initially thought: it ssh's into the server to fetch the original times.
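Stripped of the ssh round-trip, the core of the fix is small even in plain Python. Here's a sketch (a hypothetical helper, not the ayrton code) that applies one line of that `stat` output:

```python
import os

def apply_stat_line(line):
    # one line of `stat --format '%X %Y %Z %n'` output: atime, mtime,
    # ctime, then the path (which may itself contain spaces)
    access, modif, change, path = line.split(None, 3)
    if os.path.exists(path):
        # utime() can restore atime and mtime; ctime is out of reach
        os.utime(path, times=(int(access), int(modif)))
        return True
    return False
```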

This makes one thing clear: I have to improve the remote() API a lot. For instance, so far I have no way to know whether the remote commands executed fine or not, and the way to read the lines is still ugly and not homogeneous with how other ayrton functions/executables work.

Also note that this does not restore the ctimes; I haven't found a way to do it, but then I didn't really put much effort into it. I don't really need them.

Of course, I also fixed my backup and restore scripts :)


  1. I was this >< close of adding #foreveralone to that sentence. I think I'll resist for a couple more years. 

  2. Don't forget, ikiwiki compiles static sites. 

Ssl3_Get_Record wrong version number

Since I started working at MegaCorp I've found a new level of security policy; for me this is a low point: not only can't I ssh home, but since I changed my DynDNS provider from dyn.com to afraid.org, I can't even access my webserver, because the proxy denies me access, citing: Your request was denied because of its content categorization: "Dynamic DNS Host;Suspicious". So I can access lots of questionable content through Google's cache, but not my own photos. Fine.

At the beginning, the classical trick of making the ssh server listen on port 443 worked fine, but at some point Network Operations managed to close that hole. The change was not communicated, so it's not clear whether it was entirely on purpose. I once asked for the Network Usage Policy, if such a thing exists, but the unofficial answer was along the lines of «I'm not sure you really want to ask».

So I managed to pierce the firewall again with a further trick: wrapping the ssh traffic in an SSL connection. This makes the traffic look like regular https traffic (remember, the s stands for SSL/TLS), at the cost of encrypting it twice.
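The server side of that wrapping can be done with stunnel. I won't reproduce my exact config here, but it boils down to something like this (paths and the backend port are illustrative):

```
; /etc/stunnel/stunnel.conf (server side, illustrative values)
cert = /etc/stunnel/stunnel.pem

[ssh]
accept  = 443
connect = 127.0.0.1:22
```

The client side is the openssl s_client ProxyCommand shown below.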

Everything was smooth again, until the server crashed due to a lack of power. After I powered it on again, I found that I couldn't connect anymore. This morning I decided to take a couple of minutes to figure out why. The ssh client tells me this:

$ ssh -v -o 'ProxyCommand openssl s_client -connect %h:%p -quiet 2>/dev/null' -p 443 foo.afraid.org
OpenSSH_5.1p1 Debian-5, OpenSSL 0.9.8g 19 Oct 2007
debug1: Reading configuration data /home/user/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Executing proxy command: exec openssl s_client -connect foo.afraid.org:443 -quiet 2>/dev/null
debug1: permanently_drop_suid: 1004
debug1: identity file /home/user/.ssh/identity type -1
debug1: identity file /home/user/.ssh/id_rsa type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-1024
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-1024
debug1: identity file /home/user/.ssh/id_dsa type -1
ssh_exchange_identification: Connection closed by remote host

Not much info, really. From the server side I have this:

SSL_accept: 1408F10B: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket

Nice, a cryptic error message, at least for me. Strangely enough, openssl by itself manages to connect just fine:

$ openssl s_client -connect foo.afraid.org:443 -ssl3
CONNECTED(00000003)
[...]
SSH-2.0-OpenSSH_6.2p2 Debian-6

That's the ssh server saying hi. After some DDG'ing1 I found this post on serverfault. The first answer itself is not very helpful, but the second one is actually the OP explaining how he solved it: telling stunnel to accept any SSL version from clients while telling SSL to ignore SSLv2. I don't understand how that fixes it, but it works, yay!
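For reference, the stunnel settings the OP describes amount to something like this (quoted from memory, so treat it as a sketch):

```
; accept any SSL/TLS protocol version from clients...
sslVersion = all
; ...but never SSLv2
options = NO_SSLv2
```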


  1. DuckDuckGo is getting better by the day. 

Breaking bad

For the last two weeks I've been trying to slightly change Python's syntax. The goal was to allow a mix of keyword and normal parameters in a function, such as:

f (a=42, b)

I know that this sounds crazy, but (I think) I have a good reason for it:

  • in ayrton some functions act as frontends for executables;
  • keyword parameters for these functions are converted to options (a=42 gets converted to -a 42, foo=27 to --foo 27);
  • there is an arbitrary number of commands that consider their arguments as positional, like requiring that all the options come before the arguments; notably, rsync.

So, how to accomplish this? Well, there are good news, bad news... and plain news. The good news is that the lexer allows this kind of construction. The plain news is that keywords-after-arguments is enforced in the syntax checker, which is implemented in Python/ast.c; this code ends up in libpython. The bad news is that all the functions defined in this file are declared static, which means, among other things, that they're not exported by the library they're in. So even if I only want to overwrite ast_for_call(), I still have to copy over all the rest of the functions. Finally, I already mentioned a slightly bad news: I will have to build my own interpreter for this change to apply, which also implies copying the Python interpreter's main() function.

All that aside, how to make this syntax Python-compatible again? Well, the idea is simple: if a keyword is found before an argument, I convert it into a tuple with a string with the name of the parameter and its value. Thus, a=42 becomes ('a', 42) and then the frontend function resolves that to -a 42.
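As a rough sketch of what that frontend-side resolution could look like (a hypothetical helper, not ayrton's actual code):

```python
def resolve_options(args):
    """Turn ('name', value) tuples into command line options, in order.

    Single-letter names become short options, longer names long ones:
    ('a', 42) yields ['-a', '42'] and ('foo', 27) yields ['--foo', '27'].
    Plain arguments pass through untouched.
    """
    result = []
    for arg in args:
        if isinstance(arg, tuple):
            name, value = arg
            dashes = '-' if len(name) == 1 else '--'
            result.extend([dashes + name, str(value)])
        else:
            result.append(str(arg))
    return result
```

Because the tuples keep their place in the argument list, options and positional arguments reach the executable in exactly the order they were written.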

Between the plain and good news camps, the patch is rather simple:

```diff
diff -r d047928ae3f6 Python/ast.c
--- a/Python/ast.c
+++ b/Python/ast.c	Thu Oct 17 12:37:17 2013 +0200
@@ -2408,7 +2408,7 @@ argument: [test '='] (test) [comp_for] # Really [keyword '='] test */
-    int i, nargs, nkeywords, ngens;
+    int i, nargs, nkeywords, ngens, convert_keywords;
     asdl_seq *args;
     asdl_seq *keywords;
     expr_ty vararg = NULL, kwarg = NULL;
@@ -2418,15 +2418,20 @@
     nargs = 0;
     nkeywords = 0;
     ngens = 0;
+    convert_keywords= 0;
     for (i = 0; i < NCH(n); i++) {
         node *ch = CHILD(n, i);
         if (TYPE(ch) == argument) {
-            if (NCH(ch) == 1)
+            if (NCH(ch) == 1) {
                 nargs++;
-            else if (TYPE(CHILD(ch, 1)) == comp_for)
+                if (nkeywords) {
+                    convert_keywords= 1;
+                }
+            } else if (TYPE(CHILD(ch, 1)) == comp_for)
                 ngens++;
-            else
+            else {
                 nkeywords++;
+            }
         }
     }
     if (ngens > 1 || (ngens && (nargs || nkeywords))) {
@@ -2440,6 +2445,11 @@
         return NULL;
     }

+    if (convert_keywords) {
+        nargs+= nkeywords;
+        nkeywords= 0;
+    }
+
     args = asdl_seq_new(nargs + ngens, c->c_arena);
     if (!args)
         return NULL;
@@ -2451,13 +2461,15 @@
     for (i = 0; i < NCH(n); i++) {
         node *ch = CHILD(n, i);
         if (TYPE(ch) == argument) {
-            expr_ty e;
+            expr_ty e, e1;
             if (NCH(ch) == 1) {
+                /*
                 if (nkeywords) {
                     ast_error(c, CHILD(ch, 0),
                               "non-keyword arg after keyword arg");
                     return NULL;
                 }
+                */
                 if (vararg) {
                     ast_error(c, CHILD(ch, 0),
                               "only named arguments may follow *expression");
@@ -2478,26 +2489,27 @@
                 keyword_ty kw;
                 identifier key, tmp;
                 int k;
+                asdl_seq *t;

                 /* CHILD(ch, 0) is test, but must be an identifier? */
-                e = ast_for_expr(c, CHILD(ch, 0));
-                if (!e)
+                e1 = ast_for_expr(c, CHILD(ch, 0));
+                if (!e1)
                     return NULL;
                 /* f(lambda x: x[0] = 3) ends up getting parsed with
                  * LHS test = lambda x: x[0], and RHS test = 3.
                  * SF bug 132313 points out that complaining about a keyword
                  * then is very confusing.
                  */
-                if (e->kind == Lambda_kind) {
+                if (e1->kind == Lambda_kind) {
                     ast_error(c, CHILD(ch, 0), "lambda cannot contain assignment");
                     return NULL;
-                } else if (e->kind != Name_kind) {
+                } else if (e1->kind != Name_kind) {
                     ast_error(c, CHILD(ch, 0), "keyword can't be an expression");
                     return NULL;
-                } else if (forbidden_name(c, e->v.Name.id, ch, 1)) {
+                } else if (forbidden_name(c, e1->v.Name.id, ch, 1)) {
                     return NULL;
                 }
-                key = e->v.Name.id;
+                key = e1->v.Name.id;
                 for (k = 0; k < nkeywords; k++) {
                     tmp = ((keyword_ty)asdl_seq_GET(keywords, k))->arg;
                     if (!PyUnicode_Compare(tmp, key)) {
@@ -2508,10 +2520,21 @@
                 e = ast_for_expr(c, CHILD(ch, 2));
                 if (!e)
                     return NULL;
-                kw = keyword(key, e, c->c_arena);
-                if (!kw)
-                    return NULL;
-                asdl_seq_SET(keywords, nkeywords++, kw);
+                if (!convert_keywords) {
+                    kw = keyword(key, e, c->c_arena);
+                    if (!kw)
+                        return NULL;
+                    asdl_seq_SET(keywords, nkeywords++, kw);
+                } else {
+                    /* build a tuple ('name', expr) */
+                    t = asdl_seq_new(2, c->c_arena);
+                    if (!t)
+                        return NULL;
+                    asdl_seq_SET(t, 0, Str (key, e1->lineno, e1->col_offset, c->c_arena));
+                    asdl_seq_SET(t, 1, e);
+                    /* ... and put it in as an argument */
+                    asdl_seq_SET(args, nargs++, Tuple (t, Load, e1->lineno, e1->col_offset, c->c_arena));
+                }
             }
         }
         else if (TYPE(ch) == STAR) {
```

Don't you hate it when you have to modify a line just to add braces?

But of course, this break in the syntax should only be allowed for the frontend functions. This check is not implemented yet, so technically the new syntax is also valid for the Python code `ayrton` runs. This is not ideal, but hey, this is still a WIP :) For the moment this line of development will stay out of the main line. I will integrate it when I'm really convinced that it's a good idea.

ayrton: an update

«So, by simply copying that one and ast.py, modifying the latter so it uses my modified syntax checker, and modifying the setup.py file, I should be done». Famous last words.

I was completely wrong. The _ast module imported by ast.py is not implemented in Python/ast.c but in Python/Python-ast.c. Understanding this, I reread this post I linked to last time and finally1 understood that changing the syntax would not be so easy. Another eye-opener was the stack trace I got when ast.c's ast_for_call() (the function I'm planning to change) is called when a function call is found in the source. Here, have a look:

#0  ast_for_call (c=0x7fffffffa300, n=0x7ffff700d918, func=0x9ccf40) at Python/ast.c:2470
#1  0x00000000005bfe73 in ast_for_trailer (c=0x7fffffffa300, n=0x7ffff700db98, left_expr=0x9ccf40) at Python/ast.c:2105
#2  0x00000000005c03c0 in ast_for_power (c=0x7fffffffa300, n=0x7ffff6ca4238) at Python/ast.c:2217
#3  0x00000000005c0b01 in ast_for_expr (c=0x7fffffffa300, n=0x7ffff6ca4238) at Python/ast.c:2406
#4  0x00000000005c1108 in ast_for_testlist (c=0x7fffffffa300, n=0x7ffff6ca4030) at Python/ast.c:2562
#5  0x00000000005c118e in ast_for_expr_stmt (c=0x7fffffffa300, n=0x7ffff6facfa8) at Python/ast.c:2584
#6  0x00000000005c3e84 in ast_for_stmt (c=0x7fffffffa300, n=0x7ffff6facfa8) at Python/ast.c:3586
#7  0x00000000005bc94c in PyAST_FromNodeObject (n=0x7ffff6face18, flags=0x7fffffffa4e0, filename=0x7ffff6d56370, arena=0x8c8780) at Python/ast.c:709
#8  0x000000000050c9e8 in PyParser_ASTFromStringObject (s=0x7ffff6ca37a0 "foo (a=3, b)", filename=0x7ffff6d56370, start=257, flags=0x7fffffffa4e0, arena=0x8c8780) at Python/pythonrun.c:2160
#9  0x000000000050c6be in Py_CompileStringObject (str=0x7ffff6ca37a0 "foo (a=3, b)", filename=0x7ffff6d56370, start=257, flags=0x7fffffffa4e0, optimize=-1) at Python/pythonrun.c:2068
#10 0x00000000004c6e54 in builtin_compile (self=0x7ffff70f9c28, args=0x7ffff6ca04f8, kwds=0x0) at Python/bltinmodule.c:671

The compile() builtin (#10) passes through a couple of obscure functions until it reaches the AST building/syntax checking code (all the ast_for_*() functions), and ast.parse() reaches it just as quickly:

def parse(source, filename='<unknown>', mode='exec'):
    return compile(source, filename, mode, PyCF_ONLY_AST)
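That also gives an easy way to see the check in action: stock Python refuses the construction right at parse time (the exact error message varies between versions):

```python
import ast

# a positional argument after a keyword argument is a syntax error
# in an unmodified interpreter
try:
    ast.parse("f(a=42, b)")
    print("parsed -- patched interpreter?")
except SyntaxError as e:
    print("rejected:", e.msg)
```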

In a desperate move, I thought of hacking a modified version of just that function and pulling some LD_PRELOAD jedi moves, but the Python interpreter binary (/usr/bin/python3) is statically linked against libpython3. Future versions2 of the Python interpreter's main() function3 will be really simple:

/* Minimal main program -- everything is loaded from the library */

#include "Python.h"

#ifdef __FreeBSD__
#include <floatingpoint.h>
#endif

int
main(int argc, char **argv)
{
        /* 754 requires that FP exceptions run in "no stop" mode by default,
         * and until C vendors implement C99's ways to control FP exceptions,
         * Python requires non-stop mode.  Alas, some platforms enable FP
         * exceptions by default.  Here we disable them.
         */
#ifdef __FreeBSD__
        fp_except_t m;

        m = fpgetmask();
        fpsetmask(m & ~FP_X_OFL);
#endif
        return Py_Main(argc, argv);
}

So it's not so crazy to think of creating a new interpreter that uses my version of ast_for_call(), but I haven't found the linking command that makes that possible. So this part of the project is in the freezer.

Meanwhile, I spent a couple of hours with the code and managed to implement piping and redirection, even with some tests! You can get an idea of how it works by reading the README4. Also, expect a release soon.

As for replacing sh, no efforts (besides burning some glucose and firing some neurons) have been done.


  1. Also famous last words. 

  2. Python-3.3 has a more complex version which has to handle making two copies of argv while dealing with the system's encoding. 

  3. It's weird, the source code of the executable resides in Modules/python.c. Go figure. 

  4. Some day I'll convert that file to reST, make it the tutorial, and have it with the rest of the documentation.