Block sizes from OSM data
My city has big blocks. One close to mine is around 6km of circumference. I wanted to make a map that show the sizes of the blocks. What I have is a rendering DB for my area. Let's see what I can do.
An obvious solution would be to use any routing database, which would convert squiggly segments into edges of a graph. I feebly tried that, but no routing system I know is packaged for Debian, and one of them asked me to compile it. I could compile it, but I was lazy that way.
So the plan is this: take the whole street network1 and recursively remove the dead end streets. The recursion part is because there are areas where there are whole branches of dead end streets, so if in one pass a street A might look not dead because it connects to another B, but B is removed, the next pass should remove A too. So, for each street, take each end, and see how many other segments it's connected to; if 0, it's a dead end and we should remove it. Keep going until no new streets were removed.
One problem is how the data is represented. In a rendering database, streets do not exist, just segments with a consistent tagging, so streets can be split if, for instance, max speed changes, or there's a bridge. But one thing they're not split on is where another street joins them (which would be reflected in a routing db, lazy me! :)
This does not matter when searching for connected streets to a point, but a segment might have parts that are a dead end, and parts that are not, because they're connected to other streets(segments) that are connected too. So now, for each end, take the point, and if it's not connected, removed the point and try again. The full algo:
- Find all the segments
- While no segment has been removed
- For each segment
- For each end
- While the end is not connected
- Remove the end from the segment
- If not enough nodes left, remove the segment
- While the end is not connected
- For each end
- For each segment
Looks like an infinite problem! The city I'm interested in has 16k segments, and this looks worse than O(N²)! But we can do some shortcuts. One of the expensive operations is "the end is connected" or "find all the other segments that include this point". Luckily, a rendering DB has a geographic index, and we can do trics like looking for segments only around the area of the segment you're looking at, so reducing the amount of comparisons by a lot: less that 10 instead of 16k! And of course, you should not consider segments you already removed.
To be honest, I thought this would take way more effort and code. Yes, I changed the algo thrice, but all in all it took me like 6h to get it as it is. Now, it is not perfect. It does not detect some artifacts (mostly cycles connected to a single segment), but is good enough for me for now. Not bad for <200 lines of Python and SQL :)
#! /usr/bin/env python3 import psycopg2 from shapely import from_wkb, set_srid, LineString, Point def main(): print('Fetching all segments, takes a while.') conn = psycopg2.connect(dbname='europe') cur = conn.cursor() cur.execute(f""" WITH marseille AS ( SELECT way FROM planet_osm_polygon WHERE osm_id = -76469 ) SELECT line.osm_id, line.way FROM planet_osm_line AS line, marseille WHERE line.way && marseille.way AND line.highway IN ( 'primary', 'primary_link', 'secondary', 'secondary_link', 'tertiary', 'tertiary_link', 'residential', 'unclassified', 'living_street', 'road' ) AND (line.access IS NULL OR line.access = 'yes') { f" LIMIT {sys.argv[1]}" if len(sys.argv) > 1 else '' }; """) segments = {} segments_removed = set([666]) # fake osm_id so line.osm_id NOT IN %s AND works (empty sets are syntax errors) # first pass: collect all segments for data in cur.fetchall(): osm_id = data[0] segment = from_wkb(data[1]) segments[osm_id] = segment print(f"Found {len(segments)} segments.") # second pass: eliminate segments for which we can find an unconnected end pass_number = 1 while len(segments) > 0: print(f"PASS {pass_number:2d}") total_count = 0 total_changed = 0 to_remove = set() for osm_id, segment in segments.items(): changed = False removed = False connections = 0 for direction in -1, 0: while len(segment.coords) > 0: point = Point(segment.coords[direction]) cur.execute(""" WITH segment AS ( SELECT way FROM planet_osm_line WHERE osm_id = %s ) SELECT line.osm_id, line.name FROM planet_osm_line AS line, segment WHERE line.way && ST_Buffer(ST_Envelope(segment.way), 10) AND line.osm_id != %s AND line.osm_id NOT IN %s AND line.highway IN ( 'trunk', 'trunk_link', 'primary', 'primary_link', 'secondary', 'secondary_link', 'tertiary', 'tertiary_link', 'residential', 'unclassified', 'living_street', 'road' ) AND (access IS NULL OR access = 'yes') AND ST_Intersects(line.way, ST_GeomFromWKB(%s, 3857)); """, (osm_id, osm_id, tuple(segments_removed), point.wkb)) data = cur.fetchall() if len(data) == 0: if len(segment.coords) == 2: # removing one point removes the segment removed = True to_remove.add(osm_id) break # remove the point if direction == 0: segment = LineString(segment.coords[1:]) else: segment = LineString(segment.coords[:-1]) changed = True else: connections += len(data) break if changed: segments[osm_id] = segment total_changed += 1 total_count += 1 match total_count % 10, changed, removed: case _, True, _: print('*', end='', flush=True) case _, False, True: print('_', end='', flush=True) case 0, _, _: print('|', end='', flush=True) case 5, _, _: print(',', end='', flush=True) case _: print('.', end='', flush=True) if total_count % 100 == 0: print() segments_removed.update(to_remove) print() print(', '.join([ str(id) for id in to_remove ])) for osm_id in to_remove: del segments[osm_id] print(f"{total_changed} changed, {len(to_remove)} removed, {len(segments)} left.") print() # if only some segments were changed, the topology does not change, so we save one pass if len(to_remove) == 0: break pass_number += 1 print("Saving segments...") for segment in segments.values(): cur.execute(""" INSERT INTO streets (way) VALUES (ST_GeomFromWKB(%s, 3857)) """, (segment.wkb, )) conn.commit() conn.commit() print() print("Cutting...") for index, segment in enumerate(segments.values()): buffered = segment.buffer(1) marseille = difference(marseille, buffered) match (index + 1) % 10: case 0: print('|', end='', flush=True) case 5: print(',', end='', flush=True) case _: print('.', end='', flush=True) if (index + 1) % 100 == 0: print() print() out = open('blocks.geojson', 'w+') # GeoJSONstream, one GeoJSON object per line # GDAL/QGIS won't accept for polygon in marseille.geoms: out.write(f"{json.dumps(mapping(polygon))}\n") main()
Runtime takes a while because someone correctly micromapped a 350m dead end road due to diffrences in road side parking, 14 segments/passes in total; otherwise 8 would have suffised. I could have added a heuristic where, when a pass removed just a few segments, the next pass would only search among the segments close to those. Technically I could do this from the first pass.
One thing that surprised me whas this: QGIS can read GeoJSON files, but if they're going to be a colleciton of things,
better be a GeoJSONseq, that is a file with a GeoJSON object per line. But since these files do not include info about
the EPSG, you have to set it by hand on QGIS. now, this would be fine if QGIS would ask about it when loading the
layer, but instead two things happen: it confuses the EPSG, but would still correctly zoom to the layer assuming that
the layer has the same projection as the project. Thanks to uglyhack#qgis@libera.chat.
Several caveats:
I didn't include motorways or tunks in the block computation, because they create a complication in the definition of block, specially with bridges over or tunnels under other roads, of which there are a lot here. I know at least 3 bridges that still break the graph; there are few enough that I can ignore them.
The city has big blocks of industrial areas. Service roads are not included, and in any case most are dead ends.
The biggest areas are actually not urban, including a big chunk of a National Park in the two big red areas to the South2 and the ports and the sea nearby3. Extracting a real urban area is possible, but harder.
On top I also drew three things, roads (grey, white, blue), rivers (thick blue) and train tracks (black), to give a better idea of the complexity of the city. This time it includes motorways, trunks, service and private roads. The latter also give an idea of how much is in private hands (white). The thin blue lines are the streets thad define the blocks, and the grey are public but dead ends.

-
I'm focused on a traffic issue: the city is not navigable by car, leading to lots of traffic in the few streets available, leading to lots of noise from impatient drivers. ↩
-
The map is rotated 108° so I could zoom in as much as possible, so South is rougly to the left. ↩
-
See where all the ferry lines go to; that's the Vieux Port (Old Port), and the new, industrial one is in the same area to the North/right. ↩