This week will be busier than the last! I'm writing this from the neighborhood pool where Bea is training with her swim team. It's sunny and mild and it feels great to be writing outside.
Including GDAL and GEOS in binary wheels that go to the Python Package Index (PyPI) has been good for users of fiona, rasterio, and shapely, but has also exposed Python GIS programmers on OS X to a perplexing bug. Shapely's issue #553 has been weighing on me for a long time. It's been hard to understand, hard to explain, and bites very hard. The short program below will trigger the bug on any version of OS X with any version of fiona and any version of shapely before 1.7a2 which is installed using a wheel from PyPI.
The unary_union function uses a GEOS STRtree and an assertion that should never be reached is in fact reached and the program aborts.
There has been a work-around – import shapely before importing other modules that are dynamically linked with GEOS, such as fiona and rasterio – but that's smelly and hard to ensure in a complex project.
The problem exists in a small niche: one operating system and one kind of
program, one that that loads GEOS twice in different ways. Once when fiona's
extension modules are loaded and cause the linked GEOS library to be loaded as
with any compiled program, and once again when shapely is imported and it calls
dlopen from python to load GEOS. I've found no diagnosis or solution of
a similar problem on the web. Was the problem in GEOS? Was it in the library
files that I built for the fiona, rasterio, and shapely wheels? Was the bug in
delocate, the program I use to find dependencies and bundle them into the
wheels? I asked GEOS developers and folks at work, to no avail. I was stumped
for a long time.
I asked Even Rouault about the issue and he suggested a solution might be found
in being more careful about using the correct mode with
dlopen and he
offered some example Python code that included this:
handle = CDLL(None).
My first thought was "can we use the ctypes CDLL class like this?" The docs say
that the first argument is a path to a library file and that's how I've been
using it in Shapely.
While looking for guidance on passing
None to the CDLL constructor, I found
some commented code in https://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi.
The OS X dlopen man page doesn't exactly say that,
dlopen() examines the mach-o file specified by path. If the file is compatible with the current process and has not already been loaded into the current process, it is loaded and linked. After being linked, if it contains any initializer functions, they are called, before dlopen() returns. dlopen() can load dynamic libraries and bundles. It returns a handle that can be used with dlsym() and dlclose(). A second call to dlopen() with the same path will return the same handle, but the internal reference count for the handle will be incremented. Therefore all dlopen() calls should be balanced with a dlclose() call.
If a null pointer is passed in path, dlopen() returns a handle equivalent to RTLD_DEFAULT.
but the POSIX spec is clear that
dlopen(NULL) returns a handle to a global
symbols object and that's what I observe on my MacBook. Here's an interpreter
session where I import fiona and look in the global symbols object for a GDAL
C API function.
When fiona is imported, Python calls dlopen with the path to fiona's extension (.so) modules, and as these are liked with libgdal, the GDAL library is loaded. Now, libgdal is linked with libgeos_c, so will we find GEOS C API functions in the global symbols object?
Yes. This changes everything. The problem that has bedeviled shapely users is
caused by using dlopen to load a library that's already been loaded. We can avoid
the problem by detecting whether libgeos_c has already been loaded and skipping
the troublesome dlopen call. I've done this for shapely and made a new 1.7
pre-release. Please considering running
pip install -U --pre shapely to try
The OS X man page for dlsym says that looking up functions in the global handle is the slowest possible approach, but ctypes does cache the addresses of functions it finds and my test programs don't run measurably slower, so this just might work out.
I'd like to understand why this problem doesn't also occur on Linux. Is it because of differences in the library loaders or the executable file formats themselves? Is there subtle platform-specific bug in GEOS? I'll write more if I get to the bottom of this.
Ruth is out of town this week and the next and on top of solo parenting I am dog sitting. I'll be extra busy through the end of June. Please forgive me if I don't respond immediately to emails or tickets on GitHub.
Arthur's Rock is one of our local landmarks and hiking destinations. It's a big lump of pink Precambrian granite named after Arthur Howard, one of the previous owners of the land that would become Lory State Park in 1967. The Quad Rock and Black Squirrel courses go by it. I've been back twice since Quad Rock, once on a run and once with my family.
May was cool and wet and this weather has continued to some degree in June. Lory State Park is as green as I've ever seen it.
The ponderosa pines in the second photo were charred during a small lightning-sparked fire in June 2016. I'm glad it wasn't worse. The small forest on the east slope of Lory State Park is priceless.
Mapbox GL JS has excellent examples. This is a result of I cribbed freely from https://docs.mapbox.com/mapbox-gl-js/examples/ and everything worked. Not all projects have good examples. John Firebaugh (who has since moved on from Mapbox), Colleen McGinnis, and the GL JS and Documentation teams put a ton of work into these and it shows.
Mapbox GL JS Expressions give a developer ample control over styling and
filtering while keeping the styling API abstract and simple. In several parts
of its API, Mapbox GL JS can evaluate Lisp-like expressions composed of
["+", 4, ["*", 3, 2]] evaluates to
4 + (3 * 2), or
JS expressions operators contains feature property and map attribute getters.
snuggs and rio-calc) is that developers
to build expressions.
Another first for me is that I'm using Turf, the well-known simple features geometry library created by Morgan Herlocker. In my app I'm creating features on the fly derived from the intersections of features in a Mapbox tileset. Turf is fast and its functions take and return GeoJSON objects, which makes it easy for me to use and easy to integrate with Mapbox GL JS and other libraries or plugins. Turf is indispensable for a web app like the one I'm writing.
The season's 19th snowfall is heavier than the 18th. We're now 24 days after the average last snowfall date 🤷.
Nick Clark's official recap the Quad Rock races with photos and commentary is great and keeps me thinking about how much fun I had. I could read or listen to stories about the race all day long.
I haven't run or hiked at all this week. Not a single step.
My strained calf is feeling much better after four days of rest and I'm looking forward to getting back out on the trail for an hour this weekend. My plan is to resume running around 25-30 miles a week, starting next week, and then start building up for races in September and October.
I added some control over rio-calc's memory usage yesterday and had fun doing it. In 1.0.22, rio-calc read all your sources into memory, preventing calculations on very large datasets. There are a number of ways to chunk the calculations and output of the results. GDAL's gdal_calc.py script divides the job into chunks that match the storage of the input files. I rewrote rio-calc to chunk the work more like GDAL's warper does: you specify a maximum amount of memory to be used by the program and rio-calc divides the job into chunks that need approximately less than this amount. You, the user, are free to trade memory for speed.
The rio-calc command uses Lisp-like expressions. Some people like them, some
people don't. Evaluation of the expressions is done with snuggs. The expression evaluator in gdal_calc.py
eval(). Snuggs does not use
eval() and uses a whitelist
of names and keywords. It is much safer. For example, you can't trick snuggs
into importing modules that give filesystem access. You
could use snuggs on your server to evaluate expressions written by untrusted
people on the internet. You would be asking for trouble if you did this with
gdal_calc.py. This is not to say that Snuggs is perfect and has no bugs. We
found some this week and fixed them. If you're a rio-calc user, make sure you
upgrade snuggs to 1.4.6 to get the latest and greatest version.
Thank you for using Rasterio and thank you for the ever-excellent bug reports. Clear writing and examples of failing code make it much easier for a busy programmer to switch context and dispatch bugs efficiently.
This final week's numbers:
15 hours, 8 minutes running and hiking (2nd of 24)
62.8 miles (6th)
11,270 feet D+ (2nd)
Add to that one Quad Rock 50 finish in 13:13:50. My first ultramarathon.
Ruth says I looked fresher at the finish than I did in 2015 when I finished my first road marathon. It's partly because this race was more fun and partly because of my training. I'm sore today, but not wrecked.
David Bitner ran the first half of this with me and is a 25 mile finisher. He's building toward the Superior 100 in September and this was an early season training run. All the photos below are from his phone.
I had so much fun hanging out with Bitner before, during, and after the race. I'm looking forward to the next chance.
Despite the snow that fell 2 days before, the trails were in great shape. The sky was blue for the first half, which made staying in a good mood easy; and cloudy in the second half, so I was never at risk of overheating. For once, it seemed like all factors were in my favor.
Not that there wasn't adversity. At the bottom of the fifth climb I had reached my longest single-day distance ever and I was going into uncharted territory. I had strained my left calf, high up behind my knee, and it hindered me for the rest of the run. My body had reached its VFuel and almond butter & honey sandwich consumption limit. I compensated by hiking more slowly uphill and letting go of the brakes on the downhills, flipping my tactics. I switched to boiled potatoes, Haribo Sour Cubes, and gingerale. I was beginning to suffer a little chaffing as I approached the 40-mile aid station, but had had the foresight to stash my favorite shorts from last year, semi-retired, in a drop bag there. Those old shorts saved my butt.
Too make a long story short: I had a great adventure and am super pleased with myself for finishing. Congratulations to all you other finishers. You who didn't finish have my sympathy. Big thanks to Nick, Brad, Gnar Runners, and all the volunteers who planned and operated a fine event!