Fix for Shapely's GEOS library loading bug in 1.7a2

Including GDAL and GEOS in binary wheels that go to the Python Package Index (PyPI) has been good for users of fiona, rasterio, and shapely, but has also exposed Python GIS programmers on OS X to a perplexing bug. Shapely's issue #553 has been weighing on me for a long time. It's been hard to understand, hard to explain, and bites very hard. The short program below will trigger the bug on any version of OS X with any version of fiona and any version of shapely before 1.7a2 which is installed using a wheel from PyPI.

# crash.py

import fiona
from shapely.geometry import Point
from shapely.ops import unary_union


print("Buffering points to create polygons")
SHAPES = [Point(i, i).buffer(1.5) for i in range(20)]

print("Computing union of polygons")
union = unary_union(SHAPES)

print("Union: {}".format(union))

For example:

$ python crash.py
Buffering points to create polygons
Computing union of polygons
Assertion failed: (!static_cast<bool>("should never be reached")), function itemsTree, file AbstractSTRtree.cpp, line 373.
Abort trap: 6

The unary_union function uses a GEOS STRtree and an assertion that should never be reached is in fact reached and the program aborts.

There has been a work-around – import shapely before importing other modules that are dynamically linked with GEOS, such as fiona and rasterio – but that's smelly and hard to ensure in a complex project.

The problem exists in a small niche: one operating system and one kind of program, one that that loads GEOS twice in different ways. Once when fiona's extension modules are loaded and cause the linked GEOS library to be loaded as with any compiled program, and once again when shapely is imported and it calls dlopen from python to load GEOS. I've found no diagnosis or solution of a similar problem on the web. Was the problem in GEOS? Was it in the library files that I built for the fiona, rasterio, and shapely wheels? Was the bug in delocate, the program I use to find dependencies and bundle them into the wheels? I asked GEOS developers and folks at work, to no avail. I was stumped for a long time.

I asked Even Rouault about the issue and he suggested a solution might be found in being more careful about using the correct mode with dlopen and he offered some example Python code that included this: handle = CDLL(None). My first thought was "can we use the ctypes CDLL class like this?" The docs say that the first argument is a path to a library file and that's how I've been using it in Shapely.

While looking for guidance on passing None to the CDLL constructor, I found some commented code in https://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi.

# CDLL(None) invokes dlopen(NULL), which loads the currently running
# process - in our case Python itself. Since Python is linked with
# libc, readdir_r will be found there.
# Alternatively, we can just explicitly load 'libc.so.6'.
lib = CDLL(None)

The OS X dlopen man page doesn't exactly say that,

dlopen() examines the mach-o file specified by path. If the file is compatible with the current process and has not already been loaded into the current process, it is loaded and linked. After being linked, if it contains any initializer functions, they are called, before dlopen() returns. dlopen() can load dynamic libraries and bundles. It returns a handle that can be used with dlsym() and dlclose(). A second call to dlopen() with the same path will return the same handle, but the internal reference count for the handle will be incremented. Therefore all dlopen() calls should be balanced with a dlclose() call.

If a null pointer is passed in path, dlopen() returns a handle equivalent to RTLD_DEFAULT.

but the POSIX spec is clear that dlopen(NULL) returns a handle to a global symbols object and that's what I observe on my MacBook. Here's an interpreter session where I import fiona and look in the global symbols object for a GDAL C API function.

>>> import fiona
>>> from ctypes import CDLL
>>> handle = CDLL(None)
>>> handle.GDALGetDriverCount
<_FuncPtr object at 0x111a3eb38>
>>> handle.GDALGetDriverCount()
187

When fiona is imported, Python calls dlopen with the path to fiona's extension (.so) modules, and as these are liked with libgdal, the GDAL library is loaded. Now, libgdal is linked with libgeos_c, so will we find GEOS C API functions in the global symbols object?

>>> handle.initGEOS
<_FuncPtr object at 0x111a3fb38>

Yes. This changes everything. The problem that has bedeviled shapely users is caused by using dlopen to load a library that's already been loaded. We can avoid the problem by detecting whether libgeos_c has already been loaded and skipping the troublesome dlopen call. I've done this for shapely and made a new 1.7 pre-release. Please considering running pip install -U --pre shapely to try it out.

The OS X man page for dlsym says that looking up functions in the global handle is the slowest possible approach, but ctypes does cache the addresses of functions it finds and my test programs don't run measurably slower, so this just might work out.

I'd like to understand why this problem doesn't also occur on Linux. Is it because of differences in the library loaders or the executable file formats themselves? Is there subtle platform-specific bug in GEOS? I'll write more if I get to the bottom of this.