GDAL and Python wheels, a cautionary tale

Another Fiona or Rasterio release approaches, and I've got a story to tell.

Once there was a programmer who worked at a small geospatial company. On the internet one day, they discovered an open source geospatial extension to Postgres, called PostGIS, and another open source project named GDAL that had bindings in a new programming language, Python. The GIS programmer was enchanted with GDAL's Python bindings. They used them a little at their job. They used them more at their next job. They showed other people how to use it. They evangelized it. They argued with people who were wrong about it on the internet. But, as their Python experience grew, they became disenchanted with the GDAL Python tooling. There were gotchas and the requirement to think about C language details at all times was fatiguing. The GIS programmer developed an itch for something better. Something that read and ran like Python's standard library. So, they started to create a replacement.

The programmer starting coding it at work at their new job at a mapping start-up. Many void pointers were used, and many eyebrows were raised. A new Python package was born. It worked, it wiped out GDAL's gotchas, and even though it didn't have all the features of GDAL's own Python bindings, the GIS programmer loved it. They wanted others to use it and love it, too. They evangelized it and argued about it with people on the internet, though less zealously than they used to do, because they had grown older and learned some important lessons.

It was hard to convince people to leave the old GDAL Python tooling for this new package, better as it was. It was new, and unfamiliar, and change is concerning for GIS analysts and programmers, because they're barely keeping the wheels on their enterprise as it is, what with the general terrible state of data formats and proprietary software in the industry. And like GDAL's original Python bindings, you had to go through the same arduous build and installation process on your own computer, for GDAL offered no pre-built distributions. And the new tooling required an extra build step on top of the already arduous GDAL build. Adoption of the programmer's new package was slow. They pondered the situation.

The programmer considered what could be done to make distribution and installation easier. They noticed that the Python package named "Numpy" provided platform-specific binary distributions ("wheels"), and used an open source program named "delocate" to vendor the OpenBLAS linear algebra and Fortran libraries into the Numpy wheels for macOS. Running "pip install numpy" on a Mac would fetch one of these wheels and Numpy's linear algebra methods could be used immediately without any compilation. The programmer wondered if this could be done for GDAL and its larger set of library dependencies and decided to try.

It worked. The boy published wheels for macOS 10.6 or better and Python 2.7 or Python 3.4. Two different wheels, each including a libgdal shared library along with several supporting libraries.

Gradual success.

But it gradually became more complicated.

Linux wheels, then Windows wheels, then Python 3.5. Then switching to multibuild to provide newer Linux wheels. Then Python 3.6, and 3.7. Then PROJ 6 and GDAL 3. Then wheels for macOS arm64. Then an entirely new build system for GDAL (CMake). Then Pythons 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, and a newer version of Linux wheels. Then vcpkg because Windows wheel builds were melting down. And patches to vcpkg because vcpkg was melting down. Patches to multi-build and auditwheel. Sercurity hotfixes for curl, xz, libjpeg.

The ever-morphing CI images (new homebrew packages, etc). Deprecation of macOS SDKs.

When vendored libraries were moved to a different path within wheels, the boy heard from people who were downloading the wheels only to strip GDAL and its dependencies out of them, they weren't using the Python package at all.

Users asking for Linux arm64 wheels, Linux ppc64 wheels, newer versions of GDAL. It became a treadmill.

The boy realized that maybe GDAL was wise in never providing built distributions.

Write your post here.