Busier

Sean Gillies

2019-06-25 08:09

This week will be busier than the last! I'm writing this from the neighborhood pool where Bea is training with her swim team. It's sunny and mild and it feels great to be writing outside.

Fix for Shapely's GEOS library loading bug in 1.7a2

Sean Gillies

2019-06-23 13:54

Including GDAL and GEOS in binary wheels that go to the Python Package Index (PyPI) has been good for users of fiona, rasterio, and shapely, but has also exposed Python GIS programmers on OS X to a perplexing bug. Shapely's issue #553 has been weighing on me for a long time. It's been hard to understand, hard to explain, and bites very hard. The short program below will trigger the bug on any version of OS X with any version of fiona and any version of shapely before 1.7a2 which is installed using a wheel from PyPI.

# crash.py

import fiona
from shapely.geometry import Point
from shapely.ops import unary_union


print("Buffering points to create polygons")
SHAPES = [Point(i, i).buffer(1.5) for i in range(20)]

print("Computing union of polygons")
union = unary_union(SHAPES)

print("Union: {}".format(union))

For example:

$ python crash.py
Buffering points to create polygons
Computing union of polygons
Assertion failed: (!static_cast<bool>("should never be reached")), function itemsTree, file AbstractSTRtree.cpp, line 373.
Abort trap: 6

The unary_union function uses a GEOS STRtree and an assertion that should never be reached is in fact reached and the program aborts.

There has been a work-around – import shapely before importing other modules that are dynamically linked with GEOS, such as fiona and rasterio – but that's smelly and hard to ensure in a complex project.

The problem exists in a small niche: one operating system and one kind of program, one that that loads GEOS twice in different ways. Once when fiona's extension modules are loaded and cause the linked GEOS library to be loaded as with any compiled program, and once again when shapely is imported and it calls dlopen from python to load GEOS. I've found no diagnosis or solution of a similar problem on the web. Was the problem in GEOS? Was it in the library files that I built for the fiona, rasterio, and shapely wheels? Was the bug in delocate, the program I use to find dependencies and bundle them into the wheels? I asked GEOS developers and folks at work, to no avail. I was stumped for a long time.

I asked Even Rouault about the issue and he suggested a solution might be found in being more careful about using the correct mode with dlopen and he offered some example Python code that included this: handle = CDLL(None). My first thought was "can we use the ctypes CDLL class like this?" The docs say that the first argument is a path to a library file and that's how I've been using it in Shapely.

While looking for guidance on passing None to the CDLL constructor, I found some commented code in https://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi.

# CDLL(None) invokes dlopen(NULL), which loads the currently running
# process - in our case Python itself. Since Python is linked with
# libc, readdir_r will be found there.
# Alternatively, we can just explicitly load 'libc.so.6'.
lib = CDLL(None)

The OS X dlopen man page doesn't exactly say that,

dlopen() examines the mach-o file specified by path. If the file is compatible with the current process and has not already been loaded into the current process, it is loaded and linked. After being linked, if it contains any initializer functions, they are called, before dlopen() returns. dlopen() can load dynamic libraries and bundles. It returns a handle that can be used with dlsym() and dlclose(). A second call to dlopen() with the same path will return the same handle, but the internal reference count for the handle will be incremented. Therefore all dlopen() calls should be balanced with a dlclose() call.

If a null pointer is passed in path, dlopen() returns a handle equivalent to RTLD_DEFAULT.

but the POSIX spec is clear that dlopen(NULL) returns a handle to a global symbols object and that's what I observe on my MacBook. Here's an interpreter session where I import fiona and look in the global symbols object for a GDAL C API function.

>>> import fiona
>>> from ctypes import CDLL
>>> handle = CDLL(None)
>>> handle.GDALGetDriverCount
<_FuncPtr object at 0x111a3eb38>
>>> handle.GDALGetDriverCount()
187

When fiona is imported, Python calls dlopen with the path to fiona's extension (.so) modules, and as these are liked with libgdal, the GDAL library is loaded. Now, libgdal is linked with libgeos_c, so will we find GEOS C API functions in the global symbols object?

>>> handle.initGEOS
<_FuncPtr object at 0x111a3fb38>

Yes. This changes everything. The problem that has bedeviled shapely users is caused by using dlopen to load a library that's already been loaded. We can avoid the problem by detecting whether libgeos_c has already been loaded and skipping the troublesome dlopen call. I've done this for shapely and made a new 1.7 pre-release. Please considering running pip install -U --pre shapely to try it out.

The OS X man page for dlsym says that looking up functions in the global handle is the slowest possible approach, but ctypes does cache the addresses of functions it finds and my test programs don't run measurably slower, so this just might work out.

I'd like to understand why this problem doesn't also occur on Linux. Is it because of differences in the library loaders or the executable file formats themselves? Is there subtle platform-specific bug in GEOS? I'll write more if I get to the bottom of this.

Busy

Sean Gillies

2019-06-17 07:50

Ruth is out of town this week and the next and on top of solo parenting I am dog sitting. I'll be extra busy through the end of June. Please forgive me if I don't respond immediately to emails or tickets on GitHub.

Arthur's Rock

Sean Gillies

2019-06-15 18:00

Arthur's Rock is one of our local landmarks and hiking destinations. It's a big lump of pink Precambrian granite named after Arthur Howard, one of the previous owners of the land that would become Lory State Park in 1967. The Quad Rock and Black Squirrel courses go by it. I've been back twice since Quad Rock, once on a run and once with my family.

https://live.staticflickr.com/65535/48021888767_3473b561ed_b.jpg — View south from Arthur's Rock Trail, May 26, 2019

May was cool and wet and this weather has continued to some degree in June. Lory State Park is as green as I've ever seen it.

https://live.staticflickr.com/65535/48068992632_24cdfafe9f_b.jpg — Arthur's Rock from Howard Trail, June 1, 2019

The ponderosa pines in the second photo were charred during a small lightning-sparked fire in June 2016. I'm glad it wasn't worse. The small forest on the east slope of Lory State Park is priceless.

JavaScript mapping in 2018

Sean Gillies

2019-05-22 08:00

It's been five-and-a-half years since I last blogged about making maps in a browser using JavaScript. It's cool that my old Syriac Places demo still works. Yay for web standards. A lot of things have changed in the meanwhile and this is my first chance to acknowledge some of those changes.

I've been writing HTML and JavaScript every other day at work for the past 3 weeks, building a web app to give Sales and sales-supporting teams at Mapbox insight into our imagery basemap. It's HTML and mostly Vanilla JS and has been a fun exercise. Since it's for internal customers only I can use new browser and JavaScript features.

Mapbox GL JS, the JavaScript mapping library that's maintained by my coworkers, is the closest thing to a framework in my app. My current project is the first in which I've used Mapbox GL JS. I'd like to point out some features and aspects of the framework that I find especially notable.

Mapbox GL JS has excellent examples. This is a result of I cribbed freely from https://docs.mapbox.com/mapbox-gl-js/examples/ and everything worked. Not all projects have good examples. John Firebaugh (who has since moved on from Mapbox), Colleen McGinnis, and the GL JS and Documentation teams put a ton of work into these and it shows.

Mapbox GL JS makes a big and important case of dynamic maps simple. I have a GeoJSON Source defined in my app's JavaScript. It is registered with a layer and a map. When I remove features from the source, they disappear from the map. When I add features to the source, they appear on the map. The map is updated fast enough that I can do this on every mousemove event.

Mapbox GL JS Expressions give a developer ample control over styling and filtering while keeping the styling API abstract and simple. In several parts of its API, Mapbox GL JS can evaluate Lisp-like expressions composed of JavaScript arrays. ["+", 4, ["*", 3, 2]] evaluates to 4 + (3 * 2), or 10. In addition to the standard JavaScript operators, the namespace of GL JS expressions operators contains feature property and map attribute getters. An advantages of using JavaScript objects instead of conventional Lisp (as with snuggs and rio-calc) is that developers get syntax highlighting help when writing expressions, and can use JavaScript to build expressions.

Another first for me is that I'm using Turf, the well-known simple features geometry library created by Morgan Herlocker. In my app I'm creating features on the fly derived from the intersections of features in a Mapbox tileset. Turf is fast and its functions take and return GeoJSON objects, which makes it easy for me to use and easy to integrate with Mapbox GL JS and other libraries or plugins. Turf is indispensable for a web app like the one I'm writing.

Lastly, JavaScript itself has changed a lot in the last 5 years. I'm using features of ES6 for the first time ever. You know this, I'll bet, but ES6 arrow functions are like Scheme's lambda expressions and Python's lambda forms. I'm using them exclusively, instead of the old anonymous functions, when filtering and sorting arrays of features. For example, the following sorts features by a date, in descending date order, without needing to type "function" or "return". Handy, and to my eyes, more readable.

features.sort((a, b) => b.properties.render_date - a.properties.render_date)

These are obviously not the only recent advances in the JavaScript mapping field. Or even the most recent, which is why I chose the title "JavaScript mapping in 2018". The app that I'm building could have been written a year ago using the same software. For a glimpse at JavaScript mapping in 2019-2020, I suggest https://observablehq.com/collection/@observablehq/maps.

Still snowing

Sean Gillies

2019-05-21 10:33

Gnar Runners official Quad Rock recap

Sean Gillies

2019-05-16 10:12

Nick Clark's official recap the Quad Rock races with photos and commentary is great and keeps me thinking about how much fun I had. I could read or listen to stories about the race all day long.

Resting

Sean Gillies

2019-05-16 10:03

I haven't run or hiked at all this week. Not a single step.

https://live.staticflickr.com/65535/46947095815_acaa4c9c79_b.jpg

My strained calf is feeling much better after four days of rest and I'm looking forward to getting back out on the trail for an hour this weekend. My plan is to resume running around 25-30 miles a week, starting next week, and then start building up for races in September and October.

Rasterio 1.0.23

Sean Gillies

2019-05-16 08:39

I've been busy with a new product at work the past few weeks, writing Javascript and HTML, and hiring new teammates, but found some space this week to knock out some Rasterio bugs and release version 1.0.23. Here's the relevant section of the change log: https://github.com/mapbox/rasterio/blob/master/CHANGES.txt#L4-L20.

I added some control over rio-calc's memory usage yesterday and had fun doing it. In 1.0.22, rio-calc read all your sources into memory, preventing calculations on very large datasets. There are a number of ways to chunk the calculations and output of the results. GDAL's gdal_calc.py script divides the job into chunks that match the storage of the input files. I rewrote rio-calc to chunk the work more like GDAL's warper does: you specify a maximum amount of memory to be used by the program and rio-calc divides the job into chunks that need approximately less than this amount. You, the user, are free to trade memory for speed.

The rio-calc command uses Lisp-like expressions. Some people like them, some people don't. Evaluation of the expressions is done with snuggs. The expression evaluator in gdal_calc.py uses Python's eval(). Snuggs does not use eval() and uses a whitelist of names and keywords. It is much safer. For example, you can't trick snuggs into importing modules that give filesystem access. You could use snuggs on your server to evaluate expressions written by untrusted people on the internet. You would be asking for trouble if you did this with gdal_calc.py. This is not to say that Snuggs is perfect and has no bugs. We found some this week and fixed them. If you're a rio-calc user, make sure you upgrade snuggs to 1.4.6 to get the latest and greatest version.

Thank you for using Rasterio and thank you for the ever-excellent bug reports. Clear writing and examples of failing code make it much easier for a busy programmer to switch context and dispatch bugs efficiently.

Training week twenty-four and Quad Rock recap

Sean Gillies

2019-05-12 18:36

This final week's numbers:

15 hours, 8 minutes running and hiking (2nd of 24)
62.8 miles (6th)
11,270 feet D+ (2nd)

Add to that one Quad Rock 50 finish in 13:13:50. My first ultramarathon.

https://live.staticflickr.com/65535/47785140532_7716eaee3c_b.jpg

Ruth says I looked fresher at the finish than I did in 2015 when I finished my first road marathon. It's partly because this race was more fun and partly because of my training. I'm sore today, but not wrecked.

David Bitner ran the first half of this with me and is a 25 mile finisher. He's building toward the Superior 100 in September and this was an early season training run. All the photos below are from his phone.

https://live.staticflickr.com/65535/47782790042_2d6e0c9ba5_b.jpg — Heading up South Ridge Trail, the second of six climbs

https://live.staticflickr.com/65535/47045334154_6d88aacb1e_b.jpg — On Westridge Trail, the top of the third climb

https://live.staticflickr.com/65535/47045348654_60222d0d68_b.jpg — At the finish, just before I went up for another 25 miles

I had so much fun hanging out with Bitner before, during, and after the race. I'm looking forward to the next chance.

Despite the snow that fell 2 days before, the trails were in great shape. The sky was blue for the first half, which made staying in a good mood easy; and cloudy in the second half, so I was never at risk of overheating. For once, it seemed like all factors were in my favor.

Not that there wasn't adversity. At the bottom of the fifth climb I had reached my longest single-day distance ever and I was going into uncharted territory. I had strained my left calf, high up behind my knee, and it hindered me for the rest of the run. My body had reached its VFuel and almond butter & honey sandwich consumption limit. I compensated by hiking more slowly uphill and letting go of the brakes on the downhills, flipping my tactics. I switched to boiled potatoes, Haribo Sour Cubes, and gingerale. I was beginning to suffer a little chaffing as I approached the 40-mile aid station, but had had the foresight to stash my favorite shorts from last year, semi-retired, in a drop bag there. Those old shorts saved my butt.

Too make a long story short: I had a great adventure and am super pleased with myself for finishing. Congratulations to all you other finishers. You who didn't finish have my sympathy. Big thanks to Nick, Brad, Gnar Runners, and all the volunteers who planned and operated a fine event!