2019

Fiona 1.8.9 and GDAL 3

GDAL 3.0.0 was released in May. It has been adopted more quickly than I anticipated, and the lack of support for it in the Rasterio and Fiona packages was becoming a pain point for users. Rasterio 1.0.25 added support for GDAL 3.0 in August and now Fiona 1.8.9 has similar basic support. We can build the extension modules in these packages with GDAL 3.0 and they will run almost exactly as they did with GDAL 2.4. No new spatial referencing features of GDAL 3.0 and PROJ 6 are exposed by Fiona 1.8.9.

What changes were required to the code? Everywhere an extension module calls one of the OSRImport* functions, we now need to call the new OSRSetAxisMappingStrategy function to preserve the "traditional" open source GIS order of coordinates: longitude first, latitude second, in common cases. This change was made in the core modules and in the three existing shim modules. We also had to add a new shim module for GDAL 3.0 that provides a facade for a function removed in that version. We try to avoid branching on the GDAL version in Fiona and instead use the shims at build time. This binds installations of the packages to a specific version of GDAL.
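Fiona makes this call at the C level in its extension modules, but the same function is exposed by GDAL's own Python bindings, so a minimal sketch of the pattern looks like this (using EPSG:4326 as an example):

from osgeo import osr

srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)

# Under GDAL 3, EPSG:4326 is latitude-first by authority definition.
# This call restores the traditional x, y (longitude, latitude) order.
srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)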

Why did it take so long to make the changes? They required us to add builds of GDAL 3 and PROJ 6 to our CI server, which took some iterations and time. Also, the cultural knowledge of the project required to change the shim modules is scarce. There are only a few of us who understand what's going on in there. This is certainly a problem to be overcome in the future.

The shims make the source distribution of the package fragile. There was a bug in 1.8.9 that prevented the fiona/_shim3.c file from being copied to fiona/_shim.c during installation. In 1.8.9.post1 we created a new bug that messed up the copying of the fiona/_shim2.c file. These are all fixed now, and users can build and install a version of Fiona on Linux or OS X that compiles and links against an existing GDAL 3.0 installation by running

$ GDAL_CONFIG=/gdal3prefix/bin/gdal-config pip install --no-binary fiona -U fiona

Thanks for your help and patience, everyone.

October 19, 2018: Golden Gate Bridge

I'm headed to San Francisco for work almost exactly a year after going on a family trip on which my kids and I made our first walk onto the Golden Gate Bridge. Among the photos we took are these by Arabelle, who was enjoying playing with depth of field out in the unique space of the bridge and bay and sky.

https://live.staticflickr.com/4857/31870024698_8e9fc48a7c_b.jpg
https://live.staticflickr.com/1905/30801309837_e42f872916_b.jpg

Planes

I spent last weekend in Minneapolis, my family flew to Seattle today, I'm flying to San Francisco on Sunday, and I'm planning to cross paths with Ruth, Arabelle, and Beatrice in DEN as they return. I think it goes without saying that it's unsustainable for a family to be flying around all the time. We're trying to reduce our trips and make up for them as much as we can, but this is a period of fairly intense stratospheric presence for my clan.

Arabelle is flying to LA for a week on Catalina Island with a school group in November, her first trip in a plane without me or Ruth. There is at least one kid on the trip who will be flying for the first time at age 13. My kids, who have been flying all their lives (to France multiple times), almost get it, but that first-time plane traveler is a lot like me. When I was a kid, my parents could barely afford to fly us around at all. I boarded my first domestic flight at 15 (Salt Lake City -> Dallas -> Detroit) and didn't make my first international flight (Miami to Caracas, Venezuela) until I was 25.

Air travel is both amazing (the speed, the views out the window!) and terrible (the pollution, the lines). I'm certain that it is going to be more limited in the future. Make the most of it, and give back as much as you can.

https://live.staticflickr.com/65535/48917149166_5e8c85cc78_b.jpg

Landing at sunset in Denver

Loopet Loppet recap

I'm back now from Minneapolis and the Loopet Loppet, the 12-hour trail event in Theodore Wirth Park, where I ran 8 laps of the 5-mile loop and 2 laps of the one-mile loop, 42 miles in all. I had a great time and wholeheartedly recommend this event. The trails are challenging and fun. The aid station and facilities are first class. The staff and volunteers are amazing. Thank you, Bitner, for inviting me out and putting me up for the weekend.

Before the Loopet I was certain that it would be my last race of the year, but I'm feeling reinvigorated and not very sore today, and am reconsidering my options for this Saturday. Stay tuned.

https://live.staticflickr.com/65535/48906699636_b77d30c9d8_b.jpg

Rasterio 1.1.0

After 28 post-1.0 bug fix releases, Rasterio finally has a 1.1.0 release with new features: https://pypi.org/project/rasterio/1.1.0/. The changes are listed here: https://github.com/mapbox/rasterio/blob/master/CHANGES.txt#L9-L68. I had fun collaborating with Vincent Sarago on the design and implementation of transform.from_gcps(). I miss working with Vincent; it's been hard getting by without him on my team.
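Here's a minimal sketch of the new function at work, using made-up ground control points that tie pixel (row, col) locations to world (x, y) coordinates:

from rasterio.control import GroundControlPoint
from rasterio.transform import from_gcps

# Hypothetical tie points for illustration.
gcps = [
    GroundControlPoint(row=0, col=0, x=100.0, y=1000.0),
    GroundControlPoint(row=0, col=100, x=200.0, y=1000.0),
    GroundControlPoint(row=100, col=0, x=100.0, y=900.0),
    GroundControlPoint(row=100, col=100, x=200.0, y=900.0),
]

# An affine.Affine fitted to the GCPs; it maps (col, row) to (x, y).
transform = from_gcps(gcps)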

Rasterio has 90 authors now, including new authors Andrew Annex, Caleb Robinson, Christoph Rieke, Darren Weber, Denis Rykov, Guillaume Lostis, Ismail Sunni, Jennifer Reiber Kyle, Koshy Thomas, Leah Wasser, Lukasz, Martin Raspaud, Pablo Sanfilippo, RK Aranas, Raaj Tilak Sarma, Robert Sare, Yuvi Panda, appanacca, derekjanni, firas omrane, jaredairbusaerial, and sshuair.

I'm very pleased about two of the bug fixes in 1.1.0. I blogged recently about fixing some memory leaks; that's one of them. The other involved GDAL dataset reference counting, which I finally got a really good grip on. In 1.1.0, closing a DatasetReader doesn't automatically destroy the underlying GDAL dataset handle; it decrements the handle's reference count and destroys the handle only when the count reaches zero. This means that the handle will be kept alive through the lifetime of a VRT object based on the DatasetReader. A fun thing about writing Python extension modules with GDAL is that I get to deal with two reference counting implementations, Python's and GDAL's, as well as two implementations of exceptions, Python's and GDAL's. I started a blog post on these topics a while ago, but it stalled out. I should try to finish it some day.
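To make the new reference counting behavior concrete, here's a sketch, using one of Rasterio's test files and a web mercator VRT as an example:

import rasterio
from rasterio.crs import CRS
from rasterio.vrt import WarpedVRT

src = rasterio.open("tests/data/RGB.byte.tif")
vrt = WarpedVRT(src, crs=CRS.from_epsg(3857))

# Closing the reader decrements the GDAL handle's reference count, but
# the VRT still holds a reference, so the handle stays alive.
src.close()
data = vrt.read(1)

# The count reaches zero here and the handle is finally destroyed.
vrt.close()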

Listing open GDAL datasets in Python

This post is a follow-up to Friday's. It shows how you can dump a listing of open GDAL datasets in your own Python programs using nothing other than Python's standard library, and how you can analyze the dumps with an extra module or two.

If you've imported a Python module that links GDAL, such as Rasterio or GDAL's own Python bindings, you can access GDAL's GDALDumpOpenDatasets function using Python's ctypes module. That function takes a FILE pointer as its only argument and you can get a pointer to stdout or stderr from ctypes as well. I will use Rasterio's interactive dataset inspector to demonstrate.

Note that the C-level FILE pointer to stdout is ctypes.c_void_p.in_dll(handle, "__stdoutp") on OS X and ctypes.c_void_p.in_dll(handle, "stdout") on Linux. The listing printed by the function bypasses Python's sys.stdout and goes straight to the terminal.
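If you want something that works on both platforms, a little helper of my own (not part of any library) can pick the right symbol:

import ctypes
import sys

def c_stdout():
    # CDLL(None) gives access to symbols in the current process.
    libc = ctypes.CDLL(None)
    name = "__stdoutp" if sys.platform == "darwin" else "stdout"
    return ctypes.c_void_p.in_dll(libc, name)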

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> _ = handle.GDALDumpOpenDatasets(cstdout)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif

The first field in every record is the reference count of the dataset, the second is whether it is a shared dataset (S) or not (N), the third is the format driver's short name, the fourth is a thread id, the fifth is the dataset shape, and the sixth is the dataset's identifier.

In the interpreter we find what we expect: one shared open dataset with a reference count of 1.

What if we wanted to process the listing in Python? We would need to capture the low-level file descriptors and expose them in Python. There's a nice blog post about the issues and an implementation at https://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/. Pytest includes a fixture for this, capfd, which can be used in a test like the one shown below.

import ctypes


def test_sharing_off(capfd):
    """Datasets are not shared"""
    # Open a dataset in not-shared mode.
    ...
    handle = ctypes.CDLL(None)
    cstdout = ctypes.c_void_p.in_dll(handle, "stdout")
    assert 1 == handle.GDALDumpOpenDatasets(cstdout)
    captured = capfd.readouterr()
    assert "1 N GTiff" in captured.out
    assert "1 S GTiff" not in captured.out

There's a package named capturer which is inspired by pytest and does the same kind of thing as a context manager.

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> from capturer import CaptureOutput
>>> with CaptureOutput() as capfd:
...     _ = handle.GDALDumpOpenDatasets(cstdout)
...     captured = capfd.get_text()
...
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
>>> print(captured)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif

Chiles and fire

The sound and smell of charring chiles are a big part of the Colorado farmers market experience in September. The roar of propane burners and the rattle of crisped peppers. The sweet, smoky aroma of burned chile skins and roasted fruit. It's an end-of-summer ritual.

I've bought many a bag of charred chiles at the market, but these days I prefer to buy them fresh and take them home to give them the personal touch over a hot charcoal fire. In my experience, a gently charred chile that is not tumbled is much easier to peel. The blistered skins come off in big pieces and there's no need to rinse them or peel them under water, which would dilute their sublime flavor.

Yesterday I brought home a bag of green poblano (when red ripe and dried, this is known as ancho) and Big Jim chiles, 10 of each. The Big Jim is a New Mexico chile hybrid developed at New Mexico State University. It's the largest of this kind of chile and has a thick skin. Poblanos have a much thinner skin. It would not be a good idea to roast these together in a cage using propane burners: the tender poblanos would be destroyed. It's easy to do them together on a grill.

https://live.staticflickr.com/65535/48738984783_533d9937b7_b.jpg

Short, wide poblanos and long Big Jims on the grill

A week ago, I gave a bunch of Pueblo chiles the same treatment. The Pueblo chile looks a lot like a New Mexico chile, but is quite different. It's a large Mirasol chile that develops pointing up towards the sun, and it seems to be a relatively recent import from Mexico. The New Mexico chile points down towards the earth and is derived from chiles grown by Pueblo people of Southwest North America. The Hatch chile is a New Mexico chile grown in the Hatch Valley along the Rio Grande upstream of Las Cruces, New Mexico. Hatch is a distinguished appellation, like Champagne or Napa are for grapes and wine and Palisade is for peaches, and is way ahead of the game in comparison to Colorado's Pueblo chile growers.

New Mexico also has an official State Question: "Red or green?" In other words: do you want red chile or green chile on that? Colorado's climate is cooler and red chiles are less often found at market. I love all of these chiles, green or red.

http://www.mvd.newmexico.gov/uploads/.thumbnails/150x150/9be2dc9b1c8e87cda9d1b19dd0499169.png

I could move to New Mexico just for the plates

It doesn't take long over a hot fire for big blisters to form on the Big Jims and smaller ones to form on the poblanos. The technique is basic: turn them until the blistering is fairly uniform and then put them, whole, in a dry, heavy pot with a lid to steam for a few minutes. This will let the skins continue to loosen as the chiles cool.

https://live.staticflickr.com/65535/48738984633_128ce0cf6d_b.jpg

Flipped chiles

Here are the skinned chiles. The poblanos took about twice as long to peel because the thinner skins come off in smaller pieces. I don't sweat the little scraps of charred skin and never rinse the chiles. You wouldn't rinse a steak or a sausage, would you?

https://live.staticflickr.com/65535/48739315346_3242844ba3_b.jpg

Big Jim chiles, skinned

https://live.staticflickr.com/65535/48739494917_a772c8f20b_b.jpg

Poblano chiles

The final steps are trimming the stems, scraping the seeds from the inside, and putting them in freezer containers with the juices on the plates and in the pot. I will use the Big Jims in fall and winter stews and use the poblanos as a condiment with tacos or eggs or grilled steak.

Debugging temporary files using pytest autouse fixtures

This week I discovered that Rasterio doesn't always close the temporary in-memory datasets that are used within some of its methods. In testing Rasterio's WarpedVRT class I used a GDAL function to dump descriptions of all open datasets and found a bunch that looked unrelated to WarpedVRT. They were GDAL "MEM" type datasets with UUIDs for names, which didn't tell me much. What were their origins?

They have UUIDs for names because Rasterio imports uuid in its _io module and calls uuid.uuid4() to make temporary dataset names. If only the dataset name included the name of the test in which it was created, then I'd have an entry point into debugging. One way to do this is with a pytest auto-used fixture.

I changed the rasterio._io module's import statement from import uuid to from uuid import uuid4 to make it slightly easier to monkey patch, and then I added five lines of code to Rasterio's conftest.py file (imports shown here for completeness):

import uuid
import pytest
import rasterio._io

@pytest.fixture(autouse=True)
def set_mem_name(request, monkeypatch):
    def youyoueyedeefour():
        return "{}-{}".format(request.node.name, uuid.uuid4())
    monkeypatch.setattr(rasterio._io, "uuid4", youyoueyedeefour)

This set_mem_name fixture uses two standard pytest fixtures: request and monkeypatch. The value of request.node.name is the name of the test and this set_mem_name fixture uses monkeypatch to replace uuid4 in rasterio._io with a custom function that prepends the name of the test to the UUID. The autouse=True argument tells pytest to add this fixture to every test it finds. I didn't need to touch the code of any of Rasterio's tests, not a one.

This quickly revealed to me that the unclosed temporary datasets were coming from tests that asserted certain exceptions were being raised by Rasterio's reprojection code. This code used temporary datasets and didn't close them before raising the exception to the caller. Once I changed the code to do the following, Rasterio no longer leaked datasets from those tests, or in our programs.

try:
    ...
    if condition:
        raise CRSError("error")
    ...
except:
    # Close the temporary dataset before re-raising so it isn't leaked.
    temp.close()
    raise

If Rasterio used only Python's unittest module, and not pytest, it would be possible to do the same thing: import rasterio._io in the test case's setUp(), monkey patch it, and then restore it in tearDown(). If all the tests derived from one base class, it would only be necessary to extend that class. The unittest.mock module easily allows every test to be patched with a single decorator statement. It seems like that could be two fewer lines of code, but I don't immediately see how to get the name of the test and use it with only the mock.patch decorator. It looks like one would have to use a patcher's start and stop methods, which is back to somewhat more boilerplate than with pytest.
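For the record, here's a rough sketch of that setUp()/tearDown() approach, assuming a hypothetical base class that all the tests derive from:

import unittest
import uuid

import rasterio._io


class MemNameTestCase(unittest.TestCase):

    def setUp(self):
        self._orig_uuid4 = rasterio._io.uuid4
        name = self.id()  # the test's full dotted name
        rasterio._io.uuid4 = lambda: "{}-{}".format(name, uuid.uuid4())

    def tearDown(self):
        rasterio._io.uuid4 = self._orig_uuid4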

Black Squirrel Recap

I did it: I beat my previous best Black Squirrel time by 4 minutes and 45 seconds, finishing in 2:18:39, 87th out of 302 finishers. This time put me at 4th in my age group (50-59 men). I wasn't really close to the podium. Paul Nielsen, in 3rd place, finished 13 minutes ahead of me and 48th overall. Am I satisfied? Very!

https://live.staticflickr.com/65535/48695711123_8cb908e117_b.jpg

The black Abert's Squirrel is very shy, but I got a couple to pose with me. Photo by Ed Delosh, who was first in my age group.

There are four parts to the Black Squirrel course: a one mile preamble on a dirt road, a 4.5 mile climb on mostly single track and some fire road, a 3 mile single track descent, and 4.6 miles of rolling valley single track. I did the first mile in 8 and a half minutes, the climb and descent in one hour and 22 minutes, and the valley trails in 47 minutes. I feel good about how I did on the first 3 parts.

I struggled on the valley trail. It was hot and humid and my pace crashed whenever the trail tilted upwards even slightly. I hiked some of it. As in 2015, I was passed by at least a dozen runners. I would love to figure out how to run the valley trails at faster than 9 minutes per mile, which would shave 5+ minutes off my time next year.

I've been recording my runs on a Garmin Forerunner 35 since the beginning of the year. The data says that I ran the up and down parts of the race 5 minutes and 30 seconds faster than on a July 21 training run. Did I go too hard and leave nothing for the finish? The data says that I ran the final 2.2 miles (the "Lory - East Valley Trail" segment on Strava) 40 seconds slower than on July 21. Let's say I lost another minute on the 2.5 miles between that segment and the end of the descent. That would be 1:40 lost on the flat, but 5:30 gained on the mountain. I think I made the right choice for this race. I haven't been running fast enough on that kind of rolling, slightly downhill terrain to make up for time that I could have conceded on the climb or descent. For next year, I think I need to do a few more summer speed workouts and build more muscle if I'm going to improve on this year's result. Cooler weather would help, too. Roughly half of the top finishers were 4-5 minutes slower than last year, and I heard other people acknowledge feeling the heat toward the finish.

The first male finisher was Nathan Austin in 1:38:30 and the first woman to finish was Rachael Rudel in 1:42:35 (8th overall). That's a new course record for women. All the results can be found at the event's web site: http://gnarrunners.com/black-squirrel-half/.

See you all next year!

Black squirrel training recap

The 2019 edition of the Black Squirrel Half is five days away. In 2015 I finished in 2:23:24; in 2018, 2:26:04. I'm aiming for a personal best in 2019 and am optimistic about it because I'm a better runner than in 2015 and in better shape: I'm lighter and I'm stronger. I've put in some solid miles during July and August and done more workouts than I did in past summers. Unlike last summer, when I spent multiple weeks before the race on vacation at sea level, this year I have spent two weeks hiking and running at 8,000 feet and higher.

Having turned 50, I'm in a new age group this year. I finished 30th in the group of 40-49 year-old men last year; the same time would have put me 8th in the 50-59 group. To finish in the top five this year I will probably have to run 2 hours and 10 minutes, 13 minutes faster than in 2015. That's a minute per mile faster, a big leap. However, I have increased my cadence, my speed on flat trails, and my downhill confidence, and I have been blowing up my previous best times on the climbs. If I get enough rest this week and summon enough determination on Saturday... we'll see. No matter what, I'm planning to have a good time and enjoy hanging out with friends afterwards.