Listing open GDAL datasets in Python

This post is a follow-up to Friday's and shows how you can dump a listing of open GDAL datasets in your own Python programs using nothing other than Python's standard library and how you can analyze the dumps with Python and an extra module or two.

If you've imported a Python module that links GDAL, such as Rasterio or GDAL's own Python bindings, you can access GDAL's GDALDumpOpenDatasets function using Python's ctypes module. That function takes a FILE pointer as its only argument and you can get a pointer to stdout or stderr from ctypes as well. I will use Rasterio's interactive dataset inspector to demonstrate.

Note that the C level FILE pointer to stdout is ctypes.c_void_p.in_dll(handle, "__stdoutp") on OS X and ctypes.c_void_p.in_dll(handle, "stdout") on Linux. The listing printed by the function bypasses Python and goes to the terminal.
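
To avoid hard-coding one symbol name, the lookup could be wrapped in a small helper. This is only a sketch: the function names c_stdout_symbol and get_c_stdout are made up for illustration, and it assumes that "darwin" and Linux are the only platforms of interest, as in the note above.

```python
import ctypes
import sys


def c_stdout_symbol(platform=sys.platform):
    """Return the name of the C stdout symbol for the given platform."""
    # "__stdoutp" on OS X and "stdout" on Linux, per the note above.
    return "__stdoutp" if platform == "darwin" else "stdout"


def get_c_stdout():
    """Return the C-level FILE pointer for stdout as a ctypes.c_void_p."""
    handle = ctypes.CDLL(None)
    return ctypes.c_void_p.in_dll(handle, c_stdout_symbol())
```

With this, cstdout = get_c_stdout() could replace the platform-specific line in the session below.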

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> _ = handle.GDALDumpOpenDatasets(cstdout)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif

The first field in every record is the reference count of the dataset, the second is whether it is a shared dataset (S) or not (N), the third is the format driver's short name, the fourth is a thread id, the fifth is the dataset shape, and the sixth is the dataset's identifier.
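
Once the listing is available as text, those six fields could be pulled apart with a few lines of standard-library Python. This is a rough sketch, not part of GDAL or Rasterio: the names OpenDataset and parse_dump are hypothetical, and it assumes that every record has the whitespace-separated layout shown above and that anything else (such as the "Open GDAL Datasets:" header) can be skipped.

```python
from typing import NamedTuple, Tuple


class OpenDataset(NamedTuple):
    refcount: int
    shared: bool
    driver: str
    thread_id: int
    shape: Tuple[int, ...]
    identifier: str


def parse_dump(text):
    """Parse the text of a dataset listing into OpenDataset records."""
    records = []
    for line in text.splitlines():
        # Split on whitespace at most 5 times so that an identifier
        # containing spaces stays in one piece.
        parts = line.split(None, 5)
        if len(parts) != 6 or not parts[0].isdigit():
            continue  # header or blank line
        refcount, flag, driver, thread_id, shape, identifier = parts
        records.append(
            OpenDataset(
                refcount=int(refcount),
                shared=(flag == "S"),
                driver=driver,
                thread_id=int(thread_id),
                shape=tuple(int(n) for n in shape.split("x")),
                identifier=identifier,
            )
        )
    return records
```

Filtering for leaked datasets then becomes a list comprehension, for example [r for r in parse_dump(text) if r.driver == "MEM"].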

In the interpreter we find what we expect: one shared open dataset with a reference count of 1.

What if we wanted to process the listing in Python? We would need to capture the low-level file descriptors and expose them in Python. There's a nice blog post about issues and an implementation at https://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/. Pytest includes a fixture for this, capfd, and it can be used in a test like the one shown below.

import ctypes


def test_sharing_off(capfd):
    """Datasets are not shared"""
    # Open a dataset in not-shared mode.
    ...
    handle = ctypes.CDLL(None)
    cstdout = ctypes.c_void_p.in_dll(handle, "stdout")
    assert 1 == handle.GDALDumpOpenDatasets(cstdout)
    captured = capfd.readouterr()
    assert "1 N GTiff" in captured.out
    assert "1 S GTiff" not in captured.out
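
Outside of pytest, the same descriptor-level capture can be approximated with only the standard library. The following is a minimal sketch of the general technique (duplicate the descriptor, point it at a temporary file, restore it afterwards), not pytest's actual implementation. The name capture_fd is made up for illustration, and the code assumes a POSIX system where ctypes.CDLL(None) exposes the C library's fflush.

```python
import contextlib
import ctypes
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_fd(fd=1):
    """Capture everything written to file descriptor fd (1 is stdout)."""
    libc = ctypes.CDLL(None)  # assumption: POSIX, as in the sessions above
    captured = {"text": ""}
    # Flush pending Python and C output so it isn't captured by mistake.
    sys.stdout.flush()
    libc.fflush(None)
    saved_fd = os.dup(fd)
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), fd)  # fd now points at the temp file
        try:
            yield captured
        finally:
            # Flush again: C stdio buffers output when writing to a file.
            sys.stdout.flush()
            libc.fflush(None)
            os.dup2(saved_fd, fd)  # restore the original descriptor
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode()
```

Calling handle.GDALDumpOpenDatasets(cstdout) inside a "with capture_fd() as captured:" block should leave the listing in captured["text"] once the block exits.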

There's a package named capturer which is inspired by pytest and does the same kind of thing as a context manager.

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> from capturer import CaptureOutput
>>> with CaptureOutput() as capfd:
...     _ = handle.GDALDumpOpenDatasets(cstdout)
...     captured = capfd.get_text()
...
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
>>> print(captured)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif

Chiles and fire

The sound and smell of charring chiles is a big part of the Colorado farmers market experience in September. The roar of propane burners and rattle of crisped peppers. The sweet, smoky aroma of burned chile skins and roasted fruit. It's an end-of-summer ritual.

I've bought many a bag of charred chiles at the market, but these days I prefer to buy them fresh and take them home to give them the personal touch over a hot charcoal fire. In my experience, a gently charred chile that is not tumbled is much easier to peel. The blistered skins come off in big pieces and there's no need to rinse them or peel them under water, which would dilute their sublime flavor.

Yesterday I brought home a bag of green poblano (when red ripe and dried, this is known as ancho) and Big Jim chiles, 10 of each. The Big Jim is a New Mexico chile hybrid developed at New Mexico State University. It's the largest of this kind of chile and has a thick skin. Poblanos have a much thinner skin. It would not be a good idea to roast these together in a cage using propane burners: the tender poblanos would be destroyed. It's easy to do them together on a grill.

https://live.staticflickr.com/65535/48738984783_533d9937b7_b.jpg

Short, wide poblanos and long Big Jims on the grill

A week ago, I gave a bunch of Pueblo chiles the same treatment. The Pueblo chile looks a lot like a New Mexico chile, but is quite different. It's a large Mirasol chile that develops pointing up towards the sun. It seems to be a relatively recent import from Mexico. The New Mexico chile points down towards the earth and is derived from chiles grown by Pueblo people of Southwest North America. The Hatch chile is a New Mexico chile grown in the Hatch Valley along the Rio Grande River upstream of Las Cruces, New Mexico. Hatch is a distinguished appellation like Champagne or Napa are for grapes and wine and Palisade is for peaches, and is way ahead of the game in comparison to Colorado's Pueblo chile growers. New Mexico also has an official State Question: "Red or green?" In other words: do you want red chile or green chile on that? Colorado's climate is cooler and red chiles are less often found at market. I love all of these chiles, green or red.

http://www.mvd.newmexico.gov/uploads/.thumbnails/150x150/9be2dc9b1c8e87cda9d1b19dd0499169.png

I could move to New Mexico just for the plates

It doesn't take long over a hot fire for big blisters to form on the Big Jims and smaller ones to form on the poblanos. The technique is basic: turn them until the blistering is fairly uniform and then put them, whole, in a dry, heavy pot with a lid to steam for a few minutes. This will let the skins continue to loosen as the chiles cool.

https://live.staticflickr.com/65535/48738984633_128ce0cf6d_b.jpg

Flipped chiles

Here are the skinned chiles. The poblanos took about twice as long to peel because the thinner skins come off in smaller pieces. I don't sweat the little scraps of charred skin and never rinse the chiles. You wouldn't rinse a steak or a sausage, would you?

https://live.staticflickr.com/65535/48739315346_3242844ba3_b.jpg

Big Jim chiles, skinned

https://live.staticflickr.com/65535/48739494917_a772c8f20b_b.jpg

Poblano chiles

The final steps are trimming the stems, scraping the seeds from the inside, and putting the chiles in freezer containers along with the juices left on the plates and in the pot. I will use the Big Jims in fall and winter stews and use the poblanos as a condiment with tacos or eggs or grilled steak.

Debugging temporary files using pytest autouse fixtures

This week I discovered that Rasterio doesn't always close the temporary in-memory datasets that are used within some of its methods. In testing Rasterio's WarpedVRT class I used a GDAL function to dump descriptions of all open datasets and found a bunch that looked unrelated to WarpedVRT. They were GDAL "MEM" type datasets with UUIDs for names, which didn't tell me much. What were their origins?

They have UUIDs for names because Rasterio imports uuid in its _io module and calls uuid.uuid4() to make temporary dataset names. If only the dataset name included the name of the test in which it was created, then I'd have an entry point into debugging. One way to do this is with a pytest auto-used fixture.

I changed the rasterio._io module's import statement from import uuid to from uuid import uuid4 to make it slightly easier to monkey patch and then I added 5 lines of code to Rasterio's conftest.py file:

@pytest.fixture(autouse=True)
def set_mem_name(request, monkeypatch):
    def youyoueyedeefour():
        return "{}-{}".format(request.node.name, uuid.uuid4())
    monkeypatch.setattr(rasterio._io, "uuid4", youyoueyedeefour)

This set_mem_name fixture uses two standard pytest fixtures: request and monkeypatch. The value of request.node.name is the name of the test and this set_mem_name fixture uses monkeypatch to replace uuid4 in rasterio._io with a custom function that prepends the name of the test to the UUID. The autouse=True argument tells pytest to add this fixture to every test it finds. I didn't need to touch the code of any of Rasterio's tests, not a one.

This quickly revealed to me that the unclosed temporary datasets were coming from tests that asserted certain exceptions were being raised by Rasterio's reprojection code. This code used temporary datasets and didn't close them before raising the exception to the caller. Once I changed the code to do the following, Rasterio no longer leaked datasets from those tests, or in our programs.

try:
    ...
    if condition:
        raise CRSError("error")
    ...
except:
    temp.close()
    raise
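
Here is a self-contained version of that pattern that can be run without Rasterio or GDAL. Everything in it is a stand-in: FakeTempDataset, reproject, and the local CRSError class are hypothetical names for illustration, and the body of reproject is not Rasterio's actual code.

```python
class CRSError(Exception):
    """Stand-in for the exception raised in the snippet above."""


class FakeTempDataset:
    """Stand-in for a temporary in-memory dataset."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def reproject(temp, bad_crs=False):
    """Use a temporary dataset and close it even when an error is raised."""
    try:
        if bad_crs:
            raise CRSError("error")
        result = "reprojected"
    except:
        # A bare except, as in the snippet above, so that the dataset is
        # closed no matter what is raised. The exception is then re-raised.
        temp.close()
        raise
    # Assumption: the success path closes the temporary dataset as well.
    temp.close()
    return result
```

A test can then assert both that the exception reaches the caller and that the temporary dataset's closed attribute is True afterwards.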

If Rasterio used only Python's unittest module, and not pytest, it would be possible to do the same thing. Import rasterio._io in the test case's setUp(), monkey patch it, and then restore it in tearDown(). If all the tests derived from one base class, it would only be necessary to extend that class. The unittest.mock module easily allows every test to be patched with a single decorator statement. It seems like it could be two fewer lines of code, but I don't immediately see how to get the name of the test and use it with only the mock.patch decorator. It looks like one would have to use a patcher's start and stop, which is back to somewhat more boilerplate than with pytest.
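
A unittest version along those lines might look like the sketch below. It is not Rasterio code: fake_io is a throwaway module object standing in for rasterio._io, make_temp_name stands in for the code that names temporary datasets, and NamedTempBase is a hypothetical base class. The test's name comes from self.id(), and the patcher is started in setUp() and stopped through addCleanup().

```python
import types
import unittest
import uuid
from unittest import mock

# Stand-in for rasterio._io after "from uuid import uuid4".
fake_io = types.ModuleType("fake_io")
fake_io.uuid4 = uuid.uuid4


def make_temp_name():
    """Stand-in for the code that names temporary datasets."""
    return str(fake_io.uuid4())


class NamedTempBase(unittest.TestCase):
    """Base class that prefixes temporary names with the test's id."""

    def setUp(self):
        test_id = self.id()  # e.g. "module.ExampleTest.test_name_is_prefixed"

        def named_uuid4():
            return "{}-{}".format(test_id, uuid.uuid4())

        patcher = mock.patch.object(fake_io, "uuid4", named_uuid4)
        patcher.start()
        # Undo the patch when the test finishes, pass or fail.
        self.addCleanup(patcher.stop)


class ExampleTest(NamedTempBase):
    def test_name_is_prefixed(self):
        self.assertIn("test_name_is_prefixed", make_temp_name())
```

That is roughly ten lines of setup where the pytest fixture needed five, which matches the impression that pytest requires less boilerplate here.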

Black Squirrel Recap

I did it: I beat my previous best Black Squirrel time by 4 minutes and 45 seconds, finishing in 2:18:39, 87th out of 302 finishers. This time put me at 4th in my age group (50-59 men). I wasn't really close to the podium. Paul Nielsen, in 3rd place, finished 13 minutes ahead of me and 48th overall. Am I satisfied? Very!

https://live.staticflickr.com/65535/48695711123_8cb908e117_b.jpg

The black Abert's Squirrel is very shy, but I got a couple to pose with me. Photo by Ed Delosh, who was first in my age group.

There are four parts to the Black Squirrel course: a one mile preamble on a dirt road, a 4.5 mile climb on mostly single track and some fire road, a 3 mile single track descent, and 4.6 miles of rolling valley single track. I did the first mile in 8 and a half minutes, the climb and descent in one hour and 22 minutes, and the valley trails in 47 minutes. I feel good about how I did on the first 3 parts.

I struggled on the valley trail. It was hot and humid and my pace crashed whenever the trail tilted upwards even the slightest. I hiked some of it. As in 2015, I was passed by at least a dozen runners. I would love to figure out how to run the valley trails at faster than 9 minutes per mile, which would shave 5+ minutes off my time for next year.

I've been recording my runs on a Garmin Forerunner 35 since the beginning of the year. The data says that I ran the up and down parts of the race 5 minutes and 30 seconds faster than on a July 21 training run. Did I go too hard and leave nothing for the finish? The data says that I ran the final 2.2 miles ("Lory - East Valley Trail" segment on Strava) 40 seconds slower than on July 21. Let's say I lost another minute on the 2.5 miles between that segment and the end of the descent. That would be 1:40 lost on the flat, but 5:30 gained on the mountain. I think I made the right choice for this race. I haven't been running fast enough on that kind of rolling, slightly downhill terrain to make up for time that I could have conceded on the climb or descent. For next year, I think I must do a few more summer speed workouts and build more muscle if I'm going to improve on this year's result. Cooler weather would help, too. Roughly half of the top finishers were 4-5 minutes slower than last year, and I heard other people acknowledge feeling the heat toward the finish.

The first male finisher was Nathan Austin in 1:38:30 and the first woman to finish was Rachael Rudel in 1:42:35 (8th overall). That's a new course record for women. All the results can be found at the event's web site: http://gnarrunners.com/black-squirrel-half/.

See you all next year!

Black squirrel training recap

The 2019 edition of the Black Squirrel Half is five days away. In 2015 I finished in 2:23:24. In 2018, 2:26:04. I'm aiming for a personal best in 2019 and am optimistic about it because I'm a better runner than in 2015 and in better shape. I'm lighter, I'm stronger. I've put in some solid miles during July and August and done more workouts than I did in past summers. Unlike last summer, when I spent multiple weeks before the race on vacation at sea level, this year I have spent two weeks hiking and running at 8000 feet and higher.

Having turned 50, I'm in a new age group this year. I finished 30th in the group of 40-49 year-old men last year. The same time would have put me 8th in the 50-59 year male group. To finish in the top five this year I will probably have to finish in 2 hours and 10 minutes, 13 minutes faster than in 2015. That's a minute per mile faster, a big leap. However, I have increased my cadence, my speed on flat trails, my downhill confidence, and have been blowing up my previous best times on the climbs. If I get enough rest this week and summon enough determination on Saturday... we'll see. No matter what, I'm planning to have a good time, and enjoy hanging out with friends afterwards.

Road trip recap 1: Fort Collins to Moab

Earlier this month, my family and I set off on a road trip through southeastern Utah and southwestern Colorado with friends from Montpellier, France. Our itinerary: Arches National Park, Mesa Verde National Park, Durango, Silverton, and Great Sand Dunes National Park. One of our friends had been to the United States twenty years ago, the other three, never. They spent a week in New York City before Ruth picked them up at Denver International Airport. We hit the road the next day, all 8 of us in a 2016 Honda Odyssey, loaded with optimism, good intentions, and Harry Potter audiobooks.

https://live.staticflickr.com/65535/48580595166_63a2157b7a_b.jpg

View from the 3rd row of our Honda Odyssey, leaving Fort Collins, CO

Configured for this road trip, our Odyssey has 3 rows of seats, 8 in all, with 31 cubic feet of cargo space. We had another 11 cubic feet in our ski carrier. This turned out to be completely adequate for 4 adults and 4 kids. We had enough room in back for a mid-sized cooler and bags. Outdoor gear like extra shoes, hats, picnic blankets, &c went up top. We soon settled on a formation of three kids in the back row, 2 adults and one kid in the second row, and 2 adults up front. Fully loaded like this, we got about 25 miles per gallon. When you multiply this by 8, 200 person-miles per gallon isn't bad mileage. The curb weight of the Odyssey is about 4500 pounds. Human weight was another 1000 pounds. And then we probably had another 200 pounds of gear, food, water, and ice. We've never asked so much of our car and it did fine. It affords good views, has a small sunroof, the seats are comfortable. The one drawback is that when it comes to listening to music or audiobooks, the occupants of the 3rd row seats depend on the speakers at the feet of their family in the 2nd row. In the 2nd you get blasted and in the 3rd it's not quite loud enough.

Much of the time on this trip we listened to Bernard Giraudeau read the first three Harry Potter novels. My French is just good enough to follow along and I loved having native French speakers along to explain the subtle details, such as that Giraudeau's impression of Gilderoy Lockhart used an obviously false Languedoc accent. It seemed weird to me, too.

Our friends speak English as well as I speak French, but as their kids are beginners in English and Ruth, Arabelle, and Bea are quite fluent in French, we mainly used French. I spoke French every day of the trip, and not just with our friends: we were among French or French-Canadian tourists everywhere we went.

The drive from Fort Collins to Moab is long and we broke it up by stopping for a night in Glenwood Springs, a tourist town at the west end of a beautiful canyon. Shortly after setting out the next morning, we stopped in Palisade to show our friends some orchards and buy fresh peaches and nectarines directly from the producers. Abundant sunshine, water from the Colorado River, and a relatively (for Colorado) frost-free microclimate make Palisade one of Colorado's best places to grow fruit. It's a location not unlike the Terrasses du Larzac in the south of France, where cool air descending from the nearby cliffs keeps grapes from stewing in their skins after the sun goes down.

https://live.staticflickr.com/65535/48580742982_8f9d7a8098_b.jpg

Mt. Garfield and peach trees, Palisade, CO

https://live.staticflickr.com/65535/48580743032_6dee6367c0_b.jpg

Palisade peaches

After leaving Palisade, Grand Junction, and Fruita behind we crossed into Utah and then left I-70 to take Utah's State Route 128, the scenic route along the Colorado River to Moab. We picnicked at the old Dewey Bridge site, stopped briefly at the base of the Fisher Towers, and enjoyed the rise of the canyon walls as we approached Moab. I hadn't been on SR-128 since 1992 and it was as beautiful as I remembered. Our friends were gobsmacked by the colors. I still am, and I've been exploring Southern Utah for fifty years. It's a uniquely beautiful part of the world.

https://live.staticflickr.com/65535/48601723286_67f901992b_b.jpg

Fisher Towers, Utah

Bonne rentrée

Today is the first day of the 2019-2020 school year for French kids. Here's a photo from Bea's rentrée, 2016.

https://live.staticflickr.com/65535/48666176571_79bca797f7_b.jpg

Vacation

My family and I are taking friends from Montpellier (France) on a Colorado and Utah road trip and I'll be away from work and open source projects until the 16th.

Rasterio 1.0.25

I released Rasterio 1.0.25 yesterday. It has a few important bug fixes, but the core of the work was writing and testing shims to make the package compatible with GDAL version 3.0 and PROJ version 6. Norman Barker did much of that work and I only had to make sure that we were using the right coordinate axis order strategy everywhere and figure out which of the output changes were new Rasterio bugs and which were actually improvements delivered by PROJ 6.

Please note that the binary wheels for 1.0.25 on PyPI contain GDAL 2.4.2, not 3.0, and that no new features of GDAL 3 and PROJ 6 are intentionally exposed in Rasterio's API. My wheel builds are already running up against time limits on Travis CI, and GDAL 3 and PROJ 6 take even longer to compile. My system is going to need some more hacks before Rasterio wheels with the latest GDAL and PROJ are possible. You might be able to get a combination of Rasterio 1.0.25, GDAL 3, and PROJ 6 from conda-forge. I look forward to hearing how that works for users.

I hope you'll appreciate that I've managed to shrink the size of the GDAL shared libraries in the manylinux wheels by 50%. I wish I knew why they are so much bigger than the OS X libraries. I suspect it's due to the ancient toolchain and glibc used by manylinux1.

Never Summer 100K volunteering

In July the Gnar Runners put on a race in State Forest State Park, the Never Summer 100K. It's a 64 mile loop through the Never Summer and Rawah ranges and the Colorado State Forest. The race requires park rangers, emergency medical technicians, ham radio operators, and lots of volunteer bodies.

https://live.staticflickr.com/65535/48400926322_fa06cb8031_b.jpg

Lulu and Thunder mountains, and moose, from Cameron Pass at 7 a.m.

In 2018 I flipped burgers at the finish line so that runners could get a hot meal in the middle of the night after being on the trail for twenty hours or more. In 2019, I spent an afternoon and evening supporting runners at the "Canadian" aid station at the race's 50 mile mark. The aid station gets its name from being near Never Summer Nordic's North Fork Canadian Yurt near the North Fork of the Canadian River, a tributary of the North Platte River. We had Canadian flags and a life-sized cutout of Justin Trudeau. At some point there was consensus that next year Ryan Gosling should join us.

https://live.staticflickr.com/65535/48400926962_0a3bf8359a_b.jpg

Canadian aid station ready for runners to arrive, 2 p.m.

I had a cold, so I didn't cook or handle food, but I did a little of everything else. I hauled gear, deployed a portable toilet, set up canopies, organized and fetched drop bags, helped runners arrive, recover, and head out again. After the 1 a.m. cutoff, I helped break down the aid station and pack it all up. I was out on the course for 15 hours, 3 hours more than the first placed runner, but 8 hours less than the last finisher.

https://live.staticflickr.com/65535/48400786551_b1a322132f_b.jpg

Last of the day's rain showers, 8 p.m.

We could see rain on the course as early as 11 a.m. None fell at the Canadian aid station until about 4 p.m., but then it rained until just before sunset. We had a pretty solid shower of small hail as well. Runners arriving in this rain were pretty worn out from a 6 mile slog through heavy mud. Some contemplated dropping out and a few did. Most found the energy to continue after some food, a cup of ramen or tea, and a couple minutes out of the rain. It was, after all, only 14 miles to the finish and there was plenty of time left.

https://live.staticflickr.com/65535/48400784606_e4d737a156_b.jpg

Sunset and mud puddles

I felt good helping runners accomplish their goal and had a great time hanging out with other volunteers, many of them experienced ultra-runners, and listening to their stories. I wish I'd spent more time with the ham radio team and learned more about packet radio and running a network in the backcountry.

Race director Nick Clark's official recap of the race is here: http://gnarrunners.com/2019/08/a-recap-of-the-2019-never-summer-100km/.