Wobbly Fiona and Shapely Wheels

On 2017-06-05 I began to use GDAL 2.2.0 in the Fiona binary wheels I upload to the Cheeseshop. I also changed the MACOSX_DEPLOYMENT_TARGET environment variable from 10.6 to 10.9 in order to get GDAL 2.2.0 to compile. This seems to be the cause of Shapely and Fiona wheel compatibility issues. Have I mentioned that I'm not completely competent at C++ on OS X? I'm not.

My co-worker Matt Perry provided a script that exposed this issue clearly and I bisected in Fiona and Shapely version space until I found the root of the problem.

Until I get new wheels on PyPI, here are the three coping strategies.

  1. Avoid the wheels entirely:

pip install -I --no-binary fiona,shapely fiona==1.7.10 shapely==1.6.2

This is what I do in my Dockerfiles at work. This strategy requires the GDAL and GEOS libraries and headers to be pre-installed on your system, along with Numpy and Cython.

  2. Downgrade Fiona, using wheels:

pip install -I fiona==1.7.6

  3. Downgrade Shapely, using wheels:

pip install -I shapely==1.6b4

The issue affects Rasterio, too. Rasterio version 1.0a9 wheels should be safe with any version of Shapely.

I'll have an announcement about new wheels soon.

Update (2017-10-31): the nightmare is over.

Trying Strava

I've been using Runkeeper for the past two years, but am going to give Strava a try while I'm not training for anything in particular. The big differences between the two are Strava's segments and leader boards.

I made my first route during a lunchtime run yesterday: https://www.strava.com/routes/11012256.

There's a segment at the start that 161 Strava users have run 435 times (since when, I can't tell): https://www.strava.com/segments/961980.

I ran this route and segment with no thought to the leader board. Would I have run harder if I was aware of it going in? Maybe. Will I try to move up the board next time? I think so. I'm going to try to have a little fun with it, balanced by some critical reading about gamification.

SciPy at 1.0

SciPy 1.0 was announced this morning. Ralf Gommers wrote:

A version number should reflect the maturity of a project - and SciPy was a mature and stable library that is heavily used in production settings for a long time already. From that perspective, the 1.0 version number is long overdue.

Some key project goals, both technical (e.g. Windows wheels and continuous integration) and organisational (a governance structure, code of conduct and a roadmap), have been achieved recently.

The code of conduct he mentions is here: https://github.com/scipy/scipy/blob/master/doc/source/dev/conduct/code_of_conduct.rst.

It's been interesting lurking in the code of conduct discussions on scipy-dev and on GitHub. The project used a few existing CoCs as starting points (including the Contributor Covenant used at Mapbox) and ended up with something they felt more suited to their community.

Day After Thoughts About Trail Running

Today, the day after the Blue Sky Trail Marathon, I'm tending my home's wireless network and thinking about the Trail Quillan, the race that motivated my training during the Winter of 2016-2017 and inspired me to sign up for the Blue Sky Trail Marathon.

The Trail Quillan site has some great photos from 2017. My favorite sets are from the top, Pech Tignous, where you can see the snow (and me, number 80), and Belvianes et Belvédère du Diable. Runners from the Front Range of Colorado would find Quillan at once familiar and exotic, and vice versa; if anyone reading this in Colorado or France discovers an interest in crossing over for a trail run and would like a contact, please let me know and I'd be happy to help. The spirit of the organizers and the participants in these events is very similar: the Fort Collins Gnar Runners host a grand slam series of events, and in the Pays Cathare there is the Défi Sud Trail series.

I'm beginning to think about proposing to my family a trip to France in early summer of 2018 that would include a couple races in the foothills of the Pyrénées.

Blue Sky Marathon Finisher

Well, I did it: 26.2 miles and 3780 feet of climbing in 5 hours and 51 minutes. I was the 175th finisher out of 285. Abby Mitchell, the first woman to finish (11th overall), finished in 3:50. Chris Mocko, the overall winner, finished in 3:15.

The course went north at first, counter-clockwise around a loop in Horsetooth Mountain Park, and then back to the start. About 9 miles. From there, we went south and counter-clockwise around two loops in Devil's Backbone, and then back to a finish line just a few meters from the start.

I suffered from gastro symptoms until the very end of the course, stopping at every toilet along the route, and behind some trees as well. I coped by not eating on the course, subsisting on water and a few gels, and staying well within my limits. I felt pretty good in the last 4 miles after a gel pack and a big swig of coke at the last aid station. I probably moved up 12 places in that distance. While I didn't do as well as I'd hoped, I did finish my first trail marathon, and overcame a bit of adversity to do it. Good job, me!

I ran for more than 90 hours this summer and fall to train for this 6 hour race, and I couldn't have done that without my family's support. Thank you, R., A., and B.! I hope you'll back me again in 2018.

If you're into this kind of thing, I recommend giving the Blue Sky Marathon or one of the other Gnar Runners events a try. The trails and views are sweet and it's very well run and staffed.

I almost forgot: my map data workflow is the following bash one-liner:

$ fio dump export.gpx --layer tracks | mapbox upload blue_sky_marathon

Race Day Eve

In 12 hours I'll be driving to the start of the Blue Sky Marathon. This evening I'm setting my gear out, getting thoroughly hydrated and fed, and suffering from some kind of gastro-intestinal thing. Yesterday afternoon I was dizzy and tired. Now I've got some mild diarrhoea. I'm preparing as though I'll be fine in the morning since there's nothing else I can do about it.

A cold front is coming through tonight and there may be snow above 8500 feet. The high point of the course is 2000 feet below that and shouldn't be affected. Tomorrow is forecast to be sunny, cooler, and breezy. Better conditions than during my last long run on the course two weeks ago.

Being asleep by 8 p.m. will be hard, but I'm going to try. I'll have photos on Instagram and this site tomorrow.

Mock is Magic

I'm sprinting with my teammates with occasionally spotty internet. We're developing a module that takes some directory names, archives the directories, uploads the archive to S3, and then cleans up temporary files. Testing this by actually posting data to S3 is slow, leaves debris, and is almost pointless: we're using boto3 and boto3 is, for our purposes, solid. We've only ever found one new boto bug at Mapbox, and that involved very large streaming uploads. My teammates and I only need to test that we're making a proper archive, using the boto3 API properly, and cleaning up afterwards. Whether or not data lands on S3 isn't important for these tests. Python's mock module is one of many Python tools for faking components during testing. If you're not already using it to create test doubles for boto components (and AWS services), this post will help get you started down the right path.

Here's the function to test, with the code I actually want to be testing glossed over. This post is about boxing out the stuff we don't want to test and so we'll be looking only at making and using mock boto3 objects.

"""mymodule.py"""

import boto3


def archive_and_upload(dir, bucket, key):
    """Archive data and upload to S3"""
    # A bunch of code makes a zip file in a `tmp` directory.

    boto3.resource('s3').Object(bucket, key).upload_file(
        os.path.join(tmp, zip_file))

    # A bunch of code now cleans up temporary resources.

Now, in the test function that we're discovering and running with pytest, we create a fake boto3 API using mock.patch.

from unittest.mock import patch

from mymodule import archive_and_upload


@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

While the test runs, boto3 in the module is replaced by an instance of unittest.mock.MagicMock. The patch decorator also passes that same mock object into the test function, where binding it to the name boto3 lets us inspect it. These mock objects have almost incredible properties. Substituting one for the boto3 module gives us a fairly complete API in the sense that all the methods and properties seem to be there.

>>> from unittest.mock import MagicMock
>>> boto3 = MagicMock()
>>> boto3.resource('s3')
<MagicMock name='mock.resource()' id='4327834232'>
>>> boto3.resource('s3').Object('mybucket', 'mykey')
<MagicMock name='mock.resource().Object()' id='4327879960'>

It does almost nothing, of course, but that's fine for these tests. One thing that the mock objects do to help with testing is record how they are accessed or called. We can assert that certain calls were made with certain arguments.

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource.assert_called_with('s3')
    boto3.resource().Object.assert_called_with('bucket', 'key')
    boto3.resource().Object().upload_file.assert_called_with('/tmp/test.zip')

Asserting that the mock file uploader was called with the correct argument is, in this case, preferable to posting data to S3. It's fast and leaves no artifacts to remove. If we wanted to test that archive_and_upload does the right thing when AWS and boto3 signal an error, we can set a side effect for the mock upload_file method.

import botocore.exceptions

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload_unauthorized(boto3):
    """Unauthorized errors are handled"""

    boto3.resource.return_value.Object.return_value.upload_file.side_effect = \
        botocore.exceptions.ClientError(
            {'Error': {'Code': '403', 'Message': 'Unauthorized'}}, 'PutObject')

    archive_and_upload('test', 'bucket', 'key')

    # assert that exception has been handled.

A botocore.exceptions.ClientError will be raised in archive_and_upload from the upload_file call. We could test against a bucket for which we have no access, but I think the mock is preferable for a unit test. It doesn't require an internet connection and doesn't require any AWS ACL configuration.
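To see what "the exception has been handled" might look like as a concrete assertion, here's a minimal, stdlib-only sketch. The simplified archive_and_upload below and the FakeClientError class are hypothetical stand-ins (the real function's body is glossed over above, and this way the example doesn't require botocore to be installed):

```python
"""Sketch: asserting that an upload error is handled.

FakeClientError and this simplified archive_and_upload are stand-ins,
not the real module's code.
"""
from unittest.mock import MagicMock


class FakeClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError."""


def archive_and_upload(s3, bucket, key):
    """Upload an archive; report failure instead of crashing."""
    try:
        s3.Object(bucket, key).upload_file('/tmp/test.zip')
        return True
    except FakeClientError:
        # The real code would clean up temporary files here.
        return False


# A mock whose upload_file raises: the function reports failure.
s3 = MagicMock()
s3.Object.return_value.upload_file.side_effect = FakeClientError()
assert archive_and_upload(s3, 'bucket', 'key') is False

# A well-behaved mock: the function reports success.
assert archive_and_upload(MagicMock(), 'bucket', 'key') is True
```

The same pattern works with the real ClientError: set it as the side effect, call the function, and assert on the function's observable behavior rather than on AWS.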

Mock's magic can, however, lead to subtle bugs like the one in the test below. Can you find it?

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload_wtf(boto3):
    """Why does this keeping failing?"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource().Object().upload_file().assert_called_with('/tmp/test.zip')

Because all mock methods and properties yield more mocks, it can be hard to figure out why boto3.resource().Object().upload_file() is never called with the expected arguments, even when we're certain the arguments are right. Unintended parentheses after upload_file cost me 15 minutes of head scratching earlier this morning.
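The gotcha can be reproduced in isolation with nothing but the standard library:

```python
from unittest.mock import MagicMock

boto3 = MagicMock()
# Simulate the code under test making its upload call.
boto3.resource('s3').Object('bucket', 'key').upload_file('/tmp/test.zip')

# Correct: assert on the upload_file mock itself. This passes.
boto3.resource().Object().upload_file.assert_called_with('/tmp/test.zip')

# Buggy: the extra parentheses after upload_file call its *return value*,
# a child mock that has never been called, so this always fails.
try:
    boto3.resource().Object().upload_file().assert_called_with('/tmp/test.zip')
except AssertionError:
    pass  # always lands here, no matter what the code under test did
```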

P.S. If testing the components of distributed geospatial data processing systems with mock or other test double frameworks is the kind of thing you enjoy doing at work, you might be interested in these Mapbox job postings: