SciPy at 1.0

Sean Gillies

2017-10-25 09:14

SciPy 1.0 was announced this morning. Ralf Gommers wrote:

A version number should reflect the maturity of a project - and SciPy was a mature and stable library that is heavily used in production settings for a long time already. From that perspective, the 1.0 version number is long overdue.

Some key project goals, both technical (e.g. Windows wheels and continuous integration) and organisational (a governance structure, code of conduct and a roadmap), have been achieved recently.

The code of conduct he mentions is here: https://github.com/scipy/scipy/blob/master/doc/source/dev/conduct/code_of_conduct.rst.

It's been interesting lurking in the code of conduct discussions on scipy-dev and on GitHub. The project used a few existing CoCs as starting points (including the Contributor Covenant used at Mapbox) and ended up with something they felt more suited to their community.

Four Years at Mapbox

Sean Gillies

2017-10-23 11:19

I'm starting my 5th year at Mapbox today!

Four years ago I wrote a blog post about joining. I'm still very glad I did.

Bring Blogging Back For Real

Sean Gillies

2017-10-22 17:04

https://c1.staticflickr.com/5/4477/37867130101_6a6e4d854c_o.jpg

Can you see that I am serious?

Day After Thoughts About Trail Running

Sean Gillies

2017-10-22 15:21

Today, the day after the Blue Sky Trail Marathon, I'm tending my home's wireless network and thinking about the Trail Quillan, the race that motivated my training during the Winter of 2016-2017 and inspired me to sign up for the Blue Sky Trail Marathon.

The Trail Quillan site has some great photos from 2017. My favorite sets are from the top, Pech Tignous, where you can see the snow (and me, number 80), and Belvianes et Belvédère du Diable. Runners from the Front Range of Colorado would find Quillan both familiar and exotic, and runners from Quillan would feel the same about the Front Range. If anyone reading this in Colorado or France is interested in crossing over for a trail run and would like a contact, please let me know and I would be happy to help. The spirit of the organizers and the participants in these events is very similar. The Fort Collins Gnar Runners host a grand slam series of events, and in the Pays Cathare there is the Défi Sud Trail's.

I'm beginning to think about proposing to my family a trip to France in early summer of 2018 that would include a couple races in the foothills of the Pyrénées.

Blue Sky Marathon Finisher

Sean Gillies

2017-10-21 17:27

Well, I did it: 26.2 miles and 3780 feet of climbing in 5 hours and 51 minutes. I was the 175th finisher out of 285. Abby Mitchell, the first woman to finish (11th overall), finished in 3:50. Chris Mocko, the overall winner, finished in 3:15.

The course went north at first, counter-clockwise around a loop in Horsetooth Mountain Park, and then back to the start. About 9 miles. From there, we went south and counter-clockwise around two loops in Devil's Backbone, and then back to a finish line just a few meters from the start.

I suffered from gastro symptoms until the very end of the course, stopping at every toilet along the route, and behind some trees as well. I coped by not eating on the course, subsisting on water and a few gels, and staying well within my limits. I felt pretty good in the last 4 miles after a gel pack and a big swig of coke at the last aid station. I probably moved up 12 places in that distance. While I didn't do as well as I'd hoped, I did finish my first trail marathon, and overcame a bit of adversity to do it. Good job, me!

I ran for more than 90 hours this summer and fall to train for this 6 hour race, and I couldn't have done that without my family's support. Thank you, R., A., and B.! I hope you'll back me again in 2018.

If you're into this kind of thing, I recommend giving the Blue Sky Marathon or one of the other Gnar Runners events a try. The trails and views are sweet and it's very well run and staffed.

I almost forgot: my map data workflow is the following bash one-liner:

$ fio dump export.gpx --layer tracks | mapbox upload blue_sky_marathon

Race Day Eve

Sean Gillies

2017-10-20 17:49

In 12 hours I'll be driving to the start of the Blue Sky Marathon. This evening I'm setting my gear out, getting thoroughly hydrated and fed, and suffering from some kind of gastro-intestinal thing. Yesterday afternoon I was dizzy and tired. Now I've got some mild diarrhoea. I'm preparing as though I'll be fine in the morning since there's nothing else I can do about it.

A cold front is coming through tonight and there may be snow above 8500 feet. The high point of the course is 2000 feet below that and shouldn't be affected. Tomorrow is forecast to be sunny, cooler, and breezy. Better conditions than during my last long run on the course two weeks ago.

Being asleep by 8 p.m. will be hard, but I'm going to try. I'll have photos on Instagram and this site tomorrow.

Mock is Magic

Sean Gillies

2017-10-19 19:46

I'm sprinting with my teammates with occasionally spotty internet. We're developing a module that takes some directory names, archives the directories, uploads the archive to S3, and then cleans up temporary files. Testing this by actually posting data to S3 is slow, leaves debris, and is almost pointless: we're using boto3 and boto3 is, for our purposes, solid. We've only ever found one new boto bug at Mapbox, and that involved very large streaming uploads. My teammates and I only need to test that we're making a proper archive, using the boto3 API properly, and cleaning up afterwards. Whether or not data lands on S3 isn't important for these tests. Python's mock module is one of many Python tools for faking components during testing. If you're not already using it to create test doubles for boto components (and AWS services), this post will help get you started down the right path.

Here's the function to test, with the code I actually want to be testing glossed over. This post is about boxing out the stuff we don't want to test and so we'll be looking only at making and using mock boto3 objects.

"""mymodule.py"""

import os

import boto3


def archive_and_upload(dir, bucket, key):
    """Archive data and upload to S3"""
    # A bunch of code makes a zip file in a `tmp` directory.

    boto3.resource('s3').Object(bucket, key).upload_file(
        os.path.join(tmp, zip_file))

    # A bunch of code now cleans up temporary resources.
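For readers who want something concrete behind the glossed-over comments, here is a minimal sketch of the archive and cleanup steps. It is an assumption, not the code we wrote during the sprint: the name `archive_upload_cleanup` is hypothetical, and the `upload` argument is any callable that takes the path of the zip file (in the real function that role is played by boto3's `upload_file`).

```python
import os
import shutil
import tempfile


def archive_upload_cleanup(src_dir, upload):
    """Zip src_dir in a temporary directory, pass the zip to upload, clean up.

    Hypothetical sketch: `upload` is any callable taking the zip file path.
    """
    tmp = tempfile.mkdtemp()
    try:
        name = os.path.basename(os.path.abspath(src_dir))
        # make_archive appends '.zip' and returns the full archive path.
        zip_path = shutil.make_archive(os.path.join(tmp, name), 'zip', root_dir=src_dir)
        upload(zip_path)
    finally:
        # The temporary directory is removed even if the upload raises.
        shutil.rmtree(tmp)
    return zip_path


# Usage with a fake uploader that only records what it was given.
src = tempfile.mkdtemp()
with open(os.path.join(src, 'data.txt'), 'w') as f:
    f.write('hello')

seen = []
result = archive_upload_cleanup(src, lambda path: seen.append((path, os.path.exists(path))))
shutil.rmtree(src)
```

The zip file exists while `upload` runs and is gone by the time the function returns, which is the behavior the tests in this post care about.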

Now, in the test function that we're discovering and running with pytest, we create a fake boto3 API using mock.patch.

from unittest.mock import patch

from mymodule import archive_and_upload


@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

While the test runs, boto3 in the module is replaced by an instance of unittest.mock.MagicMock. We're also able to bind the same mock object to boto3 for inspection within the test function by passing that as an argument. These mock objects have almost incredible properties. Substituting one for the boto3 module gives us a fairly complete API in the sense that all the methods and properties seem to be there.

>>> from unittest.mock import MagicMock
>>> boto3 = MagicMock()
>>> boto3.resource('s3')
<MagicMock name='mock.resource()' id='4327834232'>
>>> boto3.resource('s3').Object('mybucket', 'mykey')
<MagicMock name='mock.resource().Object()' id='4327879960'>

It does almost nothing, of course, but that's fine for these tests. One thing that the mock objects do to help with testing is record how they are accessed or called. We can assert that certain calls were made with certain arguments.

from unittest.mock import patch

from mymodule import archive_and_upload

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource.assert_called_with('s3')
    boto3.resource().Object.assert_called_with('bucket', 'key')
    boto3.resource().Object().upload_file.assert_called_with('/tmp/test.zip')
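To see everything a mock recorded in one place, its `mock_calls` attribute can be compared against a list of `call` objects. A minimal sketch, using a bare `MagicMock` in place of the patched `boto3` and a hypothetical `upload` helper standing in for the upload step of `archive_and_upload`:

```python
from unittest.mock import MagicMock, call


def upload(boto3, bucket, key, path):
    """Hypothetical stand-in for the upload step of archive_and_upload."""
    boto3.resource('s3').Object(bucket, key).upload_file(path)


# A bare MagicMock plays the role of the patched boto3 module.
boto3 = MagicMock()
upload(boto3, 'bucket', 'key', '/tmp/test.zip')

# The root mock records the whole chain of calls, in order.
expected = [
    call.resource('s3'),
    call.resource().Object('bucket', 'key'),
    call.resource().Object().upload_file('/tmp/test.zip'),
]
assert boto3.mock_calls == expected
```

Comparing the whole list is stricter than three separate `assert_called_with` checks: it also fails if the code makes an extra, unexpected call.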

Asserting that the mock file uploader was called with the correct argument is, in this case, preferable to posting data to S3. It's fast and leaves no artifacts to remove. If we want to test that archive_and_upload does the right thing when AWS and boto3 signal an error, we can set a side effect for the mock upload_file method.

from unittest.mock import patch

import botocore.exceptions

from mymodule import archive_and_upload


@patch('mymodule.boto3')
def test_archive_and_upload_authorized(boto3):
    """Unauthorized errors are handled"""

    # boto3 is the mocked module itself, so the chain starts at `resource`.
    boto3.resource.return_value.Object.return_value.upload_file.side_effect = \
        botocore.exceptions.ClientError(
            {'Error': {'Code': '403', 'Message': 'Unauthorized'}}, 'PutObject')

    archive_and_upload('test', 'bucket', 'key')

    # assert that exception has been handled.

A botocore.exceptions.ClientError will be raised in archive_and_upload from the upload_file call. We could test against a bucket to which we have no access, but I think the mock is preferable for a unit test. It doesn't require an internet connection and doesn't require any AWS ACL configuration.
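The side effect mechanism can be tried without boto3 or botocore installed. In this sketch `UploadError` is a hypothetical stand-in for `botocore.exceptions.ClientError`, and `upload_and_report` is a hypothetical helper, not part of our module:

```python
from unittest.mock import MagicMock


class UploadError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""


def upload_and_report(s3_object, path):
    """Hypothetical helper: True on success, False if the upload fails."""
    try:
        s3_object.upload_file(path)
    except UploadError:
        return False
    return True


s3_object = MagicMock()

# With no side effect, the mock method just returns another mock.
assert upload_and_report(s3_object, '/tmp/test.zip') is True

# With an exception as its side effect, every call raises it.
s3_object.upload_file.side_effect = UploadError('403 Unauthorized')
assert upload_and_report(s3_object, '/tmp/test.zip') is False

# Both calls were recorded, including the one that raised.
assert s3_object.upload_file.call_count == 2
```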

Mock's magic can, however, lead to subtle bugs like the one in the test below. Can you find it?

from unittest.mock import patch

from mymodule import archive_and_upload


@patch('mymodule.boto3')
def test_archive_and_upload_wtf(boto3):
    """Why does this keep failing?"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource().Object().upload_file().assert_called_with('/tmp/test.zip')

Because all mock methods and properties yield more mocks, it can be hard to figure out why boto3.resource().Object().upload_file() is never called with the expected arguments, even when we're certain the arguments are right. Unintended parentheses after upload_file cost me 15 minutes of head scratching earlier this morning.
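The difference is easy to reproduce with a bare `MagicMock`. A small sketch, with the mock standing in for the patched `boto3` module:

```python
from unittest.mock import MagicMock

boto3 = MagicMock()

# What the code under test does.
boto3.resource('s3').Object('bucket', 'key').upload_file('/tmp/test.zip')

# Reach the method mock through return_value so no new calls are recorded.
upload_file = boto3.resource.return_value.Object.return_value.upload_file

# Correct: assert on the method mock itself.
upload_file.assert_called_with('/tmp/test.zip')

# Buggy: the extra parentheses call upload_file again and run the
# assertion on its return value, which has never been called.
try:
    upload_file().assert_called_with('/tmp/test.zip')
except AssertionError:
    stray_parens_failed = True
else:
    stray_parens_failed = False

assert stray_parens_failed
# The stray call was recorded as well.
assert upload_file.call_count == 2
```

The stray parentheses don't raise an error of their own; they quietly create and call one more mock, which is why the failure message points at the arguments instead of the typo.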

P.S. If testing the components of distributed geospatial data processing systems with mock or other test double frameworks is the kind of thing you enjoy doing at work, you might be interested in these Mapbox job postings:

Minutemen Live at Brett's Party

Sean Gillies

2017-10-19 08:34

I can't get over the amazing video time capsule that is Minutemen playing in a backyard in Rancho Palos Verdes in June 1985. From the uploader:

It was 1985. Brett & my bdays are in June. We had both just graduated from college. We had a birthday/graduation party at his mom's house on the outskirts of San Pedro. The Minutemen played. It was a great day.

Many of the 6258 views of this video are mine.

Mercantile 0.11.0

Sean Gillies

2017-10-18 07:32

As I mentioned the other day, my team is in Fort Collins this week for a sprint. Damon Burgett and I were sitting together working yesterday and he said, "I keep looking for the inverse of the xy function in mercantile, but it's never there." It's true, the module was missing a function to convert web mercator x and y to longitude and latitude. He wrote it, along with some tests that numbers round-trip properly through xy and lnglat, then made a pull request. I merged it, tagged it, and pushed to GitHub. A minute later Travis-CI had uploaded mercantile 0.11.0 to the Python Package Index and we were pulling it back into our sprint work through an updated pip requirements file. I love how frictionless Python development and packaging can be now.
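The round trip those tests check can be sketched with the standard spherical web mercator formulas. This is not mercantile's source, only an illustration: the function names mirror `xy` and `lnglat`, and the sphere radius of 6378137 meters is the usual web mercator assumption.

```python
import math

R = 6378137.0  # assumed sphere radius in meters


def xy(lng, lat):
    """Longitude and latitude in degrees to web mercator x, y in meters."""
    x = R * math.radians(lng)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def lnglat(x, y):
    """Web mercator x, y in meters back to longitude and latitude in degrees."""
    lng = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lng, lat


# Round trip an arbitrary test point.
lng, lat = -105.0, 40.5
rt_lng, rt_lat = lnglat(*xy(lng, lat))
assert math.isclose(rt_lng, lng, abs_tol=1e-9)
assert math.isclose(rt_lat, lat, abs_tol=1e-9)
```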

I've got limited time for Rasterio and GDAL issues this week. I can catch up a bit in the evenings, but I must prioritize getting extra rest before Saturday's race. Apologies if I don't respond until Monday.

One More Week of Running

Sean Gillies

2017-10-15 10:47

Next week is the 18th and final week of my training for the Blue Sky Marathon, an all-dirt, 90% singletrack course in the foothills west of Fort Collins. The race is on Saturday. I'm aiming to finish in less than 5:30, a pace of roughly 12 and a half minutes per mile.

I've run more miles in training than I did for my previous marathon: over 500 by Saturday morning. I've run over 20 miles three times and a little less than 20 on the race course once. I've run more hills than I did in training for the Trail Quillan in March. Barring an accident, I'm going to finish. With luck, I may finish with a respectable time.

The weather forecast for this week is beautiful: sunny, dry, and mild. The latest forecast discussion from the Denver/Boulder NWS office has a change coming for Saturday: a high of about 60 and a chance of showers. I'd rather have dry and 70, but am relieved that the trails will be dry and snow gear won't be needed.

This is the one-year anniversary of my first race in France: the Trail des Calades. The 4th edition of the race was run earlier today. I saw photos online and it looked like a great day in Saint-Jean-de-Cuculles.

Because of my participation in the Blue Sky Marathon, I won't be at the State of the Map in Boulder on Saturday like much of Mapbox and many of the other mapping folks in Fort Collins. I will, however, get to see my own Mapbox team: they're coming here to work with Matt Perry and me all week. It's the first time we've come together as a team outside of DC or San Francisco, and my first chance to hang in person with Vincent Sarago! I heard some French on the A trail today while I was doing my last longish training run and did a double take, but it wasn't Vincent.