Mock is Magic

I'm sprinting with my teammates with occasionally spotty internet. We're developing a module that takes some directory names, archives the directories, uploads the archive to S3, and then cleans up temporary files. Testing this by actually posting data to S3 is slow, leaves debris, and is almost pointless: we're using boto3 and boto3 is, for our purposes, solid. We've only ever found one new boto bug at Mapbox, and that involved very large streaming uploads. My teammates and I only need to test that we're making a proper archive, using the boto3 API properly, and cleaning up afterwards. Whether or not data lands on S3 isn't important for these tests. Python's mock module is one of many Python tools for faking components during testing. If you're not already using it to create test doubles for boto components (and AWS services), this post will help get you started down the right path.

Here's the function to test, with the code I actually want to be testing glossed over. This post is about boxing out the stuff we don't want to test and so we'll be looking only at making and using mock boto3 objects.

"""mymodule.py"""

import boto3


def archive_and_upload(dir, bucket, key):
    """Archive data and upload to S3"""
    # A bunch of code makes a zip file in a `tmp` directory.

    boto3.resource('s3').Object(bucket, key).upload_file(
        os.path.join(tmp, zip_file))

    # A bunch of code now cleans up temporary resources.

Now, in the test function that we're discovering and running with pytest, we create a fake boto3 API using mock.patch.

from unittest.mock import patch

from mymodule import archive_and_upload


@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

While the test runs, boto3 in the module is replaced by an instance of unittest.mock.MagicMock. We're also able to bind the same mock object to boto3 for inspection within the test function by passing that as an argument. These mock objects have almost incredible properties. Substituting one for the boto3 module gives us a fairly complete API in the sense that all the methods and properties seem to be there.

>>> from unittest.mock import MagicMock
>>> boto3 = MagicMock()
>>> boto3.resource('s3')
<MagicMock name='mock.resource()' id='4327834232'>
>>> boto3.resource('s3').Object('mybucket', 'mykey')
<MagicMock name='mock.resource().Object()' id='4327879960'>

It does almost nothing, of course, but that's fine for these tests. One thing that the mock objects do to help with testing is record how they are accessed or called. We can assert that certain calls were made with certain arguments.

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource.assert_called_with('s3')
    boto3.resource().Object.assert_called_with('bucket', 'key')
    boto3.resource().Object().upload_file.assert_called_with('/tmp/test.zip')

Asserting that the mock file uploader was called with the correct argument is, in this case, preferable to posting data to S3. It's fast and leaves no artifacts to remove. If we wanted to test that archive_and_upload does the right thing when AWS and boto3 signal an error, we can set a side effect for the mock upload_file method.

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload_authorized(boto3):
    """Unauthorized errors are handled"""

    boto3.return_value.resource.return_value.Object.return_value.upload_file.side_effect = \
        botocore.exceptions.ClientError(
            {'Error': {'Code': '403', 'Message': 'Unauthorized'}}, 'PutObject')

    archive_and_upload('test', 'bucket', 'key')

    # assert that exception has been handled.

A botocore.exceptions.ClientError will be raised in archive_and_upload from the upload_file call. We could test against a bucket for which we have no access, but I think the mock is preferable for a unit test. It doesn't require an internet connection and doesn't require any AWS ACL configuration.

Mock's magic can, however, lead to subtle bugs like the one in the test below. Can you find it?

from unittest.mock import patch

@patch('mymodule.boto3')
def test_archive_and_upload_wtf(boto3):
    """Why does this keeping failing?"""
    archive_and_upload('test', 'bucket', 'key')

    boto3.resource().Object().upload_file().assert_called_with('/tmp/test.zip')

Because all mock methods and properties yield more mocks, it can be hard to figure out why boto3.resource().Object().upload_file() is never called with the expected arguments, even when we're certain the arguments are right. Unintended parentheses after upload_file cost me 15 minutes of head scratching earlier this morning.

P.S. If testing the components of distributed geospatial data processing systems with mock or other test double frameworks is the kind of thing you enjoing doing at work, you might be interested in these Mapbox job postings: