Debugging temporary files using pytest autouse fixtures

Sean Gillies

2019-09-13 17:10

This week I discovered that Rasterio doesn't always close the temporary in-memory datasets that are used within some of its methods. In testing Rasterio's WarpedVRT class I used a GDAL function to dump descriptions of all open datasets and found a bunch that looked unrelated to WarpedVRT. They were GDAL "MEM" type datasets with UUIDs for names, which didn't tell me much. What were their origins?

They have UUIDs for names because Rasterio imports uuid in its _io module and calls uuid.uuid4() to make temporary dataset names. If only the dataset name included the name of the test in which it was created, then I'd have an entry point into debugging. One way to do this is with a pytest auto-used fixture.

I changed the rasterio._io module's import statement from import uuid to from uuid import uuid4 to make it slightly easier to monkey patch and then I added 5 lines of code to Rasterio's conftest.py file:

@pytest.fixture(autouse=True)
def set_mem_name(request, monkeypatch):
    def youyoueyedeefour():
        return "{}-{}".format(request.node.name, uuid.uuid4())
    monkeypatch.setattr(rasterio._io, "uuid4", youyoueyedeefour)

This set_mem_name fixture uses two standard pytest fixtures: request and monkeypatch. The value of request.node.name is the name of the test and this set_mem_name fixture uses monkeypatch to replace uuid4 in rasterio._io with a custom function that prepends the name of the test to the UUID. The autouse=True argument tells pytest to add this fixture to every test it finds. I didn't need to touch the code of any of Rasterio's tests, not a one.

This quickly revealed to me that the unclosed temporary datasets were coming from tests that asserted certain exceptions were being raised by Rasterio's reprojection code. This code used temporary datasets and didn't close them before raising the exception to the caller. Once I changed the code to do the following, Rasterio no longer leaked datasets from those tests, or in our programs.

try:
    ...
    if condition:
        raise CRSError("error")
    ...
except:
    temp.close()
    raise

If Rasterio used only Python's unittest module, and not pytest, it would be possible to do the same thing. Import rasterio._io in the test case's setUp(), monkey patch it, and then restore it in tearDown(). If all the tests derived from one base class, it would only be necessary to extend that class. The unittest.mock module easily allows every test to be patched with a single decorator statement. It seems like it could be two fewer lines of code, but I don't immediately see how to get the name of the test and use it with only the mock.patch decorator. It looks like one would have to use a patcher's start and stop, which is back to somewhat more boilerplate than with pytest.