This week I discovered that Rasterio doesn't always close the temporary in-memory datasets that are used within some of its methods. In testing Rasterio's WarpedVRT class I used a GDAL function to dump descriptions of all open datasets and found a bunch that looked unrelated to WarpedVRT. They were GDAL "MEM" type datasets with UUIDs for names, which didn't tell me much. What were their origins?
They have UUIDs for names because Rasterio imports uuid in its _io module and
uuid.uuid4() to make temporary dataset names. If only the dataset
name included the name of the test in which it was created, then I'd have an
entry point into debugging. One way to do this is with a pytest auto-used
I changed the rasterio._io module's import statement from
import uuid to
from uuid import uuid4 to make it slightly easier to monkey patch and then
I added 5 lines of code to Rasterio's
set_mem_name fixture uses two standard pytest fixtures:
monkeypatch. The value of
request.node.name is the name of the test
set_mem_name fixture uses
monkeypatch to replace
rasterio._io with a custom function that prepends the name of the test to
the UUID. The
autouse=True argument tells pytest to add this fixture to every
test it finds. I didn't need to touch the code of any of Rasterio's tests, not a one.
This quickly revealed to me that the unclosed temporary datasets were coming from tests that asserted certain exceptions were being raised by Rasterio's reprojection code. This code used temporary datasets and didn't close them before raising the exception to the caller. Once I changed the code to do the following, Rasterio no longer leaked datasets from those tests, or in our programs.
If Rasterio used only Python's unittest module, and not pytest, it would be
possible to do the same thing. Import rasterio._io in the test case's
setUp(), monkey patch it, and then restore it in
tearDown(). If all the
tests derived from one base class, it would only be necessary to extend that
class. The unittest.mock module easily allows every test to be patched with
a single decorator statement.
It seems like it could be two fewer lines of code, but I don't immediately see
how to get the name of the test and use it with only the mock.patch decorator.
It looks like one would have to use a patcher's start and stop,
which is back to somewhat more boilerplate than with pytest.