Listing open GDAL datasets in Python

Sean Gillies

2019-09-16 13:13

This post is a follow-up to Friday's. It shows how to dump a listing of open GDAL datasets from your own Python programs using nothing but Python's standard library, and how to analyze the dumps with Python and an extra module or two.

If you've imported a Python module that links GDAL, such as Rasterio or GDAL's own Python bindings, you can access GDAL's GDALDumpOpenDatasets function using Python's ctypes module. That function takes a FILE pointer as its only argument and you can get a pointer to stdout or stderr from ctypes as well. I will use Rasterio's interactive dataset inspector to demonstrate.

Note that the C-level FILE pointer to stdout is ctypes.c_void_p.in_dll(handle, "__stdoutp") on OS X and ctypes.c_void_p.in_dll(handle, "stdout") on Linux. The listing printed by the function bypasses Python's sys.stdout and goes straight to the terminal.

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> _ = handle.GDALDumpOpenDatasets(cstdout)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif

The first field in every record is the reference count of the dataset, the second is whether it is a shared dataset (S) or not (N), the third is the format driver's short name, the fourth is a thread id, the fifth is the dataset shape, and the sixth is the dataset's identifier.

In the interpreter we find what we expect: one shared open dataset with a reference count of 1.
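The platform difference in the symbol name noted above can be hidden behind a small helper. This is a minimal sketch, not part of Rasterio or GDAL: the function names c_stream_symbol and c_stream are hypothetical, the stdout names are the two given in this post, and "__stderrp" is assumed as the OS X counterpart for stderr. Platforms other than OS X are treated like Linux, which is an assumption.

```python
import ctypes
import sys


def c_stream_symbol(stream="stdout", platform=sys.platform):
    """Return the name of the C symbol holding a stdio FILE pointer."""
    if stream not in ("stdout", "stderr"):
        raise ValueError("stream must be 'stdout' or 'stderr'")
    # OS X exposes the pointers as __stdoutp and __stderrp.
    if platform == "darwin":
        return "__{}p".format(stream)
    # Assumption: everything else uses the plain names, as Linux does.
    return stream


def c_stream(handle, stream="stdout"):
    """Look up the FILE pointer through an already loaded library handle."""
    return ctypes.c_void_p.in_dll(handle, c_stream_symbol(stream))
```

With that in place, the interpreter session reduces to cstdout = c_stream(ctypes.CDLL(None)) followed by handle.GDALDumpOpenDatasets(cstdout), and the same lines should work on both systems.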

What if we wanted to process the listing in Python? We would need to capture what is written to the low-level file descriptors and expose it in Python. There's a nice blog post about the issues, with an implementation, at https://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/. Pytest includes a fixture for this, capfd, and it can be used in a test like the one shown below.

import ctypes


def test_sharing_off(capfd):
    """Datasets are not shared"""
    # Open a dataset in not-shared mode.
    ...
    handle = ctypes.CDLL(None)
    # "stdout" is the symbol name on Linux; use "__stdoutp" on OS X.
    cstdout = ctypes.c_void_p.in_dll(handle, "stdout")
    # The function returns the number of open datasets.
    assert 1 == handle.GDALDumpOpenDatasets(cstdout)
    captured = capfd.readouterr()
    assert "1 N GTiff" in captured.out
    assert "1 S GTiff" not in captured.out
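Outside of pytest, the same kind of descriptor-level capture can be approximated with only the standard library. The following is a rough sketch in the spirit of the blog post linked above, not a copy of its code or of pytest's implementation. The name capture_fd is hypothetical, and the sketch assumes a POSIX system where ctypes.CDLL(None) exposes the C library's fflush.

```python
import contextlib
import ctypes
import os
import sys
import tempfile
import types


@contextlib.contextmanager
def capture_fd(fd=1):
    """Redirect a file descriptor to a temporary file and collect the text.

    Yields a namespace whose .text attribute is filled in on exit.
    """
    captured = types.SimpleNamespace(text="")
    libc = ctypes.CDLL(None)  # assumption: POSIX, fflush is reachable here
    sys.stdout.flush()        # do not let earlier Python output leak in
    saved = os.dup(fd)        # remember where fd pointed originally
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), fd)
        try:
            yield captured
        finally:
            libc.fflush(None)  # flush all C stdio buffers into the file
            os.dup2(saved, fd)  # restore the original descriptor
            os.close(saved)
            tmp.seek(0)
            captured.text = tmp.read().decode()
```

Used as "with capture_fd() as captured:" around the GDALDumpOpenDatasets call, the listing should end up in captured.text once the block exits instead of going to the terminal.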

There's a package named capturer which is inspired by pytest and does the same kind of thing as a context manager.

$ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> import ctypes
>>> handle = ctypes.CDLL(None)
>>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
>>> from capturer import CaptureOutput
>>> with CaptureOutput() as capfd:
...     _ = handle.GDALDumpOpenDatasets(cstdout)
...     captured = capfd.get_text()
...
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
>>> print(captured)
Open GDAL Datasets:
  1 S GTiff  26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
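Once the listing is a Python string, it can be turned into records for analysis. Here is a minimal sketch of a parser, assuming the six whitespace-separated fields described earlier in this post. The names DatasetRecord and parse_dump are made up for illustration, and the split of the shape field on "x" is an assumption based on the sample output.

```python
from typing import List, NamedTuple, Tuple


class DatasetRecord(NamedTuple):
    refcount: int
    shared: bool
    driver: str
    thread_id: int
    shape: Tuple[int, ...]
    identifier: str


def parse_dump(text: str) -> List[DatasetRecord]:
    """Parse the text printed by GDALDumpOpenDatasets into records."""
    records = []
    for line in text.splitlines():
        # Split into at most six fields so that an identifier
        # containing spaces stays in one piece.
        parts = line.split(None, 5)
        if len(parts) != 6 or parts[1] not in ("S", "N"):
            continue  # header line, blank line, or something unexpected
        refcount, shared, driver, thread_id, shape, identifier = parts
        records.append(
            DatasetRecord(
                refcount=int(refcount),
                shared=(shared == "S"),
                driver=driver,
                thread_id=int(thread_id),
                shape=tuple(int(n) for n in shape.split("x")),
                identifier=identifier,
            )
        )
    return records
```

Calling parse_dump(captured) on the text above should give a single record, which makes checks like "no dataset has a reference count greater than 1" a one-line list comprehension.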