Listing open GDAL datasets in Python
This post is a follow-up to Friday's. It shows how you can dump a listing of open GDAL datasets from your own Python programs using nothing but Python's standard library, and how you can analyze the dumps with Python and an extra module or two.
If you've imported a Python module that links GDAL, such as Rasterio or GDAL's
own Python bindings, you can access GDAL's GDALDumpOpenDatasets function using
Python's ctypes module. That function takes a FILE pointer as its only
argument, and you can get a pointer to stdout or stderr from ctypes as well.
I will use Rasterio's interactive dataset inspector to demonstrate.
Note that the C-level FILE pointer to stdout is ctypes.c_void_p.in_dll(handle, "__stdoutp") on OS X and ctypes.c_void_p.in_dll(handle, "stdout") on Linux.
The listing printed by the function bypasses Python and goes to the
terminal.
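The platform difference noted above can be wrapped in a small helper. This is only a sketch: the function names c_stdout_pointer and dump_open_datasets are made up for illustration, and it assumes a Unix-like system where ctypes.CDLL(None) gives access to the symbols already loaded into the process.

```python
import ctypes
import sys


def c_stdout_pointer(handle=None):
    """Return the C-level stdout FILE pointer as a ctypes.c_void_p.

    Hypothetical helper, not part of GDAL or Rasterio.
    """
    if handle is None:
        # CDLL(None) exposes the symbols already loaded into this process.
        handle = ctypes.CDLL(None)
    # The symbol is "__stdoutp" on OS X and "stdout" on Linux.
    symbol = "__stdoutp" if sys.platform == "darwin" else "stdout"
    return ctypes.c_void_p.in_dll(handle, symbol)


def dump_open_datasets(handle=None):
    """Print GDAL's open dataset listing and return the function's result.

    Hypothetical wrapper. It raises AttributeError unless a module that
    links GDAL (Rasterio, for example) has already been imported.
    """
    if handle is None:
        handle = ctypes.CDLL(None)
    return handle.GDALDumpOpenDatasets(c_stdout_pointer(handle))
```

After importing rasterio and opening a dataset, calling dump_open_datasets() should print the same kind of listing as the interpreter session in this post.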
    $ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
    Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
    Type "src.meta", "src.read(1)", or "help(src)" for more information.
    >>> import ctypes
    >>> handle = ctypes.CDLL(None)
    >>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
    >>> _ = handle.GDALDumpOpenDatasets(cstdout)
    Open GDAL Datasets:
    1 S GTiff 26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
The first field in every record is the reference count of the dataset, the second is whether it is a shared dataset (S) or not (N), the third is the format driver's short name, the fourth is a thread id, the fifth is the dataset shape, and the sixth is the dataset's identifier.
In the interpreter we find what we expect: one shared open dataset with a reference count of 1.
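Once the listing is available as a Python string, the six fields described above can be split into named values. The following is a minimal sketch, not anything provided by GDAL: the OpenDataset tuple and parse_listing function are hypothetical names, and it assumes each record is whitespace-separated in the order given above.

```python
from typing import List, NamedTuple


class OpenDataset(NamedTuple):
    """One record of the listing (hypothetical container)."""
    refcount: int
    shared: bool
    driver: str
    thread_id: int
    shape: str
    identifier: str


def parse_listing(text: str) -> List[OpenDataset]:
    """Parse the text of an open dataset listing into records."""
    records = []
    for line in text.splitlines():
        # Split into at most six fields so that an identifier
        # containing spaces stays in one piece.
        parts = line.split(None, 5)
        if len(parts) != 6 or parts[1] not in ("S", "N"):
            continue  # Skip the header and anything unrecognized.
        try:
            refcount, thread_id = int(parts[0]), int(parts[3])
        except ValueError:
            continue
        records.append(
            OpenDataset(refcount, parts[1] == "S", parts[2],
                        thread_id, parts[4], parts[5])
        )
    return records


sample = (
    "Open GDAL Datasets:\n"
    "1 S GTiff 26416000 791x718x3 "
    "/Users/seang/code/rasterio/tests/data/RGB.byte.tif\n"
)
for record in parse_listing(sample):
    print(record.driver, record.refcount, record.shared, record.shape)
```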
What if we wanted to process the listing in Python? We would need to capture what is written to the low-level file descriptors and expose it in Python. There's a nice blog post about the issues involved, with an implementation, at https://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/. Pytest includes a fixture for this, capfd, and it can be used in a test like the one shown below.
    import ctypes


    def test_sharing_on(capfd):
        """Datasets are shared"""
        # Open a dataset in not-shared mode.
        ...
        handle = ctypes.CDLL(None)
        cstdout = ctypes.c_void_p.in_dll(handle, "stdout")
        assert 1 == handle.GDALDumpOpenDatasets(cstdout)
        captured = capfd.readouterr()
        assert "1 N GTiff" in captured.out
        assert "1 S GTiff" not in captured.out
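Outside of pytest, the same descriptor-level capture can be roughly approximated with the standard library alone. The sketch below is not pytest's implementation nor the one from the linked blog post. It assumes a Unix-like system, and capture_fd is a hypothetical name. It temporarily points the stdout file descriptor at a temporary file, flushes C's stdio buffers through ctypes, and then reads the file back.

```python
import contextlib
import ctypes
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_fd(fd=1):
    """Capture everything written to file descriptor fd (1 is stdout).

    Yields a dict whose "text" entry is filled in when the block exits.
    """
    libc = ctypes.CDLL(None)
    result = {"text": ""}
    # Flush pending output so it is not mixed into the capture.
    sys.stdout.flush()
    libc.fflush(None)
    saved_fd = os.dup(fd)  # Keep a copy of the original descriptor.
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), fd)  # Point fd at the temporary file.
        try:
            yield result
        finally:
            # C stdio may still hold buffered output; push it out
            # before restoring the original descriptor.
            sys.stdout.flush()
            libc.fflush(None)
            os.dup2(saved_fd, fd)
            os.close(saved_fd)
            tmp.seek(0)
            result["text"] = tmp.read().decode("utf-8", errors="replace")


with capture_fd() as captured:
    # Stands in for a C-level write such as GDALDumpOpenDatasets.
    os.write(1, b"Open GDAL Datasets:\n")
print(repr(captured["text"]))
```

In a session like the one above, the os.write call would be replaced by handle.GDALDumpOpenDatasets(cstdout), and captured["text"] would then hold the listing.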
There's a package named capturer which is inspired by pytest and does the same kind of thing as a context manager.
    $ rio insp ~/code/rasterio/tests/data/RGB.byte.tif
    Rasterio 1.0.28 Interactive Inspector (Python 3.6.4)
    Type "src.meta", "src.read(1)", or "help(src)" for more information.
    >>> import ctypes
    >>> handle = ctypes.CDLL(None)
    >>> cstdout = ctypes.c_void_p.in_dll(handle, '__stdoutp')
    >>> from capturer import CaptureOutput
    >>> with CaptureOutput() as capfd:
    ...     _ = handle.GDALDumpOpenDatasets(cstdout)
    ...     handle.GDALDumpOpenDatasets(cstdout)
    ...     captured = capfd.get_text()
    ...
    Open GDAL Datasets:
    1 S GTiff 26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif
    >>> print(captured)
    Open GDAL Datasets:
    1 S GTiff 26416000 791x718x3 /Users/seang/code/rasterio/tests/data/RGB.byte.tif