Rasterio 0.34
Last fall Even Rouault announced that GDAL 2.1 would have a new Amazon S3 virtual file system. Extending GDAL's capability to make HTTP byte range requests to AWS's HTTPS + XML S3 API, Even has made it possible to efficiently access partial content of S3 objects using certain formats like GeoTIFF. In other words, metadata of a GeoTIFF on S3 or overviews stored as sub-images can be accessed without retrieving the bulk of its image data. Génial!
With help from Even, Rob Emanuele, and Matt Perry, Rasterio 0.34 has a handy abstraction for this feature. Rasterio uses s3:// URIs instead of GDAL's /vsis3/ paths because URIs are how we identify resources on the web and because this is the URI scheme, albeit an unregistered one, used by the AWS Command Line Interface. The same URIs you use with the AWS CLI
$ aws s3 ls s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF
2015-03-14 17:20:01   51099231 LC81390452014295LGN00_B1.TIF
2015-03-14 17:20:30    6626356 LC81390452014295LGN00_B1.TIF.ovr
can also be used with Rasterio:
$ rio info s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF --indent 2
{
  "nodata": null,
  "dtype": "uint16",
  "crs": "EPSG:32645",
  "bounds": [
    381885.0,
    2279085.0,
    610515.0,
    2512815.0
  ],
  "count": 1,
  "blockxsize": 512,
  "driver": "GTiff",
  "transform": [
    30.0,
    0.0,
    381885.0,
    0.0,
    -30.0,
    2512815.0
  ],
  "blockysize": 512,
  "tiled": true,
  "lnglat": [
    86.96327090815723,
    21.666821827007748
  ],
  "shape": [
    7791,
    7621
  ],
  "compress": "deflate",
  "res": [
    30.0,
    30.0
  ],
  "width": 7621,
  "height": 7791,
  "interleave": "band"
}
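The same URI should also work from Python. What follows is a minimal sketch, not Rasterio's actual implementation: s3_uri_to_vsis3 and dataset_shape are hypothetical helper names invented for this example. The first shows how an s3:// URI lines up with the /vsis3/bucket/key path that GDAL expects. The second assumes rasterio.open() accepts the URI directly, which needs network access, AWS credentials, and a GDAL 2.1 build.

```python
from urllib.parse import urlparse


def s3_uri_to_vsis3(uri):
    """Map an s3://bucket/key URI to a GDAL /vsis3/bucket/key path.

    Illustration only: Rasterio performs its own translation internally.
    """
    parts = urlparse(uri)
    if parts.scheme != "s3":
        raise ValueError("expected an s3:// URI, got {!r}".format(uri))
    # netloc is the bucket name; path is the object key with a leading slash.
    return "/vsis3/{}{}".format(parts.netloc, parts.path)


def dataset_shape(uri):
    """Return (height, width) of the raster at the given URI.

    Assumes a Rasterio build with S3 support; rasterio is imported here
    so the helper above stays usable without it.
    """
    import rasterio

    with rasterio.open(uri) as src:
        return src.height, src.width


uri = ("s3://landsat-pds/L8/139/045/LC81390452014295LGN00/"
       "LC81390452014295LGN00_B1.TIF")
print(s3_uri_to_vsis3(uri))
```

Calling dataset_shape(uri) on the scene above would be expected to return the same height and width that rio info reports in its "shape" item.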
Rasterio gets its credentials in the same manner as the AWS CLI (see Configuring the AWS Command Line Interface). If you're already using the AWS CLI no extra configuration is needed to start using Rasterio on S3 raster datasets.
A close read of the GDAL debug logs shows that only 16384 bytes of this roughly 50 MB TIFF are fetched in order to get the metadata printed above. That's an efficiency of about 3000:1.
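For anyone who wants to check that ratio, here is the arithmetic as a small sketch. It uses the object size from the aws s3 ls listing and the byte count from the debug logs; the variable names are only for illustration.

```python
# Size of the GeoTIFF in bytes, as reported by `aws s3 ls`.
tiff_size = 51099231
# Bytes fetched to read the metadata, per the GDAL debug logs (16 KiB).
bytes_fetched = 16384

ratio = tiff_size / bytes_fetched
print("about {:.0f}:1".format(ratio))
```

The result lands a little above 3100, so 3000:1 is a fair round figure.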
The S3 virtual filesystem is only available in Rasterio if you have a GDAL library version >= 2.1.0dev. The Mac OS X wheels for Rasterio 0.34 on PyPI contain GDAL version 2.1.0dev and are probably the easiest way to try this new feature.
Share and enjoy!