And a Laser in My Shoe

The GeoJSON-LD project

JSON-LD is JSON for linking data and it’s a web standard: http://json-ld.org/. It offers a smooth, even zero-edit in some cases, path for upgrading existing GeoJSON data and makes web-friendly extension of GeoJSON possible.

Previously, if you added a "timestamp": "2014-03-27" to a GeoJSON feature object’s “properties”, a user has to guess what you mean. Playing guessing games with my kids is fun. Playing pointless guessing games with data sucks. One of the most important things JSON-LD offers is a framework in which you can unambiguously identify “timestamp” as “http://purl.org/dc/terms/created”. Now users of your data have a solid start toward understanding what you mean by “timestamp”.

There’s more to JSON-LD than defining your feature properties. You can also bring in terms from other vocabularies to express that your features are, say, schema.org AdministrativeAreas or Landforms.

If you’d like to help shape the way we use GeoJSON in the future, please follow https://github.com/geojson/geojson-ld and jump into discussion on the open vocabulary and context issues.

Warping images with rasterio

Warping imagery from one spatial basis to another is a crucial function. I’ve been determined that rasterio be able to do this. Scipy provides a geometric_transform() function that I like, but I’d like rasterio to operate at a higher level and take advantage of GDAL’s fast transformers. GDAL’s gdalwarp program only operates on files, and I’m aiming lower than this (so rasterio can let us build better scripts). The ReprojectImage() function in osgeo.gdal, as pointed out in http://jgomezdans.github.io/gdal_notes/reprojection.html, only works on GDAL dataset objects. And the code to make them gets messy.

My requirements for a rasterio warper are:

  • Speed approaching gdalwarp.
  • Same results (same algorithms) as gdalwarp.
  • Sources and destinations can be ndarrays as well as bands in files.
  • Simplicity and clarity.

I think I’ve managed to satisfy these requirements. The warp module will be the key feature of the next rasterio release.

Here’s an example within the interactive dataset inspector.

$ rasterio.insp rasterio/tests/data/RGB.byte.tif
Rasterio 0.7 Interactive Inspector (Python 2.7.5)
Type "src.name", "src.read_band(1)", or "help(src)" for more information.

The reproject() function takes source and destination ndarrays and an affine transformation vector and coordinate reference system for each. In this example, I’m warping the UTM source to a Web Mercator destination of slightly smaller extent. I computed the extents of the destination using pyplot, rasterio has yet no method to come up with an optimal destination extent (to do).

>>> source = src.read_band(1)
>>> dst_transform = [
...     -8789636.708,
...     300.0,
...     0.0,
...     2943560.235,
...     0.0,
...     -300.0]
>>> dst_crs = {'init': 'EPSG:3857'}
>>> destin = np.empty(src.shape, dtype=np.uint8)
>>> from rasterio import _warp
>>> _warp.reproject(
...     source, destin,
...     src.transform, src.crs,
...     dst_transform, dst_crs)
>>> import matplotlib.pyplot as plt
>>> plt.subplot(121)
<matplotlib.axes.AxesSubplot object at 0x1075362d0>
>>> plt.imshow(source)
<matplotlib.image.AxesImage object at 0x1078b1450>
>>> plt.gray()
>>> plt.subplot(122)
<matplotlib.axes.AxesSubplot object at 0x1075ffe50>
>>> plt.imshow(destin)
<matplotlib.image.AxesImage object at 0x1078b19d0>
>>> plt.gray()
>>> plt.savefig('/tmp/warp.png')

It’s pretty much just three statements. One to get a source ndarray from a dataset. One to create a destination ndarray. One to call reproject().

I’m out of time to properly transform the axes in the plots to show coordinates in their proper CRS, but they’re warped correctly.

http://farm4.staticflickr.com/3776/12773733534_c51761076e_z_d.jpg

Rasterio cookbook recipe number one

I got a nice message today from a developer who found rasterio to be the right tool for scaling the values of a 15 GB LiDAR DEM. Here’s the code.

import numpy as np
import rasterio

"""
2014-02-13
Bryan Luman
Use it however you like at your own risk

Problem:
I have a huge DEM converted from LiDAR LAS points. I'd like to make
it slightly less massive for further processing.

All my z values are between 682 and 869 feet stored as float32. If
I convert them to centimeters and they will fit well within a 16 bit
unsigned integer.

This saves a significant amount of storage space and greatly reduces
any type of future analysis.

"""

CM_IN_FOOT = 30.48

with rasterio.drivers():
    with rasterio.open('massive_dem_as_float') as src:
        kwargs = src.meta
        kwargs.update(
            driver='GTiff',
            dtype=rasterio.uint16,
            count=1,
            compress='lzw',
            nodata=0,
            bigtiff='YES' # Output will be larger than 4GB
        )

        windows = src.block_windows(1)

        with rasterio.open(
                'less_massive_dem_with_as_int.tif',
                'w',
                **kwargs) as dst:
            for idx, window in windows:
                src_data = src.read_band(1, window=window)

                # Source nodata value is a very small negative number
                # Converting in to zero for the output raster
                np.putmask(src_data, src_data < 0, 0)

                dst_data = (src_data * CM_IN_FOOT).astype(rasterio.uint16)
                dst.write_band(1, dst_data, window=window)

This uses the windowed read and write and windows iterator features of rasterio that I worked on with Asger Skovbo Petersen.

Do you have any more short rasterio code examples? I’d love to start collecting these into a cookbook of sorts.

Rasterio 0.6

I uploaded Rasterio 0.6 to PyPI today. The new features are tags (aka GDAL metadata) and colormaps.

With just a couple lines of Python you can turn this simple greyscale image

http://farm4.staticflickr.com/3759/12449997993_77d8718045_d.jpg

into an eye-searing RGB color image.

http://farm8.staticflickr.com/7391/12443115173_80ecca89db_d.jpg

Fiona 1.1.1, rasterio 0.5.1

I released Fiona 1.1.1 and rasterio 0.5.1 yesterday. There are a couple bug fixes, but the main features are the interactive data inspectors fiona.insp

$ fiona.insp docs/data/test_uk.shp
Fiona 1.1.1 Interactive Inspector (Python 2.7.5)
Type "src.schema", "next(src)", or "help(src)" for more information.
>>>

and rasterio.insp.

$ rasterio.insp rasterio/tests/data/RGB.byte.tif
Rasterio 0.5.1 Interactive Inspector (Python 2.7.5)
Type "src.name", "src.read_band(1)", or "help(src)" for more information.
>>>

Interactive interpreters for Fiona and Rasterio

I’m not sure why I didn’t think of this before, but last night after the nth time launching python in interactive mode, importing rasterio, and typing in a filename with no tab completion, I made myself an interactive raster dataset inspector for Rasterio. Oh, and yes, I made it for you, too. Then I made one for Fiona.

You’ve used Python’s interpreter interactively before, yes? By which I mean typing “python” at a shell prompt to get the banner and triple right-angle bracket prompt.

$ python
Python 2.7.5 (default, Oct  3 2013, 20:27:52)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

I’ve used code.interact() to give you the same kind of prompt right after calling rasterio.open() on a filename you provide to the main function in rasterio.tool(). The code is ridiculously simple.

$ python -m rasterio.tool rasterio/tests/data/RGB.byte.tif
Rasterio 0.5.1 Interactive Interpreter
Type "src.name", "src.read_band(1)", or "help(src)" for more information.
>>>

Typing in the first two suggested expressions gives you useful entry points.

>>> src.name
'rasterio/tests/data/RGB.byte.tif'
>>> src.read_band(1)
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

it’s a bit like gdalinfo on steroids. You can get statistics by using Numpy ndarray methods, because rasterio has ndarrays instead of band objects.

>>> red = src.read_band(1)
>>> red.min(), red.max(), red.mean()
(0, 255, 29.94772668847656)

Fiona’s inspector is in the fiona.inspector module.

$ python -m fiona.inspector docs/data/test_uk.shp
Fiona 1.1.1 Interactive Inspector
Type "src.name", "src.schema", "next(src)", or "help(src)" for more information.
>>>

Its suggested expressions give you (with the help of pprint.pprint()):

>>> src.name
u'test_uk'
>>> import pprint
>>> pprint.pprint(src.schema)
{'geometry': 'Polygon',
 'properties': OrderedDict([(u'CAT', 'float:16'), (u'FIPS_CNTRY', 'str'), (u'CNTRY_NAME', 'str'), (u'AREA', 'float:15.2'), (u'POP_CNTRY', 'float:15.2')])}
>>> pprint.pprint(next(src))
{'geometry': {'coordinates': [[(0.899167, 51.357216),
                               (0.885278, 51.35833),
                               (0.7875, 51.369438),
                               (0.781111, 51.370552),
                               (0.766111, 51.375832),
                               (0.759444, 51.380829),
                               (0.745278, 51.39444),
                               (0.740833, 51.400276),
                               (0.735, 51.408333),
                               (0.740556, 51.429718),
                               (0.748889, 51.443604),
                               (0.760278, 51.444717),
                               (0.791111, 51.439995),
                               (0.892222, 51.421387),
                               (0.904167, 51.418884),
                               (0.908889, 51.416939),
                               (0.930555, 51.398888),
                               (0.936667, 51.393608),
                               (0.943889, 51.384995),
                               (0.9475, 51.378609),
                               (0.947778, 51.374718),
                               (0.946944, 51.371109),
                               (0.9425, 51.369164),
                               (0.904722, 51.358055),
                               (0.899167, 51.357216)]],
              'type': 'Polygon'},
 'id': '0',
 'properties': OrderedDict([(u'CAT', 232.0), (u'FIPS_CNTRY', u'UK'), (u'CNTRY_NAME', u'United Kingdom'), (u'AREA', 244820.0), (u'POP_CNTRY', 60270708.0)]),
 'type': 'Feature'}

I’d like to have proper program names for these soon instead of “python -m rasterio.tool”, but haven’t thought of any yet. Also, I will gladly accept patches that would use your own favorite other interpreters (bpython, IPython) if available.

New home for rasterio

As it’s getting more use at work, I’ve moved rasterio to Mapbox’s GitHub organization and wrote about it on the Mapbox blog. Mapbox is probably known more for Javascript, but has a lot of Python programs running at any moment and more in development.

What do I mean when I write that rasterio is Python software, not GIS software? There is GIS software under the hood, obviously, and rasterio would be nowhere without the excellent GDAL library. What I mean is that whenever faced with a design choice between the way something would be done in a name brand GIS program and the way it would be done in Python, I always choose the Python way. That means I/O is done via file-like objects. That means no reliance on side effects. Python dicts instead of GIS-specific types or microformatted strings. It means embracing Python idioms and protocols. Making things iterable and zip-able. It means PEP 8 and PEP 20.

This is not to say that I think Python is the best language for every application or that Python language design is always right. I’m increasingly a polyglot and I expect that you are, too, or will be soon. It seems clear to me that one of the main keys to successful polyglot programming isn’t using interfaces that stick to the common features of languages, it’s using the best parts of various languages to their fullest. I believe this means idiomatic Javascript in your Javascript programs (for example) and idiomatic Python in your Python programs. Celebrating Python is what I’m doing to make rasterio a better tool for polyglots.

Bill Dollins recently wrote about some of the characteristics of GIS software, and here is the crux of it:

So I have come to realize that the mainstream GIS community has become very much like my professor’s theoretical cat; conditioned to take the long way to the end result when more direct paths are clearly available. What’s more, they have become conditioned to think of such approaches as normal.

Python, on the other hand, is supposed to be direct, highly productive, and fun. I’ve chosen to make rasterio less like the GIS software Bill wrote about and more like ElementTree, simplejson, Requests, pytest, and Numpy – packages that make the most of Python’s strengths.

By the way, I wrote to the Python Software Foundation’s Trademark Committee to make extra sure that my derived work (the GeoJSON file at https://gist.github.com/sgillies/8655640#file-python-powered-json) was legit, and received a very clear and friendly response from David Mertz in the affirmative. I’ve been a reader of his articles for years and it was a pleasure to correspond with him at last. Python isn’t just a great language, it’s great people.

Rasterio 0.5

$ pip install 'rasterio>=0.5'

Which is to say, share and enjoy.

Here’s a script that shows off everything new in rasterio 0.5: GDAL driver environments, raster feature sieving, and a generator of raster feature shapes.

#!/usr/bin/env python
#
# sieve: demonstrate sieving and polygonizing of raster features.

import subprocess

import numpy
import rasterio
from rasterio.features import sieve, shapes

# Register GDAL and OGR drivers.
with rasterio.drivers():

    # Read a raster to be sieved.
    with rasterio.open('rasterio/tests/data/shade.tif') as src:
        shade = src.read_band(1)

    # Print the number of shapes in the source raster.
    print "Slope shapes: %d" % len(list(shapes(shade)))

    # Sieve out features 13 pixels or smaller.
    sieved = sieve(shade, 13)

    # Print the number of shapes in the sieved raster.
    print "Sieved (13) shapes: %d" % len(list(shapes(sieved)))

    # Write out the sieved raster.
    with rasterio.open('example-sieved.tif', 'w', **src.meta) as dst:
        dst.write_band(1, sieved)

# Dump out gdalinfo's report card and open (or "eog") the TIFF.
print subprocess.check_output(
    ['gdalinfo', '-stats', 'example-sieved.tif'])
subprocess.call(['open', 'example-sieved.tif'])

The images that you can operate on with the new functions in rasterio.features don’t have to be read from GeoTIFFs or other image files. They could be purely numpy-based spatial models or slices of multidimensional image data arrays produced with scipy.ndimage.

For fun and benchmarking purposes I’ve written a program that uses both Fiona and rasterio to emulate GDAL’s gdal_polygonize.py: rasterio_polygonize.py. If you’re interested in integrating these modules, it’s a good starting point.

JSON-LD and GeoJSON

JSON-LD, a a JSON-based serialization for Linked Data, is finally a W3C Recommendation. I’d like to remind readers that dumpgj, the any-vectors-to-GeoJSON program that is distributed with Fiona, will optionally add JSON-LD contexts to GeoJSON files.

$ dumpgj --help
usage: dumpgj [-h] [-d] [-n N] [--compact] [--encoding ENC]
              [--record-buffered] [--ignore-errors] [--use-ld-context]
              [--add-ld-context-item TERM=URI]
              infile [outfile]

Serialize a file's records or description to GeoJSON

positional arguments:
  infile                input file name
  outfile               output file name, defaults to stdout if omitted

optional arguments:
  -h, --help            show this help message and exit
  -d, --description     serialize file's data description (schema) only
  -n N, --indent N      indentation level in N number of chars
  --compact             use compact separators (',', ':')
  --encoding ENC        Specify encoding of the input file
  --record-buffered     Economical buffering of writes at record, not
                        collection (default), level
  --ignore-errors       log errors but do not stop serialization
  --use-ld-context      add a JSON-LD context to JSON output
  --add-ld-context-item TERM=URI
                        map a term to a URI and add it to the output's JSON LD
                        context

Rethinking driver management in fiona and rasterio

In a C program using the GDAL (or OGR) API, you manage the global format driver registry like gdalinfo does:

#include "gdal.h"

int main( int argc, char ** argv )
{

    /* Register all drivers. */
    GDALAllRegister();

    ...

    /* De-register all drivers. */
    GDALDestroyDriverManager();

    exit( 1 );
}

I used to try to hide this in Python bindings by calling GDALAllRegister() in a module and registering a function that called GDALDestroyDriverManager() with the atexit module.

""" gdal.py """

import atexit

def cleanup():
    GDALDestroyDriverManager()

atexit.register(cleanup)

GDALAllRegister()

Importing that gdal module registers all drivers and the drivers are de-registered when the module is torn down. I don’t like this approach anymore. I prefer that driver registration to not be a side effect of module import. And in testing, I’m finding that it’s a drag to write the code needed to keep the cleanup function around until the atexit handlers execute.

Instead of going back to the C style, I’m going to go forward with context managers and a with statement to manage the GDAL and OGR global driver registries. The with rasterio.drivers() expression creates a GDAL driver environment.

import rasterio

try:

    # Use a with statement and context manager to register all the
    # GDAL/OGR drivers.
    with rasterio.drivers():

        # Count the manager's registered drivers (156 in this case).
        print rasterio.driver_count()

        # Open a file and get to work.
        with rasterio.open('rasterio/tests/data/RGB.byte.tif') as f:

            # Oops, an exception.
            raise Exception("Oops")

        # When this block ends, even via an exception, the drivers are
        # de-registered.

except Exception, e:
    print "Caught exception: %s" % e

# Drivers were de-registered and this will print 0.
print rasterio.driver_count()

# Output:
# 156
# Caught exception: Oops
# 0

The nice thing about this approach is that the registered drivers get de-registered and the memory deallocated no matter what happens in the with block. And it’s less code than my old approach using atexit.

So that programs written for older versions of Fiona and rasterio don’t break, I’m also letting opened collections and datasets manage the driver registry if no other manager is already present. So this code below works as before.

import rasterio

try:

    with rasterio.open('rasterio/tests/data/RGB.byte.tif') as f:
        print rasterio.driver_count()
        raise Exception("Oops")

except Exception, e:
    print "Caught exception: %s" % e

print rasterio.driver_count()

# Output:
# 156
# Caught exception: Oops
# 0

It’s more efficient to register drivers once and only once in your program, so use with fiona.drivers() or with rasterio.drivers() in new programs using Fiona 1.1 and rasterio 0.5 or newer.