Python and GIS 101

Reading that Chris Garrard was open to suggestions for improving her Geoprocessing with Python using Open Source GIS section at Utah State, the first thing that came to mind was that there could be a lesson on working with object-oriented and functional programming paradigms and how to take advantage of Python's support for each of them. Not just how to use instances, but how to write a useful class. How to use iterators and generators. A lesson about how to start to move from simple scripting, and the inevitable folders of replicating, mutating files named processing.py, processing2.py, processing2b.py, and so on (believe me, I've been there, only it was EASI scripts), to modules of reusable classes and functions. I've tried to come up with a simple example using the conventional GIS geometry model, something that will be familiar to a GIS analyst.

Actually, I wasn't considering functional programming at all when I started this, but it's obviously very helpful to a GIS programmer. All those nested loops one uses to loop over the elements of rasters? There's a Python function for that. This has been an exercise for me, too. I've found some recipes in Python's itertools module that I've overlooked for too long and are going to immediately improve a few of my own projects. For better or worse, Python lets you mix the object-oriented and functional paradigms, and I do that in the following walk-through of yet another Point class for Python.

The C or C++ library under your favorite Python bindings already provides a point class, so why reinvent the wheel? One day (I'm not predicting when) geometry and topology operation code written entirely in Python will execute quickly enough on enough different platforms to be ready for serious work. Imagine we're coding toward that future. In the meanwhile we'll see how to get more out of today's Python, and dabble in doctest, json, array, and itertools.

I've used Python 2.6 for this exercise, and nothing from outside the standard library. The source for the completed pygis_101 module is at http://bitbucket.org/sgillies/python-gis-101/src/.

Let's start with module containing a do-nothing Point class and a test runner.

# Module: pygis_101.py
# We'll stick to plane geometry

class Point(object):
    """The building block of higher dimensional geometries."""

# and introduce a doctest test runner to the end of the module

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()

You can run all doctests in the module if you execute it as a script.

$ python pygis_101.py -v
3 items had no tests:
    __main__
    __main__.Point
    __main__._test
0 tests in 3 items.
0 passed and 0 failed.
Test passed.

Now, let's add a docstring with tests.

class Point(object):
    """The building block of higher dimensional geometries.

    This point class has 2 coordinates, which we'll call `x` and `y`:

        >>> point = Point(0.0, 0.0)
        >>> point.x
        0.0
        >>> point.y
        0.0
    """

Run the tests again

$ python pygis_101.py -v
Trying:
    point = Point(0.0, 0.0)
Expecting nothing
**********************************************************************
File "pygis_101.py", line 8, in __main__.Point
Failed example:
    point = Point(0.0, 0.0)
Exception raised:
    Traceback (most recent call last):
      File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.Point[0]>", line 1, in <module>
        point = Point(0.0, 0.0)
    TypeError: object.__new__() takes no parameters
Trying:
    point.x
Expecting:
    0.0
**********************************************************************
File "pygis_101.py", line 9, in __main__.Point
Failed example:
    point.x
Exception raised:
    Traceback (most recent call last):
      File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.Point[1]>", line 1, in <module>
        point.x
    NameError: name 'point' is not defined
Trying:
    point.y
Expecting:
    0.0
**********************************************************************
File "pygis_101.py", line 11, in __main__.Point
Failed example:
    point.y
Exception raised:
    Traceback (most recent call last):
      File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.Point[2]>", line 1, in <module>
        point.y
    NameError: name 'point' is not defined
2 items had no tests:
    __main__
    __main__._test
**********************************************************************
1 items had failures:
   3 of   3 in __main__.Point
3 tests in 3 items.
0 passed and 3 failed.
***Test Failed*** 3 failures.

We'll need to override object.__init__() in order to pass in coordinate values when we create point instances. The values will be stored in instance attributes.

class Point(object):
    """The building block of higher dimensional geometries.

    This point class has 2 coordinates, which we'll call `x` and `y`:

        >>> point = Point(0.0, 0.0)
        >>> point.x
        0.0
        >>> point.y
        0.0
    """

    def __init__(self, x, y):
        self.x = x
        self.y = y

Now, the 3 tests pass

$ python pygis_101.py -v
Trying:
    point = Point(0.0, 0.0)
Expecting nothing
ok
Trying:
    point.x
Expecting:
    0.0
ok
Trying:
    point.y
Expecting:
    0.0
ok
3 items had no tests:
    __main__
    __main__.Point.__init__
    __main__._test
1 items passed all tests:
   3 tests in __main__.Point
3 tests in 4 items.
3 passed and 0 failed.
Test passed.

With that brief introduction to Python classes, test-driven development, and doctest finished, let's move on to making the Point class more useful.

Often, one wants to serialize geometries for use with other programs. One useful format is "well-known text", or "WKT". Let's add the following to the Point class:

def toWKT(self):
    """Returns a WKT representation of a point.

        >>> point = Point(0.0, 0.0)
        >>> print point.toWKT()
        POINT (0.000000 0.000000)
    """
    return 'POINT (%f %f)' % (self.x, self.y)

JSON is a rising format. Python and JSON go together like sausage and beer. You might be tempted to produce JSON by string interpolation, as we did for WKT, but it's best to use the json module included with Python 2.6, or simplejson installed separately for earlier Python versions. Don't forget to import json at the top of the module.

def toJSON(self):
    """Returns a GeoJSON representation of a point.

        >>> point = Point(0.0, 0.0)
        >>> print point.toJSON()
        {"type": "Point", "coordinates": [0.0, 0.0]}
    """
    return json.dumps(
        dict(type='Point', coordinates=(self.x, self.y))
        )

If we add a __geo_interface__ attribute (implemented as a property) that returns a dict like the one above, we could even use the point objects with the Shapely and GeoJSON packages.

@property
def __geo_interface__(self):
    """Returns a GeoJSON-ish dict representation of a point for interop
    with some other Python + GIS software.

        >>> point = Point(0.0, 0.0)
        >>> point.__geo_interface__
        {'type': 'Point', 'coordinates': (0.0, 0.0)}
    """
    return dict(type='Point', coordinates=(self.x, self.y))

def toJSON(self):
    """Returns a GeoJSON representation of a point.

        >>> point = Point(0.0, 0.0)
        >>> print point.toJSON()
        {"type": "Point", "coordinates": [0.0, 0.0]}
    """
    return json.dumps(self.__geo_interface__)

Now we've got 9 passing tests.

$ python pygis_101.py -v
...
4 items passed all tests:
   3 tests in __main__.Point
   2 tests in __main__.Point.__geo_interface__
   2 tests in __main__.Point.toJSON
   2 tests in __main__.Point.toWKT
9 tests in 7 items.
9 passed and 0 failed.
Test passed.

And, btw, 100% code coverage:

$ coverage run pygis_101.py
$ coverage report
Name        Stmts   Exec  Cover
-------------------------------
pygis_101      16     16   100%

A geometry object ought to be able to provide its bounding box. In order to show how to do this for many more kinds of geometries, let's rewrite Point's coordinate storage using arrays, replace the x and y attributes with dynamic properties for convenience, and add a move method for shifting the point to new coordinates. Then we get the bounding box by checking the minimum and maximum values of the coordinate arrays. This technique will work for just about any type of geometry. Use from array import array at the top, of course.

def __init__(self, x, y):
    self.xs = array('d')
    self.xs.append(x)
    self.ys = array('d')
    self.ys.append(y)

@property
def x(self):
    return self.xs[0]

@property
def y(self):
    return self.ys[0]

def move(self, x, y):
    """Move point to new coordinates x, y.

        >>> point = Point(0.0, 0.0)
        >>> point.move(1.0, 1.0)
        >>> point.x
        1.0
        >>> point.y
        1.0
    """
    self.xs[0] = x
    self.ys[0] = y

@property
def bounds(self):
    """Returns the bounding box as a (minx, miny, maxx, maxy) tuple.

        >>> point = Point(0.0, 0.0)
        >>> point.bounds
        (0.0, 0.0, 0.0, 0.0)
    """
    return min(self.xs), min(self.ys), max(self.xs), max(self.ys)

All production geometry-for-GIS software I know implements these kinds of coordinate sequences. To get coordinate pairs from the sequences, operate on the coordinate arrays with Python's built in zip method or itertools.izip, which I mention because the itertools documentation is a motherlode of fast and efficient iteration recipes.

>>> from array import array
>>> xs = array('d', [0.0, 1.0, 2.0])
>>> ys = array('d', [0.0, 1.0, 2.0])
>>> from itertools import izip, islice
>>> list(izip(xs, ys))
[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
>>> list(islice(izip(xs, ys), 1, None))
[(1.0, 1.0), (2.0, 2.0)]

Zip, btw, is its own inverse.

>>> zip(*zip([1,2,3], ['a', 'b', 'c']))
[(1, 2, 3), ('a', 'b', 'c')]

Going further, we can consolidate the toWKT and toJSON methods in a way that also works very efficiently for lines or paths (using list() instead of next() in that case):

@property
def coords(self):
    """Iterate over x,y pairs.

        >>> point = Point(0.0, 0.0)
        >>> next(point.coords)
        (0.0, 0.0)
    """
    return izip(self.xs, self.ys)

def toWKT(self):
    """Returns a WKT representation of a point.

        >>> point = Point(0.0, 0.0)
        >>> print point.toWKT()
        POINT (0.000000 0.000000)
    """
    return 'POINT (%f %f)' % next(self.coords)

@property
def __geo_interface__(self):
    """Returns a GeoJSON-ish dict representation of a point for interop
    with some other Python + GIS software.

        >>> point = Point(0.0, 0.0)
        >>> point.__geo_interface__
        {'type': 'Point', 'coordinates': (0.0, 0.0)}
    """
    return dict(type='Point', coordinates=next(self.coords))

Now we've got 17 passing tests and still 100% coverage.

$ coverage run pygis_101.py
$ coverage report
Name        Stmts   Exec  Cover
-------------------------------
pygis_101      31     31   100%

Next chance I get, I'll dig into some of my old blogged code to see what might be better done in a more functional style.