Reading that Chris Garrard was open to suggestions for improving her
Geoprocessing with Python using Open Source GIS section at Utah State, the
first thing that came to mind was that there could be a lesson on working with
object-oriented and functional programming paradigms and how to take advantage
of Python's support for each of them. Not just how to use instances, but how
to write a useful class. How to use iterators and generators. A lesson about
how to start to move from simple scripting, and the inevitable folders of
replicating, mutating files named processing.py, processing2.py,
processing2b.py, and so on (believe me, I've been there, only it was EASI
scripts), to modules of reusable classes and functions. I've tried to come up
with a simple example using the conventional GIS geometry model, something
that will be familiar to a GIS analyst.
Actually, I wasn't considering functional programming at all when I started
this, but it's obviously very helpful to a GIS programmer. All those nested
loops one uses to loop over the elements of rasters? There's a Python function
for that. This has been an exercise for me, too. I've found some recipes in
Python's itertools module that I've overlooked for too long and are going to
immediately improve a few of my own projects. For better or worse, Python lets
you mix the object-oriented and functional paradigms, and I do that in the
following walk-through of yet another Point class for Python.
The C or C++ library under your favorite Python bindings already provides a
point class, so why reinvent the wheel? One day (I'm not predicting when)
geometry and topology operation code written entirely in Python will execute
quickly enough on enough different platforms to be ready for serious work.
Imagine we're coding toward that future. In the meanwhile we'll see how to get
more out of today's Python, and dabble in doctest, json, array, and
itertools.
I've used Python 2.6 for this exercise, and nothing from outside the standard
library. The source for the completed pygis_101 module is at http://bitbucket.org/sgillies/python-gis-101/src/.
Let's start with module containing a do-nothing Point class and a test runner.
# Module: pygis_101.py
# We'll stick to plane geometry
class Point(object):
"""The building block of higher dimensional geometries."""
# and introduce a doctest test runner to the end of the module
def _test():
import doctest
doctest.testmod()
if __name__ == "__main__":
_test()
You can run all doctests in the module if you execute it as a script.
$ python pygis_101.py -v
3 items had no tests:
__main__
__main__.Point
__main__._test
0 tests in 3 items.
0 passed and 0 failed.
Test passed.
Now, let's add a docstring with tests.
class Point(object):
"""The building block of higher dimensional geometries.
This point class has 2 coordinates, which we'll call `x` and `y`:
>>> point = Point(0.0, 0.0)
>>> point.x
0.0
>>> point.y
0.0
"""
Run the tests again
$ python pygis_101.py -v
Trying:
point = Point(0.0, 0.0)
Expecting nothing
**********************************************************************
File "pygis_101.py", line 8, in __main__.Point
Failed example:
point = Point(0.0, 0.0)
Exception raised:
Traceback (most recent call last):
File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
compileflags, 1) in test.globs
File "<doctest __main__.Point[0]>", line 1, in <module>
point = Point(0.0, 0.0)
TypeError: object.__new__() takes no parameters
Trying:
point.x
Expecting:
0.0
**********************************************************************
File "pygis_101.py", line 9, in __main__.Point
Failed example:
point.x
Exception raised:
Traceback (most recent call last):
File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
compileflags, 1) in test.globs
File "<doctest __main__.Point[1]>", line 1, in <module>
point.x
NameError: name 'point' is not defined
Trying:
point.y
Expecting:
0.0
**********************************************************************
File "pygis_101.py", line 11, in __main__.Point
Failed example:
point.y
Exception raised:
Traceback (most recent call last):
File "/Users/seang/code/python-macosx/parts/opt/lib/python2.6/doctest.py", line 1231, in __run
compileflags, 1) in test.globs
File "<doctest __main__.Point[2]>", line 1, in <module>
point.y
NameError: name 'point' is not defined
2 items had no tests:
__main__
__main__._test
**********************************************************************
1 items had failures:
3 of 3 in __main__.Point
3 tests in 3 items.
0 passed and 3 failed.
***Test Failed*** 3 failures.
We'll need to override object.__init__() in order to pass in coordinate values
when we create point instances. The values will be stored in instance attributes.
class Point(object):
"""The building block of higher dimensional geometries.
This point class has 2 coordinates, which we'll call `x` and `y`:
>>> point = Point(0.0, 0.0)
>>> point.x
0.0
>>> point.y
0.0
"""
def __init__(self, x, y):
self.x = x
self.y = y
Now, the 3 tests pass
$ python pygis_101.py -v
Trying:
point = Point(0.0, 0.0)
Expecting nothing
ok
Trying:
point.x
Expecting:
0.0
ok
Trying:
point.y
Expecting:
0.0
ok
3 items had no tests:
__main__
__main__.Point.__init__
__main__._test
1 items passed all tests:
3 tests in __main__.Point
3 tests in 4 items.
3 passed and 0 failed.
Test passed.
With that brief introduction to Python classes, test-driven development, and doctest finished, let's move on to making the Point class more useful.
Often, one wants to serialize geometries for use with other programs. One
useful format is "well-known text", or "WKT". Let's add the following to the Point class:
def toWKT(self):
"""Returns a WKT representation of a point.
>>> point = Point(0.0, 0.0)
>>> print point.toWKT()
POINT (0.000000 0.000000)
"""
return 'POINT (%f %f)' % (self.x, self.y)
JSON is a rising format. Python and JSON go together like sausage and beer.
You might be tempted to produce JSON by string interpolation, as we did for
WKT, but it's best to use the json module included with Python 2.6, or
simplejson installed separately for earlier Python versions. Don't forget to
import json at the top of the module.
def toJSON(self):
"""Returns a GeoJSON representation of a point.
>>> point = Point(0.0, 0.0)
>>> print point.toJSON()
{"type": "Point", "coordinates": [0.0, 0.0]}
"""
return json.dumps(
dict(type='Point', coordinates=(self.x, self.y))
)
If we add a __geo_interface__ attribute (implemented as a property) that
returns a dict like the one above, we could even use the point objects with
the Shapely and GeoJSON packages.
@property
def __geo_interface__(self):
"""Returns a GeoJSON-ish dict representation of a point for interop
with some other Python + GIS software.
>>> point = Point(0.0, 0.0)
>>> point.__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
"""
return dict(type='Point', coordinates=(self.x, self.y))
def toJSON(self):
"""Returns a GeoJSON representation of a point.
>>> point = Point(0.0, 0.0)
>>> print point.toJSON()
{"type": "Point", "coordinates": [0.0, 0.0]}
"""
return json.dumps(self.__geo_interface__)
Now we've got 9 passing tests.
$ python pygis_101.py -v
...
4 items passed all tests:
3 tests in __main__.Point
2 tests in __main__.Point.__geo_interface__
2 tests in __main__.Point.toJSON
2 tests in __main__.Point.toWKT
9 tests in 7 items.
9 passed and 0 failed.
Test passed.
And, btw, 100% code coverage:
$ coverage run pygis_101.py
$ coverage report
Name Stmts Exec Cover
-------------------------------
pygis_101 16 16 100%
A geometry object ought to be able to provide its bounding box. In order to
show how to do this for many more kinds of geometries, let's rewrite Point's
coordinate storage using arrays, replace the x and y attributes with
dynamic properties for convenience, and add a move method for shifting the
point to new coordinates. Then we get the bounding box by checking the minimum
and maximum values of the coordinate arrays. This technique will work for just
about any type of geometry. Use from array import array at the top, of
course.
def __init__(self, x, y):
self.xs = array('d')
self.xs.append(x)
self.ys = array('d')
self.ys.append(y)
@property
def x(self):
return self.xs[0]
@property
def y(self):
return self.ys[0]
def move(self, x, y):
"""Move point to new coordinates x, y.
>>> point = Point(0.0, 0.0)
>>> point.move(1.0, 1.0)
>>> point.x
1.0
>>> point.y
1.0
"""
self.xs[0] = x
self.ys[0] = y
@property
def bounds(self):
"""Returns the bounding box as a (minx, miny, maxx, maxy) tuple.
>>> point = Point(0.0, 0.0)
>>> point.bounds
(0.0, 0.0, 0.0, 0.0)
"""
return min(self.xs), min(self.ys), max(self.xs), max(self.ys)
All production geometry-for-GIS software I know implements these kinds of
coordinate sequences. To get coordinate pairs from the sequences, operate on
the coordinate arrays with Python's built in zip method or itertools.izip,
which I mention because the itertools documentation is a motherlode of fast
and efficient iteration recipes.
>>> from array import array
>>> xs = array('d', [0.0, 1.0, 2.0])
>>> ys = array('d', [0.0, 1.0, 2.0])
>>> from itertools import izip, islice
>>> list(izip(xs, ys))
[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
>>> list(islice(izip(xs, ys), 1, None))
[(1.0, 1.0), (2.0, 2.0)]
Zip, btw, is its own inverse.
>>> zip(*zip([1,2,3], ['a', 'b', 'c']))
[(1, 2, 3), ('a', 'b', 'c')]
Going further, we can consolidate the toWKT and toJSON methods in a way
that also works very efficiently for lines or paths (using list() instead of next() in that case):
@property
def coords(self):
"""Iterate over x,y pairs.
>>> point = Point(0.0, 0.0)
>>> next(point.coords)
(0.0, 0.0)
"""
return izip(self.xs, self.ys)
def toWKT(self):
"""Returns a WKT representation of a point.
>>> point = Point(0.0, 0.0)
>>> print point.toWKT()
POINT (0.000000 0.000000)
"""
return 'POINT (%f %f)' % next(self.coords)
@property
def __geo_interface__(self):
"""Returns a GeoJSON-ish dict representation of a point for interop
with some other Python + GIS software.
>>> point = Point(0.0, 0.0)
>>> point.__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
"""
return dict(type='Point', coordinates=next(self.coords))
Now we've got 17 passing tests and still 100% coverage.
$ coverage run pygis_101.py
$ coverage report
Name Stmts Exec Cover
-------------------------------
pygis_101 31 31 100%
Next chance I get, I'll dig into some of my old blogged code to see what might
be better done in a more functional style.
Comments
Re: Iterators
Author: few people
I think ESRI has done away with the while: row = rows.next() stuff at 9.3.1: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=FeatureSets_and_RecordSets
Re: Iterators
Author: Mateusz Loskot
Sean, brilliant post! Finally, a voice clarity, correcness, robustness, discoverability, transparency, genericity, expressiveness, flexibility, consistency, meaning...
Quality MattersAny of the interested parties are considering it in the same manner?
I'm afraid, that's the end of story.
Re: Iterators
Author: Michael Weisman
Just a quick clarification, FME does support iterators. The following snippet will iterate over the parts of an aggregate feature and extract the coordinates for each part:
Re: Iterators
Author: Mateusz Loskot
And a bit of clarification from me, I was referring to the Open Source projects mentioned above. Unfortunately, SWIG is in the house over there.
Re: Iterators
Author: Sean
Great, Michael. I simply didn't find examples of iterators employed by FME users. I haven't seen examples of programmers using Howard Butler's neater ogr.py API either.
QGIS
Author: Martin Dobias
Thanks for the nice article. Recently I've been thinking about adding a more pythonic way of accessing features in PyQGIS... currently it's just a crude wrapper around c++ api. So thanks for the inspiration :)
Re: Iterators
Author: Howard Butler
ogr.py is not mine. ogr.py is Frank's API ;), but I think I added support for this at GDAL 1.6. I guess we haven't advertised it so well.