2007 (old posts, page 28)

Geocoding, GeoRSS, and KML for Plone 3

I've made the necessary configuration changes so that PleiadesGeocoder 1.0b2 works with Plone 2.5 or Plone 3.0. To deploy PleiadesGeocoder in a Plone 3 buildout, add geopy and simplejson to the buildout eggs in buildout.cfg:

[buildout]
...
eggs =
    elementtree
    geopy
    simplejson

and add the PleiadesGeocoder tarball to the list of product distro URLs:

...
[productdistros]
recipe = plone.recipe.distros
urls =
    http://gispython.org/downloads/zope/PleiadesGeocoder-1.0b2.tar.gz
...

Then run the buildout script (bin/buildout) again to update.

Thanks to David Siedband for the initial testing on Plone 3.

Kindle: meh

I'll add to the spasm of shallow posts about Kindle in the geo blogosphere: the Kindle does nothing for me. I like paper books. I like the user interface and I like the freedom. Mark Pilgrim captures Amazon's evolving view on this freedom with a couple of quotes:

When someone buys a book, they are also buying the right to resell that book, to loan it out, or to even give it away if they want. Everyone understands this.

Jeff Bezos, Open letter to the Authors Guild, 2002

You may not sell, rent, lease, distribute, broadcast, sublicense or otherwise assign any rights to the Digital Content or any portion of it to any third party, and you may not remove any proprietary notices or labels on the Digital Content. In addition, you may not, and you will not encourage, assist or authorize any other person to, bypass, modify, defeat or circumvent security features that protect the Digital Content.

Amazon, Kindle Terms of Service, 2007

Comments

Re: Kindle: meh

Author: James Fee

Well put, Sean. Why do they keep going back to the same model that hasn't worked for years? I enjoy the feel of paper books and won't be buying any "eBooks" anytime soon.

Re: Kindle: meh

Author: Andrew Larcombe

Maybe Bezos is just one of those guys who surrounds himself with 'yes' men? I've yet to meet *anyone* (outside of the usual Web2.0-Bay Area suspects) who thinks Kindle is anything but a crock of £$%^. And ugly with it.

That's About Right

Howard Butler sums up my motivations pretty well. The one factor he doesn't mention: community. There's a gap between the GDAL/OGR/MapServer/OSGeo and Python communities, which I jumped a while ago. Check out the list of GIS projects in the Python community's package index: Topic :: Scientific/Engineering :: GIS. There's not one OSGeo package there now; Howard's Python bindings for GDAL 1.5 will be the first. The GIS community is an inward-looking one, and the open source side of it looks outward only fractionally more.

Update: It's also true that the GDAL community's bindings aren't dead yet. As I pointed out previously, the new style bindings (to be standard in 1.5) are so much faster than the original bindings that everybody's a winner whether they switch to WorldMill or not.

WorldMill

Howard Butler pointed out that "Ogre" is already taken by a gaming graphics library, and "Refinery" seemed to imply that OGR is raw and crude. I've wanted a name that resonated with the sounds and smells of a machine shop. "Mill" is just about right.

Now, the new benchmark times. The script is in the Subversion repository: benchmark.py. I've added to the ogr.py block code equivalent to the extra (and extra handy) baggage carried by WorldMill: the marshaling of collection (OGR layer) schemas and feature data into Python structures. That work is compiled in WorldMill and interpreted with ogr.py, and the performance difference is obvious:

sean@lenny:~/Projects/WorldMill$ PYTHONPATH=src python benchmark.py
WorldMill (Cython)
8633.75 usec/pass

ogr.py (new bindings)
22740.06 usec/pass

Enough blogging, here it is: WorldMill 0.1.

Update: the new benchmarks involve a 50 record shapefile that's distributed with WorldMill, hence the timing differences from the previous benchmarks.
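
For reference, the extra marshaling added to the ogr.py side of the benchmark looks roughly like the sketch below. This is not the exact benchmark code (see benchmark.py in the repository for that), just an illustration using standard ogr.py calls; PATH and NAME are as in the earlier benchmark blocks.

import ogr

# Marshal the layer definition into a schema mapping
source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
defn = layer.GetLayerDefn()
schema = {}
for i in range(defn.GetFieldCount()):
    fd = defn.GetFieldDefn(i)
    schema[fd.GetName()] = fd.GetType()

# Then marshal each feature into a GeoJSON-like mapping
layer.ResetReading()
while 1:
    feature = layer.GetNextFeature()
    if not feature:
        break
    properties = {}
    for i in range(defn.GetFieldCount()):
        properties[defn.GetFieldDefn(i).GetName()] = feature.GetField(i)
    f = {'id': feature.GetFID(),
         'geometry': feature.GetGeometryRef().ExportToWkt(),
         'properties': properties}
    feature.Destroy()
source.Destroy()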

OGR, Ctypes, and Cython, Again

My previous benchmarks were made using the original recipe ogr.py module from GDAL 1.3.2. I saw some numbers that compelled me to try the new Python bindings from GDAL 1.4.3. Here are the new results (same benchmark code) on that same machine:

sean@lenny:~/Projects$ python ogre_benchmark.py
Refinery (with ctypes)
4994133.40 usec/pass

Ogre (Cython/Pyrex)
593522.91 usec/pass

ogr.py (next generation bindings)
594103.50 usec/pass

Howard Butler's new bindings for 1.4.3 are about 15 times faster at this task than the old bindings for 1.3.2. I had no idea that performance was so improved. That's great news for people who want to keep using programs based on ogr.py. You'll notice that Ogre itself is 15% faster today thanks to the conversion of some Python statements to C. I feel like I'm starting to know my way around Pyrex and Cython and can still squeeze out more performance.

Ogre is as quick as the latest and greatest ogr.py, and does more work at the same price. For each collection iteration, Ogre marshals the OGR layer definition into a collection schema and marshals every OGR feature into a GeoJSON-like data structure using the collection's object hook. Even with that overhead, Ogre is as fast as ogr.py. Remove the extras and it compares to ogr.py like this:

sean@lenny:~/Projects$ python ogre_benchmark.py
Ogre (no auto schema, no object hook)
443894.31 usec/pass

ogr.py (next generation bindings)
588154.32 usec/pass

That's 25% faster. Not bad at all, considering that better performance was never my motivation.
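
To make "GeoJSON-like" concrete, the structure Ogre hands back for each feature is roughly a mapping along these lines (a sketch of mine; the exact keys and geometry encoding may differ):

{'id': '1',
 'geometry': {'type': 'Point', 'coordinates': (-104.99, 39.74)},
 'properties': {'name': 'Denver', 'population': 566974}}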

OGR, Ctypes, Cython

Leave now if you're not into Python extension programming and performance benchmarking: we're headed deep into the weeds.

Iterating over the features of a collection (or layer, if you will) and operating on them is one of a geo-processor's most fundamental patterns. We exercise this pattern over and over, and one of my design goals for Refinery is to do it more easily and more efficiently than it can be done with ogr.py. Recent correspondence with a Cython user convinced me that I should give it a try as a potentially faster way to wrap libgdal. Cython compiles Pyrex code (which is more or less Python) into C, which can then be compiled into a Python extension.
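
For readers new to the toolchain, a Pyrex module looks like Python with optional C declarations. This isn't Ogre code, just a trivial illustration of what Cython compiles:

# example.pyx: Pyrex/Cython source, compiled to C and then built
# into a Python extension module
cdef int _square(int x):
    # a C-level function, not callable from Python
    return x * x

def square(x):
    # an ordinary Python-callable function wrapping the C one
    return _square(x)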

Last night I rewrote Refinery using Cython instead of ctypes and just finished getting the original tests to pass this morning. Is it faster? To find out, I benchmarked ogr.py, Refinery, and the new code I'm temporarily calling Ogre ("OGR, easier") using the timeit module. The Timer executes a block of statements in a manner that avoids the most common timing errors:

import timeit

t = timeit.Timer(stmt=STATEMENTS)
print "%.2f usec/pass" % (1000000 * t.timeit(number=10)/10)

The Refinery and Ogre statements are basically the same:

w = ogre.workspace(PATH)
c = w[NAME]
for f in c:
    id = f.id

PATH is the filesystem path to a shapefile with 3784 features, NAME is the name of the data collection (or layer), and all I'm doing inside the iteration block is dereferencing the feature's id. The equivalent block of statements using ogr.py is:

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
layer.ResetReading()
while 1:
    feature = layer.GetNextFeature()
    if not feature:
        break
    id = feature.GetFID()
    feature.Destroy()
source.Destroy()

And now, the results (on a single Celeron processor machine that runs the Pystone benchmarks in 2.71 seconds):

Refinery (with ctypes)
4972288.61 usec/pass

Ogre (Cython/Pyrex)
682301.81 usec/pass

ogr.py (old-school bindings)
9183181.72 usec/pass

Ogre takes about 180 microseconds per feature, roughly 7 times faster than Refinery and about 13 times faster than ogr.py. I didn't benchmark the gdal module from GeoDjango because I couldn't see anything that would make its performance much different from Refinery's. I've heard that the QGIS community also has Python bindings for OGR in the works, but I'm not set up to build or benchmark those.

Cython makes for a faster module, but there is a price. It has to be compiled against the GDAL lib, which is a pain in the neck on Windows. It's also non-standard, unlike ctypes (in the standard library since Python 2.5), which means one more package to deploy before you can even start to build the extension. And while developing in the Pyrex language beats writing an extension in C (I've been there), it's still not Python. I've become dependent on the "import pdb; pdb.set_trace()" trick, and it's not an option with Cython. Still, the higher performance is rather enticing.

Yes, the cost of processing a feature could easily overshadow the per-feature cost of iteration, but here again there's potential for Cython to help in combination with Refinery or Ogre. The collection iterators have an object hook to which one could bind functions from compiled extension modules.
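
As a sketch of what that could look like (the attribute and function names here are hypothetical, not Ogre's actual API):

# Bind a function to the collection's object hook so that each feature
# is transformed as it is marshaled during iteration. In practice the
# function could come from a Cython-compiled module.
def summarize(feature):
    # 'feature' is the GeoJSON-like mapping described in these posts
    return (feature['id'], len(feature['properties']))

w = ogre.workspace(PATH)    # PATH and NAME as in the blocks above
c = w[NAME]
c.object_hook = summarize   # attribute name is an assumption
results = [item for item in c]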

Curious? The code is here: Ogre-0.0.tar.gz.

Transactional Refinery

Talk about serendipity: repoze.tm. I was just thinking about data transactions for Refinery, and now there's a data manager interface I can implement. It's hard to keep up with the rush of fresh ideas coming out of Repoze.

WSGI aside, I think this would be a smooth transactional API:

>>> w = refinery.workspace('/tmp/data')
>>> collection = w['places']
>>> transaction = collection.begin()
>>> collection.features.append(feature)
>>> transaction.commit() # write to disk

Use an OGR in-memory feature store or a new Python structure for uncommitted features? That I don't yet know.
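
For the record, the data manager interface in question comes from the transaction package, which repoze.tm drives through two-phase commit. Below is a minimal sketch of what a Refinery-backed data manager could look like; every Refinery-side detail (the pending list, the write method) is an assumption, not settled design:

import transaction

class CollectionDataManager(object):
    """Hypothetical adapter between a Refinery collection and the
    transaction machinery."""

    def __init__(self, collection):
        self.collection = collection
        self.pending = []                  # uncommitted features

    def append(self, feature):
        self.pending.append(feature)
        transaction.get().join(self)       # enlist in the current transaction

    # methods called by the transaction manager
    def abort(self, txn):
        self.pending = []

    def tpc_begin(self, txn):
        pass

    def commit(self, txn):
        pass

    def tpc_vote(self, txn):
        pass                               # e.g. verify the store is writable

    def tpc_finish(self, txn):
        self.collection.write(self.pending)    # assumed Refinery method
        self.pending = []

    def tpc_abort(self, txn):
        self.pending = []

    def sortKey(self):
        return 'refinery' + str(id(self))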