Martin Davis's post reminds me that the GIS industry, or at least the open source corner of it, still trails the Web community in thinking about data. I teased Paul Ramsey about baking in the assumption of SQL last summer. There wasn't a presentation at FOSS4G 2007 that questioned the RDBMS paradigm. N = 1 thinking prevails.
James Fee has been having fun with buzzwords and phrases. "GeoWeb" (or "Geospatial Web", or "Geospatial Semantic Web") is the one that really gets to me. I've used it a bit, and always feel like a snake oil salesman when I do. It's marketing: anybody who uses it without scare quotes is probably trying to sell you something.
The meaning of the term "World Wide Web" has also become diluted with time, but it does have a documented architecture that geeks like me can rally to. I know people who hate to read this, and feel that it verges on FUD, but the GIS industry has not been big on Web architecture. Our services are essentially (I'm singling out WFS here) relational database create, read, update, and delete (CRUD) operations wrapped in XML, tunneling over HTTP. True, a lot of the conventional non-geo Web is like this too, but it's the resources that link and provide hypertext navigation that make the Web into a singular application.
On the brighter side, I do think that KML is putting the "Web" back in "GeoWeb" by encouraging people to develop geospatial applications that use the architecture of the Web. Google obviously benefits from growing the Web (there's the marketing element again), but a return to Web architecture is also a win for the rest of us.
A Chicago Crime retrospective.
Frugosapalooza is a series of open source GIS meetups on the Front Range of Colorado. Brian Timoney has one set for Denver (19 Feb), I have a tentative date (26 or 27 Feb) for one here in Fort Collins, and Boulder and Colorado Springs are the other likely candidates. Watch the wiki page for details.
It's great to see more people in the geospatial community thinking outside the SOAP box. Still, there's more to REST than HTTP and plain old XML (POX) or JSON. Fuzzy's service architecture, typified by requests like:
```
GET /v1/ws_geo_getextent.php?geotable=places&srid=4269&parameters=id=10 HTTP/1.1
Host: webservices.example.com
```
is the venerable REST-RPC hybrid explained in Chapter 1 of RESTful Web Services. This architecture is okay for read-only services, but doesn't easily accommodate creation, update, or deletion of resources (it doesn't actually expose any resources at all), and sets you up for troubles like those of SimpleDB. Even if you dodge the pitfall of:
```
GET /v1/ws_geo_deleterow?geotable=places&parameters=id=10 HTTP/1.1
Host: webservices.example.com
```

D'oh!
by properly using HTTP DELETE, you've still lost the uniform interface to your resources, and you don't have a RESTful architecture.
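To make the contrast concrete, here's a sketch that only builds request objects with the standard library's urllib.request.Request (Python 3; nothing is sent over the network). The /v1/places/10 URI scheme is my hypothetical example, not Fuzzy's API. In the resource-oriented version the URI names a place and the HTTP method carries the operation; in the REST-RPC hybrid the operation name hides inside the URI.

```python
from urllib.request import Request

BASE = "http://webservices.example.com"

# REST-RPC hybrid: the operation ("deleterow") lives in the URI, and the
# request is a plain GET (no data, no explicit method).
rpc_delete = Request(BASE + "/v1/ws_geo_deleterow?geotable=places&parameters=id=10")

# Resource-oriented alternative (hypothetical URI scheme): one URI per place,
# with the uniform HTTP methods supplying the verb.
place = BASE + "/v1/places/10"
read = Request(place, method="GET")
update = Request(place, data=b'{"NAME": "Crossroads"}', method="PUT",
                 headers={"Content-Type": "application/json"})
delete = Request(place, method="DELETE")

# Nothing is sent; just show which method/URI pair each request would use.
for req in (rpc_delete, read, update, delete):
    print(req.get_method(), req.full_url)
```

Every resource in the second style answers the same small set of methods, which is what a uniform interface buys a client.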
Seems like a cool project. A little restraint in hyping REST is all I'm suggesting.
Like Shapely (1.0 release last Friday; toot, toot!), Rtree is designed to be a specialized, highly-reusable Python interface to an industrial-strength library. It doesn't do formats. It doesn't do projections. It's not a CGI program. It's a building block that does one thing well and otherwise stays out of your way. It indexes spatial data and provides query mechanisms, and that's all it does.
I've added Python 2.5 to the buildout. It adds a minute or two to the build time, but gives you a much more isolated environment in which to jam. I also found a work-around for the issue reported in zc.buildout bugs 110133 and 138260: building WorldMill requires Cython in the working set of the custom python, something that cannot be accomplished using zc.recipe.egg. What I did was use Kai's hexagonit.recipe.download to fetch the Cython source, and then iw.recipe.cmd to install it into the buildout's custom python. See the cython-src and cython-install sections in buildout.cfg. If you've already fetched Gdawg once, I recommend you discard it and clone a fresh copy.
I just added Shapely 1.0 and Rtree 0.4 to the Gdawg buildout, where they join WorldMill 0.1. Together they create a friendly environment on the CPython platform where you can read GIS feature data, spatially index it, and manipulate its geometries. (Sorry, Windows users are out of luck until the next Rtree and WorldMill releases. Patches are welcome.)
Again, getting Mercurial (Hg) is as easy as running "sudo apt-get install mercurial"
on a Debian/Ubuntu system. Check the downloads page for other installers (Gdawg does build on a Mac). After you've installed Hg, clone my repo, and build it out:
```
$ hg clone http://sgillies.net/sgillies/hg/gdawg my-gdawg
$ cd my-gdawg
$ python bootstrap.py
$ ./bin/buildout
```
It could take up to 15 minutes to build GEOS and GDAL. In the meantime, grab some data to play with. I downloaded the Zillow Colorado neighborhoods and extracted them into /tmp to see if they lived up to the hype. When the buildout script finishes, start up the custom Python interpreter.
I need to make a funny preamble someday, but there's the Python prompt. To begin, let's create a WorldMill workspace:
```
>>> from mill import workspace
>>> ws = workspace('/tmp/ZillowNeighborhoods-CO.zip_FILES')
>>> ws
<mill.workspace.Workspace object at ...>
```
Which allows me to use the only Italian I haven't yet forgotten: va bene. A workspace is a mapping of collections, as you can see here:
Access the neighborhoods collection and inspect it briefly:
```
>>> co = ws['ZillowNeighborhoods-CO']
>>> co
<mill.collection.Collection object at ...>
>>> len(co)
95
>>> co.schema
[('STATE', 4), ('COUNTY', 4), ('CITY', 4), ('NAME', 4), ('REGIONID', 2)]
```
95 neighborhoods (of note) in Colorado, 5 attributes per feature, all of them strings except for REGIONID, which is an int. Let's look now at the first neighborhood feature:
```
>>> x = co['0']
>>> x
<mill.feature.Feature object at ...>
>>> x.id
'0'
>>> x.properties['NAME']
'Crossroads'
>>> x.properties['CITY']
'Boulder'
```
Hmm, they misspelled "Shelbyville". If you were to access x.geometry at the prompt, you'd get a small binary flood: unless an object hook has been specified, feature geometry is expressed as WKB with long/lat coordinates. Let's now set an object hook for this geometry so that we get features with Shapely geometries:
```
>>> from mill.feature import Feature
>>> from shapely.wkb import loads
>>> def shapely_feature(id, properties, wkb):
...     return Feature(id, properties.copy(), loads(wkb))
...
>>> co.object_hook = shapely_feature
```
The only requirement on the object hook is that it be a callable with 3 positional parameters. Now, get the first feature again:
```
>>> x = co['0']
>>> x.id
'0'
>>> x.geometry
<shapely.geometry.polygon.Polygon object at ...>
>>> x.geometry.bounds
(-105.26320656676199, 40.010885702215496, -105.243224298828, 40.038423410870699)
```
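WorldMill's internals aren't shown here, so what follows is only a toy model of the object-hook pattern under my own assumptions: ToyCollection, ToyFeature, point_feature, and read_wkb_point are hypothetical names, not WorldMill or Shapely API. The collection keeps raw (properties, WKB) records. Without a hook you get the binary flood; with a hook set, it hands three positional arguments to your callable. To stay self-contained it decodes only 2D WKB Points with the struct module, whereas shapely.wkb.loads handles every geometry type.

```python
import struct


def read_wkb_point(wkb):
    """Decode a 2D WKB Point (geometry type 1) into an (x, y) tuple."""
    order = "<" if wkb[0] == 1 else ">"  # byte 0: 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError("not a 2D WKB Point: type %d" % geom_type)
    return struct.unpack_from(order + "dd", wkb, 5)


class ToyFeature:
    def __init__(self, fid, properties, geometry):
        self.id = fid
        self.properties = properties
        self.geometry = geometry


class ToyCollection:
    """Hypothetical stand-in for a feature collection with an object hook."""

    def __init__(self, records):
        # records maps a string feature id to a (properties, wkb) pair
        self._records = records
        self.object_hook = None

    def __len__(self):
        return len(self._records)

    def __getitem__(self, fid):
        properties, wkb = self._records[fid]
        if self.object_hook is not None:
            # The hook gets exactly three positional arguments.
            return self.object_hook(fid, properties, wkb)
        # No hook: the geometry stays as raw WKB bytes.
        return ToyFeature(fid, properties, wkb)


def point_feature(fid, properties, wkb):
    """An object hook that copies the properties and decodes the WKB point."""
    return ToyFeature(fid, properties.copy(), read_wkb_point(wkb))


# Usage with a hand-built little-endian WKB point (made-up coordinates)
wkb = struct.pack("<BIdd", 1, 1, -105.25, 40.02)
co = ToyCollection({"0": ({"NAME": "Crossroads"}, wkb)})
print(len(co["0"].geometry))  # 21 bytes of raw WKB
co.object_hook = point_feature
print(co["0"].geometry)       # (-105.25, 40.02)
```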
And that's how you integrate Shapely and WorldMill. Now, how about spatially indexing the neighborhoods using Rtree? First, create a named index that will be persisted on disk next to the shapefile data:
```
>>> from rtree import Rtree
>>> index = Rtree('/tmp/ZillowNeighborhoods-CO.zip_FILES/ZillowNeighborhoods-CO')
```
Then iterate over features in the collection, adding each to the index in turn:
Pretty fast, eh? Now let's find some of the "Crossroads" feature's neighbors by putting its bounding box back into an intersection query:
The index returns Python longs, but we can get the corresponding features from the collection like so:
```
>>> neighborhoods = [co[str(uid)] for uid in index.intersection(b)]
>>> from pprint import pprint
>>> names = [n.properties['NAME'] for n in neighborhoods]
>>> pprint(names)
['North Boulder',
 'Colorado University',
 'Crossroads',
 'Southeast Boulder',
 'East Boulder',
 'Palo Park',
 'Central Boulder']
```
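Rtree wraps a real R-tree library, and I'm not reproducing that here. As a rough illustration of what adding bounding boxes and querying them means, here is a brute-force stand-in: the NaiveIndex class is hypothetical, not Rtree's API or implementation, and it stores (id, bounds) pairs in a list and scans all of them on every query. Bounds use the (minx, miny, maxx, maxy) order of Shapely's bounds attribute, and the boxes are made-up values near Boulder.

```python
class NaiveIndex:
    """Brute-force bounding-box index: every query scans every entry."""

    def __init__(self):
        self._entries = []  # list of (item_id, (minx, miny, maxx, maxy))

    def add(self, item_id, bounds):
        minx, miny, maxx, maxy = bounds
        if minx > maxx or miny > maxy:
            raise ValueError("bounds must be (minx, miny, maxx, maxy)")
        self._entries.append((item_id, (minx, miny, maxx, maxy)))

    def intersection(self, bounds):
        """Yield ids of entries whose boxes intersect or touch bounds."""
        qminx, qminy, qmaxx, qmaxy = bounds
        for item_id, (minx, miny, maxx, maxy) in self._entries:
            # Boxes are disjoint only if one lies wholly to one side of the other.
            if not (maxx < qminx or minx > qmaxx or maxy < qminy or miny > qmaxy):
                yield item_id


index = NaiveIndex()
index.add(0, (-105.26, 40.01, -105.24, 40.04))
index.add(1, (-105.30, 40.00, -105.26, 40.03))  # shares an edge with box 0
index.add(2, (-105.20, 39.90, -105.15, 39.95))  # well to the southeast
print(list(index.intersection((-105.26, 40.01, -105.24, 40.04))))  # [0, 1]
```

An R-tree answers the same question while skipping most of the entries, which is the whole point of using one on bigger datasets.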
Incidentally, there don't appear to be any Fort Collins neighborhoods:
```
>>> index.intersection((-105.09, 40.58, -105.08, 40.59))
>>> [f.id for f in co.all if f.properties['COUNTY'] == 'Larimer']
```
which is fine because we don't really need anybody else moving here unless they are going to open a nice old world bakery or cheese shop downtown.
If I may say so, Shapely, Rtree, and WorldMill are just about the best trio since D. Boon, George Hurley, and Mike Watt.
By Simon Willison: Django People. Using GeoDjango, maybe? I'm tempted to smash together a Grok version.
Oh yeah, we get White-crowned Sparrows too, for a week or two later in the spring.
I want to reassure Paul that I am in fact feeding seeds to the little birds, so here is a picture of McNutty the Red-breasted Nuthatch, which I took just a few minutes ago with my little PowerShot. He's giving me that annoyed look because he spent the night outside at -5 °F and I'm sticking the camera right in his face, 4 inches away (or approximately 20 centimeters according to my NASA Mars Mission unit conversion table). These little birds are fearless. If I had their agility, I probably would be too. Never mind Spider-Man: Nuthatchman could easily kick his creepy arachnoid ass.