Earlier this year I had a need to do some vector wrangling, an itch for a better Python API, and curiosity about Pyrex and Cython; egged on by Matt Perry, I wrote a package I called WorldMill. It's getting used out there, I'm pleased to say, and I'm getting feature requests and patches for feature attribute filters. The patch I'm holding right now enables filters using OGR's attribute filter language, a subset of SQL. For example, the following filter expressions passes all features with a numerical "value1" attribute greater than zero:
value1 > 0
It's a limited subset of SQL. You can't compare the values of different feature attributes, for one thing:
value1 > value2 # fail
Nevermind that. What concerns me right now as I face this WorldMill patch is that SQL may not be the right model at all for this sort of domain specific language. Certainly not for WorldMill, maybe not right for any modern abstract interface. Yes, GIS data still tends to be flat, tabular stuff -- shapefiles are still common and most organizations that have moved beyond them have advanced to a RDBMS (the ultimate state of being for data if you believe Martin and Paul). However, exceptional data is increasing. GML permits, even promotes, complex, nested data structures; GeoJSON does as well. KML and Atom are not tabular formats, and "live" on the Web, not in a RDBMS. The mere existence of SPARQL says to me that SQL doesn't cut it for non-tabular data. I'm not sure I want WorldMill to span the universe of non-RDBMS storage, but a fraction of it for sure, hence my lack of enthusiasm about SQL-based feature filters.