Feature query languages

Earlier this year I had a need to do some vector wrangling, an itch for a better Python API, and curiosity about Pyrex and Cython; egged on by Matt Perry, I wrote a package I called WorldMill. It's getting used out there, I'm pleased to say, and I'm getting feature requests and patches for feature attribute filters. The patch I'm holding right now enables filters using OGR's attribute filter language, a subset of SQL. For example, the following filter expressions passes all features with a numerical "value1" attribute greater than zero:

value1 > 0

It's a limited subset of SQL. You can't compare the values of different feature attributes, for one thing:

value1 > value2  # fail

Nevermind that. What concerns me right now as I face this WorldMill patch is that SQL may not be the right model at all for this sort of domain specific language. Certainly not for WorldMill, maybe not right for any modern abstract interface. Yes, GIS data still tends to be flat, tabular stuff -- shapefiles are still common and most organizations that have moved beyond them have advanced to a RDBMS (the ultimate state of being for data if you believe Martin and Paul). However, exceptional data is increasing. GML permits, even promotes, complex, nested data structures; GeoJSON does as well. KML and Atom are not tabular formats, and "live" on the Web, not in a RDBMS. The mere existence of SPARQL says to me that SQL doesn't cut it for non-tabular data. I'm not sure I want WorldMill to span the universe of non-RDBMS storage, but a fraction of it for sure, hence my lack of enthusiasm about SQL-based feature filters.

What are the alternatives? MapServer expressions? Too quirky. GQL? Not much point since App Engine already has an API. OGC XML filters or a Python embodiment thereof? Good grief, no. Python expressions like I used in PCL? These days I prefer something more generic. Could Javascript be it? Fits GeoJSON like a glove, obviously; works in the browser; it's the view/query language for CouchDB, a popular non-relational data store (seriously, check out CouchDB views and tell me that isn't a neat paradigm), so it's available on the server or in the cloud as well. The Python-Spidermonkey project may have reawakened just in time for me. At any rate, I'm very curious how other folks evaluate filter and query languages.

Comments

Re: Feature query languages

Author: Alex Willmer

Perhaps css selectors (as used in jquery) or xquery expressions. Whatever you choose, better to copy an existing syntax than create something new.

CouchDB

Author: Stefano Costa

Just yesterday I was trying to understand whether CouchDB could be used to store geothings and their non-geo attributes. Its non relational and distributed model looks very attractive for a number of reasons. I found this general introduction http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-and-geodata%3A2008-05-03%3Aen%2Cgeo%2CCouchDB but it's not that interesting because the author doesn't give any reference to actual implementations of CouchDB as geodatabase. If I had more experience both with CouchDb and geodatabases probably I could investigate by myself...

Re: Feature query languages

Author: Martin Davis

Great minds think alike (but not always in the same language). I've been working on a SQL-based query language which can leverage all the great Java geospatial APIs out there. It's called JEQL - web site is http://tsusiatsoftware.net/jeql/main.html I chose the follow the SQL paradigm for several reasons: - familiar, mature, proven utility - good harmony with important projects such as PostGIS - I think the SQL paradigm has a lot of juice in it even when processing hierarchical data. See for instance Oracle's handling of XML data - maybe not quite as pretty as you'd like, but quite powerful. Also see the ability to express queries over relational hierarchies. I'm hopeful that this paradigm could be extended to allow querying over full graphs - which is required for some kinds of geometric processing. - SQL has some ugly warts which people always stumble over, but I was curious to see whether these could be resolved with better language functionality. So far I'm happy that they can... However, like you I do have an eye out for a better query paradigm which would provide cleaner handling of hierachical data. So far the best one I've seen is XQuery. However, IMO it is too bogged down in XML syntactic details. I want something which is representation-independent. XSLT is even worse - no-one should be forced to code in XML! As far as I can see SPARQL is both too limited and too general - it operates over RDF triples, but can process arbitrary graph structures. It's not clear to me that this would improve the ability to handle tabular data, let alone hierarchical data. The problem with very general languages is that their expressiveness is necessarily limited. Also, if I understand correctly SPARQL itself has fairly limited power - it needs an associated inference engine to provide more smarts. Fascinating subject, though. It seems suspicious to me that after alll these years no more powerful query language than SQL (or at least the relational paradigm) has gained widespread acceptance. Of course, this is not a proof that no such thing exists - but it makes me very cautious about spending a lot of energy trying to invent such a thing myself.

Re: Feature query languages

Author: Martin Davis

Another comment, on using Javascript (or any procedural language) as a filter or query language. Seems to me this has a few potential issues: - you are tied to the semantics of the underlying language for your data model - the expressions are not declarative (and thus hard to optimize) - you end up with a mixture of language syntax db40 has used a similar approach in Java. I think they do some clever tricks to do some optimization, but you're never going to get the kinds of deep optimization that any good SQL engine does.

Re: Feature query languages

Author: Sean

Martin, JEQL looks interesting. I read the CSV to KML example. It occurs to me that a DSL based on Ruby (JRuby in your case) might be more appreciated for all kinds of reason these days. I agree with you that SQL hits the sweet spot for relational data, but it's less useful for more loosely structured data, and it seems like there will be more and more of that in the future. I'd like to be able to manipulate loosely structured data in a more natural (even elegant) way than loading it into a RDBMS for processing and exporting the results.