Guerrilla SOA

If you are interested in applying Agile software development methodology to geographic or GIS Web services, you may also be interested in this talk by a couple of ThoughtWorks gurus: "Does my bus look big in this?" In the second half, after examining the faults of conventional enterprise SOA, Martin Fowler and Jim Webber draw analogies between the practices and benefits of Agile development and using the Web/REST for agile deployment and integration.

Feature query languages

Earlier this year I had a need to do some vector wrangling, an itch for a better Python API, and curiosity about Pyrex and Cython; egged on by Matt Perry, I wrote a package I called WorldMill. It's getting used out there, I'm pleased to say, and I'm getting feature requests and patches for feature attribute filters. The patch I'm holding right now enables filters using OGR's attribute filter language, a subset of SQL. For example, the following filter expression passes all features with a numerical "value1" attribute greater than zero:

value1 > 0

It's a limited subset of SQL. You can't compare the values of different feature attributes, for one thing:

value1 > value2  # fail
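For contrast, here is a minimal sketch of the same two filters written as plain Python predicates over GeoJSON-like feature dicts. This is not WorldMill's API or the patch in question: the `features` list and the `select` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical sample data: GeoJSON-like dicts, not WorldMill objects.
features = [
    {"id": "a", "properties": {"value1": 3, "value2": 1}},
    {"id": "b", "properties": {"value1": -2, "value2": 5}},
    {"id": "c", "properties": {"value1": 4, "value2": 9}},
]


def select(features, predicate):
    """Return the features for which predicate(feature) is true."""
    return [f for f in features if predicate(f)]


# Same meaning as the filter expression "value1 > 0".
positive = select(features, lambda f: f["properties"]["value1"] > 0)

# Comparing two attributes of one feature, the case marked "# fail" above.
greater = select(
    features,
    lambda f: f["properties"]["value1"] > f["properties"]["value2"],
)

print([f["id"] for f in positive])  # ['a', 'c']
print([f["id"] for f in greater])   # ['a']
```

The trade-off is the usual one: a predicate like this can express anything, but a storage backend cannot inspect or optimize it the way it can a declarative expression.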

Never mind that. What concerns me right now as I face this WorldMill patch is that SQL may not be the right model at all for this sort of domain specific language. Certainly not for WorldMill, maybe not right for any modern abstract interface. Yes, GIS data still tends to be flat, tabular stuff -- shapefiles are still common and most organizations that have moved beyond them have advanced to an RDBMS (the ultimate state of being for data if you believe Martin and Paul). However, exceptional data is increasing. GML permits, even promotes, complex, nested data structures; GeoJSON does as well. KML and Atom are not tabular formats, and "live" on the Web, not in an RDBMS. The mere existence of SPARQL says to me that SQL doesn't cut it for non-tabular data. I'm not sure I want WorldMill to span the universe of non-RDBMS storage, but a fraction of it for sure, hence my lack of enthusiasm about SQL-based feature filters.

What are the alternatives? MapServer expressions? Too quirky. GQL? Not much point since App Engine already has an API. OGC XML filters or a Python embodiment thereof? Good grief, no. Python expressions like I used in PCL? These days I prefer something more generic. Could JavaScript be it? Fits GeoJSON like a glove, obviously; works in the browser; it's the view/query language for CouchDB, a popular non-relational data store (seriously, check out CouchDB views and tell me that isn't a neat paradigm), so it's available on the server or in the cloud as well. The Python-Spidermonkey project may have reawakened just in time for me. At any rate, I'm very curious how other folks evaluate filter and query languages.
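To make the CouchDB comparison a little more concrete, here is a rough Python imitation of a map-style view over nested, non-tabular features. Real CouchDB views are written in JavaScript and call `emit(key, value)`; the `run_view` helper, the `by_tag` map function, and the sample documents below are all assumptions for illustration, not CouchDB or WorldMill API.

```python
# Hypothetical documents with nested, non-tabular properties.
docs = [
    {"_id": "1",
     "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
     "properties": {"name": "Site A", "tags": ["town", "west"]}},
    {"_id": "2",
     "geometry": None,  # no location, so the view skips it
     "properties": {"name": "Site B", "tags": ["town"]}},
    {"_id": "3",
     "geometry": {"type": "Point", "coordinates": [-79.0, 36.0]},
     "properties": {"name": "Site C", "tags": ["town"]}},
]


def by_tag(doc):
    """Map function: yield one (key, value) row per tag of a located doc."""
    if doc.get("geometry") is None:
        return
    for tag in doc["properties"].get("tags", []):
        yield tag, doc["properties"]["name"]


def run_view(map_func, docs):
    """Apply map_func to every doc and return the rows sorted by key."""
    rows = [row for doc in docs for row in map_func(doc)]
    return sorted(rows)


rows = run_view(by_tag, docs)
for key, value in rows:
    print(key, value)
```

One document can produce zero, one, or many rows, which is the part of the paradigm that a WHERE clause over a flat table does not capture.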

Comments

Re: Feature query languages

Author: Alex Willmer

Perhaps CSS selectors (as used in jQuery) or XQuery expressions. Whatever you choose, better to copy an existing syntax than create something new.

CouchDB

Author: Stefano Costa

Just yesterday I was trying to understand whether CouchDB could be used to store geothings and their non-geo attributes. Its non-relational and distributed model looks very attractive for a number of reasons. I found this general introduction http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-and-geodata%3A2008-05-03%3Aen%2Cgeo%2CCouchDB but it's not that interesting because the author doesn't give any reference to actual implementations of CouchDB as a geodatabase. If I had more experience with both CouchDB and geodatabases I could probably investigate by myself...

Re: Feature query languages

Author: Martin Davis

Great minds think alike (but not always in the same language). I've been working on a SQL-based query language which can leverage all the great Java geospatial APIs out there. It's called JEQL - web site is http://tsusiatsoftware.net/jeql/main.html

I chose to follow the SQL paradigm for several reasons:

- familiar, mature, proven utility
- good harmony with important projects such as PostGIS
- I think the SQL paradigm has a lot of juice in it even when processing hierarchical data. See for instance Oracle's handling of XML data - maybe not quite as pretty as you'd like, but quite powerful. Also see the ability to express queries over relational hierarchies. I'm hopeful that this paradigm could be extended to allow querying over full graphs - which is required for some kinds of geometric processing.
- SQL has some ugly warts which people always stumble over, but I was curious to see whether these could be resolved with better language functionality. So far I'm happy that they can...

However, like you I do have an eye out for a better query paradigm which would provide cleaner handling of hierarchical data. So far the best one I've seen is XQuery. However, IMO it is too bogged down in XML syntactic details. I want something which is representation-independent. XSLT is even worse - no-one should be forced to code in XML!

As far as I can see SPARQL is both too limited and too general - it operates over RDF triples, but can process arbitrary graph structures. It's not clear to me that this would improve the ability to handle tabular data, let alone hierarchical data. The problem with very general languages is that their expressiveness is necessarily limited. Also, if I understand correctly SPARQL itself has fairly limited power - it needs an associated inference engine to provide more smarts. Fascinating subject, though.

It seems suspicious to me that after all these years no more powerful query language than SQL (or at least the relational paradigm) has gained widespread acceptance. Of course, this is not a proof that no such thing exists - but it makes me very cautious about spending a lot of energy trying to invent such a thing myself.

Re: Feature query languages

Author: Martin Davis

Another comment, on using JavaScript (or any procedural language) as a filter or query language. Seems to me this has a few potential issues:

- you are tied to the semantics of the underlying language for your data model
- the expressions are not declarative (and thus hard to optimize)
- you end up with a mixture of language syntax

db4o has used a similar approach in Java. I think they do some clever tricks to do some optimization, but you're never going to get the kinds of deep optimization that any good SQL engine does.

Re: Feature query languages

Author: Sean

Martin, JEQL looks interesting. I read the CSV to KML example. It occurs to me that a DSL based on Ruby (JRuby in your case) might be more appreciated for all kinds of reasons these days. I agree with you that SQL hits the sweet spot for relational data, but it's less useful for more loosely structured data, and it seems like there will be more and more of that in the future. I'd like to be able to manipulate loosely structured data in a more natural (even elegant) way than loading it into an RDBMS for processing and exporting the results.

Penstemonium

This is P. strictus, perhaps the most beautiful perennial wildflower of the Mountain West, just beginning to bloom today. We've got several of these around the yard, grown from seed I collected near Granby, CO in 2006. The neighborhood bees are also big fans, and last year inspired our little toddler to yell, "the bees are going in the tunnels"! Indeed. I'm rather pleased at how this shot turned out.

http://farm4.static.flickr.com/3190/2574374006_c16c1f45c1_d.jpg

In the background you can see a crimson eruption of P. eatonii, a native of the Colorado Plateau.

In other news, my wonderful gig at UNC is up. I've got a break before my next one starts (continuing Pleiades and getting more into digital humanities), and plan to spend some of it on vacation, some of it in the garden, some of it getting back into home brewing (beer for sure, electronics maybe a little), and some of it on cool Web projects.

Comments

Re: Penstemonium

Author: mpd

Don't mix electronics and beer!

Re: Penstemonium

Author: Sean

I got an introduction to Arduino at THATCamp and I'm hooked. I can think of a number of ways to use it in a nano-brewery.

Re: Penstemonium

Author: mpd

Electronics first, beer second is OK. The other way round is not so good.

Atom as service oriented architecture

Andrew Turner's act of data liberation reminded me that I'd made a similar point at THATCamp. Web applications are often coupled directly to a database as shown on the left of the diagram below, and other applications on the Web that can't access the database must scrape data from the primary app (illustrated by a dashed scrape-scrape-scrape line). A better architectural design pattern is shown on the right of the diagram: use Atom or KML (especially for geographic apps) as a general purpose service layer to which many apps (including the cool apps of the future 2 and 3) can connect. The New York Public Library is one institution using this design, and my sense from attending the mashup session at THATCamp is that there will soon be others.

http://sgillies.net/images/atom-guerrilla-soa.png

This is not entirely new to GIS architects. If you subscribe exclusively to the OGC service architecture, you would of course use a WFS instead, but Atom has the advantages of being more generic and more attuned to the architecture of the Web itself.
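As a rough idea of what one record in such a service layer might look like, here is a sketch that builds an Atom entry carrying a GeoRSS-Simple point with Python's standard library. The `make_entry` helper, the `urn:example` identifier, and the coordinates are made up for illustration, and the result is not a complete, validated entry (a standalone entry would also need an author, for instance).

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"

# Serialize Atom as the default namespace and GeoRSS with a "georss" prefix.
ET.register_namespace("", ATOM)
ET.register_namespace("georss", GEORSS)


def make_entry(entry_id, title, updated, lat, lon):
    """Build an Atom entry element carrying a GeoRSS-Simple point."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # GeoRSS-Simple points are written "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    return entry


# Hypothetical record, as a database-backed app might expose it.
entry = make_entry(
    "urn:example:place:1", "Example place", "2008-06-01T00:00:00Z", 40.0, -105.0
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Any feed reader or mapping client that understands Atom and GeoRSS could consume entries like this without knowing anything about the database behind them.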

Comments

Re: Atom as service oriented architecture

Author: Bryan

... which is of course why I'm trying to wrap my GML (and most everything else) in an atom service layer ... Spot on.

I'm sabotaging the fight for a sustainable climate?

Following up on the OSGeo Python API discussion (nothing new there), I stumbled onto this warning that Atom and JSON may be sealing our doom:

Some of the issues that are attracting a lot of effort are about simplifying spatial data (GeoRSS, GeoJSON, BXFS etc). These appear to be about catering to the 'pretty picture' use of spatial information.

GeoRSS and GeoJSON are about disseminating data effectively on the Web and designing for serendipitous reuse. GeoRSS is a hypertext format for building distributed applications. GeoJSON is a wire format for passing geometries between clients and servers of "Web 2.0" applications. Both of these features will be increasingly important to future research applications, and GML is well suited to neither of them.
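As a small illustration of the wire-format role, here is a sketch of a GeoJSON feature making a round trip through Python's `json` module, standing in for a server and a client. The feature, its title, and its coordinates are made up.

```python
import json

# A hypothetical feature; GeoJSON positions are [x, y], i.e. longitude first.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
    "properties": {"title": "Example place"},
}

wire = json.dumps(feature)    # the text a server would send
received = json.loads(wire)   # the structure a client would get back

lon, lat = received["geometry"]["coordinates"]
print(received["properties"]["title"], lon, lat)
```

Nothing beyond a JSON parser is needed on either end, which is much of the format's appeal for browser-based applications.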

I'm regularly seeing serious efforts to address the analysis use of spatial data (e.g. GML 3 and complex features) ridiculed.

I have seen the complexity of GML ridiculed, yes, but I have never seen scientific analysis itself ridiculed by the same parties.

Meanwhile 2050 is fast approaching, if we are to believe the climate change predictions.

No, GML skepticism does not equate to ambivalence about science or the sustainability of our planet.

Comments

Re: I'm sabotaging the fight for a sustainable climate?

Author: Chester Latham

The only problem I've had with global climate data is researchers and governments not making it available. You'd think they'd want the sun shining on their important information.

Re: I'm sabotaging the fight for a sustainable climate?

Author: Bryan

Chester: Exactly what global climate data can't you get? Try http://www.ipcc-data.org/ As usual this particular issue manages to get my attention. I'm depressed but having my cake and eating it too!

Re: I'm sabotaging the fight for a sustainable climate?

Author: Sean

Bryan, you are exactly right in that post. Atom and GML aren't exclusive.

Everyone's a historian now

All of my non-humanities readers know that there is historical content on Wikipedia cribbed from previously published information, but may not be aware of other "crowd-sourcing" scholarship that is producing entirely new historical information. Stephen Mihm's article in the Boston Globe highlights several different projects using the Internet and its communities to examine American History in new ways. It's too bad there aren't more crowds of ancient Greeks and Romans around today to source like this.

Via Dan Cohen, who I missed at the NEH/CNR last October, but get to meet this weekend at THATCamp.

AtomPub will drink WFS's milkshake

Sooner or later. If you don't believe me, read Dare Obasanjo:

... Back in the RSS vs. Atom days I used to get frustrated that people were spending so much time reinventing the wheel with an RSS clone when the real gaping hole in the infrastructure was a standard editing protocol. It took a little longer than I expected (Sam Ruby started talking about in 2003) but the effort has succeeded way beyond my wildest dreams. All I wanted was a standard editing protocol for blogs and content management systems and we've gotten so much more.

Structurally, GIS data is pretty mundane, tabular stuff, and a good fit for AtomPub. Unless the tooling coming out of Microsoft turns out to be vapor and Google's push to become the ITCZ of cloud computing bonks, AtomPub will be ubiquitous. You'll be using it for sound, photos, video, contacts, social data -- and you'll be using it for spatial data too.
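For a sense of how little is involved on the client side, here is a sketch of the request an AtomPub client makes to create a new member: a POST of an Atom entry to the collection's URI. The collection URL and the entry are hypothetical, and the request is only built, not sent.

```python
import urllib.request

COLLECTION_URL = "http://example.com/places/"  # hypothetical collection URI

# A made-up entry with a GeoRSS-Simple point ("latitude longitude").
entry_xml = (
    '<entry xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:georss="http://www.georss.org/georss">'
    "<title>Example place</title>"
    "<georss:point>40.0 -105.0</georss:point>"
    "</entry>"
)


def new_member_request(collection_url, entry_xml):
    """Build (but do not send) the POST that asks a collection for a new member."""
    return urllib.request.Request(
        collection_url,
        data=entry_xml.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


request = new_member_request(COLLECTION_URL, entry_xml)
print(request.get_method(), request.full_url)
```

A server that accepts the entry answers 201 Created with a Location header naming the new member, which the client can later GET, PUT, or DELETE.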

Comments

Re: AtomPub will drink WFS's milkshake

Author: Roberto

Unless (...) Google's push to become the ITCZ of cloud computing bonks...
Made my day! :)

xISBN and REST

Looking at the xISBN service docs tonight while running Pleiades data import tests, I see that there is support for "REST-ful" short URLs in version 1 and cool URIs in version 2. That seems pretty cool to me, so let's try one out:

sean@lenny:~$ curl -v http://xisbn.worldcat.org/webservices/xid/isbn/0596002815
> GET /webservices/xid/isbn/0596002815 HTTP/1.1
> Host: xisbn.worldcat.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml;charset=UTF-8
< Content-Length: 305
< Date: Wed, 28 May 2008 06:04:12 GMT
<
<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
<isbn    >0596002815</isbn>
<isbn    >1565928938</isbn>
<isbn    >1565924649</isbn>
<isbn    >0596513984</isbn>
<isbn    >2841770893</isbn>
<isbn    >1600330215</isbn>
<isbn    >8371975961</isbn>
</rsp>

Looks good (links would be even better), but that "stat" attribute on the rsp element smells a little fishy ... let's try a bogus ISBN:

sean@lenny:~$ curl -v http://xisbn.worldcat.org/webservices/xid/isbn/05960bogus
> GET /webservices/xid/isbn/05960bogus HTTP/1.1
> Host: xisbn.worldcat.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml;charset=UTF-8
< Content-Length: 103
< Date: Wed, 28 May 2008 06:13:38 GMT
<
<?xml version="1.0" encoding="UTF-8" ?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="invalidId"/>

The response has a status code 200, which should mean success, but there's an error code specific to xISBN in the representation: "invalidId". The underlying RPC nature of xISBN leaks through the cool URI abstraction in this situation. That request has to result in a 404 ("Not Found") if xISBN is going to be RESTful, right?
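Here is a sketch of what a client is left doing in the meantime: parsing the body to find out whether a 200 really means success. The `interpret` helper and the stat-to-status mapping are assumptions. Only "ok" and "invalidId" appear in the responses above, and treating any other value as a 500 is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

NS = "{http://worldcat.org/xid/isbn/}"

# Assumed mapping from xISBN's "stat" attribute to the HTTP status a
# RESTful service might have sent instead of a blanket 200.
STATUS_FOR_STAT = {"ok": 200, "invalidId": 404}


def interpret(body):
    """Return (suggested HTTP status, list of ISBNs) for a response body."""
    root = ET.fromstring(body)
    stat = root.get("stat")
    isbns = [el.text for el in root.findall(NS + "isbn")]
    # Unknown stat values fall back to 500; that default is a guess.
    return STATUS_FOR_STAT.get(stat, 500), isbns


# Bodies abbreviated from the two curl sessions above.
good = b"""<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
<isbn    >0596002815</isbn>
<isbn    >1565928938</isbn>
</rsp>"""
bad = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="invalidId"/>"""

print(interpret(good))
print(interpret(bad))
```

With a proper 404, none of this parsing would be needed to detect the failure; the status line alone would tell any HTTP client, cache, or intermediary what happened.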

Speaking of curl, Mark Nottingham points out a potential pitfall and collects a bunch of useful comments here.

Comments

Re: xISBN and REST

Author: erich

xISBN successfully found a result for the bogus isbn, and the result was that it was bogus. You have a problem with that???

Planet Geospatial

Did James Fee really reboot? I'm not seeing a difference.

It's weird what an influence his site has on the informal discourse of the GIS business. It's hugely popular with the GIS community. I've heard from James that the traffic is big. It's my greatest referrer by far, about 3x what I get from Google searches. I'm not sure how a new blogger would get found in the past few years without emailing James. I've depended on Planet Geospatial to find new bloggers (Dan Shoutis and Regina Obe, to name a couple -- thanks, James!), but finding this signal in all the noise is increasingly frustrating. Only a very small volume of Planet Geospatial interests me, so little that my Greasemonkey script has become useless -- it often wipes the page clean.

(Note: I've probably benefited from Planet Geospatial as much as anyone: it was Planet Me-o-spatial for a while after the last reboot because he, Andrew Turner, and Howard Butler, IIRC, weren't posting nearly as much as I was at the time. A lot of those readers have stuck around (subscribed, even), if only for the spectacle of it -- watching me rip on OSGeo, tear into the OGC's service architecture, bite off chicken heads, or whatever.)

James has said he doesn't want to be a "gatekeeper", and maybe we can spread the load around a bit more. New referring planets have appeared in my logs: Planet OSGeo (funny, because I'm possibly the only unaffiliated open source person in the world), Planet PerryGeo, the Spatial Galaxy planet. I'd like to see new bloggers email the admins of those sites as well and give folks like me new and different places to discover new blogs.

Comments

Re: Planet Geospatial

Author: Matt Perry

I didn't know anyone had actually found that planetperrygeo page... honestly I had kind of forgotten about it. Good to see the cron job is still working! Maybe now I'll have some motivation to fix that CSS.

Re: Planet Geospatial

Author: James Fee

The first rule of Planet Geospatial is - you do not talk about Planet Geospatial.

Re: Planet Geospatial

Author: Sean

You could remove me ... I'm sorta curious about the impact that would have. I suspect most of the audience would applaud.

Re: Planet Geospatial

Author: James Fee

And what, be left with Google Maps satellite photo updates?

Re: Planet Geospatial

Author: Sean

And yes, blogging about Planet Geospatial is lame. I won't make a habit of it.

Re: Planet Geospatial

Author: Regina

Sean, Glad to see someone reads my blog. Makes me feel inspired to update it. I have to admit I was feeling kind of depressed after being taken off of James' list.