If you are interested in applying Agile software development methodology to geographic or GIS Web services, you may also be interested in this talk by a couple of ThoughtWorks gurus: Does my bus look big in this?. In the second half, after examining the faults of conventional enterprise SOA, Martin Fowler and Jim Webber make analogies from Agile development practices and benefits to using the Web/REST for agile deployment and integration.
Earlier this year I had a need to do some vector wrangling, an itch for a better Python API, and curiosity about Pyrex and Cython; egged on by Matt Perry, I wrote a package I called WorldMill. It's getting used out there, I'm pleased to say, and I'm getting feature requests and patches for feature attribute filters. The patch I'm holding right now enables filters using OGR's attribute filter language, a subset of SQL. For example, the following filter expression passes all features with a numerical "value1" attribute greater than zero:
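The expression itself would be something like "value1 > 0". Here's a minimal pure-Python sketch of what such a filter selects -- the dicts and the comprehension below are illustrative only, not WorldMill's or OGR's actual API:

```python
# An OGR-style attribute filter expression (a subset of SQL WHERE syntax)
expression = "value1 > 0"

# Illustrative stand-ins for feature attribute records; OGR evaluates the
# expression natively in C, but the selection semantics look like this:
features = [
    {"value1": 3.5},
    {"value1": -1.0},
    {"value1": 0.0},
]
passed = [f for f in features if f["value1"] > 0]
```

Only the first of the three records survives the filter.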
It's a limited subset of SQL. You can't compare the values of different feature attributes, for one thing:
Never mind that. What concerns me right now as I face this WorldMill patch is that SQL may not be the right model at all for this sort of domain specific language. Certainly not right for WorldMill, and maybe not right for any modern abstract interface. Yes, GIS data still tends to be flat, tabular stuff -- shapefiles are still common, and most organizations that have moved beyond them have advanced to an RDBMS (the ultimate state of being for data, if you believe Martin and Paul). However, the exceptions are multiplying. GML permits, even promotes, complex, nested data structures; GeoJSON does as well. KML and Atom are not tabular formats, and "live" on the Web, not in an RDBMS. The mere existence of SPARQL says to me that SQL doesn't cut it for non-tabular data. I'm not sure I want WorldMill to span the universe of non-RDBMS storage, but a fraction of it for sure, hence my lack of enthusiasm about SQL-based feature filters.
This is P. strictus, perhaps the most beautiful perennial wildflower of the Mountain West, just beginning to bloom today. We've got several of these around the yard, grown from seed I collected near Granby, CO in 2006. The neighborhood bees are also big fans, and last year they inspired our little toddler to yell, "the bees are going in the tunnels"! Indeed. I'm rather pleased at how this shot turned out.
In the background you can see a crimson eruption of P. eatonii, a native of the Colorado Plateau.
In other news, my wonderful gig at UNC is up. I've got a break before my next one starts (continuing Pleiades and getting more into digital humanities), and plan to spend some of it on vacation, some of it in the garden, some of it getting back into home brewing (beer for sure, electronics maybe a little), and some of it on cool Web projects.
Andrew Turner's act of data liberation reminded me that I'd made a similar point at THATCamp. Web applications are often coupled directly to a database as shown on the left of the diagram below, and other applications on the Web that can't access the database must scrape data from the primary app (illustrated by a dashed scrape-scrape-scrape line). A better architectural design pattern is shown on the right of the diagram: use Atom or KML (especially for geographic apps) as a general purpose service layer to which many apps (including the cool apps of the future 2 and 3) can connect. The New York Public Library is one institution using this design, and my sense from attending the mashup session at THATCamp is that there will soon be others.
This is not entirely new to GIS architects. If you subscribe exclusively to the OGC service architecture, you would of course use a WFS instead, but Atom has the advantages of being more generic and more attuned to the architecture of the Web itself.
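To make the service-layer idea concrete, here's a minimal sketch of the kind of Atom entry with a GeoRSS point that apps 2 and 3 could consume instead of scraping. The id and coordinates are placeholders, and real feeds would carry more elements (updated, author, links):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"

# Build a bare-bones Atom entry carrying a GeoRSS point. A full feed
# would wrap entries like this in an atom:feed element.
entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}id" % ATOM).text = "urn:uuid:example-1"  # placeholder
ET.SubElement(entry, "{%s}title" % ATOM).text = "A place of interest"
ET.SubElement(entry, "{%s}point" % GEORSS).text = "40.58 -105.08"

serialized = ET.tostring(entry, encoding="unicode")
```

Any feed reader or mashup that speaks Atom and GeoRSS can consume this without knowing anything about the database behind it.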
Following up on the OSGeo Python API discussion (nothing new there), I stumbled onto this warning that Atom and JSON may be sealing our doom:
Some of the issues that are attracting a lot of effort are about simplifying spatial data (GeoRSS, GeoJSON, BXFS etc). These appear to be about catering to the 'pretty picture' use of spatial information.
GeoRSS and GeoJSON are about disseminating data effectively on the Web and designing for serendipitous reuse. GeoRSS is a hypertext format for building distributed applications. GeoJSON is a wire format for passing geometries between clients and servers of "Web 2.0" applications. Both of these roles will be increasingly important to future research applications, and GML is well suited to neither of them.
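The wire-format point is easy to demonstrate: a GeoJSON feature is just structured data that serializes and round-trips trivially. The values below are made up for illustration:

```python
import json

# A minimal GeoJSON feature as it might cross the wire between a
# "Web 2.0" client and server.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-105.08, 40.58]},
    "properties": {"title": "A place of interest", "value1": 3.5},
}

wire = json.dumps(feature)       # serialize for the response body
roundtrip = json.loads(wire)     # any JSON-speaking client can parse it
```

No schema compilation, no binding generation: any language with a JSON parser is a potential client.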
I'm regularly seeing serious efforts to address the analysis use of spatial data (e.g. GML 3 and complex features) ridiculed.
I have seen the complexity of GML ridiculed, yes, but I have never seen scientific analysis itself ridiculed by the same parties.
Meanwhile 2050 is fast approaching, if we are to believe the climate change predictions.
No, GML skepticism does not equate to ambivalence about science or the sustainability of our planet.
All of my non-humanities readers know that there is historical content on Wikipedia cribbed from previously published information, but may not be aware of other "crowd-sourcing" scholarship that is producing entirely new historical information. Stephen Mihm's article in the Boston Globe highlights several different projects using the Internet and its communities to examine American history in new ways. It's too bad there aren't more crowds of ancient Greeks and Romans around today to source like this.
Sooner or later. If you don't believe me, read Dare Obasanjo:
... Back in the RSS vs. Atom days I used to get frustrated that people were spending so much time reinventing the wheel with an RSS clone when the real gaping hole in the infrastructure was a standard editing protocol. It took a little longer than I expected (Sam Ruby started talking about it in 2003) but the effort has succeeded way beyond my wildest dreams. All I wanted was a standard editing protocol for blogs and content management systems and we've gotten so much more.
Structurally, GIS data is pretty mundane, tabular stuff, and a good fit for AtomPub. Unless the tooling coming out of Microsoft turns out to be vapor and Google's push to become the ITCZ of cloud computing bonks, AtomPub will be ubiquitous. You'll be using it for sound, photos, video, contacts, social data -- and you'll be using it for spatial data too.
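What would that look like for spatial data? Here's a sketch of an AtomPub create (RFC 5023): POST an Atom entry to a collection URI. The collection endpoint and slug below are hypothetical, and we only build the request rather than send it:

```python
# An Atom entry with a GeoRSS point, the body of an AtomPub POST.
entry_body = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:georss="http://www.georss.org/georss">
  <title>A new waypoint</title>
  <georss:point>40.58 -105.08</georss:point>
</entry>"""

# Request headers for creating a member in an AtomPub collection.
# The Slug header (RFC 5023, section 9.7) hints at the new member's URI.
headers = {
    "Content-Type": "application/atom+xml;type=entry",
    "Slug": "a-new-waypoint",
}

# POSTing this to a collection URI would, on success, get back
# "201 Created" and a Location header for the new member resource.
```

The same protocol machinery that creates a blog post creates a waypoint; that's the ubiquity argument in miniature.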
Looking at the xISBN service docs tonight while running Pleiades data import tests, I see that there is support for "REST-ful" short URLs in version 1 and cool URIs in version 2. That seems pretty cool to me, so let's try one out:
sean@lenny:~$ curl -v http://xisbn.worldcat.org/webservices/xid/isbn/0596002815
> GET /webservices/xid/isbn/0596002815 HTTP/1.1
> Host: xisbn.worldcat.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml;charset=UTF-8
< Content-Length: 305
< Date: Wed, 28 May 2008 06:04:12 GMT
<
<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
  <isbn>0596002815</isbn>
  <isbn>1565928938</isbn>
  <isbn>1565924649</isbn>
  <isbn>0596513984</isbn>
  <isbn>2841770893</isbn>
  <isbn>1600330215</isbn>
  <isbn>8371975961</isbn>
</rsp>
Looks good (links would be even better), but that "stat" attribute on the rsp element smells a little fishy ... let's try a bogus ISBN:
sean@lenny:~$ curl -v http://xisbn.worldcat.org/webservices/xid/isbn/05960bogus
> GET /webservices/xid/isbn/05960bogus HTTP/1.1
> Host: xisbn.worldcat.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml;charset=UTF-8
< Content-Length: 103
< Date: Wed, 28 May 2008 06:13:38 GMT
<
<?xml version="1.0" encoding="UTF-8" ?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="invalidId"/>
The response has a status code of 200, which should mean success, but there's an error code specific to xISBN in the representation: "invalidId". The underlying RPC nature of xISBN leaks through the cool URI abstraction in this situation. That request has to result in a 404 ("Not Found") if xISBN is going to be RESTful, right?
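The practical cost shows up in client code: with an in-band status, a 200 isn't enough and every client has to parse XML before it knows whether the request succeeded. A minimal sketch (the helper function is mine, not part of any xISBN client library):

```python
import xml.etree.ElementTree as ET

def xisbn_ok(http_status, body):
    """True only if HTTP says 200 AND the rsp element says stat="ok"."""
    if http_status != 200:
        return False
    # Clients must parse the body to learn the real outcome.
    rsp = ET.fromstring(body)
    return rsp.get("stat") == "ok"

good = '<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok"/>'
bad = '<rsp xmlns="http://worldcat.org/xid/isbn/" stat="invalidId"/>'

# A RESTful design would return 404 for the bogus ISBN, letting clients
# branch on http_status alone -- no XML parsing needed for the error path.
```

Generic HTTP infrastructure (caches, monitors, libraries) can act on a 404; none of it can see "invalidId" buried in a 200 body.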
Speaking of curl, Mark Nottingham points out a potential pitfall and collects a bunch of useful comments here.
Did James Fee really reboot? I'm not seeing a difference.
It's weird what an influence his site has on the informal discourse of the GIS business. It's hugely popular with the GIS community. I've heard from James that the traffic is big. It's my greatest referrer by far, about 3x what I get from Google searches. I'm not sure how a new blogger could have gotten found in the past few years without emailing James. I've depended on Planet Geospatial to find new bloggers (Dan Shoutis and Regina Obe, to name a couple -- thanks, James!), but finding this signal in all the noise is increasingly frustrating. Only a very small volume of Planet Geospatial interests me, so little that my Greasemonkey script has become useless -- it often wipes the page clean.
(Note: I've probably benefited from Planet Geospatial as much as anyone: it was Planet Me-o-spatial for a while after the last reboot because he, Andrew Turner, and Howard Butler, IIRC, weren't posting nearly as much as I was at the time. A lot of those readers have stuck around (subscribed, even), if only for the spectacle of it -- watching me rip on OSGeo, tear into the OGC's service architecture, bite off chicken heads, or whatever.)
James has said he doesn't want to be a "gatekeeper", and maybe we can spread the load around a bit more. New referring planets have appeared in my logs: Planet OSGeo (funny, because I'm possibly the only unaffiliated open source person in the world), Planet PerryGeo, the Spatial Galaxy planet. I'd like to see new bloggers email the admins of those sites as well and give folks like me new and different places to discover new blogs.