2007 (old posts, page 9)

Google JSON and Geo

Interesting, but Google's JSON, literally transcribed from Atom, is almost exactly as cumbersome as XML (example here). Apparently, it's all about evading the browser's same-origin policy. Google should throw its weight behind Douglas Crockford's JSONRequest proposal.

My Comment Policy

Recent events, summarized by Tim O'Reilly here, have prompted me to spell out a policy for comments on this blog. I encourage people to write critically about our community, industry, and media, but I reserve the right to remove pointless flamebait or personal attacks as well as spam. If an email address accompanies extremely poor commentary, I'll contact the author personally about making sensible edits or retractions. If not, I'll simply delete it. Fair enough?

Update: changed the wording a bit so as not to point at anyone. The comments I get here provide much, if not all, of my blog's value. I won't delete non-spam comments lightly, but I have no obligation to give jerks a voice here.

Comments

Re: My Comment Policy

Author: Jason Birch

No comment.

Re: My Comment Policy

Author: Andy

It's your blog; you can do whatever you want with it. I'm not half as nice as you are: I delete comments that are even mildly annoying. Blogs are a way for us to get our thoughts and views out where others can see them. They aren't, in most cases, a running dialog with the readers. Comments are a privilege.

RESTful Feature APIs

Update (2007-04-10): a related demo.

Update: added examples of query results.

My previous post about WxS, RPC, and REST raised a few questions about whether queries fit into a RESTful GIS. The answer is: yes, queries remain indispensable. Indexes are a valuable part of your GIS, and a query API gives web agents access to those indexes.

Consider a very minimal municipal GIS which tracks properties, or parcels. Each parcel has many attributes, and the GIS indexes, at the very least, the following: a unique parcel id, the name of the parcel's owner, and the geospatial footprint of the parcel. These indexes allow a user to efficiently find all properties owned by an individual, or find all properties potentially impacted by construction along a particular path.

In a RESTful GIS, each parcel is a resource, and has a URL like:

http://example.org/parcels/[id]

Dereferencing the URL http://example.org/parcels/1 returns a representation, JSON in this case:

{ "id":       "1",
  "owner":    "Homer Simpson",
  "geometry": {
    "type":        "Polygon",
    "coordinates": [[...], ...] },
  ...
}
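A client can load that representation with any JSON library. For instance, in Python (the polygon coordinates below are made-up placeholders, since the original example elides them):

```python
import json

# The kind of representation served for http://example.org/parcels/1;
# the polygon coordinates here are invented for illustration.
body = '''
{ "id":       "1",
  "owner":    "Homer Simpson",
  "geometry": {
    "type":        "Polygon",
    "coordinates": [[[1000.0, 1000.0], [1001.0, 1000.0],
                     [1001.0, 1001.0], [1000.0, 1000.0]]] }
}
'''

parcel = json.loads(body)
# parcel["owner"] is "Homer Simpson"
```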

The "parcels" feature type is itself a resource. A useful representation of this resource would be a collection that includes URIs for, and data about, individual parcel resources -- all marshalled directly out of the GIS's indexes:

{ "parcels": [
  { "id": 1, "uri": "http://example.org/parcels/1",
    "owner": "Homer Simpson",
    "bbox": [1000.0, 1000.0, 1001.0, 1001.0] },
  { "id": 2, "uri": "http://example.org/parcels/2",
    "owner": "Ned Flanders",
    "bbox": [1001.0, 1001.0, 1002.0, 1002.0] },
  ...
  ]
}

That collection might easily include the precise footprints of properties, but we'll simply consider bounding boxes here.

A query API should return criteria-based subsets of that collection, leveraging the system's indexes. Which properties are going to be condemned to make way for the new monorail?:

GET /parcels/?bbox=0,0,2000,2000

The answer is: the parcels with URIs http://example.org/parcels/42 and http://example.org/parcels/83:

{ "parcels": [
  { "id": 42, "uri": "http://example.org/parcels/42",
    "owner": "Moe Szyslak",
    "bbox": [...] },
  { "id": 83, "uri": "http://example.org/parcels/83",
    "owner": "Kwik-E-Mart Corporation",
    "bbox": [...] }
  ]
}

Which properties are suffering catastrophic loss in value?:

GET /parcels/?adjacent(owner(Simpson))

The answer is: the parcel with URI http://example.org/parcels/2:

{ "parcels": [
  { "id": 2, "uri": "http://example.org/parcels/2",
    "owner": "Ned Flanders",
    "bbox": [...] }
  ]
}
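On the server side, a bbox query like the ones above might be handled as in the following sketch. This is a hypothetical, linear-scan illustration with invented parcel data; a real GIS would consult its spatial index instead:

```python
# Hypothetical sketch: answer GET /parcels/?bbox=... by filtering a
# parcel collection. A real GIS would use a spatial index, not a scan.

def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def query_parcels(parcels, bbox):
    """Build the collection representation for parcels intersecting bbox."""
    return {"parcels": [p for p in parcels
                        if bbox_intersects(p["bbox"], bbox)]}

# Toy data; these bbox values are made up for illustration.
parcels = [
    {"id": 42, "uri": "http://example.org/parcels/42",
     "owner": "Moe Szyslak", "bbox": (500.0, 500.0, 501.0, 501.0)},
    {"id": 7, "uri": "http://example.org/parcels/7",
     "owner": "C. M. Burns", "bbox": (9000.0, 9000.0, 9001.0, 9001.0)},
]
result = query_parcels(parcels, (0.0, 0.0, 2000.0, 2000.0))
# result["parcels"] holds only parcel 42
```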

The specific query parameters or URL templates to use are an implementation detail that I won't get into here (OpenSearch seems promising).

The gist of all this is that a RESTful feature query returns key, indexed data about feature resources, along with URIs to the features themselves -- in the same way that a Google search returns data from its index, with links, instead of dumping the entire Web into your browser.

Comments

Re: RESTful Feature APIs

Author: Paul Ramsey

What happens when I have 2000000 parcels? Unlike web pages, "databasey" resources don't get automatically scaled to be of reasonable size. Do I end up with a system where I only want people to talk to the data via the query mechanism, because anything else would be too clumsy? At that point, who cares that I have resources?

Re: RESTful Feature APIs

Author: Sean

Paul, I don't understand what you mean by automatic scaling, and I don't understand quite where your concern about large quantities comes from. Are you talking about querying against millions of parcels, or about rendering millions of parcels into an image?

Re: RESTful Feature APIs

Author: Paul Ramsey

I mean, when I have 2M parcels, doing GET /parcels no longer returns me something remotely useful -- it will be too big, too slow, or both. I am forced to use the query API to do any useful action with the data.

Re: RESTful Feature APIs

Author: Christopher Schmidt

Paul: I don't see a URL of /parcels/ on its own anywhere in this post. There's no need to get a list of everything: when you want a larger-than-one subset of the data, you do a query via the query mechanism. The point is that each parcel has *a* URL: so when I query ?bbox=0,0,10,10, I get a list of parcels back, which I can always address in the future to get all the information about *a* feature back. So the answer to your question is probably "yes": You always find the list of parcels you're interested in via the query mechanism, but once you have it, you can put it anywhere else you want. At least, that's what I understand. I don't have any GIS data to speak of. :)

Re: RESTful Feature APIs

Author: Jason Birch

I actually had the same question, and picked it up from this quote: "The 'parcels' feature type is itself a resource. A useful representation of this resource would be a collection that includes URIs for, and data about, individual parcel resources."

As much as I agree that it would ultimately be most useful if the parcels resource returned a complete list (it kinda reduces the discoverability of the resources if you don't), I can't see this working for me even with a small volume of parcels (35,000). In my playing around, I think what I'm going to do is return an HTML-formatted page, with OpenSearch links, and also with form elements describing all of the search APIs.

The only restriction with the OpenSearch links that I can see is that it assumes you apply a different URI for each content type that you return, rather than using an appropriate "Accept" header. The only workaround I can see is hacking an "&force_type=application/x-json" (or whatever the correct content type is) parameter onto the end of the string. This seems a bit RESTless though... I guess if this is combined with proper header sniffing for intelligent clients, it's acceptable?

I think that with this strategy it would be relatively easy to provide JSON, GeoRSS, and HTML (microformat too) versions of the individual resources. I think I'll also include alternate links in the HTML representations, pointing to the JSON and GeoRSS representations, and maybe also to an image/png representation for a quickie map of the parcel.

Hmm. For the JSON representation and search results, what kind of representation would work best for GeoJSON? EWKT?

Jason

Re: RESTful Feature APIs

Author: Sean

You're right that the HTML representation of a feature type shouldn't be a list of thousands or millions of links, but if you want to get into Google's spatial index of the Web you will need a representation that does list everything. In my case, a KML variant serves this purpose, and the default HTML is simply a list of links to subsets of the full listing.

The JSON content type is application/json, and consensus is building around geometry coordinates expressed as arrays (or arrays of arrays) of numeric values instead of WKT. I don't recommend getting too wound up in content negotiation until we have user agents that actually show a preference. Google Earth, for example, doesn't give KML a higher q value than it gives HTML.

Paging anyone?

Author: Mark Fredrickson

Why not just page the results? This is the technique that keeps HTML pages to a manageable size. Provide a URI for the next page of data as part of the collection, and client systems should be able to pull down the next page without too many problems. That's the great thing about URIs in a RESTful implementation - they can be used in countless ways because they're well understood. Of course, this is all in the abstract, but I think it's a good place to start looking.
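Sketched concretely, each page of the collection could carry a URI for the next page. The "next" key and the offset/limit parameters here are invented for illustration:

```python
# Hypothetical sketch: slice a large collection into pages, each
# carrying a URI for the next page, Atom partial-list style.

def page_of_parcels(parcels, offset=0, limit=100):
    page = {"parcels": parcels[offset:offset + limit]}
    if offset + limit < len(parcels):
        # The "next" key and query parameters are invented here.
        page["next"] = ("http://example.org/parcels/?offset=%d&limit=%d"
                        % (offset + limit, limit))
    return page

parcels = [{"id": i} for i in range(250)]
first = page_of_parcels(parcels)
# first has 100 parcels plus a "next" URI;
# the third page has the remaining 50 and no "next"
```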

Re: RESTful Feature APIs

Author: Mark Fredrickson

As an example of paging, take a look at the Atom spec (the canonical REST reference implementation): http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-14.html#partial-lists

Re: RESTful Feature APIs

Author: Sean

Yeah, that's what I've been looking at too.

Improving MapServer: a Specific Example

Earlier this month I made some hand-waving arguments for separating the concerns of MapServer's web application and cartographic engine on general principles. A new MapServer development proposal now lets me make the case with a specific example.

It's been proposed that the MapServer web app could generate vanilla XML responses to GetFeature, GetFeatureInfo, and other WxS requests, with user-configured XSLT providing a finishing touch to the responses. It's a nice idea. The early implementation plan was to load up existing configuration parameters with additional meaning: stylesheet as metadata, or something like that. Actually, specification of an XSLT stylesheet for transforming WxS responses has nothing to do with cartography, and nothing to do with map or layer metadata. It is solely a concern of mapserv, the web application. Therefore, let's configure it separately from the Map section of a mapfile. Transformation of a WFS GetFeature response might be done like this:

# The traditional cartographic config section
Map
  # All the usual map layers, etc.
  ...
End

# *NEW* application config section
Application
WFS
  GetFeature
    XSLT       On
    Stylesheet "/mapserver/xslt/example.xsl"
  End
End

That's declarative, orthogonal, and crystal clear.

GeoDjango

I met some of the GeoDjango folks this weekend, and am looking forward to collaborating with them. The Python Cartographic Library is not just for Zope3, after all.

Comments

Re: GeoDjango

Author: Allan

And I was just ogling TurboGears... hmmm... time for another look at Django.

GeoRSS and Antiquities

GeoRSS evangelism is part of what I do for Pleiades. The new support for GeoRSS in Google Maps is the spark that will set it off in the digital humanities. For example, check out this feed of Celtic coin finds from the British Museum's Celtic Coin Index:

http://www.finds.org.uk/CCI/functions/atom-export-all.php?denomination=Stater

Feed on the map

Comments

Re: GeoRSS and Antiquities

Author: Allan

Google Maps says: "Parts of http://www.finds.org.uk/CCI... could not be displayed because it is too large." There are 1309 items in the feed. I think this illustrates the difference between what's essentially a database expressed in GeoRSS and a "live" feed. A live feed might only have the 10 or so latest entries. That's not a problem for GeoRSS, but rather something to think about when designing feeds. As people start learning about GeoRSS, a set of conventions could arise to deal with this. Nevertheless, this is a great example and does indeed show off the power of the whole idea.

Re: GeoRSS and Antiquities

Author: Sean

I'm hoping that Google buys into the Atom Publishing Protocol and its specification for paging through large collections.

Re: GeoRSS and Antiquities

Author: Gregor J. Rothfuss

Nice :) Although, as a non-expert, I would love to see pictures of the actual coins in the info window. Sort of a treasure map.

Re: GeoRSS and Antiquities

Author: Dan

Hey chaps, I see you've commented on my dev feeds. I'm refining the work daily; the site isn't really live yet. Give it a month and it will be much more useful..... Coins do appear as pictures on the overlays I'm building (37k so far, another 20k images to add). I'm doing it by myself, so it is a bit of a drag.... The locations are also obfuscated slightly! Dan

Re: GeoRSS and Antiquities

Author: Sean

I apologize if the attention was unwanted, Dan. I just found it irresistibly cool.

Re: GeoRSS and Antiquities

Author: Dan

Sean, attention is great! The database is starting to take shape. I've now done some more coding, and custom feeds can be produced in various formats. Try this one out: http://maps.google.co.uk/maps?f=q&hl=en&q=http://www.finds.org.uk/CCI/functions/georss.php?geography1=BLA I've added image thumbnails into the point data as suggested by Gregor. Dan PS Keep up the good work at Pleiades....

Yahoo and GeoRSS

Author: Mark Fredrickson

Looks like Yahoo is on the GeoRSS bandwagon too, with their TagMaps API. While they don't quite follow the WFS BBOX style of bounding boxes, it still looks promising.

Toward a Better Python Feature API

Previously, I asserted that the Python Cartographic Library feature API was superior to anything generated trivially from C++ code (even excellent C++) by SWIG. Of course, even PCL's API can be improved. I've been inspired by Django's database abstraction API to experiment with something even easier to use. Friday night I hacked on PCL's GeoRSS module, and tied up loose ends this afternoon. See branches/PCL-newfeatureapi/PCL-GeoRSS.

Feature sources are absent from the new API. A feature type class has a query manager attribute, features, and methods of this object provide iterators over features. For example, let's find the items of a GeoRSS feed within a region of interest using a bounding box filter:

>>> url = 'http://pleiades.stoa.org/places/settlement.atom'
>>> store = GeoRSSFeatureStore(url)
>>> Entry = store.featuretype('entry')
>>> Entry.features.count
230

# Filter for intersection with a
# (29dE, 36dN, 30dE, 37dN) bounding box

>>> features = [f for f in Entry.features.filter(
...                        bbox=(29.0, 36.0, 30.0, 37.0)
...                        )
...            ]
>>> len(features)
62

# Inspect the first feature

>>> f = features[0]
>>> f.id
'http://pleiades.stoa.org/places/638749'
>>> f.properties.title
u'Antiphellos/Habesos'
>>> f.properties.the_geom.toWKT()
'POINT (29.6370000000000005 36.1931999999999974)'
>>> type(f.context)
<class 'feedparser.FeedParserDict'>

A GeoRSS feature's context is a reference to that item's parsed data structure. Everything feedparser can glean about the item (and that's nearly everything) is thereby available to a programmer.
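Under the hood, a bbox filter over feedparser entries might look something like the following sketch. The class, the "georss_point" key, and the toy entries are illustrative assumptions, not PCL's actual implementation:

```python
# Illustrative sketch of a query manager's bbox filter; class names
# and entry keys are hypothetical, not PCL's actual implementation.

class QueryManager:
    def __init__(self, entries):
        self.entries = entries

    @property
    def count(self):
        return len(self.entries)

    def filter(self, bbox):
        """Yield entries whose point lies in (minx, miny, maxx, maxy)."""
        minx, miny, maxx, maxy = bbox
        for entry in self.entries:
            # GeoRSS-Simple points are "lat lon" pairs
            lat, lon = (float(v) for v in entry["georss_point"].split())
            if minx <= lon <= maxx and miny <= lat <= maxy:
                yield entry

# Feedparser-style entries modeled as plain dicts
entries = [
    {"title": "Antiphellos/Habesos", "georss_point": "36.1932 29.6370"},
    {"title": "Fort Collins post",   "georss_point": "40.5870 -105.0958"},
]
features = QueryManager(entries)
hits = list(features.filter(bbox=(29.0, 36.0, 30.0, 37.0)))
# hits contains only the Antiphellos/Habesos entry
```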

Here's an example, using a better feed, of a Python filter expression that yields an iterator over only those items tagged "web":

>>> url = 'http://sgillies.net/blog/feeds/entries/'
>>> store = GeoRSSFeatureStore(url)
>>> Entry = store.featuretype('entry')
>>> Entry.features.count
31

>>> features = [f for f in Entry.features.filter(
...                        properties="'web' in f.tags"
...                        )
...            ]
>>> len(features)
12

>>> f = features[0]
>>> f.id
'http://sgillies.net/blog/entries/412'
>>> f.properties.title
u'GeoRSS and Validation'
>>> f.properties.tags
[u'web']
>>> f.properties.the_geom.toWKT()
'POINT (-105.0958300000000065 40.5869900000000001)'

# More detail about the tags, via the feature context

>>> tag = f.context.tags[0]
>>> tag.term
u'web'
>>> tag.scheme
u'http://sgillies.net/blog/categories/'
>>> tag.label
u'Web'

That's dirt simple. Following the Django lead, creating and saving new features ought to be as straightforward as:

>>> new_georss_entry = Entry(title=u'GeoRSS Everywhere', ...)
>>> new_georss_entry.save()
>>> Entry.features.count
32
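Those Django-style semantics could be mocked up against an in-memory store like this (a toy sketch under invented names, not PCL code):

```python
# Toy, in-memory sketch of Django-style feature creation; none of
# these classes are PCL's actual implementation.

class Features:
    """A stand-in query manager backed by a plain list."""
    def __init__(self):
        self._items = []

    @property
    def count(self):
        return len(self._items)

class Entry:
    features = Features()

    def __init__(self, **properties):
        self.properties = properties

    def save(self):
        # A real store would serialize the feature and POST it to the
        # feed; here we just append to the in-memory collection.
        Entry.features._items.append(self)

new_entry = Entry(title=u'GeoRSS Everywhere')
new_entry.save()
# Entry.features.count is now 1
```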

GeoRSS and Validation

Sam Ruby has discovered GeoRSS, and finds it a bit confusing. My blog entries feed validates perfectly, unlike the slashgeo feed Google references.

Update: Simon Willison finds GeoRSS too. When Google adopts anything, the effect is huge.

Comments

Validates perfectly?

Author: Sam Ruby

Do you mean this feed?

In particular, note the extra slash added to the declaration of the georss namespace.

Re: GeoRSS and Validation

Author: Sean

Sorry, I accidentally typed "comments" instead of "entries". Updated above. I'm fixing my comments feed now. I've pointed the GeoRSS principals to your blog entry, and they should be able to provide more interesting and valid feeds.

Re: GeoRSS and Validation

Author: Sam Ruby

Thanks for fixing your feed. Hopefully the Feed Validator will do its part to encourage consistency, which in turn should help fuel adoption.

Re: GeoRSS and Validation

Author: Simon Willison

I've been playing around with GeoRSS for a while - but it's really neat finally being able to paste a GeoRSS URL in to a form and have it display in a sensible way.

Re: GeoRSS and Validation

Author: Christopher Schmidt

Though I'm sure Simon won't read back this far, I would like to point out that -- at least for points -- there has been an ability to paste a GeoRSS URL into a form and have it display in a sensible way for months, via OpenLayers. Just go to the OpenLayers GeoRSS page and drop your URL in.

Re: GeoRSS and Validation

Author: Alexandre Leroux

Hi Sean, You're right. Slashgeo's GeoRSS feed plugin is still in development. We're fixing these issues right now. Thanks!