RESTful Feature APIs

Update (2007-04-10): a related demo.

Update: added examples of query results.

My previous post about WxS, RPC, and REST raised a few questions about whether queries fit into a RESTful GIS. The answer is: yes, queries remain indispensable. Indexes are a valuable part of your GIS, and a query API gives web agents access to those indexes.

Consider a very minimal municipal GIS that tracks properties, or parcels. Each parcel has many attributes, and the GIS indexes, at the very least, the following: a unique parcel id, the name of the parcel's owner, and the geospatial footprint of the parcel. These indexes let a user efficiently find all properties owned by an individual, or all properties potentially impacted by construction along a particular path.

In a RESTful GIS, each parcel is a resource, and has a URL like:

http://example.org/parcels/[id]

Dereferencing the URL http://example.org/parcels/1 returns a representation, JSON in this case:

{ "id":       "1",
  "owner":    "Homer Simpson",
  "geometry": {
    "type":        "Polygon",
    "coordinates": [[...], ...] },
  ...
}
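Dereferencing is a one-liner for a Python client. A minimal sketch, assuming the simplejson package (the example.org URLs are, of course, placeholders):

>>> import urllib
>>> import simplejson
>>> parcel = simplejson.load(urllib.urlopen('http://example.org/parcels/1'))
>>> parcel['owner']
u'Homer Simpson'
>>> parcel['geometry']['type']
u'Polygon'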

The "parcels" feature type is itself a resource. A useful representation of this resource would be a collection that includes URIs for, and data about, individual parcel resources -- all marshalled directly out of the GIS's indexes:

{ "parcels": [
  { "id": 1, "uri": "http://example.org/parcels/1",
    "owner": "Homer Simpson",
    "bbox": [1000.0, 1000.0, 1001.0, 1001.0] },
  { "id": 2, "uri": "http://example.org/parcels/2",
    "owner": "Ned Flanders",
    "bbox": [1001.0, 1001.0, 1002.0, 1002.0] },
  ...
  ]
}

That collection might easily include the precise footprints of properties, but we'll simply consider bounding boxes here.

A query API should return criteria-based subsets of that collection, leveraging the system's indexes. Which properties are going to be condemned to make way for the new monorail?:

GET /parcels/?bbox=0,0,2000,2000

The answer is: the parcels with URIs http://example.org/parcels/42 and http://example.org/parcels/83:

{ "parcels": [
  { "id": 42, "uri": "http://example.org/parcels/42",
    "owner": "Moe Szyslak",
    "bbox": [...] },
  { "id": 83, "uri": "http://example.org/parcels/83",
    "owner": "Kwik-E-Mart Corporation",
    "bbox": [...] }
  ]
}
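A client might make that query like so -- again a sketch, assuming urllib and simplejson:

>>> import urllib
>>> import simplejson
>>> query = urllib.urlencode({'bbox': '0,0,2000,2000'})
>>> result = simplejson.load(
...     urllib.urlopen('http://example.org/parcels/?' + query))
>>> [p['uri'] for p in result['parcels']]
[u'http://example.org/parcels/42', u'http://example.org/parcels/83']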

Which properties are suffering catastrophic loss in value?:

GET /parcels/?adjacent(owner(Simpson))

The answer is: the parcel with URI http://example.org/parcels/2:

{ "parcels": [
  { "id": 2, "uri": "http://example.org/parcels/2",
    "owner": "Ned Flanders",
    "bbox": [...] }
  ]
}

The specific query parameters or URL templates to use are an implementation detail that I won't get into here (OpenSearch seems promising).

The gist of all this is that a RESTful feature query returns key, indexed data about feature resources, along with URIs for the feature resources themselves -- in the same way that a Google search returns data from its index, with links, instead of dumping the entire Web into your browser.

Comments

Re: RESTful Feature APIs

Author: Paul Ramsey

What happens when I have 2000000 parcels? Unlike web pages, "databasey" resources don't get automatically scaled to be of reasonable size. Do I end up with a system where I only want people to talk to the data via the query mechanism, because anything else would be too clumsy? At that point, who cares that I have resources?

Re: RESTful Feature APIs

Author: Sean

Paul, I don't understand what you mean by automatic scaling, and I don't understand quite where your concern about large quantities comes from. Are you talking about querying against millions of parcels, or about rendering millions of parcels into an image?

Re: RESTful Feature APIs

Author: Paul Ramsey

I mean, when I have 2M parcels, doing GET /parcels no longer returns me something remotely useful -- it will be too big, too slow, or both. I am forced to use the query API to do any useful action with the data.

Re: RESTful Feature APIs

Author: Christopher Schmidt

Paul: I don't see a URL of /parcels/ on its own anywhere in this post. There's no need to get a list of everything: when you want a larger-than-one subset of the data, you do a query via the query mechanism. The point is that each parcel has *a* URL: so when I query ?bbox=0,0,10,10, I get a list of parcels back, which I can always address in the future to get all the information about *a* feature back. So the answer to your question is probably "yes": You always find the list of parcels you're interested in via the query mechanism, but once you have it, you can put it anywhere else you want. At least, that's what I understand. I don't have any GIS data to speak of. :)

Re: RESTful Feature APIs

Author: Jason Birch

I actually had the same question, and picked it up from this quote: "The 'parcels' feature type is itself a resource. A useful representation of this resource would be a collection that includes URIs for, and data about, individual parcel resources."

As much as I agree that it would ultimately be most useful if the parcels resource returned a complete list (it kinda reduces the discoverability of the resources if you don't), I can't see this working for me even with a small volume of parcels (35,000). In my playing around, I think what I'm going to do is return an HTML-formatted page, with OpenSearch links, and also with form elements describing all of the search APIs.

The only restriction with the OpenSearch links that I can see is that it assumes you apply a different URI for each content type that you return, rather than using an appropriate "Accept" header. The only workaround that I can see is hacking an "&force_type=application/x-json" (or whatever the correct content type is) parameter onto the end of the string. This seems a bit RESTless though... I guess if this is combined with proper header sniffing for intelligent clients, it's acceptable?

I think that with this strategy it would be relatively easy to provide JSON, GeoRSS, and HTML (microformat too) versions of the individual resources. I think I'll also include alternate links in the HTML representations, pointing to the JSON and GeoRSS representations, and maybe also to an image/png representation for a quickie map of the parcel.

Hmm. For the JSON representation and search results, what kind of representation would work best for geometries? GeoJSON? EWKT?

Jason

Re: RESTful Feature APIs

Author: Sean

You're right that the HTML representation of a feature type shouldn't be a list of thousands or millions of links, but if you want to get into Google's spatial index of the Web, you will need a representation that does list everything. In my case, a KML variant serves this purpose, and the default HTML is simply a list of links to subsets of the full listing. The JSON content type is application/json, and consensus is building around geometry coordinates expressed as arrays (or arrays of arrays) of numeric values instead of WKT. I don't recommend getting too wound up in content negotiation until we have user agents that actually show a preference. Google Earth, for example, doesn't give KML a higher q value than it gives HTML.
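For what it's worth, a client that did show a preference would only need to send an Accept header. A sketch with urllib2 and simplejson:

>>> import urllib2
>>> import simplejson
>>> req = urllib2.Request('http://example.org/parcels/1',
...                       headers={'Accept': 'application/json'})
>>> parcel = simplejson.load(urllib2.urlopen(req))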

Paging anyone?

Author: Mark Fredrickson

Why not just page the results? This is the technique that keeps HTML pages to a manageable size. Provide a URI for the next page of data as part of the collection, and client systems should be able to pull down the next page without too many problems. That's the great thing about URIs in a RESTful implementation: they can be used in countless ways because they're well understood. Of course, this is in the abstract, but I think it's a good place to start looking.

Re: RESTful Feature APIs

Author: Mark Fredrickson

As an example of paging, take a look at the Atom Publishing Protocol spec (the canonical REST reference implementation): http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-14.html#partial-lists

Re: RESTful Feature APIs

Author: Sean

Yeah, that's what I've been looking at too.
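A paging client could be as simple as this sketch, assuming a hypothetical "next" member in the collection representation (urllib and simplejson again):

>>> import urllib
>>> import simplejson
>>> def iter_parcels(url):
...     # Follow the collection's "next" URIs, yielding parcels
...     # from each page until there are no more pages
...     while url:
...         page = simplejson.load(urllib.urlopen(url))
...         for parcel in page['parcels']:
...             yield parcel
...         url = page.get('next')
...
>>> owners = [p['owner'] for p in iter_parcels('http://example.org/parcels/')]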

Improving MapServer: a Specific Example

Earlier this month I made some hand-waving arguments, on general principles, for separating the concerns of MapServer's web application and cartographic engine. A new MapServer development proposal now lets me give a specific example.

It's been proposed that the MapServer web app could generate vanilla XML responses to GetFeature, GetFeatureInfo, and other WxS requests, with user-configured XSLT providing the finishing touch. It's a nice idea. The early implementation plan was to load up existing configuration parameters with additional meaning: stylesheet as metadata, or something like that. But specification of an XSLT stylesheet for transforming WxS responses has nothing to do with cartography, and nothing to do with map or layer metadata. It is solely a concern of mapserv, the web application. Therefore, let's configure it separately from the Map section of a mapfile. Transformation of a WFS GetFeature response might be done like this:

# The traditional cartographic config section
Map
  # All the usual map layers, etc.
  ...
End

# *NEW* application config section
Application
  WFS
    GetFeature
      XSLT       On
      Stylesheet "/mapserver/xslt/example.xsl"
    End
  End
End

That's declarative, orthogonal, and crystal clear.
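The transformation step itself is no big deal. Here's a rough sketch of what mapserv would do internally once configured -- in Python with lxml purely for illustration (the real implementation would be C, and the response file name is hypothetical):

>>> from lxml import etree
>>> # Compile the user-configured stylesheet, parse the engine's
>>> # vanilla XML response, and apply the transformation
>>> transform = etree.XSLT(etree.parse('/mapserver/xslt/example.xsl'))
>>> response = etree.parse('getfeature-response.xml')
>>> print str(transform(response))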

GeoDjango

I met some of the GeoDjango folks this weekend, and am looking forward to collaborating with them. The Python Cartographic Library is not just for Zope3, after all.

Comments

Re: GeoDjango

Author: Allan

And I was just ogling TurboGears... hmmm... time for another look at Django.

GeoRSS and Antiquities

GeoRSS evangelism is part of what I do for Pleiades. The new support for GeoRSS in Google Maps is the spark that will set it off in the digital humanities. For example, check out this feed of Celtic coin finds from the British Museum's Celtic Coin Index:

http://www.finds.org.uk/CCI/functions/atom-export-all.php?denomination=Stater

Feed on the map

Comments

Re: GeoRSS and Antiquities

Author: Allan

Google Maps says: "Parts of http://www.finds.org.uk/CCI... could not be displayed because it is too large." There are 1309 items in the feed. I think this illustrates the difference between what's essentially a database expressed in GeoRSS and a "live" feed. A live feed might only have the 10 or so latest entries. That's not a problem for GeoRSS, but rather something to think about when designing feeds. As people start learning about GeoRSS, a set of conventions could arise to deal with this. Nevertheless, this is a great example and does indeed show off the power of the whole idea.

Re: GeoRSS and Antiquities

Author: Sean

I'm hoping that Google buys into the Atom Publishing Protocol and its specification for paging through large collections.

Re: GeoRSS and Antiquities

Author: Gregor J. Rothfuss

nice :) although as a non-expert, i would love to see pictures of the actual coins in the info window. sort of a treasure map.

Re: GeoRSS and Antiquities

Author: Dan

Hey chaps, I see you've commented on my dev feeds. I'm refining the work daily; the site isn't really live yet. Give it a month and it will be much more useful..... Coins do appear as pictures on the overlays I'm building (37k so far, another 20k images to add). I'm doing it by myself, so it is a bit of a drag.... The locations are also obfuscated slightly as well! Dan

Re: GeoRSS and Antiquities

Author: Sean

I apologize if the attention was unwanted, Dan. I just found it irresistibly cool.

Re: GeoRSS and Antiquities

Author: Dan

Sean, attention is great! The database is starting to take shape. I've now done some more coding, and custom feeds can be produced in various formats. Try this one out: http://maps.google.co.uk/maps?f=q&hl=en&q=http://www.finds.org.uk/CCI/functions/georss.php?geography1=BLA I've added image thumbnails into the point data as suggested by Gregor. Dan PS Keep up the good work at Pleiades....

Yahoo and GeoRSS

Author: Mark Fredrickson

Looks like Yahoo is on the GeoRSS bandwagon too with their TagMaps API. While they don't quite follow the WFS BBOX style of bounding boxes, it still looks promising.

Toward a Better Python Feature API

Previously, I asserted that the Python Cartographic Library feature API was superior to anything generated trivially from C++ code (even excellent C++) by SWIG. Of course, even PCL's API can be improved. I've been inspired by Django's database abstraction API to experiment with something even easier to use. Friday night I hacked on PCL's GeoRSS module, and tied up loose ends this afternoon. See branches/PCL-newfeatureapi/PCL-GeoRSS.

Feature sources are absent from the new API. A feature type class has a query manager attribute, features, and the methods of this object provide iterators over features. For example, let's find the items of a GeoRSS feed in a region of interest, using a bounding box filter:

>>> url = 'http://pleiades.stoa.org/places/settlement.atom'
>>> store = GeoRSSFeatureStore(url)
>>> Entry = store.featuretype('entry')
>>> Entry.features.count
230

# Filter for intersection with a
# (29dE, 36dN, 30dE, 37dN) bounding box

>>> features = [f for f in Entry.features.filter(
...                        bbox=(29.0, 36.0, 30.0, 37.0)
...                        )
...            ]
>>> len(features)
62

# Inspect the first feature

>>> f = features[0]
>>> f.id
'http://pleiades.stoa.org/places/638749'
>>> f.properties.title
u'Antiphellos/Habesos'
>>> f.properties.the_geom.toWKT()
'POINT (29.6370000000000005 36.1931999999999974)'
>>> type(f.context)
<class 'feedparser.FeedParserDict'>

A GeoRSS feature's context is a reference to that item's parsed data structure. Everything feedparser can glean about the item (and that's nearly everything) is thereby available to a programmer.

Here's an example, using a better feed, of a Python filter expression that provides an iterator over only the items tagged "web":

>>> url = 'http://sgillies.net/blog/feeds/entries/'
>>> store = GeoRSSFeatureStore(url)
>>> Entry = store.featuretype('entry')
>>> Entry.features.count
31

>>> features = [f for f in Entry.features.filter(
...                        properties="'web' in f.tags"
...                        )
...            ]
>>> len(features)
12

>>> f = features[0]
>>> f.id
'http://sgillies.net/blog/entries/412'
>>> f.properties.title
u'GeoRSS and Validation'
>>> f.properties.tags
[u'web']
>>> f.properties.the_geom.toWKT()
'POINT (-105.0958300000000065 40.5869900000000001)'

# More detail about the tags, via the feature context

>>> tag = f.context.tags[0]
>>> tag.term
u'web'
>>> tag.scheme
u'http://sgillies.net/blog/categories/'
>>> tag.label
u'Web'

That's dirt simple. Following the Django lead, creating and saving new features ought to be as straightforward as:

>>> new_georss_entry = Entry(title=u'GeoRSS Everywhere', ...)
>>> new_georss_entry.save()
>>> Entry.features.count
32

GeoRSS and Validation

Sam Ruby finds GeoRSS and finds it a bit confusing. My blog entries feed validates perfectly, unlike the slashgeo feed Google references.

Update: Simon Willison finds GeoRSS too. When Google adopts anything, the effect is huge.

Comments

Validates perfectly?

Author: Sam Ruby

Do you mean this feed?

In particular, note the extra slash added to the declaration of the georss namespace.

Re: GeoRSS and Validation

Author: Sean

Sorry, I accidentally typed "comments" instead of "entries". Updated above. I'm fixing my comments feed now. I've pointed the GeoRSS principals to your blog entry, and they should be able to provide more interesting and valid feeds.

Re: GeoRSS and Validation

Author: Sam Ruby

Thanks for fixing your feed. Hopefully the Feed Validator will do its part to encourage consistency, which in turn should help fuel adoption.

Re: GeoRSS and Validation

Author: Simon Willison

I've been playing around with GeoRSS for a while - but it's really neat finally being able to paste a GeoRSS URL in to a form and have it display in a sensible way.

Re: GeoRSS and Validation

Author: Christopher Schmidt

Though I'm sure Simon won't read back this far, I would like to point out that -- at least for points -- there has been an ability to paste a GeoRSS URL into a form and have it display in a sensible way for months, via OpenLayers. Just go to the OpenLayers GeoRSS page and drop your URL in.

Re: GeoRSS and Validation

Author: Alexandre Leroux

Hi Sean, You're right. Slashgeo's GeoRSS feed plugin is still in development. We're fixing these issues right now. Thanks!

Irrelevant

I saw this Linus Torvalds quote (full interview here) in the OpenGeoData blog:

Me, I just don't care about proprietary software. It's not "evil" or "immoral," it just doesn't matter. I think that Open Source can do better, and I'm willing to put my money where my mouth is by working on Open Source, but it's not a crusade -- it's just a superior way of working together and generating code.

It's superior because it's a lot more fun and because it makes cooperation much easier (no silly NDA's or artificial barriers to innovation like in a proprietary setting), and I think Open Source is the right thing to do the same way I believe science is better than alchemy. Like science, Open Source allows people to build on a solid base of previous knowledge, without some silly hiding.

But I don't think you need to think that alchemy is "evil." It's just pointless because you can obviously never do as well in a closed environment as you can with open scientific methods.

Exactly right.

Comments

sticking up for alchemy

Author: Brian Timoney

Somewhere the ghost of Isaac Newton is extremely pissed off at Linus' dissing of alchemy.... BT

Re: Irrelevant

Author: Sean

The ghost of Isaac Newton is too busy experimenting with quantum gravity to read blogs.

Re: Irrelevant

Author: Andy

If it is a superior way to develop software, why is the Linux desktop light years behind Windows and Mac? For something to be truly superior, it must be superior to what its competitors are currently offering. Linux isn't superior to Windows from the only perspective that really matters, the end user perspective, and it isn't superior to Mac OS X from an end user perspective either. So how is this a superior way to develop an OS?

There are some superior open source alternatives out there, such as MapServer, Apache, Firefox, and PostgreSQL, but by and large proprietary software and systems lead in just about every metric: CAD desktop, GIS desktop, desktop documents, inter-application communication and automation, fonts, UIs, general usability, SCADA systems, GPS systems, topo mapping software, routing software... I could go on for a long list of what proprietary software has that is better than its open source counterparts.

I work on open source projects, and I support them financially as well, because I believe in the underlying concept that information should be freely shared to improve the society in which we live. I don't support it because I believe it currently creates better software at the same pace as proprietary software companies can. Developers have to eat and support their families. Until open source can figure out how to coordinate large projects across hundreds of developers in a timely fashion, and pay them all good wages, open source software won't outpace the rate at which proprietary software puts out better solutions. The most successful open source projects have one person or a very small team of core developers working in close communication towards a common goal. When projects get larger than that, they fall behind their proprietary competitors or they fall apart completely.

In the end, open source the way it is currently done is a form of communism, and history has shown that communism doesn't work on a large scale no matter how noble its ideals are. Communism works in small groups like tribes, kibbutzes, etc., but it doesn't scale to a nation level. History has borne this out many times over. Incentive-based systems such as capitalism work better on a large scale, by far, than communism does. Open source today seems to be the same way: it works very well in small core groups and can produce outstanding results, but when it tries to scale to encompass huge projects it falls apart.

I believe the way to change this is to revamp the way open source is run so that it is no longer run in a communist fashion. The way to do that would be to have companies that develop software open up their source and take input on that source from the larger community, while still paying their developers and generating revenue as a company from sales of their software and from its maintenance. Then you would have an incentive-based system that still shared its knowledge freely. This is the way AT&T Bell Labs did many of its projects, and it worked very well at the time. PGP also works this way. I think it could work on a large scale, but I may never find out unless there is a radical shift in the way proprietary software companies work. It is sort of a catch-22: the key to making open source really work is in the hands of the companies that fear it the most. If we can get them to change their mindset, then I think the sky would be the limit, and open source in its new form would be a truly superior way to develop software.

In its current incarnation I don't believe open source is the best way to develop software on a large scale, but we can change this over time, and until we do get it changed it is still worth supporting, because knowledge should be free and we should work to make our society better for everyone, not just the ones who can afford it.

Re: Irrelevant

Author: Sean

There will eventually be excellent open source alternatives in every software category you listed. We're just getting started.

Re: Irrelevant

Author: Paul Ramsey

The Eclipse IDE is currently the best desktop integrated development environment (well, maybe Visual Studio is better, but regardless we are talking about a very close race). How can this be? No one is selling it. But there are lots of traditionally paid developers working for big companies working on it. Lots of different big companies too.

The "communist" label is just a big red (ha ha) herring designed to rattle Americans who have not gotten over the propaganda surrounding their previous Official Enemy.

Open source seems to flourish once a significant part of the marketplace decides that a particular piece of functionality is no longer useful as a product differentiator. Server operating systems (Linux), IDE/application frameworks (Eclipse), scripting languages (Python/Perl/PHP/Ruby/etc), web servers (Apache). Desktop operating system interfaces have innovated enough in the last five years (thanks, Apple!) to keep marginally ahead of their open source followers, but if they slow down for too long, they too will feel pain as "good enough" and free alternatives catch up. Oracle is vacating the database market as fast as it can, and moving into areas where it can offer real value, like business intelligence and CRM -- they see the writing on that particular wall.

There will always be a place for proprietary software in the niches, but this is a very long game, and the onus is on the proprietary companies to continuously improve their products to stay ahead of the game -- to deliver real value for money (like Apple does with OS/X). The days of locking down a customer base and charging monopoly rents ad infinitum are over.

Re: Irrelevant

Author: Andy

"The days of locking down a customer base and charging monopoly rents ad infinitum are over." This I agree with completely.

Re: Irrelevant

Author: Dave Smith

I wouldn't characterize either as superior. Each has its pros and cons. Certainly the collaborative aspect and low cost of open source make it fun and accessible, and provide a great deal of value and sustainability. However, end users have little control over QA problems and patches, little control over enhancements, and limited means of support; they either have to roll up their sleeves and fix or modify the product themselves, or wait for someone else to deal with it. That is fine if your project has the budget, capability, and resources for scratch-building things, but otherwise, for production end users, it causes some concern and risk.

On the other hand, some (but certainly not all) of that is alleviated with COTS products; however, here you are stuck with proprietary code, formats, APIs, high cost, and a host of other issues.

In the long run, these things tend to follow a cycle of commoditization: a piece of technology becomes less unique and more ubiquitous, at which point the proprietary pieces become irrelevant, as there are by then many open source pieces that have evolved as stable and low-risk to push the proprietary aside. At that point, the proprietary vendor needs to turn to modularization, cutting loose the commoditized pieces to turn its efforts to other pursuits. It's a dynamic and continually emergent process.

More ArcGIS and JSON

Again, found this from Jithen Singh while stalking keywords. Still no details about whether it's for geospatial features or other application data.

There is a fair bit of noise in the Technorati search feeds, but no more so than on Planet Geospatial. Maybe even less.

Update: now there are details from Matt Priour.

Comments

Re: More ArcGIS and JSON

Author: Brian Flood

AGS does not currently support REST, and they are only *considering* adding it to 9.3. I for one hope they do, and I made sure to tell as many ESRI folks as I could about it; the more chatter they hear, the more they will consider it. I think Matt's (excellent) post above is essentially rolling his own JSON support, which is definitely possible right now. I would just love it if ESRI baked it in by default. IMO this would open AGS as a platform to many more frontends (OpenLayers, GMaps, VE, etc.) and position it as a callable service instead of just the monolithic system they promote now (e.g. all ESRI, all the time). cheers brian

Re: More ArcGIS and JSON

Author: Matt Priour

According to Rex & Art at ESRI, the ArcGIS Server Web ADF should emit & consume JSON at 9.3. This is a separate issue from REST support. Brian is correct that my above-referenced post is purely a DIY thing. It has nothing directly to do with ArcGIS Server or its JSON support. I am merely demonstrating how to take your own data, which is not inherently spatial but is geo-referenced in some way, and emit it as a serialized object that ties the geo-reference to the data. From there you have an object that you can use in a number of web-mapping platforms. I plan on demonstrating ArcWeb Explorer, Google Maps, Yahoo Maps, & Virtual Earth.

Re: More ArcGIS and JSON

Author: Brian Flood

hey matt, yea, I can see future conversations now: "Hey, does AGS support REST?" "Sure does, the ADF emits/consumes JSON." And some helpless developer reports that all is well in ESRI land. FUD reigns as an implementation detail of the ADF is mistaken for easy interoperability. AGS 9.3 *might* have REST endpoints; after talking to several developers there, I did not walk away thinking it's a done deal. I do hope they add it, and I certainly hope it's not part of the ADF. cheers brian

Re: More ArcGIS and JSON

Author: Sean

I'm skeptical about JSON interoperability. Rolling your own JSON for use in your own application, as Matt and I are doing, is fine. If you need interoperability and extensibility, use XML. But I reserve the right to change my mind on this if anything compelling comes out of the Geo-JSON working group ;) RESTful ArcGIS? Non-SOAP services, sure, but I can't imagine anything more than that.
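"Rolling your own" is as simple as this sketch, with a made-up record and the simplejson package:

>>> import simplejson
>>> record = {'name': "Moe's Tavern",
...           'geometry': {'type': 'Point',
...                        'coordinates': [-105.1, 40.58]}}
>>> marshalled = simplejson.dumps(record)

The serialized object ties the georeference to the data, which is all Matt is describing above.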