Busting RESTful GIS myths

I'm going to use the announcement of Nanaimo's "authentic Web" GIS as an occasion to debunk some myths that surfaced on Twitter last week about REST, the Web, and their fitness for designing alternatives to the OGC's service architecture.

Myth: RESTful Web services aren't based on standards.

Indeed, there are APIs on the programmable web touting "REST" which are very unlike each other. Not all of them are even RESTful when you get right down to it. They come from many different domains. It's understandable that a quick glance leaves some with the mistaken impression that interoperability is not a property of RESTful Web services.

First, and this can't be said enough, because it still isn't really understood in the GIS community: REST is a particularly constrained style of architecture which just so happens to be the style of the World Wide Web. It is not a standard, but needs and shapes standards. Interoperability depends on standards, whether your architecture is RESTful or not. Paul Prescod, who I'm quoting often these days, enumerates the necessary kinds of standards:

In application-level networking, there are three basic things to be standardized

  • Addressing -- how do we locate objects

  • Methods or Verbs -- what can we ask objects to do

  • Message payloads or Nouns -- what data can we pass to the objects to ask them to accomplish their goals

For RESTful Web services, the first two are standardized by HTTP/1.1:

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification defines the protocol referred to as "HTTP/1.1", and is an update to RFC 2068.

You wanted standards? Even OGC service specifications acknowledge HTTP/1.1 (though not without misusing it, more on that in a future post). And HTTP/1.1 has been shaped by REST.

As Prescod points out, RESTful Web services push all interoperability problems into the third standardization category: message payload. This is a conscious decision. Interoperability may not be perfectly solvable, but you can isolate its problems, and RESTful services should do so. Do read all of Prescod's article on standardization. We spend way too much time talking about the "which" of standards in GIS without really thinking about the "what".

Myth: RESTful Web services aren't "lights out" accessible.

In fact, a properly RESTful service has better accessibility than an OGC service. To use the same analogy, what if you drop your special OGC service client in the dark and can't find it? How do you access your OGC service? Pardon me, but you're screwed: standing knee-deep in other web programming tools, and none of them can make sense of the OGC's unique addressing schemes and unique methods of interaction. With a RESTful service you can poke at it in a standardized way (HTTP/1.1 again) with curl, or XHR, or whatever, and get an actionable description. Of course, you're a GIS geek, and you keep your OGC client firmly attached to a reeling key chain, but consider the other agents on the Web that don't have an OGC service client at all. The OGC's special addressing scheme and methods make it very hard for those agents to get even a partial understanding of the service.

Myth: REST is too immature for GIS.

In 2000, Roy Fielding wrote:

Since 1994, the REST architectural style has been used to guide the design and development of the architecture for the modern Web. This chapter describes the experience and lessons learned from applying REST while authoring the Internet standards for the Hypertext Transfer Protocol (HTTP) and Uniform Resource Identifiers (URI), the two specifications that define the generic interface used by all component interactions on the Web, as well as from the deployment of these technologies in the form of the libwww-perl client library, the Apache HTTP Server Project, and other implementations of the protocol standards.

Web architecture has been done in the REST style since 1994. HTTP/1.1 began to be adopted in 1997. The Web itself goes back to 1990. The OGC's service architecture is not more mature than this.

Myth: RESTful Web services are fine for small solutions, not for large, inter-agency solutions.

The World Wide Web is our largest network application. It's global. Interoperability is promoted through the use of common hypertext formats. One would think that geographic information systems using the same architecture, in the same style, could scale equally well, right? I think the onus is instead on the naysayers to make the case that their favored architectures can scale like the Web and bridge domains like the Web.

Comments

Re: Busting RESTful GIS myths

Author: Paul Ramsey

Some of the "rest isn't ready" myths, related to the "opengis is enterprise ready" one, come from the lack of a complete bundling of restful approaches into a documented implementation profile.

Starting from the premise that "I want to write one client that can read/write features from multiple sources", the OGC answer is simple: use the WFS document, which describes your protocol, and the GML and Filter documents, which describe your encoding.

The bits that are missing from a unified "REST feature server" specification aren't large... there's lots of encodings, and they can be re-used from the OGC stuff. But there's no specification that would allow us to put a server team and client team in separate rooms and allow them to come out with two pieces of software that talked to one another using REST principles.

Re: Busting RESTful GIS myths

Author: Dave Smith

In recent exchanges via Twitter, I'm afraid the 140-character limit does not always lend itself well to conveying meaning adequately, and as such, you may have misinterpreted the points I was trying to make.

I have no problem with REST. Whatsoever. I wholly agree that REST has standards, that it is mature and robust, and that it can and no doubt will serve large, disparate enterprise and cross-agency applications.

My problem is specifically with regard to how *geospatial* data and processing are handled *within* REST, not with REST itself. Currently it's that implementation piece that leaves many questions unanswered. Certainly there are many pieces and parts already existing in the geo world and elsewhere which may lend themselves to handling things like datums and projections (e.g. EPSG codes and WKT), bounding boxes, styling, filtering, temporal slices, and the like, but as of yet there is no consistent way to do so within geospatial REST implementations - currently there are many disparate approaches to how these are treated in RESTful implementations.

And that's where the challenge of consistent discovery, access, and use of geospatial data and processing comes into play.

Perhaps the answer may be a standard way of structuring RESTful geo assets, or perhaps a means of conveying to a calling application how to access its capabilities, what is and what isn't supported, or some combination thereof, à la OGC GetCapabilities. These are things that are currently implemented and supported in the OGC world, and they provide that kind of "lights-out" cross-agency facility of integration.

Re: Busting RESTful GIS myths

Author: Martin Davis

I really want to like REST. I have no particular love for the schema-heavy OGC standards, or the even more complex (and different!) world of SOAP. But for all the apparent simplicity of the REST approach, I just don't see how it provides the level of self-description required for true discovery and interoperability.

For instance, I don't think you have shown how REST addresses "Application-Level Network Requirement #3 - Message Payloads or Nouns". Where is the meta-schema that describes the allowable syntax of URL and message payloads?

Also, I think there should be a 4th requirement: "Response Payloads - what we expect an object to tell us". And this also needs a meta-schema to describe its syntax.

So basically I disagree with the statement that REST allows a higher degree of "lights-out" interoperability than the OGC standards. If you know that an endpoint is OGC-W*S-compliant, you can trace through it in an automated fashion and discover exactly what syntax is allowed for all requests and responses. AFAIK the same is NOT true for REST.

And this probably all comes back to your opening point - REST is an architecture, not a standard. And that's fine - having architectural patterns is a good thing. But for REST to be anything more than a debating point, it needs to have some clear standards defined around it.

Re: Busting RESTful GIS myths

Author: Sean

Martin, URLs are opaque, navigability concerns are pushed onto the formats plate (#3) and are dealt with using standard hypertext formats like Atom, KML, or GML (xlink). That's the strategy, and I'll concede that it takes some getting to know.

AtomPub is a nice model for the structured protocols we GIS people crave. An AtomPub service document looks a bit like an OGC service capabilities document, but doesn't have to concern itself with resource methods because all resources have the same methods. All interop issues are dealt with as format issues, and so the service doc merely specifies the allowable content types for collections (think layers or feature types).
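For the curious, here's a rough sketch of how a client might read an AtomPub (RFC 5023) service document to discover collections and the content types they accept. The document, collection name, and URI below are invented for illustration:

```python
from xml.etree import ElementTree

APP = '{http://www.w3.org/2007/app}'

# An invented, minimal AtomPub service document: one workspace with a
# single "roads" collection that accepts Atom entries.
SERVICE_DOC = """\
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Layers</atom:title>
    <collection href="http://example.com/features/roads">
      <atom:title>roads</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>
"""

def collections(doc):
    """Map each collection's URI to the content types it accepts."""
    service = ElementTree.fromstring(doc)
    result = {}
    for coll in service.findall(APP + 'workspace/' + APP + 'collection'):
        result[coll.get('href')] = [a.text for a in coll.findall(APP + 'accept')]
    return result

cols = collections(SERVICE_DOC)
```

Nothing service-specific is needed here: the format alone tells a client where it may POST new entries and what media types will be accepted.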

KML hints at more organic RESTful GIS design.

Re: Busting RESTful GIS myths

Author: Jason Birch

I personally believe that what we really need are some good solid RESTful GIS implementations (like ESRI's and hopefully MapGuide's) to flush out problems and limitations, long before even thinking about standards.

If practice determines that standards are required to make REST accessible to GIS clients, then they should come after this initial proving stage, should only specify what is absolutely required, and should refer to general internet standards where possible rather than writing a GIS one-off. Codifying things that don't need to be codified--such as the link relationship restrictions in the OGC KML spec--may make application developers' lives easier, but they also cripple the power of the architecture.

As an example of a problem: when working out a way to present the HTML representation of the Nanaimo data, the thing that we found most difficult was making our query capabilities discoverable. OpenSearch goes part of the way, as do standard HTML forms, but neither has the ability to describe complex multi-term attribute and spatial query capabilities. Without this layer, the query capabilities are not truly discoverable, and all implementations require clients with special knowledge.

Is the best place to fix this with an OGC standard? I don't think so. What I think is really needed is to work to ensure that our needs are met by existing practices such as OpenSearch or URI Template. Then, once these components are in place and only if absolutely required, OGC can publish a profile that says "use this from here, that from there, BBOX means this DE9-IM relationship, etc".

As a side note, I haven't digested URI Template to the level that I'm comfortable that it can describe something like (a=1 or (a=2 and b=3)).

Re: Busting RESTful GIS myths

Author: Martin Davis

Sean,

Ok, URLs are opaque, and they are provided by documents presumably obtained by some previous query. And (per Jason's comment) that query is specified using something like OpenSearch.

What do you mean that "KML hints at a more RESTful design"?

For me it would help to crystallize this to see an implemented, specified example of this technology mix being used for non-trivial spatial query and retrieval. E.g. something that provides equivalent capabilities to WFS (rich query, multiple feature classes, etc.), and which is well-specified enough to allow generic tools to be developed to use it. Is there anything like this out there?

@Jason: I agree that it's nice to have practice drive standards. What I wonder about is where is the centre of gravity that is going to make the various experiments coalesce into a clear, effective standard. You mention ESRI and MapGuide. And then you mention OpenSearch and URI Template. Are ESRI and MapGuide working towards those specifications? Or are they simply doing their own thing?

Another thing that strikes me: most of this discussion is about query and formats. Am I correct in thinking that those are orthogonal to REST? In which case, it seems to me that the focus on REST is distracting attention from tackling the harder issues.

Re: Busting RESTful GIS myths

Author: Martin Davis

Sean,

Have you seen this?

http://geoserver.org/display/GEOSDOC/RESTful+Configuration+API

This seems like a really thorough proposal for access to both GeoServer configuration information and the underlying spatial data.

But I'm confused about something. The proposal seems to fundamentally depend on the URLs being NON-opaque. Does this mean it is not in fact RESTful? Would it be better designed with opaque URLs? And if so, how would this work?

Thanks for assisting my efforts to understand REST and its ramifications...

Re: Busting RESTful GIS myths

Author: Sean

Yes, Martin, GeoServer's API seems (I don't have an instance handy to test) to be lacking the hypertext constraint, and therefore isn't technically RESTful. One possible remedy would be to formalize their URI construction rules and serve them up in a discovery doc like the one specified for the OpenSocial API (search for "5. Discovery"):

http://www.opensocial.org/Technical-Resources/opensocial-spec-v081/restful-protocol

Also blogged about at:

http://www.abstractioneer.org/2008/05/xrds-simple-yadis-uri-templates.html

If the discovery doc is treated like a web form, coupling between clients and servers can be minimized, and servers retain the freedom to evolve by changing the container paths in the URI templates.
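A sketch of the idea, with invented template and variable names (real URI Template syntax per the IETF draft has more features than this bare-bones expansion):

```python
import re

def expand(template, variables):
    """Fill each {name} placeholder in a URI template with its value."""
    return re.sub(r'\{(\w+)\}', lambda m: variables[m.group(1)], template)

# A server is free to relocate its containers; clients that expand
# templates from the discovery doc follow along without code changes.
uri = expand('http://example.com/{workspace}/featuretypes/{name}',
             {'workspace': 'topp', 'name': 'states'})
# uri == 'http://example.com/topp/featuretypes/states'
```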

A GeoServer instance won't have millions of entities, so could probably also use an index doc in every container with explicit links to its children, as in AtomPub.

I said a KML application would be more "organic", placemarks linking to placemarks in an unstructured way.

Re: Busting RESTful GIS myths

Author: Martin Davis

Re the OpenSocial/URI Template stuff - great, this is what I like to see - clear, formal-enough specification documents which clearly lay out the interactions and data formats required to get a service to function!

This is what I'd hope to see to explain how REST can be used as a substitute for W*S. This stuff is just too complicated IMHO to be understandable via blog posts, or even appealing-but-limited examples.

Speaking of which, I get the same feeling from reading these documents that I had watching XML-RPC spiral into the murky depths of SOAP. It started out as real simple back-of-a-napkin concept, but as people started to realize what was required to make things formally-specified, discoverable, toolable, etc. it turned into a tottering edifice of obscure spec documents which only major projects could hope to boil down into code.

I think there might be an irreducible minimum of information required to support heterogeneous distributed communication. The only way to simplify it is to settle on one protocol which is so widespread and so well supported that it becomes a no-brainer to use. The obvious examples are TCP/IP and HTTP. Neither is trivial, but nobody writes their own TCP/IP or HTTP drivers - they use widely available industrial-strength ones.

Re: Busting RESTful GIS myths

Author: Ryan Baumann

I think where a lot of those REST myths come from is that somewhere along the line a lot of people started saying they had a "REST API" which was really just RPC with idiosyncratic XML (but not XML-RPC) over HTTP GET (not POST or anything else, though usually with clean, but undiscoverable, URLs). Of course, this is hardly a new observation; I just thought it may be worth pointing out that just because an API calls itself RESTful does not mean it is a RESTful API. Unfortunately, the only real way to combat this is with true RESTful APIs in practice, so that the advantages can be demonstrated (and the practical differences from APIs which merely claim to be RESTful illustrated as well).

Re: Busting RESTful GIS myths

Author: Sean

Thanks, everyone, for the comments. I hope you've appreciated my attempt to make commenting here suck a bit less. A related thread has started on the geo-web-rest google group and you might want to join if you're interested in helping push RESTful GIS forward.

A more perfect union, continued

On to cascaded unions for Shapely ...

>>> from osgeo import ogr
>>> from shapely.wkb import loads
>>> ds = ogr.Open('/Users/seang/data/census/co99_d00.shp')
>>> co99 = ds.GetLayer(0)
>>> counties = []
>>> while 1:
...    f = co99.GetNextFeature()
...    if f is None: break
...    g = f.geometry()
...    counties.append(loads(g.ExportToWkb()))
...
>>> len(counties)
3489

Matplotlib makes a pretty picture of those 3489 polygons:

>>> import pylab
>>> from numpy import asarray
>>> fig = pylab.figure(1, figsize=(5.5, 3.0), dpi=150, facecolor='#f8f8f8')
>>> for co in counties:
...    a = asarray(co.exterior)
...    pylab.plot(a[:,0], a[:,1], aa=True, color='#666666', lw=0.5)
...
>>> pylab.show()
http://farm4.static.flickr.com/3267/3231620059_31f44bc535_o_d.jpg

Shapely never had the power to dissolve adjacent polygons in a collection before, or at least not over large collections of real-world data. GEOS 3.1's cascaded unions are a big help:

>>> from shapely.ops import cascaded_union
>>> u = cascaded_union(counties)
>>> len(u.geoms)
219
>>> for part in u.geoms:
...     a = asarray(part.exterior)
...     pylab.plot(a[:,0], a[:,1], aa=True, color='#666666', lw=0.5)
...
>>> pylab.show()
http://farm4.static.flickr.com/3503/3232469846_61fb247502_o_d.jpg

There's user interest in leveraging the new reentrant API in GEOS 3.1, and releasing the GIL when calling GEOS functions to improve performance in multithreaded apps. I'm all for it.

Efficient batch operations for Shapely

I began exposing some of the new features of GEOS 3.1 in Shapely today. Geometries can be tuned up, or "prepared" à la SQL, for efficient batch operations. For example, consider this set of random polygons scattered around (0.5, 0.0) and their intersection with a triangle:

>>> from shapely.geometry import Point, Polygon
>>> from random import random
>>> spots = [Point(random()*2.0-0.5, random()*2.0-1.0).buffer(0.1) for i in xrange(200)]
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> x = [s for s in spots if triangle.intersects(s)]
>>> len(x)
67

Without preparing the triangle for batch intersection (or using an index), it takes about 12 ms to get the 67 intersecting spots:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if triangle.intersects(s)]',
...                  setup='from __main__ import spots, triangle')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
11917.51 usec/pass

A prepared triangle finds the same 67 in just a little more than 1/4 of the time:

>>> from shapely.prepared import prep
>>> pt = prep(triangle)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
3145.49 usec/pass

Update (2009-01-26): let's try an example from the real world. The boundary of Larimer County, Colorado, as represented in 2000 Census data, is a polygon with a 370-vertex exterior ring:

>>> from osgeo import ogr
>>> from shapely.wkb import loads
>>> ds = ogr.Open('/Users/seang/data/census/co99_d00.shp')
>>> counties = ds.GetLayer(0)
>>> co069 = counties.GetFeature(1151)
>>> g = co069.geometry()
>>> larimer = loads(g.ExportToWkb())
>>> # nb: OGR properties and methods usually can't be safely chained
>>> len(larimer.exterior.coords)
370

Scatter 200 spots around the neighborhood of Larimer County:

>>> b = larimer.bounds
>>> w = b[2] - b[0]
>>> h = b[3] - b[1]
>>> cx = b[0] + w/2.0
>>> cy = b[1] + h/2.0
>>> from random import random
>>> from shapely.geometry import Point
>>> def spotter():
...     x = (random()*2.0-1.0)*w + cx
...     y = (random()*2.0-1.0)*h + cy
...     return Point(x, y).buffer(0.05)
...
>>> spots = [spotter() for i in xrange(200)]
>>> x = [s for s in spots if larimer.intersects(s)]
>>> len(x)
48

48 spots intersect the county, and it takes about 22 ms to find them using the unoptimized method:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if larimer.intersects(s)]',
...                  setup='from __main__ import spots, larimer')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
21728.65 usec/pass

It only takes about 1.5 ms using a prepared geometry, a speed-up of about 14X:

>>> from shapely.prepared import prep
>>> pt = prep(larimer)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
1570.41 usec/pass

Comments

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

I'm amazed that preparing a triangle makes a damn bit of difference. The difference is probably in the extra short-circuit tests in the prepared operation rather than the indexed shape (since a triangle has so few edges to index). Try this with a more complex polygon to really see the differences. It's the difference between N and log(N) on the number of edges, so increasing the polygon complexity is where things will soar.

Re: Efficient batch operations for Shapely

Author: Sean

I was a bit surprised too. See the update for an example of the kind of speed-up you were thinking of.

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

That makes a nice difference !

In order to form a more perfect union

Paul Ramsey on GEOS 3.1 and PostGIS 1.4 improvements:

Here's a less contrived result, the 3141 counties in the United States. Using the old ST_Union(), the union takes 42 seconds. Using the new ST_Union() (coming in PostGIS 1.4.0) the union takes 3.7 seconds.

Now that's change we can believe in.

Comments

Re: In order to form a more perfect union

Author: Guillaume

Yes, we can !

KML and atom:link

Jason Birch is right in wanting to use rel="alternate" in his KML atom:link, and the OGC KML spec is wrong in limiting us to rel="related". Andrew Turner has written even more about what you can do with "alternate" links here. I remember commenting that the KML spec's public comment period was a bit short and ill-timed (Christmas 2007). Perhaps this error would have been caught otherwise?
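An illustrative fragment (feature name, URL, and coordinates are made up) showing an atom:link with rel="alternate" pointing at an HTML representation of the same feature:

```xml
<Placemark xmlns="http://www.opengis.net/kml/2.2"
           xmlns:atom="http://www.w3.org/2005/Atom">
  <name>City Hall</name>
  <!-- the HTML page is an alternate representation of this feature -->
  <atom:link rel="alternate" type="text/html"
             href="http://example.com/features/city-hall.html"/>
  <Point><coordinates>-123.936,49.166</coordinates></Point>
</Placemark>
```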

Related: kml description considered harmful

Comments

Re: KML and atom:link

Author: Sean

I don't think that XSD constrains @rel at all. I believe it was probably the intention of the KML spec writers to import all of atom:link and that the language in the OGC KML spec is just erroneous. If developers go to the Atom syntax spec to understand atom:link, they'll be fine.

Re: KML and atom:link

Author: Jason Birch

I checked the schema too, and it didn't appear to place any restrictions. The only reason I ran into this (I don't make a habit of reading specifications) is that Galdos' KML validator picked it up.

Services and web resources

David Smith and I have been discussing web "services" and web "resources". He'd like to use the terms interchangeably, but I feel that's improper. Not all resources are services. Is the HTML page representing this blog post a service? No. Are the images within it services? No. Is my blog a service? No, although it has ambition sometimes. On the other hand, not all services are web resources (CORBA, DCOM, Ice, Twisted, SOAP), and many of the rest are poor web resources. The situation looks a bit like this:

http://farm4.static.flickr.com/3362/3215405911_cab3667f15_o_d.png

What makes a web resource is explained in http://www.w3.org/TR/webarch/. Consider this classic diagram from that document:

http://www.w3.org/TR/webarch/uri-res-rep.png

That's the architecture of the Web summarized in a single picture. Resources are identified by URIs, and agents interact with resources by sending messages to (for example) retrieve their representations. There is harmony and consistency among the three concepts in the picture above. Now consider a similar picture of an OGC web something service, rendered for effect in the same style:

http://farm4.static.flickr.com/3093/3213872061_2f4270b082_o_d.png

(I'm using the GeoBase service as an example because of its high profile. It's typical of WxS service implementations.)

Does the service's "Online Resource" URL (http://wms.geobase.ca/wms-bin/cubeserv.cgi) identify a web service resource? As much as you'd like to think so, it's not immediately clear that this is true. I've put a question mark in the diagram. Dereferencing that URL might provide more information:

seang$ curl -i http://wms.geobase.ca/wms-bin/cubeserv.cgi?
HTTP/1.1 200 OK
Date: Wed, 21 Jan 2009 20:48:08 GMT
Server: Apache/2.0.52 (Red Hat)
Connection: close
Transfer-Encoding: chunked
Content-Type: application/vnd.ogc.se+xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<ServiceExceptionReport version="1.1.3" xmlns="http://www.opengis.net/ows"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/ows ...">
<ServiceException>
CubeSERV-35000: Missing REQUEST parameter
(raised in function handleWmsRequest() of file "main.c" line 422)
</ServiceException>
</ServiceExceptionReport>

The '200 OK' response, in accord with RFC 2616, section 10.2.1, indicates that the response carries the representation of the resource identified by http://wms.geobase.ca/wms-bin/cubeserv.cgi. That representation has content type 'application/vnd.ogc.se+xml' and contains a traceback (running in debug mode or what?). Interpretation: http://wms.geobase.ca/wms-bin/cubeserv.cgi identifies not a service, but a service exception document. An agent can't stick to HTTP/1.1 and interpret this in another way.

Just to show that this is not just the fault of GeoBase, here's an interaction with another prominent service:

seang$ curl -i http://gisdata.usgs.net/wmsconnector/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD
HTTP/1.1 200 OK
Date: Wed, 21 Jan 2009 20:49:01 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Pragma: No-cache
Cache-Control: no-cache
Expires: Wed, 31 Dec 1969 18:00:00 CST
Content-Type: application/vnd.ogc.se_xml
Content-Length: 294

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<ServiceExceptionReport version="1.1.1">
<ServiceException>
Missing mandatory REQUEST parameter. Possibilities are
{capabilities|GetCapabilities|map|GetMap|feature_info|GetFeatureInfo}
</ServiceException>
</ServiceExceptionReport>

Again, http://gisdata.usgs.net/wmsconnector/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD identifies not a service, but a service exception document.

Are these OGC web services not web resources at all, or just broken ones that might be patched up with appropriate representations and HTTP status codes? The former, I think: the OGC service architecture originated apart from the Web, and although the web is the primary transport (middleware, as Paul Prescod says, or "DCP" in OGC terms) nowadays, the "Online Resource URL" isn't really a universal identifier in the webarch sense. That's the source of the disharmony among entities in the WMS picture above.
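The interpretation rule an agent is bound by can be sketched in a few lines (a paraphrase of RFC 2616's status code semantics, not any particular client library):

```python
def classify(status):
    """How an HTTP/1.1 agent must read a response body: a 2xx status
    means the body is a representation of the resource identified by
    the request URI, whatever its content type; 4xx and 5xx mean the
    body describes an error."""
    if 200 <= status < 300:
        return 'representation'
    if status >= 400:
        return 'error report'
    return 'other'

# GeoBase's 200 OK obliges a client to read the exception report as
# the resource's own representation.
kind = classify(200)
# A missing-parameter response arguably calls for 400 Bad Request,
# which a client would correctly read as an error.
kind_400 = classify(400)
```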

Comments

Re: Services and web resources

Author: Ron

In speaking, I tend to call data sources "services," and information sources "resources." (I know these aren't rigorous at all.)

Let's say I provide a service that returns street addresses in a zip code. It's probably going to return too many lines for any effective representation in human terms. That, I would call a "service."

It is basically a data-oriented response. Format is documented so that it is usable (XML or json or RonText or whatever,) but it is not a representation by any human measure. It's just a service returning data in a format.

Now, a resource (in my mind, anyway) makes some attempt to tune the data for human consumption -- search and sort, paginated html pages, Web 2.0 schnazifications, an adaptation for the tiny screen, or SVG animations -- something to render the data into information for humans.

Perhaps, in Bateson's terms, a "service" just reports a difference; in order to provide a "resource," you have to represent a difference that makes a difference.

Re: Services and web resources

Author: Sean

Resources provide pages for human agents, services provide data for computational agents? The architecture of the Web (not to mention the Semantic Web) does not make this distinction. It's all resources. Text resources, image resources, audio resources, data resources. The audience of a resource (modulo authentication, authorization, and language) is determined by the content types of its representations.

Re: Services and web resources

Author: Andrew Turner

The WxS suite is actually really close to what would be a good operational model here. If you dereference the URI of the 'service', it should return 200 OK and, instead of an exception, could return the GetCapabilities document. This way the "resource" is the description of the map: just not in a pretty-picture way, but as "we have a map with these layers, in this area, with this title," etc. A capabilities document really isn't a different kind of entity/resource from a PNG or KML; the latter merely contain the actual features, and there is no necessity that a resource directly include all subsequent child resources. An exception should only be returned if the subsequent query parameters are not valid, and then it would be an HTTP 400 "Bad Request". Unfortunately, I assume the fault lies in designing the spec around implementation details (servers being written to say 200 if there wasn't a failure raised).

Re: Services and web resources

Author: Dave Smith

What's referenced is just a base URI, e.g. http://wms.geobase.ca/wms-bin/cubeserv.cgi? - as such, it's an incomplete URI scheme for the resource, and that's why you get the broken response. Obviously you would then have to ask it for something:

http://wms.geobase.ca/wms-bin/cubeserv.cgi?request=getCapabilities

or...

http://wms.geobase.ca/wms-bin/cubeserv.cgi?SERVICE=WMS&VERSION=1.1.3&REQUEST=GetMap&BBOX=-81.758684,46.435561,-74.056742,50.968655&SRS=EPSG:4326&WIDTH=893&HEIGHT=526&LAYERS=DNEC_250K%3AELEVATION%2FELEVATION&STYLES=&FORMAT=image/png&TRANSPARENT=TRUE

Those are the full URI schemes. For a RESTful service (or any other kind of service) you would similarly need to pass in parameters, e.g. ask it for capabilities, ask it for an image, identify a feature, and so on. This yields an immense number of permutations - different image sizes, different layers, different styling requests, and so on - and this is why a base URI is not so unreasonable, as it provides the base starting point. Perhaps, though, it would make more sense to point to the capabilities document as that starting point.

The other issue is in connecting to it, "lights-out". This is done via a.) OGC standard and b.) Capabilities statement. These things, capabilities and standards, are crucial for interoperability and enterprise-oriented approaches. In a vacuum, one could build the greatest, most wonderful service in the world, yet it would do anyone else no good if they don't know how to discover and access it consistently.

Re: Services and web resources

Author: Sean

Dave, I agree with you about discovery and access. Happily, webarch and HTTP/1.1 have this covered, and better than any OGC spec: through URIs, links, and the "follow your nose" discovery that crawlers and search engines can exploit. How is using HTTP not "lights-out" access? It's good enough for your feed reading, your web browsing, your Twitter clients ... even WxS and SOAP use HTTP as transport.

You're misusing the term "URI scheme", which is defined in webarch. Our URI schemes are "http", "ftp", "urn", "info", et al. To assert that the WxS "online resource URL" string is a URI scheme is to create immediate conflict with the architecture of the Web. There would be a profusion of WxS URI schemes, one for each service installation (500 or so), all of them extending the "http" scheme in a non-standard way. Talk about "stovepipes". Remember, too, that WxS services are supposed to support POST requests to the thing at that "online resource URL" for capabilities docs and data. You POST to a resource identified by a URI, you can't meaningfully POST to a URI scheme.

I feel that when you write "URI scheme" you're trying to express concepts related to URI templating. See the IETF URI templating draft for the way to do this right, but understand that even if WxS were to do proper URI templating, its "online resource URLs" would have to identify proper resources for POST's sake.
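To make the contrast concrete, here is a rough sketch of template expansion in the spirit of that draft (the template, host, and `expand` helper are invented for illustration). Note that the scheme throughout is plain "http"; a template is a recipe for minting ordinary URIs, not a new URI scheme:

```python
# A hypothetical WMS-like URI template. Expanding it yields an
# ordinary, dereferenceable "http" URI; the template itself is
# not a URI and not a scheme.
template = ("http://example.org/wms/maps"
            "?bbox={bbox}&width={width}&height={height}&format={format}")

def expand(template, **variables):
    """Naive template expansion: substitute each {name} variable."""
    uri = template
    for name, value in variables.items():
        uri = uri.replace("{%s}" % name, str(value))
    return uri

uri = expand(template,
             bbox="-81.76,46.43,-74.06,50.97",
             width=893, height=526, format="image/png")
```

Real URI templating (per the IETF draft) also handles escaping and optional variables, but the shape of the idea is the same: the fixed part identifies the resource space, the variables identify a resource within it.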

Re: Services and web resources

Author: Dave Smith

With regard to discovery and access, we can discover and access RSS and Atom feeds only because they have a defined standard - and even that is not without the occasional wrinkle. Similarly, the Twitter API is documented and defined. But you certainly wouldn't be able to immediately figure out how these work without first knowing at least a little bit about the feed/API and its parameters - hence capabilities and standards. What I mean by "lights-out" access is being able to programmatically discover and access with little more than a handful of predefined rules - as opposed to always making a human read docs unique to each API or feed, and write custom code for integration. People want to focus on doing science, analysis and solving business problems, not on writing custom code that might break with each change on the far end. I'd agree that OGC isn't quite there on some of your points, but again, that points up the need for consistency.

Re: Services and web resources

Author: Sean

It doesn't follow from my criticism of WxS that I am against standards in general. I'm strongly in favor of good protocol and format standards; "good" to me meaning that something works well with our global information infrastructure (also known as the "Web"). In this sense, WxS is not so good, though its formats are better than its protocols.

Mocking GEOS

My use of mocks isn't as sophisticated as Dave's, perhaps, but I stumbled onto a simple testing pattern that might be useful to other Python geospatial/GIS developers who are wrapping C libs using ctypes.

Consider Shapely: it wraps the GEOS library, the quality and accuracy of which we take as a given (though not blindly, because I do contribute fixes and enhancements to GEOS). The predicates and topological functions of GEOS are called from within Python descriptors, classes that perform argument validation and handle GEOS errors. For Shapely, I'm testing these descriptors, the GEOS wrappers, not GEOS itself. What pair of geometries would I have to pass to GEOSDisjoint (for example) in order to get the return value of 2 that signifies an error? Even if known, they might be subject to issues of numerical precision, or be sensitive to changes in GEOS. I'd rather not fuss with this. Instead, I want some function to stand in for GEOSDisjoint and friends, one that takes 2 arguments and has very predictable return values in the range (0, 1, 2). A function like libc's strcmp():

>>> import ctypes
>>> libc = ctypes.CDLL('libc.dylib') # this is OS X
>>> libc.strcmp('\0', '\0')
0
>>> libc.strcmp('\1', '\0')
1
>>> libc.strcmp('\2', '\0')
2

With this meaningless but handy isomorphism between strcmp() and the GEOS binary operations in hand, a generic GEOS wrapper can be fully tested like this:

import ctypes
import unittest

from shapely import predicates

libc = ctypes.CDLL('libc.dylib') # OS X, as above
BN = libc.strcmp

class CompMockGeom(object):
    # Values chosen with libc.strcmp in mind
    vals = {'0': '\0', '1': '\1', '2': '\2'}
    def __init__(self, cat):
        self._geom = ctypes.c_char_p(self.vals[cat])
    comp = predicates.BinaryPredicate(BN)

class BinaryPredicateAttributeTestCase(unittest.TestCase):

    def test_bin_false(self):
        g1 = CompMockGeom('0')
        g2 = CompMockGeom('0')
        self.assertEquals(g1.comp(g2), False)

    def test_bin_true(self):
        g1 = CompMockGeom('1')
        g2 = CompMockGeom('0')
        self.assertEquals(g1.comp(g2), True)

    def test_bin_error(self):
        g1 = CompMockGeom('2')
        g2 = CompMockGeom('0')
        self.assertRaises(predicates.PredicateError, g1.comp, g2)
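For readers wondering what's on the other side of these tests: below is a minimal sketch of the descriptor pattern being exercised. This is not Shapely's actual BinaryPredicate code, just an illustration of the same idea with the ctypes machinery replaced by plain Python stand-ins, under the same 0/1/2 return convention:

```python
class PredicateError(Exception):
    """Raised when the wrapped function signals an error (returns 2)."""

class BinaryPredicate(object):
    """Sketch of a descriptor wrapping a binary C-style predicate.

    The wrapped function takes two geometry handles and returns
    0 (false), 1 (true), or 2 (error), following GEOS conventions.
    """
    def __init__(self, fn):
        self.fn = fn
    def __get__(self, obj, objtype=None):
        def predicate(other):
            retval = self.fn(obj._geom, other._geom)
            if retval == 2:
                raise PredicateError("error in predicate call")
            return bool(retval)
        return predicate

# Stand-ins for the ctypes function and geometry handles:
def fake_comp(a, b):
    return a  # the first "handle" doubles as the return code

class MockGeom(object):
    def __init__(self, code):
        self._geom = code
    comp = BinaryPredicate(fake_comp)
```

The key move is the same as in the tests above: any function honoring the 0/1/2 convention can stand in for a GEOS predicate, so the error-handling path gets exercised without hunting for geometry pairs that make GEOS itself fail.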

Comments

Re: Mocking GEOS

Author: ajfowler

Hi, Unrelated to this post, but I came across one of your old blog posts about web-mapping accessibility. Is this topic off of your radar now? aj

Re: Mocking GEOS

Author: Sean

Yes, but it looks like it's on yours. What's up with web map accessibility?

Re: Mocking GEOS

Author: ajfowler

Well I'm looking into creating a text description of a map. There aren't a lot of resources out there, but I'm avidly searching.

Toward Shapely 1.1

Over the holiday I created a 1.0 branch for Shapely and began working toward Shapely 1.1. The next release will have the same API, but with some new and improved implementations of the same classes and methods, and a few new features. So far, I've managed to cut the code base by about 6% (the less code, the better, I say), not including the new tests written to get coverage to 97%:

Name                               Stmts   Exec  Cover   Missing
----------------------------------------------------------------
shapely                                0      0   100%
shapely.array                         14     12    85%   22-23
shapely.deprecation                   13     13   100%
shapely.factory                      195    195   100%
shapely.geometry                       8      8   100%
shapely.geometry.base                265    265   100%
shapely.geometry.collection           12      8    66%   25-28
shapely.geometry.geo                  48     48   100%
shapely.geometry.linestring           55     55   100%
shapely.geometry.multilinestring      44     44   100%
shapely.geometry.multipoint           82     82   100%
shapely.geometry.multipolygon         59     59   100%
shapely.geometry.point                92     92   100%
shapely.geometry.polygon             177    177   100%
shapely.geometry.proxy                31     31   100%
shapely.geos                          59     32    54%   12-24, 29-38, 40, 45-51, 84, 90-91
shapely.iterops                       30     30   100%
shapely.ops                           24     24   100%
shapely.predicates                    35     35   100%
shapely.topology                      29     29   100%
shapely.wkb                           22     22   100%
shapely.wkt                           22     22   100%
----------------------------------------------------------------
TOTAL                               1316   1283    97%
----------------------------------------------------------------------
Ran 196 tests in 0.836s

FAILED (errors=1, failures=1)

Feel free to grab the new code from its Subversion repository:

$ svn co http://svn.gispython.org/svn/gispy/Shapely/trunk Shapely

By the way, I've used git and git-svn exclusively for my work since the 1.0/1.1 branching, and am becoming a fan.

I remain unenthusiastic about implementing heterogeneous geometry collections. I never use them ... what am I missing?

Comments

Re: Toward Shapely 1.1

Author: Martin Daly

You will be missing completeness, and compatibility with Simple Features. For example, what is the union of a Point and a LineString, if not a GeometryCollection? You might argue that that is a poor-ish example because you can just say "I won't allow that". A better example would be the intersection between LineStrings that are collinear for some portion of their length, and cross at some other point. The result of that - according to Simple Features - is a LineString and a Point, and it would be hard to disallow intersections between LineStrings just because you don't like the resulting geometry type.
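Martin's second example is easy to reproduce with Shapely itself (the coordinates below are made up for illustration, and this assumes a working Shapely/GEOS install): two LineStrings that overlap along a segment and also cross at a separate point intersect in a heterogeneous collection:

```python
from shapely.geometry import LineString

a = LineString([(0, 0), (4, 0)])
# b runs along a between (1, 0) and (2, 0), then rises and falls
# to cross a again at (3, 0)
b = LineString([(1, 0), (2, 0), (3, 1), (3, -1)])

# Per Simple Features, the intersection is a LineString plus a
# Point, i.e. a GeometryCollection
result = a.intersection(b)
```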

Re: Toward Shapely 1.1

Author: Sean

Okay, thanks Martin. Can you think of an example of how heterogeneous geometries might be part of the same single feature (sharing the same attributes) in a GIS?

Re: Toward Shapely 1.1

Author: Martin Daly

You mean something in practice, not theory? I'm out :) I can't remember having seen an example in data that we have been given. I could probably make something up, but it would be just that: made up. Of course Shapefiles have no provision for geometry collections, so they don't actually exist, right?

So where's the git repo?

Author: Holger

So, where is the public git repository for all of us who don't want to have to learn yet another version control system, just to get to your subversion repo?

Re: Toward Shapely 1.1

Author: Sean

No need to learn svn:

$ git svn clone http://svn.gispython.org/svn/gispy/Shapely/trunk Shapely

Re: Toward Shapely 1.1

Author: Sean

Alright, I'm sticking with heterogeneous collections which might be incidental products of operations, but are otherwise discouraged. And we've reached a limit of code coverage. I can only get to 100% now by fooling the test runner into believing that it's on different platforms that have or do not have numpy.

Name                               Stmts   Exec  Cover   Missing
----------------------------------------------------------------
shapely                                0      0   100%
shapely.array                         14     12    85%   22-23
shapely.deprecation                   13     13   100%
shapely.factory                      195    195   100%
shapely.geometry                       8      8   100%
shapely.geometry.base                265    265   100%
shapely.geometry.collection           12     12   100%
shapely.geometry.geo                  48     48   100%
shapely.geometry.linestring           55     55   100%
shapely.geometry.multilinestring      44     44   100%
shapely.geometry.multipoint           82     82   100%
shapely.geometry.multipolygon         59     59   100%
shapely.geometry.point                92     92   100%
shapely.geometry.polygon             177    177   100%
shapely.geometry.proxy                31     31   100%
shapely.geos                          59     32    54%   12-24, 29-38, 40, 45-51, 84, 90-91
shapely.iterops                       30     30   100%
shapely.ops                           24     24   100%
shapely.predicates                    35     35   100%
shapely.topology                      29     29   100%
shapely.wkb                           22     22   100%
shapely.wkt                           22     22   100%
----------------------------------------------------------------
TOTAL                               1316   1287    97%
----------------------------------------------------------------------
Ran 196 tests in 0.849s

FAILED (errors=1, failures=1)

Open access to National GIS data

A corollary to Jeff Thurston's grammatically challenged geospatial thought for the day:

Let’s be clear: If government pays for geodata, then makes it available for free. Then it is not free. You ARE paying for it.

is this:

If you're paying for it, you own it, and should have the right to unfettered access to unclassified portions of it.

The National Institutes of Health mandates open access to the published results of the science it funds. Similar open access to all publicly funded research is currently the 12th-ranked suggestion to Obama's future CTO. An equivalent policy for National GIS data is, in my opinion, a must. I don't mean access to a service endpoint, I mean access to shapefile downloads.

I believe I will write my new Senator, Mark Udall (do I ever love typing that phrase!), and see if he's interested in doing something about it.

Update (2009-01-16): related, more thoughtful post here.

Update (2009-01-28): more from Sean Gorman and Paul Ramsey.

Comments

Re: Open access to National GIS data

Author: Kirk

I don't think I'd like for the public to have access to precise locations of archaeological sites, would you?

Re: Open access to National GIS data

Author: Eric Wolf

I guess people don't read before they make suggestions. The Obama platform specifically cited increased access to Government information as an important goal of his administration. And open access is generally the norm at the USGS. I believe it is by law that USGS-collected data cannot be copyrighted and is free (libre) for any use. Unfortunately, there are so many snafus related to the free (gratis) problem that the bureaucrats get stuck in a tailspin. The past eight years, the Department of the Interior has operated under a mantra of "we must become like a commercial operation" because, as we all know, the market is always right... right?

We also have the technical issue that USGS data is not in shapefile format, because of the magnitude of the data and the diversity of the data types. Most of the data is stored as custom geodatabases, sometimes centralized but frequently distributed. Providing service endpoints is easier than shapefiles, especially for the centralized geodatabases - all we have to do is front-end the database with the appropriate protocol. The Seamless server (http://seamless.usgs.gov/website/seamless/viewer.htm) already provides shapefile downloads. But because of the way the data is stored at the USGS, it must first be extracted from the databases and then turned into a shapefile.

The debate, really, is: would you rather the USGS spend your tax dollars maintaining a database structure (i.e., independent shapefiles like "transportation for Colorado") that doesn't fit the Survey's own internal needs for its mission of furthering environmental science? In the past, the USGS charged for data delivery to help compensate for this difference between internal and external data format needs. Of course, if you take this to the next step, you get the FGDC and SDTS. I won't embarrass myself by going there...

I'd suggest, in addition to writing Udall, also CC'ing that other famous Colorado politician, Ken Salazar, the incoming Secretary of the Interior. Salazar's role is in interpreting administrative guidelines into policy for the USGS and the rest of the DOI.

Re: Open access to National GIS data

Author: Sean

Kirk, as far as I'm concerned that's another kind of classified. Moot here, because archaeology and cultural heritage isn't part of the National GIS proposal, but the same issue does come up in regard to wildlife habitat. There are people who might bring on the bulldozers upon discovering that their property intersects with endangered species habitat.

There is a mind-blowing cave near my old hometown, Logan, Utah. As a kid I went in there a bunch of times. Increasing numbers of visitors, some who camped inside, built fires, etc, made life hard for Townsend's big-eared bats. The Forest Service tried some seasonal closures to protect the bat population, and some hillbillies (who are probably related to me -- this is Utah, after all) responded by trying to eliminate the bats. The cave is now gated, and closed. Sadly, I don't think this kind of vandalism is particular to the Intermountain West.

The paranoid may say parcel data likewise needs to be kept out of the hands of evil-doers, but I think this is bogus.

Thanks for the Salazar reminder, Eric. I busted my ass for him in 2004, and he owes me a favor ;)

Re: Open access to National GIS data

Author: Dave Smith

There are quite a few different reasons why access might be controlled - not just national-security, archaeological, or natural sensitivity, but others as well; e.g. governmental regulation of business may make government privy to information about a company's business processes, suppliers, and so on, which might otherwise be confidential trade secrets. However, I tend to think that the datasets which genuinely require sensitivity are the exception to the rule. The vast majority should be open and accessible.

Another consideration is that many governmental entities also face unfunded mandates which dictate that they collect and manage data. How to pay for it? Charge users is one model, unfortunately. Or... don't collect the data at all. Or... rob Peter to pay Paul: borrow a little funding from another program and get the most basic data collected, which in turn might not be in an easily sharable form. Many obstacles.

Should USGS be maintaining data outside of their own mandate? Probably not. But meanwhile, can they access said data from DOT/FHWA or other sources in a seamless fashion? Heck no. So everywhere across government, we have all these disconnected little stovepipes, which without the rest of the background data would generally be of limited utility.

FGDC, "GIS for the Nation", GOS, the OMB Geospatial Line of Business and all of these should be pursuing a national FRAMEWORK for providing this - they have accomplished a few things here and there, but the technical architecture is still sorely lacking. And without sound guidance, governance, and a solid national architecture and framework, the Dangermond proposal could seriously threaten to only propagate the same type of thing. Who manages and houses what? How is the data to be published, discovered and accessed? Technology is not the hurdle. The hurdle is cultural.

Re: Open access to National GIS data

Author: Sean

Eric, did you bring up the Seamless app as an example of how data should be shared? It is so wrong, in so many ways. I'm not counting on the usual suspects to deliver anything better for a national GIS, and that's why I'm saying the USGS should just release the data periodically and let others remix it into useful services.

Re: Open access to National GIS data

Author: Kirk

Sean, I hadn't really thought about wildlife data being classified, but see what you mean. I live not far from Bracken Cave, which seems fairly well protected by BCI ... http://tinyurl.com/brackencave. Notice how "find bat locations" takes you to a page that tells you everything about the cave and its bats - except for the location. I was thinking more about the antiquities databases you build tools for. Does ISAW try to discourage treasure hunters from gaining access?

Re: Open access to National GIS data

Author: Tyler Erickson

There was a fairly good keynote talk related to this subject last December at the AGU Fall Meeting, an academic scientific conference that draws 15,000+ attendees. Michael Jones of Google spoke on spreading scientific knowledge, and one of his main points was that all government-funded research should require open publishing of the work (data, source code, and results) so that others can easily reproduce and build upon it. The talk seemed to be well received, given that most of the audience members are dependent on government funding for the majority of their research. At least I hope that they see the big picture and agree with it: if everyone gives away their one precious dataset/algorithm, everyone will have access to thousands of new datasets and algorithms for use in their own research.

Re: Open access to National GIS data

Author: Sean

Kirk, I do think that location obscurity is the very least that digital antiquities people should provide for sensitive archaeological sites that can't be better secured. ISAW projects are different: we aggregate and provide tools for study of already known places, people, and texts. Databases of hidden treasures aren't part of our mission. Ideally, our workflow engine and editorial board publish no material before its time, but that's principally to maintain a high standard of scholarship.

Re: Open access to National GIS data

Author: Dave Smith

Carrying over discussion from Sean Gorman's site: the issue with just providing data (e.g. shapefiles) is that it requires a process - download, conversion, etc. In this process, how often do you update? Do you have/provide adequate metadata to know whether or not it's the most current data? Do you then need to build a refresh process, to schedule a mechanism to perform the download and update on your end? Are there going to be dozens of other stakeholders all making redundant investments in the same type of refresh processes?

With Apps for Democracy et al, it was beyond just "data" - it was specifically directly-mashable data feeds - and this can be a means of providing and ensuring currency, via KML network links, live GeoRSS feeds, et al. Part of my concern is in economies of scale (why not build it once, use it many times) and in potential liabilities, e.g. folks who might not be diligent in routinely updating the datasets that feed their apps.

The easiest solution would be to just publish a live feed. Have agencies provide direct data access via KML network link, GeoRSS, WxS services, or tile services, e.g. GeoServer. With a modicum of infrastructure planning, this could be quite scalable and robust, and serve the vast majority of need across the entire community. And the data would reside in place with each steward, in a federated NSDI. This is basic stuff, not complicated star-wars physics.

The flipside of the equation is in data collection efforts - e.g. EPA's Exchange Network, which collects data from all 50 states, tribes and other participants. Or... you have OAM, a great idea for crowdsourced data, but what happened there? Again, infrastructure crunch, needing sponsorship and funding. "Just do it" is all fine and good, but definitely has its practical limits, particularly when dealing with an entire national dataset and applications which require cross-agency and inter-agency data.

With respect to obscuring data, touch base with NatureServe - they are working on ways to allow site screening for sensitive/endangered species without exposing the actual location.

Re: Open access to National GIS data

Author: Sean

I invited trouble by reducing my desire for excellent, standardized, syndicated data to "shapefiles". I am in favor of funding agencies to create, manage, publish (using simple and robust mechanisms like RSS), and curate this data. My only objection is to the proposed shiny service architectures and portals; the GIS industry/community rarely gets that stuff right.

Re: Open access to National GIS data

Author: Eric Wolf

Sean: I brought up Seamless not as an example of how data should be served, but as an example of how the USGS is actively trying to come up with a scheme for providing the diverse range of data it collects, creates and maintains to a diverse user base. Essentially, the problem is similar to the Census DIME and TIGER files. The Census gives you dumps from the database and a schema to help you decode the data dump. The problem is that the USGS doesn't have one database. We have many. And the larger databases are comparable in size and complexity to the Census data. And unlike the Census, which is really only updated once every decade, many of the USGS databases are updated in real time. I'm not trying to make excuses; I'm trying to help you understand the challenges. My colleagues and I in CEGIS at the USGS are actively trying to understand how to best manage data dissemination, so we appreciate being told what is wrong with what we are doing and what people actually want.

Re: Open access to National GIS data

Author: Dave Smith

Eric raises another point. EPA has similar flows, e.g. FRS, where the data comes from a large number of disparate stewards and, based on the varying practices and standards in place with those stewards, may have a host of issues when it arrives - e.g. mismatched datums, reversed lat/long, signs on longitude values, and so on. Further, representation of the "place" may mean very different discrete things - e.g. water outfall, air stack, front gate outside of a plant - along with other issues which need to be harmonized in order to provide a seamless national dataset of regulated facilities. And as with the USGS databases, these are refreshed on a continual basis. As such, there are hurdles to be overcome before even turning over the data, and that's been half the battle. However, once the data can be gotten to this point, the solutions for delivery become a lot more straightforward, at least in today's terms.

It should also be considered that, for example, EPA's web-based GIS applications began life in the 1990s, when current technologies and architectures were not yet conceived, with many pieces scratch-built. Many functionalities can be and are being replaced with more current technologies - however, again, availability of resources has been an issue. Dealing with complex processes, legacy systems and disparate resources across and outside an enterprise is never as easy as building something new. But hopefully existing efforts and technologies such as GeoServer can be employed to provide robust, low-cost infrastructure to serve these types of needs in the future.