Diving into geolocation

Speaking of the open web, here's Mark Pilgrim's take on HTML5 geolocation:

Geolocation is the art of figuring out where you are in the world and (optionally) sharing that information with people you trust. There are many ways to figure out where you are — your IP address, your wireless network connection, which cell tower your phone is talking to, or dedicated GPS hardware that receives latitude and longitude information from satellites in the sky.

You can also pick your location, or any other location at all that suits your needs, from a map using René-Luc's Firefox Geolocater.

GeoWeb blues

Apparently, a lot of the "GeoWeb" is made of blue legos. Say what you will about HTML and SVG, but open web stuff comes less and less with these kinds of nasty surprises and titanic games of chicken thundering around your business.

In which we go into the weeds for some REST

On the descending portion of the hype cycle now it seems that, like a guy in a "Rock Star" t-shirt, a "REST API" most likely isn't. It might be using HTTP as a uniform interface and identifying things with URIs, but then you find it provides text/xml or application/json responses with no links and out-of-band rules for teleporting (you can't call it traversing) to other parts of the API. Tight coupling like that is not what REST is about.

One that's getting very close is GeoServer's Configuration API. It has links from workspaces to datastores to layers, and a non-HTML client should in theory be able to follow them, changing the configuration state of the service in a step-by-step manner, led by the service itself, much in the same way you would through a web browser. All from one bookmarkable URI. This is what REST is about.

I say "in theory" because the GeoServer API doesn't hold water for formats other than HTML. Here's the problem: given a bookmarked URI ending in "workspaces" like http://example.com/workspaces, how does a client determine, using in-band information only, that this URI identifies a resource to which you can POST a new workspace and begin the configuration process? If you're working with a text/html representation of the resource, you'll be shown a form, and away you go, RESTfully. The semantics of forms, and specifically that submitting one sends data to a certain URI, are defined in the text/html media type standard. A client doesn't need any out-of-band information: the form is in the representation, the semantics are specified by the standard "text/html" value of the Content-Type header, both in-band. Now, if the server sends you back a text/xml response, there's no way for a client to know only from in-band information how it is to act on the response. That it's a certain type of resource (a GeoServer Workspace) because the URI ends in "workspaces" and the representation has a root <workspaces> element? That's out of band. That the bookmarked URI is a "GeoServer workspace bookmark"? That's out of band too.
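To make the text/html case concrete, here's a minimal sketch of a non-browser client finding a form's method and target using nothing but the representation and the rules of the media type. It uses only Python's standard library. The HTML snippet, the form's action, and the field name are invented for illustration; they are not GeoServer's actual output.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Collect the method and target URI of every form in a page."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            # Per HTML, a missing method means GET and a missing
            # action means "submit to the document's own URI".
            method = (attrs.get("method") or "get").upper()
            target = urljoin(self.base_uri, attrs.get("action") or "")
            self.forms.append((method, target))


# Hypothetical text/html representation of the bookmarked resource.
page = """
<html><body>
  <form method="post" action="/workspaces">
    <input type="text" name="name">
    <input type="submit" value="Create workspace">
  </form>
</body></html>
"""

finder = FormFinder("http://example.com/workspaces")
finder.feed(page)
print(finder.forms)  # [('POST', 'http://example.com/workspaces')]
```

Nothing in that client knows about GeoServer or about URIs ending in "workspaces"; everything it acts on came with the response.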

AtomPub, on the other hand, holds water because the POST-ability of service resources (for creating new collections) is standardized under the media type "application/atomsvc+xml". If a client GETs a URI and that format comes back, the POST-ability is communicated, in-band. The "application/atom+xml" media type does the same for collections and entries, especially in its specification that an "edit" link tells the client via which resource it modifies entry and collection state. Standardizing on Atom and AtomPub, if you can, is therefore a good bet.
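A rough sketch of what that looks like to a client, again with just the standard library: it checks the media type first and only then reads the service document for collection URIs, which RFC 5023 says accept POST to create new members. The service document, its titles, and its URIs are made up for the example, and relative href resolution is left out.

```python
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical service document; the URIs and titles are invented.
service_doc = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main</atom:title>
    <collection href="http://example.com/layers">
      <atom:title>Layers</atom:title>
    </collection>
  </workspace>
</service>
"""


def postable_collections(content_type, body):
    """Return {title: href} for collections, or {} for other media types."""
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/atomsvc+xml":
        # No registered semantics to go on: the client learns nothing.
        return {}
    root = ET.fromstring(body.strip())
    found = {}
    for collection in root.iter(APP + "collection"):
        title = collection.findtext(ATOM + "title", default="")
        found[title] = collection.get("href")
    return found


print(postable_collections("application/atomsvc+xml", service_doc))
print(postable_collections("text/xml", service_doc))
```

The same bytes labeled text/xml tell the client nothing it can act on, which is the whole point.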

The interesting thing about REST, the thing that distinguishes it from other styles, is that interaction is driven by in-band information. Loose coupling, evolvability, and longevity are properties of a system that has the hypertext constraint. To get these properties, GeoServer and other APIs need to eliminate the out-of-band communication. They can standardize on media types like HTML or Atom, mint their own media types (application/vnd.geoserver+xml or some such), or use links with standard relations in HTTP headers (aka Web Linking) and push for client support of those.
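For the Web Linking option, a client needs to read Link headers. Here's a simplified sketch of a parser, assuming well-behaved header values: it doesn't handle every quoting and escaping case in the draft. The header value and its URIs are hypothetical, not something GeoServer sends.

```python
import re

# One link-value: <target> followed by zero or more ;name=value params.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s";,]+))*)'
)
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Return a list of (target, {param: value}) tuples."""
    links = []
    for target, params in LINK_VALUE.findall(value):
        attrs = {}
        for name, quoted, token in PARAM.findall(params):
            # Keep the first occurrence of a repeated parameter.
            attrs.setdefault(name.lower(), quoted or token)
        links.append((target, attrs))
    return links


def targets_with_rel(links, rel):
    """Targets whose rel parameter includes the given relation type."""
    return [t for t, attrs in links if rel in attrs.get("rel", "").split()]


# Hypothetical header on a text/xml response for one workspace.
header = (
    '<http://example.com/workspaces>; rel="up", '
    '<http://example.com/workspaces/topp.html>; rel="alternate"; type="text/html"'
)

links = parse_link_header(header)
print(targets_with_rel(links, "up"))
```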

Comments

Re: In which we go into the weeds for some REST

Author: Chris Holmes

Thanks for the review Sean, our goal is to make GeoServer as RESTful as possible (indeed when we have the time we'd like to do a REST feature access alternative to WFS).

Practically I'm still not sure of the best way forward. Atom does seem better than text/xml, but even if we did that wouldn't we still want to have text/xml representations of resources? Or are you advocating replacing all the text/xml responses with Atom/AtomPub?

As for application/vnd.geoserver+xml - isn't that out of band in its own way? Like developing a client against it you'd still need to know something about that format? Or you're saying it'd just be a better self-documenting one? I'd be interested in your ideas of what exactly that looks like. And again, would it replace text/xml responses?

As for Web Linking, it looks great but is it even accepted yet? Not that we're opposed to implementing a developing standard and encouraging its adoption, but I think things like feature access through REST are higher priority for us. And the idea with that is we'd just add http headers to our text/xml responses? If you want to help us you could sketch out exactly what headers we should add - I think it's pretty easy to add in extra http headers, so if it's not much of an effort we might be able to do it soon.

Re: In which we go into the weeds for some REST

Author: Allan Doyle

Chris asks "As for application/vnd.geoserver+xml - isn't that out of band in its own way? " -- That was my question, too. I was going to ask about "application/atomsvc+xml" instead.

Sean said

AtomPub, on the other hand, holds water because the POST-ability of service resources (for creating new collections) is standardized under the media type "application/atomsvc+xml". If a client GETs a URI and that format comes back, the POST-ability is communicated, in-band.

Isn't that only because RFC 5023 says it's POST-able? Then RFC 5023 is the out-of-band knowledge.

Re: In which we go into the weeds for some REST

Author: Sean

RFC 5023 isn't out-of-band: it and application/atomsvc+xml and application/atom+xml are part of the fabric of the web.

For a different take on the subject you should check out http://www.subbu.org/blog/2009/12/media-types-and-plumbing.

Governance of out of band semantics

Author: Rob Atkinson

There is a significant implication in using a special MIME type that is "part of the fabric" of the web to indicate how a client is supposed to interpret content, and the actions it may take as a result. This basically implies that conformant RESTful semantics are only possible within the governance framework of the web "fabric" - it's not open to application domains to define semantics or behaviour of APIs.

Perhaps application API semantics have to be considered as out of band (from a REST point of view) on top of REST. I.e. REST semantics is an out-of-band part of any application API. This perhaps makes sense, because strong governance (fabric of the web) is useful in an out-of-band context, whereas private application semantics are much more problematic as out-of-band information, because they are hard to discover, formalise consistently, create and interpret.

Re: In which we go into the weeds for some REST

Author: Sean

I really must rethink what I've said about the GeoServer API not holding water after another read of Subbu's post. I've agreed with those who say "application/xml is not the media type you're looking for" (Mark Baker, Jim Webber) if you want to use web links, not XLinks. That link semantics aren't conveyed by the Atom namespace, but by the media type. While that's still my take, I should give Subbu's a try. If you stick to the minimum HTTP protocol (Atom invents a somewhat specialized one) and straight XML processing (Atom has distinctly different processing rules, like "must ignore"), application/xml might be fine. At any rate, I think you'd at least need an actual namespace for workspace and data store elements to make this hold water using application/xml.

No mistake: a configuration API is a great place to start getting into the REST style, and GeoServer is getting close to nailing it.

Re: In which we go into the weeds for some REST

Author: Sean

There's another issue in the GeoServer docs, which document a practice of interaction driven through a fixed URI hierarchy (see Fielding's 4th bullet at http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). There's a protocol implied there that application/xml doesn't begin to hint at. Better: drive interaction through the links that are already present in GeoServer's workspace (and friends) representations. GeoServer is ready for REST in a lot of ways, but documents a contrary usage that will result in unnecessary coupling.

Linking UK data

I haven't seen any links from geospatial or GIS blogs to Jeni Tennison's excellent piece about the motivation for choosing the web's architecture as the architecture for the UK's open data initiative and for choosing linked data instead of the usual "services" that make up US data initiatives.

Why?

Because linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.

Read it and check out the links to tutorials about creating linked data.

Comments

Re: Linking UK data

Author: Ian Turton

I think that is because all the UK data is just text and Excel tables. The OS will give up their data when it's pried from their cold dead fingers, and don't even think about geocoding via a postcode: Royal Mail are even worse!

Ian

Re: Linking UK data

Author: Sean

GIS data, rasters excepted, is also largely tabular, wouldn't you say? What's a shapefile if not a table? GML allows different structures, but is less commonly used in that way, and the RDF model is equally suited for those special complex features cases.

Speaking of the OS, it may not be giving away coordinates yet, but has interesting and possibly useful linked data at http://data.ordnancesurvey.co.uk/.

Manipulimization of whatchamacallems?

I posted a link to the listing of GIS stuff in the Python Package index the other day and was reminded that there's not quite enough information there for people looking to match their requirements with software. Click through some of the links and you may find no distribution at all, or a dependency on some C library (that would be my work). The descriptions can also be a little vague, such as "Geospatial geometries, predicates, and operations". Click through the accompanying link to Shapely and you get a slightly more verbose description:

Shapely is a Python package for manipulation and analysis of 2D geospatial geometries. It is based on GEOS (http://geos.refractions.net). Shapely 1.0 is not concerned with data formats or coordinate reference systems. Responsibility for reading and writing data and projecting coordinates is left to other packages like WorldMill and pyproj. For more information, see:

  • Shapely wiki

  • Shapely manual

Shapely requires Python 2.4+.

That's a little more to go on, but assumes that you're already a GIS programmer. It's a terrible assumption to be making when you consider how wedded GIS programmers are to not-open source or not-Python platforms and how unlikely they are to be trying Shapely. The real audience is Python programmers coming from outside the GIS business and it doesn't explain to them at all why they'd want to use Shapely. That's something I'm going to remedy, starting with a little blog rambling.

Imagine a situation where you'd like to find or index a substring within another string. Is there "overlap" between the strings, and if so, what is it? Or maybe you'd like to replace certain characters in a string with others. Now imagine that you're compelled to load the text strings into a relational database to perform these operations because such string functions aren't available in any other context. No knock on the RDBMS, a tremendously useful thing, but that's an unacceptable situation.

The premise of Shapely, or one of the premises, is that Python programmers should be able to perform PostGIS type geometry operations outside of an RDBMS. Another is that Python idioms trump GIS (or Java, in this case, since the GEOS library is derived from JTS, a Java project) idioms. Shapely, in a nutshell, lets you do PostGIS-y stuff outside the context of a database using idiomatic Python. I've got a ways to go yet in explaining this because I've left you needing to know what PostGIS-y stuff is.
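To carry the string analogy over to geometry, here's a small sketch, assuming Shapely and GEOS are installed. It asks whether two regions overlap and, if so, what the overlap is, with no database anywhere. The two circles are arbitrary shapes chosen for the example.

```python
from shapely.geometry import Point

# Two circular regions of radius 1.0 whose centers are 1.0 apart.
# buffer() approximates each circle with a polygon.
a = Point(0.0, 0.0).buffer(1.0)
b = Point(1.0, 0.0).buffer(1.0)

# "Is there overlap between the shapes, and if so, what is it?"
print(a.intersects(b))

overlap = a.intersection(b)  # a new geometry: the lens shared by both
print(overlap.geom_type, overlap.area)

# The part of a that is not in b, like removing a substring.
remainder = a.difference(b)
print(remainder.area)
```

The results are ordinary Python objects with properties like area and bounds, so they can be passed along to whatever comes next in a script.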

That's the theory, but are spatial operations outside of an RDBMS all that interesting in practice? They are to the GIS community: the OGC's Web Processing Service specification was written to standardize such practice, and I know a bunch of Python programmers that are gonzo for WPS. Myself, I'm -1 on the WPS standard, but +1 on spatially smart web intermediaries. This is one of the possible classes of Shapely applications.

My toy Mush is an example of a pipe-like app that uses Shapely. Feeds go in, feeds come out. Check out this map of intersecting regions around geo-referenced photos in a Flickr feed produced using Shapely. Look, Ma: no database.

Proposed standard for web linking

The Web Linking internet draft is in final call. This means that soon we'll have a standardized registry of web link relation types, rules for extending the set of registered links, and rules for serializing links in HTTP headers and/or request and response bodies. The ID also defines what a link is:

In this specification, a link is a typed connection between two resources that are identified by IRIs [RFC3987], and is comprised of:

  • A context IRI, and

  • a link relation type (Section 4), and

  • a target IRI, and

  • optionally, target attributes.

A link can be viewed as a statement of the form "{context IRI} has a {relation type} resource at {target IRI}, which has {target attributes}."

An IRI, if you don't know, is an Internationalized Resource Identifier, the Unicode complement to the URI. The draft uses IRI in its language, but you can read it as URI or URL without loss of meaning.
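The draft's definition maps onto a very small data structure. This is only an illustrative sketch: the class name, field names, and example URIs are mine, not anything the draft specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A typed connection between two resources, after the draft's wording."""

    context: str  # context IRI
    rel: str      # link relation type
    target: str   # target IRI
    attributes: dict = field(default_factory=dict)  # optional target attributes

    def statement(self):
        text = f"{self.context} has a {self.rel} resource at {self.target}"
        if self.attributes:
            attrs = ", ".join(
                f"{k}={v!r}" for k, v in sorted(self.attributes.items())
            )
            text += f", which has {attrs}"
        return text


# Hypothetical example: page 1 of a feed links to page 2.
link = Link(
    "http://example.com/feed?page=1",
    "next",
    "http://example.com/feed?page=2",
    {"type": "application/atom+xml"},
)
print(link.statement())
```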

I'm not going to blog about every last call, but this one is especially interesting to me and relevant to the discussion about GIS and Web architectural styles. If you look at the header on its status page, you see that the draft is a "Proposed Standard". It would be a standard for the entire internet, not just a particular business domain. New media can standardize on it. Library systems can standardize on it. Geospatial systems can standardize on it. The proposed Web Linking standard has been the context for my writing and blogging about a where link relation which I'd like to submit for registration soon – let me know if you recognize yourself as a stakeholder and we'll do it together.

This last call comes at about the same time Ron Lake wrote the following in an article partly responding to "some people" who ask where is the web in "GeoWeb":

Some of the issues revolve around the weak typing and weak semantics of a hyperlink. In the web of documents this does not matter so much, since this is a world with a person in the loop. Get the wrong document? Check again. Much tighter specification of type and semantics is required in the web of systems, or chaos may result.

His article was illustrated with a (different) image of a staffed switchboard to emphasize or exaggerate the dependency of the web on human operators. I believe that is in fact not Andrew Turner at the very back of this one I found on Flickr.

http://farm4.static.flickr.com/3007/2680257100_69b12c6e7d_d.jpg

Item 24092, City Light Photographic Negatives (Record Series 1204-01), Seattle Municipal Archives.

An HTML <img> element is a specialized link with very tight semantics that is often wrapped, as in the case of the very image above, by a more generalized link to a home page for the image. What the Flickr resource means to this blog post is rather underspecified by the link I'm using, but the semantics of the <img> tag need no human interpreter at all.

Let's consider what links bring to a modern web mapping application in your web browser. When you use the browser to fetch the HTML representation of a web map page, it finds among other things HTML <link> elements with rel="stylesheet" and various <script> elements. A script is a link with extra well-defined semantics. A web browser "knows" these semantics via the processing rules labeled "text/html": it's supposed to fetch the stylesheet resources identified by those links using HTTP GET and apply them in rendering the HTML page. Following other rules in the same "text/html" set, the browser fetches javascript files and interprets them. That code might create new <script> elements in the DOM, thereby loading, dynamically, more javascript without any human intervention. Only after this (in general) does a human enter the loop. That human uses the javascript UI to choose an area of interest, code creates <img> elements in the page's DOM (as I wrote before, an <img> is yet another specialized link), and the browser "knows", once again following other rules in the same set, that it is to fetch the imagery and render it in the page to show the user. HTML is full of links with strong semantics and non-human agents use them to great effect. In not one of those cases did a human need to judge the semantics of a link or the type of thing it references. Non-browser web applications can exploit links in similar ways to accomplish different tasks.

The initial registry for Web Linking includes some fuzzy relation types like "payment" (indicates a resource where payment is accepted), but also sharper ones like "previous" and "next". Extension types may be as semantically fine as necessary. My feeling about a "where" link relation is that it ought to indicate a resource representing the coordinates of the link's context so that it could be used, with a gazetteer, in place of literal geometries in (for example) an Atom feed:

...
<entry>
...
<link
  rel="where"
  href="http://www.geonames.org/5577147/fort-collins.html"
  />
...

In practice, the target of the link ought to come in a standard content type such as RDF/XML, GML, or KML that has well-defined geometries, or as HTML with an alternate link to a geographically-suited format.
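Here's a minimal sketch of the consuming side, using only the standard library: it pulls the "where" links out of an Atom feed so that a client could then dereference them against a gazetteer. Keep in mind that "where" is my proposed relation, not a registered one, and that the feed below (titles and the photo URI) is invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed with one entry located by reference, not by coordinates.
feed = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <title>A photo from Fort Collins</title>
    <link rel="alternate" href="http://example.com/photos/1"/>
    <link rel="where"
          href="http://www.geonames.org/5577147/fort-collins.html"/>
  </entry>
</feed>
"""


def where_links(feed_xml):
    """Return (entry title, target URI) pairs for rel="where" links."""
    root = ET.fromstring(feed_xml.strip())
    result = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == "where":
                result.append((title, link.get("href")))
    return result


print(where_links(feed))
```

Fetching each target and reading geometry from it would be the next step, and that depends on the content type the gazetteer offers.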

Read the section about links in HTTP headers too: imagine turning legacy GIS data files into linked data with just a few rewrite rules.

Shapely 1.2a1

Update (2010-02-18): http://sgillies.net/blog/1001/shapely-1-2b1

Update (2010-02-09): 1.2a6 (12 cumulative bug fixes) is ready at http://gispython.org/dist/Shapely-1.2a6.tar.gz.

Shapely 1.2a1 has been tagged and uploaded to http://gispython.org/dist so that people don't get it by mistake from PyPI. To install and try it out (in a virtualenv):

$ pip install http://gispython.org/dist/Shapely-1.2a1.tar.gz

or

$ easy_install http://gispython.org/dist/Shapely-1.2a1.tar.gz

You'll need a GEOS version >= 3.1.1 to try the new prepared geometry class:

>>> from shapely.geometry import Point, Polygon
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> from shapely.prepared import prep
>>> p = prep(triangle) # pre-analyze for efficient ops
>>> p.intersects(Point(0.5, 0.5))
True

Most of the work toward 1.2 has been done by Aron Bierbaum. Other features include geometry simplification, a switch to the new reentrant functions in libgeos_c, setup script consolidation, and more tests.

Comments

Re: Shapely 1.2a1

Author: Ian Bicking

Being new to this stuff, I'm a bit vague on what Shapely is for; examples would be really helpful.

Re: Shapely 1.2a1

Author: Sean

Ian, I'm excited to see you crossing over into geographic applications! Anything I can do, please let me know.

All I've got for Shapely is a manual, a wiki page, and a readme. All of these, admittedly, lack a good "motivation" section (I'm as guilty of falling for the self-evidence of GIS as anyone), which could be reduced to this: Shapely allows you to do PostGIS-y stuff with idiomatic Python outside the context of a database. I should probably just replace all the GIS professional language with exactly that in the readme and wiki.

For example, you can pull a GeoRSS feed like this from Flickr and map the intersection of regions around items within it without importing it into a database.

The code for this app I was calling Mush is at http://sgillies.net/hg/mush/file/115d942603d3/mush/overlap.py.

Re: Shapely 1.2a1

Author: Michael Weisman

Hi Sean,

This looks great, but I think I'm missing something about when prepared geometries should be used.

I modified some code I have that does point on polygon overlays with n points to ~90,000 polygons to use prepared geometries and "time python script.py" tells me that it actually takes 41.493 sec/point with prepared geoms vs 37.942 sec/point with standard shapely Polygon objects. The quick sample in your post also runs slightly slower with prepared geometries (0.098 sec vs 0.095 sec). When do the benefits of using prepared geoms outweigh the cost of creating them?

Thanks,

Michael

Re: Shapely 1.2a1

Author: Sean

Michael, you're prepping a point for comparison to many polygons? No gain there, I think. The feature is designed for the other case: preparing a complex polygon for comparison to many simpler geometries. We'll make sure that there's adequate documentation of this before a final 1.2.
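A sketch of the intended pattern, assuming Shapely 1.2 or later with a GEOS that supports prepared geometries: prepare one comparatively complex polygon once, then test many simple geometries against it. The circle and the grid of points are arbitrary test data, and this shows the usage pattern only; it makes no claim about timings.

```python
from shapely.geometry import Point
from shapely.prepared import prep

# One comparatively complex polygon: a circle approximated by many segments.
region = Point(0.0, 0.0).buffer(1.0)

# Many simple geometries: a 20 x 20 grid of points covering the
# square from -0.95 to 0.95 in both directions.
points = [
    Point((x + 0.5) / 10.0, (y + 0.5) / 10.0)
    for x in range(-10, 10)
    for y in range(-10, 10)
]

prepared = prep(region)  # pay the preparation cost once...
inside = [p for p in points if prepared.contains(p)]  # ...then query many times

# The unprepared polygon gives the same answers, just without the reuse.
plain = [p for p in points if region.contains(p)]
print(len(points), len(inside), len(plain))
```

Preparing each of many polygons and testing it against only a point or two pays the setup cost without much chance to earn it back.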

Re: Shapely 1.2a1

Author: Michael Weisman

I was actually prepping 90,000 polys and leaving the points as standard Point geom objects. I'll play with this some more and see what I can work out.

Thanks!

Re: Shapely 1.2a1

Author: Sean

I've uploaded a prepared geometry benchmarking script to http://trac.gispython.org/lab/wiki/ShapelyBenchmarks. In the first, not entirely unrealistic case, I'm seeing 4x faster contains.