Proposed standard for web linking

Sean Gillies

2010-01-25 00:00

The Web Linking internet draft is in last call. This means that soon we'll have a standardized registry of web link relation types, rules for extending the set of registered links, and rules for serializing links in HTTP headers and/or request and response bodies. The Internet-Draft (ID) also defines what a link is:

In this specification, a link is a typed connection between two resources that are identified by IRIs [RFC3987], and is comprised of:

A context IRI, and

a link relation type (Section 4), and

a target IRI, and

optionally, target attributes.

A link can be viewed as a statement of the form "{context IRI} has a {relation type} resource at {target IRI}, which has {target attributes}."
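
To make that definition concrete, here is a minimal sketch of the four parts as a data structure, with a method that renders the statement form quoted above. The class name `Link`, its field names, and the example.com IRIs are illustrative assumptions, not anything the draft prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Link:
    """A typed connection between two resources (hypothetical model)."""
    context: str   # context IRI
    rel: str       # link relation type
    target: str    # target IRI
    attributes: Dict[str, str] = field(default_factory=dict)  # optional target attributes

    def statement(self) -> str:
        """Render the link as '{context} has a {rel} resource at {target}, ...'."""
        text = f"{self.context} has a {self.rel} resource at {self.target}"
        if self.attributes:
            attrs = ", ".join(f"{k}={v!r}" for k, v in sorted(self.attributes.items()))
            text += f", which has {attrs}"
        return text + "."


# Illustrative IRIs only.
link = Link(
    context="http://example.com/TheBook/chapter3",
    rel="previous",
    target="http://example.com/TheBook/chapter2",
    attributes={"title": "previous chapter"},
)
print(link.statement())
```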

An IRI, if you don't know, is an Internationalized Resource Identifier, the Unicode complement to the URI. The draft uses IRI in its language, but you can read it as URI or URL without loss of meaning.

I'm not going to blog about every last call, but this one is especially interesting to me and relevant to the discussion about GIS and Web architectural styles. If you look at the header on its status page, you see that the draft is a "Proposed Standard". It would be a standard for the entire internet, not just a particular business domain. New media can standardize on it. Library systems can standardize on it. Geospatial systems can standardize on it. The proposed Web Linking standard has been the context for my writing and blogging about a "where" link relation, which I'd like to submit for registration soon. Let me know if you recognize yourself as a stakeholder and we'll do it together.

This last call comes at about the same time Ron Lake wrote the following in an article partly responding to "some people" who ask where is the web in "GeoWeb":

Some of the issues revolve around the weak typing and weak semantics of a hyperlink. In the web of documents this does not matter so much, since this is a world with a person in the loop. Get the wrong document? Check again. Much tighter specification of type and semantics is required in the web of systems, or chaos may result.

His article was illustrated with a (different) image of a staffed switchboard to emphasize or exaggerate the dependency of the web on human operators. I believe that is in fact not Andrew Turner at the very back of this one I found on Flickr.

http://farm4.static.flickr.com/3007/2680257100_69b12c6e7d_d.jpg — Item 24092, City Light Photographic Negatives (Record Series 1204-01), Seattle Municipal Archives.

An HTML <img> element is a specialized link with very tight semantics that is often wrapped, as in the case of the very image above, by a more generalized link to a home page for the image. What the Flickr resource means to this blog post is rather underspecified by the link I'm using, but the semantics of the <img> tag need no human interpreter at all.

Let's consider what links bring to a modern web mapping application in your web browser. When you use the browser to fetch the HTML representation of a web map page, it finds among other things HTML <link> elements with rel="stylesheet" and various <script> elements. A script is a link with extra, well-defined semantics. A web browser "knows" these semantics via the processing rules labeled "text/html": it's supposed to fetch the stylesheet resources identified by those links using HTTP GET and apply them in rendering the HTML page. Following other rules in the same "text/html" set, the browser fetches JavaScript files and interprets them. That code might create new <script> elements in the DOM, thereby dynamically loading more JavaScript without any human intervention. Only after this (in general) does a human enter the loop. That human uses the JavaScript UI to choose an area of interest, code creates <img> elements in the page's DOM (as I wrote before, an <img> is yet another specialized link), and the browser "knows", once again following other rules in the same set, that it is to fetch the imagery and render it in the page to show the user. HTML is full of links with strong semantics, and non-human agents use them to great effect. In not one of those cases did a human need to judge the semantics of a link or the type of thing it references. Non-browser web applications can exploit links in similar ways to accomplish different tasks.
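
As a rough illustration of a non-human agent acting on those semantics, the sketch below uses Python's standard `html.parser` to pick out the links a browser would dereference on its own. The `LinkCollector` class and the page markup are made up for the example, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (kind, target) pairs that an agent would fetch unprompted."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rels and attrs.get("href"):
            self.found.append(("stylesheet", attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.found.append(("script", attrs["src"]))
        elif tag == "img" and attrs.get("src"):
            self.found.append(("image", attrs["src"]))


# Hypothetical web map page; the paths are placeholders.
PAGE = """
<html><head>
<link rel="stylesheet" href="/css/map.css">
<script src="/js/map.js"></script>
</head><body>
<img src="/tiles/0/0/0.png" alt="">
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)
for kind, target in collector.found:
    print(kind, target)
```

No judgment call is needed anywhere in that loop: the element name and the rel value tell the agent what each target is for.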

The initial registry for Web Linking includes some fuzzy relation types like "payment" (indicates a resource where payment is accepted), but also sharper ones like "previous" and "next". Extension types may be as semantically fine as necessary. My feeling about a "where" link relation is that it ought to indicate a resource representing the coordinates of the link's context so that it could be used, with a gazetteer, in place of literal geometries in (for example) an Atom feed:

...
<entry>
...
<link
  rel="where"
  href="http://www.geonames.org/5577147/fort-collins.html"
  />
...

In practice, the target of the link ought to come in a standard content type such as RDF/XML, GML, or KML that has well-defined geometries, or as HTML with an alternate link to a geographically-suited format.
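
Here is a sketch of how a feed consumer might find such links, using `xml.etree.ElementTree` from the standard library. Keep in mind that "where" is only a proposed relation type, and that the feed, the entry title, and the `where_links` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed; "where" is a proposed, not yet registered, relation type.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example entry</title>
    <link rel="alternate" href="http://example.com/posts/1"/>
    <link rel="where" href="http://www.geonames.org/5577147/fort-collins.html"/>
  </entry>
</feed>"""


def where_links(feed_xml):
    """Yield (entry title, target IRI) for each rel="where" link in a feed."""
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == "where":
                yield title, link.get("href")


results = list(where_links(FEED))
print(results)
```

A client would then dereference each target and look for a representation with well-defined geometries, as described above.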

Read the section about links in HTTP headers too: imagine turning legacy GIS data files into linked data with just a few rewrite rules.
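
For a sense of what that looks like on the wire, the draft serializes a link in a Link header as the target in angle brackets followed by semicolon-separated parameters. The helper below is a minimal sketch of that shape, not a complete implementation: `format_link_header` is a hypothetical name, it does no escaping of quotes in values, and "where" is still only a proposed relation.

```python
def format_link_header(target, rel, **attributes):
    """Build a Link header value such as '<target>; rel="next"'.

    Sketch only: values are assumed to contain no double quotes.
    """
    parts = [f"<{target}>", f'rel="{rel}"']
    for name, value in attributes.items():
        parts.append(f'{name}="{value}"')
    return "; ".join(parts)


# A server could attach this to the response for a legacy data file.
header = format_link_header(
    "http://www.geonames.org/5577147/fort-collins.html",
    "where",
    type="text/html",
)
print("Link:", header)
```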