2010 (old posts, page 1)

Linking UK data

I haven't seen any links from geospatial or GIS blogs to Jeni Tennison's excellent piece about the motivation for choosing the web's architecture as the architecture for the UK's open data initiative and for choosing linked data instead of the usual "services" that make up US data initiatives.

Why?

Because linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.

Read it and check out the links to tutorials about creating linked data.

Comments

Re: Linking UK data

Author: Ian Turton

I think that is because all the UK data is just text and excel tables. The OS will give up their data when it's pried from their cold dead fingers, and don't even think about geocoding via a postcode: Royal Mail are even worse!

Ian

Re: Linking UK data

Author: Sean

GIS data, rasters excepted, is also largely tabular, wouldn't you say? What's a shapefile if not a table? GML allows different structures, but is less commonly used in that way, and the RDF model is equally suited for those special complex features cases.

Speaking of the OS, it may not be giving away coordinates yet, but it has interesting and possibly useful linked data at http://data.ordnancesurvey.co.uk/.

Manipulimization of whatchamacallems?

I posted a link to the listing of GIS stuff in the Python Package Index the other day and was reminded that there's not quite enough information there for people looking to match their requirements with software. Click through some of the links and you may find no distribution at all, or a dependency on some C library (that would be my work). The descriptions can also be a little vague, such as "Geospatial geometries, predicates, and operations". Click through the accompanying link to Shapely and you get a slightly more verbose description:

Shapely is a Python package for manipulation and analysis of 2D geospatial geometries. It is based on GEOS (http://geos.refractions.net). Shapely 1.0 is not concerned with data formats or coordinate reference systems. Responsibility for reading and writing data and projecting coordinates is left to other packages like WorldMill and pyproj. For more information, see:

  • Shapely wiki
  • Shapely manual

Shapely requires Python 2.4+.

That's a little more to go on, but assumes that you're already a GIS programmer. It's a terrible assumption to be making when you consider how wedded GIS programmers are to not-open source or not-Python platforms and how unlikely they are to be trying Shapely. The real audience is Python programmers coming from outside the GIS business, and it doesn't explain to them at all why they'd want to use Shapely. That's something I'm going to remedy, starting with a little blog rambling.

Imagine a situation where you'd like to find or index a substring within another string. Is there "overlap" between the strings, and if so, what is it? Or maybe you'd like to replace certain characters in a string with others. Now imagine that you're compelled to load the text strings into a relational database to perform these operations because such string functions aren't available in any other context. No knock on the RDBMS, a tremendously useful thing, but that's an unacceptable situation.

The premise of Shapely, or one of the premises, is that Python programmers should be able to perform PostGIS-type geometry operations outside of an RDBMS. Another is that Python idioms trump GIS idioms (or Java idioms, in this case, since the GEOS library is derived from JTS, a Java project). Shapely, in a nutshell, lets you do PostGIS-y stuff outside the context of a database using idiomatic Python. I've got a ways to go yet in explaining this because I've left you needing to know what PostGIS-y stuff is.
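To make "PostGIS-y stuff" concrete, here's a minimal sketch of the kind of in-memory operation Shapely exposes, assuming Shapely and its GEOS dependency are installed:

```python
# Two overlapping disks, intersected in memory -- the sort of thing
# you'd otherwise push into PostGIS as ST_Buffer/ST_Intersection.
from shapely.geometry import Point

a = Point(0.0, 0.0).buffer(1.0)   # a unit disk
b = Point(1.0, 0.0).buffer(1.0)   # another, overlapping the first

overlap = a.intersection(b)
print(overlap.area)                 # area of the lens-shaped overlap
print(a.contains(Point(0.1, 0.1)))  # True
```

No database connection, no import step: just Python objects with geometric methods.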

That's the theory, but are spatial operations outside of an RDBMS all that interesting in practice? They are to the GIS community: the OGC's Web Processing Service specification was written to standardize such practice, and I know a bunch of Python programmers that are gonzo for WPS. Myself, I'm -1 on the WPS standard, but +1 on spatially smart web intermediaries. This is one of the possible classes of Shapely applications.

My toy Mush is an example of a pipe-like app that uses Shapely. Feeds go in, feeds come out. Check out this map of intersecting regions around geo-referenced photos in a Flickr feed produced using Shapely. Look, Ma: no database.

Proposed standard for web linking

The Web Linking internet draft is in final call. This means that soon we'll have a standardized registry of web link relation types, rules for extending the set of registered links, and rules for serializing links in HTTP headers and/or request and response bodies. The draft also defines what a link is:

In this specification, a link is a typed connection between two resources that are identified by IRIs [RFC3987], and is comprised of:

  • A context IRI, and
  • a link relation type (Section 4), and
  • a target IRI, and
  • optionally, target attributes.

A link can be viewed as a statement of the form "{context IRI} has a {relation type} resource at {target IRI}, which has {target attributes}."

An IRI, if you don't know, is an Internationalized Resource Identifier, the Unicode complement to the URI. The draft uses IRI in its language, but you can read it as URI or URL without loss of meaning.

I'm not going to blog about every last call, but this one is especially interesting to me and relevant to the discussion about GIS and Web architectural styles. If you look at the header on its status page, you see that the draft is a "Proposed Standard". It would be a standard for the entire internet, not just a particular business domain. New media can standardize on it. Library systems can standardize on it. Geospatial systems can standardize on it. The proposed Web Linking standard has been the context for my writing and blogging about a where link relation which I'd like to submit for registration soon – let me know if you recognize yourself as a stakeholder and we'll do it together.

This last call comes at about the same time Ron Lake wrote the following in an article partly responding to "some people" who ask where is the web in "GeoWeb":

Some of the issues revolve around the weak typing and weak semantics of a hyperlink. In the web of documents this does not matter so much, since this is a world with a person in the loop. Get the wrong document? Check again. Much tighter specification of type and semantics is required in the web of systems, or chaos may result.

His article was illustrated with a (different) image of a staffed switchboard to emphasize or exaggerate the dependency of the web on human operators. I believe that is in fact not Andrew Turner at the very back of this one I found on Flickr.

http://farm4.static.flickr.com/3007/2680257100_69b12c6e7d_d.jpg

Item 24092, City Light Photographic Negatives (Record Series 1204-01), Seattle Municipal Archives.

An HTML <img> element is a specialized link with very tight semantics that is often wrapped, as in the case of the very image above, by a more generalized link to a home page for the image. What the Flickr resource means to this blog post is rather underspecified by the link I'm using, but the semantics of the <img> tag need no human interpreter at all.

Let's consider what links bring to a modern web mapping application in your web browser. When you use the browser to fetch the HTML representation of a web map page, it finds among other things HTML <link> elements with rel="stylesheet" and various <script> elements. A script is a link with extra well-defined semantics. A web browser "knows" these semantics via the processing rules labeled "text/html" – that it's supposed to fetch the stylesheet resources identified by those links using HTTP GET and apply them in rendering the HTML page. Following other rules in the same "text/html" set, the browser fetches javascript files and interprets them. That code might create new <script> elements in the DOM, thereby loading, dynamically, more javascript without any human intervention.

Only after this (in general) does a human enter the loop. That human uses the javascript UI to choose an area of interest, code creates <img> elements in the page's DOM (as I wrote before, an <img> is yet another specialized link), and the browser "knows" once again, following other rules in the same set, that it is to fetch the imagery and render it in the page to show the user. HTML is full of links with strong semantics and non-human agents use them to great effect. In not one of those cases did a human need to judge the semantics of a link or the type of thing it references. Non-browser web applications can exploit links in similar ways to accomplish different tasks.

The initial registry for Web Linking includes some fuzzy relation types like "payment" (indicates a resource where payment is accepted), but also sharper ones like "previous" and "next". Extension types may be as semantically fine as necessary. My feeling about a "where" link relation is that it ought to indicate a resource representing the coordinates of the link's context so that it could be used, with a gazetteer, in place of literal geometries in (for example) an Atom feed:

...
<entry>
...
<link
  rel="where"
href="http://www.geonames.org/5577147/fort-collins.html"
  />
...

In practice, the target of the link ought to come in a standard content type such as RDF/XML, GML, or KML that has well-defined geometries, or as HTML with an alternate link to a geographically-suited format.

Read the section about links in HTTP headers too: imagine turning legacy GIS data files into linked data with just a few rewrite rules.
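As a sketch of that idea: a server fronting a legacy shapefile could advertise its geography in a single response header, using the draft's Link header serialization. The rel="where" relation here is the proposed extension, not yet registered, and the target URL is just an illustration:

```
HTTP/1.1 200 OK
Content-Type: application/zip
Link: <http://www.geonames.org/5577147/fort-collins.html>; rel="where"
```

The data file itself is untouched; the linking happens entirely at the HTTP layer.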

Shapely 1.2a1

Update (2010-02-18): http://sgillies.net/blog/1001/shapely-1-2b1

Update (2010-02-09): 1.2a6 (12 cumulative bug fixes) is ready at http://gispython.org/dist/Shapely-1.2a6.tar.gz.

Shapely 1.2a1 has been tagged and uploaded to http://gispython.org/dist so that people don't get it by mistake from PyPI. To install and try it out (in a virtualenv):

$ pip install http://gispython.org/dist/Shapely-1.2a1.tar.gz

or

$ easy_install http://gispython.org/dist/Shapely-1.2a1.tar.gz

You'll need a GEOS version >= 3.1.1 to try the new prepared geometry class:

>>> from shapely.geometry import Point, Polygon
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> from shapely.prepared import prep
>>> p = prep(triangle) # pre-analyze for efficient ops
>>> p.intersects(Point(0.5, 0.5))
True

Most of the work toward 1.2 has been done by Aron Bierbaum. Other features include geometry simplification, a switch to the new reentrant functions in libgeos_c, setup script consolidation, and more tests.
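Of those features, simplification is the easiest to show. A minimal sketch, assuming a Shapely 1.2 install:

```python
# Simplification trades vertices for a distance tolerance.
from shapely.geometry import Point

disk = Point(0.0, 0.0).buffer(1.0)   # a many-vertex approximation of a circle
coarse = disk.simplify(0.2, preserve_topology=True)

# The simplified ring has far fewer vertices than the original.
print(len(coarse.exterior.coords) < len(disk.exterior.coords))  # True
```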

Comments

Re: Shapely 1.2a1

Author: Ian Bicking

Being new to this stuff, I'm a bit vague on what Shapely is for; examples would be really helpful.

Re: Shapely 1.2a1

Author: Sean

Ian, I'm excited to see you crossing over into geographic applications! Anything I can do, please let me know.

All I've got for Shapely is a manual, a wiki page, and a readme. All of these, admittedly, lack a good "motivation" section (I'm as guilty of falling for the self-evidence of GIS as anyone), which could be reduced to this: Shapely allows you to do PostGIS-y stuff with idiomatic Python outside the context of a database. I should probably just replace all the GIS professional language with exactly that in the readme and wiki.

For example, you can pull a GeoRSS feed like this from Flickr and map the intersection of regions around items within it without importing it into a database.

The code for this app I was calling Mush is at http://sgillies.net/hg/mush/file/115d942603d3/mush/overlap.py.

Re: Shapely 1.2a1

Author: Michael Weisman

Hey Sean,

This looks great, but I think I'm missing something about when prepared geometries should be used.

I modified some code I have that does point-on-polygon overlays with n points to ~90,000 polygons to use prepared geometries, and "time python script.py" tells me that it actually takes 41.493 sec/point with prepared geoms vs 37.942 sec/point with standard shapely Polygon objects. The quick sample in your post also runs slightly slower with prepared geometries (0.098 sec vs 0.095 sec). When do the benefits of using prepared geoms outweigh the cost of creating them?

Thanks,

Michael

Re: Shapely 1.2a1

Author: Sean

Michael, you're prepping a point for comparison to many polygons? No gain there, I think. The feature is designed for the other case: preparing a complex polygon for comparison to many simpler geometries. We'll make sure that there's adequate documentation of this before a final 1.2.
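A sketch of the pattern the feature is designed for, assuming Shapely 1.2 with GEOS >= 3.1.1: pay the preparation cost once for a complex polygon, then run many cheap tests against it.

```python
# Prepare one polygon once, then test many points against it.
# prep() only pays off when the prepared geometry is reused like this.
import random

from shapely.geometry import Point
from shapely.prepared import prep

poly = Point(0.5, 0.5).buffer(0.4)   # stand-in for a complex polygon
prepared = prep(poly)                # one-time analysis cost

points = [Point(random.random(), random.random()) for _ in range(1000)]
inside = [p for p in points if prepared.contains(p)]

# Results agree with the unprepared polygon; only the speed differs.
print(all(poly.contains(p) for p in inside))  # True
```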

Re: Shapely 1.2a1

Author: Michael Weisman

I was actually prepping 90,000 polys and leaving the points as standard Point geom objects. I'll play with this some more and see what I can work out.

Thanks!

Re: Shapely 1.2a1

Author: Sean

I've uploaded a prepared geometry benchmarking script to http://trac.gispython.org/lab/wiki/ShapelyBenchmarks. In the first, not entirely unrealistic case, I'm seeing 4x faster contains.

Dotted JSON namespaces

JSON quacks a lot like a Python dict, so then why not Python (or Java) style dotted namespaces? An arbitrary bit of JSON might be safely extended with GeoJSON geometry like:

{
  ...
  "org.geojson.type": "Point",
  "org.geojson.coordinates": [0.0, 0.0]
}

Or with a feature:

{
  ...
  "org.geojson.geometry": {
    "org.geojson.type": "Point",
    "org.geojson.coordinates": [0.0, 0.0]
  }
}

The GeoJSON working group rejected XML namespaces (such as in JDIL) for 1.0. I recall that I had the most strongly expressed opinion: that GeoJSON should be distinctly JSON and not XML without angle brackets. JSON is less abstracted than XML, closer to code, and dotted namespaces seem like a win to me.
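One practical upside of dotted keys is that namespaced members can be pulled out of arbitrary JSON with a plain prefix test, no resolver or header section required. A minimal sketch using only the standard library:

```python
# Extract "org.geojson"-namespaced members from an arbitrary JSON object.
import json

doc = json.loads("""
{
  "title": "a point of interest",
  "org.geojson.type": "Point",
  "org.geojson.coordinates": [0.0, 0.0]
}
""")

geo = {k.split("org.geojson.", 1)[1]: v
       for k, v in doc.items()
       if k.startswith("org.geojson.")}

print(geo)  # {'type': 'Point', 'coordinates': [0.0, 0.0]}
```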

Comments

Re: Dotted JSON namespaces

Author: Tom Kralidis

Wouldn't this result in a bulkier encoding? At least with JDIL, the URIs are defined once (at the top/header) and the prefixes are shorthand refs to them.

I do agree that your proposal above is closer to code.

Re: Dotted JSON namespaces

Author: Sean

A little bulkier, yes, but it's not going to add up to a lot of extra bytes relative to the accompanying coordinates. Flatter namespaces will certainly be easier on the eyes.

Magnificent seven plus two

A question on geowanking reminded me again of Bill de hÓra's enumeration of the keys to Atom's value:

  • atom:id
  • atom:updated
  • atom:link
  • the extension rules (mustIgnore, foreign markup)
  • the date construct rules
  • the content encoding rules
  • unordered elements

He wrote:

Even if you don't like Atom (or XML for that matter), if your carrier format is going to survive on the web, you need to have addressed these 7 primitives.

For the sake of carrying geographic information on the web, we'll need two additional primitives: location and location construct rules. An item or entry should have one location that is more or less analogous to atom:updated – the current location of the entry in a particular space (Which space? We'll get to that) – not ruling out the possibility of other semantically different locations. The Atom spec says that dates will abide by RFC 3339, period. This means one calendar system (read that as "spatial reference system"), period. Likewise, there should be a single dirt simple construct for location instead of competing options. Date and time can be represented elegantly as text strings with precision that increases as the string grows to the right, but geometries aren't so neatly captured and bulky blocks of XML or JSON seem unavoidable. Of all the geodata formats on the web, KML is doing the best job here: one coordinate system and a single simple yet powerful enough representation for geometries. KML is tied to the Earth, of course; for a Sun-centered (or other) system we'd need a different media type than vnd.google-earth.kml+xml.

KML also gets most of the other primitives right. KML's content encoding rules are an utter mess, but clearly the mass of free aerial and street view imagery is more than compensatory for most web developers. Your own mileage may vary.

Comments

Re: Magnificent seven plus two

Author: Allan Doyle

1. Yikes. Somehow I've fallen off the geowanking list! Thanks for making me realize that.

2. What do you think is sufficient for location? A single point? Or a simple polygon? I'd have said bbox, but I'm guessing bbox is not good enough. Or do we need to allow for all three?

Re: Magnificent seven plus two

Author: Sean

All three of them. KML's points, lines, and polygons aren't too complicated. Multipart geometries are on the complexity borderline. If you're communicating some network event that's relevant to Hawaii, there's little to be gained by representing the archipelago as a multipolygon and excluding the interstitial water. I honestly forget what the real use cases for multipart geometries are, the ones that can't be modeled as a collection of simple parts.

Whether to use polygon, box, or point is an information design decision akin to (Atom analogy here) the decisions whether or not you include full blog text in your feed entries and whether or not you include comments in the feed. That's a choice of content scale. The concept of cartographic scale has worked very well on paper and fairly well on screen, but I'm uneasy (without fully understanding why) about introducing it in headless applications.

Re: Magnificent seven plus two

Author: Allan Doyle

Scale is one of those slippery slope things. It's not strictly necessary, it means different things to different people, and including it would open the floodgates for all sorts of other attributes. Sure, it would be nice to know whether your point refers to something the size of a street lamp or the size of a city. But not knowing that doesn't stop you from doing useful things.

WS-REST 2010

To be held at WWW 2010 in Raleigh, North Carolina on 26 April 2010 [site]:

This first edition of WS-REST, co-located with the WWW2010 conference, aims at providing an academic forum for discussing current emerging research topics centered around the application of REST, as well as advanced application scenarios for building large scale distributed systems.

It would be neat to see some papers on geospatial applications of the REST architectural style on the program.

What geo-intelligence failures?

Busy and having fun! And disgustingly glib about it. The GEOINT community either can't see or won't face its role in the past decade of bamboozlement:

Here we are on the precipice of a new decade. Where did this past decade go? They say time flies when you are having fun or really, really busy. For the GEOINT community, the past decade offered a mixture of both. Many believe, rightly so, that GEOINT came of age in this past decade. The confluence of technologies maturing, political/events (i.e. two wars) and natural disasters has created opportunities and challenges for both the private and public sector. And, as a result, we have seen both sides step up — creating solutions that are light years beyond what was happening ten years ago. And, as we have seen at the most recent GEOINT Symposiums, geospatial intelligence will continue to be the cornerstone for national defense. So, to this we would like to wish the entire GEOINT community a happy and prosperous New Years, as we head into another exciting decade.

The past decade was a slam dunk, baby!