Design gap

It's odd that Jeroen would mention this huge design gap and in the next breath invoke SDI and standards-based internet GIS (note that Jeroen, in his blog, often espouses rethinking SDI in light of the web).

GIS/SDI architecture design is to web architecture design as 18-button mouse is to "magic mouse". Buttons vs. gestures. A multitude of complex, specialized, rigid protocols vs. one dumb protocol that can channel different, complex, emergent interactions.

(Of course, it's hard to imagine that Apple is going to make it easy for mere mortals to create their own gestures, but hacks should be forthcoming.)

Itertools to the rescue

I'm waist deep in a messy data wrangling project with two preliminary conclusions: that Inkscape and the Gimp do not equal the best platform for collecting group control points for imagery, and that itertools has some great functional programming recipes.

I'm collating vertices of simple paths drawn in each application. Inkscape's SVG expresses the paths in the simplest fashion: M(ove to) X Y L(ine to) X Y .... Gimp exports them – pointlessly in this case – as curves:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
              "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">

<svg xmlns="http://www.w3.org/2000/svg"
     width="109.417in" height="36.4722in"
     viewBox="0 0 7878 2626">
  <path id="0102"
        fill="none" stroke="black" stroke-width="1"
        d="M 349.00,512.00
           C 349.00,512.00 475.00,1245.00 475.00,1245.00
             475.00,1245.00 481.00,2054.00 481.00,2054.00
             481.00,2054.00 3792.00,2108.00 3792.00,2108.00
             3792.00,2108.00 7555.00,2127.00 7555.00,2127.00
             7555.00,2127.00 7618.00,1528.00 7618.00,1528.00
             7618.00,1528.00 7738.00,588.00 7738.00,588.00
             7738.00,588.00 3743.00,550.00 3743.00,550.00" />
</svg>

It's full of duplicate vertices: couplets at the ends and triplets in the middle of the path. Itertools provides a handy recipe for getting just the unique vertices:

from itertools import groupby, imap
from operator import itemgetter

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))

Without having to think hard at all, I can get a list of locally unique pairs of numbers like [(349.00, 512.00), (475.00, 1245.00), (481.00, 2054.00), ...]. If you're using Python 2.5 (or older), without the builtin next() function, you'll need to replace it with an equivalent function like lambda x: x.next(). The itemgetter function from the operator module recasts Python's sequence item accessor in functional form:

>>> itemgetter(1)('ABCDEFG')
'B'
>>> itemgetter(1,3,5)('ABCDEFG')
('B', 'D', 'F')
>>> itemgetter(slice(2,None))('ABCDEFG')
'CDEFG'

The collated output I'm after looks like -gcp 349.0 512.0 723.90418 2348.0269 0.0 ... . You can probably see where I'm going with this.

NSF funding for historical GIS

Wow, indeed (via @brettbobley):

It isn’t often the National Science Foundation funds a project from a humanities discipline with a history professor as a principal investigator, but that is exactly what happened earlier this fall when ISU history professor J.B. “Jack” Owens received an award for a project titled “Understanding social networks within complex, nonlinear systems: geographically-integrated history and dynamics GIS.” About $1.3 million of the four-year grant will go to ISU, with about $471,000 to go to the University of Oklahoma and co-principal investigator May Yuan.

That's big money for digital humanities. It's the first I've heard of their project. Good news for MapWindow, perhaps?

Can't happen here

Says Paul Ramsey in comments on Paul Bissett's great post about disruption of the data and navigation business:

Right, a black hole isn’t “evil”, but that doesn’t change the fact that it massively distorts the shape of space-time everywhere it goes, which can be a bummer for any object in its immediate neighbourhood.

Thank the gods that Digital Classicism isn't GIS. It can't happen here.

GeoJSON Data URIs

In a previous post I described how I was getting the geographic context for a web page from an alternate JSON representation. It was pointed out that this requires an extra request for what could be a fairly small bit of data. Via Sam Ruby, I see that I can head off the extra request and yet keep my links with a Data URI from RFC 2397. Instead of

<link type="application/json" href="http://example.com/where"/>

where

$ curl http://example.com/where
{"type": "Point", "coordinates": [0.0, 0.0]}

One can encode location directly in a URI using escaped GeoJSON:

<link href="data:application/json,%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D"/>

Andrew Turner came up with something like this in an old GeoJSON discussion, but none of us thought of Data URIs.

A round trip through the encoding specified in RFC 2397 is simple with Python:

>>> from urllib import quote, unquote
>>> from json import dumps, loads # or simplejson
>>> where = {'type': 'Point', 'coordinates': (0.0, 0.0)}
>>> json = dumps(where)
>>> json
'{"type": "Point", "coordinates": [0.0, 0.0]}'
>>> uri = 'data:application/json,' + quote(json)
>>> uri
'data:application/json,%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D'
>>> data = uri.split(',')[-1]
>>> data
'%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D'
>>> json = unquote(data)
>>> json
'{"type": "Point", "coordinates": [0.0, 0.0]}'
>>> where = loads(json)
>>> where
{'type': 'Point', 'coordinates': [0.0, 0.0]}

and a Data URI is similarly easy to handle with Javascript:

>>> uri = 'data:application/json,%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D'
"data:application/json,%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D"
>>> data = uri.split(',').pop()
"%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D"
>>> json = unescape(data)
"{"type": "Point", "coordinates": [0.0, 0.0]}"

The missing piece is a link relation type that specifies what such a Data URI means in the context of the document. Something like rel="where" [where-link-relation-type]:

<link rel="where" href="data:application/json,..."/>
%7B%22type%22%3A%20%22Point%22%2C%20%22coordinates%22%3A%20%5B0.0%2C%200.0%5D%7D"/>

Atom threading

Reading about Salmon this morning and its use of Atom Threading (RFC 4685) reminded me that before anybody outside of Google had ever heard about Wave, let alone place-locating Wave bots (see slide 15) that would geocode and annotate places that you mention, I'd suggested that you might arrange for an app to walk your posts and leave geographic annotation in comments using the same Atom extension. Salmon might make the arrangement unnecessary: geotagging might just come knocking.

Lessons of standardization

Stefan Tilkov tipped me off to Adam Bosworth's lessons learned in creating standards:

  1. Keep the standard as simple and stupid as possible.

  2. The data being exchanged should be human readable and easy to understand.

  3. Standards work best when they are focused.

  4. Standards should have precise encodings.

  5. Always have real implementations that are actually being used as part of design of any standard.

  6. Put in hysteresis for the unexpected.

  7. Make the spec itself free, public on the web, and include lots of simple examples on the web site.

Bosworth goes into each of these in detail, I've only reproduced the first sentence.

I feel like we achieved these well for GeoJSON. It's a simple, readable, precise, and stupid format. For example, coordinates are represented in [x, y] pairs instead of arrays of x and arrays of y like you'd implement for software optimized for performance. Stupid in that sense, but much more transparent. The lack of feature attribute schemas and feature classes also seems pretty stupid to some people, but that's fine.

GeoJSON focuses on serialization of basic GIS features. It doesn't create any new protocols. It doesn't require any new abstract models of the world. The spec is plain HTML with links so you can refer to sections when you throw the book at someone, short and sweet, has clear examples, and no bizarre click-through license agreement. Not everything in GeoJSON had a real implementation before publication, but most elements of it had several independent implementations.

Update (2009-11-10): Dale Lutz has blogged this too.

Python GIS features

How do you access the attributes and geometry of features in your favorite Python and GIS environment? Let's imagine we're interested in the state_name attribute from a US States data set and compare. I'll employ the iterable adapters I introduced in a previous post to simplify the code.

ESRI:

for row in ESRICollection(rows):
    state_name = row.state_name
    geom = row.shape

That's not bad at all, although there's potential for name collisions.

FME:

for f in FMECollection(features):
    state_name = f.getAttribute('state_name')
    geom_type = f.getGeometryType()
    geom_coords = f.getCoordinates()

Evil or not, getters and setters aren't needed in Python.

OGR (earlier than 1.5):

for f in OGRCollection(layer):
    state_name = f.GetFieldAsString('state_name')
    geom = f.GetGeometryRef()

    # Don't forget to free the memory allocated for the feature!
    f.Destroy()

Destroy!

OGR (current, also known as osgeo.ogr):

for f in layer:
    state_name = f.state_name
    geom = f.geometry()

Howard Butler has made osgeo.ogr much more peaceful.

QGIS:

for f in QGISCollection(provider):
    attribs = f.attributeMap()
    state_name = attribs['state_name'].toString()
    geom = f.geometry()

A mapping of attributes is design that I like, but it could be more user friendly.

5 different environments, 5 different ways.

Ideally, feature attributes (not to be confused with Python object attributes) are accessed through a mapping attribute to prevent collisions in the feature's own namespace, and the feature geometry is simply another attribute:

for f in features:
    # A dict has all kinds of nice built-in features, like
    attribute_items = f.properties.items() # [('state_name', 'Utah'), ...]
    assert 'state_name' in f.properties # True

    state_name = f.properties['state_name']
    geom = f.geometry

Looks a bit like GeoJSON? Yes, it does.

Why learn to program?

Why would a geographer need to learn to program? I don't see that the motivation is very different from than that of a historian. To paraphrase from William J. Turkel and Alan MacEachern's The Programming Historian:

We think that at least some [analysts] really will need to learn how to program. Think of it like learning how to cook. You may prefer fresh pasta to boxed macaroni and cheese, but if you don't want to be stuck eating the latter, you have to learn to cook or pay someone else to do it for you. Learning how to program is like learning to cook in another way: it can be a very gradual process. One day you're sitting there eating your macaroni and cheese and you decide to liven it up with a bit of Tabasco, Dijon mustard or Worcestershire sauce. Bingo! Soon you're putting grated cheddar in, too. You discover that the ingredients that you bought for one dish can be remixed to make another. You begin to linger in the spice aisle at the grocery store. People start buying you cookware. You get to the point where you're willing and able to experiment with recipes. Although few people become master chefs, many learn to cook well enough to meet their own needs.

If you don't program, your [business] process will always be at the mercy of those who do.

I've substituted analyst for historian, and business for research. There's nothing like a cooking analogy, is there?

Comments

Re: Why learn to program?

Author: Kirk

mmmm, is that spaghetti code I smell ?

Re: Why learn to program?

Author: Mick

Awesome paraphrase. Spot on, for those that have the passion to and wisdom to program. I recall the charge in the eighties. But the admin of whatever organization you're really cooking for has to contribute to the bill at the grocery store. And they often see the coal being shoveled into the oven to make SOS as good enough efficency. Too much organizational communication is somehow equivalent to too many cooks in the kitchen. They're paying for new kitchen tools that are not exploited, except to support the way their fore-fathers cooked. I guess if the heat's too hot, consider getting out of the kitchen. Support the cooks with the underemployed passion to be chefs. They're the ones shoveling coal and daydreaming of how much more efficient things could be, and how many satisfied customers they'd see. Consider the guy hanging in the dungeon in Monty Python's Life of Brian; "A great culture, the Romans", (obviously paraphrased). He dreamed of building a GIS in a target-rich environment, and kept smiling.

Re: Why learn to program?

Author: gilles

Funny, I teach programming basics to GIS students and the very first program example I gave them was actually a cooking recipe. It is quite useful to explain them what is an algorithm and that there's nothing to be afraid of.

Tomorrow is the last session of this course, I'll mention your cooking analogy.

Re: Why learn to program?

Author: Sean

I'm curious, Gilles: How soon do you introduce proper programming techniques? I've been arguing in my blog that we shouldn't teach beginners to make spaghetti code and getting some strong contrary opinions.

Iterators, again

If your favorite Python GIS environment doesn't provide iterators, you can easily implement adapters like these.

ESRI (ArcGIS 9.3):

class ESRICollection(object):

    def __init__(self, context):
        self.context = context

    def __iter__(self):
        row = self.context.next()
        while row:
            yield row
            row = self.context.next()
        raise StopIteration

More or less as done here. Usage:

>>> for f in ESRICollection(rows):
...     # work with f

QGIS:

class QGISCollection(object):

    def __init__(self, context):
        self.context = context

    def __iter__(self):
        feat = QgsFeature()
        while provider.getNextFeature(feat):
            yield feat
            feat = QgsFeature()
        raise StopIteration

Returning new objects from the iterator rather than returning the same object with new values heads off unpleasant surprises. Usage:

>>> for f in QGISCollection(provider):
...     # work with f

OGR (earlier than 1.5):

class OGRCollection(object):

    def __init__(self, context):
        self.context = context

    def __iter__(self):
      feature = layer.GetNextFeature()
      while feature:
          yield feature
          feature = layer.GetNextFeature()
      raise StopIteration

Usage:

>>> for f in OGRCollection(layer):
...     # work with f

Notice a big difference between yield and return: the former gives a value, but execution continues after the statement rather than breaking as with the latter.

Comments

Re: Iterators, again

Author: Eric Wolf

Thanks Sean! I've been wondering how to overload iteration in Python. Haven't bothered to find out on my own. Should make stupid ESRI cursor code a little cleaner.