2012 (old posts, page 4)

More IPython Notebook and Shapely

I've been using the IPython Notebook to plot geometric objects while writing new tests for Shapely.

$ ipython notebook --pylab inline
from functools import reduce
from itertools import islice
from descartes import PolygonPatch
from shapely.geometry import Point
from shapely.ops import unary_union

BLUE = '#6699cc'

def halton(base):
    """Returns an iterator over an infinite Halton sequence."""
    def value(index):
        result = 0.0
        f = 1.0/base
        i = index
        while i > 0:
            result += f * (i % base)
            i = i // base
            f = f/base
        return result
    i = 1
    while i > 0:
        yield value(i)
        i += 1

def draw(axes, item):
    """Given matplotlib axes and a geometric item, adds the item as a patch
    and returns the axes so that reduce() can accumulate more patches."""
    axes.add_patch(
        PolygonPatch(item, fc=BLUE, ec=BLUE, alpha=0.5, zorder=2))
    return axes

The islice function is handy: you give it a child's head and it tells you if you need to go buy insecticidal shampoo. Actually, it slices iterators, even infinite ones like the Halton sequences above. Halton sequences are deterministic and quasi-random (low-discrepancy) rather than truly random; I'm using them instead of better random number generators to make the Shapely tests repeatable.
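To make the slicing concrete, here is a minimal standalone sketch that needs neither Shapely nor matplotlib. It restates the Halton generator with floor division (an assumption on my part, so that it behaves the same under Python 2 and 3) and uses islice to take finite windows from the infinite sequence. The names first_four, window_a, and window_b are only for illustration.

```python
from itertools import count, islice

def halton(base):
    """Yield the Halton (radical inverse) sequence for `base`, forever."""
    for index in count(1):
        result, f, i = 0.0, 1.0 / base, index
        while i > 0:
            result += f * (i % base)
            i //= base  # floor division keeps i an integer
            f /= base
        yield result

# islice(iterable, stop) takes the first `stop` items.
first_four = list(islice(halton(2), 4))

# islice(iterable, start, stop) skips `start` items, then yields up to `stop`.
window_a = list(islice(halton(5), 20, 120))
window_b = list(islice(halton(5), 20, 120))

# Same base, same window, same values: that's what makes tests repeatable.
print(first_four)
print(len(window_a), window_a == window_b)
```

Each call to halton() starts a fresh generator, so two slices over the same window always agree.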

# Zip together two 100-item sequences to make 100 quasi-random points.
coords = zip(
    list(islice(halton(5), 20, 120)),
    list(islice(halton(7), 20, 120)) )

# Buffer the points to make overlapping patches.
patches = [Point(xy).buffer(0.06) for xy in coords]

# Note: with ipython's --pylab option, we've effectively imported
# all symbols from matplotlib's pylab module.
figsize(8, 8)

# Perform a left fold on the patches, applying the draw function (above)
# with the current axes as the accumulator. Aka "map rendering for hipsters."
reduce(draw, patches, gca())

Shapely's unary_union function will replace the old cascaded_union function. It operates on any type of geometry, not just polygons.

# Dissolve continuously overlapping patches with the unary_union function.
u = unary_union(patches)
print u.geom_type
print u.area

# Output:
# MultiPolygon
# 0.87333863506

# A MultiPolygon is an iterable of its polygon parts, so we can perform
# a fold on it as well.
reduce(draw, u, gca())

The notebook file is here: https://gist.github.com/3503994.

Like its predecessor, unary_union currently requires a sequence of geometric objects. I'd love to allow iterators (lazy sequences) as well.
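Until then, one workaround is to materialize the lazy sequence before handing it over. The sketch below is not part of Shapely: accepts_iterables is a hypothetical decorator, and count_and_total is a stand-in for any function that, like unary_union, needs a real sequence (here because it calls len()).

```python
from functools import wraps

def accepts_iterables(func):
    """Hypothetical helper: let a sequence-only function take lazy iterables."""
    @wraps(func)
    def wrapper(items, *args, **kwargs):
        # list() consumes a generator or iterator once and gives func a sequence.
        return func(list(items), *args, **kwargs)
    return wrapper

@accepts_iterables
def count_and_total(seq):
    # Stand-in for a sequence-only function: len() fails on a bare generator.
    return len(seq), sum(seq)

result = count_and_total(x * x for x in range(4))
print(result)
```

Assuming unary_union accepts a plain list, wrapping it as accepts_iterables(unary_union) should behave the same way, at the cost of holding every geometry in memory at once; I have not tested that here.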

GeoServices REST RFC

OGC Seeks Comments on GeoServices REST API Candidate Standard:

Using this API, clients, such as a web browser application, issue requests to resources on the server identified by structured Uniform Resource Locators (URLs). The server responds with map images, text-based location details or other representations of geospatial information. From a services perspective the API offers a mechanism to interact with map, image and feature services and perform geospatial analysis. This JavaScript Object Notation (JSON)-based, RESTful API is intended to make implementing servers instantly usable by developers working with widely used REST-based scripting and programming languages, as well as mobile platforms. End users can then use these developers' Web applications to discover and access the services and use them in their workflows.

Widely used REST-based scripting and programming languages? Really? As someone said to me last night, this is remarkable handwaving even for a standards organization. Has the meaning of REST diffused entirely into the void? My enthusiasm for reading the candidate standard is a little dampened, I must say, and it was soggy to begin with. This is all another reminder of how "Standards are Great, but Standardisation is a Really Bad Idea."

When it rains it pours

After a long dry spell, moisture has arrived from the tropics and the skies have been opening up. We got an inch Friday evening, an inch and a half Saturday evening, and an inch or so by 9:30 PM Sunday, with another 4 hours of light rain after that. The ditch behind my house overflowed and so did its parent, Spring Creek. My bike's bottom bracket is due some work anyway, so early this morning I pedaled down the flooded trail to see what there was to see.


The cattails in the ditch add some greenery and structure to an otherwise ugly and weedy lot, and provide some nice habitat for red-winged blackbirds, but have mostly filled the channel and may have contributed a bit to the flooding here.


The pond on Spring Creek which had been disappearing was full again and the creek had flooded the bike path just west of the Centre Avenue underpass. On the dotted yellow line of the bike path, I came across this crayfish. It looked a bit battered, missing one pincer and one antenna.


Shapely 1.2.15

Shapely 1.2.15 is out: http://pypi.python.org/pypi/Shapely/1.2.15. This release is mostly concerned with helping packagers downstream. Changes:

  • Eliminate numerical sensitivity in a method chaining test (Debian bug #663210).
  • Account for cascaded union of random buffered test points being a polygon or multipolygon (Debian bug #666655).
  • Use Cython to build speedups if it is installed.
  • Avoid stumbling over SVN revision numbers in GEOS C API version strings.

There is a growing list of features in GEOS 3.3.x that Shapely isn't accessing yet and I'm looking forward to getting to these later this summer.

More field goals, fewer pratfalls

For a computer user in the humanities who doesn't develop their own tools and information systems (for all kinds of good reasons), using technology "the right way" may look like an ever-growing list of fashion prescriptions.

  • "Use MS Access or Filemaker"
  • "Use a relational database"
  • "Use TEI XML"
  • "Implement web services"
  • "Provide RSS feeds"
  • "Make a web API"
  • "Cool URIs for everything"
  • "Use RDF"
  • "Use a triple store"
  • "Use ontology X"
  • and so on ...

Of all the discussions at LAWDI, the one that's on my mind this morning is the very short one I had with Eric Kansa about what happens when linked data principles start being used as criteria for evaluating the fundworthiness of projects in classics and archaeology. It could be disruptive, and it's on me and Eric and others to make sure that we're not setting researchers up for a frustrating run at a football that is pulled away at the last moment.

The following might be useful prescriptions for us linked data evangelists.

  • Don't focus too much on counting triples.
  • Don't beat projects up about their ugly URIs.
  • Don't make openness a moral issue.
  • Let projects get easy wins from simple vocabularies and ontologies (SKOS, for example).
  • Show people what to do instead of telling people what to do.
  • Emphasize results and getting things done.

I'm sure others can think of more.

Ancient Toponym of the Week: Uri

I've been at ISAW this week teaching scholars and researchers about the web and linked data. Among other things, this has meant a lot of talk about URIs (Uniform Resource Identifiers). Pleiades has URIs for places, of course, and Brad Hafford (Director of the Penn Museum's Ur Digitization Project) revealed that it has a URI for a place named Ur or (according to the Barrington) Uri: http://pleiades.stoa.org/places/912985. According to Brad and Steve Tinney (Oracc), the full Sumerian name would have been Urim. It was commonly shortened in its day to Uri, and at some point further to Ur.

Gearing up for LAWDI

I'm beginning to work on my presentation for the upcoming Linked Ancient World Data Institute at ISAW. Here's what I'd like to accomplish in three bullets.

  • Engage attendees in thinking outside the database and thinking and talking about the architecture of the web.
  • Make a case for using HTTP URIs (aka URLs) whenever possible instead of other identifiers or addresses.
  • Talk about using links in data for doing work (using verbs) in contrast to using linked data for reasoning (with nouns).

How to turn expertly curated non-linked data (digital scholarly editions of texts, etc) into RDF is one linked data problem, the one we're most familiar with and most focused on. How to use semantic web architecture and links to initiate and curate "born-linked" data is another interesting and important set of problems – to me, at least, and I hope to be able to make it compelling to everyone else.

Pleiades remains the only classics project in the Linked Open Data cloud today (http://thedatahub.org/group/lodcloud?q=classics) and I'd also like to talk about how other projects can join it, but time may be too short for this.

Elsewhere on APIs and downloads

The big GIS industry blogs are debating one of my favorite topics: download or API? In my posts I was considering downloads and web APIs with the same media type and data: a big bucket of GML features (for example) in a file vs spoonfuls of GML features via WFS. James Fee, instead, is considering APIs that are lossy for one reason or another (candidates: format conversion, aggregation and clustering, geometry simplification, or attribute filtering). Life's too short to listen to podcasts (that aren't This American Life), though... was there anything interesting in the Directions one?