Idiommatic Python is catching on

Sean Gillies

2012-12-17 00:00

Use of Python's map, reduce, and zip is spreading from obscure programming blogs to the geospatial enterprise: http://blog.opengeo.org/2012/12/17/hacking-with-geoscript/. On seeing this, I could not help but tent my fingers and say "excellent" in my best imitation of Mr. Burns.

Libspatialindex 1.8.0 released

Sean Gillies

2012-12-13 00:00

Good bye, LGPL. Hello, MIT (and iOS). Howard Butler writes:

I'm pleased to announce the release of libspatialindex 1.8.0. Two significant changes have been made in this release to necessitate incrementing the libtool versioning. The first exciting change is the license of libspatialindex was changed from LGPL 2.1 to MIT. This licensing change is to support the use of libspatialindex in embedded systems development situations (such as iOS) where dynamic linking to satisfy the LGPL is not possible for whatever reason. The second, more mundane change, is to add a flush() method to the IStorageManager to support saving the index in situations where you do not want to completely tear down the object. This is useful in situations where you are using libspatialindex on top of some other system and are providing your own index storage implementation.

Howard has also compiled Windows binaries: http://libspatialindex.github.com/#download.

C programming, Python programming

Sean Gillies

2012-12-11 00:00

It's said that you can write C in any programming language. I used to do this all the time – C was my first language, and I came to Python from Perl – but I am doing it much less these days and profiting from the change. Look at how I was parsing 2D GeoRSS coordinates or a GML posList into tuples back in 2005 in the 6 lines below.

# value is "33.338611 52.566592 32.537546 52.527238 31.744196 52.409555"

latlons = value.split()
coords = []
for i in range(0, len(latlons), 2):
    lat = float(latlons[i])
    lon = float(latlons[i+1])
    coords.append((lon, lat))

# coords is [(52.566592, 33.338611), ...]

Counters, increments, offsets – It's rudimentary C pointer arithmetic in Python. I recently rewrote this in a different style.

def take_tuples(itr):
    nxt = itr.next
    while True:
        lat = nxt()
        lon = nxt()
        yield lon, lat

latlons = iter(map(float, text.split()))
coords = list(take_tuples(latlons))

Python isn't a functional language, but I still enjoy and benefit from using it in a functional style. The statements in my first, older batch of code are concerned with how to extract coordinate pairs. The statements in the second express what values I want. I want the results of float mapped over all the items split from the initial value, and then I want a list of pairs of these results.

For a more simple example, If what I want is a list of integers from 1 to 10, I write this:

numbers = list(range(1, 11))

instead of this:

numbers = []
for i in range(1, 11):
    numbers.append(i)

And if what I want is integers converted to floats, I write this:

numbers = map(float, range(1, 11))

instead of this:

numbers = []
for i in range(1, 11):
    numbers.append(float(i))

The generator function take_tuples (code section 2, above) is also written in terms of what results I want. It calls next twice on an iterator and yields the pair, swapped. When it hits the end of the iterator, it raises StopIteration. This is expected and normal for functions that accept iterators, like list(). It's the what-not-how way to get things from a sequence, pairwise.

>>> def take_pairs(value):
...     itr = iter(value)
...     while True:
...         yield itr.next(), itr.next()
...
>>> list(take_pairs(range(4)))
[(0, 1), (2, 3)]
>>> list(take_pairs("abc"))
[('a', 'b')]

Now, here's the icing: my new code (section 2 above, with map and list and the generator) benchmarks 10% faster than my old C style code. Binding the itr.next to a local variable (nxt) makes a big difference. I'm eating my cake and having it, too.

Comments

Re: C programming, Python programming

Author: Victor

But the magic really happens when you get rid of the take_tuples function:

latlons = iter(map(float, text.split()))
coords = zip(latlons, latlons)

Re: C programming, Python programming

Author: Sean

Victor, that's super cool. It benchmarks 30% faster than my C style code, and it easily extends to n-tuples (say I expected X,Y,Z triples) ala

<pre>: coords = zip(*[latlons]*n) </pre> but I am swapping the order of lat and lon in take_tuples(), so zip() doesn't quite work.</p>

System Message: WARNING/2 (<string>, line 125); backlink

Inline emphasis start-string without end-string.

Re: C programming, Python programming

Author: Michal Migurski

Python does Numpy-style strides, so:

lonlats = zip(latlons[1::2], latlons[0::2])

Re: C programming, Python programming

Author: Sean

Michal, that's even 1-2% faster than Victor's (my original benchmark was a linear ring with 64 vertices, approximating a circle, but difference is the same for 30 and 100 vertices). Even though slices copy data, they are apparently (at least with Python 2.7.3) a bit faster than iterators.

Re: C programming, Python programming

Author: John Drake

"Now, here's the icing: my new code (section 2 above, with map and list and the generator) benchmarks 10% faster than my old C style code. Binding the itr.next to a local variable (nxt) makes a big difference. I'm eating my cake and having it, too." That's really interesting, I wonder why it runs so much faster. It would seem that the opposite should be true.

Topojson.py

Sean Gillies

2012-11-27 00:00

I've finally made an honest module of it. The code squats in a fork of Mike Bostock's javascript project right now (not good), but at least you can install it if you're interested.

pip install https://github.com/sgillies/topojson/archive/master.tar.gz

Comments

Re: Topojson.py

Author: Jake

Any plans to write up a python function to do the geometry to topology conversion?

Re: Topojson.py

Author: Sean

None at the moment.

TopoJSON with Python

Sean Gillies

2012-11-21 00:00

Intrigued by Mike Bostock's TopoJSON project, I've written a module of functions that convert TopoJSON objects to plain old GeoJSON objects: https://github.com/sgillies/topojson/blob/master/topojson.py. I learned a couple of new things about slicing along the way: None is the same as nothing in slicing syntax,

>>> range(4)[None:None] == range(4)[:] == [0, 1, 2, 3]
True

and expressions in slice indices are powerful things. I'm using them to truncate or reverse lists of coordinates depending on their index in another list. For example, say I have a list of words in which the last character of one word is the same as the first character of the following word, and I'd like to concatenate them while removing the duplicates where they join.

>>> words = ["abcd", "defg", "ghij"]
>>> from itertools import chain
>>> "".join(chain(*
...     [word[i > 0:] for i, word in enumerate(words)] ))
'abcdefghij'

There is implicit conversion to type int in the slice expression word[i > 0:] above: True becomes 1, False becomes 0. word[0:] is the orginal word, while word[1:] is a copy with the first character omitted.

Next, say I want to describe a particular way of concatenating the words using a list of integers, the values of which correspond to indices of the words list. I want [0, 1, 2] to come out as above, 'abcdefghij', and a list of complements like [~2, ~1, ~0] (or equivalently, [-3, -2, -1]) to come out as 'jihgfedcba'. All this takes is another slice expression that reverses words when their index < 0: [::j >= 0 or -1].

>>> def merge(word_indices):
...     return "".join(chain(*
...         [words[j if j>=0 else ~j][::j >= 0 or -1][i > 0:] \
...         for i, j in enumerate(word_indices)] ))
...
>>> merge([0,1,2])
'abcdefghij'
>>> merge([-3,-2,-1])
'jihgfedcba'

This is more or less how you merge TopoJSON arcs into GeoJSON geometry coordinates.

The program below was made for an IPython Notebook with pylab inline. A lot of the code has appeared in my recent posts. Ultimately, there's a function to project shapes (the "geoms") and a rendering function that's used to fold them into a map image.

from descartes import PolygonPatch
from functools import partial, reduce
import json
import logging
from pyproj import Proj, transform
import urllib

from topojson import geometry

# Fetch some US states TopoJSON.
data = urllib.urlopen(
    "https://raw.github.com/mbostock/topojson/master/examples/topo/us-states.json"
    ).read()
topology = json.loads(data)
scale = topology['transform']['scale']
translate = topology['transform']['translate']
arcs = topology['arcs']

# The following functions support LLC projection of shapes.

def transform_coords(func, rec):
    # Transform record geometry coordinates using the provided function.
    # Returns a record or None.
    try:
        if rec['type'] == "Polygon":
            new_coords = list(zip(*func(*zip(*ring))
                ) for ring in rec['coordinates']
        elif rec['type'] == "MultiPolygon":
            new_coords = []
            for part in rec['coordinates']:
                new_part_coords = list(zip(*func(*zip(*ring))
                    ) for ring in part)
                new_coords.append(new_part_coords)
        return {'type': rec['type'], 'coordinates': new_coords}
    except Exception, e:
        # Writing untransformed features to a different shapefile
        # is another option.
        logging.exception(
            "Error transforming record %s:", rec)
        return None

epsg4326 = Proj(init="epsg:4326", no_defs=True)
lcc = Proj(
    proj="lcc",
    lon_0=-96.0,
    lat_0=23.0,
    lat_1=23.0,
    lat_2=60.0,
    ellps="WGS84",
    datum="WGS84",
    units="m",
    no_defs=True)
transform_lcc = partial(transform, epsg4326, lcc)
lambert = partial(transform_coords, transform_lcc)

# The rendering of a single shape is done by the following.

BLUE = "#6699cc"
RED = "#cc6666"

def add_record(axes, rec, fc=BLUE, ec=BLUE, alpha=1.0):
    """Makes a patch of the record's geometry and adds it to the axes."""
    if rec['type'] == "Polygon":
        axes.add_patch(
            PolygonPatch(rec, fc=fc, ec=ec, alpha=alpha) )
    elif rec['type'] == "MultiPolygon":
        for part in rec['coordinates']:
            axes.add_patch(
                PolygonPatch(
                    {'type': "Polygon", 'coordinates': part},
                    rec, fc=fc, ec=ec, alpha=alpha ))
    axes.autoscale(tight=False)
    return axes

add_feature_alpha = partial(add_record, alpha=0.5)

# Transform the relative coordinates of geometries to absolute coordinates.
geoms = [
    geometry(
        obj,
        arcs,
        scale,
        translate ) for obj in topology['objects'][0]['geometries'] ]

# Fold all the geoms up into a nice projected map image.
figsize(12, 12)
gca().set_aspect('equal')
reduce(add_feature_alpha, map(lambert, geoms), gca())

The result:

http://farm9.staticflickr.com/8346/8205362930_e26a1e62b5_b_d.jpg

Alaska is a little strange. I'll look into it.

Update (2012-11-21): I fixed a bug with locating the arc at index 0 and Alaska now looks fine.

http://farm9.staticflickr.com/8338/8205615405_62940b17e0_b_d.jpg

Shapely Windows installers

Sean Gillies

2012-10-31 00:00

Christoph Gohlke has 5 different 32 and 64 bit Windows installers for the latest version of Shapely (1.2.16) and scores of other CPython extension package installers (GDAL, Numpy, etc) at http://www.lfd.uci.edu/%7Egohlke/pythonlibs/#shapely (Via GIS StackExchange).

Update (2012-11-01): Following a suggestion by Howard Butler, Christoph has added Rtree to the list.

Ancient Toponym of the Week: Moron

Sean Gillies

2012-10-19 00:00

The hunt for an unlocated Moron continues in Roman Lusitania.

According to Smith,

A town of Lusitania upon the Tagus, which Brutus Callaïous made his headquarters in his campaign against the Lusitanians. (Strab. iii. p.152.) Its exact site is unknown.

The Barrington suggests Chões de Alpompé, Portugal.

The right profile

Sean Gillies

2012-10-19 00:00

What's the difference between GeoJSON and plain old RFC 4627 JSON? It seems like a naïve question, but it's actually rather important. GeoJSON data is JSON data, following all the rules of RFC 4627, but has additional constraints. The most significant of these are:

GeoJSON object must have a 'type' member.
The value of that member must be one of the following: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection", "Feature", or "FeatureCollection."
The items of a FeatureCollection will be arrayed in its 'features' member.
A Feature's representation in mathematical space is expressed in its 'geometry' member.
A geometry's coordinate values are arrayed in X (or easting), Y (or northing), Z order.

Say you're writing a HTTP client that requests JSON data from a server and processes it. How does it "know" how to find the features in the data or the geometries of those features? If there's exactly one server involved and it provides only GeoJSON data, you can have a coupled, but reliable client-server system where the way the data is processed depends on the address of the server. But if there are multiple JSON APIs involved the brittleness of this kind of system quickly increases. Just having a JSON schema isn't a silver bullet because schemata aren't standardized in RFC 4627; the question then becomes how does the client "know" where to find the schema?

I've written before that GeoJSON might need its own media type to be a good citizen of Formatland, something like application/vnd.geojson+json. GitHub (for example) does this. Lately, I'm persuaded by people like Mark Nottingham who argue that a flatter space of fewer media types is better. GeoJSON's extra constraints are only about the structure and interpretation of the data, they don't affect parsing of the data at all. An application/json parser need not skip a beat on GeoJSON and a generic application/json application can do plenty of shallow processing on it.

The profile concept from the 'profile' Link Relation Type I-D seems to suit GeoJSON well and I'm rooting strongly for this draft. If and when it is finalized, we can declare GeoJSON to be just a profile of application/json and clients can use the profile declaration to make more sense of data instead of out-of-band information or sniffing and guessing.

Profiles could also be a means for making sense of the various profiles of GeoJSON already found in the wild today. I've come up with a short list of five, apologies for any omissions.

Microsoft's OData removes one constraint from GeoJSON and adds two more:

Any GeoJSON value that is used in OData SHOULD order the keys with type first, then coordinates, then any other keys. This improves streaming parser performance when parsing values on open types or in other cases where metadata is not present.

The GeoJSON [GeoJSON] standard requires that LineString contains a minimum number of Positions in its coordinates collection. This prevents serializing certain valid geospatial values. Therefore, in the GeoJSON requirement “For type ‘LineString’, the ‘coordinates’ member must be an array of two or more positions” is replaced with the requirement “For type ‘LineString’, the ‘coordinates’ member must be an array of positions” when used in OData.

All other arrays in GeoJSON are allowed to be empty, so no change is necessary. GeoJSON does require that any LinearRing contain a minimum of four positions. That requirement still holds that LinearRings can show up only in other arrays and that those arrays can be empty.

GeoJSON allows multiple types of CRS. In OData, only one of those types is allowed. In GeoJSON in OData, a CRS MUST be a Named CRS. In addition, OGC CRS URNs are not supported. The CRS identifier MUST be an EPSG SRID legacy identifier.

Madrona's KML-ish nested feature collections are another profile.

Leaflet examples, like that for L.GeoJSON, hint at a profile where features have 'title' and 'description' properties.

Pleiades has its own profile of GeoJSON in which we add a representative point member to features, an anchor for labels and popups. The Pleiades profile also has 'title' and 'description' properties.

The 'geo' member in Twitter's 1.1 API uses a classic profile of GeoJSON, the "surely you mean latitude, longitude order!" profile.

In anticipation of the 'profile' Link Relation Type I-D getting past the IETF and IANA hurdles, I've created a strawman proposal for how GeoJSON might use it: https://gist.github.com/3853624. Please check it out, comment, or fork it if you've got your own profile of GeoJSON and are interested in expressing it using profile links.

The fading shape of alpha

Sean Gillies

2012-10-13 00:00

Surely you've seen the Flickr shapefiles public dataset and read Aaron Straup Cope's The Shape of Alpha about the why and the how of making those shapes. For some time I've been thinking that we could apply this kind of analysis to get a sense of the shape of fuzzy Pleiades places like the Mare Erythraeum, the territory of the Volcae Arecomici, or even Ptolemy's world. Not precise boundaries, which are problematic in many ways, but shapes that permit reasonably accurate queries for places bordering on the Volcae Arecomici or west of the Mare Erythraeum.

The Mare Adriaticum is our current test bed in Pleiades. We've made 9 different connections to it from precisely located islands, ports, and promontories. These play the part of the tagged photos in the Flickr story, and here they are as a GeoJSON feature collection: https://gist.github.com/3886076#file_mare_adriaticum_connections.json. My alpha shape analysis begins with a Delaunay triangulation of the space between these points. I used scipy.spatial.Delaunay to do it and benefited from the examples in this Stack Overflow post: http://stackoverflow.com/questions/6537657/python-scipy-delaunay-plotting-point-cloud.

import json
import numpy as np
from scipy.spatial import Delaunay

# JSON data from the gist above.
coordinates = [f['geometry']['coordinates'] for f in json.loads(data)]
points = np.array(coordinates)
tri = Delaunay(np.array(points))

# Rest as in SO post above.
...

The nodes and lines of this triangulation are shown to the left below. Dissolving them yields the convex hull of the points, a figure that is slightly too big for Pleiades: points we want to have on the boundary of the Adriatic Sea, Iader to the north and the tip of the Gargano Promontory to the sourth, fall inside the hull. By what criteria might we toss out the peripheral triangles of which they are nodes and obtain an alpha shape more representative of the Adriatic? Printing out the radii of each triangle's circumcircle gives us this list:

0.916234205371
21.0145482369
1.85517410861
1.79158041604
0.887999941617
0.983745009567
1.2507779965
1.01471746864
0.430582735954
6.79200819757

All of these are radii on the order of 1 except for 2 outliers. Those outliers are the narrow triangles we want to exclude from our hull. Setting a radius limit of 2 (α=0.5) via the code below removes the outlier triangles.

import math

edges = set()
edge_points = []
alpha = 0.5

# loop over triangles:
# ia, ib, ic = indices of corner points of the triangle
for ia, ib, ic in tri.vertices:
    pa = points[ia]
    pb = points[ib]
    pc = points[ic]

    # Lengths of sides of triangle
    a = math.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2)
    b = math.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2)
    c = math.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2)

    # Semiperimeter of triangle
    s = (a + b + c)/2.0

    # Area of triangle by Heron's formula
    area = math.sqrt(s*(s-a)*(s-b)*(s-c))

    circum_r = a*b*c/(4.0*area)

    # Here's the radius filter.
    if circum_r < 1.0/alpha:
        add_edge(ia, ib)
        add_edge(ib, ic)
        add_edge(ic, ia)

lines = LineCollection(edge_points)
plt.figure()
plt.title('Alpha=2.0 Delaunay triangulation')
plt.gca().add_collection(lines)
plt.plot(points[:,0], points[:,1], 'o', hold=1)

http://farm9.staticflickr.com/8464/8083837092_241559d7f2_o_d.png

http://farm9.staticflickr.com/8195/8083837188_761c01d0ed_o_d.png

I dissolved the triangles using Shapely and render the hull using PolygonPatch from descartes.

from descartes import PolygonPatch
from shapely.geometry import MultiLineString
from shapely.ops import cascaded_union, polygonize

m = MultiLineString(edge_points)
triangles = list(polygonize(m))

plt.figure()
plt.title("Alpha=2.0 Hull")
plt.gca().add_patch(PolygonPatch(cascaded_union(triangles), alpha=0.5))
plt.gca().autoscale(tight=False)
plt.plot(points[:,0], points[:,1], 'o', hold=1)
plt.show()

http://farm9.staticflickr.com/8334/8083842127_b119c195c4_o_d.png

http://farm9.staticflickr.com/8191/8083837212_2f4855be2e_o_d.png

Can you see the difference? The hull pulls down a bit on the north edge and bumps up a bit on the south edge. It may not look like much to you, but now we're rescuing the Gargano Massif from the Adriatic – a fascinating place with unique geologic formations and 3 species of native giant hamsters. Who knows how many or how large these may have been in ancient times? (Update 2012-10-15: It's unlikely that Romans encountered the giant hamsters and hedgehogs of Pliocene Gargano.)

I've put a lot of work into making Shapely play well with Numpy and Matplotlib through the Python geospatial data protocol; it's fun to experiment with them like this. That interface also makes publishing a web map of my alpha shape trivial. The shape is exported to GeoJSON with a few lines of Python

from shapely.geometry import mapping

hull_layer = dict(
    type='FeatureCollection',
    features=[dict(
        type='Feature',
        id='1',
        title='Hull (Alpha=2.0)',
        description='See http://sgillies.net/blog/1154/the-fading-shape-of-alpha',
        geometry=mapping(cascaded_union(triangles)) )])

print json.dumps(hull_layer, indent=2)

and copied to a script in http://sgillies.github.com/mare-adriaticum to be used with Leaflet. Leaflet! If the new OpenLayers is designed around first-class treatment for GeoJSON we may yet return, but for now it's all Leaflet all the time in Pleiades.

I've run out of time today for playing with and writing about ancient alpha shapes, but have list of next things to do. It's obvious that I should be using the Web Mercator projection at least and probably an equal area projection when computing the circumcircles and triangulation and also obvious that there's no globally ideal alpha value. A statistical analysis of the circumcircle radii for each shape is certainly the right way to determine the outliers, and there may be different kinds of distributions involved. Meanwhile, this alpha shape of the Adriatic gets better as we anchor it to more precisely located places. A few more at the north end and the resulting shape has bounding box and centroid properties not much different than what you might come up with from an Adriatic Sea shape derived from a world coastlines dataset.

Enumeration of the problems I see in asserting fine boundaries for unmeasurable or intrinsically fuzzy places is another TODO item.

(With apologies to Aaron Straup Cope for the title of this post.)

Feed parsing and mapping

Sean Gillies

2012-09-30 00:00

I've written that I was going to do something about the lack of GeoRSS support in feedparser. JSON seems to have eclipsed Atom and RSS at the cutting edge of things, but XML syndication formats remain a sound, minimally-coupled way to integrate systems on the web. The feedparser is still important to me and I want to see it have first-class spatial support. Today I tracked down the official GitHub repo of the current maintainer, forked it, and patched my old diff in, tests and all. The results are here: https://github.com/sgillies/feedparser/tree/georss.

Below is a bit of code that builds on my previous example of making Robinson projection maps using Pyproj, matplotlib, and no for loops. There is a function to make circular buffers around feed entry locations with a radius proportional to earthquake magnitude and one to label the locations. Each of these functions are simplified by the GeoJSON style property attached to entries by my feedparser patch. The feed I'm using is one of the last hour's quakes from the USGS.

from shapely.geometry import mapping, shape

feed = feedparser.parse(
    "http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")

def quake(entry):
    # entries have a ``where`` item that provides the Python geospatial
    # interface and can be operated on by geospatial packages
    title = entry.get('title', "M 0.0,")
    m, val = title.split(",")[0].split(" ")
    radius = 1.0 + 4.0*float(val)
    patch = shape(entry['where']).buffer(radius)
    return {"title": title, "geometry": mapping(patch)}

def label(fig, entry):
    x, y = transform_robin(*entry['where']['coordinates'])
    fig.gca().text(x, y, entry['title'], fontsize='large')
    return fig

quakes = imap(quake, filter(lambda x: x.get('where'), feed.entries))
fig = reduce(
        partial(add_record, fc=RED, ec=RED, alpha=0.5),
        imap(robinson, quakes),
        fig )
fig = reduce(
        label,
        filter(lambda x: x.get('where'), feed.entries),
        fig )
fig.gca().set_title(feed['updated'])
fig.savefig("quake.png")

The result is below. The circular buffers are not circles in the Robinson projection.

http://farm9.staticflickr.com/8039/8037802126_7cd17b3c38_c_d.jpg

With my feedparser patch, GeoRSS feed entries become simple to operate on with Shapely, descartes, PySAL, ArcPy, or other software that understands the GeoJSON-like Python geospatial interface: https://gist.github.com/2217756. Please, like-minded people, leave a favourable comment at https://github.com/kurtmckee/feedparser/pull/9.