2012

Pleiades tags on Flickr, continued

Have you heard of Flickr, the photo sharing site with Creative Commons licensing options and a great API for access to photos and metadata? I ask because our collective attention span is so short that I wonder whether our memory hasn't been similarly impaired. Flickr has weathered harsh criticism from former employees and speculation about the number of days until Yahoo kills it, and has regained for now the fickle favor of the web. A year ago, I wrote about Flickr's extra love for Pleiades machine tags: http://sgillies.net/blog/1103/flickr-support-for-ancient-world-places. I'm going to take the opportunity of Flickr's comeback to write about what's new in tagging photos with Pleiades places.

Flickr reports almost 7000 photos with pleiades:*= tags: http://www.flickr.com/photos/tags/pleiades:*=/. Of these, over 4800 use pleiades:depicts= to indicate that they depict the site(s) of an ancient place; the ancient place, or what remains of it, is the subject of the photo. Over 1500 photos use pleiades:findspot= to indicate that the subject of the photo was found in the context of the ancient place. The pleiades:origin= tag is another interesting one and Flickr has almost 200 of these.

My favorite is the pleiades:atteststo= tag for photo subjects that bear the name of or otherwise attest to the existence of an ancient place. The queen of these (in my opinion) is Dan Diffendale's photo of an inscribed decree concerning the founding of the colony of Brea.

http://farm4.staticflickr.com/3236/3035342766_6ff49e27a2_z_d.jpg

Making the Pleiades machine tags easy to use is a challenge we haven't yet overcome. It's in the nature of these tags to be specific to a relatively small number of photos. The opaqueness of our numerical identifiers further hinders reuse: when you type "pleiades:" in the tag field on Flickr (or Aperture, or whatever) you get a list of suggestions like:

pleiades:depicts=42
pleiades:depicts=43
pleiades:depicts=44
...

which isn't super helpful to mere mortals (which is why things like rdfs:label exist in more robust semantic web systems, right?). Despite these limitations, I've long known we could do better at showing users what to do, and yesterday I checked off something that has been on the todo list for a while: cut-and-paste machine tags on the Pleiades place pages.

On the right side of pages (like Palmyra's, http://pleiades.stoa.org/places/668331) you'll see a section labeled "Photos" that may contain a portrait of the place and links to related Flickr photos. Below this you will now see two suggested tags:

Use this tag in Flickr to mark depictions of this place's site(s):

  pleiades:depicts=668331

or this one to mark objects found here:

  pleiades:findspot=668331

If you use these, I'd appreciate a note. The approach is very low tech (select, ctrl-c, ctrl-v) and user tracking in such detail is not in our budget. I'm looking forward to finding a bump in tagged photos soon.

http://farm8.staticflickr.com/7264/7734238866_01b994ceea_z_d.jpg
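For the API-minded, machine tags are also queryable through Flickr's flickr.photos.search method and its machine_tags parameter. Here's a minimal sketch (Python 3) of building such a request; the API key is a placeholder and this is an illustration, not production code:

```python
# A sketch of querying Flickr's REST API for photos carrying a
# Pleiades machine tag. "YOUR_API_KEY" is a placeholder.
from urllib.parse import urlencode

def pleiades_search_url(pid, predicate="depicts", api_key="YOUR_API_KEY"):
    """Build a flickr.photos.search request URL for one Pleiades place."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "machine_tags": "pleiades:%s=%s" % (predicate, pid),
        "format": "json",
        "nojsoncallback": 1,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)

print(pleiades_search_url(668331))
```

The same function with predicate="findspot" yields the second suggested tag.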

Libspatialindex 1.8.0 released

Goodbye, LGPL. Hello, MIT (and iOS). Howard Butler writes:

I'm pleased to announce the release of libspatialindex 1.8.0. Two significant changes have been made in this release to necessitate incrementing the libtool versioning. The first exciting change is the license of libspatialindex was changed from LGPL 2.1 to MIT. This licensing change is to support the use of libspatialindex in embedded systems development situations (such as iOS) where dynamic linking to satisfy the LGPL is not possible for whatever reason. The second, more mundane change, is to add a flush() method to the IStorageManager to support saving the index in situations where you do not want to completely tear down the object. This is useful in situations where you are using libspatialindex on top of some other system and are providing your own index storage implementation.

Howard has also compiled Windows binaries: http://libspatialindex.github.com/#download.

C programming, Python programming

It's said that you can write C in any programming language. I used to do this all the time – C was my first language, and I came to Python from Perl – but I am doing it much less these days and profiting from the change. Look at how I was parsing 2D GeoRSS coordinates or a GML posList into tuples back in 2005 in the 6 lines below.

# value is "33.338611 52.566592 32.537546 52.527238 31.744196 52.409555"

latlons = value.split()
coords = []
for i in range(0, len(latlons), 2):
    lat = float(latlons[i])
    lon = float(latlons[i+1])
    coords.append((lon, lat))

# coords is [(52.566592, 33.338611), ...]

Counters, increments, offsets: it's rudimentary C pointer arithmetic in Python. I recently rewrote this in a different style.

def take_tuples(itr):
    nxt = itr.next
    while True:
        lat = nxt()
        lon = nxt()
        yield lon, lat

latlons = iter(map(float, text.split()))
coords = list(take_tuples(latlons))

Python isn't a functional language, but I still enjoy and benefit from using it in a functional style. The statements in my first, older batch of code are concerned with how to extract coordinate pairs. The statements in the second express what values I want. I want the results of float mapped over all the items split from the initial value, and then I want a list of pairs of these results.

For a simpler example, if what I want is a list of integers from 1 to 10, I write this:

numbers = list(range(1, 11))

instead of this:

numbers = []
for i in range(1, 11):
    numbers.append(i)

And if what I want is integers converted to floats, I write this:

numbers = map(float, range(1, 11))

instead of this:

numbers = []
for i in range(1, 11):
    numbers.append(float(i))

The generator function take_tuples (code section 2, above) is also written in terms of what results I want. It calls next twice on an iterator and yields the pair, swapped. When it hits the end of the iterator, it raises StopIteration. This is expected and normal for functions that accept iterators, like list(). It's the what-not-how way to get things from a sequence, pairwise.

>>> def take_pairs(value):
...     itr = iter(value)
...     while True:
...         yield itr.next(), itr.next()
...
>>> list(take_pairs(range(4)))
[(0, 1), (2, 3)]
>>> list(take_pairs("abc"))
[('a', 'b')]

Now, here's the icing: my new code (section 2 above, with map and list and the generator) benchmarks 10% faster than my old C style code. Binding the itr.next to a local variable (nxt) makes a big difference. I'm eating my cake and having it, too.
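The comparison can be re-created along these lines with timeit. This is a sketch of the kind of benchmark, not my original script, ported to Python 3 (where the bound method is __next__ and a generator has to catch StopIteration itself):

```python
# A re-creation of the benchmark setup, not the original script.
import timeit

text = " ".join(["33.33 52.56"] * 32)  # 64 numbers: a 32-vertex ring

def c_style(value):
    # Index arithmetic, as in the older snippet.
    latlons = value.split()
    coords = []
    for i in range(0, len(latlons), 2):
        lat = float(latlons[i])
        lon = float(latlons[i + 1])
        coords.append((lon, lat))
    return coords

def take_tuples(itr):
    # Python 3 spelling: bind __next__ locally, return on exhaustion.
    nxt = itr.__next__
    while True:
        try:
            lat = nxt()
            lon = nxt()
        except StopIteration:
            return
        yield lon, lat

def functional(value):
    return list(take_tuples(iter(map(float, value.split()))))

assert c_style(text) == functional(text)
print(timeit.timeit(lambda: c_style(text), number=20000))
print(timeit.timeit(lambda: functional(text), number=20000))
```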

Comments

Re: C programming, Python programming

Author: Victor

But the magic really happens when you get rid of the take_tuples function:

latlons = iter(map(float, text.split()))
coords = zip(latlons, latlons)

Re: C programming, Python programming

Author: Sean

Victor, that's super cool. It benchmarks 30% faster than my C style code, and it easily extends to n-tuples (say I expected X,Y,Z triples) a la

  coords = zip(*[latlons]*n)

but I am swapping the order of lat and lon in take_tuples(), so zip() doesn't quite work.

Re: C programming, Python programming

Author: Michal Migurski

Python does Numpy-style strides, so:

lonlats = zip(latlons[1::2], latlons[0::2])

Re: C programming, Python programming

Author: Sean

Michal, that's even 1-2% faster than Victor's (my original benchmark was a linear ring with 64 vertices, approximating a circle, but the difference is the same for 30 and 100 vertices). Even though slices copy data, they are apparently (at least with Python 2.7.3) a bit faster than iterators.
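For anyone following along, here's a quick check that the variants agree on output order (re-created under Python 3, where zip and map return iterators):

```python
# Checking output order of the two comment variants. Michal's strides
# swap to (lon, lat); Victor's plain zip keeps the input (lat, lon) order.
text = "33.338611 52.566592 32.537546 52.527238"

latlons = list(map(float, text.split()))

# Strides: lon values sit at odd indices, lat at even.
strided = list(zip(latlons[1::2], latlons[0::2]))

# Plain zip of an iterator against itself pairs values in input order.
it = iter(latlons)
plain = list(zip(it, it))

print(strided)
print(plain)
```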

Re: C programming, Python programming

Author: John Drake

"Now, here's the icing: my new code (section 2 above, with map and list and the generator) benchmarks 10% faster than my old C style code. Binding the itr.next to a local variable (nxt) makes a big difference. I'm eating my cake and having it, too." That's really interesting, I wonder why it runs so much faster. It would seem that the opposite should be true.

Topojson.py

I've finally made an honest module of it. The code squats in a fork of Mike Bostock's javascript project right now (not good), but at least you can install it if you're interested.

pip install https://github.com/sgillies/topojson/archive/master.tar.gz

Comments

Re: Topojson.py

Author: Jake

Any plans to write up a python function to do the geometry to topology conversion?

Re: Topojson.py

Author: Sean

None at the moment.

TopoJSON with Python

Intrigued by Mike Bostock's TopoJSON project, I've written a module of functions that convert TopoJSON objects to plain old GeoJSON objects: https://github.com/sgillies/topojson/blob/master/topojson.py. I learned a couple of new things about slicing along the way: None is the same as nothing in slicing syntax,

>>> range(4)[None:None] == range(4)[:] == [0, 1, 2, 3]
True

and expressions in slice indices are powerful things. I'm using them to truncate or reverse lists of coordinates depending on their index in another list. For example, say I have a list of words in which the last character of one word is the same as the first character of the following word, and I'd like to concatenate them while removing the duplicates where they join.

>>> words = ["abcd", "defg", "ghij"]
>>> from itertools import chain
>>> "".join(chain(*
...     [word[i > 0:] for i, word in enumerate(words)] ))
'abcdefghij'

There is implicit conversion to type int in the slice expression word[i > 0:] above: True becomes 1, False becomes 0. word[0:] is the original word, while word[1:] is a copy with the first character omitted.

Next, say I want to describe a particular way of concatenating the words using a list of integers, the values of which correspond to indices of the words list. I want [0, 1, 2] to come out as above, 'abcdefghij', and a list of complements like [~2, ~1, ~0] (or equivalently, [-3, -2, -1]) to come out as 'jihgfedcba'. All this takes is another slice expression that reverses words when their index < 0: [::j >= 0 or -1].

>>> def merge(word_indices):
...     return "".join(chain(*
...         [words[j if j>=0 else ~j][::j >= 0 or -1][i > 0:] \
...         for i, j in enumerate(word_indices)] ))
...
>>> merge([0,1,2])
'abcdefghij'
>>> merge([-3,-2,-1])
'jihgfedcba'

This is more or less how you merge TopoJSON arcs into GeoJSON geometry coordinates.
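To make that concrete: TopoJSON arcs are lists of delta-encoded positions, decoded by summing the deltas and applying the transform's scale and translate. This is a sketch, not the module's actual code, and the sample arc is made up:

```python
# A sketch of TopoJSON arc decoding. Each position is a delta from
# the previous one; absolute coordinates come from the transform.
def decode_arc(arc, scale, translate):
    coords = []
    x = y = 0
    for dx, dy in arc:
        x += dx
        y += dy
        coords.append((x * scale[0] + translate[0],
                       y * scale[1] + translate[1]))
    return coords

# A hypothetical arc of three positions, delta-encoded from (0, 0).
print(decode_arc([[0, 0], [1, 1], [1, -1]], [2.0, 2.0], [10.0, 20.0]))
```

Joining decoded arcs into a ring then means dropping the duplicated endpoint where they meet, just like word[i > 0:] drops duplicated characters above.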

The program below was made for an IPython Notebook with pylab inline. A lot of the code has appeared in my recent posts. Ultimately, there's a function to project shapes (the "geoms") and a rendering function that's used to fold them into a map image.

from descartes import PolygonPatch
from functools import partial, reduce
import json
import logging
from pyproj import Proj, transform
import urllib

from topojson import geometry

# Fetch some US states TopoJSON.
data = urllib.urlopen(
    "https://raw.github.com/mbostock/topojson/master/examples/topo/us-states.json"
    ).read()
topology = json.loads(data)
scale = topology['transform']['scale']
translate = topology['transform']['translate']
arcs = topology['arcs']

# The following functions support LCC (Lambert conformal conic)
# projection of shapes.

def transform_coords(func, rec):
    # Transform record geometry coordinates using the provided function.
    # Returns a record or None.
    try:
        if rec['type'] == "Polygon":
            new_coords = list(
                zip(*func(*zip(*ring))) for ring in rec['coordinates'])
        elif rec['type'] == "MultiPolygon":
            new_coords = []
            for part in rec['coordinates']:
                new_part_coords = list(
                    zip(*func(*zip(*ring))) for ring in part)
                new_coords.append(new_part_coords)
        return {'type': rec['type'], 'coordinates': new_coords}
    except Exception:
        # Writing untransformed features to a different shapefile
        # is another option.
        logging.exception(
            "Error transforming record %s:", rec)
        return None

epsg4326 = Proj(init="epsg:4326", no_defs=True)
lcc = Proj(
    proj="lcc",
    lon_0=-96.0,
    lat_0=23.0,
    lat_1=23.0,
    lat_2=60.0,
    ellps="WGS84",
    datum="WGS84",
    units="m",
    no_defs=True)
transform_lcc = partial(transform, epsg4326, lcc)
lambert = partial(transform_coords, transform_lcc)

# The rendering of a single shape is done by the following.

BLUE = "#6699cc"
RED = "#cc6666"

def add_record(axes, rec, fc=BLUE, ec=BLUE, alpha=1.0):
    """Makes a patch of the record's geometry and adds it to the axes."""
    if rec['type'] == "Polygon":
        axes.add_patch(
            PolygonPatch(rec, fc=fc, ec=ec, alpha=alpha) )
    elif rec['type'] == "MultiPolygon":
        for part in rec['coordinates']:
            axes.add_patch(
                PolygonPatch(
                    {'type': "Polygon", 'coordinates': part},
                    fc=fc, ec=ec, alpha=alpha ))
    axes.autoscale(tight=False)
    return axes

add_feature_alpha = partial(add_record, alpha=0.5)

# Transform the relative coordinates of geometries to absolute coordinates.
geoms = [
    geometry(
        obj,
        arcs,
        scale,
        translate ) for obj in topology['objects'][0]['geometries'] ]

# Fold all the geoms up into a nice projected map image.
figsize(12, 12)
gca().set_aspect('equal')
reduce(add_feature_alpha, map(lambert, geoms), gca())

The result:

http://farm9.staticflickr.com/8346/8205362930_e26a1e62b5_b_d.jpg

Alaska is a little strange. I'll look into it.

Update (2012-11-21): I fixed a bug with locating the arc at index 0 and Alaska now looks fine.

http://farm9.staticflickr.com/8338/8205615405_62940b17e0_b_d.jpg

Ancient Toponym of the Week: Moron

The hunt for an unlocated Moron continues in Roman Lusitania.

According to Smith,

A town of Lusitania upon the Tagus, which Brutus Callaïous made his headquarters in his campaign against the Lusitanians. (Strab. iii. p.152.) Its exact site is unknown.

The Barrington suggests Chões de Alpompé, Portugal.

The right profile

What's the difference between GeoJSON and plain old RFC 4627 JSON? It seems like a naïve question, but it's actually rather important. GeoJSON data is JSON data, following all the rules of RFC 4627, but has additional constraints. The most significant of these are:

  • A GeoJSON object must have a 'type' member.
  • The value of that member must be one of the following: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection", "Feature", or "FeatureCollection".
  • The items of a FeatureCollection will be arrayed in its 'features' member.
  • A Feature's representation in mathematical space is expressed in its 'geometry' member.
  • A geometry's coordinate values are arrayed in X (or easting), Y (or northing), Z order.
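A shallow check of those constraints takes only a few lines. This is an illustration, not a proposed validator:

```python
# A minimal, shallow test of the first two constraints above.
GEOJSON_TYPES = {
    "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString",
    "MultiPolygon", "GeometryCollection", "Feature", "FeatureCollection"}

def looks_like_geojson(obj):
    """True if a parsed JSON value has a GeoJSON 'type' member."""
    return isinstance(obj, dict) and obj.get("type") in GEOJSON_TYPES

print(looks_like_geojson({"type": "Point", "coordinates": [0.0, 51.5]}))
print(looks_like_geojson({"name": "just JSON"}))
```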

Say you're writing an HTTP client that requests JSON data from a server and processes it. How does it "know" how to find the features in the data or the geometries of those features? If there's exactly one server involved and it provides only GeoJSON data, you can have a coupled but reliable client-server system where the way the data is processed depends on the address of the server. But if multiple JSON APIs are involved, the brittleness of this kind of system quickly increases. Just having a JSON schema isn't a silver bullet because schemata aren't standardized in RFC 4627; the question then becomes: how does the client "know" where to find the schema?

I've written before that GeoJSON might need its own media type to be a good citizen of Formatland, something like application/vnd.geojson+json. GitHub (for example) does this. Lately, I'm persuaded by people like Mark Nottingham who argue that a flatter space of fewer media types is better. GeoJSON's extra constraints are only about the structure and interpretation of the data, they don't affect parsing of the data at all. An application/json parser need not skip a beat on GeoJSON and a generic application/json application can do plenty of shallow processing on it.

The profile concept from the 'profile' Link Relation Type I-D seems to suit GeoJSON well and I'm rooting strongly for this draft. If and when it is finalized, we can declare GeoJSON to be just a profile of application/json and clients can use the profile declaration to make more sense of data instead of out-of-band information or sniffing and guessing.
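In practice that could be as simple as a Link header on the response alongside a plain application/json content type; the profile URI below is purely hypothetical:

```python
# A sketch of a 'profile' link relation in an HTTP response header.
# The profile URI is hypothetical, not a registered GeoJSON profile.
def profile_link_header(profile_uri):
    return 'Link: <%s>; rel="profile"' % profile_uri

print(profile_link_header("http://example.com/profiles/geojson"))
```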

Profiles could also be a means for making sense of the various profiles of GeoJSON already found in the wild today. I've come up with a short list of five; apologies for any omissions.

Microsoft's OData removes one constraint from GeoJSON and adds two more:

Any GeoJSON value that is used in OData SHOULD order the keys with type first, then coordinates, then any other keys. This improves streaming parser performance when parsing values on open types or in other cases where metadata is not present.

The GeoJSON [GeoJSON] standard requires that LineString contains a minimum number of Positions in its coordinates collection. This prevents serializing certain valid geospatial values. Therefore, in the GeoJSON requirement “For type ‘LineString’, the ‘coordinates’ member must be an array of two or more positions” is replaced with the requirement “For type ‘LineString’, the ‘coordinates’ member must be an array of positions” when used in OData.

All other arrays in GeoJSON are allowed to be empty, so no change is necessary. GeoJSON does require that any LinearRing contain a minimum of four positions. That requirement still holds that LinearRings can show up only in other arrays and that those arrays can be empty.

GeoJSON allows multiple types of CRS. In OData, only one of those types is allowed. In GeoJSON in OData, a CRS MUST be a Named CRS. In addition, OGC CRS URNs are not supported. The CRS identifier MUST be an EPSG SRID legacy identifier.

Madrona's KML-ish nested feature collections are another profile.

Leaflet examples, like that for L.GeoJSON, hint at a profile where features have 'title' and 'description' properties.

Pleiades has its own profile of GeoJSON in which we add a representative point member to features, an anchor for labels and popups. The Pleiades profile also has 'title' and 'description' properties.

The 'geo' member in Twitter's 1.1 API uses a classic profile of GeoJSON, the "surely you mean latitude, longitude order!" profile.

In anticipation of the 'profile' Link Relation Type I-D getting past the IETF and IANA hurdles, I've created a strawman proposal for how GeoJSON might use it: https://gist.github.com/3853624. Please check it out, comment, or fork it if you've got your own profile of GeoJSON and are interested in expressing it using profile links.