2011

Shapely from the outside

Right before the start of my holiday, I stumbled onto a post from a GIS blog I'd never seen before. It's an older post that compares Shapely to OGR's Python module and GeoScript. As comments are disabled on that post, I'll make a few here. At the time the post was written, Shapely 1.2.8 was released and the manual was fully mature. Greg Corradini writes:

Shapely is a little bit different than OGR and GeoScript. One big difference is that you can’t read and write to file formats with it. The idea, at least the idea that I’ve taken from it’s website, is that Shapely is a “swiss-army-knife” spatial analysis tool — it does some things really really fast and pythonic but it isn’t going to fix a flat tire. Or something like that.

Shapely is more limited than a Swiss Army knife. It's just a wrapper for the GEOS library, using idiomatic Python. It tries to do one thing – or rather, one fairly large set of related things – very well. It doesn't read or write GIS data formats, so if it's a Swiss Army knife, it's one without the can and bottle openers.
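
For the record, here's a minimal sketch of the kind of thing that is in scope: constructing geometries and analyzing them in the plane, with GEOS doing the work.

from shapely.geometry import LineString, Point

# Construct geometries and analyze them; GEOS does the heavy lifting.
patch = Point(0.0, 0.0).buffer(10.0)
path = LineString([(-20.0, 0.0), (20.0, 0.0)])

print patch.area                        # roughly pi * 10**2
print path.intersects(patch)            # True
print path.intersection(patch).length   # ~20.0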

The coolo-neato things about Shapely include the ability to serialize/deserialize to/from data formats that implement a GeoJson-like interface and to/from numpy arrays. There’s also a couple helpful methods — linemerge and polygonize — that it wraps from the GEOS library.

I'm glad to see that somebody appreciates the GeoJSON-like Python geo interface.
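
For anyone who hasn't seen the geo interface: it's just an agreement that objects carry, and functions accept, GeoJSON-like mappings. A minimal sketch with Shapely:

from shapely.geometry import Point, shape

# Shapely geometries expose a GeoJSON-like mapping via __geo_interface__.
pt = Point(-105.0, 40.0)
print pt.__geo_interface__
# => a dict like {'type': 'Point', 'coordinates': (-105.0, 40.0)}

# And a GeoJSON-like mapping can be adapted back to a geometry.
print shape({'type': 'Point', 'coordinates': (-105.0, 40.0)}).wkt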

These methods aren’t as easy to reach in GeoScript or OGR built with GEOS support off the proverbial ‘shelf’ because you’d have to call Java libraries in the case of GeoScript and C++ libraries using Python ctypes. Shapely makes using these methods easy, even though you could develop your own wrappers to get at them in GeoScript and OGR with GEOS support.

I hadn't thought of these functions as killer features. I suspect they're probably available in osgeo.ogr by now.
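
Both live in shapely.ops, for what it's worth. A quick sketch:

from pprint import pprint
from shapely.ops import linemerge, polygonize

# linemerge joins contiguous segments into a single longer line.
merged = linemerge([((0, 0), (1, 1)), ((1, 1), (2, 2))])
print merged.geom_type, merged.length

# polygonize finds the polygons formed by a soup of lines.
lines = [
    ((0, 0), (1, 1)),
    ((0, 0), (0, 1)),
    ((0, 1), (1, 1)),
    ((1, 1), (1, 0)),
    ((1, 0), (0, 0)),
]
pprint([p.wkt for p in polygonize(lines)])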

I want to say one last thing about Shapely, though it might reflect some naivete on my part about the functionality of this tool. It seems to me that you can do about 80% of what Shapely does if you build OGR against GEOS and use those bindings. Although it might be more pythonic, optimized in certain ways and syntactically simpler, I’m still unsure who the users are and why. I think the TileStache project uses this library on the backend. I’m interested to find more examples.

I used to work in a GIS shop where at least half of the work was taking some data in one obscure format, doing something relatively simple to it (reprojection, for example), and writing it out in some other arcane format. Shapely by itself is practically useless for that kind of work, and I understand that its purpose can be baffling to the programming GIS analyst. I think I might have found it baffling before my career change.

Finally, Corradini writes:

Since Shapely can’t read or write geometries we have to lean on an existing interop library. Example 3 will use OGR for reads and writes.

Or better yet: Fiona.

It's an interesting post, and the way it weighs a few different software packages is rare these days. It also has me thinking about whether I should be benchmarking Shapely and Fiona versus GeoScript. Assuming the testit module is supported by Jython and produces directly comparable numbers (because I won't be able to benchmark my C Python modules in the same process), it should be pretty easy to do.

Comments

Re: Shapely from the outside

Author: Tom Kralidis

We use Shapely in pycsw to do low-level geometry operations (for spatial filters) as well as to manage the geometry lifecycle of metadata records. IMHO, it's a portable approach to achieving this functionality without having to carry OGR, etc.

Fiona and matplotlib: simply plotting features

Fiona goes very well with matplotlib and descartes.

from matplotlib import pyplot
from descartes import PolygonPatch

from fiona import collection

# Set up the figure and axes.
BLUE = '#6699cc'
fig = pyplot.figure(1, figsize=(6, 6), dpi=90)
ax = fig.add_subplot(111)

# For each feature in the collection, add a patch to the axes.
with collection("docs/data/test_uk.shp", "r") as input:
    for f in input:
        ax.add_patch(
            PolygonPatch(
                f['geometry'], fc=BLUE, ec=BLUE, alpha=0.5 ))

# Should be able to get extents from the collection in a future version
# of Fiona.
ax.set_xlim(-9.25, 2.75)
ax.set_ylim(49.5, 61.5)

fig.savefig('test_uk.png')

The resulting image:

http://farm8.staticflickr.com/7005/6554613263_9bc5761f72_o_d.png

I value simplicity. The problems I work on are complex and tools with extra, incidental complexity wear me out needlessly. How about you? Fiona, unlike other geospatial software, is designed to be simple.

Comments

Re: Fiona and matplotlib: simply plotting features

Author: Reinout van Rees

Wow, this looks neat. The simplicity here and in your previous example make it feel very thought-out and practical.

I have a django project where we use shape files. Small errors in them keep bringing parts of sites down. Sending them through Fiona and using the filtered and clean geojson output should help a lot. And simplify our code a lot.

Now all I have to do is get rid of my back pain and start experimenting with this. I already sent off an email to work to look at this :-)

Re: Fiona and matplotlib: simply plotting features

Author: Sean

Well, remember that my earlier script only gets out mild stains. Geodata is about as dirty as data gets.

Fiona and Shapely: spatially cleaning features

Inspired by comments I saw today on the internet, this is how you spatially clean features using Fiona and Shapely (for small values of dirty).

import logging
import sys

from shapely.geometry import mapping, shape

from fiona import collection


logging.basicConfig(stream=sys.stderr, level=logging.INFO)

with collection("docs/data/test_uk.shp", "r") as input:
    schema = input.schema.copy()
    with collection(
            "with-shapely.shp", "w", "ESRI Shapefile", schema
            ) as output:
        for f in input:

            try:
                # Make a shapely object from the dict.
                geom = shape(f['geometry'])
                if not geom.is_valid:

                    # Use the 0-buffer polygon cleaning trick
                    clean = geom.buffer(0.0)
                    assert clean.geom_type == 'Polygon'
                    assert clean.is_valid
                    geom = clean

                # Make a dict from the shapely object.
                f['geometry'] = mapping(geom)
                output.write(f)

            except Exception, e:
                # Writing uncleanable features to a different shapefile
                # is another option.
                logging.exception("Error cleaning feature %s:", f['id'])

Makes a neat example, I think. Shapely just happens to provide this handy zero-buffer feature (via GEOS); it's not required at all. Fiona just reads and writes GeoJSON-like mappings (like Python's dict); it doesn't care what you do in between or what you do it with.
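
To be concrete about what "GeoJSON-like mapping" means, here's a small sketch using the same test_uk.shp data: a record pulled from a collection is nothing but nested dicts and lists.

from fiona import collection

with collection("docs/data/test_uk.shp", "r") as source:
    f = next(iter(source))

# A feature is plain data: an id, a geometry mapping, and a properties
# mapping, all inspectable with nothing but the standard library.
print f['id']
print f['geometry']['type']
print f['properties']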

Update (2011-12-25): The error I noted in the comments is fixed above.

Comments

Re: Fiona and Shapely: spatially cleaning features

Author: Sean

Oops, there's an error above. I meant to assert that the cleaned geom is valid.

Fiona's half-way mark

By my measure, Fiona is half-way to production readiness. Version 0.5 is up on PyPI: http://pypi.python.org/pypi/Fiona/0.5. It now writes feature mappings to almost any of OGR's formats. Almost.

In Python, I'm not very interested in OGR's support for databases or services, and so Fiona isn't either. I'd rather use httplib2 to fetch any JSON or XML (KML, GML, GeoRSS) document on the web, and Python's built-in json or xml.etree modules to parse them with some help from geojson and keytree. For databases, I'll use SQLAlchemy or GeoAlchemy. Fiona is mainly about reading and writing arcane file formats like shapefiles, GPX, etc., for which there is pretty much no access other than through OGR's drivers. Fiona doesn't do OGR "data sources". It has collections of a single feature type. Think shapefile or single PostGIS table. The door isn't completely shut on databases; the first version of Fiona had the concept of a "workspace", and we could bring that back if needed.

The latest installation instructions and usage example are on Fiona's GitHub page: https://github.com/sgillies/Fiona and there's more documentation to come. A Shapely-level manual will be the cornerstone of a 1.0 release, although I expect the Fiona manual to be much smaller.

Installation

If you've got GDAL installed in a well-known location and Python 2.6, you may be able to install as easily as:

$ pip install Fiona

If, like me, you have virtualenvs with their own GDAL libs, you'll need to do something like this:

$ virtualenv .
$ source bin/activate
$ pip install -d Fiona http://pypi.python.org/packages/source/F/Fiona/Fiona-0.5.tar.gz
$ export GDAL=${PATH_TO_GDAL}
$ cd Fiona
$ python setup.py build_ext -I ${GDAL}/include -L ${GDAL}/lib install

Reading benchmark

Is Fiona anywhere near as fast as osgeo.ogr? Benchmark: get all polygon-type features from a shapefile, read their attributes into a dict, get a reference to their geometry (in the OGR case) or the geometry as GeoJSON (in the Fiona case), and get their unique ids. Here's the script:

import timeit
from fiona import collection
from osgeo import ogr

PATH = 'docs/data/test_uk.shp'
NAME = 'test_uk'

# Fiona
s = """
with collection(PATH, "r") as c:
    for f in c:
        id = f["id"]
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, PATH, NAME'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print

# OGR
s = """
source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))
layer.ResetReading()
while 1:
    feature = layer.GetNextFeature()
    if not feature:
        break
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)
    geometry = feature.GetGeometryRef()
    feature.Destroy()
source.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, PATH, NAME'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)

That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that. A lot. Next are the numbers.

(Fiona)krusty-2:Fiona seang$ python benchmark.py
Fiona 0.5
2579.40 usec/pass

osgeo.ogr 1.7.2
3355.43 usec/pass

Result: even though Fiona is doing a fair amount of extra coordinate copying, it's still faster than the Python bindings for OGR 1.7.2. Simpler + faster seems like a big win.

Writing benchmark

How about writing speed? Benchmark: open a shapefile for writing and write 50 identical (other than their local ids) point-type features to it.

import os
import timeit

from fiona import collection
from osgeo import ogr

FEATURE = {'id': '1', 'geometry': {'type': 'Point', 'coordinates': (0.0, 0.0)},
           'properties': {'label': u"Foo"}}
SCHEMA = {'geometry': 'Point', 'properties': {'label': 'str'}}

# Fiona
s = """
with collection("fiona.shp", "w", "ESRI Shapefile", SCHEMA) as c:
    for i in range(50):
        f = FEATURE.copy()
        f['id'] = str(i)
        c.write(f)
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, SCHEMA, FEATURE'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print

# OGR
s = """
drv = ogr.GetDriverByName("ESRI Shapefile")
if os.path.exists('ogr.shp'):
    drv.DeleteDataSource('ogr.shp')
ds = drv.CreateDataSource("ogr.shp")
lyr = ds.CreateLayer("ogr", None, ogr.wkbPoint)
field_defn = ogr.FieldDefn("label", ogr.OFTString)
lyr.CreateField(field_defn)
for i in range(50):
    feat = ogr.Feature(lyr.GetLayerDefn())
    feat.SetField("label", u"Foo")
    pt = ogr.Geometry(ogr.wkbPoint)
    x, y = FEATURE['geometry']['coordinates']
    pt.SetPoint_2D(0, x, y)
    feat.SetGeometry(pt)
    feat.SetFID(i)
    lyr.CreateFeature(feat)
    feat.Destroy()
ds.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, os, FEATURE'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)

Fiona only needs five statements compared to 18 for osgeo.ogr, and Fiona writes these features faster.

(Fiona)krusty-2:Fiona seang$ python write-benchmark.py
Fiona 0.5
4435.63 usec/pass

osgeo.ogr 1.7.2
4565.33 usec/pass

I was rather surprised by this. I've been benchmarking feature reading for a while but this is the first time I've done it for writes.

Tests

Do you like tests?

(Fiona)krusty-2:Fiona seang$ python setup.py nosetests --nologcapture --with-coverage --cover-package=fiona
running nosetests
running egg_info
writing src/Fiona.egg-info/PKG-INFO
writing top-level names to src/Fiona.egg-info/top_level.txt
writing dependency_links to src/Fiona.egg-info/dependency_links.txt
reading manifest file 'src/Fiona.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.txt' under directory 'tests'
writing manifest file 'src/Fiona.egg-info/SOURCES.txt'
running build_ext
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrinit.so -> src/fiona
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrext.so -> src/fiona
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
nose.plugins.cover: INFO: Coverage report will include only packages: ['fiona']
test_invalid_mode (tests.test_collection.CollectionTest) ... ok
test_no_path (tests.test_collection.CollectionTest) ... ok
test_w_args (tests.test_collection.CollectionTest) ... ok
test_append_point (tests.test_collection.ShapefileAppendTest) ... ok
test_context (tests.test_collection.ShapefileCollectionTest) ... ok
test_filter_1 (tests.test_collection.ShapefileCollectionTest) ... ok
test_io (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_list (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_one (tests.test_collection.ShapefileCollectionTest) ... ok
test_len (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_write (tests.test_collection.ShapefileCollectionTest) ... ok
test_schema (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_read (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_point (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_polygon (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_linestring (tests.test_feature.PointTest) ... ok
test_point (tests.test_feature.PointTest) ... ok
test_polygon (tests.test_feature.PointTest) ... ok
test (tests.test_geometry.GeometryCollectionRoundTripTest) ... ok
test (tests.test_geometry.LineStringRoundTripTest) ... ok
test_line (tests.test_geometry.LineStringTest) ... ok
test (tests.test_geometry.MultiLineStringRoundTripTest) ... ok
test_multilinestring (tests.test_geometry.MultiLineStringTest) ... ok
test (tests.test_geometry.MultiPointRoundTripTest) ... ok
test_multipoint (tests.test_geometry.MultiPointTest) ... ok
test (tests.test_geometry.MultiPolygonRoundTripTest) ... ok
test_multipolygon (tests.test_geometry.MultiPolygonTest) ... ok
test (tests.test_geometry.PointRoundTripTest) ... ok
test_point (tests.test_geometry.PointTest) ... ok
test (tests.test_geometry.PolygonRoundTripTest) ... ok
test_polygon (tests.test_geometry.PolygonTest) ... ok

Name               Stmts   Miss  Cover   Missing
------------------------------------------------
fiona                 15      0   100%
fiona.collection      56      0   100%
------------------------------------------------
TOTAL                 71      0   100%
----------------------------------------------------------------------
Ran 31 tests in 4.955s

OK

I haven't figured out how to get Fiona's OGR extension module covered, but I'd be surprised if it wasn't > 90%.

I've been telling people that Fiona's objective was 80% of osgeo.ogr's functionality at 20% of the complexity. Maybe it can have 5% better performance, too?

Comments

Re: Fiona's half-way mark

Author: NathanW

>>That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that.

Like it? Love it! So much cleaner; less stuffing around, just get in and do what you want. The less red (read: noise code) the better.

Keep up the good work! Looking at ways I can apply the same kind of API logic to QGIS, as the API is a little verbose sometimes just to do simple things.

Any reason why you use f['id'] vs something like f.id and do some magic in __getattr__() and __setattr__()?

Re: Fiona's half-way mark

Author: Sean

Fiona originally had such GeoJSON-like objects with `id`, `geometry`, and `properties` attributes. But as I coded, I kept asking myself "this is redundant: why not just use GeoJSON-like mappings?" This question never went away, and I finally decided to ditch the classes and just use dicts. Dicts and other mappings are thoroughly documented, well-tested, and built-in. The GeoJSON format is also very well known these days and maps easily to Python dicts. I feel like I'm trading less code for more usability: win-win.
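
For instance, because a feature is an ordinary mapping (this one mirrors the FEATURE dict from the write benchmark above), the built-ins do all the work:

import json

f = {'id': '1',
     'geometry': {'type': 'Point', 'coordinates': (0.0, 0.0)},
     'properties': {'label': u"Foo"}}

# Copy, update, and serialize with nothing but dict methods and json.
g = f.copy()
g['properties'] = dict(f['properties'], label=u"Bar")
print json.dumps(g, indent=2)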

Re: Fiona's half-way mark

Author: Howard Butler

Most of your performance win is probably related to cython vs SWIG. SWIG can generate terrible Python from a performance perspective.

What's cython's story with regard to PyPy or IronPython? A Fiona based on ctypes would get access to those interpreters for practically no effort. Maybe cython's story with regard to that is the same...

What about coordinate systems? Do they live in Fiona? Should they?

Re: Fiona's half-way mark

Author: Sean

I'm not interested in IronPython anymore. For PyPy, ctypes definitely seems to be the way to go: http://codespeak.net/pypy/dist/pypy/doc/extending.html. Cython's PyPy story isn't so clear. Switching Fiona to ctypes shouldn't be that hard.

Lessons learned from Zope

Chris McDonough:

  • Most developers are very, very risk-averse. They like taking small steps, or no steps at all. You have to allow them to consume familiar technologies and allow them to disuse things that get in their way.
  • The allure of a completely integrated, monolithic system that effectively prevents the use of alternate development techniques and technologies eventually wears off. And when it does, it wears off with a vengeance.
  • The natural reaction to the problems caused by monolithic systems, which is to make anything and everything uniformly "pluggable" is also a net lose.
  • Tests are very important.
  • Documentation is even more important than tests.
  • When trying to replace a complex system, make it simpler, don't just push the complexity around.
  • Brands are important. You, ideally, must make sure that the question "what is 'noun'" has one answer. Subtlety and branding do not mix; they produce a toxic haze of confusion.

My story with Zope parallels Chris's, though not nearly as deep, and trailing by about a year and a half. I've been using his software and docs for about a decade now. For the past 4-5 years I've been trying to apply the Zope lessons above to GIS software; Shapely and Fiona are thoroughly steeped in them.

Flickr support for ancient world places

I've already written a little about the extra love Pleiades is getting from Flickr on the Pleiades blog. This post is the version for developers, and also appears on the Flickr developer blog: http://code.flickr.com/blog/2011/12/16/pleiades-a-guest-post/. I think this may be my first guest blog post ever.

Background

In August of 2010, Dan Pett and Ryan Baumann suggested that we coin Flickr machine tags in a "pleiades" namespace so that Flickr users could assert connections between their photos and places in antiquity and search for photos based on these connections. Ryan is a programmer for the University of Kentucky's Center for Visualization and Virtual Environments and collaborates with NYU and ISAW on Papyri.info. Dan works at the British Museum and is the developer of the Portable Antiquities Scheme's website: finds.org.uk. At about the same time, ISAW had launched its Flickr-hosted Ancient World Image Bank and was looking for ways to exploit these images, many of which were on the web for the first time. AWIB lead Tom Elliott, ISAW's Associate Director for Digital Programs, and AWIB Managing Editor Nate Nagy started machine tagging AWIB photos in December 2010. When Dan wrote "Now to get flickr's system to link back a la openplaques etc." in an email, we all agreed that would be quite cool, but weren't really sure how to make it happen.

As AWIB picked up steam this year, Tom blogged about the machine tags. His post was read by Dan Diffendale, who began tagging his photos of cultural objects to indicate their places of origin or discovery. In email, Tom and Dan agreed that it would be useful to distinguish between findspot and place of origin in photos of objects and to distinguish these from photos depicting the physical site of an ancient place. They resolved to use some of the predicates from the Concordia project, a collaboration between ISAW and the Center for Computing in the Humanities at King's College, London (now the Arts and Humanities Research Institute), jointly funded by the NEH and JISC. For findspots, pleiades:findspot=PID (where PID is the short key of a Pleiades place) would be used. Place of origin would be tagged by pleiades:origin=PID. A photo depicting a place would be tagged pleiades:depicts=PID. The original pleiades:place=PID tag would be for a geographic-historic but otherwise unspecified relationship between a photo and a place. Concordia's original approach was not quite RDF forced into Atom links, and was easily adapted to Flickr's "not quite RDF forced into tags" infrastructure.

I heard from Aaron Straup Cope at State of the Map (the OpenStreetMap annual meeting) in Denver that he'd seen Tom's blog post and, soon after, that it was on the radar at Flickr. OpenStreetMap machine tags (among some others) get extra love at Flickr, meaning that Flickr uses the machine tag as a key to external data shown on or used by photo pages. In the OSM case, that means structured data about ways and nodes, structured data that surfaces on photo pages like http://flickr.com/photos/frankieroberto/3396068360/ as "St George's House is a building in OpenStreetMap." Outside Flickr, OSM users can query the Flickr API for photos related to any particular way or node, enabling street views (for example) not as a product, but as a grassroots project. Two weeks later, to our delight, Daniel Bogan contacted Tom about giving Pleiades machine tags the same kind of treatment. He and Tom quickly came up with good short labels for our predicates, and support for the Pleiades machine tags went live on Flickr in the middle of November.

The Pleiades machine tags

Pleiades mainly covers the Greek and Roman world from about 900 BC - 600 AD. It is expanding somewhat into older Egyptian, Near East and Celtic places, and more recent Byzantine and early Medieval Europe places. Every place has a URL of the form http://pleiades.stoa.org/places/$PID and it is these PID values that go in machine tags. It's quite easy to find Pleiades places through the major search engines as well as through the site's own search form.

The semantics of the tags are as follows:

pleiades:depicts=PID
The PID place (or what remains) is depicted in the photo
pleiades:findspot=PID
The PID place is where a photo subject was found
pleiades:origin=PID
The PID place is where a photo subject was produced
pleiades:where=PID
The PID place is the location of the photo subject
pleiades:place=PID
The PID place is otherwise related to the photo or its subject

At Pleiades, our immediate use for the machine tags is giving our ancient places excellent portrait photos.

On the Flickr Side

Here's how it works on the Flickr side, as seen by a user. When you coin a new, never before used on Flickr machine tag like pleiades:depicts=440947682 (as seen on AWIB's photo Tombs at El Kab by Iris Fernandez), Flickr fetches the JSON data at http://pleiades.stoa.org/places/440947682/json in which the ancient place is represented as a GeoJSON feature collection. A snippet of that JSON, fetched with curl and pretty printed with python

$ curl http://pleiades.stoa.org/places/440947682/json | python -mjson.tool

is shown here:

{
  ...
  "id": "440947682",
  "title": "El Kab",
  "type": "FeatureCollection"
}

[Gist: https://gist.github.com/1488270]

The title is extracted and used to label a link to the Pleiades place under the photo's "Additional info".
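
Getting at that title takes just a couple of lines. A sketch using httplib2 and the json module (not Flickr's actual code, of course):

import json

import httplib2

# Fetch the place's GeoJSON-like representation and pull out its title.
resp, content = httplib2.Http().request(
    "http://pleiades.stoa.org/places/440947682/json")
data = json.loads(content)
print data['title']   # El Kab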

http://farm8.staticflickr.com/7161/6522002861_537ca823d4_b_d.jpg

Flickr is in this way a user of the Pleiades not-quite-an-API that I blogged about two weeks ago.

Flickr as external Pleiades editor

On the Pleiades end, we're using the Flickr website to identify and collect openly licensed photos that will serve as portraits for our ancient places. We can't control use of tags but would like some editorial control over images, so we've created a Pleiades Places group and pull portrait photos from its pool. The process goes like this:

http://farm8.staticflickr.com/7172/6522275377_bbda2a70ac_o_d.png

We're editing (in this one way) Pleiades pages entirely via Flickr. We get a kick out of this sort of thing at Pleiades. Not only do we love to see small pieces loosely joined in action, we also love not reinventing applications that already exist.

Watch the birdie

This system for acquiring portraits uses two Flickr API methods: flickr.photos.search and flickr.groups.pools.getPhotos. The guts of it is this Python class:
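
# Not shown in this snippet: the module's imports (httplib2, simplejson,
# urlencode, Zope's BrowserView) and constants such as FLICKR_API_KEY,
# FLICKR_API_ENDPOINT, FLICKR_TAGS_BASE, PLEIADES_PLACES_ID, IMG_TMPL,
# and PAGE_TMPL, which are defined elsewhere.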

class RelatedFlickrJson(BrowserView):

    """Makes two Flickr API calls and writes the number of related
    photos and URLs for the most viewed related photo from the Pleiades
    Places group to JSON like

    {"portrait": {
       "url": "http://flickr.com/photos/27621672@N04/3734425631/in/pool-1876758@N22",
       "img": "http://farm3.staticflickr.com/2474/3734425631_b15979f2cd_m.jpg",
       "title": "Pont d'Ambroix by sgillies" },
     "related": {
       "url": ["http://www.flickr.com/photos/tags/pleiades:*=149492/"],
       "total": 2 }}

    for use in the Flickr Photos portlet on every Pleiades place page.
    """

    def __call__(self, **kw):
        data = {}

        pid = self.context.getId() # local id like "149492"

        # Count of related photos

        tag = "pleiades:*=" + pid

        h = httplib2.Http()
        q = dict(
            method="flickr.photos.search",
            api_key=FLICKR_API_KEY,
            machine_tags="pleiades:*=%s" % self.context.getId(),
            format="json",
            nojsoncallback=1 )

        resp, content = h.request(FLICKR_API_ENDPOINT + "?" + urlencode(q), "GET")

        if resp['status'] == "200":
            total = 0
            photos = simplejson.loads(content).get('photos')
            if photos:
                total = int(photos['total'])

            data['related'] = dict(total=total, url=FLICKR_TAGS_BASE + tag)

        # Get portrait photo from group pool

        tag = "pleiades:depicts=" + pid

        h = httplib2.Http()
        q = dict(
            method="flickr.groups.pools.getPhotos",
            api_key=FLICKR_API_KEY,
            group_id=PLEIADES_PLACES_ID,
            tags=tag,
            extras="views",
            format="json",
            nojsoncallback=1 )

        resp, content = h.request(FLICKR_API_ENDPOINT + "?" + urlencode(q), "GET")

        if resp['status'] == '200':
            total = 0
            photos = simplejson.loads(content).get('photos')
            if photos:
                total = int(photos['total'])
            if total < 1:
                data['portrait'] = None
            else:
                # Sort found photos by number of views, descending
                most_viewed = sorted(
                    photos['photo'], key=lambda p: p['views'], reverse=True )
                photo = most_viewed[0]

                title = photo['title'] + " by " + photo['ownername']
                data['portrait'] = dict(
                    title=title, img=IMG_TMPL % photo, url=PAGE_TMPL % photo )

        self.request.response.setStatus(200)
        self.request.response.setHeader('Content-Type', 'application/json')
        return simplejson.dumps(data)

[Gist: https://gist.github.com/1482469]

The same thing could be done with urllib, of course, but I'm a fan of httplib2. Javascript on Pleiades place pages asynchronously fetches data from this view and updates the DOM. The end result is a "Flickr Photos" section at the bottom right of every place page that looks (when we have a portrait) like this:

http://farm8.staticflickr.com/7012/6522002865_350997d652_o_d.jpg

We're excited about the extra love for Pleiades places and can clearly see it working. The number of places tagged pleiades:*= is rising quickly – up 50% just this week – and we've gained new portraits for many of our well-known places. I think it will be interesting to see what developers at Flickr, ISAW, or museums make of the pleiades:findspot= and pleiades:origin= tags.

Thanks

We're grateful to Flickr and Daniel Bogan for the extra love and opportunity to blog about it. Work on Pleiades is supported by the NEH and ISAW. Our machine tag predicates come from a NEH-JISC project – still bearing fruit several years later.

Yours truly, Fiona

Fiona now writes feature collections to disk. Here's a bit of code from the tests, dressed up with extra comments:

from fiona import collection
from shapely.geometry import asShape, mapping

# Open a source of features
with collection("docs/data/test_uk.shp", "r") as source:

    # Define a schema for the feature sink
    schema = source.schema.copy()
    schema['geometry'] = 'Point'

    # Open a new sink for features
    with collection(
        "test_write.shp", "w", driver="ESRI Shapefile", schema=schema
        ) as sink:

        # Process only the features intersecting a box
        for f in source.filter(bbox=(-5.0, 55.0, 0.0, 60.0)):

            # Get their centroids using Shapely
            f['geometry'] = mapping(asShape(f['geometry']).centroid)

            # Stage feature for writing
            sink.write(f)

    # The sink shapefile is written to disk when its ``with`` block ends

That's just 9 statements. Fiona isn't just about less code; it's about taking advantage of Python built-ins and idioms to shrink the API's cranial memory footprint. You already know dicts, and data are better than objects, so features are modeled as GeoJSON-like mappings. Feature schemas are mappings, too. You already know how Python file I/O works, so persisted feature collections are modeled like files. Obviousness and familiarity are what I'm going for here. If you have to call help(fiona) more than 8 times in your entire life, I'll have failed.

I still need to work on support for writing geometry types other than 'Point' and for coordinate reference systems, and to make sure it's tight memory-wise (Fiona is all C underneath). It also might be nice to let the sink collection's schema be set from the first written feature, which would make the above example only 7 statements. The OGR library is so loaded with features that making a simple wrapper API is almost entirely about saying no and throwing features out. And how I've thrown things out. Geometries, out. Fields, out. Features, out. Cursors, out. Layers, out. There's almost nothing left except "open file", "filter iterator", "next dict", "append dict" and "close file". It almost goes without saying that this is for minimalists only.

Update (2011-12-10): the "writing" branch of Fiona now writes polylines and polygons. Multipart geometry types coming soon.

Comments

Re: Yours truly, Fiona

Author: Nathan W

Very cool! Nice job. I like the look of this very much and how much more readable it makes OGR. I have tried a few times to use OGR from Python, and while it wasn't hard, it still felt very much like "this is a C++ API with Python on top". This makes it feel a lot more like native Python.

Will keep an eye on the project's progress.

Re: Yours truly, Fiona

Author: Sean

Thanks, Nathan. Follow on GitHub if you haven't already and let me know how the library suits your own data access needs.

Does Pleiades have an API?

This is becoming a frequently asked question, and as I work on the definitive answer for the Pleiades FAQ, I'll think out loud about it here in my blog. Does Pleiades have an API? In truth, it has a number of APIs, some good and some bad. Does it have an HTTP + JSON API like all the cool kids do? No. Well, yes, sort of.

Before I get into tl;dr territory, I'll write down one of the guiding principles of the Pleiades project:

Data is usually better than an API.

It's not that we're uncomfortable with interfaces in Pleiades. Our application is based on Zope and Plone, so you know it has all kinds of interfaces under the hood. I'm even a bit of a geek about designing nice APIs (see also Shapely, Fiona, etc). It's just that data is better ... usually.

By "data" above, I mean a document or file or sequence of bytes containing related information, in bulk. The entire text of a book, for example, is better to have than an API for fetching the N-th sentence on page M. All the coordinates of a simple feature linestring (as GeoJSON, say) are better to have than an API for getting the N-th coordinate value of the M-th vertex of a line object. Given all the data, we're not bound to a particular way of indexing and searching it and can use the tools of our choice. APIs are typically chatty, slow and pointlessly different from others in the same line of business. Subbu Allamaraju goes deep into the trouble of working with inconsistent systems in "APIs are a Pain" and with more hard earned wisdom than I have, so I won't pile on here. Data is better ... usually.

An API, and here I mean "web API", can be better in the following and probably not exhaustive list of situations:

  • Sheer mass of data making dissemination practically impossible
  • Rapidly changing data making dumps and downloads out of date
  • Desire to control access to individual data records
  • Desire to monetize data (ads, for example)
  • Desire to impose a certain point of view
  • Desire to track use

Tracking use lets us tweak the experience of users. "People who viewed record M might also be interested in record N" and the like. It doesn't have to be nefarious tracking, just nudging users into useful and mutually profitable patterns. Only one of these situations is very relevant to Pleiades and so we're not designing APIs to sort them all out like other enterprises must. The RDF and KML serializations of the entire 34,000 place Pleiades dataset are not large by modern standards and don't change very rapidly. An application (like the Pelagios Graph Explorer or GapVis) that fetched and cached them once a day could stay quite up to date. The number of Pleiades contributors is growing, but they are primarily enriching existing places; I don't expect Pleiades to ever become so large that those files couldn't be transferred in less than a minute on a good internet connection. We control access to data that's in development, yes, but the locations, names and places that pass through review into a published state are completely open access and not private to any individual user or group of users. In only one part of Pleiades are we concerned about controlling a narrative through an API: the slideshow that plays on the Pleiades home page uses an API that stumbles through the most recently modified places and progressively mixes in more randomly selected ones.

Instead of fancy APIs, then, we have boring CSV, KML, and RDF downloads. The shapefile format, by the way, is inadequate for our purposes. Information will be lost in making a shapefile from the Pleiades model (any number of locations and names per place) and we're going to let people decide for themselves what to give up if they want this. The downloads are updated daily.

Pleiades also has JSON, KML, and RDF data for any particular place. Data that is current and linked from every page (http://pleiades.stoa.org/places/422987, for example) with HTML <link> and <a> elements. It's not an API ... or is it? The map on the page about Norba gets its overlay features from those very same JSON and KML resources. Looking at it in this way, you could say we do have an API here: the web is the API. When I finally finish the Pleiades implementation of OpenSearch (with Geo extension by Andrew Turner), I can replace Plone's crufty search API with even more consistency and interoperability from The Web as API.

Pleiades doesn't need the same kind of API that Twitter or Facebook have (obviously) or that OpenStreetMap has. We simply don't have anywhere near that much data, that much churn or (in the Twitter/Facebook case) that much need to control what you access.

Comments

Re: Does Pleiades have an API?

Author: josh livni

Another reason for an API would be a desire to allow adding new data or modifying a subset of the data, using different tools than your default web ui, no?

Re: Does Pleiades have an API?

Author: Sean

Maybe ... edits change everything (so to speak), so I'll have to mull that over. There are certainly other ways to incorporate changes that don't involve web APIs: diff and patch, for example, or git.

Simple in theory

Rich Hickey's "Simple Made Easy" presentation at Strange Loop, recommended to me by my Clojure programming co-worker Hugh Cayless, is flat out awesome. "Guardrail Programming" and "Knitted Castle" are my new favorite metaphors. Hickey has a compelling theory about complexity and after watching the presentation, I feel like I can be a better advocate for simplicity. Advocate to those who like theory, at least. For others, the proof remains in the pudding, whether simple means better software.

REST, the architectural style, didn't factor into Hickey's talk at all, but is a great example of an approach that chooses simplicity over ease. REST is hard. It is. You're wrong if you've been thinking that REST is easier than SOAP or COM. Look at almost any (there are exceptions, yes) so-called "REST API" and you'll see something produced by web programmers who tried to apply the REST style and either couldn't get their heads around it or gave up on it under pressure to deliver. REST is hard to understand and it can be difficult to explain its benefits to managers and customers who prioritize ease over simplicity. REST is hard, but REST is simple. It is predictable and you can reason about what you can or cannot do with it.

There's a notion in the humanities that DH (digital humanities) is undertheorized. I'm not a humanist, really, just a programmer, but I strongly disagree. Programmers in the humanities are doing a great amount of theoretical work. As well as reading Hugh's recent posts, digital humanities theorists owe themselves a look at Hickey's theory of complexity and Roy Fielding's theory of representational state transfer. The world of programming and the field of humanities programming and computing are more theorized than they appear to non-programmers.

GeoJSON wrap up

We collected 16 bullet points worth of potential proposals and 2 of these matured enough to be seriously considered for inclusion in the specification. In the end, there was no consensus for accepting them among the authors of the 1.0 document. The specification will not now be revised and will stay at 1.0.

Ellipses and circles were not accepted because authors were not all willing to add a feature that would require knowledge and parameterization of the World Geodetic System for computing new latitudes and longitudes from distances measured in meters in the most common, no-CRS GeoJSON situation. Another concern was that the proposal couldn't provide any basis for representing semi-circles or products of circles, ellipses and other GeoJSON geometry types and that since consumers would be required to approximate them as polygons in most cases, why not just make them polygons to begin with?

The Data Series proposal struck authors as too far outside the scope of describing simple geographic features and as something that wasn't precluded by the current 1.0 specification.

Work on a 1.1 version has ended for now. I did my best to keep the process short and avoid burning people out so that we may start up again when the time is right. You can follow the entire discussion and consensus making process in the GeoJSON list archive from September, through October and ending in November.