Coordinate reference systems for Fiona

It's important to me that Fiona work well with pyproj. Here's how the two go together.

import logging
import sys

from pyproj import Proj, transform

from fiona import collection

logging.basicConfig(stream=sys.stderr, level=logging.INFO)

with collection("docs/data/test_uk.shp", "r") as input:

    schema = input.schema.copy()

    # This is the input reference system.
    p_in = Proj(input.crs)

    with collection(
            "with-pyproj.shp", "w", "ESRI Shapefile",
            schema=schema,

            # New for 0.6: we can write coordinate reference systems.
            crs={'init': "epsg:27700"}

            ) as output:

        # This is the output reference system.
        p_out = Proj(output.crs)

        for f in input:

            try:
                assert f['geometry']['type'] == "Polygon"

                # Transform the coordinates of every ring.
                new_coords = []
                for ring in f['geometry']['coordinates']:
                    x, y = transform(p_in, p_out, *zip(*ring))
                    new_coords.append(zip(x, y))

                f['geometry']['coordinates'] = new_coords
                output.write(f)

            except Exception:
                # Writing untransformable features to a different shapefile
                # is another option.
                logging.exception("Error transforming feature %s:", f['id'])

As I've said before, Python's zip is the essential coordinate wrangling function.
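As a plain-Python illustration (no Fiona or pyproj needed): zip(*ring) transposes a ring's coordinate pairs into the parallel x and y sequences that pyproj's transform() expects, and zip(x, y) transposes them back.

```python
# A ring of (x, y) pairs, as found in a GeoJSON-like geometry.
ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]

# zip(*ring) transposes pairs into parallel x and y sequences,
# the shape that pyproj's transform() wants.
xs, ys = zip(*ring)
assert xs == (0.0, 1.0, 1.0, 0.0)
assert ys == (0.0, 0.0, 1.0, 0.0)

# zip(xs, ys) transposes back into coordinate pairs.
assert list(zip(xs, ys)) == ring
```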

Update (2012-01-05): after https://github.com/Toblerity/Fiona/commit/8b355678bab52f41b737383e2701f4137c2002b4 a crs takes the form of a mapping instead of a string.

Comments

Re: Coordinate reference systems for Fiona

Author: Even Rouault

Sean,

Why not

f['geometry']['coordinates'] = transform(p_in, p_out, f['geometry']['coordinates'])

or something close instead of iterating over each ring/subgeometry part ?

OGR SWIG bindings have a Geometry.Transform() ;-)

Re: Coordinate reference systems for Fiona

Author: Sean

The reason why is that pyproj's transform function is very simple and limited. It operates only on coordinate sequences (Python buffers, to be precise). It doesn't have a concept of geometry or parts or rings. But because of its simple design, you can combine it with Numpy, matplotlib, and the universe of not-GIS Python software.

Geometry.Transform() embodies a number of patterns that I'm deliberately avoiding in my work on Fiona such as excessive object orientation, mutability, and capitalized method names ;) And while it may be easier in the short run, I'm in pursuit of simpler.

More Fiona and OGR benchmarks

I think it's fair to say that Fiona makes different assumptions about how you access and process geospatial vector data than osgeo.ogr does and that they're each suited for somewhat different tasks. I've developed a few new benchmarks to help sketch out the domains of each.

Minimum feature access

Let's say you only want the names of features within their collections. Here's the osgeo.ogr code.

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
for feature in layer:
    id = feature.GetFID()
source.Destroy()

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-min.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-min.py
Fiona 0.5
2733.99 usec/pass

osgeo.ogr 1.7.2 (minimum)
1194.01 usec/pass

As I mentioned in an earlier post, Fiona does a lot of extra copying in this case that osgeo.ogr does not. I'm certain you'd find osgeo.ogr about this much faster in the cases where you were only interested in the value of a single property of features (ignoring coordinates) or only interested in the type or bounding box of feature geometries (ignoring coordinates and properties).

Maximum feature access

Now, let's say you want the schema of the collection and, for every feature, its name, the value of every one of its properties, and all coordinate values from its geometry. The Full Monty. Here's the OGR code for that:

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)

schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))

for feature in layer:
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)

    coordinates = []
    for part in feature.GetGeometryRef():
        ring = []
        for i in range(part.GetPointCount()):
            xy = part.GetPoint(i)
            ring.append(xy)
        coordinates.append(ring)

source.Destroy()

Update (2012-01-03): A comment below reminds me to point out here the code you'd write to do the same with Fiona:

with collection(PATH, "r") as c:
    for f in c:
        id = f['id']
        props = f['properties']
        coordinates = f['geometry']['coordinates']

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-max.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-max.py
Fiona 0.5
2717.32 usec/pass

osgeo.ogr 1.7.2 (maximum)
8790.97 usec/pass

This is the kind of terrain for which Fiona is geared to perform. Accessing feature properties via osgeo.ogr is as slow as it can be. Fiona accesses feature properties via a C extension module, making it as fast as it can be. The story is the same for coordinate access, and because there are many geometry vertices per property in the data I'm using, the effect is magnified.

From a performance perspective, osgeo.ogr seems better suited to making small tweaks to data and Fiona better suited to pulling, bending, cutting, and hammering on all aspects of it.

Comments

Re: More Fiona and OGR benchmarks

Author: Martin Davis

The Fiona code in benchmark-min and benchmark-max seems to be the same. Is that intentional?

Anyway, your Fiona work and these benchmarks are very interesting.

The statement about "Fiona making feature access as fast as it can be" may be true for the Python/OGR/C world, but this is not a general statement of shapefile read performance. Out of curiosity I ran the same test in JEQL, and it runs in 125 ms (on a standard 2 GHz 6-yr old PC).

Re: More Fiona and OGR benchmarks

Author: Martin Davis

By the way, the JEQL script for the benchmark is:

ShapefileReader t file: "test_uk.shp";
Mem t;

JEQL has similar goals to Fiona - brevity of code combined with maximum data functionality - but using declarative SQL rather than procedural code.

Re: More Fiona and OGR benchmarks

Author: Sean

Yes, Martin, the code is the same for Fiona in both cases. There isn't an option to get just a little of the data with Fiona like there is with osgeo.ogr; it's all or nothing.

My benchmarks are in *micro* seconds (I've lazily written "μ" as "u"). 2.7 milliseconds for Fiona on this 2.8 GHz Intel Core 2 Duo. I don't know how this compares to your numbers at all since Python's timeit module (http://docs.python.org/library/timeit.html) turns off garbage collection by default to eliminate some differences between independent runs. I'll try running it with GC on.
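For the curious, the GC switch is a one-liner. This sketch (standard library only, timing a trivial statement rather than Fiona) shows timeit's default GC-off behavior, the documented 'gc.enable()' setup trick, and the usec/pass arithmetic from my benchmark scripts:

```python
import timeit

stmt = "sum(range(100))"

# By default timeit turns garbage collection off while timing.
# Putting 'gc.enable()' in the setup string switches it back on
# (timeit makes the gc module available in that namespace).
t_gc_off = timeit.Timer(stmt).timeit(number=1000)
t_gc_on = timeit.Timer(stmt, setup="gc.enable()").timeit(number=1000)

# Total seconds for 1000 passes, scaled to usec/pass as in my scripts.
usec_per_pass = 1000000 * t_gc_off / 1000
assert usec_per_pass > 0
```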

Re: More Fiona and OGR benchmarks

Author: Sean

Enabling GC didn't make any difference in my numbers. I don't think we can meaningfully compare my timeit results to yours.

Re: More Fiona and OGR benchmarks

Author: Martin Davis

2.7 microseconds! Very impressive, and I fully retract my statement about performance!

And yes, it does seem impossible to compare the numbers after all - if for no other reason than our hardware is so different.

Re: More Fiona and OGR benchmarks

Author: Sean

2700 microseconds. 2.7 milliseconds. I wonder what the best framework would be for precisely racing Python and Java code?

Station Identification

Since March 2005, this is the blog of Sean Gillies.

These are my words only and not those of my employer, coworkers, family, or friends. I have agendas, but no particular strategy. I write when I can about whatever I find interesting, and it will be more of the same this year: commercial-free observations, thinking out loud, big blocks of code, and lessons learned. I published 56 posts in 2012, up from 47 in 2011, and hope to bump the quantity and quality up again in 2013. Comments are welcome, a privilege of having a modest number of always thoughtful readers, but please do read my comments policy.

The idea to post a yearly station identification comes from John Scalzi's Whatever blog.

Previously: 2012.

Shapely from the outside

Right before the start of my holiday, I stumbled onto a post from a GIS blog I'd never seen before. It's an older post that compares Shapely to OGR's Python module and GeoScript. As comments are disabled on that post, I'll make a few here. At the time the post was written, Shapely 1.2.8 was released and the manual was fully mature. Greg Corradini writes:

Shapely is a little bit different than OGR and GeoScript. One big difference is that you can’t read and write to file formats with it. The idea, at least the idea that I’ve taken from it’s website, is that Shapely is a “swiss-army-knife” spatial analysis tool — it does some things really really fast and pythonic but it isn’t going to fix a flat tire. Or something like that.

Shapely is more limited than a Swiss Army knife. It's just a wrapper for the GEOS library, using idiomatic Python. It tries to do one thing – or rather, one fairly large set of related things – very well. It doesn't read or write GIS data formats, so if it's a Swiss Army knife it's one without can and bottle openers.

The coolo-neato things about Shapely include the ability to serialize/deserialize to/from data formats that implement a GeoJson-like interface and to/from numpy arrays. There’s also a couple helpful methods — linemerge and polygonize — that it wraps from the GEOS library.

I'm glad to see that somebody appreciates the GeoJSON-like Python geo interface.
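The geo interface is nothing more than a convention: any object with a __geo_interface__ property that returns a GeoJSON-like mapping can play. A minimal illustration in plain Python (a toy class of my own, no Shapely required):

```python
class ToyPoint:
    """A toy geometry that exposes the GeoJSON-like geo interface."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {'type': 'Point', 'coordinates': (self.x, self.y)}

# A consumer of the protocol needs no knowledge of ToyPoint itself.
p = ToyPoint(-104.99, 39.74)
assert p.__geo_interface__ == {
    'type': 'Point', 'coordinates': (-104.99, 39.74)}
```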

These methods aren’t as easy to reach in GeoScript or OGR built with GEOS support off the proverbial ‘shelf’ because you’d have to call Java libraries in the case of GeoScript and C++ libraries using Python ctypes. Shapely makes using these methods easy, even though you could develop your own wrappers to get at them in GeoScript and OGR with GEOS support.

I hadn't thought of these functions as killer features. I suspect they're probably available in osgeo.ogr by now.

I want to say one last thing about Shapely, though it might reflect some naivete on my part about the functionality of this tool. It seems to me that you can do about 80% of what Shapely does if you build OGR against GEOS and use those bindings. Although it might be more pythonic, optimized in certain ways and syntactically simpler, I’m still unsure who the users are and why. I think the TileStache project uses this library on the backend. I’m interested to find more examples.

I used to work in a GIS shop where at least half of the work was taking some data in one obscure format, doing something relatively simple to it (reprojection, for example), and writing it out in some other arcane format. Shapely by itself is practically useless for that kind of work, and I understand that its purpose can be baffling to the programming GIS analyst. I think I might have found it baffling before my career change.

Finally, Corradini writes:

Since Shapely can’t read or write geometries we have to lean on an existing interop library. Example 3 will use OGR for reads and writes.

Or better yet: Fiona.

It's an interesting post, and the way it weighs a few different software packages is rare these days. It also has me thinking about whether I should be benchmarking Shapely and Fiona versus GeoScript. Assuming the timeit module is supported by Jython and produces directly comparable numbers (because I won't be able to benchmark my C Python modules in the same process), it should be pretty easy to do.

Comments

Re: Shapely from the outside

Author: Tom Kralidis

We use Shapely in

pycsw

to do low-level geometry operations (for spatial filters) as well as manage the geometry lifecycle of metadata records. IMHO, it's a portable approach to achieving this functionality without having to carry OGR, etc.

Fiona and matplotlib: simply plotting features

Fiona goes very well with matplotlib and descartes.

from matplotlib import pyplot
from descartes import PolygonPatch

from fiona import collection

# Set up the figure and axes.
BLUE = '#6699cc'
fig = pyplot.figure(1, figsize=(6, 6), dpi=90)
ax = fig.add_subplot(111)

# For each feature in the collection, add a patch to the axes.
with collection("docs/data/test_uk.shp", "r") as input:
    for f in input:
        ax.add_patch(
            PolygonPatch(
                f['geometry'], fc=BLUE, ec=BLUE, alpha=0.5 ))

# Should be able to get extents from the collection in a future version
# of Fiona.
ax.set_xlim(-9.25, 2.75)
ax.set_ylim(49.5, 61.5)

fig.savefig('test_uk.png')

The resulting image:

http://farm8.staticflickr.com/7005/6554613263_9bc5761f72_o_d.png

I value simplicity. The problems I work on are complex and tools with extra, incidental complexity wear me out needlessly. How about you? Fiona, unlike other geospatial software, is designed to be simple.
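In the meantime, a small pure-Python helper (hypothetical, not part of Fiona) can derive the plot extents from the GeoJSON-like features themselves:

```python
def bounds(features):
    """Return (minx, miny, maxx, maxy) over GeoJSON-like polygon features.

    A hypothetical helper, not part of Fiona; it walks every ring,
    exterior and holes alike.
    """
    xs, ys = [], []
    for f in features:
        for ring in f['geometry']['coordinates']:
            for x, y in ring:
                xs.append(x)
                ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)

features = [
    {'geometry': {
        'type': 'Polygon',
        'coordinates': [[(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 0.0)]]}},
]
assert bounds(features) == (0.0, 0.0, 2.0, 3.0)
```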

Comments

Re: Fiona and matplotlib: simply plotting features

Author: Reinout van Rees

Wow, this looks neat. The simplicity here and in your previous example make it feel very thought-out and practical.

I have a django project where we use shape files. Small errors in them keep bringing parts of sites down. Sending them through Fiona and using the filtered and clean geojson output should help a lot. And simplify our code a lot.

Now all I have to do is get rid of my back pain and start experimenting with this. I already sent off an email to work to look at this :-)

Re: Fiona and matplotlib: simply plotting features

Author: Sean

Well, remember that my earlier script only gets out mild stains. Geodata is about as dirty as data gets.

Fiona and Shapely: spatially cleaning features

Inspired by comments I saw today on the internet, this is how you spatially clean features using Fiona and Shapely (for small values of dirty).

import logging
import sys

from shapely.geometry import mapping, shape

from fiona import collection


logging.basicConfig(stream=sys.stderr, level=logging.INFO)

with collection("docs/data/test_uk.shp", "r") as input:
    schema = input.schema.copy()
    with collection(
            "with-shapely.shp", "w", "ESRI Shapefile", schema
            ) as output:
        for f in input:

            try:
                # Make a shapely object from the dict.
                geom = shape(f['geometry'])
                if not geom.is_valid:

                    # Use the 0-buffer polygon cleaning trick
                    clean = geom.buffer(0.0)
                    assert clean.geom_type == 'Polygon'
                    assert clean.is_valid
                    geom = clean

                # Make a dict from the shapely object.
                f['geometry'] = mapping(geom)
                output.write(f)

            except Exception:
                # Writing uncleanable features to a different shapefile
                # is another option.
                logging.exception("Error cleaning feature %s:", f['id'])

Makes a neat example, I think. Shapely just happens to provide this handy zero-buffer feature (via GEOS); it's not required at all. Fiona just reads and writes GeoJSON-like mappings (like Python's dict); it doesn't care what you do in between or what you do it with.

Update (2011-12-25): The error I noted in the comments is fixed above.

Comments

Re: Fiona and Shapely: spatially cleaning features

Author: Sean

Oops, there's an error above. I mean to assert that the cleaned geom is valid.

Fiona's half-way mark

By my measure, Fiona is half-way to production readiness. Version 0.5 is up on PyPI: http://pypi.python.org/pypi/Fiona/0.5. It now writes mapping features to almost any of OGR's formats. Almost.

In Python, I'm not very interested in OGR's support for databases or services, and so Fiona isn't either. I'd rather use httplib2 to fetch any JSON or XML (KML, GML, GeoRSS) document on the web, and Python's built-in json or xml.etree modules to parse them with some help from geojson and keytree. For databases, I'll use SQLAlchemy or GeoAlchemy. Fiona is mainly about reading and writing the arcane file formats like shapefiles, GPX, etc., for which there is pretty much no other access than through OGR's drivers. Fiona doesn't do OGR "data sources". It has collections of a single feature type. Think shapefile or single PostGIS table. The door isn't completely shut on databases; the first version of Fiona had the concept of a "workspace", and we could bring that back if needed.
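For example, the built-in json module alone turns a GeoJSON document fetched from the web into exactly the kind of mapping Fiona traffics in (the feature below is made up for illustration):

```python
import json

# A GeoJSON feature like those served on the web; the standard
# library's json module is all it takes to get a plain mapping.
doc = '''{"type": "Feature",
          "geometry": {"type": "Point", "coordinates": [0.0, 51.5]},
          "properties": {"name": "Greenwich"}}'''

f = json.loads(doc)
assert f['geometry']['type'] == 'Point'
assert f['properties']['name'] == 'Greenwich'
```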

The latest installation instructions and usage example are on Fiona's GitHub page: https://github.com/sgillies/Fiona and there's more documentation to come. A Shapely-level manual will be the cornerstone of a 1.0 release, although I expect the Fiona manual to be much smaller.

Installation

If you've got GDAL installed in a well-known location and Python 2.6, you may be able to install as easily as:

$ pip install Fiona

If, like me, you have virtualenvs with their own GDAL libs, you'll need to do something like this.

$ virtualenv .
$ source bin/activate
$ pip install -d Fiona http://pypi.python.org/packages/source/F/Fiona/Fiona-0.5.tar.gz
$ export GDAL=${PATH_TO_GDAL}
$ cd Fiona
$ python setup.py build_ext -I ${GDAL}/include -L ${GDAL}/lib install

Reading benchmark

Is Fiona anywhere as fast as osgeo.ogr? Benchmark: get all polygon-type features from a shapefile, read their attributes into a dict, get a reference to their geometry (in the OGR case) or geometry as GeoJSON (in the Fiona case), and get their unique ids. Here's the script:

import timeit
from fiona import collection
from osgeo import ogr

PATH = 'docs/data/test_uk.shp'
NAME = 'test_uk'

# Fiona
s = """
with collection(PATH, "r") as c:
    for f in c:
        id = f["id"]
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, PATH, NAME'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print

# OGR
s = """
source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))
layer.ResetReading()
while 1:
    feature = layer.GetNextFeature()
    if not feature:
        break
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)
    geometry = feature.GetGeometryRef()
    feature.Destroy()
source.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, PATH, NAME'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)

That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that. A lot. Next are the numbers.

(Fiona)krusty-2:Fiona seang$ python benchmark.py
Fiona 0.5
2579.40 usec/pass

osgeo.ogr 1.7.2
3355.43 usec/pass

Result: even though Fiona is doing a fair amount of extra coordinate copying, it's still faster than the Python bindings for OGR 1.7.2. Simpler + faster seems like a big win.

Writing benchmark

How about writing speed? Benchmark: open a shapefile for writing and write 50 identical (other than their local ids) point-type features to it.

import os
import timeit

from fiona import collection
from osgeo import ogr

FEATURE = {'id': '1', 'geometry': {'type': 'Point', 'coordinates': (0.0, 0.0)},
           'properties': {'label': u"Foo"}}
SCHEMA = {'geometry': 'Point', 'properties': {'label': 'str'}}

# Fiona
s = """
with collection("fiona.shp", "w", "ESRI Shapefile", SCHEMA) as c:
    for i in range(50):
        f = FEATURE.copy()
        f['id'] = str(i)
        c.write(f)
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, SCHEMA, FEATURE'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print

# OGR
s = """
drv = ogr.GetDriverByName("ESRI Shapefile")
if os.path.exists('ogr.shp'):
    drv.DeleteDataSource('ogr.shp')
ds = drv.CreateDataSource("ogr.shp")
lyr = ds.CreateLayer("ogr", None, ogr.wkbPoint)
field_defn = ogr.FieldDefn("label", ogr.OFTString)
lyr.CreateField(field_defn)
for i in range(50):
    feat = ogr.Feature(lyr.GetLayerDefn())
    feat.SetField("label", u"Foo")
    pt = ogr.Geometry(ogr.wkbPoint)
    x, y = FEATURE['geometry']['coordinates']
    pt.SetPoint_2D(0, x, y)
    feat.SetGeometry(pt)
    feat.SetFID(i)
    lyr.CreateFeature(feat)
    feat.Destroy()
ds.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, os, FEATURE'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)

Fiona only needs five statements compared to 18 for osgeo.ogr, and Fiona writes these features faster.

(Fiona)krusty-2:Fiona seang$ python write-benchmark.py
Fiona 0.5
4435.63 usec/pass

osgeo.ogr 1.7.2
4565.33 usec/pass

I was rather surprised by this. I've been benchmarking feature reading for a while but this is the first time I've done it for writes.

Tests

Do you like tests?

(Fiona)krusty-2:Fiona seang$ python setup.py nosetests --nologcapture --with-coverage --cover-package=fiona
running nosetests
running egg_info
writing src/Fiona.egg-info/PKG-INFO
writing top-level names to src/Fiona.egg-info/top_level.txt
writing dependency_links to src/Fiona.egg-info/dependency_links.txt
reading manifest file 'src/Fiona.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.txt' under directory 'tests'
writing manifest file 'src/Fiona.egg-info/SOURCES.txt'
running build_ext
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrinit.so -> src/fiona
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrext.so -> src/fiona
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
nose.plugins.cover: INFO: Coverage report will include only packages: ['fiona']
test_invalid_mode (tests.test_collection.CollectionTest) ... ok
test_no_path (tests.test_collection.CollectionTest) ... ok
test_w_args (tests.test_collection.CollectionTest) ... ok
test_append_point (tests.test_collection.ShapefileAppendTest) ... ok
test_context (tests.test_collection.ShapefileCollectionTest) ... ok
test_filter_1 (tests.test_collection.ShapefileCollectionTest) ... ok
test_io (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_list (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_one (tests.test_collection.ShapefileCollectionTest) ... ok
test_len (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_write (tests.test_collection.ShapefileCollectionTest) ... ok
test_schema (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_read (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_point (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_polygon (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_linestring (tests.test_feature.PointTest) ... ok
test_point (tests.test_feature.PointTest) ... ok
test_polygon (tests.test_feature.PointTest) ... ok
test (tests.test_geometry.GeometryCollectionRoundTripTest) ... ok
test (tests.test_geometry.LineStringRoundTripTest) ... ok
test_line (tests.test_geometry.LineStringTest) ... ok
test (tests.test_geometry.MultiLineStringRoundTripTest) ... ok
test_multilinestring (tests.test_geometry.MultiLineStringTest) ... ok
test (tests.test_geometry.MultiPointRoundTripTest) ... ok
test_multipoint (tests.test_geometry.MultiPointTest) ... ok
test (tests.test_geometry.MultiPolygonRoundTripTest) ... ok
test_multipolygon (tests.test_geometry.MultiPolygonTest) ... ok
test (tests.test_geometry.PointRoundTripTest) ... ok
test_point (tests.test_geometry.PointTest) ... ok
test (tests.test_geometry.PolygonRoundTripTest) ... ok
test_polygon (tests.test_geometry.PolygonTest) ... ok

Name               Stmts   Miss  Cover   Missing
------------------------------------------------
fiona                 15      0   100%
fiona.collection      56      0   100%
------------------------------------------------
TOTAL                 71      0   100%
----------------------------------------------------------------------
Ran 31 tests in 4.955s

OK

I haven't figured out how to get Fiona's OGR extension module covered, but I'd be surprised if it wasn't > 90%.

I've been telling people that Fiona's objective was 80% of osgeo.ogr's functionality at 20% of the complexity. Maybe it can have 5% better performance, too?

Comments

Re: Fiona's half-way mark

Author: NathanW

>>That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that.

Like it? Love it! So much cleaner; less stuffing around, just get in and do what you want. The less red (read: noise code) the better.

Keep up the good work! Looking at ways I can apply the same kind of API logic to QGIS, as the API is a little verbose sometimes just to do simple things.

Any reason why you use f['id'] vs something like f.id and do some magic in __getattr__() and __setattr__()?

Re: Fiona's half-way mark

Author: Sean

Fiona originally had such GeoJSON-like objects with `id`, `geometry`, and `properties` attributes. But as I coded, I kept asking myself "this is redundant: why not just use GeoJSON-like mappings?" This question never went away, and I finally decided to ditch the classes and just use dicts. Dicts and other mappings are thoroughly documented, well-tested, and built-in. The GeoJSON format is also very well known these days and maps easily to Python dicts. I feel like I'm trading less code for more usability: win-win.
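Concretely (a sketch of the design choice, not Fiona internals): a record is an ordinary dict, so every mapping operation the language and its documentation already cover applies directly.

```python
# A Fiona-style record is just a GeoJSON-like dict.
f = {'id': '1',
     'geometry': {'type': 'Point', 'coordinates': (0.0, 0.0)},
     'properties': {'label': 'Foo'}}

# No special API: ordinary mapping operations do the work.
f['properties']['label'] = 'Bar'
assert sorted(f) == ['geometry', 'id', 'properties']
assert f['properties']['label'] == 'Bar'
```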

Re: Fiona's half-way mark

Author: Howard Butler

Most of your performance win is probably related to cython vs SWIG. SWIG can generate terrible Python from a performance perspective.

What's cython's story with regard to PyPy or IronPython? A Fiona based on ctypes would get access to those interpreters for practically no effort. Maybe cython's story with regard to that is the same...

What about coordinate systems? Do they live in Fiona? Should they?

Re: Fiona's half-way mark

Author: Sean

I'm not interested in IronPython anymore. For PyPy, ctypes definitely seems to be the way to go: http://codespeak.net/pypy/dist/pypy/doc/extending.html. Cython's PyPy story isn't so clear. Switching Fiona to ctypes shouldn't be that hard.
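For a taste of what a ctypes approach involves (a minimal sketch against libc, nothing to do with Fiona's actual code): ctypes opens a shared library and calls into it with no compiled extension at all, which is what makes it attractive for PyPy.

```python
import ctypes

# On Linux and Mac OS X, CDLL(None) opens the running process itself,
# which is linked against the C library, so libc symbols are reachable.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

assert libc.strlen(b"fiona") == 5
```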

Lessons learned from Zope

Chris McDonough:

  • Most developers are very, very risk-averse. They like taking small steps, or no steps at all. You have to allow them to consume familiar technologies and allow them to disuse things that get in their way.
  • The allure of a completely integrated, monolithic system that effectively prevents the use of alternate development techniques and technologies eventually wears off. And when it does, it wears off with a vengeance.
  • The natural reaction to the problems caused by monolithic systems, which is to make anything and everything uniformly "pluggable", is also a net loss.
  • Tests are very important.
  • Documentation is even more important than tests.
  • When trying to replace a complex system, make it simpler, don't just push the complexity around.
  • Brands are important. You, ideally, must make sure that the question "what is 'noun'" has one answer. Subtlety and branding do not mix; they produce a toxic haze of confusion.

My story with Zope parallels Chris's, though not nearly as deep, and trailing by about a year and a half. I've been using his software and docs for about a decade now. For the past 4-5 years I've been trying to apply the Zope lessons above to GIS software; Shapely and Fiona are thoroughly steeped in them.

Flickr support for ancient world places

I've already written a little about the extra love Pleiades is getting from Flickr on the Pleiades blog. This post is the version for developers, and also appears on the Flickr developer blog: http://code.flickr.com/blog/2011/12/16/pleiades-a-guest-post/. I think this may be my first guest blog post ever.

Background

In August of 2010, Dan Pett and Ryan Baumann suggested that we coin Flickr machine tags in a "pleiades" namespace so that Flickr users could assert connections between their photos and places in antiquity and search for photos based on these connections. Ryan is a programmer for the University of Kentucky's Center for Visualization and Virtual Environments and collaborates with NYU and ISAW on Papyri.info. Dan works at the British Museum and is the developer of the Portable Antiquities Scheme's website: finds.org.uk. At about the same time, ISAW had launched its Flickr-hosted Ancient World Image Bank and was looking for ways to exploit these images, many of which were on the web for the first time. AWIB lead Tom Elliott, ISAW's Associate Director for Digital Programs, and AWIB Managing Editor Nate Nagy started machine tagging AWIB photos in December 2010. When Dan wrote "Now to get flickr's system to link back a la openplaques etc." in an email, we all agreed that would be quite cool, but weren't really sure how to make it happen.

As AWIB picked up steam this year, Tom blogged about the machine tags. His post was read by Dan Diffendale, who began tagging his photos of cultural objects to indicate their places of origin or discovery. In email, Tom and Dan agreed that it would be useful to distinguish between findspot and place of origin in photos of objects and to distinguish these from photos depicting the physical site of an ancient place. They resolved to use some of the predicates from the Concordia project, a collaboration between ISAW and the Center for Computing in the Humanities at King's College, London (now the Arts and Humanities Research Institute), jointly funded by the NEH and JISC. For findspots, pleiades:findspot=PID (where PID is the short key of a Pleiades place) would be used. Place of origin would be tagged by pleiades:origin=PID. A photo depicting a place would be tagged pleiades:depicts=PID. The original pleiades:place=PID tag would be for a geographic-historic but otherwise unspecified relationship between a photo and a place. Concordia's original approach was not quite RDF forced into Atom links, and was easily adapted to Flickr's "not quite RDF forced into tags" infrastructure.

I heard from Aaron Straup Cope at State of the Map (the OpenStreetMap annual meeting) in Denver that he'd seen Tom's blog post and, soon after, that it was on the radar at Flickr. OpenStreetMap machine tags (among some others) get extra love at Flickr, meaning that Flickr uses the machine tag as a key to external data shown on or used by photo pages. In the OSM case, that means structured data about ways and nodes, structured data that surfaces on photo pages like http://flickr.com/photos/frankieroberto/3396068360/ as "St George's House is a building in OpenStreetMap." Outside Flickr, OSM users can query the Flickr API for photos related to any particular way or node, enabling street views (for example) not as a product, but as a grassroots project. Two weeks later, to our delight, Daniel Bogan contacted Tom about giving Pleiades machine tags the same kind of treatment. He and Tom quickly came up with good short labels for our predicates, and support for the Pleiades machine tags went live on Flickr in the middle of November.

The Pleiades machine tags

Pleiades mainly covers the Greek and Roman world from about 900 BC - 600 AD. It is expanding somewhat into older Egyptian, Near East and Celtic places, and more recent Byzantine and early Medieval Europe places. Every place has a URL of the form http://pleiades.stoa.org/places/$PID and it is these PID values that go in machine tags. It's quite easy to find Pleiades places through the major search engines as well as through the site's own search form.

The semantics of the tags are as follows:

pleiades:depicts=PID
The PID place (or what remains of it) is depicted in the photo
pleiades:findspot=PID
The PID place is where a photo subject was found
pleiades:origin=PID
The PID place is where a photo subject was produced
pleiades:where=PID
The PID place is the location of the photo subject
pleiades:place=PID
The PID place is otherwise related to the photo or its subject
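
All five tags share Flickr's machine tag shape: a namespace, a predicate, and a value. As an illustrative sketch (the helper name is my own, not part of Pleiades or Flickr), splitting one apart looks like this:

```python
def parse_machine_tag(tag):
    """Split a machine tag like 'pleiades:depicts=440947682' into
    its (namespace, predicate, value) parts."""
    nspred, value = tag.split("=", 1)
    namespace, predicate = nspred.split(":", 1)
    return namespace, predicate, value

# parse_machine_tag("pleiades:findspot=440947682")
# -> ("pleiades", "findspot", "440947682")
```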

At Pleiades, our immediate use for the machine tags is giving our ancient places excellent portrait photos.

On the Flickr Side

Here's how it works on the Flickr side, as seen by a user. When you coin a new machine tag, one never before used on Flickr, like pleiades:depicts=440947682 (as seen on AWIB's photo Tombs at El Kab by Iris Fernandez), Flickr fetches the JSON data at http://pleiades.stoa.org/places/440947682/json in which the ancient place is represented as a GeoJSON feature collection. A snippet of that JSON, fetched with curl and pretty printed with Python

$ curl http://pleiades.stoa.org/places/440947682/json | python -mjson.tool

is shown here:

{
  ...
  "id": "440947682",
  "title": "El Kab",
  "type": "FeatureCollection"
}

[Gist: https://gist.github.com/1488270]

The title is extracted and used to label a link to the Pleiades place under the photo's "Additional info".

http://farm8.staticflickr.com/7161/6522002861_537ca823d4_b_d.jpg

Flickr is in this way a user of the Pleiades not-quite-an-API that I blogged about two weeks ago.
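
The extraction step is easy to sketch. Assuming only the JSON structure shown above, a hypothetical helper that pulls out the label and builds the link target might look like this (in production you'd fetch the document over HTTP first; the function name is mine, not Flickr's):

```python
import json

def place_label_and_url(doc):
    """Given a parsed Pleiades place JSON document (a GeoJSON
    feature collection), return the (title, place URL) pair a
    consumer like Flickr could show under a photo."""
    pid = doc["id"]
    return doc["title"], "http://pleiades.stoa.org/places/%s" % pid

# Using the El Kab snippet shown above:
snippet = json.loads(
    '{"id": "440947682", "title": "El Kab", "type": "FeatureCollection"}')
```

Calling place_label_and_url(snippet) yields the "El Kab" label and the place page URL that appear under the photo's "Additional info".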

Flickr as external Pleiades editor

On the Pleiades end, we're using the Flickr website to identify and collect openly licensed photos that will serve as portraits for our ancient places. We can't control use of tags but would like some editorial control over images, so we've created a Pleiades Places group and pull portrait photos from its pool. The process goes like this:

http://farm8.staticflickr.com/7172/6522275377_bbda2a70ac_o_d.png

We're editing (in this one way) Pleiades pages entirely via Flickr. We get a kick out of this sort of thing at Pleiades. Not only do we love to see small pieces loosely joined in action, we also love not reinventing applications that already exist.

Watch the birdie

This system for acquiring portraits uses two Flickr API methods: flickr.photos.search and flickr.groups.pools.getPhotos. The guts of it is this Python class:

# Imports this snippet relies on (it's a Zope browser view; the
# FLICKR_*, PLEIADES_PLACES_ID, IMG_TMPL, and PAGE_TMPL constants
# are defined elsewhere in the module):
from urllib import urlencode

import httplib2
import simplejson
from Products.Five import BrowserView

class RelatedFlickrJson(BrowserView):

    """Makes two Flickr API calls and writes the number of related
    photos and URLs for the most viewed related photo from the Pleiades
    Places group to JSON like

    {"portrait": {
       "url": "http://flickr.com/photos/27621672@N04/3734425631/in/pool-1876758@N22",
       "img": "http://farm3.staticflickr.com/2474/3734425631_b15979f2cd_m.jpg",
       "title": "Pont d'Ambroix by sgillies" },
     "related": {
       "url": ["http://www.flickr.com/photos/tags/pleiades:*=149492/"],
       "total": 2 }}

    for use in the Flickr Photos portlet on every Pleiades place page.
    """

    def __call__(self, **kw):
        data = {}

        pid = self.context.getId() # local id like "149492"

        # Count of related photos

        tag = "pleiades:*=" + pid

        h = httplib2.Http()
        q = dict(
            method="flickr.photos.search",
            api_key=FLICKR_API_KEY,
            machine_tags=tag,  # the "pleiades:*=PID" tag built above
            format="json",
            nojsoncallback=1 )

        resp, content = h.request(FLICKR_API_ENDPOINT + "?" + urlencode(q), "GET")

        if resp['status'] == "200":
            total = 0
            photos = simplejson.loads(content).get('photos')
            if photos:
                total = int(photos['total'])

            data['related'] = dict(total=total, url=FLICKR_TAGS_BASE + tag)

        # Get portrait photo from group pool

        tag = "pleiades:depicts=" + pid

        h = httplib2.Http()
        q = dict(
            method="flickr.groups.pools.getPhotos",
            api_key=FLICKR_API_KEY,
            group_id=PLEIADES_PLACES_ID,
            tags=tag,
            extras="views",
            format="json",
            nojsoncallback=1 )

        resp, content = h.request(FLICKR_API_ENDPOINT + "?" + urlencode(q), "GET")

        if resp['status'] == '200':
            total = 0
            photos = simplejson.loads(content).get('photos')
            if photos:
                total = int(photos['total'])
            if total < 1:
                data['portrait'] = None
            else:
                # Sort found photos by number of views, descending.
                # Flickr returns views as a string, so coerce to int
                # to avoid lexicographic ordering ("9" > "10").
                most_viewed = sorted(
                    photos['photo'], key=lambda p: int(p['views']), reverse=True )
                photo = most_viewed[0]

                title = photo['title'] + " by " + photo['ownername']
                data['portrait'] = dict(
                    title=title, img=IMG_TMPL % photo, url=PAGE_TMPL % photo )

        self.request.response.setStatus(200)
        self.request.response.setHeader('Content-Type', 'application/json')
        return simplejson.dumps(data)

[Gist: https://gist.github.com/1482469]

The same thing could be done with urllib, of course, but I'm a fan of httplib2. JavaScript on Pleiades place pages asynchronously fetches data from this view and updates the DOM. The end result is a "Flickr Photos" section at the bottom right of every place page that looks (when we have a portrait) like this:

http://farm8.staticflickr.com/7012/6522002865_350997d652_o_d.jpg
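
As noted above, httplib2 isn't essential; the same search request can be built with the standard library alone. A Python 3 sketch, mirroring the first call in the class (the endpoint constant and function name are placeholders of mine):

```python
from urllib.parse import urlencode

FLICKR_API_ENDPOINT = "https://api.flickr.com/services/rest/"  # placeholder

def search_request_url(api_key, pid):
    """Build the flickr.photos.search URL used to count photos
    carrying any pleiades:*=PID machine tag."""
    q = dict(
        method="flickr.photos.search",
        api_key=api_key,
        machine_tags="pleiades:*=%s" % pid,
        format="json",
        nojsoncallback=1)
    return FLICKR_API_ENDPOINT + "?" + urlencode(q)

# urllib.request.urlopen(search_request_url(KEY, "149492")) would then
# return the same JSON body as the httplib2 request above.
```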

We're excited about the extra love for Pleiades places and can clearly see it working. The number of places tagged pleiades:*= is rising quickly – up 50% just this week – and we've gained new portraits for many of our well-known places. I think it will be interesting to see what developers at Flickr, ISAW, or museums make of the pleiades:findspot= and pleiades:origin= tags.

Thanks

We're grateful to Flickr and Daniel Bogan for the extra love and opportunity to blog about it. Work on Pleiades is supported by the NEH and ISAW. Our machine tag predicates come from a NEH-JISC project – still bearing fruit several years later.