Yours truly, Fiona

Fiona now writes feature collections to disk. Here's a bit of code from the tests, dressed up with extra comments:

from fiona import collection
from shapely.geometry import asShape, mapping

# Open a source of features
with collection("docs/data/test_uk.shp", "r") as source:

    # Define a schema for the feature sink
    schema = source.schema.copy()
    schema['geometry'] = 'Point'

    # Open a new sink for features
    with collection(
        "test_write.shp", "w", driver="ESRI Shapefile", schema=schema
        ) as sink:

        # Process only the features intersecting a box
        for f in source.filter(bbox=(-5.0, 55.0, 0.0, 60.0)):

            # Get their centroids using Shapely
            f['geometry'] = mapping(asShape(f['geometry']).centroid)

            # Stage feature for writing
            sink.write(f)

    # The sink shapefile is written to disk when its ``with`` block ends

That's just 9 statements. Fiona isn't just about less code, it's about taking advantage of Python built-ins and idioms to shrink the API's cranial memory footprint. You already know dicts, and data are better than objects, so features are modeled as GeoJSON-like mappings. Feature schemas are mappings, too. You already know how Python file I/O works, so persisted feature collections are modeled like files. Obviousness and familiarity are what I'm going for here. If you have to call help(fiona) more than 8 times in your entire life, I'll have failed.

I still need to work on support for writing geometry types other than 'Point' and for coordinate reference systems, and to make sure it's tight memory-wise (Fiona is all C underneath). It also might be nice to let the sink collection's schema be set from the first written feature, making the above example only 7 statements. The OGR library is so loaded with features that making a simple wrapper API is almost entirely about saying no and throwing features out. And how I've thrown things out. Geometries, out. Fields, out. Features, out. Cursors, out. Layers, out. There's almost nothing left except "open file", "filter iterator", "next dict", "append dict" and "close file". It almost goes without saying that this is for minimalists only.
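Setting the sink's schema from the first written feature could be as simple as the following sketch. The guess_schema helper is hypothetical, not part of Fiona; it just shows that a Fiona-style schema mapping is recoverable from a GeoJSON-like feature:

```python
def guess_schema(feature):
    # Sketch of inferring a Fiona-style schema mapping from the first
    # feature written to a sink; a hypothetical helper, not Fiona API.
    field_types = {int: 'int', float: 'float', str: 'str'}
    return {
        'geometry': feature['geometry']['type'],
        'properties': {
            key: field_types[type(value)]
            for key, value in feature['properties'].items()
        },
    }
```

A sink could call something like this on the first feature it receives and check the rest against the resulting schema.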

Update (2011-12-10): the "writing" branch of Fiona now writes polylines and polygons. Multipart geometry types coming soon.

Comments

Re: Yours truly, Fiona

Author: Nathan W

Very cool! Nice job. I like the look of this very much and how much more readable it makes OGR. I have tried a few times to use OGR from Python, and while it wasn't hard, it still felt very much like "this is a C++ API with Python on top". This makes it feel a lot more like native Python.

Will keep an eye on the project's progress.

Re: Yours truly, Fiona

Author: Sean

Thanks, Nathan. Follow on GitHub if you haven't already and let me know how the library suits your own data access needs.

Does Pleiades have an API?

This is becoming a frequently asked question, and as I work on the definitive answer for the Pleiades FAQ, I'll think out loud about it here in my blog. Does Pleiades have an API? In truth, it has a number of APIs, some good and some bad. Does it have an HTTP + JSON API like all the cool kids do? No. Well, yes, sort of.

Before I get into tl;dr territory, I'll write down one of the guiding principles of the Pleiades project:

Data is usually better than an API.

It's not that we're uncomfortable with interfaces in Pleiades. Our application is based on Zope and Plone, so you know it has all kinds of interfaces under the hood. I'm even a bit of a geek about designing nice APIs (see also Shapely, Fiona, etc). It's just that data is better ... usually.

By "data" above, I mean a document or file or sequence of bytes containing related information, in bulk. The entire text of a book, for example, is better to have than an API for fetching the N-th sentence on page M. All the coordinates of a simple feature linestring (as GeoJSON, say) are better to have than an API for getting the N-th coordinate value of the M-th vertex of a line object. Given all the data, we're not bound to a particular way of indexing and searching it and can use the tools of our choice. APIs are typically chatty, slow and pointlessly different from others in the same line of business. Subbu Allamaraju goes deep into the trouble of working with inconsistent systems in "APIs are a Pain" and with more hard earned wisdom than I have, so I won't pile on here. Data is better ... usually.

An API, and here I mean "web API", can be better in the following and probably not exhaustive list of situations:

  • Sheer mass of data making dissemination practically impossible

  • Rapidly changing data making dumps and downloads out of date

  • Desire to control access to individual data records

  • Desire to monetize data (ads, for example)

  • Desire to impose a certain point of view

  • Desire to track use

Tracking use lets us tweak the experience of users. "People who viewed record M might also be interested in record N" and the like. It doesn't have to be nefarious tracking, just nudging users into useful and mutually profitable patterns. Only one of these situations is very relevant to Pleiades and so we're not designing APIs to sort them all out like other enterprises must. The RDF and KML serializations of the entire 34,000 place Pleiades dataset are not large by modern standards and don't change very rapidly. An application (like the Pelagios Graph Explorer or GapVis) that fetched and cached them once a day could stay quite up to date. The number of Pleiades contributors is growing, but they are primarily enriching existing places; I don't expect Pleiades to ever become so large that those files couldn't be transferred in less than a minute on a good internet connection. We control access to data that's in development, yes, but the locations, names and places that pass through review into a published state are completely open access and not private to any individual user or group of users. In only one part of Pleiades are we concerned about controlling a narrative through an API: the slideshow that plays on the Pleiades home page uses an API that stumbles through the most recently modified places and progressively mixes in more randomly selected ones.

Instead of fancy APIs, then, we have boring CSV, KML, and RDF downloads. The shapefile format, by the way, is inadequate for our purposes. Information will be lost in making a shapefile from the Pleiades model (any number of locations and names per place) and we're going to let people decide for themselves what to give up if they want this. The downloads are updated daily.

Pleiades also has JSON, KML, and RDF data for any particular place. Data that is current and linked from every page (http://pleiades.stoa.org/places/422987, for example) with HTML <link> and <a> elements. It's not an API ... or is it? The map on the page about Norba gets its overlay features from those very same JSON and KML resources. Looking at it in this way, you could say we do have an API here: the web is the API. When I finally finish the Pleiades implementation of OpenSearch (with Geo extension by Andrew Turner), I can replace Plone's crufty search API with even more consistency and interoperability from The Web as API.
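A sketch of "the web is the API" idea: a client discovers a place's alternate representations from the page's HTML itself. The markup and href paths below are illustrative, not Pleiades' actual output:

```python
from html.parser import HTMLParser

class AlternateLinks(HTMLParser):
    # Collect rel="alternate" <link> elements, the way a client would
    # discover a page's JSON/KML/RDF representations from its HTML.
    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == 'link' and d.get('rel') == 'alternate':
            self.alternates.append((d.get('type'), d.get('href')))

# Illustrative markup; not the actual Pleiades page source.
page = """<html><head>
<link rel="alternate" type="application/json" href="/places/422987/json"/>
<link rel="alternate" type="application/vnd.google-earth.kml+xml"
      href="/places/422987/kml"/>
</head><body>Norba</body></html>"""

parser = AlternateLinks()
parser.feed(page)
```

No bespoke API client needed: the HTML links are the discovery mechanism.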

Pleiades doesn't need the same kind of API that Twitter or Facebook have (obviously) or that OpenStreetMap has. We simply don't have anywhere near that much data, that much churn or (in the Twitter/Facebook case) that much need to control what you access.

Comments

Re: Does Pleiades have an API?

Author: josh livni

Another reason for an API would be a desire to allow adding new data or modifying a subset of the data, using different tools than your default web ui, no?

Re: Does Pleiades have an API?

Author: Sean

Maybe ... edits change everything (so to speak), so I'll have to mull that over. There are certainly other ways to incorporate changes that don't involve web APIs: diff and patch, for example, or git.

Simple in theory

Rich Hickey's "Simple Made Easy" presentation at Strange Loop, recommended to me by my Clojure programming co-worker Hugh Cayless, is flat out awesome. "Guardrail Programming" and "Knitted Castle" are my new favorite metaphors. Hickey has a compelling theory about complexity, and after watching the presentation, I feel like I can be a better advocate for simplicity. An advocate to those who like theory, at least. For others, the proof remains in the pudding: whether simple makes for better software.

REST, the architectural style, didn't factor into Hickey's talk at all, but it is a great example of an approach that chooses simplicity over ease. REST is hard. It is. You're wrong if you've been thinking that REST is easier than SOAP or COM. Look at almost any (there are exceptions, yes) so-called "REST API" and you'll see something produced by web programmers who tried to apply the REST style and either couldn't get their heads around it or gave up on it under pressure to deliver. REST is hard to understand, and it can be difficult to explain its benefits to managers and customers who prioritize ease over simplicity. REST is hard, but REST is simple. It is predictable and you can reason about what you can or cannot do with it.

There's a notion in the humanities that DH (digital humanities) is undertheorized. I'm not a humanist, really, just a programmer, but I strongly disagree. Programmers in the humanities are doing a great amount of theoretical work. As well as reading Hugh's recent posts, digital humanities theorists owe themselves a look at Hickey's theory of complexity and Roy Fielding's theory of representational state transfer. The world of programming in general, and the field of humanities computing in particular, is more theorized than it appears to non-programmers.

GeoJSON wrap up

We collected 16 bullet points' worth of potential proposals, and 2 of these matured enough to be seriously considered for inclusion in the specification. In the end, there was no consensus among the authors of the 1.0 document for accepting them. The specification will not be revised for now and stays at 1.0.

Ellipses and circles were not accepted because the authors were not all willing to add a feature that would require knowledge and parameterization of the World Geodetic System for computing new latitudes and longitudes from distances measured in meters in the most common, no-CRS GeoJSON situation. Another concern was that the proposal provided no basis for representing semi-circles or products of circles, ellipses and other GeoJSON geometry types, and that since consumers would in most cases be required to approximate them as polygons anyway, why not just make them polygons to begin with?
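To illustrate the objection, here's a sketch of what consumers would have had to do anyway: approximate the circle as a polygon. This version deliberately cheats, using a spherical mean-earth radius rather than the WGS 84 ellipsoid the proposal would have required:

```python
import math

def circle_to_polygon(lon, lat, radius_m, n=32):
    # Approximate a circle as a GeoJSON-like Polygon on a spherical
    # earth (mean radius 6371 km); an illustrative sketch only, not
    # the rejected proposal's ellipsoidal computation.
    earth_r = 6371000.0
    d = radius_m / earth_r  # angular radius in radians
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    ring = []
    for i in range(n + 1):  # n + 1 points, so the ring closes
        theta = 2 * math.pi * i / n
        plat = math.asin(
            math.sin(lat_r) * math.cos(d)
            + math.cos(lat_r) * math.sin(d) * math.cos(theta)
        )
        plon = lon_r + math.atan2(
            math.sin(theta) * math.sin(d) * math.cos(lat_r),
            math.cos(d) - math.sin(lat_r) * math.sin(plat),
        )
        ring.append([math.degrees(plon), math.degrees(plat)])
    return {"type": "Polygon", "coordinates": [ring]}
```

Once a consumer is doing this, the circle might as well have been shipped as a polygon in the first place.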

The Data Series proposal struck authors as too far outside the scope of describing simple geographic features and as something that wasn't precluded by the current 1.0 specification.

Work on a 1.1 version has ended for now. I did my best to keep the process short and avoid burning people out so that we may start up again when the time is right. You can follow the entire discussion and consensus making process in the GeoJSON list archive from September, through October and ending in November.

Ancient Toponym of the Week: Error Ins.

I saw Error? Ins. yesterday in the slideshow that runs on the new Pleiades homepage and immediately thought "there's an Atlas bug." Tom Elliott, managing editor of Pleiades, checked his copy of the Itineraria Provinciarum et Maritimum (also known as the Antonine Itinerary) and found:

item inter cartaginem spartariam et cesaream mauretanie: insula erroris et tauria, inter se habent stadia LXXV.

Translation: "between Spartarian(?) Cartagina and Caesarea Mauretania: the island of Error and Tauria, between which there are 75 stadia."

The Barrington Atlas (and Pleiades) annotated this toponym with a question mark to indicate some uncertainty in association between this name and the small island now known as Île Plane.

Tom also reminded me that the Antonine Itinerary is covered in Chapter 14 of The History of Cartography, Volume 1 – available as a free PDF from the University of Chicago Press: http://www.press.uchicago.edu/books/HOC/HOC_V1/Volume1.html.

Comments

Itinerary

Author: Tom Elliott

Unfortunately I'm not aware of any open-access edition of the itinerary. The standard print edition is Otto Cuntz (ed.), Itineraria Romana, vol. I, Stuttgart: Teubner, 1929. My 1990s reprint copy (ISBN 3-519-04273-8) cost something like 96 euros via ABE Books.

Tom

Getting the GeoJSON band back together

The authors of the 1.0 specification have agreed to address the deficiencies and ambiguities of the GeoJSON format and revise the specification document. Consensus is that we will avoid going overboard; this will be a 1.1 not a 2.0. The primary components of the process are:

  • Consensus

  • Good faith

  • Transparency

  • GitHub

All authors signed off on every part of 1.0 and consensus among the authors will decide everything for 1.1. We probably can't tackle every worthy idea that is floated, but we'll do our best. Discussion, debate, and consensus forming will take place on a public (and archived) mailing list and wiki and the process of revision will be managed using Git and GitHub. Please see https://github.com/GeoJSONWG/geojson-spec and join us in keeping the format fit for future use.

Notes on deploying a Pyramid app to Heroku

My first try at deploying a Pyramid app to Heroku's new Python platform took all of 20 minutes, 15 of which were spent figuring out that I needed to pin my application to PasteScript 1.7.3 to use cpwsgi_server and that putting a Python executable in the Procfile's web line seemed to be the key to getting Heroku to understand this was a Python (and not Ruby on Rails) app. I'm sure the Heroku example using Flask can get you up and running in less than 5 minutes. Heroku is probably not news to many of you, but I'm blown away at how easy the Heroku team has made this. Impressive.
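Concretely, the two pieces that mattered might look like this. The file contents are illustrative; only the PasteScript pin and the python-on-the-web-line trick come from the experience above, and the runapp.py name is made up:

```text
# requirements.txt -- pin PasteScript so cpwsgi_server is available
pyramid
PasteScript==1.7.3

# Procfile -- a python executable on the web line is what tells
# Heroku this is a Python app rather than Ruby on Rails
web: python runapp.py
```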

My application itself is nothing much. Try out the Greek to modern Latin transliteration scheme used by the Classical Atlas Project and Pleiades at http://fierce-sky-2201.herokuapp.com/ if you like. There's a link to the source of my Pyramid app in the form's page.

Fiona

A while ago I created a project named WorldMill to learn about writing Python extension modules with Cython and to experiment with designing a slicker OGR API. Interest in the project is rising again, and after some discussion I've persuaded users that we should change the name because the *Mill project name space is getting a little crowded. The new project: Fiona. Fiona is OGR's neater (or nicer/nimble/no-nonsense) API – elegant on the outside, unstoppable OGR(e) power on the inside.

What I'd like out of Fiona: a clear alternative to the complex layers and cursors and fussy geometry objects of OGR and ArcPy; Python generators serving as sources and sinks of GeoJSON-like objects; and above all, no reference counting duty dumped on users, no need to explicitly "del" anything. I think an API like this would be productive and make new types of Python data processing programs possible. For example, one might use the enhanced generator protocol of PEP 342 to create pipelines of coroutines that receive and send GeoJSON-like objects, bringing into being something like a WSGI for Python spatial data processing. See https://gist.github.com/1232852 for the pipelinedemo module code wherein the pipeline components below are declared. The demo tasks simply increment the value of a particular feature property (adding the property if it doesn't already exist) and send the feature down the pipe. The demo writer appends received features to a list and serializes them to JSON in a file.

>>> from pipelinedemo import pipeline, task1, task2, writer
>>> features = [{'id': "1"}, {'id': "2"}, {'id': "3"}]
>>> pipeline(
...     features,
...     task1(
...         task2(
...              writer(open("pipeline-demo.json", "w")))
...         )
...     )
>>> print open("pipeline-demo.json").read()
{
  "features": [
    {
      "id": "1",
      "properties": {
        "count": 2
      }
    },
    {
      "id": "2",
      "properties": {
        "count": 2
      }
    },
    {
      "id": "3",
      "properties": {
        "count": 2
      }
    }
  ]
}

Fiona already provides feature source generators that leverage the OGR format drivers. Work on the feature sinks at the other end of the processing pipeline is clearly the next step. Follow or fork; your ideas and pull requests are welcome.

By the way, A Curious Course on Coroutines and Concurrency has a very readable introduction to this "push" style of pipeline for Python along with excellent advice in general on using enhanced generators.

State of the Map Saturday

I'm attending the State of the Map next Saturday, not to present but to catch up with some Python folks, meet some nodes of my social networks, and hear about the state of the art in collecting, modeling, and using geographic data. The OpenStreetMap project has been and continues to be a source of inspiration and social and technical guidance for Pleiades. I hope I'll see you there.

I won't be at FOSS4G this year. My wife is off to an international weed science symposium and I'm solo dad all next week. I wanted to come down Wednesday to organize an informal Python session; inability to get any babysitting help before or after school is going to prevent me from doing so. I will miss a lot of people who won't be at SoTM, but $350 for a brief visit during school hours is more than I can justify.

Trail Ridge

My youngest finally got her first taste of camping in Rocky Mountain National Park last weekend, and also her first ramble on the tundra (or rather a trail through protected tundra) up on top of Trail Ridge. The Rock Cut parking lot on Trail Ridge Road (US-34) is at 12,110 feet, or just less than 3700 meters above sea level.

http://farm7.static.flickr.com/6184/6073550484_2a1f670cc0_z_d.jpg

Coming down from the viewpoint, her big sister spotted this vast Moose-shaped snowfield:

http://farm7.static.flickr.com/6186/6073550490_d88553f377_z_d.jpg

There's a hint of the Moose in Bing's imagery, but not in Google's.