2012 (old posts, page 3)

Reintroducing a Python protocol for geospatial data



Re: Reintroducing a Python protocol for geospatial data

Author: Stanley Fish (http://law.fiu.edu/faculty-2/stanley-fish/)

Dear Mr. Gillies,

In truth it was your somewhat reductive blog post on Python's reduce() function on which I wished to comment, but I find myself thwarted by your practice of shutting down all reader commentary after a fortnight has passed -- something even I have not yet been accused of doing.

I thought you would like to know that I have made reference to your scintillating work on the Pleiades project in my fourth essay on the so-called "digital humanities" movement. These are to be found in my New York Times column (not to say, blog). I would be interested in your thoughts: http://bit.ly/H4Suf4


Stanley Fish

Re: Reintroducing a Python protocol for geospatial data

Author: Sean

Ho, ho. For readers outside of academia: Stanley Fish is an English literature professor and opinion writer for the New York Times who has lately been winding up my digital humanities colleagues. Some background: http://www.bogost.com/blog/this_is_a_blog_post_about_the.shtml.

Pleiades software reuse

Yesterday I read Melissa Terras's "On Making, Use and Reuse in Digital Humanities":

That's right. The code we made is now in use by another institution, to do their own transcription project. Hurrah!

It was always our aim in Transcribe Bentham to provide the code to others: it was a key part of our project proposal. But you always have to wonder if that is going to happen. Its the kind of thing that everyone writes in project proposals. And whilst lots of people talk about making things in Digital Humanities, and whether or not you have to make things to be a Digital Humanist, we've shied away - as a community - from the spectre of reuse: who takes our code and reappropriates it once we are done? How can we demonstrate impact through the things we've built being utilised beyond just us and - quite frankly - our mates?

So I'm happy as larry that the code we developed, and the system we have built, is both useful to us, but is now useful to others. I'm not sure how much I want to prod the sleeping monster that is general code reuse in Digital Humanities... dont draw attention to our deficiencies!

But I would be delighted if anyone else could point me to examples where code and systems in Digital Humanities were repurposed beyond their original project, just as we would wish?

I was unable to persuade Recaptcha to let me leave a comment and congratulations on Melissa's blog, so am writing a brief post here. TL;DR Transcribe Bentham: congrats! My own horn: toot! toot!

To my knowledge, no one has set up another instance of Pleiades as a gazetteer. But code written for Pleiades gets reused more and more widely the further down in our stack you look. I designed it to be modular and reusable – a stack of tools, not a single tool – so I'd be disappointed if that wasn't the case. I'll explain how it works for us.

Pleiades uses the Plone (and Zope) web application framework. Our Plone products and packages have taken on a life of their own as the collective.geo project. IW:LEARN is a good example of a collective.geo site. Contributions to collective.geo by almost 20 other people have been making Pleiades better. And this is just the start of our reuse story.

The Pleiades and collective.geo packages for Zope and Plone are built on Python GIS packages spun off from Pleiades, such as Shapely, Rtree, and Geojson. At PyCon a couple of weeks ago, I ran into a lot of Shapely users. I saw it mentioned on slides in talks and on posters in the poster session. Famous web and geography hackers even write about using Shapely from time to time. Feature-wise, Pleiades now gets as much back from other Shapely users as we give out. This is a fantastic position to be in.

Digging deeper in the stack, I've contributed (as "sgillies") to the development of GEOS as part of my work on Pleiades, so Pleiades has thereby played a tiny role in making thousands of open source GIS programs and web sites more spatially capable. A Python protocol for sharing geospatial data that we invented for Pleiades has been implemented rather widely in GIS software. Anywhere you see __geo_interface__ and shape() or asShape(), or programmers sharing data as GeoJSON-like Python mappings, that's the impact of Pleiades. The GeoJSON format itself has some of its roots in Pleiades.
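For readers who haven't run into the protocol, here is a minimal sketch of the idea using only the standard library. The Place class and the as_mapping() and bounds() helpers are made up for illustration and are not Pleiades or Shapely code; the only thing taken from the protocol is the __geo_interface__ attribute, which returns a GeoJSON-like mapping.

```python
class Place:
    """A hypothetical domain object; not part of Pleiades or Shapely."""

    def __init__(self, name, lon, lat):
        self.name = name
        self.lon = lon
        self.lat = lat

    @property
    def __geo_interface__(self):
        # Provider side: describe the geometry as a GeoJSON-like mapping.
        return {"type": "Point", "coordinates": (self.lon, self.lat)}


def as_mapping(obj):
    """Consumer side: accept a provider object or a plain GeoJSON-like dict."""
    return getattr(obj, "__geo_interface__", obj)


def bounds(obj):
    """Return (minx, miny, maxx, maxy) for a Point or LineString."""
    geom = as_mapping(obj)
    if geom["type"] == "Point":
        coords = [geom["coordinates"]]
    elif geom["type"] == "LineString":
        coords = list(geom["coordinates"])
    else:
        raise ValueError("unsupported geometry type: %r" % geom["type"])
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


rome = Place("Roma", 12.4843, 41.8926)
print(bounds(rome))
print(bounds({"type": "LineString", "coordinates": [(0.0, 0.0), (2.0, 1.0)]}))
```

The consumer never imports the provider's classes; it only looks for the attribute or a plain mapping. If Shapely is installed, passing the same rome object to shapely.geometry.shape() should work in the same way.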

Even if no one ever sets up another Pleiades site, we're having a significant impact on GIS software and systems, even on big time GIS software being used for Spatial Humanities work. The keys to having a similar impact are, in my mind, 1) modularization and generalization to increase the number of potential users and contributors, and 2) a policy of open sourcing from day zero instead of open sourcing after completion of the project – and after people have lost interest and moved on to other software.


open sourcing from day zero

Author: Christian Ledermann


Release often, release early. It is important to let the community know what you are up to. Though it might feel a little embarrassing to make unfinished, buggy alpha code available to the public, it has big advantages. E.g., when I released the alpha of collective.geo.index, the UI was unfinished and ugly, as it was not one of my priorities. It would probably still be unfinished and ugly if it were not for David, who gave the UI some TLC just days after the initial announcement.

Thank you very much for your hard work and your great products :)

Reconsidering APIs

This is good advice from Peter Krantz:

TL;DR See if it is possible to publish your open data as file dumps instead of building advanced API's that force entrepreneurs to integrate their apps with your infrastructure.

Agreed. I'm still convinced that data is usually better than an API.

Via http://sunlightlabs.com/blog/2012/government-do-you-really-need-an-api/. Sunlight Labs director Tom Lee makes a great point in there about web APIs for GIS.


Re: Reconsidering APIs

Author: Peter Rushforth

Disagree, Sean. Good REST practice dictates that data be exposed as resources, and clients interact with their representation.

That way servers, who deliver that open data, can keep their clients informed of changes as required.

My $.02



Re: Reconsidering APIs

Author: Martin Davis

Agree 100% that data is better than an API.

Now if only there was a good, simple, open, expressive format for spatial data.

(and no, I don't think that sqlite is it - for the same reason, it's an API, not an open format).

Re: Reconsidering APIs

Author: dw

This is crazy talk.

What happens when the data is updated?

What happens when different entities are storing your data, and serving it to the public, in different, ineffective ways?

People not using REST effectively isn't a reason to not use REST.

Re: Reconsidering APIs

Author: Sean

Martin: isn't sqlite more of a hybrid? There's an API, but it's a local - not network - API and all the data is included with it.

My understanding of REST is that it's different from the kind of APIs discussed in the blog posts I referenced. I don't see why an entire modestly sized dataset can't be a first class web resource.

Re: Reconsidering APIs

Author: Martin Davis

Yes, the sqlite API is local. But it is basically opaque - at least, I've never looked but I'm pretty sure that a format designed to support a full database is not going to be very friendly for reading directly. And I've never seen the format published. So it has the same problem that's being raised.

I think this might be considered a bit OT. But it's a pet peeve of mine that we're still stuck with shapefiles after all these years. In my wilder moments I contemplate how nice it would be if the FOSS4G community got together and created a new standard file format to replace the shapefile (fixing the obviously broken bits and moving it into the 21st century in a sane way).

Spatial for IPython HTML Notebook

The IPython HTML Notebook made a big splash at PyCon and I'm trying it out at lunch. It's more than just a little bit awesome. I wonder how long until I'm able to clean data in a Notebook using Pandas instead of Google Refine? And for anybody out there who is thinking about visualizing geospatial data in Notebook, my descartes package makes it easy to use GeoJSON-ish data.


Code in the notebook is also available at https://gist.github.com/2023182.

PyCon conference day ~0

It's the last day of PyCon for me. This means I'll miss the 4 days of sprints, which I regret and intend never to miss again – this event is just getting started for many attendees – but I need to get back and resume sprinting on house remodeling and preparing to move.

I'm very happy that I decided to volunteer to be a session runner. I got to help David Beazley, the session chair, a programmer and teacher whom I've referenced many times on my blog. I've been using his software (including SWIG) for many years. It's neat to be able to tell someone, in person, that you admire and have profited from their work, right? I got to meet speakers: Jason Scheirer, Zain Memmon, Ricardo Kirkner. I got to meet Barry Warsaw, the GNU Mailman lead, in the Green Room. Mailman! And Alex Martelli. And Paul Smith. And conference staff, volunteers all, like Doug Napoleone and Anna Martelli Ravenscroft. Want to meet people serendipitously? Volunteer to chair or run a PyCon session.

I met a bunch of other great folks whom I'd previously known only online. You know who you are: thanks for tracking me down or allowing me to track you down.

Permission or Forgiveness

A few minutes ago Alex Martelli gave a talk on "Permission or Forgiveness?"

Grace Murray Hopper's famous motto, "It's easier to ask forgiveness than permission", has many useful applications -- in Python, in concurrency, in networking, as well of course as in real life. However, it's not universally valid. This talk explores both useful and damaging applications of this principle.
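In Python, the motto usually shows up as the EAFP style ("easier to ask forgiveness than permission"), as opposed to LBYL ("look before you leap"). Here is a minimal sketch of the two styles. The function names are mine, not from Martelli's talk, and the dictionary lookup is just a stand-in for any operation that can fail.

```python
def get_with_permission(mapping, key, default=None):
    # LBYL: check first, then act.
    if key in mapping:
        return mapping[key]
    return default


def get_with_forgiveness(mapping, key, default=None):
    # EAFP: act first, handle the failure if it happens.
    try:
        return mapping[key]
    except KeyError:
        return default


record = {"title": "Roma"}
print(get_with_permission(record, "title"), get_with_forgiveness(record, "title"))
print(get_with_permission(record, "bbox"), get_with_forgiveness(record, "bbox"))
```

Both give the same answers here. My assumption about where forgiveness stops being a good fit: operations with side effects that can't be undone once the attempt has failed.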

Howard Butler clued me in to this motto years ago. I'd like to read more about Grace Hopper. She invented a compiler in 1952 and here's the cover of a book about its language.


Martelli pointed out that just because it's easier to ask forgiveness, that doesn't always make it ethical or right to skip asking permission. Use of the imagery above falls into the category of cases in which you ask for permission. The Computer History Museum readily grants permission, as long as I mention the following:

Image courtesy of Computer History Museum.

The cover is a masterpiece of midcentury (mid-20th, that is) graphic design: the waves, the typography, and most of all, our friend the atom.


Re: Permission or Forgiveness

Author: Martin Davis

Nice post, Sean.

Gotta love that postwar starry-eyed optimism about science - "if it has atoms it must be great!".

Also gotta love the title - "Automatic Programming". All we have to do is sit back and watch the teletype churn away! Ironic that Grace Hopper was apparently also the originator of the expression "bug".

Zen of Python vector data processing

Just for fun. You've tried import this in the Python interpreter, right?

>>> from fiona import this_bis
The Zen of Python Vector Data Processing, by Sean Gillies

Files are just files.
A record that's read stays read.
Objects are good, but data is better.
Record types aren't special enough to break the rules.
Functional programming is a great idea -- let's do more of that!

Source: https://github.com/sgillies/Fiona/blob/981b78ed22677efbb6c918c62574a832b1b50f85/src/fiona/this_bis.py

Feedparser and GeoRSS/GML

Remember GeoRSS? It's no longer the new hotness, but it's still around. There's a question about parsing the format with Python on GIS Stack Exchange which got me to dust off an old project.

In 2007, I submitted a patch for feedparser. It parsed both the simple and GML flavors of GeoRSS into GeoJSON [1] style dicts. It had a usage example. It had tests! Nevertheless, the maintainer of feedparser dismissed it because (I paraphrase) "what does JSON have to do with RSS?"

This is the only WONTFIX that has ever really gotten under my skin. To scratch this itch for good, I've patched feedparser 4.1 and, until I come up with a better idea, am distributing it from my public Dropbox folder: http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz. To install:

$ pip install http://dl.dropbox.com/u/10325831/feedparser-4.1-georss.tar.gz

or download it any way you want and install using the setup.py script in the tarball.

It's dead simple to use.

>>> import feedparser
>>> feed = feedparser.parse("http://earthquake.usgs.gov/earthquakes/catalogs/1hour-M1.xml")
>>> feed.entries[0]['where']
{'type': 'Point', 'coordinates': (-122.8282, 38.844700000000003)}
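To give a rough idea of the kind of conversion involved, here is a standard-library sketch that turns a GeoRSS Simple point into a GeoJSON-style dict. This is not the code from the patch: the georss_point_to_geojson() function and the sample entry are made up for illustration, and it handles only georss:point, not the GML flavor.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"

# A made-up Atom entry, not real USGS data.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:georss="http://www.georss.org/georss">
  <title>Made-up example entry</title>
  <georss:point>38.8447 -122.8282</georss:point>
</entry>"""


def georss_point_to_geojson(entry_xml):
    """Return a GeoJSON-style Point dict, or None if there is no georss:point."""
    root = ET.fromstring(entry_xml)
    node = root.find("{%s}point" % GEORSS_NS)
    if node is None or not node.text:
        return None
    lat, lon = (float(value) for value in node.text.split())
    # GeoRSS Simple writes "lat lon"; GeoJSON coordinates are (x, y), i.e. (lon, lat).
    return {"type": "Point", "coordinates": (lon, lat)}


print(georss_point_to_geojson(ENTRY))
```

The one thing to watch is the axis order: the feed stores latitude first, while the resulting dict puts longitude first, as in the 'where' value above.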

[1] The little format that could. http://geojson.org.