More field goals, fewer pratfalls

For a computer user in the humanities who doesn't develop their own tools and information systems (for all kinds of good reasons), using technology "the right way" may look like an ever-growing list of fashion prescriptions.

  • "Use MS Access or Filemaker"
  • "Use a relational database"
  • "Use TEI XML"
  • "Implement web services"
  • "Provide RSS feeds"
  • "Make a web API"
  • "Cool URIs for everything"
  • "Use RDF"
  • "Use a triple store"
  • "Use ontology X"
  • and so on ...

Of all the discussions at LAWDI, the one that's on my mind this morning is the very short one I had with Eric Kansa about what happens when linked data principles start being used as criteria for evaluating the fundworthiness of projects in classics and archaeology. It could be disruptive, and it's on me and Eric and others to make sure that we're not setting researchers up for a frustrating run at a football that is pulled away at the last moment.

Maybe the following would be useful prescriptions for us linked data evangelists.

  • Don't focus too much on counting triples.
  • Don't beat projects up about their ugly URIs.
  • Don't make openness a moral issue.
  • Let projects get easy wins from simple vocabularies and ontologies (SKOS, for example).
  • Show people what to do instead of telling people what to do.
  • Emphasize results and getting things done.

I'm sure others can think of more.

Ancient Toponym of the Week: Uri

I've been at ISAW this week teaching scholars and researchers about the web and linked data. Among other things, this has meant a lot of talk about URIs (Uniform Resource Identifiers). Pleiades has URIs for places, of course, and Brad Hafford (Director of the Penn Museum's Ur Digitization Project) revealed that it has a URI for a place named Ur or (according to the Barrington) Uri. According to Brad and Steve Tinney (Oracc), the full Sumerian name would have been Urim. It was commonly shortened in its day to Uri, and at some point further to Ur.

Gearing up for LAWDI

I'm beginning to work on my presentation for the upcoming Linked Ancient World Data Institute at ISAW. Here's what I'd like to accomplish in three bullets.

  • Engage attendees in thinking outside the database and thinking and talking about the architecture of the web.
  • Make a case for using HTTP URIs (aka URLs) whenever possible instead of other identifiers or addresses.
  • Talk about using links in data for doing work (using verbs) in contrast to using linked data for reasoning (with nouns).

How to turn expertly curated non-linked data (digital scholarly editions of texts, etc) into RDF is one linked data problem, the one we're most familiar with and most focused on. How to use semantic web architecture and links to initiate and curate "born-linked" data is another interesting and important set of problems – to me, at least, and I hope to be able to make it compelling to everyone else.

Pleiades remains the only classics project in the Linked Open Data cloud today, and I'd also like to talk about how other projects can join it, but time may be too short for this.

Elsewhere on APIs and downloads

The big GIS industry blogs are debating one of my favorite topics: download or API? In my posts I was considering downloads and web APIs with the same media type and data: a big bucket of GML features (for example) in a file vs spoonfuls of GML features via WFS. James Fee, instead, is considering APIs that are lossy for one reason or another (candidates: format conversion, aggregation and clustering, geometry simplification, or attribute filtering). Life's too short to listen to podcasts (that aren't This American Life), though... was there anything interesting in the Directions one?

Reintroducing a Python protocol for geospatial data



Re: Reintroducing a Python protocol for geospatial data

Author: Stanley Fish

Dear Mr. Gillies,

In truth it was your somewhat reductive blog post on Python's reduce() function on which I wished to comment, but I find myself thwarted by your practice of shutting down all reader commentary after a fortnight has passed -- something even I have not yet been accused of doing.

I thought you would like to know that I have made reference to your scintillating work on the Pleiades project in my fourth essay on the so-called "digital humanities" movement. These are to be found in my New York Times column (not to say, blog). I would be interested in your thoughts:


Stanley Fish

Re: Reintroducing a Python protocol for geospatial data

Author: Sean

Ho, ho. For readers outside of academia: Stanley Fish is an English literature professor and opinion writer for the New York Times who has lately been winding up my digital humanities colleagues. Some Background:

Pleiades software reuse

Yesterday I read Melissa Terras's "On Making, Use and Reuse in Digital Humanities":

That's right. The code we made is now in use by another institution, to do their own transcription project. Hurrah!

It was always our aim in Transcribe Bentham to provide the code to others: it was a key part of our project proposal. But you always have to wonder if that is going to happen. Its the kind of thing that everyone writes in project proposals. And whilst lots of people talk about making things in Digital Humanities, and whether or not you have to make things to be a Digital Humanist, we've shied away - as a community - from the spectre of reuse: who takes our code and reappropriates it once we are done? How can we demonstrate impact through the things we've built being utilised beyond just us and - quite frankly - our mates?

So I'm happy as larry that the code we developed, and the system we have built, is both useful to us, but is now useful to others. I'm not sure how much I want to prod the sleeping monster that is general code reuse in Digital Humanities... dont draw attention to our deficiencies!

But I would be delighted if anyone else could point me to examples where code and systems in Digital Humanities were repurposed beyond their original project, just as we would wish?

I was unable to persuade Recaptcha to let me leave a comment and congratulations on Melissa's blog, so am writing a brief post here. TL;DR Transcribe Bentham: congrats! My own horn: toot! toot!

To my knowledge, no one has set up another instance of Pleiades as a gazetteer. But code written for Pleiades gets reused more and more widely the further down in our stack you look. I designed it to be modular and reusable – a stack of tools, not a single tool – so I'd be disappointed if that wasn't the case. I'll explain how it works for us.

Pleiades uses the Plone (and Zope) web application framework. Our Plone products and packages have taken on a life of their own as the collective.geo project. IW:LEARN is a good example of a collective.geo site. Contributions to collective.geo by almost 20 other people have been making Pleiades better. And this is just the start of our reuse story.

Pleiades and collective.geo Zope and Plone packages are based on Python GIS packages spun off from Pleiades such as Shapely, Rtree, and Geojson. At PyCon a couple of weeks ago, I ran into a lot of Shapely users. I saw it mentioned on slides in talks and on posters in the poster session. Famous web and geography hackers even write about using Shapely from time to time. In terms of Shapely features, Pleiades now gets as much back from others as we give out. This is a fantastic position to be in.

Digging deeper in the stack, I've contributed (as "sgillies") to the development of GEOS as part of my work on Pleiades, so Pleiades has thereby played a tiny role in making thousands of open source GIS programs and web sites more spatially capable. A Python protocol for sharing geospatial data that we invented for Pleiades has been implemented rather widely in GIS software. Anywhere you see __geo_interface__ and shape() or asShape(), or programmers sharing data as GeoJSON-like Python mappings, that's the impact of Pleiades. The GeoJSON format itself has some of its roots in Pleiades.
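To make the protocol concrete, here is a minimal sketch in plain Python. It is not Pleiades or Shapely code: the `Place` class and the `as_mapping()` and `bounds()` helpers are hypothetical names made up for illustration, and the coordinates for Ur are approximate. The only convention taken from the protocol itself is the `__geo_interface__` attribute, which returns a GeoJSON-like mapping.

```python
class Place:
    """Hypothetical data class; any object can join in by providing __geo_interface__."""

    def __init__(self, name, lon, lat):
        self.name = name
        self.lon = lon
        self.lat = lat

    @property
    def __geo_interface__(self):
        # A GeoJSON-like Python mapping describing this object's geometry.
        return {"type": "Point", "coordinates": (self.lon, self.lat)}


def as_mapping(obj):
    """Return a GeoJSON-like mapping from a provider object or from a mapping."""
    if hasattr(obj, "__geo_interface__"):
        return obj.__geo_interface__
    return obj  # assume it is already a GeoJSON-like mapping


def bounds(obj):
    """Compute (minx, miny, maxx, maxy) for a Point or LineString."""
    geom = as_mapping(obj)
    if geom["type"] == "Point":
        coords = [geom["coordinates"]]
    elif geom["type"] == "LineString":
        coords = list(geom["coordinates"])
    else:
        raise ValueError("unsupported geometry type: %r" % geom["type"])
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Approximate, illustrative coordinates (longitude, latitude).
ur = Place("Ur", 46.1, 30.96)
ur_bounds = bounds(ur)
line_bounds = bounds({"type": "LineString", "coordinates": [(0, 0), (2, 1)]})
```

The consumer never imports the provider's class; it only looks for the attribute or accepts a plain mapping. Shapely's `shape()` function consumes data in the same way, which is what lets otherwise unrelated packages trade geometries.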

Even if no one ever sets up another Pleiades site, we're having a significant impact on GIS software and systems, even on big time GIS software being used for Spatial Humanities work. The keys to having a similar impact are, in my mind, 1) modularization and generalization to increase the number of potential users and contributors, and 2) a policy of open sourcing from day zero instead of open sourcing after completion of the project – and after people have lost interest and moved on to other software.


open sourcing from day zero

Author: Christian Ledermann


Release often, release early. It is important to let the community know what you are up to. Though it might feel a little embarrassing to make unfinished, buggy alpha code available to the public, it has big advantages. E.g. when I released the alpha of collective.geo.index the UI was unfinished and ugly, not one of my priorities. It would probably still be unfinished and ugly if it was not for David who gave the UI some TLC just days after the initial announcement.

Thank you very much for your hard work and your great products :)

Reconsidering APIs

This is good advice from Peter Krantz:

TL;DR See if it is possible to publish your open data as file dumps instead of building advanced API's that force entrepreneurs to integrate their apps with your infrastructure.

Agreed. I'm still convinced that data is usually better than an API.

Sunlight Labs director Tom Lee makes a great point in there about web APIs for GIS.


Re: Reconsidering APIs

Author: Peter Rushforth

Disagree, Sean. Good REST practice dictates that data be exposed as resources, and clients interact with their representation.

That way servers, who deliver that open data, can keep their clients informed of changes as required.

My $.02



Re: Reconsidering APIs

Author: Martin Davis

Agree 100% that data is better than an API.

Now if only there was a good, simple, open, expressive format for spatial data.

(and no, I don't think that sqlite is it - for the same reason, it's an API, not an open format).

Re: Reconsidering APIs

Author: dw

This is crazy talk.

What happens when the data is updated?

What happens when different entities are storing your data in different, ineffective ways to the public?

People not using REST effectively isn't a reason to not use REST.

Re: Reconsidering APIs

Author: Sean

Martin: isn't sqlite more of a hybrid? There's an API, but it's a local - not network - API and all the data is included with it.
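To illustrate the hybrid idea, here is a minimal sketch using Python's standard `sqlite3` module. The file name, table layout, helper functions, and approximate coordinates are all made up for illustration. The point is only that the whole dataset travels as one file, and the API used to query it runs locally, with no server in between.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Illustrative rows only: (name, longitude, latitude), approximate values.
ROWS = [("Ur", 46.10, 30.96), ("Uruk", 45.64, 31.32)]


def write_dataset(path, rows):
    """Publisher side: put the entire dataset into a single file."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE places (name TEXT, lon REAL, lat REAL)")
        conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
        conn.commit()


def names_north_of(path, min_lat):
    """Consumer side: query the downloaded file through the local API."""
    with closing(sqlite3.connect(path)) as conn:
        cur = conn.execute(
            "SELECT name FROM places WHERE lat > ? ORDER BY name", (min_lat,)
        )
        return [row[0] for row in cur.fetchall()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "places.sqlite")
    write_dataset(path, ROWS)
    result = names_north_of(path, 31.0)
```

Once the file has been downloaded, every query is answered from the user's own disk, so nothing depends on the publisher's infrastructure staying up.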

My understanding of REST is that it's different from the kind of APIs discussed in the blog posts I referenced. I don't see why an entire modestly sized dataset can't be a first class web resource.

Re: Reconsidering APIs

Author: Martin Davis

Yes, the sqlite API is local. But it is basically opaque - at least, I've never looked but I'm pretty sure that a format designed to support a full database is not going to be very friendly for reading directly. And I've never seen the format published. So it has the same problem that's being raised.

I think this might be considered a bit OT. But it's a pet peeve of mine that we're still stuck with shapefiles after all these years. In my wilder moments I contemplate how nice it would be if the FOSS4G community got together and created a new standard file format to replace the shapefile (fixing the obviously broken bits and moving it into the 21st century in a sane way).

Spatial for IPython HTML Notebook

The IPython HTML Notebook made a big splash at PyCon and I'm trying it out at lunch. It's more than just a little bit awesome. I wonder how long until I'm able to clean data in a Notebook using Pandas instead of Google Refine? And for anybody out there who is thinking about visualizing geospatial data in Notebook, my descartes package makes it easy to use GeoJSON-ish data.

Code in the notebook also available at