2012 (old posts, page 1)

Notes on learning Clojure

I'm learning Clojure and having fun with it. I never learned a Lisp in school like many programmers my age did. The one variant I did try, about 15 years ago, was Scheme. I did a little Gimp scripting with it but nothing else. I think I had to mature a bit before I could appreciate the Lisp style for what it is.

For a language that's designed to be more simple than easy, it's surprisingly easy to use Java classes in Clojure. This is the first code I've written using JTS classes in a while.

user=> (.buffer
  (.read (com.vividsolutions.jts.io.WKTReader.) "POINT (0 0)")
  1.0)
#<Polygon POLYGON ((1 0, 0.9807852804032304 -0.1950903220161282, ...))>

I assumed I'd have to write something like a Python C extension module to do this and am thrilled to be wrong.

Comments

Re: Notes on learning Clojure

Author: Michael Weisman

One of the great things about alternative JVM languages is how simple it is to use existing java libraries. I managed to get JTS talking to Google Refine through the Jython query interface without having to write any wrappers.

Re: Notes on learning Clojure

Author: Sean

I remember you mentioning this, Mike. So much less complicated than ctypes, Cython, etc.

Shapely 1.2.14

Shapely 1.2.14 is up on PyPI: http://pypi.python.org/pypi/Shapely and the documentation has been updated: http://toblerity.github.com/shapely/. Mike Toews has made coordinate and (homogeneous) geometry sequences sliceable. For example, you can now get every other vertex of a line string or linear ring like this:

>>> from shapely.geometry import Point
>>> ring = Point(0.0, 0.0).buffer(1.0).exterior
>>> len(ring.coords)
66
>>> evens = ring.coords[::2]
>>> len(evens)
33

and a geometry collecting the odd points of another multi point geometry can be had like:

>>> from shapely.geometry import MultiPoint
>>> multi = MultiPoint(evens)
>>> len(multi)
33
>>> odds = multi[1::2]
>>> odds.geom_type
'MultiPoint'
>>> len(odds)
16
>>> list(odds)
[<shapely.geometry.point.Point object at 0x744d30>, ... ]

Heterogeneous geometry collections can't be sliced in 1.2.14. Down the road, maybe.

Linked Ancient World Data Institute

From the ISAW blog:

ISAW will host the Linked Ancient World Data Institute (LAWDI) from May 31st to June 2nd, 2012 in New York City. Applications are due 17 February 2012.

LAWDI, funded by the Office of Digital Humanities of the National Endowment for Humanities, will bring together an international faculty of practitioners working in the field of Linked Data with twenty attendees who are implementing or planning the creation of digital resources.

More information, including a list of faculty, and application instructions are available at the LAWDI page on the Digital Classicist wiki.

I'm excited to get to play a small part in this. I was on the faculty of the UVA Scholars' Lab's Institute for Enabling Geospatial Scholarship in 2009 and got to see first hand how an excellent institute is run. We'll try to live up to the high standards of #geoinst.

Geoprocessing for humans: the Fiona manual

I'm making progress on the Fiona manual: https://github.com/Toblerity/Fiona/blob/master/docs/manual.rst. When it's finished, there will be a Fiona 1.0 release. If this doesn't help you understand when and where you'd want to use Fiona over OGR's own Python bindings, let me know. Also let me know if you think I'm misrepresenting those bindings. I'm a big fan of GDAL and OGR, but I don't want to tip toe around its gotchas. I want to banish them.

Home renovation 2012

We finally got our permit and demolition in our new house started on Friday.

http://farm8.staticflickr.com/7157/6655561953_8fe62a84e1_z_d.jpg

There have been a few surprises but nothing terrible yet. It's a ranch style house, built in 1959, and an ordained city landmark. We love the size and layout; all it needs is a new kitchen and a little modernization here and there.

Fiona 0.6.1

Fiona 0.6.1, the version with support for coordinate reference systems (and undefined reference systems thanks to Ariel Núñez) is on PyPI: http://pypi.python.org/pypi/Fiona/0.6.1. I think it's ready for wider testing. If you've been able to install OGR's own Python bindings, you'll find Fiona at least as easy to install on Unix or Linux. It might be as simple as

$ pip install Fiona

or

$ easy_install Fiona

The README has more detailed installation instructions for people like me with non-standard environments.

Update (2012-01-22): See http://pypi.python.org/pypi/Fiona/0.6.2

Coordinate reference systems for Fiona

It's important to me that Fiona work with pyproj. This is how it does.

import logging
import sys

from pyproj import Proj, transform

from fiona import collection

logging.basicConfig(stream=sys.stderr, level=logging.INFO)

with collection("docs/data/test_uk.shp", "r") as input:

    schema = input.schema.copy()

    # This is the input reference system.
    p_in = Proj(input.crs)

    with collection(
            "with-pyproj.shp", "w", "ESRI Shapefile",
            schema=schema,

            # New for 0.6: we can write coordinate reference systems.
            crs={'init': "epsg:27700"}

            ) as output:

        # This is the output reference system.
        p_out = Proj(output.crs)

        for f in input:

            try:
                assert f['geometry']['type'] == "Polygon"

                # Transform the coordinates of every ring.
                new_coords = []
                for ring in f['geometry']['coordinates']:
                    x, y = transform(p_in, p_out, *zip(*ring))
                    new_coords.append(zip(x, y))

                f['geometry']['coordinates'] = new_coords
                output.write(f)

            except Exception, e:
                # Writing untransformable features to a different shapefile
                # is another option.
                logging.exception("Error transforming feature %s:", f['id'])

As I've said before, Python's zip is the essential coordinate wrangling function.

Update (2012-01-05): after https://github.com/Toblerity/Fiona/commit/8b355678bab52f41b737383e2701f4137c2002b4 a crs takes the form of a mapping instead of a string.

Comments

Re: Coordinate reference systems for Fiona

Author: Even Rouault

Sean,

Why not

f['geometry']['coordinates'] = transform(p_in, p_out, f['geometry']['coordinates'])

or something close instead of iterating over each ring/subgeometry part ?

OGR SWIG bindings have a Geometry.Transform() ;-)

Re: Coordinate reference systems for Fiona

Author: Sean

The reason why is that pyproj's transform function is very simple and limited. It operates only on coordinate sequences (Python buffers, to be precise). It doesn't have a concept of geometry or parts or rings. But because of its simple design, you can combine it with Numpy, matplotlib, and the universe of not-GIS Python software.

Geometry.Transform() embodies a number of patterns that I'm deliberately avoiding in my work on Fiona such as excessive object orientation, mutability, and capitalized method names ;) And while it may be easier in the short run, I'm in pursuit of simpler.

More Fiona and OGR benchmarks

I think it's fair to say that Fiona makes different assumptions about how you you access and process geospatial vector data than osgeo.ogr does and that they're each suited for somewhat different tasks. I've developed a few new benchmarks to help sketch out the domains of each.

Minimum feature access

Let's say you only want the names of features within their collections. Here's the osgeo.ogr code.

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
for feature in layer:
    id = feature.GetFID()
source.Destroy()

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-min.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-min.py
Fiona 0.5
2733.99 usec/pass

osgeo.ogr 1.7.2 (minimum)
1194.01 usec/pass

As I mentioned in an earlier post, Fiona does a lot of extra copying in this case that osgeo.ogr does not. I'm certain you'd find osgeo.ogr about this much faster in the cases where you were only interested in the value of a single property of features (ignoring coordinates) or only interested in the type or bounding box of feature geometries (ignoring coordinates and properties).

Maximum feature access

Now, let's say you want the schema of the collection and for every feature, it's name, the value of every one of its property, and all coordinate values from its geometry. The Full Monty. Here's the OGR code for that:

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)

schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))

for feature in layer:
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)

    coordinates = []
    for part in feature.GetGeometryRef():
        ring = []
        for i in range(part.GetPointCount()):
            xy = part.GetPoint(i)
            ring.append(xy)
        coordinates.append(ring)

source.Destroy()

Update (2012-01-03): A comment below reminds me to point out here the code you'd write to do the same with Fiona:

with collection(PATH, "r") as c:
    for f in c:
        id = f['id']
        props = f['properties']
        coordinates = f['geometry']['coordinates']

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-max.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-max.py
Fiona 0.5
2717.32 usec/pass

osgeo.ogr 1.7.2 (maximum)
8790.97 usec/pass

This is the kind of terrain for which Fiona is geared to perform. Accessing feature properties via osgeo.ogr is as slow as it can be. Fiona accesses feature properties via a C extension module, making it as fast as it can be. The story is the same for coordinate access, and because there are many geometry vertices per property in the data I'm using, the effect is magnified.

From a performance perspective, osgeo.ogr seems better suited to making small tweaks to data and Fiona better suited to pulling, bending, cutting, and hammering on all aspects of it.

Comments

Re: More Fiona and OGR benchmarks

Author: Martin Davis

The Fiona code in benchmark-min and benchmark-max seems to be the same. Is that intentional?

Anyway, your Fiona work and these benchmarks are very interesting.

The statement about "Fiona making feature access as fast as it can be" may be true for the Python/OGR/C world, but this is not a general statement of shapefile read performance. Out of curiosity I ran the same test in JEQL, and it runs in 125 ms (on a standard 2 GHz 6-yr old PC).

Re: More Fiona and OGR benchmarks

Author: Martin Davis

By the way, the JEQL script for the benchmark is:

ShapefileReader t file: "test_uk.shp";
Mem t;

JEQL has similar goals to Fiona - brevity of code combined with maximum data functionality - but using declarative SQL rather than procedural code.

Re: More Fiona and OGR benchmarks

Author: Sean

Yes, Martin, the code is the same for Fiona in both cases. There isn't a option to get just a little of the data with Fiona like there is with osgeo.ogr, it's all or nothing.

My benchmarks are in *micro* seconds (I've lazily written "μ" as "u"). 2.7 milliseconds for Fiona on this 2.8 GHz Intel Core 2 Duo. I don't know how this compares to your numbers at all since Python's timeit module (http://docs.python.org/library/timeit.html) turns off garbage collection by default to eliminate some differences between independent runs. I'll try running it with GC on.

Re: More Fiona and OGR benchmarks

Author: Sean

Enabling GC didn't make any difference in my numbers. I don't think we can meaningfully compare my testit results to yours.

Re: More Fiona and OGR benchmarks

Author: Martin Davis

2.7 microseconds! Very impressive, and I fully retract my statement about performance!

And yes, it does seem impossible to compare the numbers after all - if for no other reason than our hardware is so different.

Re: More Fiona and OGR benchmarks

Author: Sean

2700 microseconds. 2.7 milliseconds. I wonder what the best framework would be for precisely racing Python and Java code?

Station Identification

Since March 2005, this is the blog of Sean Gillies.

These are my words only and not those of my employer, coworkers, family, or friends. I have agendas, but no particular strategy. I write when I can about whatever I find interesting, and it will be more of the same this year: commercial free observations, thinking out loud, big blocks of code, and lessons learned. I published 56 posts in 2012, up from 47 in 2011, and hope to bump the quantity and quality up again in 2013. Comments are welcome, a privilege of having a modest number of always thoughtful readers, but please do read my comments policy.

The idea to post a yearly station identification comes from John Scalzi's Whatever blog.

Peviously: 2012.