More Fiona and OGR benchmarks

I think it's fair to say that Fiona makes different assumptions than osgeo.ogr does about how you access and process geospatial vector data, and that each is suited to somewhat different tasks. I've developed a few new benchmarks to help sketch out the domains of each.

Minimum feature access

Let's say you only want the names of features within their collections. Here's the osgeo.ogr code.

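from osgeo import ogr

# PATH and NAME: data source path and layer name, defined elsewhere in the script.
source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
for feature in layer:
    id = feature.GetFID()  # read only the feature ID
source.Destroy()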

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-min.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-min.py
Fiona 0.5
2733.99 usec/pass

osgeo.ogr 1.7.2 (minimum)
1194.01 usec/pass

As I mentioned in an earlier post, Fiona does a lot of extra copying in this case that osgeo.ogr does not. I'm certain you'd find osgeo.ogr faster by about the same margin (roughly 2.3 times here) whenever you're interested only in the value of a single property of features (ignoring coordinates) or only in the type or bounding box of feature geometries (ignoring coordinates and properties).
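For reference, per-pass figures like the ones above are the kind of number Python's timeit module produces. The following is only a minimal sketch of such a harness, not the benchmark script itself: the helper name `usec_per_pass` and the stand-in workload are assumptions made for illustration.

```python
import timeit


def usec_per_pass(stmt, setup="pass", number=1000):
    """Return the mean time of one execution of stmt, in microseconds."""
    total_seconds = timeit.Timer(stmt, setup).timeit(number=number)
    return 1000000 * total_seconds / number


# Hypothetical stand-in workload; the real benchmarks iterate over a shapefile.
print("%.2f usec/pass" % usec_per_pass("sum(range(100))"))
```

Averaging over many passes smooths out noise from any single run, which matters when the whole workload takes only a few milliseconds.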

Maximum feature access

Now, let's say you want the schema of the collection and, for every feature, its name, the value of every one of its properties, and all coordinate values from its geometry. The Full Monty. Here's the OGR code for that:

from osgeo import ogr

source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)

schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))

for feature in layer:
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)

    coordinates = []
    for part in feature.GetGeometryRef():
        ring = []
        for i in range(part.GetPointCount()):
            xy = part.GetPoint(i)
            ring.append(xy)
        coordinates.append(ring)

source.Destroy()

Update (2012-01-03): A comment below reminds me to point out here the code you'd write to do the same with Fiona:

from fiona import collection

with collection(PATH, "r") as c:
    for f in c:
        id = f['id']
        props = f['properties']
        coordinates = f['geometry']['coordinates']
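To make the shape of those records concrete, here is a hand-written, GeoJSON-like dict with the same three keys the loop above reads. Every value in it is invented for illustration and is not taken from the test data, and the `vertex_count` helper is hypothetical.

```python
# A hand-written record in the GeoJSON-like shape that the loop above reads.
# All values are invented for illustration, not taken from the test data.
record = {
    "id": "0",
    "properties": {"NAME": "Example"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]],
    },
}


def vertex_count(feature):
    """Count vertices across all rings of a polygon-like feature."""
    return sum(len(ring) for ring in feature["geometry"]["coordinates"])


print(record["id"], sorted(record["properties"]), vertex_count(record))
```

Because everything arrives as plain dicts, lists, and tuples, no further calls into OGR are needed to get at properties or coordinates.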

The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-max.py):

(Fiona)krusty-2:Fiona seang$ python benchmark-max.py
Fiona 0.5
2717.32 usec/pass

osgeo.ogr 1.7.2 (maximum)
8790.97 usec/pass

This is the kind of terrain for which Fiona is geared to perform. Accessing feature properties via osgeo.ogr is as slow as it can be. Fiona accesses feature properties via a C extension module, making it as fast as it can be. The story is the same for coordinate access, and because there are many geometry vertices per property in the data I'm using, the effect is magnified.
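Put as ratios, the two runs look like this. The sketch below only does arithmetic on the usec/pass figures reported above; the dictionary layout and the `ratio` helper are my own for illustration.

```python
# Timings in usec/pass, copied from the two benchmark runs reported above.
timings = {
    "minimum": {"fiona": 2733.99, "ogr": 1194.01},
    "maximum": {"fiona": 2717.32, "ogr": 8790.97},
}


def ratio(case):
    """Return Fiona's time divided by osgeo.ogr's time for one benchmark."""
    return timings[case]["fiona"] / timings[case]["ogr"]


for case in ("minimum", "maximum"):
    print("%s: Fiona/ogr = %.2f" % (case, ratio(case)))
```

Fiona takes about 2.29 times as long as osgeo.ogr in the minimum case and about 0.31 times as long (roughly 3.2 times faster) in the maximum case.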

From a performance perspective, osgeo.ogr seems better suited to making small tweaks to data and Fiona better suited to pulling, bending, cutting, and hammering on all aspects of it.

Comments

Re: More Fiona and OGR benchmarks

Author: Martin Davis

The Fiona code in benchmark-min and benchmark-max seems to be the same. Is that intentional?

Anyway, your Fiona work and these benchmarks are very interesting.

The statement about "Fiona making feature access as fast as it can be" may be true for the Python/OGR/C world, but this is not a general statement of shapefile read performance. Out of curiosity I ran the same test in JEQL, and it runs in 125 ms (on a standard 2 GHz 6-yr old PC).

Re: More Fiona and OGR benchmarks

Author: Martin Davis

By the way, the JEQL script for the benchmark is:

ShapefileReader t file: "test_uk.shp";
Mem t;

JEQL has similar goals to Fiona - brevity of code combined with maximum data functionality - but using declarative SQL rather than procedural code.

Re: More Fiona and OGR benchmarks

Author: Sean

Yes, Martin, the code is the same for Fiona in both cases. There isn't an option to get just a little of the data with Fiona like there is with osgeo.ogr; it's all or nothing.

My benchmarks are in *micro* seconds (I've lazily written "μ" as "u"). 2.7 milliseconds for Fiona on this 2.8 GHz Intel Core 2 Duo. I don't know how this compares to your numbers at all since Python's timeit module (http://docs.python.org/library/timeit.html) turns off garbage collection by default to eliminate some differences between independent runs. I'll try running it with GC on.
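For anyone repeating this, a rough sketch of timing with and without garbage collection: timeit disables GC while timing, and re-enabling it in the setup string turns it back on. The workload and the `seconds` helper here are hypothetical stand-ins, not the shapefile benchmark.

```python
import timeit

# Hypothetical workload that allocates many small containers.
STMT = "[[i] for i in range(1000)]"


def seconds(setup, number=200):
    """Total time in seconds for `number` executions of STMT."""
    return timeit.Timer(STMT, setup).timeit(number=number)


gc_off = seconds("pass")                    # timeit's default: GC disabled
gc_on = seconds("import gc; gc.enable()")   # GC re-enabled in the setup step
print("GC off: %.4f s, GC on: %.4f s" % (gc_off, gc_on))
```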

Re: More Fiona and OGR benchmarks

Author: Sean

Enabling GC didn't make any difference in my numbers. I don't think we can meaningfully compare my timeit results to yours.

Re: More Fiona and OGR benchmarks

Author: Martin Davis

2.7 microseconds! Very impressive, and I fully retract my statement about performance!

And yes, it does seem impossible to compare the numbers after all - if for no other reason than our hardware is so different.

Re: More Fiona and OGR benchmarks

Author: Sean

2700 microseconds. 2.7 milliseconds. I wonder what the best framework would be for precisely racing Python and Java code?