I think it's fair to say that Fiona makes different assumptions about how you
access and process geospatial vector data than osgeo.ogr does and that
they're each suited for somewhat different tasks. I've developed a few new
benchmarks to help sketch out the domains of each.
Minimum feature access
Let's say you only want the names of features within their collections. Here's
the osgeo.ogr code.
    from osgeo import ogr

    # PATH and NAME (data source path and layer name) are set elsewhere.
    source = ogr.Open(PATH)
    layer = source.GetLayerByName(NAME)
    for feature in layer:
        id = feature.GetFID()
    source.Destroy()
The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-min.py):
(Fiona)krusty-2:Fiona seang$ python benchmark-min.py
Fiona 0.5
2733.99 usec/pass
osgeo.ogr 1.7.2 (minimum)
1194.01 usec/pass
As I mentioned in an earlier post, Fiona does a lot of extra copying in this
case that osgeo.ogr does not. I'm certain you'd find osgeo.ogr faster by a
similar margin, roughly 2.3 times in the run above, in cases where you were
only interested in the value of a single property of features (ignoring
coordinates) or only interested in the type or bounding box of feature
geometries (ignoring coordinates and properties).
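To make those lighter cases concrete, here is a minimal sketch of reading one property value, the geometry type, and the bounding box of each feature with osgeo.ogr, without copying any coordinates. The helper name `light_scan` and the default field index are hypothetical and not part of the benchmark scripts, and the sketch assumes every feature has a geometry.

```python
def light_scan(layer, field_index=0):
    """Collect one property value, the geometry type, and the envelope
    of each feature without touching individual coordinates.

    `layer` is expected to behave like an osgeo.ogr layer: iterating it
    yields features. Illustrative helper, not benchmark code.
    """
    rows = []
    for feature in layer:
        geom = feature.GetGeometryRef()
        rows.append((
            feature.GetFID(),
            feature.GetField(field_index),  # a single property value
            geom.GetGeometryType(),         # integer OGR geometry type code
            geom.GetEnvelope(),             # (minx, maxx, miny, maxy)
        ))
    return rows
```

With a real data source you would call it as `light_scan(layer)`, using a `layer` opened as in the code above and keeping `source` alive until the scan finishes.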
Maximum feature access
Now, let's say you want the schema of the collection and, for every feature,
its name, the value of every one of its properties, and all coordinate values
from its geometry. The Full Monty. Here's the OGR code for that:
    from osgeo import ogr

    source = ogr.Open(PATH)
    layer = source.GetLayerByName(NAME)

    # Collect (name, type) for each field in the layer definition.
    schema = []
    ldefn = layer.GetLayerDefn()
    for n in range(ldefn.GetFieldCount()):
        fdefn = ldefn.GetFieldDefn(n)
        schema.append((fdefn.name, fdefn.type))

    for feature in layer:
        id = feature.GetFID()

        # Copy every property value into a dict keyed by field name.
        props = {}
        for i in range(feature.GetFieldCount()):
            props[schema[i][0]] = feature.GetField(i)

        # Copy every vertex of every part (ring) of the geometry.
        coordinates = []
        for part in feature.GetGeometryRef():
            ring = []
            for i in range(part.GetPointCount()):
                xy = part.GetPoint(i)
                ring.append(xy)
            coordinates.append(ring)

    source.Destroy()
Update (2012-01-03): A comment below reminds me to point out here the code you'd write to do the same with Fiona:
    from fiona import collection

    with collection(PATH, "r") as c:
        for f in c:
            id = f['id']
            props = f['properties']
            coordinates = f['geometry']['coordinates']
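Because each record supports dictionary-style access, as the loop above shows, downstream processing needs nothing but plain Python. A minimal sketch, using a hand-written record in place of one read from a file (the property name and all values are invented for illustration), counts the vertices of a polygon's rings:

```python
# A hand-written stand-in for one record; real records come from iterating
# a collection. Every value below is hypothetical.
record = {
    'id': '0',
    'properties': {'NAME': 'example'},
    'geometry': {
        'type': 'Polygon',
        'coordinates': [
            [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)],
        ],
    },
}

def count_vertices(rec):
    """Total number of vertices across all rings of a Polygon record."""
    return sum(len(ring) for ring in rec['geometry']['coordinates'])

vertex_count = count_vertices(record)
```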
The results (of https://github.com/sgillies/Fiona/blob/master/benchmark-max.py):
(Fiona)krusty-2:Fiona seang$ python benchmark-max.py
Fiona 0.5
2717.32 usec/pass
osgeo.ogr 1.7.2 (maximum)
8790.97 usec/pass
This is the kind of terrain for which Fiona is geared to perform. Accessing
feature properties via osgeo.ogr is as slow as it can be. Fiona accesses
feature properties via a C extension module, making it as fast as it can be.
The story is the same for coordinate access, and because there are many
geometry vertices per property in the data I'm using, the effect is magnified.
From a performance perspective, osgeo.ogr seems better suited to
making small tweaks to data and Fiona better suited to pulling, bending,
cutting, and hammering on all aspects of it.
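For anyone wanting to produce per-pass figures in the same form, here is a minimal sketch of a harness built on Python's timeit module. It is not the code from the benchmark scripts linked above: the `workload` function is a hypothetical stand-in for the body of a benchmark, and the pass count is arbitrary.

```python
import timeit

def workload():
    """Hypothetical stand-in for one benchmark pass (e.g. reading a layer)."""
    return sum(i * i for i in range(1000))

def usec_per_pass(func, passes=1000):
    """Average time of one call to `func`, in microseconds.

    timeit disables garbage collection while timing by default.
    """
    total_seconds = timeit.timeit(func, number=passes)
    return 1e6 * total_seconds / passes

result = usec_per_pass(workload, passes=200)
print("%.2f usec/pass" % result)
```

Absolute figures from a harness like this depend on the machine, so they are best compared only between runs on the same hardware.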
Re: More Fiona and OGR benchmarks
Author: Martin Davis
The Fiona code in benchmark-min and benchmark-max seems to be the same. Is that intentional?
Anyway, your Fiona work and these benchmarks are very interesting.
The statement about "Fiona making feature access as fast as it can be" may be true for the Python/OGR/C world, but this is not a general statement of shapefile read performance. Out of curiosity I ran the same test in JEQL, and it runs in 125 ms (on a standard 2 GHz 6-yr old PC).
Re: More Fiona and OGR benchmarks
Author: Martin Davis
By the way, the JEQL script for the benchmark is:
ShapefileReader t file: "test_uk.shp";
Mem t;
JEQL has similar goals to Fiona - brevity of code combined with maximum data functionality - but using declarative SQL rather than procedural code.
Re: More Fiona and OGR benchmarks
Author: Sean
Yes, Martin, the code is the same for Fiona in both cases. There isn't an option to get just a little of the data with Fiona like there is with osgeo.ogr; it's all or nothing.
My benchmarks are in *micro* seconds (I've lazily written "μ" as "u"). 2.7 milliseconds for Fiona on this 2.8 GHz Intel Core 2 Duo. I don't know how this compares to your numbers at all since Python's timeit module (http://docs.python.org/library/timeit.html) turns off garbage collection by default to eliminate some differences between independent runs. I'll try running it with GC on.
Re: More Fiona and OGR benchmarks
Author: Sean
Enabling GC didn't make any difference in my numbers. I don't think we can meaningfully compare my timeit results to yours.
Re: More Fiona and OGR benchmarks
Author: Martin Davis
2.7 microseconds! Very impressive, and I fully retract my statement about performance!
And yes, it does seem impossible to compare the numbers after all - if for no other reason than our hardware is so different.
Re: More Fiona and OGR benchmarks
Author: Sean
2700 microseconds. 2.7 milliseconds. I wonder what the best framework would be for precisely racing Python and Java code?