By my measure, Fiona is half-way to production readiness. Version 0.5 is up on PyPI: http://pypi.python.org/pypi/Fiona/0.5. It now writes mapping features to almost any of OGR's formats. Almost.
In Python, I'm not very interested in OGR's support for databases or services, and so Fiona isn't either. I'd rather use httplib2 to fetch any JSON or XML (KML, GML, GeoRSS) document on the web, and Python's built-in json or xml.etree modules to parse them, with some help from geojson and keytree. For databases, I'll use SQLAlchemy or GeoAlchemy. Fiona is mainly about reading and writing the arcane file formats like shapefiles, GPX, etc., for which there is pretty much no other access than through OGR's drivers. Fiona doesn't do OGR "data sources". It has collections of a single feature type. Think shapefile or single PostGIS table. The door isn't completely shut on databases; the first version of Fiona had the concept of a "workspace", and we could bring that back if needed.
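To make the "just use the standard library" route concrete, here is a minimal sketch of parsing a GeoJSON feature and a KML placemark with nothing but json and xml.etree. The two inline documents are made-up stand-ins for whatever an HTTP client would return, the function names are hypothetical, and the sketch leaves out httplib2, geojson, and keytree entirely.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical documents standing in for responses fetched from the web.
GEOJSON_TEXT = '''{"type": "Feature", "id": "1",
  "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
  "properties": {"label": "Foo"}}'''

KML_TEXT = '''<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Foo</name>
    <Point><coordinates>0.0,0.0</coordinates></Point>
  </Placemark>
</kml>'''

KML_NS = "{http://www.opengis.net/kml/2.2}"


def parse_geojson(text):
    """Return the GeoJSON document as plain dicts and lists."""
    return json.loads(text)


def parse_kml_points(text):
    """Return (name, (x, y)) pairs for every Placemark that holds a Point."""
    root = ET.fromstring(text)
    results = []
    for placemark in root.iter(KML_NS + "Placemark"):
        name = placemark.findtext(KML_NS + "name")
        coords = placemark.findtext(KML_NS + "Point/" + KML_NS + "coordinates")
        if coords is None:
            continue  # not a point placemark
        x, y = [float(v) for v in coords.strip().split(",")[:2]]
        results.append((name, (x, y)))
    return results


feature = parse_geojson(GEOJSON_TEXT)
placemarks = parse_kml_points(KML_TEXT)
```

Both results are ordinary Python values (a dict, and a list of tuples), which is the same shape of data Fiona hands back for file formats.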
The latest installation instructions and usage example are on Fiona's GitHub page: https://github.com/sgillies/Fiona and there's more documentation to come. A Shapely-level manual will be the cornerstone of a 1.0 release, although I expect the Fiona manual to be much smaller.
If you've got GDAL installed in a well-known location and Python 2.6, you may be able to install as easily as:
$ pip install Fiona
If, like me, you have virtualenvs with their own GDAL libs, you'll need to do something like this.
$ virtualenv .
$ source bin/activate
$ pip install -d Fiona http://pypi.python.org/packages/source/F/Fiona/Fiona-0.5.tar.gz
$ export GDAL=${PATH_TO_GDAL}
$ cd Fiona
$ python setup.py build_ext -I ${GDAL}/include -L ${GDAL}/lib install
Is Fiona anywhere near as fast as osgeo.ogr? Benchmark: get all polygon-type features from a shapefile, read their attributes into a dict, get a reference to their geometry (in the OGR case) or geometry as GeoJSON (in the Fiona case), and get their unique ids. Here's the script:
import timeit
from fiona import collection
from osgeo import ogr
PATH = 'docs/data/test_uk.shp'
NAME = 'test_uk'
# Fiona
s = """
with collection(PATH, "r") as c:
    for f in c:
        id = f["id"]
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, PATH, NAME'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print
# OGR
s = """
source = ogr.Open(PATH)
layer = source.GetLayerByName(NAME)
schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
    fdefn = ldefn.GetFieldDefn(n)
    schema.append((fdefn.name, fdefn.type))
layer.ResetReading()
while 1:
    feature = layer.GetNextFeature()
    if not feature:
        break
    id = feature.GetFID()
    props = {}
    for i in range(feature.GetFieldCount()):
        props[schema[i][0]] = feature.GetField(i)
    geometry = feature.GetGeometryRef()
    feature.Destroy()
source.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, PATH, NAME'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that. A lot. Next are the numbers.
(Fiona)krusty-2:Fiona seang$ python benchmark.py
Fiona 0.5
2579.40 usec/pass
osgeo.ogr 1.7.2
3355.43 usec/pass
Result: even though Fiona is doing a fair amount of extra coordinate copying, it's still faster than the Python bindings for OGR 1.7.2. Simpler + faster seems like a big win.
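For anyone puzzling over the `1000000 * t.timeit(number=1000)/1000` expression in the script: `timeit` returns the total seconds for all passes, so multiplying by a million and dividing by the number of passes gives mean microseconds per pass. A minimal sketch of that conversion follows; the helper name `usec_per_pass` and the stand-in workload are my own assumptions, not part of the benchmark.

```python
import timeit


def usec_per_pass(stmt, setup="pass", number=1000):
    """Run `stmt` `number` times and return the mean cost of one pass in microseconds."""
    total_seconds = timeit.Timer(stmt=stmt, setup=setup).timeit(number=number)
    return 1000000 * total_seconds / number


# Stand-in workload (an assumption; the real benchmark reads a shapefile).
result = usec_per_pass("sum(range(100))", number=1000)
print("%.2f usec/pass" % result)
```

The number printed will vary by machine and run, which is worth keeping in mind when reading the single-run figures above.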
How about writing speed? Benchmark: open a shapefile for writing and write 50 identical (other than their local ids) point-type features to it.
import os
import timeit
from fiona import collection
from osgeo import ogr
FEATURE = {'id': '1', 'geometry': {'type': 'Point', 'coordinates': (0.0, 0.0)},
           'properties': {'label': u"Foo"}}
SCHEMA = {'geometry': 'Point', 'properties': {'label': 'str'}}
# Fiona
s = """
with collection("fiona.shp", "w", "ESRI Shapefile", SCHEMA) as c:
    for i in range(50):
        f = FEATURE.copy()
        f['id'] = str(i)
        c.write(f)
"""
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import collection, SCHEMA, FEATURE'
    )
print "Fiona 0.5"
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
print
# OGR
s = """
drv = ogr.GetDriverByName("ESRI Shapefile")
if os.path.exists('ogr.shp'):
    drv.DeleteDataSource('ogr.shp')
ds = drv.CreateDataSource("ogr.shp")
lyr = ds.CreateLayer("ogr", None, ogr.wkbPoint)
field_defn = ogr.FieldDefn("label", ogr.OFTString)
lyr.CreateField(field_defn)
for i in range(50):
    feat = ogr.Feature(lyr.GetLayerDefn())
    feat.SetField("label", u"Foo")
    pt = ogr.Geometry(ogr.wkbPoint)
    x, y = FEATURE['geometry']['coordinates']
    pt.SetPoint_2D(0, x, y)
    feat.SetGeometry(pt)
    feat.SetFID(i)
    lyr.CreateFeature(feat)
    feat.Destroy()
ds.Destroy()
"""
print "osgeo.ogr 1.7.2"
t = timeit.Timer(
    stmt=s,
    setup='from __main__ import ogr, os, FEATURE'
    )
print "%.2f usec/pass" % (1000000 * t.timeit(number=1000)/1000)
Fiona only needs five statements compared to 18 for osgeo.ogr, and Fiona writes these features faster.
(Fiona)krusty-2:Fiona seang$ python write-benchmark.py
Fiona 0.5
4435.63 usec/pass
osgeo.ogr 1.7.2
4565.33 usec/pass
I was rather surprised by this. I've been benchmarking feature reading for a while but this is the first time I've done it for writes.
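To put the two results side by side, here is a small worked calculation of how much time per pass Fiona saves relative to osgeo.ogr, using only the four numbers printed above. The helper name `time_saved` is hypothetical, and these are single runs on one machine, so treat the percentages as rough.

```python
# Timings in usec/pass, copied from the two benchmark runs above.
READ = {"fiona": 2579.40, "ogr": 3355.43}
WRITE = {"fiona": 4435.63, "ogr": 4565.33}


def time_saved(fiona, ogr):
    """Fraction of osgeo.ogr's time per pass that Fiona saves."""
    return (ogr - fiona) / ogr


read_saving = time_saved(**READ)
write_saving = time_saved(**WRITE)
print("read:  %.1f%% less time per pass" % (100 * read_saving))
print("write: %.1f%% less time per pass" % (100 * write_saving))
```

That works out to roughly 23% less time for reads and about 3% less for writes, so the read path is where the difference is noticeable.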
Do you like tests?
(Fiona)krusty-2:Fiona seang$ python setup.py nosetests --nologcapture --with-coverage --cover-package=fiona
running nosetests
running egg_info
writing src/Fiona.egg-info/PKG-INFO
writing top-level names to src/Fiona.egg-info/top_level.txt
writing dependency_links to src/Fiona.egg-info/dependency_links.txt
reading manifest file 'src/Fiona.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.txt' under directory 'tests'
writing manifest file 'src/Fiona.egg-info/SOURCES.txt'
running build_ext
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrinit.so -> src/fiona
copying build/lib.macosx-10.5-i386-2.6/fiona/ogrext.so -> src/fiona
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
nose.plugins.cover: INFO: Coverage report will include only packages: ['fiona']
test_invalid_mode (tests.test_collection.CollectionTest) ... ok
test_no_path (tests.test_collection.CollectionTest) ... ok
test_w_args (tests.test_collection.CollectionTest) ... ok
test_append_point (tests.test_collection.ShapefileAppendTest) ... ok
test_context (tests.test_collection.ShapefileCollectionTest) ... ok
test_filter_1 (tests.test_collection.ShapefileCollectionTest) ... ok
test_io (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_list (tests.test_collection.ShapefileCollectionTest) ... ok
test_iter_one (tests.test_collection.ShapefileCollectionTest) ... ok
test_len (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_write (tests.test_collection.ShapefileCollectionTest) ... ok
test_schema (tests.test_collection.ShapefileCollectionTest) ... ok
test_no_read (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_point (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_write_polygon (tests.test_collection.ShapefileWriteCollectionTest) ... ok
test_linestring (tests.test_feature.PointTest) ... ok
test_point (tests.test_feature.PointTest) ... ok
test_polygon (tests.test_feature.PointTest) ... ok
test (tests.test_geometry.GeometryCollectionRoundTripTest) ... ok
test (tests.test_geometry.LineStringRoundTripTest) ... ok
test_line (tests.test_geometry.LineStringTest) ... ok
test (tests.test_geometry.MultiLineStringRoundTripTest) ... ok
test_multilinestring (tests.test_geometry.MultiLineStringTest) ... ok
test (tests.test_geometry.MultiPointRoundTripTest) ... ok
test_multipoint (tests.test_geometry.MultiPointTest) ... ok
test (tests.test_geometry.MultiPolygonRoundTripTest) ... ok
test_multipolygon (tests.test_geometry.MultiPolygonTest) ... ok
test (tests.test_geometry.PointRoundTripTest) ... ok
test_point (tests.test_geometry.PointTest) ... ok
test (tests.test_geometry.PolygonRoundTripTest) ... ok
test_polygon (tests.test_geometry.PolygonTest) ... ok
Name               Stmts   Miss  Cover   Missing
------------------------------------------------
fiona                 15      0   100%
fiona.collection      56      0   100%
------------------------------------------------
TOTAL                 71      0   100%
----------------------------------------------------------------------
Ran 31 tests in 4.955s
OK
I haven't figured out how to get Fiona's OGR extension module covered, but I'd be surprised if it wasn't > 90%.
I've been telling people that Fiona's objective was 80% of osgeo.ogr's functionality at 20% of the complexity. Maybe it can have 5% better performance, too?
Fiona originally had such GeoJSON-like objects with `id`, `geometry`, and `properties` attributes. But as I coded, I kept asking myself "this is redundant: why not just use GeoJSON-like mappings?" This question never went away, and I finally decided to ditch the classes and just use dicts. Dicts and other mappings are thoroughly documented, well-tested, and built-in. The GeoJSON format is also very well known these days and maps easily to Python dicts. I feel like I'm trading less code for more usability: win-win.
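As an illustration of why plain mappings are enough, here is a minimal sketch that builds a feature shaped like the `FEATURE` dict from the write benchmark, derives a modified copy, and serializes it as GeoJSON, all with built-ins. The `relabel` helper is hypothetical and not part of Fiona.

```python
import json

# A feature shaped like the FEATURE mapping used in the write benchmark.
feature = {
    "id": "1",
    "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
    "properties": {"label": "Foo"},
}


def relabel(feature, label):
    """Return a copy of the feature with a new label; the original is untouched."""
    updated = dict(feature)
    updated["properties"] = dict(feature["properties"], label=label)
    return updated


renamed = relabel(feature, "Bar")

# Adding the "type" member is all it takes to get a GeoJSON Feature document.
as_geojson = json.dumps(dict(renamed, type="Feature"), sort_keys=True)
```

No feature class is needed for any of this: copying, updating, and serializing are all things dicts already do.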
Most of your performance win is probably related to cython vs SWIG. SWIG can generate terrible Python from a performance perspective.
What's cython's story with regard to PyPy or IronPython? A Fiona based on ctypes would get access to those interpreters for practically no effort. Maybe cython's story with regard to that is the same...
What about coordinate systems? Do they live in Fiona? Should they?
I'm not interested in IronPython anymore. For PyPy, ctypes definitely seems to be the way to go: http://codespeak.net/pypy/dist/pypy/doc/extending.html. Cython's PyPy story isn't so clear. Switching Fiona to ctypes shouldn't be that hard.
Re: Fiona's half-way mark
NathanW, 2011-12-21T11:57:16Z
>>That's just 3 statements with Fiona, 19 with osgeo.ogr. Do you like that? I like that.
Like it? Love it! So much cleaner; less stuffing around, just get in and do what you want. The less red (read: noise code) the better.
Keep up the good work! Looking at ways I can apply the same kind of API logic to QGIS, as the API is a little verbose sometimes just to do simple things.
Any reason why you use f['id'] vs something like f.id and do some magic in __getattr__() and __setattr__()?
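For reference, the kind of `__getattr__`/`__setattr__` magic being asked about could look something like the sketch below. `AttrFeature` is a hypothetical class written for illustration; it is not something Fiona provides.

```python
class AttrFeature(dict):
    """Hypothetical dict subclass that allows f.id as well as f['id']."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as keys() or items() still take precedence over item names.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Route every attribute assignment into the mapping itself.
        self[name] = value


f = AttrFeature(id="1", geometry={"type": "Point", "coordinates": (0.0, 0.0)},
                properties={"label": "Foo"})
first_id = f.id
f.id = "2"
```

It works, but it is an extra class to document and test, and an item named like a dict method (`items`, `keys`) would be shadowed by the method when read as an attribute.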