One of the issues that's complicated all-out integration of Zope/Plone and GIS is
the lack of spatial indexes for catalogs. Without an index, your data is bottled
up in the ZODB. "Find all features within [bbox]" is O(N), and unfeasible for
large sets of features. Pleiades needs to overcome this issue soon, and so I've
done something about it.
After researching trees of all kinds and reviewing a number of libraries, I finally
went with what had been right under my nose all along: the quadtree from the
venerable and inscrutable shapelib. A few years ago I would have used SWIG to wrap
shapelib's tree, but I honestly don't give a damn about anything but Python these
days. I whipped up tests, a Python C extension module, and Quadtree was born.
How does it work? Check out the doctests:
Make an instance
>>> from quadtree import Quadtree
>>> index = Quadtree([-180, -90, 180, 90])
Add 3 items
>>> index.add(0, [-106, 39, -105, 40])
>>> index.add(1, (-100, 25, -95, 30))
>>> index.add(2, (40, 50, 45, 55))
Find likely items
>>> [n for n in index.likely_intersection([-110, 30, -100, 45])]
[0, 1]
>>> [n for n in index.likely_intersection([10, 40, 40.05, 60])]
[2]
We get a hit for the next bounding box even though it doesn't explicitly
intersect with the item -- it does intersect with the tree node containing
the item.
>>> [n for n in index.likely_intersection([-110, 35, -108, 45])]
[0]
How well does it work? I've yet to script runs over the entire parameter
space (data density, tree extent, query box size, maximum tree depth), but
have preliminary results from tests/benchmarks.py. 1000 points are
randomly sprinkled in a box bounded by [0, 0, 1, 1]. The "brute force"
method of determining whether a point is inside a box is:
p.x >= minx and p.x <= maxx and p.y >= miny and p.y <= maxy
A typical result, averaged over 100 passes:
1000 points
Index extent: [-180, -90, 180, 90]
Query box: [0.2, 0.3, 0.3, 0.4]
Brute Force:
15 hits
1539.89 usec/pass
Likely Intersection:
23 hits
42.03 usec/pass
Likely Intersection with false positive mop-up:
15 hits
115.35 usec/pass
Making one extra swipe over the likely points in order to mop up
the 8 false positives is costly, but
is still much faster than comparing each point to the query
box. I knew it would be, but it's always encouraging to see the
numbers.
Speaking of the way I used to do things ... as recently as May I
would have rolled this into the Python Cartographic Library, but have
seen the light since then. There's been more interest in OWSLib since I dredged
it up from PCL. I'm also inspired by geopy, the single-minded library
for accessing geocoding services. Quadtree will be similarly single-minded.
It depends on nothing except setuptools (for now). After a little more testing,
I'll upload to Cheese Shop.
If it looks useful to you, grab via Subversion:
svn co http://icon.stoa.org/svn/pleiades/Quadtree/trunk Quadtree
and take it for a ride.
P.S. -- five quatloos for the first person to find the Neal Stephenson reference in this post.