Python Quadtree

One of the issues that's complicated all-out integration of Zope/Plone and GIS is the lack of spatial indexes for catalogs. Without an index, your data is bottled up in the ZODB. "Find all features within [bbox]" is O(N), and unfeasible for large sets of features. Pleiades needs to overcome this issue soon, and so I've done something about it.

After researching trees of all kinds and reviewing a number of libraries, I finally went with what had been right under my nose all along: the quadtree from the venerable and inscrutable shapelib. A few years ago I would have used SWIG to wrap shapelib's tree, but I honestly don't give a damn about anything but Python these days. I whipped up tests, a Python C extension module, and Quadtree was born.

How does it work? Check out the doctests:

Make an instance

  >>> from quadtree import Quadtree
  >>> index = Quadtree([-180, -90, 180, 90])

Add 3 items

  >>> index.add(0, [-106, 39, -105, 40])
  >>> index.add(1, (-100, 25, -95, 30))
  >>> index.add(2, (40, 50, 45, 55))

Find likely items

  >>> [n for n in index.likely_intersection([-110, 30, -100, 45])]
  [0, 1]
  >>> [n for n in index.likely_intersection([10, 40, 40.05, 60])]
  [2]

We get a hit for the next bounding box even though it doesn't explicitly
intersect with the item -- it does intersect with the tree node containing
the item.

  >>> [n for n in index.likely_intersection([-110, 35, -108, 45])]
  [0]

How well does it work? I've yet to script runs over the entire parameter space (data density, tree extent, query box size, maximum tree depth), but have preliminary results from tests/benchmarks.py. 1000 points are randomly sprinkled in a box bounded by [0, 0, 1, 1]. The "brute force" method of determining whether a point is inside a box is:

p.x >= minx and p.x <= maxx and p.y >= miny and p.y <= maxy

A typical result, averaged over 100 passes:

1000 points
Index extent:  [-180, -90, 180, 90]
Query box:  [0.2, 0.3, 0.3, 0.4]

Brute Force:
15 hits
1539.89 usec/pass

Likely Intersection:
23 hits
42.03 usec/pass

Likely Intersection with false positive mop-up:
15 hits
115.35 usec/pass

Making one extra swipe over the likely points in order to mop up the 8 false positives is costly, but is still much faster than comparing each point to the query box. I knew it would be, but it's always encouraging to see the numbers.

Speaking of the way I used to do things ... as recently as May I would have rolled this into the Python Cartographic Library, but have seen the light since then. There's been more interest in OWSLib since I dredged it up from PCL. I'm also inspired by geopy, the single-minded library for accessing geocoding services. Quadtree will be similarly single-minded. It depends on nothing except setuptools (for now). After a little more testing, I'll upload to Cheese Shop.

If it looks useful to you, grab via Subversion:

svn co http://icon.stoa.org/svn/pleiades/Quadtree/trunk Quadtree

and take it for a ride.

P.S. -- five quatloos for the first person to find the Neal Stephenson reference in this post.

Comments

Re: Python Quadtree

Author: bryan

It looks like the boxes are defined simply by north, south, east and west limits. Can one define the boxes by providing the full coordinates of the box corners (which is much more useful near the poles)? If so, I'm very, very interested ... Cheers Bryan

Re: Python Quadtree

Author: Sean

We're limited to rectangles, so you only have freedom to choose two corners. That's four bounding values any way you slice it. I chose to follow the lead of the older WMS spec for bbox rather than GML 3 style min and max corner points.

Re: Python Quadtree

Author: Matt

Sean, thanks for sharing this. Throwing together a quadtree module for Python has been on my to-do list for a few months now and I was just getting to it today, perfect timing! =)