2009 (old posts, page 2)

Give it a REST

Tobin Bradley writes:

We found out a few days before the conference that we had won the G. Herbert Stout Award for Visionary Use of GIS for our REST Web Services Framework. Mrs. Stout was there to present the award, and it was a very humbling and happy experience for us. Congratulations to the Mecklenburg County GIS staff, and congratulations to Asheville for winning in the city category - it was definitely well deserved!

On the one hand, it's neat that anything "REST" wins an award in geospatial. On the other hand, REST can't take any credit because there actually isn't any of it in these services. And that's fine (except for the misleading label) because Mecklenburg County GIS doesn't need REST for this. REST is a style for huge, distributed information architectures that need to last for decades, like the Web (or even a national spatial data infrastructure).

I'd like to see more GIS developers follow the lead of CloudMade and tout HTTP APIs. Not only would this be more truthful in almost all cases, there's also the advantage of being able to point users to the HTTP specification, HTTP libraries, and tools, and of not having to explain why there's no "REST specification".

Update (2009-02-23): Where are my manners? Congratulations on the award. Making an HTTP API that users like is no small thing, and neither is open sourcing a well-documented implementation to a GIS community that still tends to be locked up by proprietary vendors.

Comments

Re: Give it a REST

Author: Tobin

Hi Sean,

Thanks for the congratulations! We're using the term REST loosely to describe the transport method (as opposed to, say, SOAP) rather than the more correct Fielding definition of the term. I think that's fairly common use of the term and I think you can differentiate between REST and a RESTful application, but I can see how one might object. Terminology is a common problem for me. To this day my wife claims my geranium isn't a geranium.

Cheers, and great job with your blog!

Re: Give it a REST

Author: Sean

Common misuse of the term would be more accurate. We have to take our terms seriously.

Plugins for Shapely

I haven't started working on any pure Python (for App Engine) geometry predicates or operators, or any other alternatives to GEOS, but I've begun to make it possible. All GEOS dependency is being moved to its own shapely.geos module, which will be the default plugin provider for Shapely's new plugin framework.

The framework defines a few interfaces (geometry compiling or checking provider, area and length provider, bounds provider, etc) and entry points. When a geometry is asked to compute its bounds, for example, it delegates to a provider that has been wired up to the geometry factory on import. Here's a peek at how the default providers are registered in Shapely's setup.py:

...
setup(name          = 'Shapely',
      version       = '1.1.0',
      ...
      entry_points = """
          [shapely.geometry]
          geometryChecker=shapely.geos.geometry:geometryChecker
          metricProperties=shapely.geos.metrics:metricProperties
          geometryProperties=shapely.geos.topology:geometryProperties
          """,
)

And in shapely.geometry, the providers are activated with a function:

...
use('Shapely>=1.1.0')

from geo import shape, asShape
from point import Point, asPoint
from linestring import LineString, asLineString
from polygon import Polygon, asPolygon
from multipoint import MultiPoint, asMultiPoint
from multilinestring import MultiLineString, asMultiLineString
from multipolygon import MultiPolygon, asMultiPolygon
from collection import GeometryCollection

Users can override the default GEOS providers by writing and installing (with setuptools) new packages that provide the same named entry points, and "using" those at run time:

from shapely import geometry
geometry.use('MyGeometry')
...

point = geometry.Point(0.0, 0.0)
x = point.buffer(10.0) # calls on provider defined by MyGeometry package

In theory, this makes it possible to write an application using Shapely that can run on CPython, Jython, or IronPython using the appropriate backend for the platform, as long as setuptools and pkg_resources work. I've read that they will in the next Jython release. IronPython seems to be a little farther behind. In practice, the plugin framework is helping to improve the testability and quality of Shapely's code.
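The delegation behind use() can be sketched in a few lines of pure Python. This is a toy registry illustrating the pattern, not Shapely's implementation; the names registry, active, and the bounds provider below are all hypothetical:

```python
# Toy provider registry illustrating the delegation behind use().
# Not Shapely's implementation; all names here are hypothetical.

registry = {}  # package name -> dict of named providers
active = {}    # providers currently in effect

def register(name, providers):
    """Associate a package name with its providers (in real Shapely,
    setuptools entry points do this at install time)."""
    registry[name] = providers

def use(name):
    """Activate the providers registered under the given name."""
    active.clear()
    active.update(registry[name])

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    @property
    def bounds(self):
        # Delegate to whichever bounds provider is currently active
        return active['bounds'](self)

# A trivial pure-Python bounds provider standing in for the GEOS one
register('MyGeometry', {'bounds': lambda g: (g.x, g.y, g.x, g.y)})
use('MyGeometry')

print(Point(0.0, 0.0).bounds)  # (0.0, 0.0, 0.0, 0.0)
```

Swapping backends is then just another call to use() with a different registered name.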

Comments

Re: Plugins for Shapely

Author: brent

thanks sean. very cool.

for those who didn't know about setuptools entry points, this is a good intro:

http://lucumr.pocoo.org/blogarchive/setuptools-plugins

Critique of WxS, en Français

I think this is the first time I've been translated. Thank you, René-luc D'Hont. I read French just well enough to verify that it's a faithful translation. I disagree with the comment at the end about GML's fitness: GML is XML, which is right for the Web; GML has a unique content type (not just text/xml), which is also right for the Web. The problems with GML are XML Schema and that it's not RDF.

Comments

Re: Critique of WxS, en Français.

Author: ReLuc

Well, GML is right for the Web, I apologize. I just wanted to say that GML is hard to use in a browser environment. GeoJSON and GeoRSS are better designed for the Web. And I agree with you about XML Schema in GML!

Making data more citable

Shorter Kurt Schwehr: your data needs a cool URI.

Bryan Lawrence is another scientist that blogs about this issue, including the messy details of citing data in traditional hypertext-less paper journals. Of course, getting credit for publishing digital data is a completely different, social issue, which probably has to wait for a generational change in the sciences.

Anglo-European Open Source Archaeo/Geo/GIS events?

So, I'm going to be in Montpellier, France, for one year starting 1 June. I'm completely new to the neighborhood. I've got the First Open Source GIS UK Conference, EuroPython 2009, and OGRS 2009 on my calendar. What else is going on?

Comments

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Vincent HEURTEAUX

Hello Sean,

Welcome to Montpellier!

If you want to meet people working on open source geospatial projects, the Geomatys folks are based here and will attend OGRS in July. Just e-mail us if you want to travel with us or just talk about geospatial stuff.

Cheers,

Vincent

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Mateusz Loskot

Sean,

Looks like there is a chance to have a beer together if you'll be visiting the UK :-)

Mat

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Olivier

Hi Sean,

There's also PgDay Europe in Paris, this year, not yet scheduled, but something like 2 days in October.

Olivier

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Stefano Costa

Sean, that's great news. You'll miss our late-April ArcheOpenSource event in Rome, but you should still be here (in Europe) for next year's workshop.

Looking forward to meeting you then.

Stefano

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Sean

Merci, grazie, and thanks. I'm looking forward to this trip. Perhaps I'll even be able to host an ISAW sponsored event after we get established in the Old World :)

Re: Anglo-European Open Source Archaeo/Geo/GIS events?

Author: Schuyler Erle

Don't forget the OSM conference -- State of the Map 2009 in Amsterdam, second week of July:

http://www.stateofthemap.org/

What's the beef?

The answer to:

@sgillies What's the beef with OGC WMS and WFS?

requires a few more than 140 characters.

First, let me review the good about the OGC service architecture and its W*S specs. The OGC has made interoperability a top priority in GIS. Everybody recognizes this is a huge accomplishment. I do too. My favorite byproduct is the increasing priority of open access. It's no accident. The OGC intended that interoperability would lead to more open access to data, and it has. It's a wonderful thing. My other favorite, and perhaps more accidental, byproduct is that thinking of GIS services as interchangeable commodity components leads rather quickly to considering open source implementations. I also think the OGC has done a fine job identifying and standardizing the parameters of our common processes, and a generally good job on message formats. So much good, I must be in heaven, right?

My beef with W*S is that its architects didn't do their Web homework. Despite the "Web" in the name, service design isn't informed by Web architecture and the understanding of HTTP (the Hypertext Transfer Protocol) begins and ends with CGI (the Common Gateway Interface). W*S understands and uses the Web as an alchemist understood and used the elements. We bear the cost of needless reinvention: "Update sequence" instead of HTTP Expiration and Validation, "Web Geolinking Service" instead of standard HTTP interaction, "GeoDDS" instead of Atom. Despite the idea that W*S are designed to be transport-neutral, HTTP is the only significant "distributed computing type" (what the architects call "transport"). The USGS Framework WFS uses no other transport than HTTP. GeoBase uses no other transport than HTTP. Still, our "Web" services remain things that are not really of the Web.

Another minor beef is that in our interoperability fervor we have made standardization holy. The GIS community largely believes that standards should come before implementation, should be built in clean rooms by an elite group of standards scientists, and this stifles innovation. I depend on standards as much as anyone, but I feel we should be standardizing on best practices more than we currently do.

Comments

Re: What's the beef?

Author: Jachym

Furthermore, I'm missing proper support for SOAP in the W*S specifications, which would make an OGC "Web service" a W3C "Web service" (if I understand this well). This particular thing makes OGC OWS incompatible with INSPIRE, which is essential for European GISers.

Re: What's the beef?

Author: Sean

The only thing I'll say about SOAP as a transport is that it isn't any more of a transport than HTTP is. Look at the mess that is a SOAP DCPType for WxS: WxS (transport independent) over SOAP (not a transport) over HTTP (not a transport) over TCP (ah, there's the actual transport).

Re: What's the beef?

Author: C. Reed

One minor disagreement - your last statement is incorrect. The vast majority of new OGC candidate standards are "birthed" in the hotbed of implementation and not in a "clean room". These birthing areas could be in the wild or in an OGC test bed. WMS and WFS were birthed around the same time as some other web standards (SOAP, WSDL, etc.), but back in 1998 and 1999 no one had even thought of implementing SOAP. So, we are dealing with some legacy here.

Re: What's the beef?

Author: Sean

Carl (yes?), I agree that there seems to be a positive trend (GeoRSS, KML, GeoPDF), but is it really a sea change in how the OGC makes standards?

Busting RESTful GIS myths

I'm going to use the announcement of Nanaimo's "authentic Web" GIS as an occasion to debunk some myths about REST and the Web, and their fitness for designing alternatives to the OGC's service architecture, that surfaced on Twitter last week.

Myth: RESTful Web services aren't based on standards.

Indeed, there are APIs on the programmable web touting "REST" which are very unlike each other. Not all of them are even RESTful when you get right down to it. They come from different and varying domains. It's understandable that a quick glance leaves some with the mistaken impression that interoperability is not a property of RESTful Web services.

First, and this can't be said enough, because it still isn't really understood in the GIS community: REST is a particularly constrained style of architecture which just so happens to be the style of the World Wide Web. It is not a standard, but needs and shapes standards. Interoperability depends on standards, whether your architecture is RESTful or not. Paul Prescod, who I'm quoting often these days, enumerates the necessary kinds of standards:

In application-level networking, there are three basic things to be standardized

  • Addressing -- how do we locate objects
  • Methods or Verbs -- what can we ask objects to do
  • Message payloads or Nouns -- what data can we pass to the objects to ask them to accomplish their goals

For RESTful Web services, the first two are standardized by HTTP/1.1:

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification defines the protocol referred to as "HTTP/1.1", and is an update to RFC 2068.

You wanted standards? Even OGC service specifications acknowledge HTTP/1.1 (though not without misusing it, more on that in a future post). And HTTP/1.1 has been shaped by REST.

As Prescod points out, RESTful Web services push all interoperability problems into the third standardization category: message payload. This is a conscious decision. Interoperability may not be perfectly solvable, but you can isolate its problems, and RESTful services should do so. Do read all of Prescod's article on standardization. We spend way too much time talking about the "which" of standards in GIS without really thinking about the "what".

Myth: RESTful Web services aren't "lights out" accessible.

In fact, a properly RESTful service has better accessibility than an OGC service. To use the same analogy, what if you drop your special OGC service client in the dark and can't find it? How do you access your OGC service? Pardon me, but you're screwed: standing knee-deep in other web programming tools, and none of them can make sense of the OGC's unique addressing schemes and unique methods of interaction. With a RESTful service you can poke at it in a standardized way (HTTP/1.1 again) with curl, or XHR, or whatever, and get an actionable description. Of course, you're a GIS geek, and you keep your OGC client firmly attached to a reeling key chain, but consider the other agents on the Web that don't have an OGC service client at all. The OGC's special addressing scheme and methods make it very hard for those agents to get even a partial understanding of the service.

Myth: REST is too immature for GIS.

In 2000, Roy Fielding wrote:

Since 1994, the REST architectural style has been used to guide the design and development of the architecture for the modern Web. This chapter describes the experience and lessons learned from applying REST while authoring the Internet standards for the Hypertext Transfer Protocol (HTTP) and Uniform Resource Identifiers (URI), the two specifications that define the generic interface used by all component interactions on the Web, as well as from the deployment of these technologies in the form of the libwww-perl client library, the Apache HTTP Server Project, and other implementations of the protocol standards.

Web architecture has been done in the REST style since 1994. HTTP/1.1 began to be adopted in 1997. The Web itself goes back to 1990. The OGC's service architecture is not more mature than this.

Myth: RESTful Web services are fine for small solutions, not for large, inter-agency solutions.

The World Wide Web is our largest network application. It's global. Interoperability is promoted through the use of common hypertext formats. One would think that geographic information systems using the same architecture, in the same style, could scale equally well, right? I think the onus is instead on the naysayers to make the case that their favored architectures can scale like the Web and bridge domains like the Web.

Comments

Re: Busting RESTful GIS myths

Author: Paul Ramsey

Some of the "rest isn't ready" myths, related to the "opengis is enterprise ready" one, come from the lack of a complete bundling of restful approaches into a documented implementation profile.

Starting from the premise that "I want to write one client that can read/write features from multiple sources", the OGC answer is simple: use the WFS document, that describes your protocol, and the GML and Filter documents describe your encoding.

The bits that are missing from a unified "REST feature server" specification aren't large... there's lots of encodings, and they can be re-used from the OGC stuff. But there's no specification that would allow us to put a server team and client team in separate rooms and allow them to come out with two pieces of software that talked to one another using REST principles.

Re: Busting RESTful GIS myths

Author: Dave Smith

In recent exchanges via Twitter, I'm afraid the 140-character limitation does not always lend itself well to adequately conveying meaning, and as such, you may have misinterpreted the points I was trying to make.

I have no problem with REST. Whatsoever. I wholly agree that REST has standards, that it is mature and robust, and that it can and no doubt will serve large, disparate enterprise and cross-agency applications.

My problem is specifically with regard to how *geospatial* data and processing are handled *within* REST, not with REST itself. Currently it's that implementation piece that leaves many questions unanswered. Certainly there are many pieces and parts already existing in the geo world and elsewhere which may lend themselves to handling things like datums and projections (e.g. EPSG and WKT), bounding boxes, styling, filtering, temporal slices, and the like, but as of yet there is no consistent way to do so within geospatial REST implementations - currently there are many disparate approaches to how these are treated in RESTful implementations.

And that's where the challenge in consistent discovery, access and use of geospatial data and processing comes into play in a consistent fashion.

Perhaps the answer may be a standard way of structuring RESTful geo assets, or perhaps it may be a means of conveying to a calling application how to access its capabilities, what is and what isn't supported, or some combination thereof, ala OGC getCapabilities. These are things that are currently implemented and supported in the OGC world, which provide that kind of "lights-out" cross-agency facility of integration.

Re: Busting RESTful GIS myths

Author: Martin Davis

I really want to like REST. I have no particular love for the schema-heavy OGC standards, or the even more complex (and different!) world of SOAP. But for all the apparent simplicity of the REST approach, I just don't see how it provides the required level of self-description required for true discovery and inter-operability.

For instance, I don't think you have shown how REST addresses "Application-Level Network Requirement #3 - Message Payloads or Nouns". Where is the meta-schema that describes the allowable syntax of URL and message payloads?

Also, I think there should be a 4th requirement: "Response Payloads - what we expect an object to tell us". And this also needs a meta-schema to describe its syntax.

So basically I disagree with the statement that REST allows a higher degree of "lights-out" interoperability than the OGC standards. If you know that an endpoint is OGC-W*S-compliant, you can trace through it in an automated fashion and discover exactly what syntax is allowed for all requests and responses. AFAIK the same is NOT true for REST.

And this probably all comes back to your opening point - REST is an architecture, not a standard. And that's fine - having architectural patterns is a good thing. But for REST to be anything more than a debating point, it needs to have some clear standards defined around it.

Re: Busting RESTful GIS myths

Author: Sean

Martin, URLs are opaque, navigability concerns are pushed onto the formats plate (#3) and are dealt with using standard hypertext formats like Atom, KML, or GML (xlink). That's the strategy, and I'll concede that it takes some getting to know.

AtomPub is a nice model for the structured protocols we GIS people crave. An AtomPub service document looks a bit like an OGC service capabilities document, but doesn't have to concern itself with resources methods because all resources have the same methods. All interop issues are dealt with as format issues, and so the service doc merely specifies the allowable content types for collections (think layers or feature types).
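To make that concrete, here is what a minimal AtomPub-style service document might look like, parsed with nothing but the standard library. The "roads" collection, its URL, and the accepted content type below are all made-up examples:

```python
# Parse a minimal AtomPub-style service document with the stdlib.
# The document is a made-up example: the 'roads' collection, its href,
# and the accepted content type are hypothetical.
from xml.etree import ElementTree as ET

doc = """<service xmlns="http://www.w3.org/2007/app"
                  xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Features</atom:title>
    <collection href="http://example.com/collections/roads">
      <atom:title>roads</atom:title>
      <accept>application/vnd.google-earth.kml+xml</accept>
    </collection>
  </workspace>
</service>"""

ns = {'app': 'http://www.w3.org/2007/app',
      'atom': 'http://www.w3.org/2005/Atom'}
root = ET.fromstring(doc)
for coll in root.iterfind('.//app:collection', ns):
    title = coll.findtext('atom:title', namespaces=ns)
    accepts = [a.text for a in coll.iterfind('app:accept', ns)]
    print(title, coll.get('href'), accepts)
```

Note there's nothing about methods in the document: every collection supports the same HTTP verbs, so only the addresses and acceptable content types need describing.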

KML hints at more organic RESTful GIS design.

Re: Busting RESTful GIS myths

Author: Jason Birch

I personally believe that what we really need are some good solid RESTful GIS implementations (like ESRI's and hopefully MapGuide's) to flush out problems and limitations, long before even thinking about standards.

If practice determines that standards are required to make REST accessible to GIS clients, then they should come after this initial proving stage, should only specify what is absolutely required, and should refer to general internet standards where possible rather than writing a GIS one-off. Codifying things that don't need to be codified--such as the link relationship restrictions in the OGC KML spec--may make application developers' lives easier, but they also cripple the power of the architecture.

As an example of a problem: when working out a way to present the HTML representation of the Nanaimo data, the thing that we found most difficult was making our query capabilities discoverable. OpenSearch goes part of the way, as do standard HTML forms, but neither has the ability to describe complex multi-term attribute and spatial query capabilities. Without this layer, the query capabilities are not truly discoverable, and all implementations require clients with special knowledge.

Is the best place to fix this with an OGC standard? I don't think so. What I think is really needed is to work to ensure that our needs are met by existing practices such as OpenSearch or URI Template. Then, once these components are in place and only if absolutely required, OGC can publish a profile that says "use this from here, that from there, BBOX means this DE9-IM relationship, etc".

As a side note, I haven't digested URI Template to the level that I'm comfortable that it can describe something like (a=1 or (a=2 and b=3)).

Re: Busting RESTful GIS myths

Author: Martin Davis

Sean,

Ok, URLs are opaque, and they are provided by documents presumably obtained by some previous query. And (per Jason's comment) that query is specified using something like OpenSearch.

What do you mean that "KML hints at a more RESTful design"?

For me it would help to crystallize this to see an implemented, specified example of this technology mix being used for non-trivial spatial query and retrieval. E.g. something that provides equivalent capabilities to WFS (rich query, multiple feature classes, etc.), and which is well-specified enough to allow generic tools to be developed to use it. Is there anything like this out there?

@Jason: I agree that it's nice to have practice drive standards. What I wonder about is where is the centre of gravity that is going to make the various experiments coalesce into a clear, effective standard. You mention ESRI and MapGuide. And then you mention OpenSearch and URI Template. Are ESRI and MapGuide working towards those specifications? Or are they simply doing their own thing?

Another thing that strikes me: most of this discussion is about query and formats. Am I correct in thinking that those are orthogonal to REST? In which case, it seems to me that the focus on REST is distracting attention from tackling the harder issues.

Re: Busting RESTful GIS myths

Author: Martin Davis

Sean,

Have you seen this?

http://geoserver.org/display/GEOSDOC/RESTful+Configuration+API

This seems like a really thorough proposal for access to both GeoServer configuration information and the underlying spatial data.

But I'm confused about something. The proposal seems to fundamentally depend on the URLs being NON-opaque. Does this mean it is not in fact RESTful? Would it be better designed with opaque URLs? And if so, how would this work?

Thanks for assisting my efforts to understand REST and its ramifications...

Re: Busting RESTful GIS myths

Author: Sean

Yes, Martin, GeoServer's API seems (I don't have an instance handy to test) to be lacking the hypertext constraint, and therefore isn't technically RESTful. One possible remedy would be to formalize their URI construction rules and serve them up in a discovery doc like the one specified for the OpenSocial API (search for "5. Discovery"):

http://www.opensocial.org/Technical-Resources/opensocial-spec-v081/restful-protocol

Also blogged about at:

http://www.abstractioneer.org/2008/05/xrds-simple-yadis-uri-templates.html

If the discovery doc is treated like a web form, coupling between clients and servers can be minimized, and servers retain the freedom to evolve by changing the container paths in the URI templates.
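That freedom can be illustrated with a toy URI template expander. The template and variable names below are hypothetical, and a real client would follow the URI Template draft rather than this naive substitution:

```python
import re

def expand(template, variables):
    """Naively expand {name} placeholders in a URI template."""
    return re.sub(r'\{(\w+)\}',
                  lambda m: str(variables[m.group(1)]),
                  template)

# The server is free to relocate the container path; clients re-read
# the template from the discovery doc instead of hard-coding the URI.
template = 'http://example.com/geo/layers/{layer}/features/{fid}'
print(expand(template, {'layer': 'roads', 'fid': 42}))
# http://example.com/geo/layers/roads/features/42
```

If the server later moves features under a different container path, only the template changes; clients keep working.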

A GeoServer instance won't have millions of entities, so could probably also use an index doc in every container with explicit links to its children, as in AtomPub.

I said a KML application would be more "organic", placemarks linking to placemarks in an unstructured way.

Re: Busting RESTful GIS myths

Author: Martin Davis

Re the OpenSocial/URI Template stuff - great, this is what I like to see - clear, formal-enough specification documents which clearly lay out the interactions and data formats required to get a service to function!

This is what I'd hope to see to explain how REST can be used as a substitute for W*S. This stuff is just too complicated IMHO to be understandable via blog posts, or even appealing-but-limited examples.

Speaking of which, I get the same feeling from reading these documents that I had watching XML-RPC spiral into the murky depths of SOAP. It started out as real simple back-of-a-napkin concept, but as people started to realize what was required to make things formally-specified, discoverable, toolable, etc. it turned into a tottering edifice of obscure spec documents which only major projects could hope to boil down into code.

I think there might be an irreducible minimum of information required to support heterogeneous distributed communication. The only way to simplify it is to settle on one protocol which is so widespread and so well supported that it becomes a no-brainer to use. The obvious examples are TCP/IP and HTTP. Neither is trivial, but nobody writes their own TCP/IP or HTTP drivers - they use widely available industrial-strength ones.

Re: Busting RESTful GIS myths

Author: Ryan Baumann

I think where a lot of those REST myths come from is somewhere along the line a lot of people started saying they had a "REST API" which was really just RPC with idiosyncratic XML (but not XML-RPC) over HTTP GET (not POST or anything else, though usually with clean, but undiscoverable, URLs). Of course, this is hardly a new observation, I just thought it may be worth pointing out - just because an API calls itself RESTful does not mean it is a RESTful API. Unfortunately, the only real way to combat this is with true RESTful APIs in practice, so that the advantages can be demonstrated (and hopefully illustrate the practical differences with APIs which just claim to be RESTful as well).

Re: Busting RESTful GIS myths

Author: Sean

Thanks, everyone, for the comments. I hope you've appreciated my attempt to make commenting here suck a bit less. A related thread has started on the geo-web-rest google group and you might want to join if you're interested in helping push RESTful GIS forward.

A more perfect union, continued

On to cascaded unions for Shapely ...

>>> from osgeo import ogr
>>> from shapely.wkb import loads
>>> ds = ogr.Open('/Users/seang/data/census/co99_d00.shp')
>>> co99 = ds.GetLayer(0)
>>> counties = []
>>> while 1:
...    f = co99.GetNextFeature()
...    if f is None: break
...    g = f.geometry()
...    counties.append(loads(g.ExportToWkb()))
...
>>> len(counties)
3489

Matplotlib makes a pretty picture of those 3489 polygons:

>>> import pylab
>>> from numpy import asarray
>>> fig = pylab.figure(1, figsize=(5.5, 3.0), dpi=150, facecolor='#f8f8f8')
>>> for co in counties:
...    a = asarray(co.exterior)
...    pylab.plot(a[:,0], a[:,1], aa=True, color='#666666', lw=0.5)
...
>>> pylab.show()
http://farm4.static.flickr.com/3267/3231620059_31f44bc535_o_d.jpg

Shapely never had the power to dissolve adjacent polygons in a collection before, or at least not over large collections of real-world data. GEOS 3.1's cascaded unions are a big help:

>>> from shapely.ops import cascaded_union
>>> u = cascaded_union(counties)
>>> len(u.geoms)
219
>>> for part in u.geoms:
...     a = asarray(part.exterior)
...     pylab.plot(a[:,0], a[:,1], aa=True, color='#666666', lw=0.5)
...
>>> pylab.show()
http://farm4.static.flickr.com/3503/3232469846_61fb247502_o_d.jpg

There's user interest in leveraging the new reentrant API in GEOS 3.1, and releasing the GIL when calling GEOS functions to improve performance in multithreaded apps. I'm all for it.

Efficient batch operations for Shapely

I began exposing some of the new features of GEOS 3.1 in Shapely today. Geometries can be tuned up, or "prepared" ala SQL, for efficient batch operations. For example consider this set of random polygons scattered around (0.5, 0.0) and their intersection with a triangle:

>>> from shapely.geometry import Point, Polygon
>>> from random import random
>>> spots = [Point(random()*2.0-0.5, random()*2.0-1.0).buffer(0.1) for i in xrange(200)]
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> x = [s for s in spots if triangle.intersects(s)]
>>> len(x)
67

Without preparing the triangle for batch intersection (or using an index), it takes about 12 ms to get the 67 intersecting spots:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if triangle.intersects(s)]',
...                  setup='from __main__ import spots, triangle')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
11917.51 usec/pass

A prepared triangle finds the same 67 in just a little more than 1/4 of the time:

>>> from shapely.prepared import prep
>>> pt = prep(triangle)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
3145.49 usec/pass
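The speed-up comes from work that prep() does once up front: GEOS builds an index over the prepared geometry and adds short-circuit tests. The idea can be sketched in pure Python, with a cached bounding box standing in for the index. This is a toy, not GEOS's actual algorithm, and the Circle class is hypothetical:

```python
# Toy illustration of why "preparing" a geometry pays off in batch
# tests: cache cheap data once, then short-circuit before the exact
# test. Not GEOS's algorithm (which indexes edges); Circle is made up.
import math

class Circle(object):
    """Stand-in geometry with an exact (relatively costly) test."""
    def __init__(self, x, y, r):
        self.x, self.y, self.r = x, y, r
    def intersects(self, other):
        return math.hypot(self.x - other.x,
                          self.y - other.y) <= self.r + other.r

class Prepared(object):
    """Caches bounds up front so batch tests can reject cheaply."""
    def __init__(self, geom):
        self.geom = geom
        self.bounds = (geom.x - geom.r, geom.y - geom.r,
                       geom.x + geom.r, geom.y + geom.r)
    def intersects(self, other):
        minx, miny, maxx, maxy = self.bounds
        # Cheap rejection against the cached bounds first
        if (other.x + other.r < minx or other.x - other.r > maxx or
                other.y + other.r < miny or other.y - other.r > maxy):
            return False
        return self.geom.intersects(other)

big = Circle(0.0, 0.0, 1.0)
prepared = Prepared(big)
spots = [Circle(float(i), 0.0, 0.1) for i in range(10)]
# Same answers, but most spots never reach the exact test
assert [prepared.intersects(s) for s in spots] == \
       [big.intersects(s) for s in spots]
```

The more expensive the exact test (i.e., the more vertices in the prepared geometry), the bigger the win, which is what the Larimer County example below bears out.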

Update (2009-01-26): let's try an example from the real world. The boundary of Larimer County, Colorado, as represented in 2000 Census data, is a polygon with a 370 vertex exterior ring:

>>> from osgeo import ogr
>>> from shapely.wkt import loads
>>> ds = ogr.Open('/Users/seang/data/census/co99_d00.shp')
>>> counties = ds.GetLayer(0)
>>> co069 = counties.GetFeature(1151)
>>> g = co069.geometry()
>>> larimer = loads(g.ExportToWkt())
>>> # nb: OGR properties and methods usually can't be safely chained
>>> len(larimer.exterior.coords)
370

Scatter 200 spots around the neighborhood of Larimer County:

>>> b = larimer.bounds
>>> w = b[2] - b[0]
>>> h = b[3] - b[1]
>>> cx = b[0] + w/2.0
>>> cy = b[1] + h/2.0
>>> from random import random
>>> from shapely.geometry import Point
>>> def spotter():
...     x = (random()*2.0-1.0)*w + cx
...     y = (random()*2.0-1.0)*h + cy
...     return Point(x, y).buffer(0.05)
...
>>> spots = [spotter() for i in xrange(200)]
>>> x = [s for s in spots if larimer.intersects(s)]
>>> len(x)
48

48 spots intersect the county, and it takes about 22 ms to find them using the unoptimized method:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if larimer.intersects(s)]',
...                  setup='from __main__ import spots, larimer')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
21728.65 usec/pass

It only takes about 1.5 ms using a prepared geometry, a speed-up of about 14X:

>>> from shapely.prepared import prep
>>> pt = prep(larimer)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
1570.41 usec/pass

Comments

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

I'm amazed that preparing a triangle makes a damn bit of difference. The difference is probably in the extra short-circuit tests in the prepared operation rather than the indexed shape (since a triangle has so few edges to index). Try this with a more complex polygon to really see the differences. It's the difference between N and log(N) on the number of edges, so increasing the polygon complexity is where things will soar.

Re: Efficient batch operations for Shapely

Author: Sean

I was a bit surprised too. See the update for an example of the kind of speed-up you were thinking of.

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

That makes a nice difference !