Mush Update

Sean Gillies

2007-09-10 00:00

I've updated Mush to use my feedparser.py enhancements and Shapely 1.0a3. Now it will parse GeoRSS GML, Simple, and W3C geometries of all types (points, lines, polygons) from source feeds. For example, here are the last 10 entries from Christopher Schmidt's FeatureServer demo, pulled through the self-intersection processing resource: feed, map.
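As a rough illustration of what reading a GeoRSS Simple geometry involves, here is a minimal sketch that uses only the Python standard library. It is not Mush's or feedparser's actual code: the function name `parse_simple_geometry` is hypothetical, and it assumes well-formed elements in the GeoRSS Simple namespace whose text is a whitespace-separated list of "lat lon" pairs.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS Simple elements such as georss:point and georss:line.
GEORSS_NS = "http://www.georss.org/georss"


def parse_simple_geometry(element):
    """Return (kind, coords) for a GeoRSS Simple element.

    Hypothetical helper for illustration only. `coords` is a list of
    (lon, lat) tuples, i.e. x/y order.
    """
    # ElementTree tags look like "{namespace}point"; keep the local name.
    kind = element.tag.split("}")[-1]
    values = [float(v) for v in element.text.split()]
    # GeoRSS Simple writes each position as "lat lon", so swap to (lon, lat).
    coords = [(values[i + 1], values[i]) for i in range(0, len(values), 2)]
    return kind, coords


point = ET.fromstring(
    '<georss:point xmlns:georss="%s">45.256 -71.92</georss:point>' % GEORSS_NS
)
kind, coords = parse_simple_geometry(point)
```

The (lon, lat) tuples are in the x/y order that geometry libraries such as Shapely generally expect, so they could be handed to a geometry constructor from there.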

Please note that, in the interest of conserving resources and minimizing response times, I've limited the number of entries that Mush will read from any feed to 42.
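A cap like this is simple to apply at read time. The sketch below is one possible approach, not Mush's actual implementation: it assumes an Atom feed, and the names `MAX_ENTRIES` and `read_entries` are made up for the example.

```python
from itertools import islice
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
MAX_ENTRIES = 42  # the cap described above


def read_entries(feed_xml, limit=MAX_ENTRIES):
    """Return at most `limit` Atom entry elements from a feed document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(feed_xml)
    # islice stops pulling from the iterator once `limit` entries are taken.
    return list(islice(root.iter(ATOM + "entry"), limit))


feed = '<feed xmlns="http://www.w3.org/2005/Atom">' + "<entry/>" * 50 + "</feed>"
entries = read_entries(feed)
```

Note that this still parses the whole document before truncating; it only bounds the per-entry work done afterwards.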

This work reminds me to comment on Andrew Turner's recent post on security issues around feed aggregation. He writes:

The onus of security is on the application or aggregator that pulled the feed on behalf of the authorized user. But at the same time once the feed has been retrieved, there is no storage of the authorization credentials with the feed itself. It has essentially been stripped of it's shell of potential privacy and looking at the feed itself you would have no idea if it was supposed to be kept private, and visible only to certain, unknown persons.

What would be nice would be a mechanism to store at least references to permissions and authorization credentials within the feed itself. That way if an application still has the feed, or wishes to store it and re-aggregate it, they can apply the same authorization as the feed originally had.

There's another big issue that Andrew doesn't mention (discussed by Richardson and Ruby in chapter 6 of "RESTful Web Services"): how does the aggregator pass along the user's credentials without caching them and risking their theft? Mush doesn't intend to solve this problem at all. I think the onus of privacy remains largely on the original content provider. If you want to make a feed for authorized content, you should strip that feed down to the bare minimum and provide HTTPS hrefs to the content itself. If the feed metadata must also remain private, you can encrypt specific elements or even the entire feed.

Finally, feeds should be cached for no more than the duration specified by their origin servers. A feed is just a representation of entities that "live" on the Web, and applications should be pulling new representations from the Web rather than relying on silos. Storing feeds indefinitely -- treating GeoRSS like shapefiles -- breaks the Web.
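One way an aggregator might honor the origin server's wishes is to read the `max-age` directive from the HTTP `Cache-Control` response header and discard its copy once that many seconds have passed. The following is a simplified sketch under that assumption: the function names are hypothetical, and it ignores `Expires`, `s-maxage`, the `Age` header, and revalidation, all of which a real HTTP cache would need to consider.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age from a Cache-Control header value, in seconds.

    Hypothetical helper. Returns 0 when the header is missing, has no
    max-age, or contains no-store/no-cache.
    """
    if cache_control is None:
        return 0
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return 0


def is_fresh(fetched_at, cache_control, now):
    """True while a stored feed is still within its origin-specified lifetime.

    `fetched_at` and `now` are timestamps in seconds (e.g. from time.time()).
    """
    return now - fetched_at < max_age_seconds(cache_control)
```

With a header of `max-age=300`, a feed fetched at time 1000 would be considered fresh at 1200 and stale at 1300, at which point the aggregator should fetch a new representation rather than reuse the stored one.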