Saving bandwidth takes two

My earlier blog post on caching requires some follow up. In fact, the utility of httplib2 or any other HTTP cache hinges on whether or not the server hosting the wanted resources provides validating (ETag or Last-Modified) or freshness (Expires or Cache-Control) headers. In other words: the organization running the server has to want to help you save bandwidth. The original blog post says "That function caches the result of WMS requests for layer legends in a dedicated directory, assuming that the images are not changing over time."

WMS, eh? Do Web Map Services work with web caches? Not by design. How about in practice? I've plucked a WMS legend request from the USGS Framework WMS Viewer [1]:

http://frameworkwfs.usgs.gov/framework/wms/wms.cgi?SERVICE=WMS&VERSION=1.3.2&LANGUAGE=en-US,en&REQUEST=GetLegend&LAYERS=COUNTY_OR_EQUIVAL%3AFramework,STATE_OR_TERRITOR%3AFramework,NHDPTHI%3AFramework,ROADS%3AFramework,NHDFLHI%3AFramework,NHDLIHI%3AFramework,NHDWBHI%3AFramework,NHDARHI%3AFramework&STYLES=,,,,,,,&SCALE=30000000&FORMAT=image%2Fpng%3B+PhotometricInterpretation%3DRGB&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&EXCEPTIONS=INIMAGE

If I read the Linfinti blog post right, this is the kind of image they'd like to avoid fetching unnecessarily. If you look closely at a HTTP request for the legend image (passing the -v option to curl, with [1] standing in for the request in the code below), you see:

krusty-2:~ seang$ curl -v [1] > /tmp/legend.png
* About to connect() to frameworkwfs.usgs.gov port 80 (#0)
*   Trying 76.10.128.162... connected
* Connected to frameworkwfs.usgs.gov (76.10.128.162) port 80 (#0)
> GET /framework/wms/wms.cgi?SERVICE=WMS&VERSION=1.3.2... HTTP/1.1
> User-Agent: curl/7.19.5 (i386-apple-darwin9.8.0) libcurl/7.19.5 OpenSSL/0.9.8k zlib/1.2.3
> Host: frameworkwfs.usgs.gov
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 18 Feb 2010 21:26:03 GMT
< Server: Apache/2.0.46 (Red Hat)
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: image/png

No validator headers. No freshness headers. Neither httplib2 nor a HTTP caching proxy is going to be useful here. You're on your own. Hurrah for standards. Let a thousand ad-hoc caching schemes bloom.

Comments

Re: Saving bandwidth takes two

Author: geographika

The site you give as an example even causes the logo to be downloaded on each map request, so implementing caching on the WMS images was unlikely.

Surely there needs to be a thoudand different ways of setting up caching, as a WMS service can be used in many different ways - static background maps, weekly data updates, constantly changing data etc.

Re: Saving bandwidth takes two

Author: Sean

The web's own caching mechanisms serve all those use cases and more. A WxS can do real harm to developers in making them reinvent it.

Is really WxS to blame here ?

Author: Jérôme Andrieux

It seems to me you are blaming, all at once :

- the WS standard

- the WS implementation(s)

- the architecture of this particular site

Of course, OGC WxS standards could benefit a restful approach (I mean /a/unique/path/to/resources) when it comes to efficient web cache policy with frontal and external HTTP cache managers (like Varnish, nginx, ...).

However, in this particular case, isn't the WxS implementation to blame first ?

As someone said above, this particular doesn't pay to much attention to cache & compression policies anyway.

Re: Saving bandwidth takes two

Author: Sean

The WMS 1.3 spec has two sentences regarding HTTP headers. One suggests that a service should provide caching headers if possible, but there's no guidance at all for implementers. IMO, that is a failing of the spec, and as a consequence we see people needlessly inventing their own caching schemes.

Examples of WxS servers that do support web caching are very welcome.

Shapely 1.2b1

Update (2010-02-28): 1.2b3 is out. URLs are changed below.

Update (2010-02-19): 1.2b2 is out. URLs are changed below.

This is the first 1.2 release uploaded to PyPI: http://pypi.python.org/pypi/Shapely/. I hope the new README introduces the package more clearly. The manual lags: it will be the last thing before a 1.2 final.

A consequence for downstream developers is that a package that depends on Shapely in the setuptools sense without specifying a specific version will now begin to pull in 1.2b* releases instead of 1.0.14. 1.2b1 seems to be working with GEOS 3.0 and we didn't intentionally break backwards compatibility, but you should be cautious about this. If you're depending on GEOS 2.2.3 and haven't already pinned your package to Shapely==1.0.14 (updating as you go), you should.

To install and try it out (in a virtualenv):

$ pip install http://gispython.org/dist/Shapely-1.2b3.tar.gz

or

$ easy_install http://gispython.org/dist/Shapely-1.2b3.tar.gz

Geospatial Jython

I've fooled around with Jython, but this, via @erilem, is the first I've seen of it in real live GIS software:

>>> from geoserver import Catalog
>>> cat = Catalog('topp')
>>> print cat.stores()
[]
>>> st = cat['states_shapefile']
>>> print st.layers()
[]
>>> l = st['states']
>>> print l.count()
49
print l.bounds()
...

A little bit of its Java-ness leaks through, but at least one is spared the getters and setters. Cool, but then what's this smell at the end?

The following is the wish list based on feedback from teh community:

  • add a new output format

  • add a transaction listener

  • add a dispatcher callback

  • add a WPS process

  • add a datastore

  • a restful endpoint

Restful endpoint? Oh, for crying out loud. Head. Desk.

Comments

Re: Geospatial Jython

Author: Silas Toms

Congrats on your 1000th post. Here's to many more in the future.

Re: Geospatial Jython

Author: Sean

It'll take me a while yet to get to 2**10, but thank you :)

Re: Geospatial Jython

Author: Steven Citron-Pousty

So maybe my suggestion on stack overflow wasn't

totally

off base

Re: Geospatial Jython

Author: Justin Deoliveira

I am interested to hear thoughts on how to make the api less java like and more pythonic... but being a java hack old habits die hard. I guess I could start by reading over some of your older posts. I am not sure if http://geoscript.org/py has come across your radar but it is an attempt to wrap up geotools in python.

Re: Geospatial Jython

Author: Sean

Geoscript looks like it has a lot in common with Shapely and Geojson, which is good :)

But this "rest endpoint" thinking: what is up with that?

Re: Geospatial Jython

Author: Jachym

PyWPS works in Jython as well. We are running it as java-servlet.

Re: Geospatial Jython

Author: Sean

Steve0: Something like ogr2ogr could be a nice example app for geoscript.

Justin: read-only properties instead of methods would be nice, but maybe that's just me. The geoscript geometry constructors all looked just right.

Author: Justin Deoliveira

Sean: Thanks for the feedback. And yes much design inspiration was driven from Shapely so thanks for that. About the "rest endpoint" thing I would not read much into it. It was something mentioned in email and blindly copied into that wiki page. I fully agree the statement is meaningless in the context of that page. Python bindings for GeoServer/GeoTools are completely orthogonal to anything "REST" imo. But a blatant misuse of terminology on our part nonetheless. Thanks for the call out :)

Re: Geospatial Jython

Author: Sean

Orthogonal is right. Knee-jerk reaction on my part, which just distracts from how cool it is to see some geo-Jython action.

Re: Geospatial Jython

Author: Tim Schaub

And keep an eye out for GeoScript JS (http://geoscript.org/js/). End-to-end JavaScript mapping apps with GeoJSON as transport, that's my idea of slick :).

Future of Flash in GIS

Fuzzy, the blogger with the not-Flash banner animation, writes:

Every time I hear about devs (some of my co-workers included) choosing the Flex API for AGS I have to bite my lip. With the life span of web mapping sites they’re probably more than safe going with Flash – it’s slide will likely be long and slow. But I think that slide is becoming inevitable.

Last week I commented that one upside of the Apple-Adobe Flash brouhaha was that it reduces all the necessarily complicated, technical, wonky arguments against Flash (which clients know, somehow, that they "need") to: "it won't work on your iPad". But I'm resisting because I'm almost certain that Apple will eventually win this game of chicken, extract whatever concessions or guarantees or fees or pounds of flesh from Adobe, and relent. I'd like to be wrong.

Comments

Re: Future of Flash in GIS

Author: Dino Ravnić

We have built a vector mapping engine on top of Adobe Flash. It performs quite well. With it we can render directly almost any vector format. It is integrated into GIS Cloud project: http://www.giscloud.com/

Re: Future of Flash in GIS

Author: Sean

What brouhaha, eh Dino?

Re: Future of Flash in GIS

Author: Dino Ravnić

Yeah it is. Many of them are underestimating Flash today and its destiny. It is practically a standard for vector graphics in web browsers and it will take quite some time to HTML5 or whatever catches up with its performance and browser penetration.

On transparency in making standards

William Vambenepe points out some familiar bugs:

  • The mailing lists of DMTF working groups are confidential. Even a DMTF member cannot see the message archive of a group unless he/she is a member of that specific group. The general public cannot see anything at all. And unless I missed it on the site, they cannot even know what DMTF working groups exist. It makes you wonder whether Dick Cheney decided to call his social club of energy company executives a “Task Force” because he was inspired by the secrecy of the DMTF (“Distributed Management Task Force”). Even when the work is finished and the standard published, the DMTF won’t release the mailing list archive, even though these discussions can be a great reference for people who later use the specification.

  • Working documents are also confidential. Working groups can decide to publish some intermediate work, but this needs to be an explicit decision of the group, then approved by its parent group, and in practice it happens rarely (mileage varies depending on the groups).

  • Even when a document is published, the process to provide feedback from the outside seems designed to thwart any attempt. Or at least that’s what it does in practice. Having blogged a fair amount on technical details of two DMTF standards (CMDBf and WS-Management) I often get questions and comments about these specifications from readers. I encourage them to bring their comments to the group and point them to the official feedback page. Not once have I, as a working group participant, seen the comments come out on the other end of the process.

GIS industry standards are made in just such a non-transparent members-only environment. I used to subscribe to the OGC's "mass market" (private archive, but open to subscription) list and tried to engage in some discussion there, but soon realized that although messages from the principals were being cross-posted there, they weren't subscribed themselves and didn't see any responses. I also tried to submit comments to the formal channel and found it to be broken (there's a year long gap in the archives: it could have broken for that length of time without anybody noticing). Now that it's fixed, you can see the public comment process doesn't get much use.

Despite this, the OGC's standards enjoy almost absolute buy-in from non-member GIS specialists, particularly those from the open source community who need something – anything – to counter de-facto standardization on ESRI products.

More features like open source Python GIS please

An ESRI user shares his software wish list:

3. Expose numpy in the geoprocessor: The geoprocessor as is right now (9.3.1) uses the excellent numpy module to perform matrix algebra (think of raster manipulation). Yet, when one wishes to run numpy commands, one needs to manually read raster files with GDAL, import them as numpy arrays (default), perform operations, and translate back to raster. ESRI must have modules for dealing with this, and we want them. Why would ESRI want to do this? Right now, raster manipulation through Python is done outside the geoprocessor. Most people turn to open source tools to manipulate data, which leads to less and less users relying on ESRI for this. Why pay when free software will do it? The capability is there, and we need to access it too.

Fewer and fewer users relying on ESRI for this? You say that like it's a bad thing.

Comments

Re: More features like open source Python GIS please

Author: Kay

ESRI already uses GDAL and Numpy, if they would just create a Driver for their GeoDatabases in GDAL/OGR and release it to the gdal-community. Then they could distribute GDAL/OGR-python with their product and all their users would have the power of gdal-numpy available without having to install extra software.

And FOSS-people would be able to use FGDB's.

Everyone Happy.

Re: More features like open source Python GIS please

Author: Sean

An open source GDB driver would make a lot of people happy, but that's not what I'm getting at here. My point is that ESRI users are looking at the features (GDAL, Numpy, etc) and usability (Python) of open source software and wondering "why can't we have more of that?"

Below the buzz

I just stumbled onto this post at ReadWriteWeb:

Google Buzz data can be syndicated out to other services using the standard data formats called Atom, Activity Streams, MediaRSS and PubSubHubbub. That couldn't be more different from Facebook. Google has taken open data standards to battle against a marketplace of competitors that are closed and proprietary to varying degrees. This is a very big deal.

Maybe it is a big deal, and not for the best: Google Buzz the product is falling flat with my sources, who find it too raw, too unpredictable, too much. It takes some of the shine off the underlying architecture.

Plotting GIS shapes

Here's another installment in my series (iterators, revenge of the iterators, features) that considers different Python GIS APIs and environments: plotting geometries with Matplotlib. Matplotlib has a number of functions for plotting and graphing, and they typically take as their first arguments sequences of x and y coordinate values. These sequences are immediately adapted to Numpy arrays. In some sense what we're actually considering here is how well different Python APIs integrate with Numpy.

In every code snippet below we import the Matplotlib pylab module and then obtained a native geometry object geom or feature object in some way (see previous post for details).

ESRI:

x = []
y = []
pnt = geom.Next()
while pnt is not None:
    x.append(pnt.x)
    y.append(pnt.y)
    pnt = geom.Next()
pylab.plot(x, y)

Wrapping the geometry up in a point or vertex iterator would help tidy this up.

FME:

x, y = zip(*feature.getCoordinates())
pylab.plot(x, y)

No actual geometry object in FME, but this isn't bad at all in combination with zip. Remember that zip(*x) unzips the sequence of items x. Within a function call, the * operator explodes a sequence.

OGR:

x, y = zip(*((geom.GetX(i), geom.GetY(i)) for i in range(geom.GetGeometryRef(0).GetPointCount()))
pylab.plot(x, y)

Note that using zip gets this done in one pass over the geometry instead of two as in the example linked through OGR above. I'm telling you: idiommatic Python wins.

QGIS:

x, y = zip(*((p.x, p.y) for p in geom.asPolyline()))
pylab.plot(x, y)

geojson:

x, y = zip(*geom.coordinates)
pylab.plot(x, y)

Easier and easier.

Shapely (1.2a6):

x, y = geom.xy
pylab.plot(x, y)

Easiest yet.

Where possible, I've used a Python idiom to compress the code to two lines for each API example. There are a lot of calls in the OGR and QGIS samples and efficiency and clarity are at odds. Since a list is being built no matter what, there's no harm in doing it with more lines of code for those APIs. With Shapely, I've deliberately made it hard to do it in more than two lines. In fact, it's easier to do it in one:

pylab.plot(*geom.xy)

Hubba hubba hubba hubba hubba

Although I'm getting a little weary of "social media" products I'm very interested in the technology and architecture behind Google's Buzz. Once again we see how differently syndication and synchronization are conceived and engineered by the GIS Enterprise and the Google. Federated Geo-Synchronization on the one hand (developed in a clean room behind a license wall) and on the other: HTTP, Atom, Web Linking, and PubSubHubbub.

Overly disruptive to other APIs? Maybe. Certainly it's a reminder that the web is one API that really counts and that makes me a happy boy.

Shapely 1.2a6 with pictures

One thing that Shapely has lacked is one or two dirt simple example programs to keep the API real and help explain its use. I did something about this over the past couple of nights: 1.2a6 includes two easy to understand, easy to run scripts. I hope users profit from them. Myself, I found that they demanded a new and improved API feature. I'll explain.

First, here's an example of using Shapely to construct patches by growing buffer regions out from a set of points and dissolving those regions together as they intersect, and plotting the results with Matplotlib. This is run-of-the-mill GIS stuff, yes, but done in style.

http://trac.gispython.org/lab/raw-attachment/wiki/Examples/dissolve.png

A plate of blue-speckled brains splattered on the floor, or is it just me?

The interesting part of the complete, amply-documented dissolve.py script is here:

import pylab
from shapely.ops import cascaded_union

patches = cascaded_union(spots)

pylab.figure(num=None, figsize=(4, 4), dpi=180)

for patch in patches.geoms:
    x, y = patch.exterior.xy
    pylab.fill(x, y, color='#cccccc', aa=True)
    pylab.plot(x, y, color='#666666', aa=True, lw=1.0)
    for hole in patch.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='#999999', aa=True, lw=1.0)

pylab.text(-25, 25,
    "Patches: %d, total area: %.2f" % (len(patches.geoms), patches.area))

pylab.savefig('dissolve.png')

The xy property is completely new in 1.2a6, inspired by how awkwardly I had to slice and dice coordinates when writing this example against 1.2a5. It provides two Python arrays that are immediately usable with Numpy or Matplotlib. Speaking of Matplotlib: I'd love to know how to fill a patch but not its holes (you'll notice that I'm faking the emptiness of the holes in this example).

What would would you have to go through to pyplot ArcGIS scripting results?

Shapely doesn't just make grey matter go splat, it can also toss brains in the air and pierce them with lasers:

http://trac.gispython.org/lab/raw-attachment/wiki/Examples/intersect.png

Or make a fair facsimile thereof. What's really going on in intersect.py is an analysis of a HTML5 geolocation (latitude, longitude, heading, and speed) trajectory's intersection with a cluster of patches. The intercepted patches are plotted in red and the intersecting segments of the trajectory itself are also plotted in red. Finally, scalar properties of different geometries are used in a text label. The example vector intercepts 2 of the 7 patches along 5 segments with a total length (to one decimal place) of 26.1:

import pylab
from shapely.geometry import LineString

# Represent the following geolocation parameters
#
# initial position: -25, -25
# heading: 45.0
# speed: 50*sqrt(2)
#
# as a line
vector = LineString(((-25.0, -25.0), (25.0, 25.0)))

# Find intercepted and missed patches. List the former so we can count them
intercepts = [patch for patch in patches.geoms if vector.intersects(patch)]
misses = (patch for patch in patches.geoms if not vector.intersects(patch))

pylab.figure(num=None, figsize=(4, 4), dpi=180)

for spot in misses:
    x, y = spot.exterior.xy
    pylab.fill(x, y, color='#cccccc', aa=True)
    pylab.plot(x, y, color='#999999', aa=True, lw=1.0)
    for hole in spot.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='#999999', aa=True, lw=1.0)

for spot in intercepts:
    x, y = spot.exterior.xy
    pylab.fill(x, y, color='red', alpha=0.25, aa=True)
    pylab.plot(x, y, color='red', alpha=0.5, aa=True, lw=1.0)
    for hole in spot.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='red', alpha=0.5, aa=True, lw=1.0)

pylab.arrow(-25, -25, 50, 50, color='#999999', aa=True,
    head_width=1.0, head_length=1.0)

intersection = vector.intersection(patches)
for segment in intersection.geoms:
    x, y = segment.xy
    pylab.plot(x, y, color='red', aa=True, lw=1.5)

pylab.text(-28, 25,
    "Patches: %d/%d (%d), total length: %.1f" \
     % (len(intercepts), len(patches.geoms),
        len(intersection.geoms), intersection.length))

pylab.savefig('intersect.png')

Install GEOS 3.2.0 (Windows users can get it from a PostGIS 1.5 installer, but will have to copy the DLLs to a location one can glean only from looking at shapely/geos.py. YMMV until we have Shapely 1.2 installers) then grab the new distribution with easy_install or pip (as well as Numpy and Matplotlib) and give them a try:

$ python /usr/local/bin/dissolve.py
$ python /usr/local/bin/intersect.py

I think this is pretty much the last 1.2 alpha.